CUDA Fundamentals For Systems Interviews
A compact review of host/device code, grids, blocks, warps, memory hierarchy, synchronization, and CUDA compilation.
Technical notes, tutorials, and research musings.
Review notes for explaining GPU kernels through thread mapping, memory access, synchronization, and bottleneck hypotheses.
A walkthrough of Michael-Scott queues, CAS linearization points, memory ordering, ABA, and hazard-pointer reclamation.
The thread-pool layer around a queue: packaged tasks, futures, stop tokens, condition variables, graceful shutdown, and lifecycle locks.
How bit-packing, M4RI-style table methods, layout experiments, and Fenwick-tree CTMC checks fit into one engineering loop.
A systems-oriented refresher on reverse-mode autodiff, micrograd-style scalar graphs, PyTorch vocabulary, and GPU placement.
A demonstration of rendering mathematical equations using MathJax.