systems, hardware
CUDA basics
GPU architecture, memory hierarchy, and kernels: understanding the hardware in order to optimise deep learning code.
Depth levels
L0 Intro (~0 h)
Knows GPUs have thousands of parallel cores and that `.to('cuda')` moves tensors there.
L1 Basics (~10 h)
Understands the CUDA thread hierarchy (threads, blocks, grids), shared memory, and basic host-device memory transfer patterns.
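As an illustration of the thread hierarchy, the sketch below emulates in plain Python how a 1D kernel launch is usually sized and how each thread derives its global index (in CUDA C++ this is `blockIdx.x * blockDim.x + threadIdx.x`). The function names `launch_config` and `global_indices` are hypothetical helpers made up for this example, not part of any CUDA API, and the default of 256 threads per block is only a common choice, not a requirement.

```python
def launch_config(n, threads_per_block=256):
    """Return (blocks_per_grid, threads_per_block) covering n elements.

    Uses ceiling division so the last, partially filled block is included.
    """
    blocks_per_grid = (n + threads_per_block - 1) // threads_per_block
    return blocks_per_grid, threads_per_block


def global_indices(n, threads_per_block=256):
    """Emulate the indices that the threads of a 1D launch would process.

    Hypothetical helper: on a GPU these iterations run in parallel;
    here they are plain sequential loops.
    """
    blocks, tpb = launch_config(n, threads_per_block)
    covered = []
    for block_idx in range(blocks):        # plays the role of blockIdx.x
        for thread_idx in range(tpb):      # plays the role of threadIdx.x
            i = block_idx * tpb + thread_idx
            if i < n:                      # bounds guard for the last block
                covered.append(i)
    return covered


if __name__ == "__main__":
    blocks, tpb = launch_config(1000)
    print(blocks, tpb)
    print(len(global_indices(1000)))
```

The bounds guard matters because the grid is rounded up to whole blocks: with 1000 elements and 256 threads per block, 4 blocks launch 1024 threads, and the last 24 must do nothing.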
L2 Working (~25 h)
Writes and profiles simple CUDA kernels in C++; uses CUDA profilers (Nsight Systems, Nsight Compute, or the legacy nvprof); identifies and reduces memory-bandwidth bottlenecks.
L3 Advanced (~40 h)
Implements optimised GEMM and attention kernels; uses Triton for custom operations; analyses kernel performance with the roofline model.
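For context, the roofline model bounds attainable throughput by min(peak compute, arithmetic intensity × peak memory bandwidth). A minimal sketch of that arithmetic follows. The device numbers (10 TFLOP/s, 1 TB/s) are made-up illustrative values, not the specification of any real GPU, and `gemm_intensity` assumes the idealised case where each fp32 matrix is moved to or from memory exactly once.

```python
def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline bound in FLOP/s.

    intensity      -- arithmetic intensity in FLOP per byte moved
    peak_flops     -- peak compute throughput in FLOP/s
    peak_bandwidth -- peak memory bandwidth in bytes/s
    """
    return min(peak_flops, intensity * peak_bandwidth)


def gemm_intensity(m, n, k, bytes_per_element=4):
    """Idealised arithmetic intensity of C = A @ B.

    Assumption: A (m x k) and B (k x n) are each read once and
    C (m x n) is written once; real kernels move more data.
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


# Hypothetical device, chosen only to make the arithmetic easy to follow.
PEAK_FLOPS = 10e12      # 10 TFLOP/s
PEAK_BANDWIDTH = 1e12   # 1 TB/s

if __name__ == "__main__":
    # fp32 vector add: 1 FLOP per element, 12 bytes moved (2 reads + 1 write).
    vec_add = 1 / 12
    gemm = gemm_intensity(1024, 1024, 1024)
    for name, ai in [("vector add", vec_add), ("GEMM 1024^3", gemm)]:
        bound = attainable_flops(ai, PEAK_FLOPS, PEAK_BANDWIDTH)
        kind = "compute-bound" if bound == PEAK_FLOPS else "memory-bound"
        print(f"{name}: {ai:.2f} FLOP/B, {bound:.3e} FLOP/s, {kind}")
```

On this hypothetical device the ridge point sits at 10 FLOP/byte, so the vector add is memory-bound while the idealised GEMM, at roughly 171 FLOP/byte, is compute-bound.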
L4 Research (~80 h)
Contributes to ML compilers, sparse kernel libraries, or hardware-aware training research.
Resources
L2 — Working
L3 — Advanced