systems, training
Distributed training
Data parallelism, tensor parallelism, pipeline parallelism — training large models across many GPUs.
Depth levels
L0 Intro ~1h
Knows that large models need multiple GPUs and that different parallelism strategies exist.
L1 Basics ~12h
Runs DDP training with PyTorch; understands data parallelism, gradient all-reduce, and synchronisation.
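The gradient all-reduce at this level can be illustrated with a small simulation. The sketch below is a pure-Python ring all-reduce (reduce-scatter followed by all-gather), the algorithm family DDP typically delegates to NCCL; real DDP runs this on GPU buffers, so the function and worker layout here are illustrative assumptions, not PyTorch API.

```python
# Conceptual sketch of ring all-reduce, the collective behind DDP's
# gradient averaging. Pure Python; each inner list simulates one
# worker's gradient buffer. Not the NCCL implementation.

def ring_allreduce(grads):
    """Average equal-length gradient buffers across simulated workers."""
    n = len(grads)                 # number of workers in the ring
    size = len(grads[0])
    assert size % n == 0, "buffer must split evenly into n chunks"
    c = size // n                  # chunk size each worker "owns"

    # Phase 1: reduce-scatter. At step s, worker i sends chunk (i - s)
    # mod n to its ring neighbour, which accumulates it. Sends are
    # buffered first to simulate simultaneous exchange.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            k = (i - step) % n
            s = slice(k * c, (k + 1) * c)
            sends.append(((i + 1) % n, s, grads[i][s]))
        for dst, s, data in sends:
            grads[dst][s] = [a + b for a, b in zip(grads[dst][s], data)]

    # Phase 2: all-gather. Each worker now holds the full sum for one
    # chunk; circulate completed chunks around the ring, overwriting.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            k = (i + 1 - step) % n
            s = slice(k * c, (k + 1) * c)
            sends.append(((i + 1) % n, s, grads[i][s]))
        for dst, s, data in sends:
            grads[dst][s] = data

    # Divide by world size so every worker ends with the mean gradient.
    for g in grads:
        for j in range(size):
            g[j] /= n
    return grads
```

Each worker sends and receives only `2 * (n - 1) * size / n` elements, which is why the ring pattern scales to large clusters better than a naive gather-to-one-rank reduction.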
L2 Working ~25h
Applies ZeRO (DeepSpeed), gradient checkpointing, mixed precision at scale; debugs communication bottlenecks.
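A minimal DeepSpeed config combining the pieces named above might look like the fragment below: ZeRO stage 2 (gradient and optimizer-state partitioning) plus bf16 mixed precision. The exact values are placeholder assumptions to tune per workload; activation/gradient checkpointing is usually enabled in model code rather than here.

```json
{
  "train_micro_batch_size_per_gpu": 8,
  "gradient_accumulation_steps": 4,
  "gradient_clipping": 1.0,
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```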
L3 Advanced ~40h
Designs 3D parallelism strategies (tensor + pipeline + data); profiles inter-node communication; tunes NCCL settings.
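The tensor-parallel axis of a 3D strategy can be sketched in miniature: split a weight matrix column-wise across ranks, have each rank compute a partial output, then concatenate (the all-gather step). This is a pure-Python illustration of the Megatron-style column-parallel linear layer, with hypothetical function names; real systems shard on GPUs and gather with NCCL.

```python
# Column-parallel matmul sketch: the tensor-parallel building block.
# Each simulated rank owns a column shard of the weight matrix.

def matmul(x, w):
    """Plain x @ w for lists of rows (reference implementation)."""
    return [[sum(xi * w[k][j] for k, xi in enumerate(row))
             for j in range(len(w[0]))] for row in x]

def column_parallel_matmul(x, w, n_ranks):
    """Shard w's columns across n_ranks, compute partials, all-gather."""
    cols = len(w[0])
    assert cols % n_ranks == 0, "columns must split evenly across ranks"
    per = cols // n_ranks
    partials = []
    for r in range(n_ranks):
        # Rank r stores only its column shard of w (memory saving).
        shard = [row[r * per:(r + 1) * per] for row in w]
        partials.append(matmul(x, shard))
    # All-gather along the column dimension reassembles the full output.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]
```

Because each rank only ever materialises its shard of the weights and activations, the communication cost moves into the gather, which is exactly what inter-node profiling and NCCL tuning target at this level.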
L4 Research ~80h
Contributes to communication-efficient distributed optimisation, elastic training, or disaggregated compute.