Learning rate schedules
Warmup, cosine decay, one-cycle, LR finder — half the art of neural-network training.
Depth levels
L0 — Intro (~1h)
Uses a fixed lr; knows that lowering it later often helps.
L1 — Basics (~5h)
Uses step / exponential / cosine decay; runs Smith's LR finder.
L2 — Working (~10h)
Designs warmup + cosine for transformer pretraining; one-cycle for smaller models.
L3 — Advanced (~20h)
Understands scaling laws for lr vs. batch size; edge-of-stability intuition.
L4 — Research (~40h)
Schedule-free optimizers; learned schedules.
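Smith's LR finder from the Basics level can be sketched as an exponential sweep over learning rates. This is a minimal illustration, assuming a user-supplied `grad_fn` that returns `(loss, gradient)` for a scalar parameter; the function name and defaults are hypothetical, not from any library:

```python
import math

def lr_range_test(grad_fn, w0, lr_min=1e-6, lr_max=10.0, steps=100):
    """Sweep the learning rate exponentially from lr_min to lr_max,
    one update per step, recording (lr, loss); stop once the loss
    diverges. A good lr is usually a bit below the divergence point."""
    w, history = w0, []
    for i in range(steps):
        lr = lr_min * (lr_max / lr_min) ** (i / (steps - 1))
        loss, grad = grad_fn(w)
        history.append((lr, loss))
        if not math.isfinite(loss) or loss > 4 * history[0][1]:
            break  # loss blew up relative to the start: end the sweep
        w = w - lr * grad
    return history
```

On a toy quadratic loss f(w) = w², the recorded curve falls while the lr is small and shoots up once the lr leaves the stable range; in practice the sweep runs over minibatches of real training data.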
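The warmup + cosine pattern from the Working level can be sketched as a pure function of the step count. A minimal sketch; `base_lr`, `warmup_steps`, and `min_lr` are illustrative defaults, not taken from any particular codebase:

```python
import math

def warmup_cosine_lr(step, total_steps, base_lr=3e-4,
                     warmup_steps=1000, min_lr=0.0):
    """Linear warmup from ~0 to base_lr over warmup_steps,
    then cosine decay from base_lr down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # fraction of the decay phase completed, in [0, 1]
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

In a training loop you would call this once per step and write the result into the optimizer's learning rate (e.g. `param_group["lr"]` in PyTorch).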
Resources
L1 — Basics
L2 — Working