optimizers · training
Adam, AdamW and adaptive optimizers
Per-parameter adaptive learning rates via running moments of the gradient; the default optimizer for most modern networks.
Depth levels
L0 — Intro (~1 h)
Knows Adam "just works" for most neural networks; uses torch.optim.AdamW.
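A minimal usage sketch of torch.optim.AdamW; the model, data, and hyperparameters below are placeholders, not a recommendation:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

    x, y = torch.randn(32, 10), torch.randn(32, 1)
    for _ in range(100):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()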
L1 — Basics (~6 h)
Writes Adam update (m, v, bias correction); tunes β1, β2, ε, lr.
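A hand-rolled Adam step for a single parameter tensor, sketching the update above; the function name and defaults are illustrative, not the torch.optim internals:

    import torch

    def adam_step(p, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Running moments of the gradient.
        m.mul_(beta1).add_(grad, alpha=1 - beta1)             # first moment
        v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)   # second (uncentered) moment
        # Bias correction compensates for the zero initialisation of m and v.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # Per-parameter step scaled by the running second moment.
        p.add_(m_hat / (v_hat.sqrt() + eps), alpha=-lr)
        return p, m, v

The states m and v start as torch.zeros_like(p), and t counts steps from 1.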
L2 — Working (~12 h)
Understands decoupled weight decay (AdamW); compares Adam/RAdam/Lion on a real task.
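The distinction in one sketch: with plain Adam the L2 penalty is added to the gradient and gets rescaled by the adaptive denominator, while AdamW applies the decay directly to the weights. Function names are illustrative; adam_step refers to the sketch under L1 above.

    def adam_l2_step(p, grad, m, v, t, lr, wd, **kw):
        grad = grad + wd * p          # L2 penalty flows through the adaptive scaling
        return adam_step(p, grad, m, v, t, lr=lr, **kw)

    def adamw_step(p, grad, m, v, t, lr, wd, **kw):
        p.mul_(1 - lr * wd)           # decoupled decay, independent of m and v
        return adam_step(p, grad, m, v, t, lr=lr, **kw)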
L3 — Advanced (~20 h)
Analyses convergence failures (Reddi et al. AMSGrad); reads Lion, Sophia, Shampoo.
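A sketch of the AMSGrad fix from Reddi et al.: keep a running maximum of the second moment so the effective step size cannot grow between iterations. Again an illustrative variant of the adam_step sketch, not a library implementation:

    import torch

    def amsgrad_step(p, grad, m, v, v_max, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        m.mul_(beta1).add_(grad, alpha=1 - beta1)
        v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
        torch.maximum(v_max, v, out=v_max)        # non-decreasing denominator
        m_hat = m / (1 - beta1 ** t)
        p.add_(m_hat / (v_max.sqrt() + eps), alpha=-lr)
        return p, m, v, v_max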
L4 — Research (~50 h)
Contributes new adaptive optimizers; second-order approximations for LLMs.
Resources
L1 — Basics