optimizers · training
Momentum and Nesterov methods
Heavy-ball and Nesterov accelerated gradient — smoothing the noisy descent trajectory.
Depth levels
L0 Intro (~1 h)
Intuitive "ball rolling down a hill" picture; knows that momentum accumulates an exponentially weighted sum of past gradients.
L1 Basics (~5 h)
Writes the update rule with momentum coefficient β; tunes β, typically starting around 0.9; compares vanilla SGD with SGD + momentum.
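A minimal sketch of the L1 comparison, assuming a hypothetical two-dimensional quadratic with curvatures 1 and 100 and exact (noise-free) gradients so the result is deterministic. It uses the convention v ← βv + g, x ← x − lr·v; some texts instead write v ← βv + (1 − β)g, which only rescales the effective learning rate. The names `CURVATURES`, `f`, `grad` and `run` are illustrative, not from any library.

```python
# Minimal sketch (toy problem, not a library API): gradient descent with and
# without heavy-ball momentum on the ill-conditioned quadratic
#     f(x) = 0.5 * (1 * x0**2 + 100 * x1**2)
# Exact gradients are used so the comparison is deterministic; in SGD, grad()
# would return a noisy minibatch estimate instead.

CURVATURES = (1.0, 100.0)  # hypothetical diagonal Hessian: mu = 1, L = 100


def f(x):
    """Objective value of the toy quadratic."""
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURES, x))


def grad(x):
    """Gradient of the toy quadratic: curvature times coordinate."""
    return [c * xi for c, xi in zip(CURVATURES, x)]


def run(lr, beta, steps, x0=(1.0, 1.0)):
    """Momentum update  v <- beta * v + g,  x <- x - lr * v.
    beta = 0 recovers vanilla gradient descent."""
    x = list(x0)
    v = [0.0] * len(x)
    for _ in range(steps):
        g = grad(x)
        v = [beta * vi + gi for vi, gi in zip(v, g)]
        x = [xi - lr * vi for xi, vi in zip(x, v)]
    return x


LR = 0.01  # vanilla GD on this problem is stable only for lr < 2 / L = 0.02
plain = run(LR, beta=0.0, steps=100)
momentum = run(LR, beta=0.9, steps=100)
print(f"vanilla  f = {f(plain):.2e}")
print(f"momentum f = {f(momentum):.2e}")
```

The step size is capped by the stiff direction (curvature 100), so vanilla descent shrinks the flat coordinate by only a factor 0.99 per step; with β = 0.9 the accumulated velocity keeps moving along that flat direction and the loss after 100 steps is much lower.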
L2 Working (~10 h)
Uses the Nesterov variant; understands how it relates to Polyak's heavy-ball method on quadratic objectives.
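A minimal sketch contrasting heavy ball with Nesterov momentum in the "look-ahead" form (gradient evaluated at x + βv), again assuming a hypothetical diagonal quadratic with curvatures 1 and 100. The comment spells out the quadratic relationship: since the gradient is linear there, Nesterov's step is the heavy-ball step plus one curvature-dependent correction term. All names are illustrative.

```python
# Minimal sketch (toy problem, not a library API): heavy ball vs Nesterov
# momentum on f(x) = 0.5 * (1 * x0**2 + 100 * x1**2), look-ahead form:
#     v <- beta * v - lr * grad(x + beta * v)
#     x <- x + v
# Heavy ball is identical except the gradient is taken at x itself.
# On a quadratic with Hessian A, grad(x + beta*v) = grad(x) + beta * A v, so
# Nesterov's step equals the heavy-ball step plus the extra term -lr*beta*A v.

CURVATURES = (1.0, 100.0)  # hypothetical diagonal Hessian: mu = 1, L = 100


def f(x):
    """Objective value of the toy quadratic."""
    return 0.5 * sum(c * xi * xi for c, xi in zip(CURVATURES, x))


def grad(x):
    """Gradient of the toy quadratic."""
    return [c * xi for c, xi in zip(CURVATURES, x)]


def run(lr, beta, steps, nesterov, x0=(1.0, 1.0)):
    x = list(x0)
    v = [0.0] * len(x)
    for _ in range(steps):
        if nesterov:
            # Evaluate the gradient at the look-ahead point x + beta * v.
            point = [xi + beta * vi for xi, vi in zip(x, v)]
        else:
            # Heavy ball: gradient at the current iterate.
            point = x
        g = grad(point)
        v = [beta * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x


heavy_ball = run(0.01, 0.9, 100, nesterov=False)
nag = run(0.01, 0.9, 100, nesterov=True)
print(f"heavy ball f = {f(heavy_ball):.2e}")
print(f"Nesterov   f = {f(nag):.2e}")
```

On this toy problem the look-ahead gradient damps the oscillation along the stiff direction, so Nesterov ends closer to the minimum; that is an observation about this example, not a general guarantee.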
L3 Advanced (~20 h)
Derives the optimal momentum for quadratic objectives; studies Nesterov's accelerated methods.
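A compact sketch of the classical quadratic result from Polyak's heavy-ball analysis, assuming f(x) = ½xᵀAx − bᵀx with the eigenvalues of A in [μ, L] and κ = L/μ. Each eigen-coordinate of the error follows a two-term recurrence; choosing α and β so that the characteristic roots at λ = μ and λ = L are double roots makes every root for λ in [μ, L] have modulus √β.

```latex
\begin{aligned}
&\text{Heavy ball on } f(x)=\tfrac12 x^\top A x - b^\top x,\quad \mu I \preceq A \preceq L I,\quad \kappa = L/\mu:\\
&x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta\,(x_k - x_{k-1}).\\[4pt]
&\text{Per eigenvalue } \lambda \text{ of } A,\ e_k = x_k - x^\star \text{ satisfies}\quad
e_{k+1} = (1+\beta-\alpha\lambda)\,e_k - \beta\,e_{k-1},\\
&\text{with characteristic polynomial } z^2 - (1+\beta-\alpha\lambda)\,z + \beta = 0.\\[4pt]
&\text{Double roots at the extremes: } 1+\beta-\alpha\mu = 2\sqrt{\beta},\qquad 1+\beta-\alpha L = -2\sqrt{\beta}\\
&\Longrightarrow\ \sqrt{\alpha\mu} = 1-\sqrt{\beta},\qquad \sqrt{\alpha L} = 1+\sqrt{\beta}\\
&\Longrightarrow\ \alpha^\star = \frac{4}{(\sqrt{L}+\sqrt{\mu})^2},\qquad
\beta^\star = \left(\frac{\sqrt{L}-\sqrt{\mu}}{\sqrt{L}+\sqrt{\mu}}\right)^2 = \left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^2.\\[4pt]
&\text{For every } \lambda\in[\mu,L] \text{ the roots have modulus } \sqrt{\beta^\star} = \frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}
\ \text{ (asymptotic rate), vs. } \frac{\kappa-1}{\kappa+1} \text{ for gradient descent with } \alpha = \frac{2}{L+\mu}.
\end{aligned}
```

For example, κ = 100 gives β* = (9/11)² ≈ 0.67 and an asymptotic per-step rate of 9/11 ≈ 0.82, versus 99/101 ≈ 0.98 for optimally tuned gradient descent.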
L4 Research (~50 h)
Studies momentum-based accelerated optimizers in non-convex and stochastic settings.
Resources
L1 — Basics