optimizers · training · foundations
Stochastic gradient descent
The foundational optimiser of deep learning — noisy gradient steps that miraculously generalise.
Depth levels
L0 — Intro (~2h)
Reads "update = weight − lr · grad" and runs a toy SGD loop.
L1 — Basics (~8h)
Understands mini-batch size / noise tradeoff; picks an lr; knows about lr warmup.
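A sketch of linear lr warmup; the ramp-then-hold shape is the standard pattern, but the function name, schedule, and step counts are assumptions.

```python
def lr_at(step, base_lr=0.1, warmup_steps=500):
    """Linear warmup (illustrative constants): ramp the lr from ~0 to
    base_lr over warmup_steps, then hold. Warmup tames the large, noisy
    early updates that small mini-batches otherwise amplify."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

The size/noise tradeoff behind this: averaging B per-example gradients divides the gradient-noise variance by B, so larger batches give cleaner steps but fewer updates per epoch.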
L2 — Working (~12h)
Diagnoses divergence / plateaus; uses weight decay correctly; compares with Adam.
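A PyTorch sketch of the SGD-vs-Adam comparison and of weight decay done correctly; the tiny model is a stand-in and the hyperparameters are illustrative.

```python
import torch

model = torch.nn.Linear(10, 1)  # stand-in for any nn.Module

# SGD: weight_decay here is classic L2 regularisation folded into the gradient.
sgd = torch.optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)

# AdamW decouples the decay from the adaptive step (Loshchilov & Hutter),
# which is what "uses weight decay correctly" usually means with Adam.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```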
L3 — Advanced (~25h)
Reads SGD convergence bounds; analyses the stochastic differential equation (SDE) view; studies flatness and generalisation.
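One common form of the SDE view, sketched from standard definitions (η the learning rate, B the batch size, Σ the per-example gradient-noise covariance); the exact formulation varies across papers.

```latex
% Discrete SGD step with mini-batch gradient \hat g_k, where
% E[\hat g_k] = \nabla L(\theta_k) and Cov[\hat g_k] = \Sigma(\theta_k)/B:
\[ \theta_{k+1} = \theta_k - \eta\,\hat g_k \]
% A common continuous-time approximation (rescaled time t = k\eta):
\[ d\theta_t = -\nabla L(\theta_t)\,dt
             + \sqrt{\eta/B}\;\Sigma(\theta_t)^{1/2}\,dW_t \]
% The ratio \eta/B sets the noise scale, coupling lr and batch size.
```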
L4 — Research (~60h)
Contributes to large-batch scaling rules or to SGD implicit-regularisation theory.
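As one concrete instance of large-batch scaling, a sketch of the linear scaling rule (Goyal et al., 2017); the function name and example numbers are mine, and the rule is a heuristic that breaks down at very large batch sizes.

```python
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: when the batch size grows by a factor k,
    grow the learning rate by k as well (typically combined with warmup)."""
    return base_lr * new_batch / base_batch

# e.g. a recipe tuned at lr=0.1 with batch 256, rescaled for batch 2048:
print(scaled_lr(0.1, 256, 2048))  # 0.8
```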
Resources
L1 — Basics