optimizers · training · foundations
Stochastic gradient descent
The foundational optimiser of deep learning — noisy gradient steps that miraculously generalise.
Depth levels
L0 — Intro (~2h)
Reads "update = weight − lr · grad" and runs a toy SGD loop.
L1 — Basics (~8h)
Understands mini-batch size / noise tradeoff; picks an lr; knows about lr warmup.
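A sketch of linear lr warmup; the ramp-then-hold shape is the standard pattern, but the function name, schedule, and step counts are assumptions.

```python
def lr_at(step, base_lr=0.1, warmup_steps=500):
    """Linear warmup (illustrative constants): ramp the lr from ~0 to
    base_lr over warmup_steps, then hold. Warmup tames the large, noisy
    early updates that small mini-batches otherwise amplify."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr
```

The size/noise tradeoff behind this: averaging B per-example gradients divides the gradient-noise variance by B, so larger batches give cleaner steps but fewer updates per epoch.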
L2 — Working (~12h)
Diagnoses divergence / plateaus; uses weight decay correctly; compares with Adam.
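A PyTorch sketch of the SGD-vs-Adam comparison and of weight decay done correctly; the tiny model is a stand-in and the hyperparameters are illustrative.

```python
import torch

model = torch.nn.Linear(10, 1)  # stand-in for any nn.Module

# SGD: weight_decay here is classic L2 regularisation folded into the gradient.
sgd = torch.optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=1e-4)

# AdamW decouples the decay from the adaptive step (Loshchilov & Hutter),
# which is what "uses weight decay correctly" usually means with Adam.
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
```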
L3 — Advanced (~25h)
Reads SGD convergence bounds; analyses the stochastic differential equation (SDE) view; studies flatness and generalisation.
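One common form of the SDE view, sketched from standard definitions (η the learning rate, B the batch size, Σ the per-example gradient-noise covariance); the exact formulation varies across papers.

```latex
% Discrete SGD step with mini-batch gradient \hat g_k, where
% E[\hat g_k] = \nabla L(\theta_k) and Cov[\hat g_k] = \Sigma(\theta_k)/B:
\[ \theta_{k+1} = \theta_k - \eta\,\hat g_k \]
% A common continuous-time approximation (rescaled time t = k\eta):
\[ d\theta_t = -\nabla L(\theta_t)\,dt
             + \sqrt{\eta/B}\;\Sigma(\theta_t)^{1/2}\,dW_t \]
% The ratio \eta/B sets the noise scale, coupling lr and batch size.
```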
L4 — Research (~60h)
Contributes to large-batch scaling rules or to SGD implicit-regularisation theory.
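As one concrete instance of large-batch scaling, a sketch of the linear scaling rule (Goyal et al., 2017); the function name and example numbers are mine, and the rule is a heuristic that breaks down at very large batch sizes.

```python
def scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: when the batch size grows by a factor k,
    grow the learning rate by k as well (typically combined with warmup)."""
    return base_lr * new_batch / base_batch

# e.g. a recipe tuned at lr=0.1 with batch 256, rescaled for batch 2048:
print(scaled_lr(0.1, 256, 2048))  # 0.8
```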
Resources
L1 — Basics