Optimization
Gradient descent variants, adaptive optimizers, convergence theory — the engine of model training.
Depth levels
L0 · Intro · ~2h
Knows that training minimizes a loss function by adjusting parameters via gradient descent.
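In symbols, that is the plain gradient descent update, with θ the parameters, L the loss, and η the learning rate:

$$\theta_{t+1} = \theta_t - \eta\,\nabla_\theta L(\theta_t)$$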
L1 · Basics · ~12h
Implements SGD, momentum, Adam from scratch; understands learning rate schedules.
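As a rough sketch of what "from scratch" means here, the three update rules in NumPy (a minimal toy illustration with illustrative hyperparameters, not a reference implementation):

```python
import numpy as np

def sgd_step(theta, grad, lr=0.1):
    # Plain SGD: move against the gradient.
    return theta - lr * grad

def momentum_step(theta, grad, v, lr=0.1, beta=0.9):
    # Heavy-ball momentum: accumulate a velocity, then step along it.
    v = beta * v + grad
    return theta - lr * v, v

def adam_step(theta, grad, m, v, t, lr=0.05, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: bias-corrected first and second moment estimates (t starts at 1).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy check on f(theta) = ||theta||^2, whose gradient is 2 * theta.
theta = np.array([5.0, -3.0])
m = v = np.zeros_like(theta)
for t in range(1, 501):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(theta)  # ends up near the minimum at the origin
```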
L2 · Working · ~20h
Diagnoses and fixes common training instabilities; chooses optimizers based on problem type; applies gradient clipping and warm-up.
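Two of the stabilizers named here, sketched in NumPy (max_norm and warmup_steps are illustrative defaults, not prescriptions):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    # Rescale all gradients together if their global L2 norm exceeds max_norm.
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (total + 1e-12))
    return [g * scale for g in grads]

def warmup_lr(step, base_lr=3e-4, warmup_steps=1000):
    # Linear warm-up from 0 to base_lr over warmup_steps, then constant.
    return base_lr * min(1.0, step / warmup_steps)
```

In a training loop these compose naturally: clip the gradients first, then apply the step with the warmed-up learning rate.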
L3 · Advanced · ~40h
Analyzes convergence bounds; applies second-order methods, distributed optimization, and constrained optimization.
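One representative result at this depth (a standard textbook bound, stated here only for orientation): for a convex, L-smooth objective f with minimizer x*, gradient descent with constant step size 1/L satisfies

$$f(x_k) - f(x^{*}) \;\le\; \frac{L\,\lVert x_0 - x^{*}\rVert^{2}}{2k},$$

so the suboptimality gap shrinks at an O(1/k) rate.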
L4 · Research · ~80h
Develops new optimization algorithms or convergence proofs for ML settings.
Resources
L1 · Basics
Leads to
- Extends: Regularization
- Extends: Distributed training
- Related: Calculus and analysis
- Prerequisite: Neural networks (L1→L2)
- Extends: Stochastic gradient descent
- Extends: Momentum and Nesterov methods
- Extends: Adam, AdamW and adaptive optimizers
- Related: Learning rate schedules
- Extends: Second-order optimization methods
- Related: Convex optimization