Temporal-difference learning
TD(0), SARSA, n-step TD, and eligibility traces: bootstrapped value updates that underpin much of modern RL.
Depth levels
L0 Intro (~2 h)
Explains the "learning a guess from a guess" idea behind bootstrapping.
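The bootstrapping idea can be written as a single update. The notation below is the usual textbook one and is an assumption here, since the roadmap does not fix any: V is the state-value estimate, α a step size, γ a discount factor, and δ_t the TD error. The target contains the current estimate V(S_{t+1}), which is exactly the "guess" being learned from.

```latex
\delta_t = R_{t+1} + \gamma V(S_{t+1}) - V(S_t),
\qquad
V(S_t) \leftarrow V(S_t) + \alpha\, \delta_t
```

Replacing V(S_{t+1}) with Q(S_{t+1}, A_{t+1}) gives the SARSA update for action values; replacing it with the expectation of Q(S_{t+1}, ·) under the current policy gives Expected SARSA.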
L1 Basics (~10 h)
Implements TD(0), SARSA, and Expected SARSA on a gridworld.
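A minimal sketch of the three L1 algorithms, assuming a hypothetical 4x4 deterministic gridworld (start bottom-left, terminal goal top-right, reward -1 per step) and ε-greedy exploration. The grid, the helper names (`step`, `epsilon_greedy`, `expected_q`, `control`, `td0_prediction`), and all hyperparameters are illustrative choices, not part of the roadmap.

```python
import random
from collections import defaultdict

# Hypothetical 4x4 gridworld (not specified by the roadmap): start at the
# bottom-left, terminal goal at the top-right, reward -1 on every step.
ROWS, COLS = 4, 4
START, GOAL = (3, 0), (0, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Deterministic move; walking into a wall leaves the agent in place."""
    r = min(max(state[0] + ACTIONS[action][0], 0), ROWS - 1)
    c = min(max(state[1] + ACTIONS[action][1], 0), COLS - 1)
    return (r, c), -1.0, (r, c) == GOAL


def epsilon_greedy(Q, state, eps):
    """Uniform random action w.p. eps, otherwise a greedy one (ties broken randomly)."""
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    values = [Q[(state, a)] for a in range(len(ACTIONS))]
    best = max(values)
    return random.choice([a for a, v in enumerate(values) if v == best])


def expected_q(Q, state, eps):
    """E[Q(state, A)] with A drawn from the epsilon-greedy policy above."""
    n = len(ACTIONS)
    values = [Q[(state, a)] for a in range(n)]
    greedy = [a for a in range(n) if values[a] == max(values)]
    probs = [eps / n + ((1 - eps) / len(greedy) if a in greedy else 0.0) for a in range(n)]
    return sum(p * v for p, v in zip(probs, values))


def control(expected=False, episodes=1000, alpha=0.1, gamma=1.0, eps=0.1, seed=0):
    """SARSA (expected=False) or Expected SARSA (expected=True); returns Q."""
    random.seed(seed)
    Q = defaultdict(float)  # optimistic start: every Q(s, a) = 0 exceeds the true (negative) values
    for _ in range(episodes):
        s = START
        a = epsilon_greedy(Q, s, eps)
        done = False
        while not done:
            s2, r, done = step(s, a)
            a2 = epsilon_greedy(Q, s2, eps)  # next action actually taken (behavior)
            if done:
                target = r  # value of the terminal state is 0
            elif expected:
                target = r + gamma * expected_q(Q, s2, eps)  # Expected SARSA target
            else:
                target = r + gamma * Q[(s2, a2)]  # SARSA target
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s2, a2
    return Q


def td0_prediction(Q, episodes=500, alpha=0.1, gamma=1.0, eps=0.1, seed=1):
    """TD(0) estimate of V for the epsilon-greedy policy derived from a fixed Q."""
    random.seed(seed)
    V = defaultdict(float)
    for _ in range(episodes):
        s, done = START, False
        while not done:
            s2, r, done = step(s, epsilon_greedy(Q, s, eps))
            V[s] += alpha * (r + (0.0 if done else gamma * V[s2]) - V[s])
            s = s2
    return V
```

For example, `control(expected=True)` should return a Q table whose greedy policy walks from `START` to `GOAL`, and `td0_prediction(Q)` then estimates the (negative) expected episode cost under the ε-greedy version of that policy. Expected SARSA differs from SARSA only in the target: averaging over the next action removes the variance that comes from sampling A_{t+1}.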
L2 Working (~15 h)
Uses n-step returns and eligibility traces (TD(λ)); understands on-policy vs. off-policy learning.
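A minimal sketch of TD(λ) prediction with accumulating eligibility traces (the backward view), assuming the classic 5-state random walk: states 1 to 5, terminals 0 and 6, reward +1 only on exiting to the right, so the true values are 1/6 to 5/6. The environment, names, and hyperparameters are illustrative assumptions. Setting `lam=0` reduces the update to TD(0); larger λ spreads each TD error further back along the trajectory, much as a longer n-step return would.

```python
import random

# Hypothetical setup: 5-state random walk (states 1..5, terminals 0 and 6),
# reward +1 only when terminating on the right; true values are i/6.
N_STATES = 5
TRUE_V = [i / (N_STATES + 1) for i in range(1, N_STATES + 1)]


def td_lambda(episodes=3000, alpha=0.02, gamma=1.0, lam=0.8, seed=0):
    """Tabular TD(lambda) prediction with accumulating eligibility traces."""
    rng = random.Random(seed)
    V = [0.0] * (N_STATES + 2)  # indices 0 and N_STATES + 1 are terminal, fixed at 0
    for _ in range(episodes):
        e = [0.0] * (N_STATES + 2)  # traces are reset at the start of each episode
        s = (N_STATES + 1) // 2  # start in the middle state
        while 0 < s < N_STATES + 1:
            s2 = s + rng.choice((-1, 1))  # equiprobable random-walk policy
            r = 1.0 if s2 == N_STATES + 1 else 0.0
            delta = r + gamma * V[s2] - V[s]  # V[terminal] is never updated, stays 0
            e[s] += 1.0  # accumulating trace for the state just visited
            for i in range(1, N_STATES + 1):
                V[i] += alpha * delta * e[i]  # every traced state shares the TD error
                e[i] *= gamma * lam  # traces decay by gamma * lambda each step
            s = s2
    return V[1:N_STATES + 1]
```

Comparing `td_lambda()` against `TRUE_V` (for example by RMS error) is a quick way to see the effect of changing `lam` and `alpha`.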
L3 Advanced (~25 h)
Off-policy TD, importance sampling, and emphatic TD; convergence under function approximation.
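A minimal sketch of tabular off-policy TD(0) prediction, where the TD error is scaled by the per-decision importance-sampling ratio ρ = π(a|s) / b(a|s). It assumes a hypothetical 5-state random walk in which the behavior policy b moves right with probability 0.5 and the target policy π moves right with probability 0.6; both probabilities are illustrative. The gambler's-ruin formula gives the true values under π for comparison. Emphatic TD and function approximation are out of scope for this sketch.

```python
import random

# Hypothetical setup: 5-state random walk (states 1..5, terminals 0 and 6),
# reward +1 on right termination. Behavior b moves right w.p. 0.5,
# target pi moves right w.p. 0.6 (illustrative values).
N_STATES = 5
P_RIGHT_B, P_RIGHT_PI = 0.5, 0.6


def true_values(p):
    """Gambler's-ruin probability of exiting right from each state; needs p != 0.5."""
    ratio = (1 - p) / p
    n = N_STATES + 1
    return [(1 - ratio**k) / (1 - ratio**n) for k in range(1, n)]


def off_policy_td0(episodes=5000, alpha=0.02, gamma=1.0, seed=0):
    """Estimate v_pi from episodes generated by b, weighting each TD error by rho."""
    rng = random.Random(seed)
    V = [0.0] * (N_STATES + 2)  # terminal entries stay 0
    for _ in range(episodes):
        s = (N_STATES + 1) // 2
        while 0 < s < N_STATES + 1:
            go_right = rng.random() < P_RIGHT_B  # act with the behavior policy
            s2 = s + (1 if go_right else -1)
            # per-decision importance-sampling ratio pi(a|s) / b(a|s)
            if go_right:
                rho = P_RIGHT_PI / P_RIGHT_B
            else:
                rho = (1 - P_RIGHT_PI) / (1 - P_RIGHT_B)
            r = 1.0 if s2 == N_STATES + 1 else 0.0
            V[s] += alpha * rho * (r + gamma * V[s2] - V[s])
            s = s2
    return V[1:N_STATES + 1]
```

Because E_b[ρ δ] equals the expected TD error under π, the estimates should settle near `true_values(P_RIGHT_PI)` rather than near the behavior policy's values (i/6). In this tabular case that is enough for convergence; the harder questions listed at this level arise once V is a function approximator.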
L4 Research (~50 h)
Gradient-TD methods, the deadly triad, distributional RL.
Resources
L1 — Basics