rlfoundations

Temporal-difference learning

TD(0), SARSA, n-step TD, and eligibility traces: bootstrapped value updates that underpin modern RL.

Depth levels

L0 Intro ~2 h

Explains the idea of bootstrapping: "learning a guess from a guess".
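As a compact illustration of bootstrapping, the tabular TD(0) update can be written as below. The notation is the usual textbook one and is assumed here rather than taken from this page: V is the value estimate, α the step size, γ the discount factor. The target contains V(S_{t+1}), which is itself an estimate, so one guess is corrected using another.

```latex
V(S_t) \leftarrow V(S_t) + \alpha \bigl[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \bigr]
```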

L1 Basics ~10 h

Implements TD(0), SARSA, expected SARSA on gridworld.
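As a rough illustration of the TD(0) part of this level, the sketch below evaluates a uniform random policy on a 4x4 gridworld. The grid layout, the reward of -1 per step, the two terminal corners, and all names (`step`, `td0_prediction`, `alpha`, `gamma`) are assumptions chosen for the example and do not come from this page.

```python
import random

N = 4                                  # assumed grid size: N x N, states numbered row by row
TERMINALS = {0, N * N - 1}             # assumed terminal states: two opposite corners
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell; moves off the grid leave the state unchanged. Reward is -1 per step."""
    row, col = divmod(state, N)
    d_row, d_col = action
    new_row = min(max(row + d_row, 0), N - 1)
    new_col = min(max(col + d_col, 0), N - 1)
    return new_row * N + new_col, -1.0


def td0_prediction(episodes=5000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values of the uniform random policy with tabular TD(0)."""
    rng = random.Random(seed)
    values = [0.0] * (N * N)           # terminal values are never updated and stay 0
    non_terminal = [s for s in range(N * N) if s not in TERMINALS]
    for _ in range(episodes):
        state = rng.choice(non_terminal)
        while state not in TERMINALS:
            action = rng.choice(ACTIONS)
            next_state, reward = step(state, action)
            # Bootstrapped target: reward plus the current estimate of the next state.
            td_error = reward + gamma * values[next_state] - values[state]
            values[state] += alpha * td_error
            state = next_state
    return values


if __name__ == "__main__":
    estimates = td0_prediction()
    for row in range(N):
        print(" ".join(f"{estimates[row * N + col]:7.2f}" for col in range(N)))
```

SARSA follows the same pattern but updates action values Q(s, a) using the next action actually taken, and expected SARSA replaces that sampled next action with an expectation over the policy's action probabilities.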

L2 Working ~15 h

Uses n-step returns, eligibility traces (TD(λ)); understands off- vs on-policy.
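To make the eligibility-trace idea concrete, here is a minimal sketch of tabular TD(λ) with accumulating traces on a five-state random walk. The environment (states 1 to 5, terminals 0 and 6, reward 1 only on reaching the right end), the hyperparameters, and the function name `td_lambda_random_walk` are all assumptions made for illustration.

```python
import random


def td_lambda_random_walk(episodes=5000, alpha=0.02, gamma=1.0, lam=0.8, seed=0):
    """Backward-view TD(lambda) with accumulating traces on an assumed 5-state random walk."""
    rng = random.Random(seed)
    n_states = 7                       # states 0 and 6 are terminal, 1..5 are non-terminal
    values = [0.0] * n_states
    for _ in range(episodes):
        traces = [0.0] * n_states      # traces are reset at the start of each episode
        state = 3                      # every episode starts in the middle
        while state not in (0, 6):
            next_state = state + rng.choice((-1, 1))
            reward = 1.0 if next_state == 6 else 0.0
            td_error = reward + gamma * values[next_state] - values[state]
            traces[state] += 1.0       # accumulating trace for the visited state
            for s in range(1, 6):
                # Every recently visited state shares in the current TD error.
                values[s] += alpha * td_error * traces[s]
                traces[s] *= gamma * lam   # decay all traces after the update
            state = next_state
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td_lambda_random_walk()[1:6]])
```

Setting `lam=0.0` reduces this to one-step TD(0), while values of `lam` closer to 1 spread each TD error further back along the visited states, which is the same trade-off that n-step returns expose through the choice of n.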

L3 Advanced ~25 h

Off-policy TD, importance sampling, emphatic TD; convergence with function approximation.
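For orientation, one common form of off-policy TD(0) with importance sampling is sketched below. The symbols are assumptions in standard notation and are not defined on this page: π is the target policy, b the behavior policy that generates the data, and ρ_t the per-step importance sampling ratio that reweights the TD error.

```latex
\rho_t = \frac{\pi(A_t \mid S_t)}{b(A_t \mid S_t)}, \qquad
V(S_t) \leftarrow V(S_t) + \alpha \, \rho_t \bigl[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \bigr]
```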

L4 Research ~50 h

Gradient-TD, deadly triad, distributional RL.

Resources

L1 — Basics

L2 — Working

Learning to Predict by the Methods of Temporal Differences
Sutton, Richard S. (en, ~3 h)

Leads to

Requires knowledge of
