Bellman equations and dynamic programming
The recursive value relations at the heart of reinforcement learning, and the dynamic-programming algorithms built on them: value iteration and policy iteration.
Depth levels
L0 — Intro (~2h)
Reads V(s) = immediate reward + γ × future value.
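Written out, that one-line reading is the Bellman expectation equation for a policy π (standard textbook notation, not specific to this card):

```latex
V^{\pi}(s) \;=\; \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,
\bigl[\, R(s, a, s') + \gamma\, V^{\pi}(s') \,\bigr]
```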
L1 — Basics (~10h)
Derives the Bellman expectation / optimality equations; runs value iteration on a small MDP.
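A minimal sketch of value iteration on a toy two-state, two-action MDP. All transition and reward numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, invented for this sketch.
# P[a, s, s2] = Pr(s2 | s, a); R[s, a] = expected immediate reward.
P = np.array([[[0.9, 0.1],    # action 0, from states 0 and 1
               [0.2, 0.8]],
              [[0.5, 0.5],    # action 1, from states 0 and 1
               [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Bellman optimality backup: Q(s,a) = R(s,a) + γ Σ_s' P(s'|s,a) V(s')
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

print(V)  # converges to the optimal state values V*
```

Because the backup is a γ-contraction in the sup norm, the stopping tolerance bounds the distance to V* by tol · γ/(1 − γ).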
L2 — Working (~15h)
Implements policy iteration with exact evaluation; analyses convergence via the contraction property of the Bellman operator.
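The exact-evaluation step can be sketched in a few lines: evaluating a fixed policy reduces to the linear solve (I − γ P_π) V = R_π, followed by greedy improvement. The MDP numbers are invented for illustration:

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, invented for this sketch.
# P[a, s, s2] = Pr(s2 | s, a); R[s, a] = expected immediate reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])
gamma = 0.9
S = 2

policy = np.zeros(S, dtype=int)  # start with action 0 in every state
while True:
    # Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi.
    P_pi = P[policy, np.arange(S)]          # row s is P(. | s, policy[s])
    R_pi = R[np.arange(S), policy]
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
    # Greedy policy improvement.
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):  # stable policy => optimal
        break
    policy = new_policy

print(policy, V)
```

With finitely many policies and monotone improvement, the loop terminates in finitely many sweeps, unlike value iteration's asymptotic convergence.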
L3 — Advanced (~25h)
Generalised policy iteration; asynchronous DP; approximate DP.
L4 — Research (~50h)
Linear programming formulations; contraction analysis in infinite-state MDPs.
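For orientation, the primal linear program usually meant by "LP formulations" is the standard textbook one below, with arbitrary positive state weights μ(s) (not drawn from this card):

```latex
\min_{V} \; \sum_{s} \mu(s)\, V(s)
\quad \text{subject to} \quad
V(s) \;\ge\; R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V(s')
\quad \forall s, a
```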
Resources
L1 — Basics
L2 — Working