Markov Decision Processes
States, actions, rewards, and policies — the formal framework for reinforcement learning.
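In symbols, one common convention writes an MDP as a tuple and defines the return and the two Bellman equations as below. The notation is an assumption: some texts write the reward as R(s, a) or fold it into P. Here γ < 1 keeps the infinite sum finite for continuing tasks, while episodic tasks may use γ = 1. The last two lines are the Bellman expectation equation for a fixed policy π and the Bellman optimality equation.

```latex
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr(S_{t+1} = s' \mid S_t = s, A_t = a), \qquad \gamma \in [0, 1)

G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a) \bigl[ R(s, a, s') + \gamma V^{\pi}(s') \bigr]

V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a) \bigl[ R(s, a, s') + \gamma V^{*}(s') \bigr]
```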
Depth levels
L0 Intro (~1 h)
Understands that an agent acts in an environment, receives rewards, and aims to maximise cumulative return.
L1 Basics (~10 h)
Defines MDP formally; solves small MDPs with value iteration and policy iteration.
L2 Working (~20 h)
Applies the Bellman equations; understands the discount factor, reward shaping, and partial observability (POMDP basics).
L3 Advanced (~35 h)
Analyses the convergence of dynamic programming (DP) methods; derives temporal-difference (TD) learning; understands multi-agent MDPs and hierarchical RL.
L4 Research (~70 h)
Contributes to RL theory, safe RL, or multi-agent systems.
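The L1 skills (solving a small MDP with value iteration and policy iteration) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the two-state MDP `TOY_MDP`, its rewards, and the helper names `q_value`, `value_iteration`, `policy_evaluation`, and `policy_iteration` are all hypothetical. Value iteration applies the Bellman optimality backup directly; policy iteration alternates evaluating a fixed policy with greedy improvement.

```python
from typing import Dict, List, Tuple

# Hypothetical toy MDP: P[s][a] is a list of (probability, next_state, reward) triples.
Transition = Tuple[float, int, float]
MDP = Dict[int, Dict[int, List[Transition]]]

TOY_MDP: MDP = {
    0: {0: [(1.0, 0, 0.0)],                  # action 0 "stay": no reward
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},  # action 1 "move": usually reaches state 1
    1: {0: [(1.0, 1, 2.0)],                  # action 0 "stay": reward 2 each step
        1: [(1.0, 0, 0.0)]},                 # action 1 "move": back to state 0
}


def q_value(mdp: MDP, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """One-step lookahead: expected reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in mdp[s][a])


def value_iteration(mdp: MDP, gamma: float = 0.9, tol: float = 1e-8):
    """Apply the Bellman optimality backup until the largest change falls below tol."""
    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            best = max(q_value(mdp, V, s, a, gamma) for a in mdp[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update: later states already see the new value
        if delta < tol:
            break
    # Extract the greedy policy with respect to the converged values.
    policy = {s: max(mdp[s], key=lambda a: q_value(mdp, V, s, a, gamma)) for s in mdp}
    return V, policy


def policy_evaluation(mdp: MDP, policy: Dict[int, int], gamma: float,
                      tol: float = 1e-10) -> Dict[int, float]:
    """Iteratively solve the Bellman expectation equation for a deterministic policy."""
    V = {s: 0.0 for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            v = q_value(mdp, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V


def policy_iteration(mdp: MDP, gamma: float = 0.9):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    policy = {s: next(iter(mdp[s])) for s in mdp}  # start from the first listed action
    while True:
        V = policy_evaluation(mdp, policy, gamma)
        stable = True
        for s in mdp:
            best_a = max(mdp[s], key=lambda a: q_value(mdp, V, s, a, gamma))
            # Switch only on a strict improvement, so ties cannot cause endless cycling.
            if q_value(mdp, V, s, best_a, gamma) > q_value(mdp, V, s, policy[s], gamma) + 1e-12:
                policy[s] = best_a
                stable = False
        if stable:
            return V, policy
```

With γ = 0.9 both functions should return the policy {0: 1, 1: 0}: move out of state 0, then stay in state 1. Staying in state 1 forever is worth 2 / (1 − 0.9) = 20. State 0 then satisfies V(0) = 0.8(1 + 0.9 · 20) + 0.2 · 0.9 · V(0), which gives V(0) = 15.2 / 0.82 ≈ 18.54.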
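For the L3 topic of TD learning, here is a minimal sketch of tabular TD(0) prediction on a symmetric random walk, a standard textbook test problem. The function name `td0_random_walk`, the step size, the episode count, and the seed are illustrative assumptions. Unlike the DP methods, TD(0) needs no transition model. It updates V(s) from sampled transitions towards the bootstrapped target r + γV(s′).

```python
import random
from typing import List


def td0_random_walk(n_states: int = 5, episodes: int = 10_000, alpha: float = 0.01,
                    seed: int = 0) -> List[float]:
    """Tabular TD(0) prediction on a symmetric random walk (hypothetical setup).

    Non-terminal states are 1..n_states; 0 and n_states + 1 are terminal.
    Each step moves left or right with probability 1/2. Reaching the right end
    gives reward 1, every other transition gives 0. With gamma = 1 the true value
    of state s is s / (n_states + 1).
    """
    rng = random.Random(seed)
    gamma = 1.0
    V = [0.0] + [0.5] * n_states + [0.0]  # terminal values stay fixed at 0
    for _ in range(episodes):
        s = (n_states + 1) // 2  # every episode starts in the middle state
        while 0 < s < n_states + 1:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n_states + 1 else 0.0
            # TD(0) update: move V[s] towards the bootstrapped target r + gamma * V[s_next]
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V[1:-1]  # values of the non-terminal states only
```

With a small constant step size the estimates fluctuate around the true values 1/6, 2/6, ..., 5/6 rather than converging exactly; a decaying step size would be needed for exact convergence.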
Resources
L1 — Basics