MountainAI

Markov Decision Processes

States, actions, rewards, and policies — the formal framework for reinforcement learning.

Depth levels

L0 Intro (~1 h)

Understands that an agent acts in an environment, receives rewards, and aims to maximise cumulative return.
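
The agent-environment loop described at this level can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the corridor environment, the `step` and `run_episode` functions, and the two policies are all hypothetical names invented for the example.

```python
import random

def step(state, action):
    """Hypothetical 1-D corridor with states 0..4.

    The action is -1 (left) or +1 (right). Reaching state 4 gives
    reward 1.0 and ends the episode; every other move gives 0.0.
    """
    next_state = max(0, min(4, state + action))
    done = next_state == 4
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def run_episode(policy, max_steps=50, seed=0):
    """Run one episode from state 0 and return the undiscounted cumulative reward."""
    rng = random.Random(seed)
    state, total = 0, 0.0
    for _ in range(max_steps):
        action = policy(state, rng)                 # agent picks an action
        state, reward, done = step(state, action)   # environment responds
        total += reward                             # accumulate the return
        if done:
            break
    return total

def always_right(state, rng):
    return +1

def random_walk(state, rng):
    return rng.choice([-1, +1])

if __name__ == "__main__":
    print(run_episode(always_right))
    print(run_episode(random_walk))
```

The policy is just a function from state to action; comparing the return of `always_right` with that of `random_walk` is the simplest way to see what "aims to maximise cumulative return" means in practice.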

L1 Basics (~10 h)

Defines MDP formally; solves small MDPs with value iteration and policy iteration.
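
As a rough sketch of what "solving a small MDP with value iteration" involves, the example below runs value iteration on a three-state chain. The MDP itself, the transition-table layout `P[state][action] = [(prob, next_state, reward), ...]`, and the function names are assumptions made for illustration.

```python
def q_value(outcomes, V, gamma):
    """Expected one-step return for a list of (prob, next_state, reward) outcomes."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)

def value_iteration(P, gamma=0.9, tol=1e-8):
    """In-place value iteration on a tabular MDP.

    States with an empty action dict are treated as terminal (value 0).
    Returns the value table and a greedy policy for non-terminal states.
    """
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s, actions in P.items():
            if not actions:  # terminal state
                continue
            best = max(q_value(outcomes, V, gamma) for outcomes in actions.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], V, gamma))
        for s, actions in P.items() if actions
    }
    return V, policy

# Hypothetical chain 0 -> 1 -> 2, where state 2 is terminal and
# entering it from state 1 yields reward 1.
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {},
}

if __name__ == "__main__":
    V, policy = value_iteration(P)
    print(V, policy)
```

Policy iteration follows the same pattern but alternates a full policy-evaluation step with a greedy improvement step instead of taking the max inside every sweep.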

L2 Working (~20 h)

Applies Bellman equations; understands discount factor, reward shaping, and partial observability (POMDP basics).
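
For orientation, the Bellman equations and the role of the discount factor can be written out in one common textbook notation. The symbols here are assumptions, since the page does not fix a notation: $\pi(a \mid s)$ is the policy, $P(s' \mid s, a)$ the transition probability, $R(s, a, s')$ the reward, and $\gamma \in [0, 1)$ the discount factor.

```latex
% Discounted return from time step t
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}

% Bellman expectation equation for a fixed policy \pi
v_{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \bigl[ R(s, a, s') + \gamma \, v_{\pi}(s') \bigr]

% Bellman optimality equation
v_{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)
           \bigl[ R(s, a, s') + \gamma \, v_{*}(s') \bigr]
```

A smaller $\gamma$ weights near-term rewards more heavily; with bounded rewards and $\gamma < 1$ the infinite sum in $G_t$ is finite.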

L3 Advanced (~35 h)

Analyses convergence of DP methods; derives TD learning; understands multi-agent MDPs and hierarchical RL.
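
To make the TD learning item concrete, here is a minimal sketch of tabular TD(0) policy evaluation. The two-step chain environment, the `chain_step` function, and the hyperparameter values are hypothetical choices for illustration only.

```python
def td0_evaluate(step, policy, start_state, episodes=2000, alpha=0.1, gamma=0.9):
    """Estimate the state values of a fixed policy with tabular TD(0).

    `step(state, action)` must return (next_state, reward, done).
    Unvisited states default to a value of 0.0.
    """
    V = {}
    for _ in range(episodes):
        state, done = start_state, False
        while not done:
            next_state, reward, done = step(state, policy(state))
            # Bootstrapped target: terminal states contribute no future value.
            target = reward + (0.0 if done else gamma * V.get(next_state, 0.0))
            old = V.get(state, 0.0)
            V[state] = old + alpha * (target - old)  # move estimate toward target
            state = next_state
    return V

def chain_step(state, action):
    """Hypothetical chain 0 -> 1 -> 2; entering state 2 pays 1.0 and terminates."""
    next_state = state + action
    done = next_state == 2
    return next_state, (1.0 if done else 0.0), done

def always_right(state):
    return +1

if __name__ == "__main__":
    print(td0_evaluate(chain_step, always_right, start_state=0))
```

Unlike the dynamic-programming methods, this update needs no transition model: it learns from sampled transitions only, which is why the convergence analysis at this level differs from that of value iteration.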

L4 Research (~70 h)

Contributes to RL theory, safe RL, or multi-agent systems.

Resources