rlfoundations

Markov Decision Processes

States, actions, rewards, and policies — the formal framework for reinforcement learning.

Depth levels

L0 Intro (~1h)

Understands that an agent acts in an environment, receives rewards, and aims to maximise cumulative return.

L1 Basics (~10h)

Defines MDP formally; solves small MDPs with value iteration and policy iteration.

L2 Working (~20h)

Applies Bellman equations; understands discount factor, reward shaping, and partial observability (POMDP basics).

L3 Advanced (~35h)

Analyses convergence of DP methods; derives TD learning; understands multi-agent MDPs and hierarchical RL.

L4 Research (~70h)

Contributes to RL theory, safe RL, or multi-agent systems.
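
As an illustration of the L1 goal of solving small MDPs with value iteration, here is a minimal sketch in plain Python. The two-state, two-action MDP, the array layout (`P[s][a][t]` for transition probabilities, `R[s][a]` for expected rewards), and the names `value_iteration` and `q_value` are assumptions made for this example; they do not come from the course material listed here.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Return (V, policy) for a finite MDP.

    P[s][a][t] -- probability of moving from state s to state t under action a
    R[s][a]    -- expected immediate reward for taking action a in state s
    gamma      -- discount factor, 0 <= gamma < 1
    """
    n_states = len(R)
    n_actions = len(R[0])
    V = [0.0] * n_states

    def q_value(s, a, values):
        # One-step lookahead: immediate reward plus discounted expected value.
        return R[s][a] + gamma * sum(P[s][a][t] * values[t] for t in range(n_states))

    for _ in range(max_iters):
        # Bellman optimality backup applied to every state.
        V_new = [max(q_value(s, a, V) for a in range(n_actions))
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break

    # Greedy policy with respect to the final value estimate.
    policy = [max(range(n_actions), key=lambda a: q_value(s, a, V))
              for s in range(n_states)]
    return V, policy


# Hypothetical toy MDP: action 0 = "stay", action 1 = "go" (switch state).
# Staying in state 1 pays reward 1; every other move pays 0.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay, go
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay, go
]
R = [
    [0.0, 0.0],  # state 0
    [1.0, 0.0],  # state 1
]

V, policy = value_iteration(P, R, gamma=0.9)
print([round(v, 3) for v in V], policy)  # [9.0, 10.0] [1, 0]
```

With a discount factor of 0.9, staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and state 0 is worth 0.9 × 10 = 9 because it takes one unrewarded step to get there. Policy iteration would reach the same policy by alternating full policy evaluation with greedy improvement.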

Resources

L0 — Intro

David Silver RL Course — Lecture 1: Introduction to RL
Silver, David · en · ~1h

L1 — Basics

L2 — Working

David Silver RL Course — Lectures 2-3: Dynamic Programming
Silver, David · en · ~3h

Leads to
