Q-learning and DQN
Value-based reinforcement learning — Q-tables, temporal difference updates, and deep Q-networks.
Depth levels
L0 Intro, ~0 h
Knows Q-learning estimates action values; has heard of DQN playing Atari.
L1 Basics, ~10 h
Implements tabular Q-learning on GridWorld; understands epsilon-greedy exploration and TD updates.
L2 Working, ~25 h
Builds DQN with experience replay and target networks; applies Double DQN, Dueling DQN, Prioritised Replay.
L3 Advanced, ~35 h
Understands distributional RL (C51) and Rainbow; analyses overestimation bias; implements multi-step returns.
L4 Research, ~70 h
Contributes to offline RL, model-based RL, or sample-efficient value-based methods.
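As a rough illustration of what the L1 level involves, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration and a one-step TD update. The environment is a hypothetical five-cell corridor standing in for GridWorld. The names `step`, `greedy_action` and `train`, and the hyperparameter values, are assumptions chosen for this example and do not come from any particular course or library.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the terminal goal (assumed toy setup)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic transition: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q_row, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, max_steps=100, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # TD target: bootstrap from the best next-state value unless terminal.
            target = reward if done else reward + gamma * max(q[next_state])
            # TD update: move the estimate a fraction alpha toward the target.
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q
```

Calling `train()` returns a Q-table whose greedy policy can be read off with `max` over each row. In this toy corridor the learned policy should prefer moving right in every non-terminal cell. A DQN replaces the table with a neural network but keeps the same TD target.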
Resources
L2 — Working