Tags: rl, policy, deep-rl
Actor-critic methods
A2C, A3C, GAE: combining a policy (actor) with a learned value baseline (critic) for lower-variance policy-gradient updates.
Depth levels
L0 Intro ~2 h
Understands that the actor is the policy and the critic is a value function, and that using the critic as a baseline reduces the variance of the policy gradient.
L1 Basics ~10 h
Derives the advantage-based policy gradient; implements A2C on CartPole.
L2 Working ~20 h
Uses GAE to control the bias/variance tradeoff; trains A3C / SAC on continuous control.
L3 Advanced ~30 h
Off-policy actor-critic (DDPG, TD3, SAC); entropy regularisation theory.
L4 Research ~60 h
Decoupled actor-critic for LLMs / multi-agent; stability analysis.
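
As an illustration of the GAE item at the L2 level, the advantage estimate can be sketched in a few lines of plain Python. This is a minimal sketch and not a reference implementation: the function name `gae_advantages`, its argument layout, and the defaults `gamma=0.99` and `lam=0.95` are assumptions chosen for the example (they are common choices, not values taken from this page). It assumes a single rollout in which `values[t]` is the critic's estimate for state t and `last_value` is the estimate for the state that follows the final step.

```python
from typing import List, Sequence


def gae_advantages(
    rewards: Sequence[float],
    values: Sequence[float],
    last_value: float,
    dones: Sequence[bool],
    gamma: float = 0.99,  # assumed discount factor
    lam: float = 0.95,    # assumed GAE lambda
) -> List[float]:
    """Generalised advantage estimates for one rollout (hypothetical helper)."""
    advantages = [0.0] * len(rewards)
    next_value = last_value  # critic estimate for the state after the rollout
    running = 0.0            # holds A_{t+1} while iterating backwards
    for t in reversed(range(len(rewards))):
        # Mask out bootstrapping and accumulation across episode boundaries.
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Recursion: A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.5, 0.5]
    dones = [False, False, True]
    # lam=0 reduces to one-step TD errors (lower variance, more bias).
    print(gae_advantages(rewards, values, 0.0, dones, gamma=1.0, lam=0.0))
    # lam=1 reduces to Monte Carlo return minus the value baseline.
    print(gae_advantages(rewards, values, 0.0, dones, gamma=1.0, lam=1.0))
```

Setting `lam` between 0 and 1 interpolates between these two extremes, which is the bias/variance tradeoff the L2 level refers to. In an A2C-style update the resulting advantages would weight the actor's log-probability gradient, and `advantages[t] + values[t]` would serve as the critic's regression target.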