RL foundations
Exploration vs exploitation
ε-greedy, entropy bonuses, curiosity, intrinsic motivation: how agents decide when to try something new.
Depth levels
L0 Intro ~1h
Names ε-greedy as the simplest exploration rule.
L1 Basics ~6h
Compares ε-greedy, softmax, and optimistic initialisation on a multi-armed bandit.
L2 Working ~12h
Adds entropy regularisation to policy-gradient methods; uses ICM / RND curiosity.
L3 Advanced ~25h
Count-based exploration, Bayesian exploration, Thompson sampling for deep RL.
L4 Research ~50h
Provably efficient exploration in deep RL, Go-Explore, meta-exploration.
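
To give a feel for the L1 comparison, here is a minimal sketch of ε-greedy, softmax, and optimistic initialisation on a Bernoulli bandit. It is an illustration under stated assumptions, not course material: the arm probabilities, the constant step size `alpha`, and every function name (`run_bandit`, `epsilon_greedy`, `softmax`, `greedy`) are hypothetical choices made for this example.

```python
import math
import random


def greedy(q, rng):
    """Pick an arm with the highest estimate, breaking ties at random."""
    best = max(q)
    return rng.choice([i for i, v in enumerate(q) if v == best])


def epsilon_greedy(eps):
    """With probability eps pick a uniformly random arm, else act greedily."""
    def select(q, rng):
        if rng.random() < eps:
            return rng.randrange(len(q))
        return greedy(q, rng)
    return select


def softmax(temperature):
    """Sample an arm with probability proportional to exp(q / temperature)."""
    def select(q, rng):
        m = max(q)  # subtract the max for numerical stability
        weights = [math.exp((v - m) / temperature) for v in q]
        return rng.choices(range(len(q)), weights=weights)[0]
    return select


def run_bandit(select, n_arms=5, steps=2000, init_q=0.0, alpha=0.1, seed=0):
    """Run one bandit episode and return the average reward per step."""
    rng = random.Random(seed)
    # Assumed arm success probabilities, evenly spaced in [0.1, 0.9].
    probs = [0.1 + 0.8 * i / (n_arms - 1) for i in range(n_arms)]
    q = [init_q] * n_arms  # action-value estimates
    total = 0.0
    for _ in range(steps):
        a = select(q, rng)
        r = 1.0 if rng.random() < probs[a] else 0.0
        q[a] += alpha * (r - q[a])  # constant-step-size update
        total += r
    return total / steps


if __name__ == "__main__":
    print("eps-greedy (eps=0.1):", run_bandit(epsilon_greedy(0.1)))
    print("softmax (T=0.1):     ", run_bandit(softmax(0.1)))
    # Optimistic init: purely greedy, but estimates start far above any
    # achievable reward, so every arm is tried until its estimate decays.
    print("optimistic greedy:   ", run_bandit(greedy, init_q=5.0))
```

The constant step size is a deliberate choice here: with a sample-average update the optimistic initial value would be overwritten after a single pull, whereas a constant `alpha` lets the optimism decay gradually and drive several rounds of exploration.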