MountainAI

Exploration vs exploitation

ε-greedy, entropy bonuses, curiosity, intrinsic motivation — how agents decide when to try something new.

Depth levels

L0 Intro ~1h

Names ε-greedy as the simplest exploration rule.

L1 Basics ~6h

Compares ε-greedy, softmax, and optimistic initialisation on a bandit.

L2 Working ~12h

Adds entropy regularisation to policy-gradient methods; uses ICM / RND curiosity.

L3 Advanced ~25h

Count-based exploration, Bayesian exploration, Thompson sampling for deep RL.

L4 Research ~50h

Provably efficient exploration in deep RL, Go-Explore, meta-exploration.
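
As an illustration of the L0 and L1 material, here is a minimal sketch of ε-greedy and optimistic initialisation on a Bernoulli bandit. Everything in it is an assumption made for illustration rather than course material: the function name `run_bandit`, the arm means, the constant step size `alpha`, and the parameter values in the usage loop.

```python
import random


def run_bandit(true_means, steps, epsilon, initial_value, alpha=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli bandit.

    Returns (estimates, pulls): the final value estimate per arm and
    how many times each arm was chosen. All names are hypothetical.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [initial_value] * n_arms  # value estimate per arm
    pulls = [0] * n_arms                  # selection count per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimate
            # (ties go to the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Bernoulli reward: 1 with probability true_means[arm], else 0.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        # Constant-step-size update, so an optimistic initial value
        # decays gradually instead of being overwritten on the first pull.
        estimates[arm] += alpha * (reward - estimates[arm])
    return estimates, pulls


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]  # assumed arm success probabilities
    settings = [
        ("greedy", 0.0, 0.0),
        ("eps-greedy", 0.1, 0.0),
        ("optimistic", 0.0, 5.0),
    ]
    for label, eps, q0 in settings:
        _, pulls = run_bandit(means, 2000, eps, q0)
        print(label, pulls)
```

With `epsilon=0` and zero initial values the agent never leaves the first arm, because its estimate can never fall below the untouched zeros of the other arms. Setting a positive `epsilon` or an optimistic `initial_value` are two simple ways to force the other arms to be tried.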

Resources