Reinforcement Learning
On-policy
- Nathan Lambert RL book — policy gradients
- PPO video tutorial (CleanRL)
- OpenAI Baselines PPO
- Generalized Advantage Estimation (GAE) paper
- The 37 Implementation Details of PPO (Costa Huang blog post)
- On-policy distillation
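The on-policy links above center on policy gradients, PPO, and GAE. A minimal NumPy sketch of the GAE recursion, under my own assumptions about array shapes (values has one extra bootstrap entry); not taken from any one of the linked implementations:

```python
import numpy as np

def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (Schulman et al.).

    rewards, dones: arrays of length T.
    values: array of length T+1 (last entry is the bootstrap value).
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        # GAE recursion: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = advantages + values[:-1]
    return advantages, returns
```

With `gamma=lam=1` and zero values this collapses to reward-to-go, which is a quick sanity check when debugging.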
Off-policy
Resources
- Hands-on RL (GitHub)
- CleanRL library (PPO tutorial)
- Unsloth RL guide
- LLM training journey: SFT → PPO/DPO/GRPO
- RLHF (Huyen Chip)
- GRPO and DeepSeek R1 Zero
- Reinforcement Learning: An Overview by K. Murphy
- Discovering state-of-the-art reinforcement learning algorithms (DeepMind)
- RL debugging (Andy L. Jones) — read the paper at the top
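For the GRPO link above: the core trick is scoring each sampled response against its own group's statistics instead of a learned value function. A minimal sketch of that group-relative advantage (the `eps` stabilizer is my assumption):

```python
import numpy as np

def grpo_advantages(group_rewards, eps=1e-8):
    """Group-relative advantages in the GRPO style:
    normalize each response's reward by the mean/std of its
    sampled group, so no critic network is needed."""
    r = np.asarray(group_rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)
```

These per-response advantages then plug into a PPO-style clipped objective over the group.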
Projects / ideas
- DiscoRL (DeepMind) — reimplement in PyTorch and train it.
- No AI use. Only autocomplete and docs. After Codex/OP review.
- Nanomoe
- Mira MHC (try different optimizer from NVIDIA) and Pandey MLP — all train runs on MLRun.
Notes
- World Models by David Ha, plus the saved LeCun Twitter post explaining it.
- The paper at the top of Andy Jones's page (see the RL debugging link above).
- Karpathy's latest 30M-model ideas.
- Spinning Up (OpenAI).
- OpenAI Five paper · AlphaStar · Learning Dexterity · Emergent Tool Use · Capture the Flag · AlphaGo.
- How do you know what lines of work are promising?
- OpenAI blog on how AI training scales and scaling laws for single-agent RL.
- Look at RL compute scaling and the RL scaling discussion from the Grok chat.
- Most promising: rerun old work with more experiments on faster envs (Puffer and others) that allow hundreds of runs per GPU per day.
- RL deals with high-performance distributed simulation. Get your hands dirty with async multiprocessing and writing envs from scratch in C.
- Skim the Sutton & Barto book and others.
- Opinionated guide: read the PufferLib docs on writing your own env.
- Blog posts are often more accessible than papers. Start there, then read the full papers if doing research.
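The async multiprocessing note above can be sketched as a toy subprocess vectorized env. Everything here (the `CountEnv` toy env and the pipe command protocol) is invented for illustration; libraries like PufferLib implement the same idea in optimized form:

```python
import multiprocessing as mp

class CountEnv:
    """Toy env: reward 1.0 per step, episode ends after 5 steps."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= 5

def worker(conn):
    """Runs in a subprocess: owns one env, serves step requests over a pipe."""
    env = CountEnv()
    obs = env.reset()
    while True:
        cmd, data = conn.recv()
        if cmd == "step":
            obs, rew, done = env.step(data)
            if done:
                obs = env.reset()  # auto-reset finished episodes
            conn.send((obs, rew, done))
        elif cmd == "close":
            conn.close()
            break

class SubprocVecEnv:
    """Minimal subprocess vectorized env: broadcast actions to all
    workers, then gather (obs, reward, done) results."""
    def __init__(self, n):
        self.conns, worker_conns = zip(*[mp.Pipe() for _ in range(n)])
        self.procs = [mp.Process(target=worker, args=(wc,), daemon=True)
                      for wc in worker_conns]
        for p in self.procs:
            p.start()
    def step(self, actions):
        for conn, a in zip(self.conns, actions):  # async send to all workers
            conn.send(("step", a))
        return [conn.recv() for conn in self.conns]  # gather results
    def close(self):
        for conn in self.conns:
            conn.send(("close", None))
        for p in self.procs:
            p.join()
```

Usage: `vec = SubprocVecEnv(4); vec.step([0, 0, 0, 0]); vec.close()`. The send-all-then-receive-all pattern is the basic overlap trick; a C env would replace `CountEnv` behind the same protocol.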