Scaling RL

A possible topic for this lab?

Read this blog: https://www.interconnects.ai/p/scaling-rl-axes

“In generative modeling, cross-entropy loss improves smoothly with model size and training compute, following a power law plus constant scaling law…”
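The "power law plus constant" form being quoted is the usual one: loss falls as a power of compute C toward an irreducible floor, L(C) = E + A·C^(−α). A minimal sketch of fitting that form with SciPy, on synthetic data — every number below is made up purely for illustration, not taken from the post:

```python
import numpy as np
from scipy.optimize import curve_fit

# Power law plus constant: loss decays as a power of training compute
# toward an irreducible floor E, i.e. L(C) = E + A * C**(-alpha).
def scaling_law(compute, E, A, alpha):
    return E + A * compute ** (-alpha)

# Synthetic (compute, loss) points roughly following the law.
# Compute range and parameters are hypothetical.
rng = np.random.default_rng(0)
compute = np.logspace(18, 24, 20)                     # training FLOPs (made up)
loss = scaling_law(compute, E=1.7, A=100.0, alpha=0.1)
loss = loss + rng.normal(0.0, 0.01, size=loss.shape)  # measurement noise

# Recover (E, A, alpha) from the noisy points
popt, _ = curve_fit(scaling_law, compute, loss, p0=(1.0, 50.0, 0.05))
E_hat, A_hat, alpha_hat = popt
print(f"E={E_hat:.3f}  A={A_hat:.1f}  alpha={alpha_hat:.3f}")
```

The fitted constant E is the interesting part for the blog's argument: it is the irreducible loss that no amount of extra compute removes, so "smooth improvement" only applies to the power-law term.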

There's also Seohong Park's blog, which makes some important points about how to scale offline RL and why this remains an open problem:

Papers: