Scaling RL
Maybe a lab topic?
Read this blog post: https://www.interconnects.ai/p/scaling-rl-axes
“In generative modeling, cross-entropy loss improves smoothly with model size and training compute, following a power law plus constant scaling law…”
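For reference (not from the post itself), the "power law plus constant" form it refers to is usually written as

\[
L(C) = L_\infty + A\,C^{-\alpha}
\]

where $C$ is training compute, $L_\infty$ is the irreducible loss floor (the "constant"), and $A$, $\alpha$ are fitted constants; the symbol names here are the conventional ones, not necessarily those used in the post.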
There’s also Seohong Park’s blog, which raises some important points about how to scale offline RL and why this is still an open-ended problem:
Papers: