Apr 10, 2026, 1 min read
Wow, so all of RL is really just weighted behavior cloning in some form.
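A toy sketch of the idea, under stated assumptions (none of this code is from the post). Take the negative log-likelihood of the logged actions and multiply each term by a weight. With every weight equal to 1 you get plain behavior cloning. With 0/1 weights (keep only "good" samples) you get filtered BC. With weights set to an advantage or return estimate, treated as constants, the gradient is the REINFORCE-style policy gradient, roughly mean of A * grad log pi(a|s). The tabular softmax policy, the helper names `weighted_bc_loss` and `grad_logits`, and the toy numbers are all hypothetical.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    z = sum(math.exp(x - m) for x in logits)
    return [x - m - math.log(z) for x in logits]

def weighted_bc_loss(policy_logits, batch, weights):
    """Mean of -w * log pi(a|s) over (state, action) pairs.

    policy_logits: dict state -> list of action logits (hypothetical tabular policy)
    batch: list of (state, action) pairs, e.g. logged or sampled data
    weights: one float per pair (1.0 = plain BC, advantage = policy gradient)
    """
    total = 0.0
    for (s, a), w in zip(batch, weights):
        total += -w * log_softmax(policy_logits[s])[a]
    return total / len(batch)

def grad_logits(policy_logits, batch, weights):
    # Analytic gradient of weighted_bc_loss w.r.t. each state's logits,
    # with weights held constant: d/dl_j of -w*log softmax(l)_a = -w*(1[j==a] - p_j).
    grads = {s: [0.0] * len(l) for s, l in policy_logits.items()}
    n = len(batch)
    for (s, a), w in zip(batch, weights):
        probs = [math.exp(x) for x in log_softmax(policy_logits[s])]
        for j, p in enumerate(probs):
            grads[s][j] += -w * ((1.0 if j == a else 0.0) - p) / n
    return grads

# Same data, three different weightings.
logits = {"s0": [0.0, 0.0]}
batch = [("s0", 0), ("s0", 1)]
bc = weighted_bc_loss(logits, batch, [1.0, 1.0])        # plain behavior cloning
filtered = weighted_bc_loss(logits, batch, [1.0, 0.0])  # filtered BC: drop the "bad" sample
pg = weighted_bc_loss(logits, batch, [2.0, -1.0])       # hypothetical advantages as weights
```

The only thing that changes between the three objectives is the weight vector. A positive weight pushes the policy toward the logged action, and a negative weight pushes it away, which is the part plain BC can't do.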