Feb 11, 2026, 1 min read
Wow, so all of RL is really just weighted behavior cloning in some form.
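
One way to make that concrete, at least for the policy-gradient corner of RL: the REINFORCE gradient estimate is the (negated) gradient of a behavior cloning loss in which each sampled action's log-likelihood is weighted by its advantage. The sketch below is only an illustration under assumptions that are not in the post: a single-state softmax policy over 4 actions, random numbers standing in for advantages, weights treated as constants, and made-up helper names (`weighted_bc_loss`, `reinforce_gradient`, `numerical_gradient`).

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def weighted_bc_loss(logits, actions, weights):
    """Weighted negative log-likelihood of the sampled actions (weighted BC)."""
    log_probs = np.log(softmax(logits))
    return -np.mean(weights * log_probs[actions])

def reinforce_gradient(logits, actions, advantages):
    """Score-function estimate: mean of A_i * grad log pi(a_i)."""
    probs = softmax(logits)
    one_hot = np.eye(len(logits))[actions]
    # For a softmax policy, grad log pi(a) w.r.t. logits is one_hot(a) - probs.
    return np.mean(advantages[:, None] * (one_hot - probs), axis=0)

def numerical_gradient(f, x, eps=1e-6):
    """Central finite differences, so the BC gradient is computed independently."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
logits = rng.normal(size=4)                               # hypothetical policy parameters
actions = rng.choice(4, size=256, p=softmax(logits))      # on-policy samples
advantages = rng.normal(size=256)                         # stand-in for return minus baseline

bc_grad = numerical_gradient(
    lambda theta: weighted_bc_loss(theta, actions, advantages), logits
)
pg_grad = reinforce_gradient(logits, actions, advantages)

# Descending the weighted BC loss moves the logits along the REINFORCE direction.
gradients_match = np.allclose(-bc_grad, pg_grad, atol=1e-5)
print(gradients_match)
```

With all weights set to 1 this reduces to plain behavior cloning on the sampled actions; the advantage weights are what push probability toward better-than-average actions and away from worse ones. Whether every RL method fits this mold is the post's claim, not something this sketch establishes.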