Reward is Enough
I've been thinking about this.
Some limitations:
- Slow convergence in training: sparse rewards and complex environments mean the agent gets very little learning signal per interaction (a toy sketch of the sparse-reward issue follows this list)
- Ability to generalize: can a policy trained to maximize one reward transfer to settings that reward never covered?
- The AI alignment problem: the reward we can actually specify is rarely the thing we actually want maximized
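
A toy sketch of the sparse-reward point, not from the paper: tabular Q-learning on a small chain environment, once with reward only at the goal and once with a small shaping bonus for progress. The environment, hyperparameters, and shaping bonus are all my own choices for illustration.

```python
import random

N_STATES = 20            # chain of states 0..19; the goal is the last state
ACTIONS = [-1, +1]       # step left or right
EPISODES = 200
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1

def greedy(qs):
    # Break ties randomly so an all-zero Q-table behaves like a random walk.
    best = max(qs)
    return random.choice([i for i, q in enumerate(qs) if q == best])

def run(shaped):
    """Return the average episode length over the first 50 episodes."""
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    lengths = []
    for _ in range(EPISODES):
        s, steps = 0, 0
        while s != N_STATES - 1 and steps < 2000:
            a = random.randrange(2) if random.random() < EPS else greedy(Q[s])
            s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
            # Sparse: reward only at the goal. Shaped: small bonus for progress.
            if s2 == N_STATES - 1:
                r = 1.0
            else:
                r = 0.01 * (s2 - s) if shaped else 0.0
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
            s, steps = s2, steps + 1
        lengths.append(steps)
    return sum(lengths[:50]) / 50

if __name__ == "__main__":
    random.seed(0)
    print("sparse reward: avg steps over first 50 episodes =", run(shaped=False))
    print("shaped reward: avg steps over first 50 episodes =", run(shaped=True))
```

With the sparse signal, the agent has to stumble onto the goal by random walking before any value can propagate back; with the shaping bonus, every step carries information, so the early episodes are far shorter.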
Humans work because there's biology involved: evolution baked in strong priors before any individual reward-driven learning happens. Is that why foundation models work, with pretraining playing the role of those priors?
Other thoughts: RLHF, where the reward signal itself is learned from human feedback rather than given by the environment.
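
A minimal sketch of that RLHF idea, and only my own toy construction rather than any particular system's pipeline: a linear reward model r(x) = w·x is fit to synthetic pairwise "human" preferences with the Bradley-Terry objective, -log σ(r(x_preferred) - r(x_rejected)), which is the sense in which the reward is learned rather than specified.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 5
w_true = rng.normal(size=DIM)          # hidden "human" preference direction

def make_pairs(n=2000):
    # Synthetic comparisons: the item with the higher true score is preferred.
    a, b = rng.normal(size=(n, DIM)), rng.normal(size=(n, DIM))
    prefer_a = (a @ w_true) > (b @ w_true)
    pref = np.where(prefer_a[:, None], a, b)
    rej = np.where(prefer_a[:, None], b, a)
    return pref, rej

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit the reward model by gradient descent on the Bradley-Terry loss.
pref, rej = make_pairs()
w = np.zeros(DIM)
lr = 0.1
for step in range(200):
    margin = (pref - rej) @ w                                  # r(pref) - r(rej)
    grad = -((1.0 - sigmoid(margin))[:, None] * (pref - rej)).mean(axis=0)
    w -= lr * grad

# The learned reward should rank items the way the "human" does.
cos = w @ w_true / (np.linalg.norm(w) * np.linalg.norm(w_true))
print("cosine(learned reward, true preference direction) =", round(float(cos), 3))
```

The learned w would then stand in for the environment reward in a downstream RL step, which is exactly where "reward is enough" gets interesting: the reward itself is a model, with all the limitations listed above.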