Reward Engineering

The "reward is enough" hypothesis (Silver et al., 2021).

I've been thinking about this.

Some limitations:

  • Slow convergence in training, driven by sparse rewards and the complexity of the environment (see the sketch after this list)
  • Unclear ability to generalize beyond the environment the reward was defined in
  • The AI alignment problem: the reward we specify may not capture what we actually want
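
To make the sparse-reward point concrete, here is a minimal sketch in plain Python (no RL library; the chain environment, hyperparameters, and shaping potential are all illustrative assumptions, not anyone's reference implementation): tabular Q-learning on a chain where the only environment reward is +1 at the goal, with an optional potential-based shaping bonus that densifies the signal.

```python
import random

def q_learning(n_states=20, episodes=500, alpha=0.1, gamma=0.99,
               epsilon=0.1, shaped=False):
    """Tabular Q-learning on a 1-D chain: start at state 0; the only
    environment reward is +1 on reaching the rightmost state.
    With shaped=True, add a potential-based bonus gamma*phi(s') - phi(s)."""
    def phi(s):
        # illustrative potential: normalized progress toward the goal
        return s / (n_states - 1)
    q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right
    steps_per_episode = []
    for _ in range(episodes):
        s, steps = 0, 0
        while s != n_states - 1:
            # epsilon-greedy action selection, breaking ties randomly
            if random.random() < epsilon or q[s][0] == q[s][1]:
                a = random.randrange(2)
            else:
                a = int(q[s][1] > q[s][0])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == n_states - 1 else 0.0  # sparse reward
            if shaped:
                r += gamma * phi(s2) - phi(s)       # dense shaping bonus
            done = s2 == n_states - 1
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            s, steps = s2, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode

if __name__ == "__main__":
    random.seed(0)
    for label, shaped in [("sparse", False), ("shaped", True)]:
        steps = q_learning(shaped=shaped)
        print(label, "mean steps per episode:", sum(steps) / len(steps))
```

The potential-based form of the bonus (gamma * phi(s') - phi(s)) is chosen because it is known not to change which policy is optimal, so it only addresses the convergence-speed limitation, not the alignment one.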

Humans work because biology supplies the underlying reward signals for us. Is that part of why foundation models work?

Other thoughts: RLHF, where the reward itself is learned from human preference comparisons rather than hand-specified.
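
On that RLHF note, the part that is literally reward engineering is the reward model fit to pairwise human preferences. Below is a minimal sketch of just that step, using a linear reward over hand-made features and synthetic preference pairs; the feature setup, weights, and function names are assumptions for illustration, not any particular RLHF library's API.

```python
import math
import random

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def reward(w, features):
    """Linear reward model: r(x) = w . features(x)."""
    return sum(wi * fi for wi, fi in zip(w, features))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit reward weights to (chosen, rejected) preference pairs by
    maximizing the Bradley-Terry log-likelihood log sigmoid(r_chosen - r_rejected)."""
    w = [0.0] * dim
    for _ in range(epochs):
        random.shuffle(pairs)
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # gradient ascent on log sigmoid(margin):
            # d/dw = (1 - sigmoid(margin)) * (chosen - rejected)
            coeff = 1.0 - sigmoid(margin)
            for i in range(dim):
                w[i] += lr * coeff * (chosen[i] - rejected[i])
    return w

if __name__ == "__main__":
    random.seed(0)
    # Hypothetical "true" human preference: likes feature 0, dislikes feature 2.
    true_w = [2.0, 0.0, -1.5]
    def sample():
        return [random.uniform(-1, 1) for _ in range(3)]
    pairs = []
    for _ in range(300):
        a, b = sample(), sample()
        pairs.append((a, b) if reward(true_w, a) > reward(true_w, b) else (b, a))
    learned = train_reward_model(pairs, dim=3)
    print("learned reward weights:", [round(x, 2) for x in learned])
```

In a real pipeline the reward model would be a fine-tuned language model scoring whole responses, and the learned reward would then drive a policy-optimization step; this sketch only covers the preference-fitting loss.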