This is one of Chelsea’s papers; it discusses unifying RLHF for robotics.

It cites some papers on how RLHF is done.