🛠️ Steven Gong
Aug 25, 2025
Deep reinforcement learning from human preferences
Backlinks
Reinforcement Learning from Human Feedback (RLHF)
Training language models to follow instructions with human feedback