Reinforcement Learning from Human Feedback (RLHF)
I don't get how PPO works yet (a rough sketch of the clipped objective follows the reading list):
The papers that you need to read:
- Deep reinforcement learning from human preferences
- Learning to summarize from human feedback
- Fine-Tuning Language Models from Human Preferences
- Training language models to follow instructions with human feedback
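
To make the PPO part less mysterious to myself: the core of it is the clipped surrogate objective, where the probability ratio between the new and old policy is clipped so a single update can't move the policy too far. Here's a minimal sketch of just that loss, assuming PyTorch; the tensor names and the toy numbers are made up for illustration, not taken from any of the papers.

```python
import torch

def ppo_clipped_loss(new_logprobs, old_logprobs, advantages, clip_eps=0.2):
    """PPO clipped surrogate objective (to be minimized).

    new_logprobs: log pi_theta(a|s) under the current policy
    old_logprobs: log pi_theta_old(a|s) under the policy that sampled the data
    advantages:   advantage estimates for those actions (e.g. from GAE)
    """
    # Probability ratio r_t(theta) = pi_theta / pi_theta_old, computed in log space.
    ratio = torch.exp(new_logprobs - old_logprobs)

    # Unclipped and clipped surrogate objectives.
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Take the pessimistic (minimum) of the two, negate because optimizers minimize.
    return -torch.min(surr1, surr2).mean()


# Tiny usage example with fake data: 4 sampled tokens/actions.
old_lp = torch.tensor([-1.2, -0.8, -2.0, -0.5])
new_lp = old_lp + torch.tensor([0.1, -0.3, 0.4, 0.0])  # pretend the policy shifted slightly
adv = torch.tensor([0.5, -1.0, 2.0, 0.1])
print(ppo_clipped_loss(new_lp, old_lp, adv))
```

In the RLHF papers above, the per-token reward that feeds those advantages also includes a KL penalty toward the initial (supervised fine-tuned) policy, so the model doesn't drift too far from sensible text while chasing reward.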

This is where AI Alignment comes in, since the human feedback is what determines which responses to an instruction count as good and which count as bad.
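
For concreteness, in "Deep reinforcement learning from human preferences" and "Learning to summarize from human feedback" that feedback is pairwise: humans pick the better of two outputs, and a reward model is trained so the preferred output gets a higher scalar score (a Bradley-Terry style loss). A minimal sketch, assuming PyTorch; `TinyRewardModel` and the random embeddings are stand-ins for a real language-model-based reward model.

```python
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Toy stand-in for a reward model: maps a response embedding to one scalar reward."""
    def __init__(self, dim=16):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):
        return self.score(x).squeeze(-1)

def preference_loss(reward_chosen, reward_rejected):
    # Pairwise logistic (Bradley-Terry style) loss: push the chosen response's
    # reward above the rejected one's.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Fake batch of 8 preference pairs (random embeddings instead of real text).
rm = TinyRewardModel()
chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)
loss = preference_loss(rm(chosen), rm(rejected))
loss.backward()  # gradients flow into the reward model as usual
print(loss.item())
```

The trained reward model then supplies the reward signal that PPO optimizes against in the sketch above.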

Ok, I don't want to keep copy-pasting these slides, but they are very good at explaining how things work.

