Hindsight Experience Replay (HER)
This is a pretty fundamental paper with a pretty basic idea. Jason Ma explained it to me and Ian.
The agent relabels trajectories with achieved states as pseudo-goals, turning failures into successes for training purposes. This provides learning signals even from trajectories that didn’t reach the original goal.
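To make the relabeling concrete for myself, here is a minimal sketch, not the paper's implementation. It assumes grid-world states as (x, y) tuples, a sparse reward of 0 when the goal is reached and -1 otherwise, and the simplest relabeling choice: use the state the episode actually ended in as the pseudo-goal. The names `Transition`, `sparse_reward`, and `relabel_with_final_state` are made up for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

State = Tuple[int, int]  # assumed toy state: an (x, y) grid position


@dataclass(frozen=True)
class Transition:
    state: State
    action: int
    next_state: State
    goal: State    # the goal the agent was conditioned on
    reward: float


def sparse_reward(achieved: State, goal: State) -> float:
    """Assumed sparse reward: 0 if the goal was reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Return a copy of the episode whose goal is the state it actually reached.

    Rewards are recomputed against the new goal, so the last transition
    always counts as a success.
    """
    if not episode:
        return []
    hindsight_goal = episode[-1].next_state
    return [
        replace(
            t,
            goal=hindsight_goal,
            reward=sparse_reward(t.next_state, hindsight_goal),
        )
        for t in episode
    ]
```

Both the original and the relabeled transitions would go into the replay buffer. An episode that aimed for (5, 5) but stopped at (2, 0) carries only -1 rewards as recorded, while its relabeled copy ends with a reward of 0 for reaching (2, 0).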
A lot of later work in goal-conditioned RL builds on this idea.