Prioritized Level Replay

First came across this concept in the MAESTRO paper.

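Prioritized Level Replay (PLR) is like prioritized experience replay, but the unit being replayed is a whole training level rather than a single stored transition: levels are revisited with a priority that depends on their estimated importance for learning.

In contrast to sampling levels uniformly at random, PLR revisits important levels more often than unimportant ones. Importance here means estimated learning potential: levels the agent still has difficulty with, typically measured by how large its value-prediction (TD) errors were the last time it played them. Training time is therefore concentrated where the agent can still improve, which helps it learn faster and more efficiently.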
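
One way to make "importance" concrete is to score a level by the average magnitude of the generalized advantage estimate (GAE) over the agent's most recent episode on it, which is one of the scoring functions discussed in the PLR paper. The sketch below is only illustrative: the function names, the default `gamma` and `lam` values, and the plain-list inputs are assumptions made for this note, not code from any PLR implementation.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one episode.

    `values` must have len(rewards) + 1 entries; the last entry is the
    bootstrap value of the final state (0.0 if the episode terminated).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Discounted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def level_score(rewards, values, gamma=0.99, lam=0.95):
    """Hypothetical level score: mean absolute GAE over the episode."""
    advantages = gae_advantages(rewards, values, gamma, lam)
    return sum(abs(a) for a in advantages) / len(advantages)


# A level whose returns the value function already predicts well scores low;
# a level whose reward the value function did not anticipate scores high.
well_predicted = level_score([0, 0, 1], [1, 1, 1, 0], gamma=1.0, lam=1.0)
surprising = level_score([0, 0, 1], [0, 0, 0, 0], gamma=1.0, lam=1.0)
```

A level with a high score is one where the agent's predictions were far off, so replaying it is likely to produce a useful learning signal.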

PLR is a form of automatic curriculum (autocurriculum).

PLR adapts the curriculum, i.e. the sequence of training levels, based on the agent’s performance and each level’s estimated learning potential. Levels with higher estimated learning potential are sampled more often, so as the agent improves, levels it has mastered drop in priority and harder ones take their place. This can be seen as a self-generated curriculum: the choice of training levels is driven by the agent’s own current errors rather than by a hand-designed schedule. In the original PLR paper, this improved sample efficiency and test-time generalization on procedurally generated environments such as the Procgen Benchmark.
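
The curriculum loop can be sketched as a small level sampler. This is a simplified illustration, not the reference implementation: the class name `LevelSampler`, the `new_level_fn` callback, the fixed `replay_prob`, and the default hyperparameter values are all assumptions made for this note. The two ideas it borrows from the PLR paper are rank-based prioritization of level scores and mixing in a staleness term so that levels with outdated scores still get revisited.

```python
import random


class LevelSampler:
    """Minimal PLR-style sampler (illustrative sketch only)."""

    def __init__(self, temperature=0.3, staleness_coef=0.1,
                 replay_prob=0.5, seed=0):
        self.temperature = temperature        # sharpness of rank weighting
        self.staleness_coef = staleness_coef  # weight of the staleness term
        self.replay_prob = replay_prob        # chance of replaying a seen level
        self.scores = {}                      # level id -> latest score
        self.last_seen = {}                   # level id -> episode of last visit
        self.episode = 0
        self.rng = random.Random(seed)

    def replay_distribution(self):
        """Probability of replaying each previously seen level."""
        levels = list(self.scores)
        # Rank prioritization: rank 1 is the highest-scoring level.
        ordered = sorted(levels, key=lambda lv: self.scores[lv], reverse=True)
        rank_w = {lv: (1.0 / (i + 1)) ** (1.0 / self.temperature)
                  for i, lv in enumerate(ordered)}
        rank_total = sum(rank_w.values())
        # Staleness: episodes elapsed since the level was last played.
        stale = {lv: self.episode - self.last_seen[lv] for lv in levels}
        stale_total = sum(stale.values())
        probs = {}
        for lv in levels:
            p_score = rank_w[lv] / rank_total
            p_stale = (stale[lv] / stale_total if stale_total > 0
                       else 1.0 / len(levels))
            probs[lv] = ((1 - self.staleness_coef) * p_score
                         + self.staleness_coef * p_stale)
        return probs

    def sample(self, new_level_fn):
        """Return a fresh level or replay a seen one by priority."""
        if not self.scores or self.rng.random() >= self.replay_prob:
            return new_level_fn()
        probs = self.replay_distribution()
        levels = list(probs)
        weights = [probs[lv] for lv in levels]
        return self.rng.choices(levels, weights=weights, k=1)[0]

    def update(self, level_id, score):
        """Record the score from the episode just played on level_id."""
        self.episode += 1
        self.scores[level_id] = score
        self.last_seen[level_id] = self.episode
```

In a training loop, each iteration would call `sample` to pick a level, run the agent on it, compute a score for that episode (for example a TD-error-based one), and pass it to `update`. Because scores are refreshed only when a level is played, the staleness term keeps old estimates from going unchecked for too long.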