Experience Replay

From Lilian Weng's blog: https://lilianweng.github.io/posts/2018-02-19-rl-overview/#deep-q-network

It’s essentially storing episode transitions in a dataset and sampling from it, as opposed to going through them sequentially. This seems analogous to how we do Stochastic Gradient Descent in practice, where we shuffle the training data before drawing each mini-batch.

Experience Replay: All the episode steps $e_t = (S_t, A_t, R_t, S_{t+1})$ are stored in one replay memory $D_t = \{e_1, \dots, e_t\}$, which has experience tuples over many episodes. During Q-learning updates, samples are drawn at random from the replay memory, and thus one sample could be used multiple times. Experience replay improves data efficiency, removes correlations in the observation sequences, and smooths over changes in the data distribution.
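
A minimal sketch of such a replay memory, assuming a fixed capacity and uniform random sampling (the class and method names here are illustrative, not from the post):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory holding experience tuples (s, a, r, s_next, done)."""

    def __init__(self, capacity=100_000):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        # Append one step of experience from the current episode.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Draw a uniform random mini-batch; the same transition may be drawn
        # again in later updates, which is what reuses data across updates.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

A Q-learning loop would call `store` after every environment step and, once the buffer holds at least `batch_size` transitions, call `sample` to build each gradient update, so consecutive updates no longer train on consecutive (and therefore correlated) observations.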