Experience Replay

From Lilian Weng's blog: https://lilianweng.github.io/posts/2018-02-19-rl-overview/#deep-q-network

It’s essentially storing episode transitions in a dataset and sampling from it, as opposed to going through them sequentially. This seems analogous to how we do Stochastic Gradient Descent in practice, where we shuffle the training data before drawing each mini-batch.

Experience Replay: All the episode steps $e_t = (S_t, A_t, R_t, S_{t+1})$ are stored in one replay memory $D_t = \{e_1, \dots, e_t\}$, which has experience tuples over many episodes. During Q-learning updates, samples are drawn at random from the replay memory, and thus one sample could be used multiple times. Experience replay improves data efficiency, removes correlations in the observation sequences, and smooths over changes in the data distribution.
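
A minimal sketch of such a replay memory, assuming a fixed capacity and uniform random sampling (the class and method names here are illustrative, not from the post):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory holding experience tuples (s, a, r, s_next, done)."""

    def __init__(self, capacity=100_000):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state, done):
        # Append one step of experience from the current episode.
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Draw a uniform random mini-batch; the same transition may be drawn
        # again in later updates, which is what reuses data across updates.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

A Q-learning loop would call `store` after every environment step and, once the buffer holds at least `batch_size` transitions, call `sample` to build each gradient update, so consecutive updates no longer train on consecutive (and therefore correlated) observations.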