Experience Replay
From Lilian Weng's blog: https://lilianweng.github.io/posts/2018-02-19-rl-overview/#deep-q-network
It's essentially storing the episode transitions in a dataset and sampling from it, as opposed to going through it sequentially. Seems analogous to how we do Stochastic Gradient Descent in practice, where we shuffle the data into batches before feeding it in as training input.
Experience Replay: All the episode steps are stored in one replay memory, which holds experience tuples (s_t, a_t, r_t, s_{t+1}) over many episodes. During Q-learning updates, samples are drawn at random from the replay memory, so one sample can be used multiple times. Experience replay improves data efficiency, removes correlations in the observation sequences, and smooths over changes in the data distribution.
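A minimal sketch of what that replay memory might look like in Python (the class name, capacity, and batch size here are just illustrative assumptions, not from the blog post):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory storing experience tuples (s, a, r, s', done)."""

    def __init__(self, capacity=100_000):
        # deque silently drops the oldest transitions once capacity is hit
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        # store every step of every episode, regardless of which episode it came from
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # uniform random sampling breaks the temporal correlation between
        # consecutive steps; the same transition can be drawn in many updates
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

During training you would call `add` after every environment step and `sample` once the buffer is large enough to fill a batch, then run the Q-learning update on that random batch rather than on the most recent transitions.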