Efficient Online Reinforcement Learning with Offline Data (RLPD)
The core idea is very simple: build each training batch by sampling equally from the offline dataset and the online replay buffer (half from each). However, several implementation details matter to get this right.
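The equal-sampling idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `sample_symmetric`, the dict-of-arrays buffer layout, and the `source` key (added only so the split can be inspected) are all assumptions made for this example.

```python
import numpy as np

def sample_symmetric(offline, online, batch_size, rng):
    """Draw half of the batch from each buffer and concatenate the halves.

    Both buffers are assumed (hypothetically) to be dicts of NumPy arrays
    that share the same keys and the same per-transition shapes.
    """
    half = batch_size // 2
    # Uniform sampling with replacement, independently in each buffer.
    off_idx = rng.integers(0, len(offline["obs"]), size=half)
    on_idx = rng.integers(0, len(online["obs"]), size=batch_size - half)
    return {
        key: np.concatenate([offline[key][off_idx], online[key][on_idx]], axis=0)
        for key in offline
    }

# Toy buffers: a large offline dataset and a small online buffer.
rng = np.random.default_rng(0)
offline = {
    "obs": rng.normal(size=(1000, 3)),
    "source": np.zeros(1000, dtype=np.int64),  # 0 marks offline transitions
}
online = {
    "obs": rng.normal(size=(10, 3)),
    "source": np.ones(10, dtype=np.int64),     # 1 marks online transitions
}

batch = sample_symmetric(offline, online, batch_size=256, rng=rng)
print(batch["obs"].shape, int(batch["source"].sum()))
```

Note that the split stays at half and half even though the online buffer here holds only 10 transitions, so early in training each online transition is drawn far more often than each offline one.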
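UTD Ratio: The update-to-data (UTD) ratio is the number of gradient updates performed per environment step. Raising it improves sample efficiency; however, it can result in statistical overfitting.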
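To make the UTD ratio concrete, here is a minimal sketch of a training loop in which each environment step is followed by `utd_ratio` gradient updates. The names `run_training`, `collect_step`, and `gradient_update` are hypothetical placeholders, and the value 20 is only illustrative.

```python
def run_training(num_env_steps, utd_ratio, collect_step, gradient_update):
    """Interleave data collection with utd_ratio gradient updates per step."""
    for _ in range(num_env_steps):
        collect_step()                # add one new transition to the buffer
        for _ in range(utd_ratio):
            gradient_update()         # one update on a freshly sampled batch

# Stand-in callbacks that only count how often they are invoked.
counts = {"env_steps": 0, "updates": 0}

def collect_step():
    counts["env_steps"] += 1

def gradient_update():
    counts["updates"] += 1

run_training(num_env_steps=5, utd_ratio=20,
             collect_step=collect_step, gradient_update=gradient_update)
print(counts)
```

With a high ratio the same small set of transitions is reused for many updates, which is where the risk of fitting the sampled data too closely comes from.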
A great way to visualize this overfitting