Offline RL

Conservative Q-Learning for Offline Reinforcement Learning (CQL)

Introduced to me by Jason Ma.

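This is a way to do offline RL without the overestimation bias you get from naively running Q-learning on a fixed dataset. The problem is that the learned Q-function can assign high values to actions the dataset never covers, and the policy then exploits those errors. CQL adds a regularizer that pushes Q-values down on actions the policy might choose and pushes them up on actions that actually appear in the dataset. The resulting Q-function is conservative: the paper shows that, under some conditions, it lower-bounds the policy's true value. Honestly, I'm still quite confused by it.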
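To make the regularizer concrete for myself, here is a minimal sketch of a CQL(H)-style loss, assuming discrete actions and a plain DQN-style TD target. The function names (`cql_loss`, `logsumexp`), the `alpha` weight default, and the batch layout are my own illustrative choices, not the paper's reference code. The penalty term is `logsumexp_a Q(s, a) - Q(s, a_data)`: the soft maximum over all actions gets pushed down while the dataset action gets pushed up.

```python
import numpy as np

def logsumexp(x, axis=-1):
    # Numerically stable log(sum(exp(x))) along `axis`.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))

def cql_loss(q_values, actions, rewards, next_q_values, dones, gamma=0.99, alpha=1.0):
    """Illustrative CQL(H)-style loss for discrete actions (hypothetical helper).

    q_values:      (batch, n_actions) Q(s, .) from the current network
    actions:       (batch,) integer actions taken in the offline dataset
    rewards:       (batch,)
    next_q_values: (batch, n_actions) Q(s', .) from a target network
    dones:         (batch,) 1.0 if the episode ended at s'
    """
    batch = np.arange(q_values.shape[0])
    q_data = q_values[batch, actions]  # Q(s, a) for the actions in the dataset

    # Ordinary Q-learning TD target (treated as a constant, no gradient).
    target = rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)
    td_loss = 0.5 * np.mean((q_data - target) ** 2)

    # Conservative penalty: soft-max over all actions minus the dataset action.
    # It is always >= 0 and grows when unseen actions get high Q-values.
    conservative = np.mean(logsumexp(q_values, axis=1) - q_data)

    return td_loss + alpha * conservative, td_loss, conservative

# Tiny example: in the first state, the unseen action 1 has an inflated Q-value.
q = np.array([[1.0, 5.0], [2.0, 0.0]])
loss, td, penalty = cql_loss(
    q_values=q,
    actions=np.array([0, 0]),
    rewards=np.array([1.0, 1.0]),
    next_q_values=np.zeros((2, 2)),
    dones=np.array([1.0, 1.0]),
)
print(td, penalty, loss)
```

In the example, most of the penalty comes from the first state, where the out-of-distribution action has Q = 5 while the dataset action only has Q = 1. Minimizing the combined loss pulls that inflated value down, which is the "conservative" part. In a real implementation this would be computed with an autodiff framework so gradients flow through `q_values`.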