Stabilizing Off-Policy Deep Reinforcement Learning from Pixels
Might have some good insights into why things don’t work, along the lines of Horizon Reduction Makes RL Scalable.
“Visual Deadly Triad”
“We propose Local SIgnal MiXing, or LIX, a new layer specifically designed to prevent catastrophic self-overfitting in convolutional reinforcement learning architecture”
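As a note to myself, here is a rough PyTorch sketch of what a LIX-style layer could look like, assuming the mechanism is random per-location sub-pixel shifts of the feature map followed by bilinear resampling. The class name `LIXLayer` and the `max_shift` parameter are my own names, and the paper’s actual layer (especially the adaptive version) may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LIXLayer(nn.Module):
    """Hypothetical sketch of a Local SIgnal MiXing (LIX) layer.

    Perturbs each spatial location of a feature map by a small random
    sub-pixel shift and resamples bilinearly, so every output cell is a
    mix of its spatial neighbors. `max_shift` (in pixels) is an assumed
    hyperparameter name.
    """

    def __init__(self, max_shift: float = 1.0):
        super().__init__()
        self.max_shift = max_shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Base sampling grid in normalized [-1, 1] coordinates.
        ys = torch.linspace(-1.0, 1.0, h, device=x.device)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1)        # (h, w, 2), (x, y) order
        grid = grid.unsqueeze(0).expand(n, h, w, 2)

        # Per-location uniform shifts in [-max_shift, max_shift] pixels,
        # converted to normalized coordinates (one pixel = 2 / (size - 1)).
        scale = torch.tensor(
            [2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1)], device=x.device
        )
        noise = torch.rand(n, h, w, 2, device=x.device) * 2 - 1
        grid = grid + noise * self.max_shift * scale

        # Bilinear resampling mixes each output cell with its neighbors.
        return F.grid_sample(
            x, grid, mode="bilinear",
            padding_mode="border", align_corners=True,
        )
```

The idea, as I understand it, is that the bilinear resampling smooths gradients across neighboring spatial cells of the convolutional feature map, which is supposedly what prevents the encoder from self-overfitting.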
This is a good plot to have:
- You can clearly see that the variance of the Q-values is much lower when we apply image augmentation.
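For reference, the standard image augmentation in this line of pixel-based RL work is a DrQ-style random shift (pad the image, then re-crop at a random offset). A minimal sketch, with `random_shift` and `pad` being my own names:

```python
import torch
import torch.nn.functional as F

def random_shift(imgs: torch.Tensor, pad: int = 4) -> torch.Tensor:
    """DrQ-style random shift augmentation.

    Pads each (n, c, h, w) image with replicated border pixels, then
    re-crops it back to (h, w) at a random offset, shifting the image
    by up to `pad` pixels in each direction.
    """
    n, c, h, w = imgs.shape
    padded = F.pad(imgs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(imgs)
    for i in range(n):
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out
```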