Double Q-Learning

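Double Q-learning was introduced to counter maximization bias. Standard Q-learning builds its target from a max over noisy action-value estimates, and because the same estimates both select and evaluate the action, the target tends to overestimate the true value. Double Q-learning keeps two separate estimates, using one to select the greedy action and the other to evaluate it. This removes the systematic overestimation, although the resulting estimate is not strictly unbiased and can lean low instead.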
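
The select-with-one, evaluate-with-the-other idea can be sketched as a single tabular update step. This is a minimal illustration, not a reference implementation: the function name `double_q_update`, the dict-of-lists table layout, and the default `alpha` and `gamma` values are all assumptions made for the example.

```python
import random


def double_q_update(q_a, q_b, state, action, reward, next_state,
                    alpha=0.1, gamma=0.99, done=False, rng=random):
    """Apply one tabular Double Q-learning update in place.

    q_a and q_b are hypothetical tables mapping state -> list of action values.
    Returns True if q_a was the table updated, False if q_b was.
    """
    # Flip a coin to decide which table learns on this step.
    if rng.random() < 0.5:
        learner, evaluator = q_a, q_b
    else:
        learner, evaluator = q_b, q_a

    if done:
        # Terminal transition: no bootstrapped value.
        target = reward
    else:
        values = learner[next_state]
        # The table being updated selects the greedy next action ...
        best = max(range(len(values)), key=values.__getitem__)
        # ... and the other table supplies the value of that action.
        target = reward + gamma * evaluator[next_state][best]

    learner[state][action] += alpha * (target - learner[state][action])
    return learner is q_a
```

Because the evaluating table's noise is independent of the noise that picked the action, an action chosen only because its estimate was accidentally high is not also scored with that inflated estimate.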

Pseudocode