Greedy in the Limit with Infinite Exploration

A learning scheme is Greedy in the Limit with Infinite Exploration (GLIE) if it satisfies two conditions:

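All state-action pairs are explored infinitely many times: the visit count N_k(s, a) grows without bound as the number of episodes k goes to infinity.

The policy converges on a greedy policy: in the limit, it puts probability 1 on the action with the highest Q_k(s, a).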
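
One common way to meet both conditions is an epsilon-greedy policy whose epsilon shrinks as 1/k over episodes k. That schedule is an assumption here, not something these notes specify, and the names `epsilon_greedy_probs` and `glie_epsilon` and the Q-values are made up for illustration. A minimal sketch of what the action probabilities look like as k grows:

```python
def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])  # index of the best action
    probs = [epsilon / n] * n          # every action keeps a nonzero share
    probs[greedy] += 1.0 - epsilon     # the remainder goes to the greedy action
    return probs

def glie_epsilon(k):
    """Hypothetical schedule: epsilon_k = 1/k for episode k = 1, 2, 3, ..."""
    return 1.0 / k

q = [0.2, 1.0, 0.5]  # made-up action values for a single state
for k in (1, 10, 1000):
    print(k, epsilon_greedy_probs(q, glie_epsilon(k)))
```

At k = 1 the policy is uniformly random. As k grows, almost all of the probability moves to the greedy action, yet no action's probability is exactly zero at any finite k, which is what keeps exploration going.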

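I initially undersold how important this is, but it is EXTREMELY important to understand. Without the first condition, some action values are never estimated well. Without the second, the policy keeps taking random actions forever and never settles on the best ones.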

We use this GLIE idea for Monte-Carlo Control.
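
To make that concrete, here is a rough sketch of Monte-Carlo control with a GLIE-style schedule. It rests on several assumptions not taken from these notes: epsilon_k = 1/k, every-visit updates with a running mean, undiscounted returns, and a made-up three-stage toy task (`step`, `N_STAGES` and `glie_mc_control` are hypothetical names).

```python
import random

N_STAGES = 3      # hypothetical toy task: states 0, 1, 2 are visited in order
ACTIONS = (0, 1)  # action 0 costs 1 (reward -1); action 1 is free (reward 0)

def step(state, action):
    """Toy dynamics, assumed for illustration: always advance one stage."""
    reward = -1.0 if action == 0 else 0.0
    next_state = state + 1
    return next_state, reward, next_state == N_STAGES

def glie_mc_control(n_episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STAGES) for a in ACTIONS}  # action values
    N = {key: 0 for key in Q}                                    # visit counts
    for k in range(1, n_episodes + 1):
        epsilon = 1.0 / k  # exploration decays but is nonzero at every finite k
        # Generate one episode with the current epsilon-greedy policy.
        state, done, episode = 0, False, []
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                         # explore
            else:
                action = max(ACTIONS, key=lambda a: Q[(state, a)])   # exploit
            next_state, reward, done = step(state, action)
            episode.append((state, action, reward))
            state = next_state
        # Move each visited Q(s, a) towards the return observed after it.
        G = 0.0
        for s, a, r in reversed(episode):
            G += r  # undiscounted return from this step onward
            N[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]  # running mean of returns
    return Q

Q = glie_mc_control()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STAGES)}
print(greedy)
```

The 1/N step size makes each Q(s, a) the plain average of the returns seen after taking a in s. The policy is never stored: it is re-derived from Q at every step, with less randomness each episode.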