Greedy in the Limit with Infinite Exploration (GLIE)

All state-action pairs are explored infinitely many times, $\lim_{k\to\infty} N_k(s,a) = \infty$

The policy converges to a greedy policy, $\lim_{k\to\infty} \pi_k(a \mid s) = \mathbf{1}\big(a = \arg\max_{a' \in \mathcal{A}} Q_k(s,a')\big)$
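
To make the two conditions concrete, here is a minimal sketch of the standard example of a GLIE schedule: ε-greedy exploration with $\epsilon_k = 1/k$. The helper names (`glie_epsilon`, `epsilon_greedy_probs`) and the Q values are made up for illustration. At every finite $k$ each action keeps a nonzero probability, and as $k$ grows the probabilities approach the indicator of the greedy action.

```python
def glie_epsilon(k: int) -> float:
    """Exploration rate for episode k (k >= 1), using the schedule eps_k = 1/k."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return 1.0 / k


def epsilon_greedy_probs(q_values, epsilon):
    """Return pi(a|s) for every action under an epsilon-greedy policy."""
    n = len(q_values)
    # Greedy action; ties are broken in favour of the lowest index.
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n        # every action gets epsilon / |A|
    probs[greedy] += 1.0 - epsilon   # the greedy action gets the rest
    return probs


q = [0.2, 1.0, -0.5]  # made-up action values for a single state
for k in (1, 10, 1000):
    print(k, epsilon_greedy_probs(q, glie_epsilon(k)))
```

The probability on the greedy action (index 1 here) tends to 1, which is the second condition. The exploration probability $\epsilon_k / |\mathcal{A}|$ shrinks but never reaches zero at any finite $k$.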

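I initially undersold this, but it is extremely important to understand: the first condition keeps every $Q(s,a)$ estimate being updated, and the second ensures the policy eventually acts on those estimates instead of exploring forever.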

We use this GLIE idea for Monte-Carlo Control.
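
As a rough sketch of what GLIE Monte-Carlo control can look like, the code below runs every-visit Monte-Carlo updates with an ε-greedy policy and $\epsilon_k = 1/k$. The two-state `TOY_MDP`, its rewards, and all function names are hypothetical and chosen only for illustration. Each episode is sampled with the current ε-greedy policy, then $N(s,a)$ and $Q(s,a)$ are updated with the incremental mean of the observed returns.

```python
import random
from collections import defaultdict

# Hypothetical toy MDP (not from the notes): (state, action) -> (reward, next_state).
# A next_state of None marks the end of the episode.
TOY_MDP = {
    (0, 0): (-2.0, None),
    (0, 1): (0.0, 1),
    (1, 0): (-1.0, None),
    (1, 1): (1.0, None),
}
ACTIONS = (0, 1)


def greedy_action(Q, state):
    """Action with the highest estimated value (ties go to the lowest index)."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def sample_episode(Q, epsilon, rng):
    """Roll out one episode from state 0 under the epsilon-greedy policy for Q."""
    episode, state = [], 0
    while state is not None:
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)      # explore: uniform random action
        else:
            action = greedy_action(Q, state)  # exploit: current greedy action
        reward, next_state = TOY_MDP[(state, action)]
        episode.append((state, action, reward))
        state = next_state
    return episode


def glie_mc_control(num_episodes=2000, gamma=1.0, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # action-value estimates Q(s, a)
    N = defaultdict(int)    # visit counts N(s, a)
    for k in range(1, num_episodes + 1):
        epsilon = 1.0 / k   # decaying exploration rate eps_k = 1/k
        G = 0.0
        # Walk the episode backwards so G is the return from each step onward.
        for state, action, reward in reversed(sample_episode(Q, epsilon, rng)):
            G = reward + gamma * G
            N[(state, action)] += 1
            # Incremental mean of the returns observed for (state, action).
            Q[(state, action)] += (G - Q[(state, action)]) / N[(state, action)]
    return Q, N


Q, N = glie_mc_control()
print({s: greedy_action(Q, s) for s in (0, 1)})
```

On this toy problem the best return comes from taking action 1 in both states, so the greedy policy read off `Q` should settle on that choice. The `Q` update uses $1/N(s,a)$ as the step size, so each estimate is the plain average of the returns seen for that pair.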