Monte-Carlo Learning Model-Free Control

Monte-Carlo Control

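For control, the quantity we want to estimate is the action-value function Q(s, a) rather than the state-value function V(s), because acting greedily with respect to Q does not require a model of the environment.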

GLIE Monte-Carlo Control

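  • Sample the kth episode using the current policy π: {S_1, A_1, R_2, ..., S_T} ~ π
  • For each state S_t and action A_t in the episode, increment the visit count and move the estimate toward the return G_t:
      N(S_t, A_t) ← N(S_t, A_t) + 1
      Q(S_t, A_t) ← Q(S_t, A_t) + (1 / N(S_t, A_t)) (G_t - Q(S_t, A_t))
  • Improve the policy based on the new action-value function:
      ε ← 1/k,  π ← ε-greedy(Q)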
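
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. The corridor environment, the names `step`, `epsilon_greedy`, and `glie_mc_control`, the every-visit update, the ε = 1/k schedule, and the `max_steps` cap on episode length are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Hypothetical toy environment: a corridor with states 0..3, where 3 is terminal.
N_STATES = 4
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Move left or right; every step costs -1 until the terminal state."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, -1.0, done


def epsilon_greedy(Q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(Q[(state, a)] for a in ACTIONS)
    # Break ties between equally good actions at random.
    return rng.choice([a for a in ACTIONS if Q[(state, a)] == best])


def glie_mc_control(num_episodes=2000, gamma=1.0, max_steps=200, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # action-value estimates Q(s, a)
    N = defaultdict(int)    # visit counts N(s, a)

    for k in range(1, num_episodes + 1):
        epsilon = 1.0 / k  # assumed GLIE schedule: exploration decays to zero

        # 1. Sample the kth episode with the current epsilon-greedy policy.
        episode = []
        state = 0
        for _ in range(max_steps):  # cap the length so every episode ends
            action = epsilon_greedy(Q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            episode.append((state, action, reward))
            state = next_state
            if done:
                break

        # 2. Every-visit incremental update of Q toward the observed return.
        G = 0.0
        for s, a, r in reversed(episode):
            G = r + gamma * G
            N[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]

        # 3. Policy improvement is implicit: the next episode acts
        #    epsilon-greedily with respect to the updated Q.
    return Q, N


if __name__ == "__main__":
    Q, N = glie_mc_control()
    for s in range(N_STATES - 1):
        print(s, {a: round(Q[(s, a)], 2) for a in ACTIONS})
```

Keeping the policy implicit in Q means no separate policy table is needed; only the visit counts and value estimates are stored.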

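It is important to understand the difference between on-policy and off-policy methods. On-policy methods learn about the same policy that generates the episodes, while off-policy methods learn about a policy different from the one generating the data. So far, exploring starts has been one solution for on-policy Monte-Carlo control.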

The updates above closely resemble the incremental sample-average updates used in multi-armed bandits.
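
To make the resemblance concrete, here is a small sketch of the sample-average rule for a single bandit arm. The function name `incremental_mean` and the reward list are hypothetical. The rule has the same form as the Monte-Carlo update, with the observed reward playing the role of the return G_t.

```python
def incremental_mean(rewards):
    """Sample-average estimate for one arm, updated one reward at a time."""
    q, n = 0.0, 0
    for r in rewards:
        n += 1
        q += (r - q) / n  # same form as Q <- Q + (1/N)(G - Q)
    return q


if __name__ == "__main__":
    rewards = [1.0, 0.0, 1.0, 1.0]
    # The incremental estimate matches the ordinary arithmetic mean.
    print(incremental_mean(rewards), sum(rewards) / len(rewards))
```

The main difference is that a bandit keeps one estimate per action, while Monte-Carlo control keeps one per state-action pair.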