Monte-Carlo Learning Model-Free Control
Monte-Carlo Control
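Now, instead of the state-value function V, we want to estimate the action-value function Q(s, a). Acting greedily with respect to Q requires no model of the environment, whereas greedy improvement over V needs the transition dynamics.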
GLIE Monte-Carlo Control
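- Sample the $k$th episode using the current policy $\pi$: $\{S_1, A_1, R_2, \ldots, S_T\} \sim \pi$
- For each state $S_t$ and action $A_t$ in the episode, update the visit count and move the estimate toward the return $G_t$: $N(S_t, A_t) \leftarrow N(S_t, A_t) + 1$, then $Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \frac{1}{N(S_t, A_t)} \left( G_t - Q(S_t, A_t) \right)$
- Improve the policy based on the new action-value function: $\epsilon \leftarrow 1/k$, then $\pi \leftarrow \epsilon\text{-greedy}(Q)$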
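The three steps above can be sketched in Python. This is a minimal illustration rather than a reference implementation: it assumes the usual GLIE schedule (epsilon set to 1/k after episode k), an every-visit incremental-mean update, and a hypothetical `step(state, action, rng)` callback that returns `(reward, next_state, done)`. The toy environment `chain_step` is invented purely for the example.

```python
import random
from collections import defaultdict


def glie_mc_control(step, start_state, actions, num_episodes,
                    gamma=1.0, seed=0, max_steps=100):
    """Every-visit GLIE Monte-Carlo control (illustrative sketch).

    `step(state, action, rng)` is an assumed environment interface that
    returns (reward, next_state, done).
    """
    rng = random.Random(seed)
    Q = defaultdict(float)  # action-value estimates Q(s, a)
    N = defaultdict(int)    # visit counts N(s, a)
    epsilon = 1.0           # the initial policy is uniformly random

    for k in range(1, num_episodes + 1):
        # 1. Sample the kth episode with the current epsilon-greedy policy.
        episode = []
        state = start_state
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: Q[(state, a)])
            reward, next_state, done = step(state, action, rng)
            episode.append((state, action, reward))
            state = next_state
            if done:
                break

        # 2. Walk the episode backwards, accumulating the return G_t and
        #    moving Q(S_t, A_t) toward it with step size 1 / N(S_t, A_t).
        G = 0.0
        for s, a, r in reversed(episode):
            G = r + gamma * G
            N[(s, a)] += 1
            Q[(s, a)] += (G - Q[(s, a)]) / N[(s, a)]

        # 3. Improve the policy: shrink epsilon; greediness is implicit in Q.
        epsilon = 1.0 / k

    return Q, N


def chain_step(state, action, rng):
    """Hypothetical two-state toy MDP, used only for illustration."""
    if state == 0:
        if action == 0:
            return 0.1, None, True   # small immediate reward, then stop
        return 0.0, 1, False         # move on to state 1
    # state == 1
    return (1.0 if action == 1 else 0.0), None, True


if __name__ == "__main__":
    Q, N = glie_mc_control(chain_step, 0, [0, 1], num_episodes=500)
    for key in sorted(Q):
        print(key, round(Q[key], 3), N[key])
```

The policy is never stored explicitly: it is recovered from Q and the current epsilon whenever an action is needed, which keeps the improvement step to a single assignment.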
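You need to understand the difference between on-policy and off-policy methods. On-policy methods evaluate and improve the same policy that generates the episodes, while off-policy methods learn about a target policy from episodes generated by a different behavior policy. So far, exploring starts has been one way to ensure sufficient exploration in on-policy Monte-Carlo control.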
The incremental update of Q, with step size 1/N, has the same form as the sample-average action-value update in Multi-Armed Bandits!
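To make the resemblance concrete, the two updates can be written side by side. The bandit form below is the standard sample-average rule, where $n$ counts how often arm $a$ has been pulled and $R_n$ is the latest reward; the notation is an assumption, since these notes do not define it.

```latex
\begin{align*}
\text{Bandit (sample average):}\quad
  & Q_{n+1}(a) = Q_n(a) + \frac{1}{n}\bigl(R_n - Q_n(a)\bigr) \\
\text{Monte-Carlo control:}\quad
  & Q(S_t, A_t) \leftarrow Q(S_t, A_t)
    + \frac{1}{N(S_t, A_t)}\bigl(G_t - Q(S_t, A_t)\bigr)
\end{align*}
```

Both are running averages of the form "old estimate plus step size times (target minus old estimate)". The only differences are the target (immediate reward $R_n$ versus full return $G_t$) and that Monte-Carlo control keeps one estimate per state-action pair rather than per arm.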