🛠️ Steven Gong


Jun 22, 2023, 1 min read

Monte-Carlo Learning Model-Free Control

Monte-Carlo Control

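For control, we now estimate the action-value function $Q(s, a)$ instead of the state-value function $V(s)$. Acting greedily with respect to $V$ would require a model of the environment's transitions, whereas acting greedily with respect to $Q$ does not.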

GLIE Monte-Carlo Control

Sample the $k$th episode using $\pi$: $\{S_{1}, A_{1}, R_{2}, ..., S_{T}\} \sim \pi$
For each state $S_{t}$ and action $A_{t}$ in the episode, where $G_{t}$ is the return following time $t$:
$N(S_{t}, A_{t}) \leftarrow N(S_{t}, A_{t}) + 1$
$Q(S_{t}, A_{t}) \leftarrow Q(S_{t}, A_{t}) + \frac{1}{N(S_{t}, A_{t})} (G_{t} - Q(S_{t}, A_{t}))$
Improve the policy based on the new action-value function:
$\epsilon \leftarrow 1/k$
$\pi \leftarrow \epsilon\text{-greedy}(Q)$
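
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the corridor environment (`step`, `N_STATES`, `ACTIONS`), the helper names, and the defaults `gamma=1.0` and `seed=0` are all assumptions made for the example and do not come from these notes. In this toy environment each state-action pair occurs at most once per episode, so first-visit and every-visit Monte-Carlo coincide.

```python
import random
from collections import defaultdict

# Hypothetical toy environment (an assumption, not from the notes):
# a corridor of N_STATES cells. Action 1 ("right") moves one cell forward;
# stepping past the last cell ends the episode with reward +1.
# Action 0 ("stop") ends the episode immediately with reward 0.
N_STATES = 3
ACTIONS = (0, 1)


def step(state, action):
    """Return (reward, next_state); next_state is None when the episode ends."""
    if action == 0:
        return 0.0, None
    if state + 1 == N_STATES:
        return 1.0, None
    return 0.0, state + 1


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy for a single state."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])  # ties go to the lowest index
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def sample_episode(Q, epsilon, rng):
    """Roll out one episode from state 0 with the current epsilon-greedy policy."""
    episode, state = [], 0
    while state is not None:
        probs = epsilon_greedy_probs([Q[(state, a)] for a in ACTIONS], epsilon)
        action = rng.choices(ACTIONS, weights=probs)[0]
        reward, next_state = step(state, action)
        episode.append((state, action, reward))
        state = next_state
    return episode


def glie_mc_control(num_episodes, gamma=1.0, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # action-value estimates Q(s, a), initialised to 0
    N = defaultdict(int)    # visit counts N(s, a)
    for k in range(1, num_episodes + 1):
        epsilon = 1.0 / k   # GLIE schedule: epsilon <- 1/k
        # Policy improvement is implicit: the episode is sampled from the
        # epsilon-greedy policy derived from the latest Q.
        episode = sample_episode(Q, epsilon, rng)
        G = 0.0
        # Walk backwards so G accumulates the return from each time step.
        for state, action, reward in reversed(episode):
            G = reward + gamma * G
            N[(state, action)] += 1
            Q[(state, action)] += (G - Q[(state, action)]) / N[(state, action)]
    return Q, N
```

Calling `Q, N = glie_mc_control(500)` returns the learned action values and visit counts for the toy corridor. Because $\epsilon$ shrinks as $1/k$, later episodes follow the greedy action more and more often while every action keeps a nonzero probability of being tried.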

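It is important to understand the difference between On-Policy Methods and Off-Policy Methods here. So far, exploring starts has been one way to ensure exploration in on-policy Monte-Carlo control; the $\epsilon$-greedy policy used in GLIE Monte-Carlo Control is another on-policy approach.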

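The $Q$ update above has the same incremental sample-average form as the action-value update in Multi-Armed Bandit problems, with the return $G_{t}$ playing the role of the bandit reward.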
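
To make the connection concrete, here is a short derivation showing that the $\frac{1}{N}$ step size simply maintains a running mean of the observed returns. The symbols $Q_{n}$ (the estimate after $n$ visits to a fixed state-action pair) and $G^{(i)}$ (the return observed on the $i$th visit) are notation introduced for this sketch only.

$$
\begin{aligned}
Q_{n} &= \frac{1}{n} \sum_{i=1}^{n} G^{(i)} \\
      &= \frac{1}{n} \left( G^{(n)} + (n - 1)\, Q_{n-1} \right) \\
      &= Q_{n-1} + \frac{1}{n} \left( G^{(n)} - Q_{n-1} \right)
\end{aligned}
$$

Replacing the return $G^{(n)}$ with the reward from pulling an arm gives the sample-average update used for bandits.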

