Monte-Carlo Tree Search

A heuristic search algorithm for some kinds of decision processes.

Insight: We want a universally applicable evaluation function that works for all games, instead of a hand-engineered Evaluation Function. What can we leverage? Randomness.
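
As a minimal sketch of that insight, assuming a hypothetical game-state interface with `legal_moves()`, `play(move)` (returning a new state), `is_terminal()`, and `result()`: estimate a position's value by averaging the outcomes of random playouts instead of hand-crafting a heuristic.

```python
import random

def random_rollout_value(state, n_playouts=100):
    """Estimate a position's value as the average outcome of random
    playouts. The game interface (legal_moves, play, is_terminal,
    result) is hypothetical, purely for illustration."""
    total = 0.0
    for _ in range(n_playouts):
        s = state
        while not s.is_terminal():
            s = s.play(random.choice(s.legal_moves()))
        total += s.result()  # e.g. +1 win, 0 draw, -1 loss
    return total / n_playouts
```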

Introductory video: https://www.youtube.com/watch?v=Fbs4lnGLS8M

The core idea of MCTS is to successively focus multiple simulations starting at the current state by extending the initial portions of trajectories that have received high evaluations from earlier simulations.

There are 4 steps (sketched in code after the list):

  1. Selection: descend the tree from the root, choosing children with a tree policy (typically UCB1/UCT).
  2. Expansion: add a child node for an unexplored move.
  3. Simulation: run a random playout (rollout) from the new node to a terminal state.
  4. Backpropagation: propagate the playout's outcome back up the visited path.
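
A minimal, self-contained sketch of the four steps (the UCT variant), reusing the same hypothetical game-state interface as above; this is an illustration, not a production implementation:

```python
import math
import random

class Node:
    """A search-tree node; assumes the hypothetical game-state
    interface (legal_moves, play, is_terminal, result) from above."""
    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move              # move that led from parent to this node
        self.children = []
        self.untried_moves = list(state.legal_moves())
        self.visits = 0
        self.total_reward = 0.0

def uct_select(node, c=1.41):
    # Tree policy: UCB1 balances exploitation (mean reward)
    # against exploration (less-visited children).
    return max(
        node.children,
        key=lambda ch: ch.total_reward / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )

def mcts(root_state, n_iterations=1000):
    root = Node(root_state)
    for _ in range(n_iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried_moves and node.children:
            node = uct_select(node)
        # 2. Expansion: add one child for an untried move.
        if node.untried_moves and not node.state.is_terminal():
            move = node.untried_moves.pop()
            child = Node(node.state.play(move), parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node to the end.
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(state.legal_moves()))
        reward = state.result()
        # 4. Backpropagation: update statistics up to the root.
        # (For two-player games, negate the reward at alternating depths.)
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # Recommend the most-visited move at the root.
    return max(root.children, key=lambda ch: ch.visits).move
```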

In model-based RL: once you have a known model, you can plan the value function/policy from the model.
In model-free RL: there is no model; you learn the value function/policy from experience.

Minimax vs. MCTS?

“While Minimax combined with Alpha-Beta pruning is a solid solution to approach games where an evaluation function to estimate the game outcome can easily be defined, Monte Carlo Tree Search (MCTS) is a universally applicable solution given that no evaluation function is necessary due to its reliance on randomness.” Source

MCTS is more domain-agnostic: unlike in minimax, you don't need to define a Heuristic Function.
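
To make the contrast concrete, here is a bare-bones depth-limited minimax over the same hypothetical state interface as above. Note that it cannot score a cut-off position without a domain-specific `evaluate` function, which is exactly the dependency MCTS removes:

```python
def minimax(state, depth, maximizing, evaluate):
    # Minimax needs a hand-engineered evaluate(state) heuristic to score
    # positions at the depth cutoff; MCTS replaces it with random playouts.
    if depth == 0 or state.is_terminal():
        return evaluate(state)
    values = (minimax(state.play(m), depth - 1, not maximizing, evaluate)
              for m in state.legal_moves())
    return max(values) if maximizing else min(values)
```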

Resources