AlphaGo

Why do you have both a policy network and a value network?

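The policy network gives you a probability distribution over moves, concentrated on the promising ones. This narrows the breadth of the search.

The value network estimates how good the current position is (the expected game outcome), so the search does not have to play every line out to the end. This cuts the depth of the search.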

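In AlphaGo, the value network is learned with Monte Carlo (MC) targets: it is trained by regression to predict the final outcome of the game from a given position.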
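
A minimal sketch of what an MC target looks like here, not AlphaGo's actual code. The function name `monte_carlo_targets`, the convention that player 0 moves first, and the +1/-1 encoding from the point of view of the player to move are all assumptions made for illustration.

```python
def monte_carlo_targets(num_positions, winner):
    """Label every position of one finished game with the final outcome.

    Assumption: position i has player (i % 2) to move, with player 0 first.
    The target is +1 if the player to move went on to win, otherwise -1.
    """
    return [1 if (i % 2) == winner else -1 for i in range(num_positions)]


# A 5-position game won by player 0.
targets = monte_carlo_targets(5, winner=0)
print(targets)  # [1, -1, 1, -1, 1]
```

Every position in the game gets its label from the same final result. No intermediate value estimate is involved, which is what makes this a Monte Carlo target.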

Important details:

  • “The naive approach of predicting game outcomes from data consisting of complete games leads to overfitting.”
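    • This is because successive positions within one game are highly correlated (they differ by a single stone) but all share the same outcome label. So they make sure that each data point is sampled from a different game.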
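
A rough sketch of the "one data point per game" idea, under assumptions: `sample_one_position_per_game` is a hypothetical helper, positions are stand-in strings, and the sampling is uniform over each game's positions, which may differ from what the paper does.

```python
import random


def sample_one_position_per_game(games, rng):
    """Build a value-network dataset with exactly one position per game.

    games: list of (positions, outcome) pairs, one pair per finished game.
    rng:   a random.Random instance, passed in for reproducibility.
    Returns a list of (position, outcome) training pairs.
    """
    dataset = []
    for positions, outcome in games:
        if not positions:
            continue  # skip empty games; nothing to sample
        dataset.append((rng.choice(positions), outcome))
    return dataset


# Toy data: two games with placeholder positions and outcomes.
games = [(["a1", "a2", "a3"], 1), (["b1", "b2"], -1)]
data = sample_one_position_per_game(games, random.Random(0))
print(len(data))  # 2
```

Because each game contributes a single position, no two training examples share the same game trajectory, which removes the within-game correlation described above.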