Monte-Carlo CFR

Paper link.

With MCCFR, we avoid traversing the entire game tree on each iteration while still having the immediate counterfactual regrets be unchanged in expectation.
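To make "unchanged in expectation" concrete — roughly following the MCCFR paper's notation (my transcription, so double-check symbols against the paper): the terminal histories are partitioned into blocks $Q_1, \dots, Q_r$, block $Q_j$ is sampled with probability $q_j$, and $q(z)$ is the total probability that terminal $z$ gets sampled. The sampled counterfactual value is

```latex
\tilde v_i(\sigma, I \mid j) \;=\; \sum_{z \in Q_j \cap Z_I} \frac{1}{q(z)}\, u_i(z)\, \pi^{\sigma}_{-i}(z[I])\, \pi^{\sigma}(z[I], z)
```

Taking the expectation over which block $j$ is sampled, the $q(z)$ factors cancel and we recover the full counterfactual value $v_i(\sigma, I)$, so the sampled regrets $\tilde r(I, a) = \tilde v_i(\sigma_{(I \to a)}, I) - \tilde v_i(\sigma, I)$ match the true immediate counterfactual regrets in expectation.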

Wait, lol, the CFR I'm implementing already uses Monte-Carlo CFR… I just realized.

  • No lol

Ways to Sample:

  • Chance Sampling (CS)
    • Samples a single outcome at each chance node (e.g. one deal of the cards) and traverses the players' actions fully.
  • External Sampling (ES)
    • Samples the actions of the opponent and of chance only; every action of the traversing player is still explored.
    • The sampling is weighted by how likely the opponent’s plays are to occur, which is sensible: the regret values corresponding to the more probable lines of play get updated faster.
  • Outcome Sampling (OS)
    • Samples a single action at every node (for both players and chance), so each iteration touches only one terminal history.

Average Strategy Sampling (AS) selects actions for player i according to the cumulative profile and three predefined parameters.
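Of these, external sampling is the easiest to sketch. Below is a minimal ES-MCCFR loop on Kuhn poker (the standard 3-card test game) — this is my own toy sketch under regret matching, not the paper's exact pseudocode; the infoset keys, `Node` fields, and `train()` helper are naming choices of mine. The traverser explores all of its own actions and updates regrets; opponent and chance actions are sampled, and the opponent's strategy is accumulated into the average there.

```python
import random
from collections import defaultdict

ACTIONS = ["p", "b"]          # pass / bet
TERMINAL = {"pp", "bp", "bb", "pbp", "pbb"}

class Node:
    def __init__(self):
        self.regret = [0.0, 0.0]
        self.strategy_sum = [0.0, 0.0]

    def strategy(self):
        # Regret matching: play proportionally to positive regret.
        pos = [max(r, 0.0) for r in self.regret]
        total = sum(pos)
        return [p / total for p in pos] if total > 0 else [0.5, 0.5]

    def avg_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]

nodes = defaultdict(Node)

def payoff(cards, h):
    """Utility for player 0 at a terminal history."""
    if h == "pp":                                 # check-check: showdown for 1
        return 1 if cards[0] > cards[1] else -1
    if h.endswith("bp"):                          # a bet was folded to
        return 1 if len(h) == 2 else -1
    return 2 if cards[0] > cards[1] else -2       # bet called: showdown for 2

def traverse(cards, h, i):
    """External-sampling walk; returns the sampled value for traverser i."""
    if h in TERMINAL:
        u0 = payoff(cards, h)
        return u0 if i == 0 else -u0
    player = len(h) % 2
    node = nodes[str(cards[player]) + h]
    strat = node.strategy()
    if player == i:
        # Traverser: explore *all* of our actions and update regrets.
        utils = [traverse(cards, h + a, i) for a in ACTIONS]
        node_util = sum(s * u for s, u in zip(strat, utils))
        for k in range(len(ACTIONS)):
            node.regret[k] += utils[k] - node_util
        return node_util
    # Opponent: sample one action from their current strategy and
    # accumulate that strategy into the average.
    for k in range(len(ACTIONS)):
        node.strategy_sum[k] += strat[k]
    a = random.choices(ACTIONS, weights=strat)[0]
    return traverse(cards, h + a, i)

def train(iterations, seed=0):
    random.seed(seed)
    for _ in range(iterations):
        cards = random.sample([1, 2, 3], 2)       # chance is sampled too
        for i in (0, 1):
            traverse(cards, "", i)
```

After a few thousand iterations, the average strategy should recover the dominated-action facts of Kuhn poker, e.g. player 0 almost always folds a Jack (card 1) to a bet and almost always calls with a King (card 3).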