With MCCFR, we avoid traversing the entire game tree on each iteration while keeping the sampled immediate counterfactual regrets unbiased (unchanged in expectation).
Wait, lol, the CFR I am implementing is already Monte-Carlo CFR… I just realized.
- No lol
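The unbiasedness claim comes from importance weighting: when we sample a branch with probability q, we divide the observed value by q, so the estimate matches the full-traversal value in expectation. A minimal sketch (the values and sampling distribution below are made up for illustration):

```python
import random

def sampled_estimate(values, q, trials=100_000):
    """Estimate sum(values) by repeatedly sampling index i with
    probability q[i] and weighting the observed value by 1/q[i].
    This importance weighting is what keeps MCCFR's sampled
    counterfactual regrets unbiased."""
    idx = list(range(len(values)))
    total = 0.0
    for _ in range(trials):
        i = random.choices(idx, weights=q)[0]
        total += values[i] / q[i]  # weight by inverse sample probability
    return total / trials

random.seed(0)
vals = [3.0, -1.0, 2.5, 0.5]   # hypothetical counterfactual values
probs = [0.4, 0.1, 0.3, 0.2]   # arbitrary sampling distribution
est = sampled_estimate(vals, probs)
# est converges to sum(vals) = 5.0 as trials grows
```

Only a subset of branches is touched per iteration, yet the estimator's mean equals the exhaustive sum regardless of which sampling distribution is used (as long as every branch has positive probability).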
Ways to Sample:
- Chance Sampling (CS)
	- Samples a single outcome at each chance node (e.g., the deal at the root of a poker tree); all player actions are still traversed.
- External Sampling (ES)
- Sample the actions of the opponent and of chance only.
	- These actions are sampled according to the opponent's strategy (and chance's distribution), which is sensible: regret values along the plays the opponent is likely to make get updated faster.
- Outcome Sampling (OS)
	- Samples a single action at every node, including the traverser's own, so each iteration follows one trajectory down to a terminal history.
- Average Strategy Sampling (AS)
	- Samples player i's actions according to the cumulative (average) strategy profile, controlled by three predefined parameters.
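The CS/ES/OS variants differ only in which branches a traversal expands at each node type. A sketch of that decision rule (the helper name and uniform `random.choice` are simplifications — in practice chance and opponent actions are drawn from the chance distribution and the opponent's current strategy, not uniformly):

```python
import random

def branches_to_traverse(node_type, actions, traverser_is_acting, scheme):
    """Return which child branches a traversal visits under each
    MCCFR variant. scheme is one of "CS", "ES", "OS".
    Uniform sampling is used here for brevity; real implementations
    sample from the chance / opponent action distributions."""
    if scheme == "CS":
        # Chance sampling: sample chance outcomes only,
        # expand all player actions.
        if node_type == "chance":
            return [random.choice(actions)]
        return list(actions)
    if scheme == "ES":
        # External sampling: sample chance and opponent actions;
        # traverse every action of the player being updated.
        if node_type == "chance" or not traverser_is_acting:
            return [random.choice(actions)]
        return list(actions)
    if scheme == "OS":
        # Outcome sampling: sample a single action everywhere,
        # tracing one trajectory to a terminal node.
        return [random.choice(actions)]
    raise ValueError(f"unknown scheme: {scheme}")
```

Going down this list, each scheme samples more aggressively: OS touches the fewest nodes per iteration but has the highest-variance regret estimates, which is the usual trade-off when choosing between them.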