Regret Matching
We use regret matching to solve normal-form games such as rock-paper-scissors (RPS).
When Blackwell's approachability theorem is applied to minimizing regret, the resulting algorithm is known as regret matching.
In regret matching, an agent selects its actions at random from a distribution proportional to its positive regrets.
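For intuition, here is a minimal sketch of that selection rule: clip negative regrets to zero and normalize. The names (`get_strategy`, `regret_sum`) are my own illustrative choices, not necessarily the ones in the original code.

```python
def get_strategy(regret_sum):
    """Mixed strategy proportional to positive accumulated regrets."""
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No positive regret yet: fall back to the uniform strategy.
    n = len(regret_sum)
    return [1.0 / n] * n
```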
I implemented regret matching in Python. At a high level, you mainly implement `train`, which breaks down into three steps (sketched in code after the list):
- Get regret-matched mixed-strategy actions (i.e. play a game)
  - The strategy you play is based on the regret you have, computed from `regretSum`
- Compute action utilities (i.e. compute the regrets of those games)
- Accumulate action regrets (i.e. add those regrets into `regretSum`)
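Here is a hedged sketch of such a `train` loop for RPS against a fixed opponent strategy, reusing `get_strategy` from above. The identifiers (`utility`, `opp_strategy`, and the snake_case variants of `regretSum`) are assumptions of mine, not necessarily the original code's names.

```python
import random

NUM_ACTIONS = 3  # rock, paper, scissors encoded as 0, 1, 2

def utility(my_action, opp_action):
    """RPS payoff from my perspective: win = +1, tie = 0, loss = -1."""
    if my_action == opp_action:
        return 0
    return 1 if (my_action - opp_action) % 3 == 1 else -1

def train(iterations, opp_strategy):
    regret_sum = [0.0] * NUM_ACTIONS
    strategy_sum = [0.0] * NUM_ACTIONS
    for _ in range(iterations):
        # Step 1: get regret-matched mixed-strategy actions (play a game).
        strategy = get_strategy(regret_sum)
        for a in range(NUM_ACTIONS):
            strategy_sum[a] += strategy[a]
        my_action = random.choices(range(NUM_ACTIONS), weights=strategy)[0]
        opp_action = random.choices(range(NUM_ACTIONS), weights=opp_strategy)[0]
        # Step 2: compute each action's utility against the opponent's action.
        action_utility = [utility(a, opp_action) for a in range(NUM_ACTIONS)]
        # Step 3: accumulate regrets: how much better each alternative action
        # would have done than the action actually played.
        for a in range(NUM_ACTIONS):
            regret_sum[a] += action_utility[a] - action_utility[my_action]
    return strategy_sum
```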
Regret sums are highly erratic. What converges to a minimal-regret strategy is the average strategy across all iterations. This is why we use `avgStrategy` at the very end when we want the final strategy.
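The average strategy is just the accumulated per-iteration strategies, normalized. A minimal sketch building on the `train` loop above, with `get_average_strategy` as my assumed snake_case analogue of the text's `avgStrategy`:

```python
def get_average_strategy(strategy_sum):
    """Normalize accumulated per-iteration strategies into the average strategy."""
    total = sum(strategy_sum)
    if total > 0:
        return [s / total for s in strategy_sum]
    n = len(strategy_sum)
    return [1.0 / n] * n

# Example: train against a fixed opponent, then read off the average strategy.
avg_strategy = get_average_strategy(train(100_000, [0.4, 0.3, 0.3]))
```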