# Regret Matching

We use regret matching to solve normal-form games such as Rock-Paper-Scissors (RPS).

Blackwell’s approachability theorem, when applied to minimizing regret, is known as regret matching.

In regret matching, **an agent’s actions are selected at random with a distribution that is proportional to positive regrets**.

$$
\sigma_i(I, a) =
\begin{cases}
\dfrac{R_i(I, a)}{\sum_{a \in A(I)} R_i(I, a)} & \text{if } \sum_{a \in A(I)} R_i(I, a) > 0 \\[1ex]
\dfrac{1}{|A(I)|} & \text{otherwise}
\end{cases}
$$
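As a minimal sketch of this formula (the function name here is my own, not taken from any particular codebase), a strategy is obtained by normalizing the positive regrets, falling back to the uniform distribution when no regret is positive:

```python
def get_strategy(regret_sum):
    """Mixed strategy proportional to positive regrets (regret matching).

    Falls back to the uniform distribution when no regret is positive.
    """
    n = len(regret_sum)
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / n] * n
```

For example, with regrets `[2.0, 1.0, -3.0]` this puts probability 2/3 on the first action, 1/3 on the second, and 0 on the third; with all-negative regrets it plays uniformly at random.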

I implemented regret matching in Python. At a high level, you mainly implement `train`, which breaks down into three steps:

- Get regret-matched mixed-strategy actions (i.e. play a game): the strategy you play is based on the regret you have, computed from `regretSum`
- Compute action utilities (i.e. compute the regrets of those games)
- Accumulate action regrets into `regretSum`

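The three steps above can be sketched end-to-end for RPS as follows. This is a self-contained sketch under my own naming conventions (`get_strategy`, `sample`, the `opp_strategy` parameter), not the exact code from my implementation:

```python
import random

NUM_ACTIONS = 3  # rock = 0, paper = 1, scissors = 2

def get_strategy(regret_sum):
    # Regret matching: play proportionally to positive regrets.
    positive = [max(r, 0.0) for r in regret_sum]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / NUM_ACTIONS] * NUM_ACTIONS

def sample(strategy):
    # Draw one action index from a mixed strategy.
    r, cumulative = random.random(), 0.0
    for a, p in enumerate(strategy):
        cumulative += p
        if r < cumulative:
            return a
    return NUM_ACTIONS - 1

def train(iterations, opp_strategy):
    regret_sum = [0.0] * NUM_ACTIONS
    strategy_sum = [0.0] * NUM_ACTIONS
    for _ in range(iterations):
        # Step 1: get regret-matched mixed-strategy actions (play a game).
        strategy = get_strategy(regret_sum)
        for a in range(NUM_ACTIONS):
            strategy_sum[a] += strategy[a]
        my_action = sample(strategy)
        opp_action = sample(opp_strategy)
        # Step 2: compute action utilities against the opponent's move.
        utility = [0.0] * NUM_ACTIONS
        utility[(opp_action + 1) % NUM_ACTIONS] = 1.0   # beats opp_action
        utility[(opp_action - 1) % NUM_ACTIONS] = -1.0  # loses to opp_action
        # Step 3: accumulate regrets for not having played each action.
        for a in range(NUM_ACTIONS):
            regret_sum[a] += utility[a] - utility[my_action]
    return strategy_sum
```

Against a fixed opponent who over-plays rock, the accumulated strategy concentrates on paper, the best response, which is exactly the behavior regret matching should exhibit.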
Regret sums are highly erratic. What converges to a minimal-regret strategy is the average strategy across all iterations. This is why we use `avgStrategy` at the very end when we want the final strategy.
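A minimal sketch of that averaging step (the snake_case name and list-based representation are my own assumptions): the average strategy is just the per-iteration strategy sums normalized to a probability distribution.

```python
def avg_strategy(strategy_sum):
    # Normalize the accumulated per-iteration strategies into the
    # average strategy, which is what converges to minimal regret.
    total = sum(strategy_sum)
    n = len(strategy_sum)
    if total > 0:
        return [s / total for s in strategy_sum]
    return [1.0 / n] * n

# e.g. strategy probabilities accumulated over several iterations
print(avg_strategy([2.0, 1.0, 1.0]))  # [0.5, 0.25, 0.25]
```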