# Regret Matching

We use regret matching to solve normal-form games such as Rock-Paper-Scissors (RPS).

Blackwell’s approachability theorem, when applied to minimizing regret, is known as regret matching.

In regret matching, **an agent’s actions are selected at random with a distribution that is proportional to positive regrets**.

$$
\sigma_{i}(I,a)=
\begin{cases}
\dfrac{R_{i}(I,a)}{\sum_{a'\in A(I)} R_{i}(I,a')} & \text{if } \sum_{a'\in A(I)} R_{i}(I,a') > 0\\[1ex]
\dfrac{1}{|A(I)|} & \text{otherwise}
\end{cases}
$$
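The formula above can be sketched directly in Python. This is a minimal version, assuming a global `regretSum` array of accumulated regrets (the same name used in the training code below):

```python
import numpy as np

NUM_ACTIONS = 3  # rock, paper, scissors
regretSum = np.zeros(NUM_ACTIONS)

def getStrategy():
    # Clip regrets at zero: only positive regrets shape the strategy.
    positiveRegrets = np.maximum(regretSum, 0)
    total = positiveRegrets.sum()
    if total > 0:
        # Mix actions proportionally to their positive regret.
        return positiveRegrets / total
    # No positive regret anywhere: fall back to the uniform strategy.
    return np.ones(NUM_ACTIONS) / NUM_ACTIONS
```

With regrets `[3, -1, 1]`, for example, the negative regret is clipped and the strategy becomes `[0.75, 0, 0.25]`.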

I implemented regret matching in Python. At a high level, you mainly implement `train`, which breaks down into three steps:

- Get regret-matched mixed-strategy actions (i.e. play a game). The strategy you play is based on the regret you have, computed from `regretSum`.
- Compute action utilities (i.e. compute the regrets of those plays).
- Accumulate action regrets into `regretSum`.

```python
def train(iterations: int):
    actionUtility = np.zeros(NUM_ACTIONS)
    for i in range(iterations):
        # 1. Get regret-matched mixed-strategy actions
        strategy = getStrategy()
        myAction = getAction(strategy)
        otherAction = getAction(oppStrategy)
        # 2. Compute action utilities
        actionUtility[otherAction] = 0
        actionUtility[(otherAction + 1) % NUM_ACTIONS] = 1
        actionUtility[(otherAction - 1) % NUM_ACTIONS] = -1
        # 3. Accumulate action regrets
        for a in range(NUM_ACTIONS):
            regretSum[a] += actionUtility[a] - actionUtility[myAction]
```
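The `getAction` helper used above just samples an action index from a mixed strategy. One possible sketch, assuming NumPy:

```python
import numpy as np

def getAction(strategy):
    # Draw one action index according to the mixed-strategy probabilities.
    return int(np.random.choice(len(strategy), p=strategy))
```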

Regret sums are highly erratic. What converges to a minimal-regret strategy is the average strategy across all iterations. This is why we use `avgStrategy` at the very end to obtain the final strategy.
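A sketch of that averaging step, assuming a global `strategySum` that `train` accumulates each iteration with `strategySum += strategy`:

```python
import numpy as np

NUM_ACTIONS = 3
strategySum = np.zeros(NUM_ACTIONS)  # filled in during train

def avgStrategy():
    total = strategySum.sum()
    if total > 0:
        # Normalize the accumulated strategies into one average strategy.
        return strategySum / total
    # Nothing accumulated yet: fall back to the uniform strategy.
    return np.ones(NUM_ACTIONS) / NUM_ACTIONS
```

Averaging smooths out the erratic per-iteration strategies, which is what the convergence guarantee actually applies to.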