Regret Matching
We use regret matching to solve normal-form games such as Rock-Paper-Scissors (RPS). Blackwell's approachability theorem, when applied to minimizing regret, gives an algorithm known as regret matching. In regret matching, an agent selects its actions at random, with a probability distribution proportional to its positive regrets.
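To make that rule concrete, here is a minimal sketch of computing a regret-matched strategy. It reuses the `regretSum` and `NUM_ACTIONS` names from the training snippet below; `strategySum` is an assumed accumulator that we will use later to average strategies:

```python
import numpy as np

NUM_ACTIONS = 3                      # rock, paper, scissors
regretSum = np.zeros(NUM_ACTIONS)    # cumulative regret for each action
strategySum = np.zeros(NUM_ACTIONS)  # cumulative strategy weights (assumed; used for averaging later)

def getStrategy() -> np.ndarray:
    """Return a mixed strategy proportional to positive cumulative regrets."""
    global strategySum
    strategy = np.maximum(regretSum, 0.0)   # negative regrets get zero probability
    total = strategy.sum()
    if total > 0:
        strategy = strategy / total          # normalize positive regrets into probabilities
    else:
        strategy = np.full(NUM_ACTIONS, 1.0 / NUM_ACTIONS)  # no positive regret: play uniformly
    strategySum += strategy                  # accumulate so we can average strategies later
    return strategy
```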
I implemented regret matching in Python. At a high level, you mainly implement `train`, which breaks down into three steps:
- Get regret-matched mixed-strategy actions (i.e. play a game)
  - The strategy you play is based on the regret you have, computed from `regretSum`
- Compute action utilities (i.e. compute the regrets of those games)
- Accumulate action regrets (i.e. accumulate those regrets) into `regretSum`
```python
def train(iterations: int):
    actionUtility = np.zeros(NUM_ACTIONS)
    for i in range(iterations):
        # 1. Get regret-matched mixed-strategy actions
        strategy = getStrategy()
        myAction = getAction(strategy)
        otherAction = getAction(oppStrategy)

        # 2. Compute action utilities: with actions ordered rock, paper, scissors,
        # the action one step after otherAction beats it (+1), one step before loses (-1)
        actionUtility[otherAction] = 0
        actionUtility[(otherAction + 1) % NUM_ACTIONS] = 1
        actionUtility[(otherAction - 1) % NUM_ACTIONS] = -1

        # 3. Accumulate action regrets: how much better each action would have
        # done than the action actually played
        for a in range(NUM_ACTIONS):
            regretSum[a] += actionUtility[a] - actionUtility[myAction]
```
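The snippet above also relies on two names it does not define. As an assumption for the sake of a runnable sketch, `getAction` can sample from a mixed strategy and `oppStrategy` can be any fixed opponent mixed strategy (the specific numbers here are purely illustrative):

```python
def getAction(strategy: np.ndarray) -> int:
    """Sample an action index according to the given mixed strategy."""
    return int(np.random.choice(NUM_ACTIONS, p=strategy))

# Assumed fixed opponent strategy; any probability vector over the actions works.
oppStrategy = np.array([0.4, 0.3, 0.3])
```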
Regret sums themselves are highly erratic. What converges to a minimal-regret strategy is the average strategy across all iterations. This is why we use `avgStrategy` at the very end when we want the final strategy.
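Here is a sketch of that final step, assuming the `strategySum` accumulator from the earlier `getStrategy` sketch, followed by an example of how the pieces fit together:

```python
def avgStrategy() -> np.ndarray:
    """Normalize the accumulated strategies into the average strategy."""
    total = strategySum.sum()
    if total > 0:
        return strategySum / total
    return np.full(NUM_ACTIONS, 1.0 / NUM_ACTIONS)  # nothing accumulated yet: fall back to uniform

train(100_000)
print(avgStrategy())  # the averaged mixed strategy, which is what you actually play
```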