Regret Matching

We use regret matching to solve normal-form games such as Rock-Paper-Scissors (RPS).

Regret matching is what you get when Blackwell’s approachability theorem is applied to the problem of minimizing regret.
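
To make “regret” concrete: the regret of an action is how much better we would have done by always playing that action instead of what we actually played. In my own notation (not from the original post), after $T$ rounds the cumulative regret for action $a$ is

$$R^T(a) = \sum_{t=1}^{T} \big( u(a, a'_t) - u(a_t, a'_t) \big),$$

where $a_t$ is our action in round $t$, $a'_t$ is the opponent’s action, and $u$ is our utility function. This is exactly the quantity that regretSum tracks in the code below.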

In regret matching, an agent selects its actions at random, with probabilities proportional to its accumulated positive regrets.
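
This rule lives in getStrategy, which the train function below calls but which isn’t shown in the snippet. Here is a minimal sketch under the same assumptions train makes: module-level NUM_ACTIONS and regretSum, plus a strategySum accumulator (my addition here) that is needed later to recover the average strategy.

import numpy as np

NUM_ACTIONS = 3  # rock, paper, scissors
regretSum = np.zeros(NUM_ACTIONS)
strategySum = np.zeros(NUM_ACTIONS)

def getStrategy():
	# Play each action in proportion to its positive regret.
	strategy = np.maximum(regretSum, 0.0)
	total = strategy.sum()
	if total > 0:
		strategy = strategy / total
	else:
		# No positive regret yet: fall back to the uniform strategy.
		strategy = np.full(NUM_ACTIONS, 1.0 / NUM_ACTIONS)
	# Accumulate so the average strategy can be recovered later.
	for a in range(NUM_ACTIONS):
		strategySum[a] += strategy[a]
	return strategy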

I implemented regret matching in Python. At a high level, you mainly implement train, which breaks down into three steps:

  1. Get regret-matched mixed-strategy actions (i.e. play a round)
    1. The strategy you play is proportional to the positive regrets accumulated in regretSum
  2. Compute action utilities (i.e. how well each action would have done against the opponent's action)
  3. Accumulate action regrets (the difference between each action's utility and the utility of the action actually played) into regretSum

def train(iterations: int):
	actionUtility = np.zeros(NUM_ACTIONS)

	for i in range(iterations):
		# 1. Get regret-matched mixed-strategy actions
		strategy = getStrategy()
		myAction = getAction(strategy)
		otherAction = getAction(oppStrategy)

		# 2. Compute action utilities: in RPS, tying scores 0, the action
		# that beats otherAction scores +1, and the action that loses to it scores -1
		actionUtility[otherAction] = 0
		actionUtility[(otherAction + 1) % NUM_ACTIONS] = 1
		actionUtility[(otherAction - 1) % NUM_ACTIONS] = -1

		# 3. Accumulate action regrets: how much better each action would
		# have done than the action we actually played
		for a in range(NUM_ACTIONS):
			regretSum[a] += actionUtility[a] - actionUtility[myAction]
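
train also relies on getAction, which samples a concrete action from a mixed strategy, and on a fixed opponent strategy oppStrategy. Neither is shown above, so here is a minimal sketch; the oppStrategy values are arbitrary placeholders, not from the original post.

# Fixed opponent mixed strategy (arbitrary example values).
oppStrategy = np.array([0.4, 0.3, 0.3])

def getAction(strategy):
	# Sample one action index according to the mixed strategy's probabilities.
	return np.random.choice(NUM_ACTIONS, p=strategy)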

Regret sums are highly erratic. What converges to a minimal regret strategy is the average strategy across all iterations. This is why we use avgStrategy at the very end when we want the final strategy.
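
A minimal sketch of that final step, assuming the strategySum accumulated in the getStrategy sketch above (the name getAverageStrategy is mine; the result is what the post calls avgStrategy):

def getAverageStrategy():
	# Normalize the accumulated strategy sums into a probability distribution.
	total = strategySum.sum()
	if total > 0:
		return strategySum / total
	return np.full(NUM_ACTIONS, 1.0 / NUM_ACTIONS)

# Example usage: train, then read off the average strategy.
train(10000)
avgStrategy = getAverageStrategy()
# Against the rock-heavy oppStrategy above, this should drift toward mostly paper.
print(avgStrategy)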
