# Regret

You will have regrets no matter what you do. The reason is that you are human.

There is a quote I heard from the School of Life, quoting the philosopher Søren Kierkegaard:

> Hang yourself or do not hang yourself, you will regret both. That is the essence of human philosophy.

I am afraid of regretting not living to my fullest potential.

### In AI

The regret is the opportunity loss for one step: $\text{regret} = \text{optimal reward} - Q_t$ (I wrote this myself). Better notation: $l_t = \mathbb{E}[v_* - q(A_t)]$

The total regret is the total opportunity loss: $L_t = \mathbb{E}\left[\sum_{\tau=1}^{t} v_* - q(A_\tau)\right]$ where $Q_t$ is the average reward up to timestep $t$.
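As a sketch of these definitions, here is a minimal Bernoulli-bandit simulator that accumulates the total regret $L_t$ (the function and variable names are my own, not from any particular library):

```python
import random

def total_regret(policy, true_means, steps=1000, seed=0):
    """Run a policy on a Bernoulli bandit and accumulate sum of v* - q(A_t)."""
    rng = random.Random(seed)
    v_star = max(true_means)             # v*: value of the optimal arm
    counts = [0] * len(true_means)       # pulls per arm
    estimates = [0.0] * len(true_means)  # Q_t: sample-average estimates
    regret = 0.0
    for t in range(steps):
        a = policy(estimates, counts, t, rng)
        reward = 1.0 if rng.random() < true_means[a] else 0.0
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]  # incremental mean
        regret += v_star - true_means[a]  # expected (not realized) per-step regret
    return regret
```

For example, a policy that stubbornly pulls arm 0 against means `[0.2, 0.8]` accrues 0.6 regret per step.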

### Game Theory

In Game Theory, the *regret* of not having chosen an action is the difference between the utility/reward of that action and that of the action we actually chose, holding the other players' choices fixed.

Regret is useful because it helps us understand how well an algorithm does. With Epsilon-Greedy, we have linear regret: since $\epsilon$ stays constant, we keep pulling a uniformly random (often suboptimal) arm at a fixed rate forever, so expected regret grows linearly with $t$. This is why Epsilon-Greedy is a naive algorithm, and we prefer other algorithms.
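A minimal sketch of the Epsilon-Greedy action selection (my own naming; the linear-regret argument is in the comment):

```python
import random

def epsilon_greedy(estimates, counts, t, rng, epsilon=0.1):
    """Explore uniformly with probability epsilon, otherwise exploit.

    The exploration rate never decays, so every step carries at least
    epsilon * (average suboptimality gap) expected regret -- hence the
    total regret grows linearly in the number of steps.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                           # explore
    return max(range(len(estimates)), key=lambda a: estimates[a])      # exploit
```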

Optimistic Greedy uses optimistic initialization: we initialize every action-value estimate to the highest possible reward, so even a purely greedy policy tries each action before the estimates settle toward the truth.
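A sketch of that idea on a Bernoulli bandit (hypothetical helper names; the point is that untried arms look best until real rewards drag their estimates down):

```python
import random

def greedy_with_optimism(true_means, steps, max_reward=1.0, seed=0):
    """Pure greedy on a Bernoulli bandit with optimistic initialization.

    Every estimate starts at max_reward, so each untried arm looks
    optimal and gets sampled; observed rewards then pull the estimates
    down toward the true means.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [max_reward] * k   # optimistic start
    counts = [0] * k
    visited = []                   # order in which arms were pulled
    for _ in range(steps):
        a = max(range(k), key=lambda i: estimates[i])  # greedy, no epsilon
        visited.append(a)
        reward = 1.0 if rng.random() < true_means[a] else 0.0
        counts[a] += 1
        estimates[a] += (reward - estimates[a]) / counts[a]
    return visited
```

With all-zero true means (rewards are always 0), three greedy steps visit all three arms in turn, which pure greedy without optimism would not do.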

From CFR page:
**Regret** is a measure of how much one regrets not having chosen an action. It is the difference between the utility/reward of that action and that of the action we actually chose, holding the other players' choices fixed.

The **overall average regret** of player $i$ at time $T$ is
$R_i^T = \frac{1}{T} \max_{\sigma_i^* \in \Sigma_i} \sum_{t=1}^{T} \left( u_i(\sigma_i^*, \sigma_{-i}^t) - u_i(\sigma^t) \right)$

- where $\sigma_i^t$ is the strategy used by player $i$ on round $t$
- $u_i(\sigma_i^*, \sigma_{-i}^t)$ is the utility of the strategy profile in which player $i$ plays $\sigma_i^*$ and the other players play $\sigma_{-i}^t$
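As a sketch of this definition for a normal-form game (my own function names; maximizing over pure actions suffices here because utility is linear in a player's mixed strategy):

```python
def average_overall_regret(utility, my_plays, opp_plays, n_actions):
    """Average overall regret of one player after T rounds (sketch).

    Compares the best fixed action in hindsight against what was
    actually played, averaged over the T rounds.
    """
    T = len(my_plays)
    # best achievable total utility with a single fixed action
    best_fixed = max(
        sum(utility(a, o) for o in opp_plays) for a in range(n_actions)
    )
    # total utility actually obtained
    actual = sum(utility(m, o) for m, o in zip(my_plays, opp_plays))
    return (best_fixed - actual) / T
```

For example, in rock-paper-scissors against an opponent who played rock twice while we also played rock twice, the best fixed response in hindsight is paper (utility 1 each round), so the average overall regret is 1.0.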

### Next

- I originally learned regret in the context of Reinforcement Learning and Multi-Armed Bandits, but it pops up again in Regret Matching for solving Imperfect Information games