Stochastic/Markov Game

Stochastic Games generalize MDPs to multiple interacting decision-makers.

Formalization

A Markov Game with $N$ players is a tuple $⟨ S, A, P, R, γ ⟩$, where:

$S = S_{1} \times \dots \times S_{N}$ where $S_{i}$ is a finite set of states for player $i$

$A = A_{1} \times \dots \times A_{N}$ where $A_{i}$ is a finite set of actions for player $i$

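$P : S \times A \times S \to [0, 1]$ is the state transition probability function, where $P (s' ∣ s, a)$ is the probability of moving to state $s'$ when the joint action $a$ is taken in state $s$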

$R = (R_{1}, \dots, R_{N})$ is the collection of reward functions, where $R_{i} : S \times A \times S \to \mathbb{R}$ is the reward function of player $i$

$γ$ is a discount factor, $γ \in [0, 1]$
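As a rough illustration of the tuple above, here is a minimal Python sketch of a two-player game. Everything in it is an assumption made for the example: the joint state space is flattened into a plain list `states`, and the names `transition`, `reward`, and `is_valid_kernel`, along with the toy dynamics and rewards, are hypothetical.

```python
from itertools import product

# Hypothetical two-player game: 2 joint states, each player picks action 0 or 1.
states = [0, 1]
player_actions = [[0, 1], [0, 1]]                # A_1 and A_2
joint_actions = list(product(*player_actions))   # A = A_1 x A_2
gamma = 0.9                                      # discount factor

def transition(s, a):
    """P(. | s, a), returned as a dict {next_state: probability}."""
    # Assumed dynamics: matching actions tend to move the game to state 1.
    p_to_1 = 0.8 if a[0] == a[1] else 0.2
    return {0: 1.0 - p_to_1, 1: p_to_1}

def reward(i, s, a, s_next):
    """R_i(s, a, s'): toy zero-sum reward, player 0 gains when actions match."""
    r = 1.0 if a[0] == a[1] else -1.0
    return r if i == 0 else -r

def is_valid_kernel(states, joint_actions, transition, tol=1e-9):
    """Check that every P(. | s, a) is a probability distribution over S."""
    for s in states:
        for a in joint_actions:
            dist = transition(s, a)
            if any(p < 0 or s_next not in states for s_next, p in dist.items()):
                return False
            if abs(sum(dist.values()) - 1.0) > tol:
                return False
    return True

kernel_ok = is_valid_kernel(states, joint_actions, transition)
```

Representing $P$ as a function that returns a distribution, rather than a single next state, mirrors the fact that transitions are stochastic.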

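Value Function: the expected discounted return of player $i$ from state $s$ under the joint policy $(π^{i}, π^{- i})$ is $V_{π^{i}, π^{- i}}^{i} (s) := E [\sum_{t \geq 0} γ^{t} R^{i} (s_{t}, a_{t}, s_{t + 1}) ∣ a_{t}^{i} \sim π^{i} (\cdot ∣ s_{t}), a_{t}^{- i} \sim π^{- i} (\cdot ∣ s_{t}), s_{t + 1} \sim P (\cdot ∣ s_{t}, a_{t}), s_{0} = s]$. Here superscripts index players, so $R^{i}$ is the reward function $R_{i}$ defined above.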
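The value function can be approximated by repeatedly applying the Bellman expectation backup for a fixed joint policy. The sketch below assumes a hypothetical two-player toy game with $γ = 0.9$ and a flattened joint state space; the names `evaluate` and `always_zero` are illustrative, and policies are assumed to be functions mapping a state to a dict of action probabilities.

```python
from itertools import product

states = [0, 1]
joint_actions = list(product([0, 1], [0, 1]))
gamma = 0.9

def transition(s, a):
    # Assumed dynamics: matching actions tend to move the game to state 1.
    p_to_1 = 0.8 if a[0] == a[1] else 0.2
    return {0: 1.0 - p_to_1, 1: p_to_1}

def reward(i, s, a, s_next):
    # Toy zero-sum reward: player 0 gains 1 when actions match, else loses 1.
    r = 1.0 if a[0] == a[1] else -1.0
    return r if i == 0 else -r

def always_zero(s):
    """Deterministic policy that plays action 0 in every state."""
    return {0: 1.0, 1: 0.0}

def evaluate(i, policies, sweeps=500):
    """Approximate V^i under the joint policy by repeated Bellman backups."""
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        new_v = {}
        for s in states:
            total = 0.0
            for a in joint_actions:
                # Players sample their actions independently given the state.
                p_a = 1.0
                for j, pi in enumerate(policies):
                    p_a *= pi(s)[a[j]]
                for s_next, p in transition(s, a).items():
                    total += p_a * p * (reward(i, s, a, s_next) + gamma * v[s_next])
            new_v[s] = total
        v = new_v
    return v

# Both players always match, so player 0 earns 1 per step: 1 / (1 - 0.9) = 10.
v0 = evaluate(0, [always_zero, always_zero])   # approximately 10 in every state
v1 = evaluate(1, [always_zero, always_zero])   # approximately -10 in every state
```

Because the toy game is zero-sum, the two value functions are negatives of each other.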

Nash Equilibrium: A Nash Equilibrium of the Markov game $(N, S, \{A^{i}\}_{i \in N}, P, \{R^{i}\}_{i \in N}, γ)$ is a joint policy $π^{*} = (π^{1, *}, \dots, π^{N, *})$ such that, for any $s \in S$ and $i \in N$, $V_{π^{i, *}, π^{- i, *}}^{i} (s) \geq V_{π^{i}, π^{- i, *}}^{i} (s)$ for any policy $π^{i}$

where $- i$ denotes all agents in $N$ except agent $i$. In other words, no agent can increase its value in any state by unilaterally deviating from $π^{*}$.
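One way to test the Nash condition numerically: with the other player's policy held fixed, player $i$ faces an ordinary MDP, so the best achievable deviation value can be found by value iteration and compared with the value of the candidate policy. The sketch below is restricted to two players and a hypothetical coordination game; the helper names `backup`, `policy_value`, `best_response_value`, `is_nash`, and `always` are assumptions made for the example.

```python
states = [0, 1]
actions = [0, 1]          # same action set for both players (assumed)
gamma = 0.9

def transition(s, a):
    # Assumed dynamics: matching actions tend to move the game to state 1.
    p_to_1 = 0.8 if a[0] == a[1] else 0.2
    return {0: 1.0 - p_to_1, 1: p_to_1}

def reward(i, s, a, s_next):
    # Coordination game: both players earn 1 when their actions match.
    return 1.0 if a[0] == a[1] else 0.0

def backup(i, s, a_i, opp_policy, v):
    """Expected one-step return for player i playing a_i in s against the opponent."""
    total = 0.0
    for a_opp, p_opp in opp_policy(s).items():
        a = (a_i, a_opp) if i == 0 else (a_opp, a_i)
        for s_next, p in transition(s, a).items():
            total += p_opp * p * (reward(i, s, a, s_next) + gamma * v[s_next])
    return total

def policy_value(i, policies, sweeps=500):
    """Value of player i when both players follow the given joint policy."""
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        v = {s: sum(p_i * backup(i, s, a_i, policies[1 - i], v)
                    for a_i, p_i in policies[i](s).items())
             for s in states}
    return v

def best_response_value(i, policies, sweeps=500):
    """Best value player i can reach while the opponent's policy stays fixed."""
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        v = {s: max(backup(i, s, a_i, policies[1 - i], v) for a_i in actions)
             for s in states}
    return v

def is_nash(policies, tol=1e-6):
    """True if no player gains more than tol in any state by deviating alone."""
    for i in (0, 1):
        v_pi = policy_value(i, policies)
        v_br = best_response_value(i, policies)
        if any(v_br[s] > v_pi[s] + tol for s in states):
            return False
    return True

def always(action):
    """Deterministic policy that plays the same action in every state."""
    return lambda s: {action: 1.0}

both_zero = is_nash([always(0), always(0)])   # coordinated play: an equilibrium
mismatch = is_nash([always(0), always(1)])    # either player gains by switching
```

The check relies on the fact that a best response to fixed stationary opponents can be found among deterministic stationary policies, so maximizing over actions in each backup covers every possible deviation $π^{i}$.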

Resources

Turn-based Markov game formalization: https://arxiv.org/pdf/2002.10620.pdf

🛠️ Steven Gong
