Partially Observable Markov Decision Process
A Partially Observable Markov Decision Process (POMDP) is an MDP with hidden states; equivalently, it is a hidden Markov model augmented with actions.
- i.e. the agent can only partially observe the state of the environment, e.g. in a game of poker, where the opponents' cards are hidden.
Compared to the original MDP, the new components are the set of observations and the observation function.
Definition
A POMDP is a tuple ⟨S, A, O, P, R, Z, γ⟩, where:
- S is a finite set of states
- A is a finite set of actions
- O is a finite set of observations
- P is a state transition probability matrix, P(s' | s, a) = Pr[S_{t+1} = s' | S_t = s, A_t = a]
- R is a reward function, R(s, a) = E[R_{t+1} | S_t = s, A_t = a]
- Z is an observation function, Z(o | s', a) = Pr[O_{t+1} = o | S_{t+1} = s', A_t = a]
- γ is a discount factor, γ ∈ [0, 1]
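Because the agent never sees the state directly, it typically maintains a belief — a probability distribution over states — updated from the transition and observation functions by Bayes' rule. The sketch below illustrates this with the classic two-state "tiger" problem; the specific probabilities (an 85%-accurate listening action) are illustrative assumptions, not part of the definition above.

```python
# A minimal POMDP as plain data, using the hypothetical two-state "tiger" problem.
states = ["tiger-left", "tiger-right"]
actions = ["listen"]
observations = ["hear-left", "hear-right"]

# P[a][s][s']: transition probabilities (listening does not move the tiger)
P = {"listen": {"tiger-left":  {"tiger-left": 1.0, "tiger-right": 0.0},
                "tiger-right": {"tiger-left": 0.0, "tiger-right": 1.0}}}

# Z[a][s'][o]: observation probabilities (the growl is heard correctly 85% of the time)
Z = {"listen": {"tiger-left":  {"hear-left": 0.85, "hear-right": 0.15},
                "tiger-right": {"hear-left": 0.15, "hear-right": 0.85}}}

def belief_update(b, a, o):
    """Bayes-filter update: b'(s') ∝ Z(o | s', a) * sum_s P(s' | s, a) * b(s)."""
    unnormalized = {s2: Z[a][s2][o] * sum(P[a][s][s2] * b[s] for s in states)
                    for s2 in states}
    norm = sum(unnormalized.values())
    return {s: p / norm for s, p in unnormalized.items()}

# Starting from a uniform belief, hearing a growl on the left
# shifts the belief toward the tiger being on the left.
b = {"tiger-left": 0.5, "tiger-right": 0.5}
b = belief_update(b, "listen", "hear-left")
print(b)  # belief concentrates on "tiger-left" (≈ 0.85)
```

Repeated applications of `belief_update` turn the POMDP into a fully observable MDP over beliefs, which is the basis of most exact POMDP solution methods.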