# Partially Observable Markov Decision Process

A Partially Observable Markov Decision Process is an MDP with hidden states. It is a hidden Markov model with actions.

- That is, it models settings where the agent can only partially observe the environment, e.g. a game of poker, where opponents' cards are hidden.

Compared with the original MDP tuple, a POMDP adds the observation set $O$ and the observation function $Z$.

**Definition.** A POMDP is a tuple $⟨S,A,O,P,R,Z,γ⟩$ where:

- $S$ is a finite set of states
- $A$ is a finite set of actions
- $O$ is a finite set of observations
- $P$ is a state transition probability matrix
- $P_{ss_{′}}=P[S_{t+1}=s_{′}∣S_{t}=s,A_{t}=a]$
- $R$ is a reward function, $R_{s}^{a}=E[R_{t+1}∣S_{t}=s,A_{t}=a]$
- $Z$ is an observation function
- $Z_{s′o}^{a}=P[O_{t+1}=o∣S_{t+1}=s′,A_{t}=a]$
- $γ$ is a discount factor, $γ∈[0,1]$
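The tuple above can be made concrete with a small numerical sketch. Below is a toy two-state problem (loosely modeled on the classic "tiger" example, which is an illustration of mine, not part of the text): the agent never sees the hidden state directly; instead, each transition emits an observation drawn from $Z$. All names and numbers here are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

S = ["tiger-left", "tiger-right"]   # hidden states (never shown to the agent)
A = ["listen", "open-left"]         # actions
O = ["hear-left", "hear-right"]     # observations

# P[a][s][s'] = P[S_{t+1}=s' | S_t=s, A_t=a]
P = {
    "listen":    np.array([[1.0, 0.0],
                           [0.0, 1.0]]),   # listening leaves the state unchanged
    "open-left": np.array([[0.5, 0.5],
                           [0.5, 0.5]]),   # opening a door resets the problem
}

# R[a][s] = E[R_{t+1} | S_t=s, A_t=a]
R = {
    "listen":    np.array([-1.0, -1.0]),
    "open-left": np.array([-100.0, 10.0]),
}

# Z[a][s'][o] = P[O_{t+1}=o | S_{t+1}=s', A_t=a]
Z = {
    "listen":    np.array([[0.85, 0.15],
                           [0.15, 0.85]]),  # listening is informative but noisy
    "open-left": np.array([[0.5, 0.5],
                           [0.5, 0.5]]),    # opening gives no information
}

def step(s, a):
    """Sample one POMDP transition: next state, observation, expected reward."""
    s_next = rng.choice(len(S), p=P[a][s])
    o = rng.choice(len(O), p=Z[a][s_next])
    return s_next, o, R[a][s]

s = 0                           # true state: tiger behind the left door
s, o, r = step(s, "listen")     # the agent only ever receives o and r, never s
```

Note that the agent's interface is only `(o, r)`: the hidden state `s` is part of the simulator, not of what the agent observes, which is exactly what distinguishes a POMDP from an MDP.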