History and State

Definition

The history is the sequence of observations, actions, and rewards: $H_t = O_1, R_1, A_1, \dots, A_{t-1}, O_t, R_t$

  • i.e. all observable variables up to time $t$
  • i.e. the sensorimotor stream of a robot or embodied agent

State is the information used to determine what happens next. Formally, state is a function of the history: $S_t = f(H_t)$
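As a toy illustration, the history can be held as a literal sequence of (observation, reward, action) records and the state as a plain function over that sequence. The names and types below are illustrative assumptions, not part of any particular library.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    """One entry of the history H_t: what was observed, what reward arrived,
    and which action the agent then took."""
    observation: float
    reward: float
    action: int


History = List[Step]


def latest_observation_state(history: History) -> float:
    """One possible state function S_t = f(H_t): keep only the newest observation."""
    return history[-1].observation


def full_history_state(history: History) -> History:
    """Another valid (if unwieldy) choice: the state is the whole history."""
    return history


history: History = [
    Step(observation=0.0, reward=0.0, action=1),
    Step(observation=0.7, reward=1.0, action=0),
]
print(latest_observation_state(history))  # 0.7
```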

Information State / Markov State

An information state (a.k.a. Markov state) contains all useful information from the history: a state $S_t$ is Markov if and only if $\mathbb{P}[S_{t+1} \mid S_t] = \mathbb{P}[S_{t+1} \mid S_1, \dots, S_t]$.

  • “The future is independent of the past given the present”

See Markov Decision Process
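A minimal sketch of the idea, under assumed toy dynamics: in a simple random walk the next position depends only on the current position, so the position alone is already an information (Markov) state and the rest of the path adds nothing.

```python
import random


def next_position(current: int) -> int:
    """The transition looks only at the current state, never at the path taken
    to reach it, so the position satisfies the Markov property."""
    return current + random.choice([-1, +1])


path = [0]
for _ in range(10):
    path.append(next_position(path[-1]))

# Predicting the future needs only path[-1]; once the present state is known,
# the earlier entries are redundant ("the future is independent of the past
# given the present").
print(path)
```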

Environment State

The environment state $S_t^e$ is the environment's private representation

  • i.e. whatever data the environment uses to pick the next observation/reward

The environment state is not usually visible to the agent. Even if it is visible, it may contain irrelevant information.

Agent State

The agent state $S_t^a$ is the agent's internal representation

  • i.e. whatever information the agent uses to pick the next action
  • i.e. it is the information used by reinforcement learning algorithms

It can be any function of the history: $S_t^a = f(H_t)$

What we think happens next depends on our representation of state. Our job is to build a state representation that is useful.
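A toy sketch (with made-up observation labels) of why the representation matters: under one state function two different histories collapse to the same state, so any prediction made from that state must also be the same, while a richer state function keeps them apart.

```python
from typing import List, Tuple

History = List[str]  # observations only, for brevity


def state_last(history: History) -> str:
    """Agent state = the most recent observation."""
    return history[-1]


def state_last_three(history: History) -> Tuple[str, ...]:
    """Agent state = the last three observations."""
    return tuple(history[-3:])


h1 = ["bell", "light", "lever"]
h2 = ["light", "bell", "lever"]

print(state_last(h1) == state_last(h2))              # True: indistinguishable
print(state_last_three(h1) == state_last_three(h2))  # False: distinguishable
```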

Fully Observable Environments

This is the best case.

Full observability: the agent directly observes the environment state, i.e. $O_t = S_t^a = S_t^e$
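A minimal sketch of full observability in an assumed grid world: the observation handed back by the environment is the environment state itself, so the agent can simply take $O_t = S_t^a = S_t^e$.

```python
from typing import Tuple


class FullyObservableGridWorld:
    """Toy environment whose observation is its entire private state."""

    def __init__(self) -> None:
        self.env_state: Tuple[int, int] = (0, 0)  # the environment's private state

    def step(self, action: str) -> Tuple[Tuple[int, int], float]:
        x, y = self.env_state
        if action == "right":
            x += 1
        elif action == "up":
            y += 1
        self.env_state = (x, y)
        observation = self.env_state  # the agent sees everything
        reward = -1.0                 # constant step cost, purely illustrative
        return observation, reward


env = FullyObservableGridWorld()
observation, reward = env.step("right")
agent_state = observation  # nothing hidden: agent state == environment state
print(agent_state, reward)
```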

Partially Observable Environments

These are more difficult scenarios. Partial observability: the agent indirectly observes the environment:

  • A robot with camera vision isn’t told its absolute location
  • A trading agent only observes current prices
  • A poker playing agent only observes public cards

In this case, agent state ≠ environment state, i.e. $S_t^a \neq S_t^e$
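A minimal sketch, with assumed dynamics, of the partially observable case: the environment's private state is the true position in a corridor, but the agent only observes whether it is standing at an end wall, so its own state has to be built from the history instead.

```python
from typing import List, Tuple


class PartiallyObservableCorridor:
    """Toy environment: the true position is hidden; only 'at a wall?' is observed."""

    def __init__(self, length: int = 10, start: int = 3) -> None:
        self.length = length
        self.position = start  # private environment state, never shown to the agent

    def step(self, action: int) -> Tuple[bool, float]:
        self.position = max(0, min(self.length, self.position + action))
        observation = self.position in (0, self.length)  # partial information only
        reward = 1.0 if self.position == self.length else 0.0
        return observation, reward


env = PartiallyObservableCorridor()
history: List[Tuple[int, bool, float]] = []
for action in (+1, +1, -1):
    observation, reward = env.step(action)
    history.append((action, observation, reward))

# One simple choice of agent state here: the complete history so far.
agent_state = tuple(history)
print(agent_state)
```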