# Agent-Environment Interface

**Agent**: The learner and decision-maker
**Environment**: Everything outside the agent that it interacts with

In a finite MDP, the sets of states, actions, and rewards are all finite, so the dynamics are well-defined discrete probability distributions.

The dynamics function $p(s', r \mid s, a)$ specifies a probability distribution for each choice of $s$ and $a$: it tells us how likely we are to end up in next state $s'$ with reward $r$.

$p$ is determined by the environment; the agent cannot modify it. For every state $s$ and action $a$:

$$\sum_{s', r} p(s', r \mid s, a) = 1$$

Instead, the agent modifies its policy $\pi(a \mid s)$: the probability of selecting action $a$ when in state $s$.