Agent-Environment Interface

Screen Shot 2021-12-11 at 1.40.24 PM.png Agent: The learner and decision-maker Environment: The thing the agent interacts with

Screen Shot 2021-12-08 at 2.14.25 PM.png

In a finite MDP, the sets of states, actions, and rewards are finite. So we have well defined discrete probability distributions.

$p$ specifies a probability distribution for each choice of $s$ and $a$ , tells us how likely we are to end up in a new state $s^{'}$ .

$p$ is decided by the environment, agent cannot modify this. $\sum_{s^{'}, r} p (s^{'}, r ∣ s, a) = 1$

Instead, the agent modifies $π$ , the policy.

🛠️ Steven Gong