Agent-Environment Interface

Screen Shot 2021-12-11 at 1.40.24 PM.png Agent: The learner and decision-maker Environment: The thing the agent interacts with

Screen Shot 2021-12-08 at 2.14.25 PM.png

In a finite MDP, the sets of states, actions, and rewards are finite. So we have well defined discrete probability distributions.

specifies a probability distribution for each choice of and , tells us how likely we are to end up in a new state .

is decided by the environment, agent cannot modify this.

Instead, the agent modifies , the policy.