Discount Factor
Used in Markov decision processes (MDPs) to weight future rewards against immediate ones. Why does it exist?
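- Discounting rewards is mathematically convenient: it avoids infinite returns in processes that never terminate.
- The future is uncertain, so distant rewards are less reliable than immediate ones.
- Animal and human behavior shows a preference for immediate reward.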
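The first point can be checked numerically. The sketch below assumes a made-up process that pays a reward of 1 at every step (not an example from these notes), and the helper name `discounted_return` is hypothetical. With a discount factor below 1 the return levels off near 1 / (1 - gamma); with a discount factor of 1 it keeps growing with the horizon.

```python
def discounted_return(rewards, gamma):
    """Return sum over k of gamma**k * rewards[k] for a finite reward sequence."""
    total = 0.0
    # Work backwards through the sequence: G = r + gamma * G_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical process: reward 1 at every step, cut off at longer and longer horizons.
for horizon in (10, 100, 1000):
    rewards = [1.0] * horizon
    discounted = discounted_return(rewards, 0.9)    # approaches 1 / (1 - 0.9)
    undiscounted = discounted_return(rewards, 1.0)  # equals the horizon
    print(horizon, discounted, undiscounted)
```

The undiscounted column grows with the horizon, which is why an infinite-horizon process with no discounting can have an unbounded return.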
Undiscounted Markov reward processes (discount factor of 1) also exist, for example when every sequence is guaranteed to terminate.
The discount factor punishes you for being slow: a reward that arrives later is worth less than the same reward now. The lower the discount factor, the less future rewards matter relative to present ones.
- vs
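To see the "punished for being slow" effect, here is a minimal sketch that assumes a single reward of 10 arriving after some number of steps, with zero reward before it. The function name `value_of_delayed_reward` and the numbers are illustrative assumptions, not taken from these notes.

```python
def value_of_delayed_reward(reward, delay, gamma):
    """Present value of a single reward received `delay` steps from now."""
    return (gamma ** delay) * reward


# Compare how quickly the same reward loses value under different discount factors.
for gamma in (0.5, 0.9, 0.99):
    values = [value_of_delayed_reward(10.0, delay, gamma) for delay in (0, 5, 20)]
    print(gamma, [round(v, 4) for v in values])
```

With a delay of 0 the reward keeps its full value for every discount factor. As the delay grows, a low discount factor shrinks the value much faster than one close to 1, so an agent with a low discount factor is pushed harder toward reaching rewards quickly.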