Dynamic Programming in Reinforcement Learning
Dynamic programming is used to solve many other problems, e.g.
- Scheduling algorithms
- String algorithms (e.g. sequence alignment)
- Graph algorithms (e.g. shortest path algorithms)
- Graphical models (e.g. Viterbi algorithm)
- Bioinformatics (e.g. lattice models)
Dynamic sequential or temporal component to the problem Programming optimising a “program”, i.e. a policy
- c.f. linear programming
A method for solving complex problems
- By breaking them down into subproblems
- Solve the subproblems
- Combine solutions to subproblems
Policy Iteration Value Iteration
Extensions
- In-place dynamic programming
- Prioritized sweeping
- Use a priority queue
- Real-time dynamic programming
We will consider sample backups.