Guided Policy Search

The main idea behind GPS is to use the expert demonstrations to provide the robot with an initial understanding of how to perform the task and then use reinforcement learning to fine-tune the policy.

GPS is an algorithm that combines supervised learning and reinforcement learning to train policies. It starts by using a supervised learning algorithm to learn an approximate policy from a dataset of expert demonstrations. Then, it uses a guided search algorithm, such as gradient descent, to refine the policy.