CS287: Advanced Robotics

Taught by Pieter Abbeel.

I really liked one of the philosophies Pieter Abbeel talked about for building a foundation:

  • Just like you don’t need to look up a derivative because you can reconstruct it from the definition of a limit, he wants to build that kind of foundation in advanced robotics. So the midterm is just a set of 20 questions given beforehand.
  • The idea is not to memorize all of these long formulas, but rather to logically reason through how a formula, such as the policy gradient, came to be.
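As an example of that kind of reasoning, the policy gradient falls out of the likelihood-ratio trick rather than memorization. Here is my own minimal sketch (not from the course materials): REINFORCE on a toy one-step bandit with a softmax policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

# Toy 3-armed bandit: deterministic reward per action (made-up numbers).
rewards = np.array([1.0, 0.0, 0.5])
theta = np.zeros(3)  # policy parameters = action logits

# REINFORCE: grad J(theta) = E[ R(a) * grad log pi_theta(a) ]
for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(3, p=p)
    grad_log_pi = -p
    grad_log_pi[a] += 1.0          # d/dtheta of log softmax(theta)[a]
    theta += 0.1 * rewards[a] * grad_log_pi

# After training, the policy should concentrate on the best arm (action 0).
print(softmax(theta))
```

The point is that each line follows from the derivation: sample an action, weight its score function by the reward, step uphill.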

I am going to put some of these on hold and come back to the concepts when I am actually going to use them, since I already have a high-level idea of what each of them covers.

  • Mainly focusing on learning for F1TENTH

Notes from CS287

Study these equations by heart:


Lecture 2

Lecture 3

Lecture 4

This seems like a different way to approach function approximation.

Lecture 5

Starting at this part of the course, we are looking more at trying to make sense of our sensor data, because there is a lot of noise in the real world.
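The classic tool for making sense of noisy sensor data is filtering. Here is a toy 1D Kalman filter sketch of my own (the noise variances are made-up, not from the lecture), fusing noisy position readings into a smoothed estimate:

```python
import numpy as np

def kalman_1d(measurements, q=0.01, r=1.0):
    """Fuse noisy scalar measurements of a (nearly) static quantity.

    q: process noise variance, r: measurement noise variance (assumed values).
    """
    x, p = 0.0, 1.0  # state estimate and its variance
    estimates = []
    for z in measurements:
        p += q                # predict: uncertainty grows between readings
        k = p / (p + r)       # Kalman gain: trust measurement vs. prediction
        x += k * (z - x)      # correct toward the measurement
        p *= (1 - k)          # uncertainty shrinks after the update
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(1)
true_pos = 5.0
noisy = true_pos + rng.normal(0.0, 1.0, size=200)
est = kalman_1d(noisy)
# The filtered estimate hugs 5.0 much more tightly than the raw readings do.
```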

Lecture 6

Lecture 7

Lecture 8

Lecture 10

Lecture 11

Be careful: make sure all your sensor readings are actually independent.
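To see why the independence warning matters, here is a toy Bayes update of my own (made-up sensor model): feeding the same "obstacle" reading in several times as if the readings were independent makes the posterior wildly overconfident.

```python
def posterior(prior, likelihood_hit, likelihood_miss, n_readings):
    """Bayes update treating n identical 'obstacle' readings as independent."""
    num = prior * likelihood_hit ** n_readings
    den = num + (1 - prior) * likelihood_miss ** n_readings
    return num / den

# One genuine reading with P(z|obstacle)=0.8, P(z|free)=0.2: posterior is 0.8.
p1 = posterior(0.5, 0.8, 0.2, 1)

# The same reading counted 5 times (e.g. correlated beams hitting one surface):
p5 = posterior(0.5, 0.8, 0.2, 5)

print(p1, p5)  # 0.8 vs ~0.999: the correlated copies add no information,
               # but the naive update treats them as overwhelming evidence.
```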

Lecture 12

Lecture 13

Lecture 14

Lecture 15

Lecture 18 In RL, it is still an MDP, but that is not given to us. ahh I see

Two main branches in RL Landscape:

  1. Dynamic Programming -> exploits the Bellman Optimality Backup
  2. Policy Optimization
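The first branch's Bellman optimality backup can be sketched in a few lines. This is my own toy 2-state MDP (made-up transitions and rewards), iterating the backup to a fixed point:

```python
import numpy as np

# Toy MDP: 2 states, 2 actions. T[s, a, s'] are transition probabilities,
# R[s, a] are rewards (hypothetical numbers for illustration).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(500):
    # Bellman optimality backup:
    # V(s) <- max_a [ R(s,a) + gamma * sum_s' T(s,a,s') V(s') ]
    V = (R + gamma * (T @ V)).max(axis=1)

print(V)  # converged optimal values: applying the backup again changes nothing
```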

In the middle of the two sit Actor-Critic methods.

Model-based RL learns the dynamics model.

Policy Optimization has the following objective:
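(The slide's equation didn't survive into these notes; the standard form of the objective, which may differ in notation from the lecture's, is:)

```latex
\max_\theta \; J(\theta) \;=\; \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\, \sum_{t=0}^{H} \gamma^{t}\, R(s_t, a_t) \right]
```

i.e., search directly over policy parameters θ to maximize expected discounted return under trajectories sampled from π_θ.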

Learnings from doing homework

Homework 1:

  • You realize that in the real world, you run into cases where you have inf and nan values. You need to know how to debug those.
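A pattern I find useful for that kind of debugging (my own helper, not from the homework starter code): fail fast at the first non-finite value instead of letting it propagate.

```python
import numpy as np

def check_finite(name, arr):
    """Raise immediately with a useful message instead of silently propagating nan/inf."""
    arr = np.asarray(arr)
    bad = ~np.isfinite(arr)
    if bad.any():
        raise ValueError(
            f"{name}: {bad.sum()} non-finite values, first at index {np.argwhere(bad)[0]}"
        )

# Example of how nan sneaks in: an inf logit turns into inf - inf = nan.
logits = np.array([1.0, np.inf, 2.0])
probs = np.exp(logits - logits.max())
# check_finite("probs", probs)  # would raise here and point at the culprit
```

Sprinkling these checks after each suspicious computation bisects the pipeline quickly.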

I’m really struggling with how this entropy is implemented.
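For reference, one numerically stable way to compute the entropy of a categorical policy from logits (a generic sketch of my own, not the homework's exact implementation):

```python
import numpy as np

def categorical_entropy(logits):
    """H(pi) = -sum_a pi(a) log pi(a), computed from logits without forming log(0)."""
    z = logits - logits.max(axis=-1, keepdims=True)                    # shift for stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))      # log softmax
    probs = np.exp(log_probs)
    return -(probs * log_probs).sum(axis=-1)

print(categorical_entropy(np.zeros(4)))                      # uniform over 4 -> log 4
print(categorical_entropy(np.array([100.0, 0.0, 0.0, 0.0]))) # near-deterministic -> ~0
```

The key trick is working in log-probabilities throughout, so near-zero probabilities never hit `log(0)`.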


Lecture 1:

  • The main bottleneck in robotics now is no longer hardware, it is software!!

You only really need to break robotics down into 3 core techniques, and then you can pretty much solve any problem (YouTube this).

It’s really exciting to take this course because it is going to unlock so much potential in me.

And self-driving cars are, at their core, robots.

Why not have more redundancy?

Ohh, they talk about robustness because the generated optimal policy might not work: the MDP is not a perfect model of the world. So you optimize over a distribution of MDPs instead.
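One way to read "over a distribution": score a policy across sampled model parameters rather than a single nominal model. A toy sketch of my own (made-up point-mass dynamics with uncertain mass):

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(gain, mass):
    """Toy 1D point mass: policy u = -gain * x; return = -sum of squared error."""
    x, total = 1.0, 0.0
    for _ in range(50):
        u = -gain * x
        x = x + 0.1 * (u / mass)   # crude Euler step of x_dot = u / mass
        total -= x ** 2
    return total

# The nominal model says mass = 1.0, but the real robot's mass is uncertain.
masses = rng.uniform(0.5, 2.0, size=100)   # distribution over MDPs

for gain in [1.0, 5.0, 15.0]:
    robust_score = np.mean([rollout_return(gain, m) for m in masses])
    print(gain, robust_score)
```

The aggressive gain looks fine on the nominal mass but destabilizes the light-mass models, so averaging over the distribution picks a more conservative, robust policy.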