# CS287: Advanced Robotics

Taught by Pieter Abbeel.

I really liked one of the philosophies Pieter Abbeel talked about for building a foundation:

- Just as you don’t need to look up a derivative because you can work it out from the definition of a limit, he wants us to build that kind of foundation in advanced robotics. So the midterm is just a set of 20 questions that are given out beforehand.
- The idea is not to memorize long formulas, but to reason logically through how each formula (for example, the one behind Policy Gradient Methods) came to be.

I am going to put some of these on hold and come back to the concepts when I actually need them, since I already have a high-level idea of what each one is about.

- Mainly focusing on learning for F1TENTH

### Notes from CS287

Study these equations by heart:

### Concepts

Lecture 2

- MDP
- Value Iteration
- Contraction → still confused about this
- Policy Iteration
- Linear Programming
- Maximum Entropy
- Constrained Optimization
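Since contraction is the part I'm still confused about, here is a minimal sketch of value iteration on a made-up 3-state, 2-action MDP (the numbers in `P` and `R` are hypothetical). The Bellman backup $(BV)(s) = \max_a \big[R(s,a) + \gamma \sum_{s'} P(s' \mid s,a) V(s')\big]$ is a $\gamma$-contraction in the max norm, $\|BV - BU\|_\infty \le \gamma \|V - U\|_\infty$, which is why repeated backups converge to a unique fixed point $V^*$ from any starting guess. The last loop just checks that inequality numerically on random value functions.

```python
import numpy as np

# Hypothetical toy MDP: P[a, s, s'] = transition probability, R[s, a] = reward.
P = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],  # action 0
    [[0.1, 0.9, 0.0], [0.1, 0.0, 0.9], [0.0, 0.0, 1.0]],  # action 1
])
R = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
gamma = 0.9


def bellman_backup(V):
    """(BV)(s) = max_a [R(s, a) + gamma * sum_s' P(s'|s, a) V(s')]."""
    Q = R + gamma * np.einsum("ast,t->sa", P, V)  # Q[s, a]
    return Q.max(axis=1)


def value_iteration(tol=1e-8, max_iters=10_000):
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        V_new = bellman_backup(V)
        if np.max(np.abs(V_new - V)) < tol:  # stop once a backup barely changes V
            return V_new
        V = V_new
    return V


V_star = value_iteration()

# Contraction check: ||BV - BU||_inf <= gamma * ||V - U||_inf for random V, U.
rng = np.random.default_rng(0)
for _ in range(1000):
    V, U = rng.normal(size=3) * 10, rng.normal(size=3) * 10
    lhs = np.max(np.abs(bellman_backup(V) - bellman_backup(U)))
    rhs = gamma * np.max(np.abs(V - U))
    assert lhs <= rhs + 1e-12
```

State 2 is absorbing with reward 2, so its value should come out to $2 / (1 - 0.9) = 20$.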

Lecture 3

- Exact solution methods
- Discretization
- Kuhn Triangulation
- Cross-Entropy Method
- Courant–Friedrichs–Lewy Condition
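A minimal sketch of the Cross-Entropy Method as a black-box optimizer, assuming a diagonal Gaussian sampling distribution: sample candidates, keep the top `n_elite`, refit the mean and standard deviation to those elites, and repeat. The quadratic objective and all parameter values are hypothetical, picked only so the answer is easy to check.

```python
import numpy as np


def cross_entropy_method(f, mu, sigma, n_samples=100, n_elite=10, n_iters=50, seed=0):
    """Maximize f by iteratively refitting a diagonal Gaussian to the elite samples."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.asarray(mu, dtype=float), np.asarray(sigma, dtype=float)
    for _ in range(n_iters):
        samples = mu + sigma * rng.standard_normal((n_samples, mu.size))
        scores = np.array([f(x) for x in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # highest-scoring samples
        mu = elite.mean(axis=0)
        sigma = elite.std(axis=0) + 1e-6  # small floor so the std never becomes exactly zero
    return mu


# Hypothetical objective: a quadratic with its maximum at (3, -2).
target = np.array([3.0, -2.0])
best = cross_entropy_method(lambda x: -np.sum((x - target) ** 2), mu=[0.0, 0.0], sigma=[5.0, 5.0])
```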

Lecture 4

- Value Function Approximation
- Value Iteration with Value Function Approximation
- Policy Iteration with Value Function Approximation

This seems like a different way to handle large or continuous state spaces than the discretization from Lecture 3: approximate the value function directly instead.
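A sketch of value iteration with function approximation (fitted value iteration), under made-up assumptions: a 1-D state $s \in [0, 1]$, actions that shift $s$ by $-0.1$, $0$, or $+0.1$, reward $-(s - 0.7)^2$, and polynomial features. Each iteration does a Bellman backup at sampled states and then a least-squares fit of $V_\theta(s) = \phi(s)^\top \theta$ to those targets. Unlike exact value iteration, this projected update is not guaranteed to be a contraction, so convergence is not guaranteed in general.

```python
import numpy as np

# Hypothetical 1-D problem: state s in [0, 1], actions shift s by -0.1, 0, or +0.1,
# reward -(s - 0.7)^2 depends only on the current state, discount gamma = 0.9.
actions = np.array([-0.1, 0.0, 0.1])
gamma = 0.9


def step(s, a):
    return np.clip(s + a, 0.0, 1.0)


def reward(s):
    return -(s - 0.7) ** 2


def features(s):
    """Polynomial features phi(s) = [1, s, s^2, s^3, s^4]; V_theta(s) = phi(s) @ theta."""
    s = np.atleast_1d(s)
    return np.stack([s ** k for k in range(5)], axis=1)


def fitted_value_iteration(n_samples=200, n_iters=200, seed=0):
    rng = np.random.default_rng(seed)
    S = rng.uniform(0.0, 1.0, size=n_samples)  # sample states once
    Phi = features(S)
    theta = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        # Bellman backup at each sampled state, using the current approximation for V(s').
        next_values = np.stack([features(step(S, a)) @ theta for a in actions], axis=1)
        targets = reward(S) + gamma * next_values.max(axis=1)
        # Regression step: least-squares fit of the features to the backed-up values.
        theta, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
    return theta


theta = fitted_value_iteration()


def V(s):
    return features(s) @ theta


def greedy_action(s):
    # Reward depends only on the current state, so the greedy action just maximizes V(s').
    return actions[np.argmax([V(step(s, a))[0] for a in actions])]
```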

Lecture 5

From this part of the course on, we focus more on making sense of our sensor data, because there is a lot of noise in the real world.

Lecture 6

Lecture 7

Lecture 8

Lecture 10

Lecture 11

Be careful: make sure your sensor readings are actually independent.
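Why independence matters, as a small sketch: a Bayes update multiplies likelihoods, which is only valid if readings are conditionally independent given the state. For 1-D Gaussians, fusing multiplies densities, so precisions ($1/\sigma^2$) add. Feeding the same reading in several times (or highly correlated readings) shrinks the variance as if new information had arrived, which makes the estimate overconfident. The prior and noise values below are hypothetical.

```python
def fuse(mean_a, var_a, mean_b, var_b):
    """Product of two 1-D Gaussian densities (a Bayes update assuming independent noise)."""
    var = 1.0 / (1.0 / var_a + 1.0 / var_b)  # precisions add
    mean = var * (mean_a / var_a + mean_b / var_b)
    return mean, var


prior = (0.0, 100.0)  # hypothetical broad prior over a 1-D position: (mean, variance)
z, r = 2.0, 1.0       # one sensor reading and its noise variance

once = fuse(*prior, z, r)

# Wrong: feeding the SAME reading in 5 times as if it came from 5 independent sensors.
overconfident = prior
for _ in range(5):
    overconfident = fuse(*overconfident, z, r)
```

Both versions end up with essentially the same mean, but the second claims about five times less variance without any new information.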

Lecture 12

- Multivariate Gaussian
- Kalman Filter
- Extended Kalman Filter
- Unscented Kalman Filter
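A minimal sketch of the (linear) Kalman filter predict and update steps with NumPy, assuming a hypothetical constant-velocity model where only position is measured. With dynamics $x_{t+1} = A x_t + w_t$, $w_t \sim \mathcal{N}(0, Q)$ and measurements $z_t = C x_t + v_t$, $v_t \sim \mathcal{N}(0, R)$, the belief stays Gaussian, so the filter only has to track a mean and a covariance. All matrices and noise levels are illustrative.

```python
import numpy as np


def kf_predict(mu, Sigma, A, Q):
    """Prediction step for x_{t+1} = A x_t + w, w ~ N(0, Q)."""
    return A @ mu, A @ Sigma @ A.T + Q


def kf_update(mu, Sigma, z, C, R):
    """Measurement update for z = C x + v, v ~ N(0, R)."""
    S = C @ Sigma @ C.T + R                  # innovation covariance
    K = Sigma @ C.T @ np.linalg.inv(S)       # Kalman gain
    mu = mu + K @ (z - C @ mu)
    Sigma = (np.eye(len(mu)) - K @ C) @ Sigma
    return mu, Sigma


# Hypothetical constant-velocity model: state x = [position, velocity], dt = 1.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])   # only position is measured
Q = 1e-4 * np.eye(2)         # small process noise
R = np.array([[0.25]])       # measurement noise variance (std 0.5)

rng = np.random.default_rng(0)
true_x = np.array([0.0, 1.0])               # simulated target moving at velocity 1
mu, Sigma = np.zeros(2), 10.0 * np.eye(2)   # broad initial belief
for _ in range(50):
    true_x = A @ true_x
    z = C @ true_x + rng.normal(0.0, 0.5, size=1)
    mu, Sigma = kf_predict(mu, Sigma, A, Q)
    mu, Sigma = kf_update(mu, Sigma, z, C, R)
```

Even though velocity is never measured directly, the filter recovers it from the sequence of position readings.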

Lecture 13

- Kalman Smoother (Smoothing)
- MAP Estimation
- Likelihood
- Maximum Likelihood
- Beta Distribution
- Dirichlet Distribution
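To keep ML vs MAP straight, a small sketch for coin flips, assuming a Beta prior: with $h$ heads in $n$ flips the ML estimate is $h/n$, while with a $\mathrm{Beta}(\alpha, \beta)$ prior the posterior is $\mathrm{Beta}(\alpha + h, \beta + n - h)$ and the MAP estimate is its mode, $(\alpha + h - 1)/(\alpha + \beta + n - 2)$. The Dirichlet is the same idea over $K$ categories. The function names and example counts are my own.

```python
from fractions import Fraction


def bernoulli_ml(heads, n):
    """Maximum-likelihood estimate of P(heads)."""
    return Fraction(heads, n)


def bernoulli_map(heads, n, alpha, beta):
    """Mode of the Beta(alpha + heads, beta + n - heads) posterior (valid when both params > 1)."""
    return Fraction(alpha + heads - 1, alpha + beta + n - 2)


def dirichlet_map(counts, alphas):
    """Mode of the Dirichlet(alphas + counts) posterior over K categories (all params > 1)."""
    K, N = len(counts), sum(counts)
    denom = sum(alphas) + N - K
    return [Fraction(a + c - 1, denom) for a, c in zip(alphas, counts)]
```

With 3 heads in 3 flips, ML says the coin always lands heads, while a $\mathrm{Beta}(2, 2)$ prior pulls the MAP estimate back to $4/5$; the two-category Dirichlet gives the same answer.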

Lecture 14

Lecture 15

- Expectation Maximization
- POMDP (starts 37:35)
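A minimal sketch of Expectation Maximization for a 1-D mixture of Gaussians (my choice of example; the data are synthetic). The E-step computes each point's responsibility $p(z_i = k \mid x_i)$ under the current parameters, and the M-step re-fits the weights, means, and variances using those soft assignments as weights.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def em_gmm_1d(x, means, variances, weights, n_iters=100):
    """EM for a 1-D Gaussian mixture with K components."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    for _ in range(n_iters):
        # E-step: responsibilities r[i, k] = p(z_i = k | x_i, current parameters).
        r = weights * gaussian_pdf(x[:, None], means, variances)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the soft assignments.
        Nk = r.sum(axis=0)
        weights = Nk / len(x)
        means = (r * x[:, None]).sum(axis=0) / Nk
        variances = (r * (x[:, None] - means) ** 2).sum(axis=0) / Nk
    return means, variances, weights


# Hypothetical data: two clusters centered at -2 and 3.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
means, variances, weights = em_gmm_1d(x, means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
```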

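Lecture 18

In RL, the problem is still an MDP, but the MDP itself (the dynamics and reward model) is not given to us; we only learn about it by interacting with it. Ahh, I see.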

Two main branches in RL Landscape:

- Dynamic Programming → exploits the Bellman Optimality Backup
- Policy Optimization
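Since the MDP is not given, the Bellman optimality backup has to be estimated from samples. A minimal sketch of tabular Q-learning (a sample-based method that exploits that backup) on a hypothetical 5-state chain, where the agent only observes transitions $(s, a, r, s')$ and updates $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s', a') - Q(s,a)]$. The environment and hyperparameters are made up.

```python
import numpy as np

# Hypothetical 5-state chain: start in state 0, action 0 = left, 1 = right.
# Entering state 4 gives reward 1 and ends the episode. The agent never sees P or R directly.
N_STATES, GOAL = 5, 4


def env_step(s, a):
    s_next = max(s - 1, 0) if a == 0 else s + 1
    return s_next, float(s_next == GOAL), s_next == GOAL


def q_learning(n_episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, 2))
    for _ in range(n_episodes):
        s = 0
        for _ in range(100):  # cap episode length
            # epsilon-greedy action, breaking ties at random
            if rng.random() < eps or Q[s, 0] == Q[s, 1]:
                a = int(rng.integers(2))
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env_step(s, a)
            # sample-based Bellman optimality backup
            target = r if done else r + gamma * np.max(Q[s_next])
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
            if done:
                break
    return Q


Q = q_learning()
```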

Actor-Critic methods sit in the middle of both.

Model-based RL learns the dynamics model.

Policy Optimization has the following objective, where $H$ is the horizon: $\max_{\theta} \mathbb{E}\left[\sum_{t=0}^{H} R(s_t) \mid \pi_{\theta}\right]$
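The gradient of this objective can be estimated with the likelihood-ratio trick, $\nabla_\theta\, \mathbb{E}[R(\tau)] = \mathbb{E}[\nabla_\theta \log P(\tau; \theta)\, R(\tau)]$, which only needs $\nabla_\theta \log \pi_\theta$ because the dynamics do not depend on $\theta$. A minimal sketch in the simplest one-step case (a hypothetical 3-armed bandit) with a softmax policy, where $\nabla_\theta \log \pi_\theta(a) = \mathrm{onehot}(a) - \pi_\theta$; subtracting the batch-mean reward as a baseline is just a variance-reduction choice on my part.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def reinforce_bandit(true_means, n_iters=300, batch=32, lr=0.1, seed=0):
    """Likelihood-ratio policy gradient for a softmax policy over K arms (one-step episodes)."""
    rng = np.random.default_rng(seed)
    K = len(true_means)
    theta = np.zeros(K)
    for _ in range(n_iters):
        probs = softmax(theta)
        actions = rng.choice(K, size=batch, p=probs)
        rewards = rng.normal(np.asarray(true_means)[actions], 1.0)  # noisy rewards
        baseline = rewards.mean()                   # batch-mean baseline to reduce variance
        grad_log_pi = np.eye(K)[actions] - probs    # grad of log softmax = onehot(a) - pi
        grad = ((rewards - baseline)[:, None] * grad_log_pi).mean(axis=0)
        theta += lr * grad                          # gradient ascent on expected reward
    return softmax(theta)


probs = reinforce_bandit([1.0, 2.0, 5.0])
```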

### Learnings from doing homework

Homework 1:

- You realize that in the real world you run into `inf` and `nan` values. You need to know how to debug those.
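One way to debug these, as a sketch with plain NumPy: by default NumPy only warns on overflow and keeps going, so an `inf` can quietly turn into a `nan` a few operations later. `np.errstate(all="raise")` turns those warnings into a `FloatingPointError` at the first bad operation, and `np.isfinite` shows which entries are already broken. The input array is a made-up example.

```python
import numpy as np

x = np.array([1000.0, 1.0])

# Silenced here on purpose: exp(1000) overflows to inf, and inf / inf gives nan.
with np.errstate(over="ignore", invalid="ignore"):
    y = np.exp(x) / np.exp(x).sum()

bad = ~np.isfinite(y)  # True wherever y holds inf or nan

# Turning warnings into exceptions makes the first bad operation raise immediately.
caught = None
try:
    with np.errstate(all="raise"):
        np.exp(x)
except FloatingPointError as err:
    caught = str(err)  # message names the problem (an overflow in exp)
```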

I’m really struggling with how this entropy is implemented.
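Assuming the entropy here refers to the maximum-entropy (entropy-regularized) value iteration from Lecture 2, a minimal sketch of the soft backup: with temperature $T$, the max over actions becomes $V(s) = T \log \sum_a \exp(Q(s,a)/T)$ and the policy becomes $\pi(a \mid s) = \exp((Q(s,a) - V(s))/T)$. Computing that directly overflows for large $Q/T$, a common source of `inf`/`nan` values, and subtracting the max first (the log-sum-exp trick) avoids it. The Q-values and temperature are hypothetical.

```python
import numpy as np


def soft_value_naive(Q, temperature):
    """V(s) = T * log sum_a exp(Q(s, a) / T), computed directly (can overflow)."""
    return temperature * np.log(np.sum(np.exp(Q / temperature), axis=-1))


def soft_value_stable(Q, temperature):
    """Same quantity via the log-sum-exp trick: subtract the max before exponentiating."""
    z = Q / temperature
    z_max = z.max(axis=-1, keepdims=True)
    return temperature * (z_max.squeeze(-1) + np.log(np.sum(np.exp(z - z_max), axis=-1)))


def soft_policy(Q, temperature):
    """Max-ent policy pi(a|s) = exp((Q(s, a) - V(s)) / T); each row sums to 1."""
    V = soft_value_stable(Q, temperature)
    return np.exp((Q - V[..., None]) / temperature)


Q = np.array([[10.0, 9.0], [500.0, 498.0]])  # hypothetical Q-values for two states
with np.errstate(over="ignore"):
    naive = soft_value_naive(Q, temperature=0.5)  # exp(1000) overflows in the second row
stable = soft_value_stable(Q, temperature=0.5)
pi = soft_policy(Q, temperature=0.5)
```

As $T \to 0$ the soft value approaches the ordinary $\max_a Q(s,a)$, which is a handy sanity check.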

### Thoughts

Lecture 1:

- The main bottleneck in robotics now is no longer hardware; it is software!!

You really only need to break robotics down into 3 core techniques, and then you can tackle pretty much any problem (YouTube this):

- Optimization
- Probabilistic Reasoning
- Learning

It’s really exciting to take this course because it is going to unlock so much potential in me.

And self-driving cars are the main robots right now.

Why not add more redundancy?

Ohh, they talk about robustness: the optimal policy found for an MDP might not work in practice, because the MDP is not a perfect model of the world. So you optimize over a distribution of MDPs instead.
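A toy sketch of the difference, with every number made up: a one-step choice where "safe" always returns 1 and "risky" returns +3 on success and -5 on failure. Planning against a single nominal model (success probability 0.95) picks "risky", while maximizing the average return over a distribution of models (success probability uniform on [0.4, 1.0]) picks "safe". A worst-case criterion over the same samples is included as another common notion of robustness.

```python
import numpy as np


def expected_return(action, p_success):
    # Hypothetical payoffs: "safe" always gives 1; "risky" gives +3 on success, -5 on failure.
    return 1.0 if action == "safe" else 3.0 * p_success - 5.0 * (1.0 - p_success)


actions = ["safe", "risky"]

# Optimizing for a single nominal model (p = 0.95) picks "risky".
nominal_choice = max(actions, key=lambda a: expected_return(a, 0.95))

# Optimizing the average over a distribution of models (p ~ Uniform[0.4, 1.0]).
rng = np.random.default_rng(0)
p_samples = rng.uniform(0.4, 1.0, size=1000)
robust_choice = max(actions, key=lambda a: np.mean([expected_return(a, p) for p in p_samples]))

# A worst-case (minimax) criterion over the same sampled models.
worst_case_choice = max(actions, key=lambda a: min(expected_return(a, p) for p in p_samples))
```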