CS287: Advanced Robotics
Taught by Pieter Abbeel.
I really liked one of the philosophies Pieter Abbeel talked about for building a foundation:
- Just like you don't need to look up a derivative because you can always recover it from the definition of a limit, he wants to build that kind of foundation in advanced robotics. So the midterm is just a set of 20 questions given beforehand.
- The idea is not to memorize all of these long formulas, but to logically reason through how a formula, such as the policy gradient, came to be.
I am going to put some of these on hold and come back to the concepts when I am actually going to use them, since I already have a high-level idea of what each of these topics covers.
- Mainly focusing on learning for F1TENTH
Notes from CS287
Know these equations by heart:
Concepts
Lecture 2
- MDP
- Value Iteration (see the sketch after this list)
- Contraction → still confused about this
- Policy Iteration
- Linear Programming
- Maximum Entropy
- Constrained Optimization
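To make the value iteration bullet concrete, here is a minimal tabular sketch in NumPy (the array shapes and the function name are my own placeholders, not course code). The backup inside the loop is the Bellman optimality backup, and its being a γ-contraction is what makes the loop converge:

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Tabular value iteration.

    P: (S, A, S) array of transition probabilities P(s' | s, a).
    R: (S, A) array of rewards R(s, a).
    """
    V = np.zeros(P.shape[0])
    while True:
        # Bellman optimality backup: Q(s, a) = R(s, a) + gamma * E[V(s')]
        Q = R + gamma * (P @ V)             # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # value function, greedy policy
        V = V_new
```

Policy iteration, by contrast, alternates between evaluating a fixed policy and making it greedy, instead of taking the max in every sweep.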
Lecture 3
- Exact solution methods
- Discretization
- Kuhn Triangulation
- Cross-Entropy Method (see the code after this list)
- Courant–Friedrichs–Lewy Condition
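The cross-entropy method is simpler than its name suggests; here is a rough sketch for maximizing a black-box function (all names and hyperparameters below are made up for illustration):

```python
import numpy as np

def cross_entropy_method(f, dim, n_samples=100, n_elite=10, n_iters=50):
    """Maximize a black-box function f with the cross-entropy method."""
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(n_iters):
        # Sample candidates from the current Gaussian.
        samples = mu + sigma * np.random.randn(n_samples, dim)
        scores = np.array([f(x) for x in samples])
        # Keep the top-scoring "elite" samples and refit the Gaussian to them.
        elites = samples[np.argsort(scores)[-n_elite:]]
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu

# Toy example: the optimum of -||x - 3||^2 is x = [3, 3].
print(cross_entropy_method(lambda x: -np.sum((x - 3.0) ** 2), dim=2))
```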
Lecture 4
- Value Function Approximation
- Value Iteration with Value Function Approximation (see the update after this list)
- Policy Iteration with Value Function Approximation
Policy iteration with a value function approximator seems like a different way to fold function approximation in (the fitted value iteration update is written out below).
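For my own reference, the fitted value iteration update as I understand it (my notation, not necessarily the slides'): sample a set of states $S'$, back them up, and regress the approximator onto the backed-up targets:

$$
\theta^{(i+1)} \leftarrow \arg\min_{\theta} \sum_{s \in S'} \Big( V_\theta(s) - \max_{a} \big[ R(s,a) + \gamma \textstyle\sum_{s'} P(s' \mid s, a)\, V_{\theta^{(i)}}(s') \big] \Big)^2
$$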
Lecture 5
Starting at this part of the course, we look more at trying to make sense of our sensor data, because there is a lot of noise in the real world.
Lecture 6
Lecture 7
Lecture 8
Lecture 10
Lecture 11
- Be careful: make sure your sensor readings are actually independent.
Lecture 12
- Multivariate Gaussian
- Kalman Filter (see the sketch after this list)
- Extended Kalman Filter
- Unscented Kalman Filter
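To remember the Kalman filter, a minimal predict + update step in NumPy (standard linear-Gaussian notation; the function itself is just my sketch, not from the homework):

```python
import numpy as np

def kalman_step(mu, Sigma, u, z, A, B, C, Q, R):
    """One predict + update step of the linear Kalman filter.

    Model: x_{t+1} = A x_t + B u_t + w,  w ~ N(0, Q)
           z_t     = C x_t + v,          v ~ N(0, R)
    """
    # Predict: push the Gaussian belief through the dynamics.
    mu_bar = A @ mu + B @ u
    Sigma_bar = A @ Sigma @ A.T + Q
    # Update: correct the prediction with the measurement z.
    K = Sigma_bar @ C.T @ np.linalg.inv(C @ Sigma_bar @ C.T + R)
    mu_new = mu_bar + K @ (z - C @ mu_bar)
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_bar
    return mu_new, Sigma_new
```

The EKF linearizes nonlinear dynamics/measurement models with Jacobians at the current mean; the UKF propagates sigma points through them instead.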
Lecture 13
- Kalman Smoother (Smoothing)
- MAP Estimation (see the equations after this list)
- Likelihood
- Maximum Likelihood
- Beta Distribution
- Dirichlet Distribution
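The way I keep the estimation bullets straight (standard definitions, my notation):

$$
\theta_{\text{MLE}} = \arg\max_{\theta}\, p(\mathcal{D} \mid \theta), \qquad
\theta_{\text{MAP}} = \arg\max_{\theta}\, p(\mathcal{D} \mid \theta)\, p(\theta)
$$

And the reason the Beta distribution shows up: it is the conjugate prior for a Bernoulli/binomial likelihood, so a $\mathrm{Beta}(\alpha, \beta)$ prior with $h$ heads and $t$ tails observed gives a $\mathrm{Beta}(\alpha + h, \beta + t)$ posterior; the Dirichlet plays the same role for the categorical/multinomial.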
Lecture 14
Lecture 15
- Expectation Maximization (see the sketch after this list)
- POMDP (starts 37:35)
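Expectation maximization made more sense after writing a toy version for a two-component 1-D Gaussian mixture (everything below is a made-up toy, not the course derivation):

```python
import numpy as np

def em_gmm_1d(x, n_iters=100):
    """EM for a two-component 1-D Gaussian mixture (toy sketch)."""
    def gauss(x, m, v):
        return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

    w, mu, var = 0.5, np.array([x.min(), x.max()]), np.array([x.var(), x.var()])
    for _ in range(n_iters):
        # E-step: responsibility of component 1 for each data point.
        p0, p1 = (1 - w) * gauss(x, mu[0], var[0]), w * gauss(x, mu[1], var[1])
        r = p1 / (p0 + p1 + 1e-300)
        # M-step: refit mixing weight, means, and variances using the responsibilities.
        w = r.mean()
        mu = np.array([np.average(x, weights=1 - r), np.average(x, weights=r)])
        var = np.array([np.average((x - mu[0]) ** 2, weights=1 - r),
                        np.average((x - mu[1]) ** 2, weights=r)])
    return w, mu, var
```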
Lecture 18
In RL, the problem is still an MDP, but the MDP itself is not given to us. Ah, I see.
Two main branches in RL Landscape:
- Dynamic Programming → exploits the Bellman Optimality Backup
- Policy Optimization
In the middle of the two sit Actor-Critic methods.
Model-based RL learns the dynamics model.
Policy Optimization has the following objective:
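Roughly, with $u_t$ for the control (the convention I remember from lecture):

$$
\max_{\theta}\; U(\theta) = \mathbb{E}\!\left[\, \sum_{t=0}^{H} R(s_t, u_t) \;\Big|\; \pi_\theta \right]
$$

That is, directly search over policy parameters θ for high expected reward instead of going through a value function first.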
Learnings from doing homework
Homework 1:
- You realize that in the real world, you run into cases where you have `inf` and `nan` values. You need to know how to debug those.
I’m really struggling with how this entropy is implemented.
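What helped me was writing a tiny numerically safe version for a discrete distribution (`safe_entropy` and `eps` are my own names, not from the starter code):

```python
import numpy as np

def safe_entropy(p, eps=1e-12):
    """Entropy of a discrete distribution without nan/inf surprises.

    np.log(0) is -inf, and 0 * -inf is nan, so clip the probabilities
    away from zero before taking the log.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return -np.sum(p * np.log(np.clip(p, eps, 1.0)))

print(safe_entropy([0.5, 0.5, 0.0]))   # ~0.693 (ln 2), no warnings
```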
Thoughts
Lecture 1:
- The main bottleneck in robotics now is no longer hardware, it is software!!
You only really need to break robotics down into 3 core techniques, and then you can pretty much solve any problem (YouTube this):
- Optimization
- Probabilistic Reasoning
- Learning
It’s really exciting to take this course because it is going to unlock so much potential in me.
And the main example of robots today is self-driving cars.
Why not have more redundancy?
Oh, they talk about robustness: the optimal policy you generate might not work because the MDP is not a good model of the world. So you optimize over a distribution of MDPs instead.