CS287: Advanced Robotics
Taught by Pieter Abbeel.
- Just as you don't need to look up a derivative because you know it from the definition of a limit, he wants to build that kind of foundation in advanced robotics. So the midterm is just a set of 20 questions given beforehand.
- The idea is not to memorize all of these long formulas, but rather to logically reason through how each formula, such as the policy gradient, came to be.
I am going to put some of these on hold and come back to the concepts when I am actually going to use them, since I have a high-level idea of what each of these ideas covers.
- Mainly focusing on learning for F1TENTH
Notes from CS287
Study these equations by heart:
- Value Iteration
- Contraction -> still confused about this
- Policy Iteration
- Linear Programming
- Maximum Entropy
- Constrained Optimization
- Exact solution methods
- Kuhn Triangulation
- Cross-Entropy Method
- Courant–Friedrichs–Lewy Condition
- Value Function Approximation
- Value Iteration with Value Function Approximation
- Policy Iteration with Value Function Approximation
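To make the value-iteration and contraction items above concrete, here is a minimal NumPy sketch on a made-up 2-state, 2-action MDP. The transition and reward numbers are invented for illustration; the check at the end is the contraction property (each Bellman backup shrinks the sup-norm residual by at least γ, which is why the iteration converges):

```python
import numpy as np

# Made-up toy MDP: P[a, s, s'] = transition probability, R[s, a] = reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.1, 0.9]]])  # action 1
R = np.array([[1.0, 0.0],                  # reward for (state, action)
              [0.0, 2.0]])
gamma = 0.9

def backup(V):
    # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    return Q.max(axis=1)

V = np.zeros(2)
residuals = []
for _ in range(300):
    V_new = backup(V)
    residuals.append(np.max(np.abs(V_new - V)))
    V = V_new

# Contraction: each backup shrinks the distance between successive iterates
# by at least gamma, so the residual decays geometrically and V -> V*.
assert all(r2 <= gamma * r1 + 1e-12 for r1, r2 in zip(residuals, residuals[1:]))
```

Policy iteration uses the same backup machinery, just split into an evaluation step and a greedy improvement step.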
This seems like a different way to resolve function approximation.
Starting at this part of the course, we look more at trying to make sense of our sensor data, because there is a lot of noise in the real world.
Lecture 11: Be careful, make sure all your sensor readings are actually independent.
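A toy illustration of why that independence warning matters, with made-up numbers: if you fuse the same Gaussian reading twice as though it were two independent measurements, the posterior variance keeps shrinking even though no new information arrived, so the belief becomes overconfident.

```python
import numpy as np

def fuse(mu, var, z, r):
    # Standard Bayesian update of a Gaussian prior N(mu, var)
    # with a Gaussian measurement z of variance r.
    k = var / (var + r)
    return mu + k * (z - mu), (1 - k) * var

mu, var = 0.0, 1.0   # prior belief (made up)
z, r = 1.0, 1.0      # one sensor reading and its noise variance (made up)

mu1, var1 = fuse(mu, var, z, r)    # fuse the reading once
mu2, var2 = fuse(mu1, var1, z, r)  # fuse the SAME reading again, as if new

print(var1, var2)  # variance shrinks again despite zero new information
```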
- Kalman Smoother (Smoothing)
- MAP Estimation
- Maximum Likelihood
- Beta Distribution
- Dirichlet Distribution
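To ground the filtering side of this list, here is a minimal 1-D Kalman filter sketch; the smoother adds a backward pass on top of these forward updates. All noise values below are made-up illustration numbers, and the hidden state is held constant just to show convergence:

```python
import numpy as np

# 1-D Kalman filter: random-walk state x_t = x_{t-1} + w, observed z_t = x_t + v.
q, r = 0.01, 0.25          # process / measurement noise variances (assumed)
mu, sigma2 = 0.0, 1.0      # prior belief N(mu, sigma2)

def kalman_step(mu, sigma2, z):
    # Predict: belief spreads by the process noise.
    mu_pred, s_pred = mu, sigma2 + q
    # Update: blend prediction and measurement by relative certainty.
    K = s_pred / (s_pred + r)          # Kalman gain in [0, 1]
    mu_new = mu_pred + K * (z - mu_pred)
    s_new = (1 - K) * s_pred
    return mu_new, s_new

rng = np.random.default_rng(0)
x = 0.5                                 # hidden true state (made up)
for _ in range(50):
    z = x + rng.normal(0, np.sqrt(r))   # noisy sensor reading
    mu, sigma2 = kalman_step(mu, sigma2, z)
```

After enough readings the estimate hovers near the true state and the posterior variance settles at a small steady-state value.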
Two main branches in the RL landscape:
Actor-critic methods sit in the middle of both.
Model-based RL learns the dynamics model.
Policy optimization has the objective: maximize over θ the expected return, E[Σ_t R(s_t, a_t) | π_θ].
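A minimal sketch of that objective being optimized with the score-function (REINFORCE) gradient on a made-up two-armed bandit; the rewards, step size, and iteration count are all invented for illustration:

```python
import numpy as np

# REINFORCE on a one-step bandit: softmax policy pi_theta over 2 actions,
# stochastic gradient ascent on J(theta) = E[R] using
#   grad J = E[ grad log pi_theta(a) * R ].
rng = np.random.default_rng(0)
theta = np.zeros(2)                 # one logit per action
mean_reward = np.array([0.0, 1.0])  # assumed environment: action 1 is better

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

alpha = 0.1
for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    r = mean_reward[a] + rng.normal(0, 0.1)
    grad_log = -p                   # grad log pi(a) for softmax: one_hot(a) - p
    grad_log[a] += 1.0
    theta += alpha * r * grad_log   # stochastic policy-gradient step
```

After training, the policy puts most of its probability on the higher-reward action.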
Learnings from doing homework
- You realize that in the real world you run into cases with NaN values, and you need to know how to debug those.
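Some NumPy helpers for that kind of debugging, as a sketch: locating NaNs after the fact, and making NumPy raise at the exact operation that produced one instead of letting it propagate silently.

```python
import numpy as np

x = np.array([1.0, 0.0, -1.0])
y = np.log(x)          # log(0) -> -inf, log(-1) -> nan (with a RuntimeWarning)

print(np.isnan(y))     # locate NaNs element-wise
print(np.isfinite(y))  # stricter check: also catches +/- inf

# Make NumPy raise immediately at the offending operation, so the traceback
# points at the real source instead of wherever the NaN surfaced later.
np.seterr(invalid='raise', divide='raise')
try:
    np.log(np.array([-1.0]))
except FloatingPointError as e:
    print("caught:", e)
```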
I'm really struggling with how this entropy term is implemented.
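One common, numerically stable way to implement the entropy of a softmax policy directly from raw logits; this is a sketch, and the homework code may do it differently:

```python
import numpy as np

def softmax_entropy(logits):
    # Shift by the max so exp() cannot overflow.
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    probs = np.exp(log_probs)
    # H = -sum p log p; working with log_probs avoids log(0) for tiny probs.
    return -(probs * log_probs).sum()

uniform = softmax_entropy(np.zeros(4))                    # max entropy = log 4
peaked = softmax_entropy(np.array([10.0, 0.0, 0.0, 0.0]))  # near-deterministic
print(uniform, np.log(4), peaked)
```

The key trick is computing log-probabilities first and reusing them, rather than taking `log` of already-computed probabilities.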
- The main bottleneck in robotics now is no longer hardware, it is software!!
You only really need to break robotics down into 3 core techniques, and then you can pretty much solve any problem (YouTube this)
- Probabilistic Reasoning
It’s really exciting to take this course because it is going to unlock so much potential in me.
And self-driving cars are essentially robots.
Why not add more redundancy?
Ohh, they talk about robustness because the optimal policy you generate might not work in practice: the MDP is not a good model of the world. So you optimize over a distribution of MDPs instead.
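A minimal sketch of that idea, in the spirit of domain randomization / robust policy search: score candidate policies across a distribution of sampled dynamics and keep the one with the best worst-case return. The toy task, the policy class, and every number below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_return(gain, drag):
    # Toy 1-D "drive to the origin" task; drag is the uncertain model parameter.
    x, total = 1.0, 0.0
    for _ in range(20):
        u = -gain * x                  # candidate policy: proportional control
        x = x + 0.1 * (u - drag * x)   # simple made-up dynamics
        total -= x ** 2                # cost: squared distance from the goal
    return total

candidate_gains = [0.5, 2.0, 8.0]          # tiny hand-picked policy class
drags = rng.uniform(0.0, 5.0, size=30)     # distribution over possible models

# Worst-case score of each policy across the sampled models, instead of its
# score on one nominal model.
scores = [min(rollout_return(g, d) for d in drags) for g in candidate_gains]
best = candidate_gains[int(np.argmax(scores))]
```

The policy that wins here is the one that degrades least badly under the worst sampled dynamics, which is exactly the robustness criterion the lecture is pointing at.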