Research Ideas
Ideas are cheap; it’s all about who can execute the fastest. Below is a list of ideas that I will execute on for robot learning over the next year.
How to achieve AGI in robotics: what’s missing?
What’s missing is the ability for robots to robustly self-improve over time.
- Reinforcement learning (generally model-free)
- “How good are these actions?”
- teaching robots to learn on their own
- Q-learning (a minimal TD-update sketch follows this list)
- Have we seen policy gradient methods for manipulation? I haven’t. Why not?
- Q-Chunking
- World models
- “Where am I going to end up when I take this action?” Perhaps this is called reasoning? Or search? This is needed for robustness (I care less about safety here, more about recovering from failure states).
- Learning the dynamics of the world
- Dream to Control: Learning Behaviors by Latent Imagination
- How do we bake a world model into our policy?
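The TD-update sketch referenced above: a minimal Q-learning step in PyTorch, where everything (QNet, the batch layout, the actor-style policy call) is my assumption for illustration, not something fixed in these notes. Q-chunking would keep the same structure but treat a short chunk of actions as a single macro-action.

```python
# Hypothetical sketch of one Q-learning TD update. Names and shapes are
# assumptions for illustration only.
import torch
import torch.nn as nn

class QNet(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

def td_update(q, q_target, policy, batch, optimizer, gamma=0.99):
    """Regress Q(s, a) toward r + gamma * Q_target(s', pi(s'))."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        a_next = policy(s_next)  # actor proposes a' for continuous actions
        target = r + gamma * (1.0 - done) * q_target(s_next, a_next)
    loss = nn.functional.mse_loss(q(s, a), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```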
Policy: no need for proprio-history? We need proprio-history somewhere.
W(s, s’) = a (inverse dynamics: which action takes me from s to s’?)
W(s, a) = s’ (forward dynamics: where does action a leave me?)
Perhaps we need a Q world model? (s, a) → (s’, Q)
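A minimal sketch of what that could look like: one shared trunk with two heads, one predicting s’ (the forward dynamics above) and one predicting Q. The architecture and all names are my assumptions.

```python
# Hypothetical "Q world model": a shared trunk with two heads mapping
# (s, a) -> (s', Q). Architecture is an assumption for illustration.
import torch
import torch.nn as nn

class QWorldModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.next_state_head = nn.Linear(hidden, state_dim)  # W(s, a) = s'
        self.q_head = nn.Linear(hidden, 1)                   # value of a in s

    def forward(self, s, a):
        h = self.trunk(torch.cat([s, a], dim=-1))
        return self.next_state_head(h), self.q_head(h).squeeze(-1)
```

Training would presumably mix a dynamics loss on the s’ head with a TD loss on the Q head, so the value estimate shares features with, and is grounded in, the dynamics.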
There’s also Goal-Conditioned RL.
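For reference, goal-conditioning in its simplest form just adds the goal g as an extra input, giving Q(s, a, g). This sketch assumes flat vector states and goals; hindsight relabeling is the usual way to get reward signal, not shown here.

```python
# Minimal goal-conditioned Q sketch (my assumption of the setup): the goal g
# is simply an extra input, giving Q(s, a, g).
import torch
import torch.nn as nn

class GoalConditionedQ(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, goal_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + goal_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, s, a, g):
        return self.net(torch.cat([s, a, g], dim=-1)).squeeze(-1)
```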
1. First way
Three stages:
- Policy sampling: draw candidate actions a from the policy.
Query the world model to get s’ for each candidate.
Query the Q function conditioned on all three (s, a, s’). But isn’t that just V(s’)?
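A sketch of this three-stage loop; `policy.sample`, `world_model`, and `value_fn` are assumed interfaces, not anything fixed in these notes.

```python
# Hypothetical three-stage action selection: sample candidates, predict s',
# score with V(s'). All module interfaces are assumptions.
import torch

@torch.no_grad()
def select_action(s, policy, world_model, value_fn, num_samples: int = 32):
    s_rep = s.unsqueeze(0).expand(num_samples, -1)  # repeat state per candidate
    actions = policy.sample(s_rep)        # stage 1: policy sampling
    s_next = world_model(s_rep, actions)  # stage 2: predict s' for each action
    scores = value_fn(s_next)             # stage 3: score with V(s')
    return actions[scores.argmax()]
```

This makes the worry above concrete: stage 3 only ever sees s’, so the scoring really is just V(s’).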
The problems of multi-stage approaches:
Slow, inefficient, and hard to debug. Really, shouldn’t it be end-to-end?
There’s Past Token Prediction.
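My loose reading of Past Token Prediction, unverified against the paper: alongside the action head, add an auxiliary head that reconstructs earlier tokens in the context, which pressures the policy to actually keep history in its representation. The sketch below is that generic idea, with all names mine.

```python
# Loose, unverified sketch of a past-token-prediction auxiliary loss: the
# policy backbone must reconstruct earlier tokens from its features, which
# pressures it to retain history. All names here are mine.
import torch
import torch.nn as nn

def past_token_loss(hidden, past_tokens, aux_head: nn.Module):
    """hidden: (B, T, D) backbone features; past_tokens: (B, T) token ids."""
    logits = aux_head(hidden)  # (B, T, vocab)
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), past_tokens.reshape(-1)
    )
```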
2. Second way
Bake the world model into the policy.
Doesn’t this just collapse down to V(s’)? Yes.
So what if we had both Q(s, a) and V(s’) computed?
- There’s inaccuracy in Q(s, a)
- modeling error from the learned Q function
- There’s inaccuracy in V(s’)
- modeling error from the learned value function
- There’s inaccuracy in the computed s’
- modeling error from the world model, e.g. s’ might actually not be reachable
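One way to use both, sketched below under my own assumptions (the 50/50 weighting is arbitrary): average the direct estimate Q(s, a) with the model-based estimate V(W(s, a)), and treat their disagreement as a signal that one of the three error sources above is biting.

```python
# Sketch of combining both value estimates; the weighting scheme is my
# arbitrary choice, not something from these notes.
import torch

@torch.no_grad()
def combined_value(s, a, q_fn, v_fn, world_model, w: float = 0.5):
    q_direct = q_fn(s, a)        # error source: the learned Q function
    s_next = world_model(s, a)   # error source: dynamics (s' may be unreachable)
    q_model = v_fn(s_next)       # error source: value error compounded with dynamics
    disagreement = (q_direct - q_model).abs()  # useful as an epistemic signal
    return w * q_direct + (1.0 - w) * q_model, disagreement
```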
How do we teach the model to predict better frames?