Research Ideas

Ideas are cheap; it’s all about who can execute the fastest. Below is a list of ideas I will execute on for robot learning over the next year.

How do we achieve AGI in robotics? What’s missing?

What’s missing is the ability for robots to robustly self-improve over time.

  1. Reinforcement learning (generally model-free)
  • “How good are these actions?”
  • Teaching robots to learn on their own
  • Q-learning (a minimal update sketch follows this list)
    • Have we seen policy gradient methods for manipulation? I haven’t. Why not?
  • Q-Chunking
  2. World models
  • “Where am I going to end up when I take this action?” Perhaps this is what we call reasoning, or search? This is needed for robustness (I don’t care as much about safety here; it’s more about recovering from failure states)
  • Learning the dynamics of the world
  • Dream to Control: Learning Behaviors by Latent Imagination
  • How do we bake a world model into our policy?
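To make the Q-learning bullet concrete, here’s a minimal tabular Q-learning sketch. Everything in it (the state/action counts, alpha, gamma, epsilon) is an illustrative assumption, not a claim about any particular robot setup:

```python
import numpy as np

# Toy sizes and hyperparameters; all illustrative.
n_states, n_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def q_learning_step(s, a, r, s_next, done):
    """One TD(0) update: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    target = r + (0.0 if done else gamma * Q[s_next].max())
    Q[s, a] += alpha * (target - Q[s, a])

def epsilon_greedy(s):
    """Behavior policy: explore uniformly with prob epsilon, else exploit."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(Q[s].argmax())
```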

Policy: no need for proprio-history? We need proprioceptive history somewhere.
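One place that history can live is simply the policy input: stack the last k proprioceptive readings. A minimal sketch, assuming a fixed window k and a plain MLP policy (both my choices for illustration):

```python
from collections import deque

import torch
import torch.nn as nn

class ProprioHistoryPolicy(nn.Module):
    """MLP policy over a stacked window of the last k proprio readings (sketch)."""

    def __init__(self, proprio_dim: int, action_dim: int, k: int = 5):
        super().__init__()
        self.k = k
        self.history = deque(maxlen=k)  # rolling proprio history
        self.net = nn.Sequential(
            nn.Linear(proprio_dim * k, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, proprio: torch.Tensor) -> torch.Tensor:
        self.history.append(proprio)
        while len(self.history) < self.k:   # pad by repeating at episode start
            self.history.append(proprio)
        stacked = torch.cat(list(self.history), dim=-1)
        return self.net(stacked)
```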

Dreamer

W(s, s’) = a   (inverse model: predict the action that connects two states)
W(s, a) = s’   (forward model: predict the next state)
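A concrete reading of those two signatures as two small MLPs; the hidden sizes are assumptions, and this is not Dreamer’s actual architecture:

```python
import torch
import torch.nn as nn

class ForwardModel(nn.Module):
    """W(s, a) -> s': predict the next state from state and action."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

class InverseModel(nn.Module):
    """W(s, s') -> a: predict the action that connects two states."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))
```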

Perhaps we need a Q world model: (s, a) → (s’, Q)?
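A minimal sketch of that joint (s, a) → (s’, Q) idea; the shared trunk is my assumption, chosen so the dynamics and value features overlap:

```python
import torch
import torch.nn as nn

class QWorldModel(nn.Module):
    """(s, a) -> (s', Q): jointly predict the next state and its value."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
        )
        self.next_state_head = nn.Linear(hidden, state_dim)
        self.q_head = nn.Linear(hidden, 1)

    def forward(self, s, a):
        h = self.trunk(torch.cat([s, a], dim=-1))
        return self.next_state_head(h), self.q_head(h).squeeze(-1)
```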

There’s also Goal-Conditioned RL: condition the policy and value on a goal g, i.e. π(a | s, g) and Q(s, a, g).

1. First way: a multi-stage pipeline

Three stages (a sketch of the full loop follows this list):

  1. Policy sampling: sample candidate actions a from the policy.
  2. Query the world model to get s’.
  3. Query the Q function conditioned on all three of (s, a, s’). But isn’t that just V(s’)?
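A minimal sketch of that loop at decision time; sample_action, world_model, and q_fn are assumed callables, and n is an illustrative candidate count:

```python
import torch

@torch.no_grad()
def three_stage_action(s, sample_action, world_model, q_fn, n: int = 32):
    """Stage 1: sample candidates; stage 2: imagine s'; stage 3: score and pick.

    Assumed interfaces (not from any specific codebase):
      sample_action(s) -> a    world_model(s, a) -> s'    q_fn(s, a, s') -> scalar
    """
    best_a, best_score = None, -float("inf")
    for _ in range(n):
        a = sample_action(s)        # stage 1: policy sampling
        s_next = world_model(s, a)  # stage 2: query the world model for s'
        score = q_fn(s, a, s_next)  # stage 3: score the full (s, a, s') triple
        if float(score) > best_score:
            best_a, best_score = a, float(score)
    return best_a
```

Writing it out makes the cost obvious: three model queries per candidate, n candidates per control step.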

The problems with multi-stage approaches:

Slow, inefficient, and hard to debug. Really, shouldn’t it be end-to-end?

There’s Past Token Prediction.

2. Second way: bake the world model into the policy

Doesn’t this just collapse down to V(s’)? Yes, if the only thing we do with the imagined s’ is score it with V.
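One reading of “bake it in” that doesn’t collapse: give the policy an auxiliary next-state prediction head, so the dynamics shape the representation instead of just producing an s’ to score. A behavior-cloning-style sketch; the shared trunk and the auxiliary loss weight are my assumptions:

```python
import torch
import torch.nn as nn

class PolicyWithWorldModelHead(nn.Module):
    """Policy trunk with an auxiliary next-state prediction head (sketch)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.action_head = nn.Linear(hidden, action_dim)
        self.next_state_head = nn.Linear(hidden + action_dim, state_dim)

    def forward(self, s):
        h = self.trunk(s)
        a = self.action_head(h)
        s_next_pred = self.next_state_head(torch.cat([h, a], dim=-1))
        return a, s_next_pred

def bc_plus_dynamics_loss(model, s, a_expert, s_next, aux_weight=0.1):
    """Behavior cloning plus auxiliary dynamics loss (aux_weight is illustrative)."""
    a, s_next_pred = model(s)
    return (nn.functional.mse_loss(a, a_expert)
            + aux_weight * nn.functional.mse_loss(s_next_pred, s_next))
```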

So what if we had both Q(s,a) and V(s’) computed? (A sketch of one way to combine them follows this list.)

  • There’s inaccuracy in Q(s,a)
    • modeling error from the learned Q function
  • There’s inaccuracy in V(s’)
    • modeling error from the learned value function
  • There’s inaccuracy in the computed s’
    • modeling error from the world model, like s’ might actually not be reachable
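One hedged way to use both despite those errors: treat Q(s,a) and V(s’) as two noisy estimates of the same quantity (ignoring the one-step reward and discount, as the notes above do) and blend them, using their disagreement as an uncertainty signal. The blend weight and disagreement threshold are illustrative assumptions:

```python
import torch

@torch.no_grad()
def blended_score(s, a, q_fn, v_fn, world_model,
                  blend: float = 0.5, max_disagreement: float = 1.0):
    """Blend Q(s, a) with V(s') from the world model; flag big disagreement.

    Assumed interfaces: q_fn(s, a) -> scalar, v_fn(s') -> scalar,
    world_model(s, a) -> s'. blend / max_disagreement are illustrative.
    """
    q = q_fn(s, a)
    s_next = world_model(s, a)    # error source 3: s' might not be reachable
    v = v_fn(s_next)              # error sources 2 and 3 stack here
    disagreement = (q - v).abs()  # crude proxy for combined model error
    score = blend * q + (1.0 - blend) * v
    return score, disagreement > max_disagreement
```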

How do we teach the model to predict better frames?