Learning from Human Videos (LfV)

A broad field. Work so far has focused mostly on locomotion, where expert demos come from motion capture (mocap). How do we stop relying on mocap?

What I need to understand:

  • How do people close the cross-embodiment gap?
  • How is training done specifically? RL or imitation learning?
  • What is still unsolved in the field?

https://www.youtube.com/watch?v=RdPftGBhN8c&t=2386s&ab_channel=CMURoboticsInstitute

Survey paper:

Line of work for this:

Skill discovery