Learning from Human Videos (LfV)
Broad field; so far mostly focused on locomotion, where the expert demos come from mocap. How do we stop relying on mocap?
What I need to understand:
- How do people close the cross-embodiment gap? (see the retargeting sketch after this list)
- How is training done specifically? RL or imitation learning?
- What is not solved in the field?
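My current understanding of the cross-embodiment answer: estimate human pose from video, then retarget it onto the robot's morphology before tracking it. A naive joint-space sketch of that retargeting step, where every joint name, scale, and limit is invented for illustration (real pipelines solve an optimization against the robot's URDF/kinematics):

```python
import numpy as np

# Hypothetical correspondence: human pose-estimator joints -> robot DoFs.
# Names, scales, and limits are made up for this sketch.
HUMAN_TO_ROBOT = {
    "left_hip_pitch":  ("l_hip_pitch", 1.0),
    "left_knee":       ("l_knee",      1.0),
    "right_hip_pitch": ("r_hip_pitch", 1.0),
    "right_knee":      ("r_knee",      1.0),
}
ROBOT_LIMITS = {  # radians, invented for the sketch
    "l_hip_pitch": (-1.5, 1.5), "l_knee": (0.0, 2.3),
    "r_hip_pitch": (-1.5, 1.5), "r_knee": (0.0, 2.3),
}

def retarget_frame(human_angles: dict[str, float]) -> dict[str, float]:
    """Map one frame of estimated human joint angles onto robot joints,
    scaling and clipping to the robot's limits (naive joint-space retargeting)."""
    robot_targets = {}
    for h_joint, angle in human_angles.items():
        if h_joint not in HUMAN_TO_ROBOT:
            continue  # joints the robot doesn't have are dropped
        r_joint, scale = HUMAN_TO_ROBOT[h_joint]
        lo, hi = ROBOT_LIMITS[r_joint]
        robot_targets[r_joint] = float(np.clip(scale * angle, lo, hi))
    return robot_targets
```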
https://www.youtube.com/watch?v=RdPftGBhN8c&t=2386s&ab_channel=CMURoboticsInstitute
Survey paper:
Relevant line of work:
- SFV: Reinforcement Learning of Physical Skills from Videos (video pose estimation + DeepMimic-style motion imitation; reward sketch after this list)
- Learning Physically Simulated Tennis Skills from Broadcast Videos
- VideoMimic
- BeyondMimic
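To make the "DeepMimic-style" training concrete (and partly answer the RL-vs-imitation question above): these pipelines run RL with a dense imitation reward that tracks the reference motion recovered from video. A minimal numpy sketch of the DeepMimic reward, simplified to treat joint rotations as flat angle vectors rather than the paper's quaternion differences:

```python
import numpy as np

def deepmimic_reward(q, dq, ee, com, q_ref, dq_ref, ee_ref, com_ref):
    """DeepMimic-style imitation reward: exponentiated tracking errors on joint
    rotations, joint velocities, end-effector positions, and center of mass.
    Weights and error scales follow the DeepMimic paper; inputs are flat numpy
    arrays (character state vs. reference-motion state at this timestep)."""
    r_pose = np.exp(-2.0 * np.sum((q_ref - q) ** 2))       # joint rotations
    r_vel  = np.exp(-0.1 * np.sum((dq_ref - dq) ** 2))     # joint velocities
    r_ee   = np.exp(-40.0 * np.sum((ee_ref - ee) ** 2))    # end-effector positions
    r_com  = np.exp(-10.0 * np.sum((com_ref - com) ** 2))  # center of mass
    return 0.65 * r_pose + 0.10 * r_vel + 0.15 * r_ee + 0.10 * r_com
```

So "imitation" here mostly means imitation via an RL tracking objective (DeepMimic and SFV train the policy with PPO against this reward), not behavior cloning on actions.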
Skill discovery