Robot Learning Reading Group
Starting this robot learning reading group with New Systems; we may host more sessions afterward.
Similar events:
- https://ut-robotlearning.github.io/
- https://www.youtube.com/@AustinRobotics
- Sri's presentation (in EB Garamond) looks very clean
Questions we should answer as we read these papers:
- What are the authors trying to do? Articulate their objectives.
- How was it done prior to their work, and what were the limits of current practice?
- What is new in their approach, and why do they think it will be successful?
- What are the mid-term and final “exams” to check for success? (i.e., How is the method evaluated?)
- What limitations do the authors mention (and which do they omit)?
About
Meetup to discuss state-of-the-art research on robot learning, similar to the Toronto ML/Systems Reading Group and the Vector Institute's Machine Learning Lunches. List of topics & articles below. All are welcome! 🎉
Annotate papers through alphaXiv:
- Join the Waterloo group here: https://www.alphaxiv.org/invite/776bc8f4-7d98-4185-83e1-b6da5120a4c9
Week ? - Out of Distribution
One of the key challenges is robustness: what happens when a robot goes out of distribution, and how does it recover? I would say this is the key challenge right now.
Offline RL enables learning from suboptimal demonstrations, but that does not by itself make a policy robust to edge cases; a minimal OOD-detection sketch follows below.
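As a starting point for discussion, here is a minimal sketch of one common way to flag out-of-distribution states: disagreement across an ensemble of policies. Everything here (the toy data, the linear policies, the `disagreement` helper) is illustrative, not taken from any paper above.

```python
# Minimal sketch: flagging out-of-distribution states via ensemble disagreement.
import numpy as np

rng = np.random.default_rng(0)

# Toy "demonstrations": states in [-1, 1], actions from an unknown linear policy.
states = rng.uniform(-1.0, 1.0, size=(500, 4))
actions = states @ rng.normal(size=(4, 2)) + 0.05 * rng.normal(size=(500, 2))

# Fit an ensemble of linear policies on bootstrap resamples of the demos.
ensemble = []
for _ in range(8):
    idx = rng.integers(0, len(states), size=len(states))
    W, *_ = np.linalg.lstsq(states[idx], actions[idx], rcond=None)
    ensemble.append(W)

def disagreement(state):
    """Std. dev. across ensemble action predictions; high value = likely OOD."""
    preds = np.stack([state @ W for W in ensemble])
    return preds.std(axis=0).mean()

in_dist = rng.uniform(-1.0, 1.0, size=4)   # looks like the training data
out_dist = rng.uniform(5.0, 6.0, size=4)   # far outside the training support
print(f"in-dist: {disagreement(in_dist):.4f}  OOD: {disagreement(out_dist):.4f}")
```

The OOD state produces visibly larger disagreement, which a deployed policy could use as a trigger to slow down or ask for help.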
Week ? - Cross-Embodiment
Workshops:
- https://sites.google.com/view/xembodimentworkshop
- Kevin Black: talk on how they went from Octo to π0 (notes under π0)
- Edward Johns: in-context learning
Yang, et al., 2024. Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation.
Week ? - Egocentric Papers
- Kareer, et al., 2024. EgoMimic: Scaling Imitation Learning via Egocentric Video.
- Athalye, et al., 2025. From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models.
- Doshi, et al., 2024. Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation.
- Patel, et al., 2024. GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization.
- Niu, et al., 2025. Pre-training Auto-regressive Robotic Models with 4D Representations.
Week ? - Shiza's top paper
Week ? - Learning from Human Videos
- Liu, et al., 2025. EgoZero: Robot Learning from Smart Glasses.
- Ye, et al., 2025. MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA.
Week 3 - Real2Sim + Sim2Real / Whole-Body Control
Kalaria, et al., 2025. DreamControl: Human-Inspired Whole-Body Humanoid Control for Scene Interaction via Guided Diffusion, presented by Ayush Garg
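For intuition on the guided-diffusion idea in DreamControl, here is a toy sketch of guidance during sampling: a task-reward gradient is added to the score of a motion prior while sampling with Langevin dynamics. The analytic prior, the quadratic reward, and the `guidance` weight are simplified stand-ins, not the paper's actual model or sampler.

```python
# Toy sketch of guidance during sampling (1-D Langevin dynamics).
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    """Analytic score of a toy motion prior p(x) = N(2.0, 0.5^2)."""
    return -(x - 2.0) / 0.25

def guidance(x, goal, weight=4.0):
    """Gradient of a quadratic task reward pulling samples toward `goal`."""
    return -weight * (x - goal)

x = rng.normal(size=256)   # start from noise
step = 0.01
for _ in range(500):       # Langevin updates with guidance added to the score
    noise = rng.normal(size=x.shape)
    x += step * (score(x) + guidance(x, goal=3.0)) + np.sqrt(2 * step) * noise

# Samples settle between the prior mean (2.0) and the goal (3.0),
# weighted by the guidance strength (mean ~2.5 here).
print(f"sample mean: {x.mean():.2f}")
```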
Additional Reading List:
- Xie, et al., 2023. OmniControl: Control Any Joint at Any Time for Human Motion Generation.
- Cheng, et al., 2024. Expressive Whole-Body Control for Humanoid Robots.
- Fu, et al., 2024. HumanPlus: Humanoid Shadowing and Imitation from Humans.
- He, et al., 2024. HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots.
- Dugar, et al., 2024. Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots.
- Ji, et al., 2024. ExBody2: Advanced Expressive Humanoid Whole-Body Control.
- Kim, et al., 2024. ARMOR: Egocentric Perception for Humanoid Robot Motion Planning in Dense Environments.
- Serifi, et al., 2024. VMP: Versatile Motion Priors for Robustly Tracking Motion on Physical Characters.
- He, et al., 2024. Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation (H2O).
- He, et al., 2024. OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning.
- van Marum, et al., 2024. Revisiting Reward Design and Evaluation for Robust Standing and Walking.
- Radosavovic, et al., 2024. Humanoid Locomotion as Next Token Prediction.
- Gu, et al., 2024. Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning (ETH-Loco).
- Zhang, et al., 2025. FALCON: Learning Force-Adaptive Humanoid Loco-Manipulation.
- Li, et al., 2025. CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks.
- Ze, et al., 2025. TWIST: Teleoperated Whole-Body Imitation System.
- Zhang, et al., 2025. Unleashing Humanoid Reaching Potential via Real-world-Ready Skill Space (R2S2).
- Chen, et al., 2025. GMT: General Motion Tracking for Humanoid Whole-Body Control.
- Allshire, et al., 2025. Visual Imitation Enables Contextual Humanoid Control.
- Zhang, et al., 2025. Track Any Motions under Any Disturbances.
- He, et al., 2025. Attention-Based Map Encoding for Learning Generalized Legged Locomotion.
Background knowledge:
- Schulman, et al., 2017. Proximal Policy Optimization Algorithms. (Clipped-objective sketch after this list.)
- Chi, et al., 2023. Diffusion Policy: Visuomotor Policy Learning via Action Diffusion.
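For newcomers, a minimal numpy sketch of PPO's clipped surrogate objective from Schulman et al., 2017 (their Eq. 7); the batch values are made up and there is no actual policy network here.

```python
# Minimal sketch of PPO's clipped surrogate objective as a loss to minimize.
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """-E[min(r * A, clip(r, 1-eps, 1+eps) * A)], where r = pi_new / pi_old."""
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Example: a toy batch of 4 transitions.
logp_old = np.log(np.array([0.2, 0.5, 0.1, 0.4]))
logp_new = np.log(np.array([0.3, 0.4, 0.3, 0.4]))
adv = np.array([1.0, -0.5, 2.0, 0.1])
print(f"loss: {ppo_clip_loss(logp_new, logp_old, adv):.4f}")
```

The clip keeps any single update from moving the policy too far from the one that collected the data, which is the paper's core stability trick.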
Extra articles:
- https://www.agilityrobotics.com/content/training-a-whole-body-control-foundation-model
- https://gofai2robots.substack.com/p/the-emerging-humanoid-motor-cortex
Week 2 - World Models
- 6:00pm - Assran, et al., 2025. V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning, presented by Steven Gong
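To ground the discussion, here is a minimal sketch of the planning recipe world-model papers like V-JEPA 2 build on: roll candidate action sequences through a latent dynamics model and execute the first action of the cheapest one (random-shooting MPC). The `dynamics` and `cost` functions are toy stand-ins, not the paper's trained modules.

```python
# Minimal sketch: planning with a latent world model via random shooting.
import numpy as np

rng = np.random.default_rng(0)

def dynamics(z, a):
    """Stand-in latent predictor z' = f(z, a); a trained model in practice."""
    return 0.9 * z + 0.5 * a

def cost(z, z_goal):
    """Distance to the goal's latent embedding."""
    return np.sum((z - z_goal) ** 2, axis=-1)

def plan(z0, z_goal, horizon=5, n_samples=512):
    """Sample action sequences, roll them out in latent space, and return the
    first action of the cheapest sequence (replanned every step in MPC)."""
    actions = rng.normal(size=(n_samples, horizon, z0.shape[-1]))
    z = np.broadcast_to(z0, (n_samples, z0.shape[-1])).copy()
    total = np.zeros(n_samples)
    for t in range(horizon):
        z = dynamics(z, actions[:, t])
        total += cost(z, z_goal)
    return actions[np.argmin(total), 0]

z0, z_goal = np.zeros(8), np.ones(8)
print("first action:", plan(z0, z_goal)[:4])
```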
Additional Reading List:
- Wu, et al., 2022. DayDreamer: World Models for Physical Robot Learning.
- Feng, et al., 2023. Finetuning Offline World Models in the Real World.
- Bar, et al., 2024. Navigation World Models.
- Bardes, et al., 2024. Revisiting Feature Prediction for Learning Visual Representations from Video.
- Richens, et al., 2024. Robust agents learn causal world models.
- ByteDance Seed, 2025. GR-3 Technical Report.
- Goff, et al., 2025. Learning to Drive from a World Model.
- Zhou, et al., 2025. DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning.
- Sobal, et al., 2025. Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models.
- NVIDIA, 2025. Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models.
- NVIDIA, 2025. Cosmos World Foundation Model Platform for Physical AI.
- NVIDIA, 2025. Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning.
- Hu, et al., 2025. Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations.
- Wayve, 2025. GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving.
- Kong, et al., 2025. 3D and 4D World Modeling: A Survey.
- Guo, et al., 2025. MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft.
- Hafner, et al., 2025. Mastering Diverse Domains through World Models.
- Bruce, et al., 2025. Genie: Generative Interactive Environments.
Workshops:
- CoRL 2025 https://simulatingrobotworlds.github.io/submit.html (see “Related Publications” section)
- https://physical-world-modeling.github.io/#schedule
- https://sites.google.com/view/worldmodel-iclr2025/home
- Recordings https://iclr.cc/virtual/2025/workshop/24000
- https://www.worldmodelworkshop.org/
- https://world-model-tutorial.github.io/#schedule
Week 1 - Robot Foundation Models
- 6:00pm - Physical Intelligence, 2024, π0: A Vision-Language-Action Flow Model for General Robot Control (flow-matching sketch after this schedule)
- 6:20pm - TRI LBM Team, 2025, A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation
- 6:40pm - Google, 2025, Gemini Robotics: Bringing AI into the Physical World
- 7:00pm - Open floor discussion on future directions
- 7:15pm - Wrap up and social
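Since π0 generates action chunks with flow matching, here is a toy numpy sketch of that family of training objective: interpolate from noise to an expert action chunk along a straight line and regress the constant velocity. The `velocity_model` here is an untrained stand-in; the real model conditions a VLM backbone on images and language, and this simplification is not the paper's exact recipe.

```python
# Toy sketch of a conditional flow-matching training objective.
import numpy as np

rng = np.random.default_rng(0)

def velocity_model(x_t, t, obs):
    """Stand-in for the learned velocity field v_theta(x_t, t | obs)."""
    return rng.normal(size=x_t.shape)  # untrained, for shapes only

# One training step's targets: interpolate noise -> expert action chunk.
actions = rng.normal(size=(32, 16))    # batch of expert action chunks
noise = rng.normal(size=actions.shape)
t = rng.uniform(size=(32, 1))
x_t = (1.0 - t) * noise + t * actions  # straight-line interpolant
v_target = actions - noise             # its constant velocity

obs = None                             # placeholder conditioning
loss = np.mean((velocity_model(x_t, t, obs) - v_target) ** 2)
print(f"flow-matching loss: {loss:.3f}")

# Inference integrates the learned field from noise to actions,
# e.g. Euler steps: x <- x + (1/K) * v_theta(x, t, obs) for t = 0 ... 1.
```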
Additional Reading List
- Brohan, et al., 2022, RT-1: Robotics Transformer for Real-World Control at Scale
- Driess, et al., 2023, PaLM-E: An Embodied Multimodal Language Model.
- Brohan, et al., 2023, RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
- Open X-Embodiment Collaboration, 2023, Open X-Embodiment: Robotic Learning Datasets and RT-X Models
- Chi, et al., 2023, Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
- Liu, et al., 2024, RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
- Etukuru, et al., 2024, Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments
- Kim, et al., 2024, OpenVLA: An Open-Source Vision-Language-Action Model
- Cheang, et al., 2024, GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
- Octo Model Team, 2024, Octo: An Open-Source Generalist Robot Policy
- Fang, et al., 2025, Robix: A Unified Model for Robot Interaction, Reasoning and Planning
- NVIDIA, 2025, GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
- Yang, et al., 2025, FP3: A 3D Foundation Policy for Robotic Manipulation
- https://arxiv.org/abs/2508.07917
- Lee, et al., 2025, MolmoAct: Action Reasoning Models that can Reason in Space
Additional Resources
- Short Blog on VLAs by Chris Paxton
- U of T Robotics Institute Seminar on Robotics Foundation Models by Sergey Levine