Robot Learning Reading Group
Starting this robot learning reading group with New Systems; we may host more sessions afterward.
Similar events:
- https://ut-robotlearning.github.io/
- https://www.youtube.com/@AustinRobotics
- Sri's presentation (in EB Garamond) looks very clean
Questions we should answer as we read these papers:
- What are the authors trying to do? Articulate their objectives.
- How was it done prior to their work, and what were the limits of current practice?
- What is new in their approach, and why do they think it will be successful?
- What are the mid-term and final “exams” to check for success? (i.e., How is the method evaluated?)
- What limitations do the authors mention (and which do they omit)?
About
Meetup to discuss state-of-the-art research on robot learning, similar to the Toronto ML/Systems Reading Group and the Vector Institute's Machine Learning Lunches. List of topics & articles below. All are welcome! 🎉
Annotate papers through alphaXiv:
- Join the Waterloo group here: https://www.alphaxiv.org/invite/776bc8f4-7d98-4185-83e1-b6da5120a4c9
Week ? - Out of Distribution
One of the key challenges is robustness: what happens when a robot goes out of distribution, and how does it recover? I would say this is the key challenge right now.
Offline RL enables learning from suboptimal demonstrations, but that does not by itself make a policy robust to edge cases; a minimal OOD-detection sketch follows below.
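As a starting point for discussion, here is a minimal sketch of one common way to flag out-of-distribution states: disagreement across an ensemble of policies. Everything here (the toy data, the linear policies, the `disagreement` helper) is illustrative, not taken from any paper above.

```python
# Minimal sketch: flagging out-of-distribution states via ensemble disagreement.
import numpy as np

rng = np.random.default_rng(0)

# Toy "demonstrations": states in [-1, 1], actions from an unknown linear policy.
states = rng.uniform(-1.0, 1.0, size=(500, 4))
actions = states @ rng.normal(size=(4, 2)) + 0.05 * rng.normal(size=(500, 2))

# Fit an ensemble of linear policies on bootstrap resamples of the demos.
ensemble = []
for _ in range(8):
    idx = rng.integers(0, len(states), size=len(states))
    W, *_ = np.linalg.lstsq(states[idx], actions[idx], rcond=None)
    ensemble.append(W)

def disagreement(state):
    """Std. dev. across ensemble action predictions; high value = likely OOD."""
    preds = np.stack([state @ W for W in ensemble])
    return preds.std(axis=0).mean()

in_dist = rng.uniform(-1.0, 1.0, size=4)   # looks like the training data
out_dist = rng.uniform(5.0, 6.0, size=4)   # far outside the training support
print(f"in-dist: {disagreement(in_dist):.4f}  OOD: {disagreement(out_dist):.4f}")
```

The OOD state produces visibly larger disagreement, which a deployed policy could use as a trigger to slow down or ask for help.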
Week ? - Cross-Embodiment
Workshops:
- https://sites.google.com/view/xembodimentworkshop
- Kevin Black: talk on how they went from Octo to π0 (notes under π0)
- Edward Johns: in-context learning
Yang, et al., 2024. Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation.
Week ? - Egocentric Papers
- Kareer, et al., 2024. EgoMimic: Scaling Imitation Learning via Egocentric Video.
- Athalye, et al., 2025. From Pixels to Predicates: Learning Symbolic World Models via Pretrained Vision-Language Models.
- Doshi, et al., 2024. Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation.
- Patel, et al., 2024. GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization.
- Niu, et al., 2025. Pre-training Auto-regressive Robotic Models with 4D Representations.
Week ? - Shiza's top paper
Week ? - Learning from Human Videos
- Liu, et al., 2025. EgoZero: Robot Learning from Smart Glasses.
- Ye, et al., 2025. MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA.
Week 3 - Real2Sim + Sim2Real / Whole-Body Control
Kalaria, et al., 2025. DreamControl: Human-Inspired Whole-Body Humanoid Control for Scene Interaction via Guided Diffusion, presented by Ayush Garg
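For intuition on the guided-diffusion idea in DreamControl, here is a toy sketch of guidance during sampling: a task-reward gradient is added to the score of a motion prior while sampling with Langevin dynamics. The analytic prior, the quadratic reward, and the `guidance` weight are simplified stand-ins, not the paper's actual model or sampler.

```python
# Toy sketch of guidance during sampling (1-D Langevin dynamics).
import numpy as np

rng = np.random.default_rng(0)

def score(x):
    """Analytic score of a toy motion prior p(x) = N(2.0, 0.5^2)."""
    return -(x - 2.0) / 0.25

def guidance(x, goal, weight=4.0):
    """Gradient of a quadratic task reward pulling samples toward `goal`."""
    return -weight * (x - goal)

x = rng.normal(size=256)   # start from noise
step = 0.01
for _ in range(500):       # Langevin updates with guidance added to the score
    noise = rng.normal(size=x.shape)
    x += step * (score(x) + guidance(x, goal=3.0)) + np.sqrt(2 * step) * noise

# Samples settle between the prior mean (2.0) and the goal (3.0),
# weighted by the guidance strength (mean ~2.5 here).
print(f"sample mean: {x.mean():.2f}")
```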
Additional Reading List:
- Xie, et al., 2023. OmniControl: Control Any Joint at Any Time for Human Motion Generation.
- Cheng, et al., 2024. Expressive Whole-Body Control for Humanoid Robots.
- Fu, et al., 2024. HumanPlus: Humanoid Shadowing and Imitation from Humans.
- He, et al., 2024. HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots.
- Dugar, et al., 2024. Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots.
- Ji, et al., 2024. ExBody2: Advanced Expressive Humanoid Whole-Body Control.
- Kim, et al., 2024. ARMOR: Egocentric Perception for Humanoid Robot Motion Planning in Dense Environments.
- Serifi, et al., 2024. VMP: Versatile Motion Priors for Robustly Tracking Motion on Physical Characters.
- He, et al., 2024. Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation (H2O).
- He, et al., 2024. OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning.
- van Marum, et al., 2024. Revisiting Reward Design and Evaluation for Robust Standing and Walking.
- Radosavovic, et al., 2024. Humanoid Locomotion as Next Token Prediction.
- Gu, et al., 2024. Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning (ETH-Loco).
- Zhang, et al., 2025. FALCON: Learning Force-Adaptive Humanoid Loco-Manipulation.
- Li, et al., 2025. CLONE: Closed-Loop Whole-Body Humanoid Teleoperation for Long-Horizon Tasks.
- Ze, et al., 2025. TWIST: Teleoperated Whole-Body Imitation System.
- Zhang, et al., 2025. Unleashing Humanoid Reaching Potential via Real-world-Ready Skill Space (R2S2).
- Chen, et al., 2025. GMT: General Motion Tracking for Humanoid Whole-Body Control.
- Allshire, et al., 2025. Visual Imitation Enables Contextual Humanoid Control.
- Zhang, et al., 2025. Track Any Motions under Any Disturbances.
- He, et al., 2025. Attention-Based Map Encoding for Learning Generalized Legged Locomotion.
Background knowledge:
- Schulman, et al., 2017. Proximal Policy Optimization Algorithms. (Clipped-objective sketch after this list.)
- Chi, et al., 2023. Diffusion Policy: Visuomotor Policy Learning via Action Diffusion.
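For newcomers, a minimal numpy sketch of PPO's clipped surrogate objective from Schulman et al., 2017 (their Eq. 7); the batch values are made up and there is no actual policy network here.

```python
# Minimal sketch of PPO's clipped surrogate objective as a loss to minimize.
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """-E[min(r * A, clip(r, 1-eps, 1+eps) * A)], where r = pi_new / pi_old."""
    ratio = np.exp(logp_new - logp_old)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return -np.mean(np.minimum(ratio * advantages, clipped * advantages))

# Example: a toy batch of 4 transitions.
logp_old = np.log(np.array([0.2, 0.5, 0.1, 0.4]))
logp_new = np.log(np.array([0.3, 0.4, 0.3, 0.4]))
adv = np.array([1.0, -0.5, 2.0, 0.1])
print(f"loss: {ppo_clip_loss(logp_new, logp_old, adv):.4f}")
```

The clip keeps any single update from moving the policy too far from the one that collected the data, which is the paper's core stability trick.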
Extra articles:
- https://www.agilityrobotics.com/content/training-a-whole-body-control-foundation-model
- https://gofai2robots.substack.com/p/the-emerging-humanoid-motor-cortex
Week 2 - World Models
- 6:00pm - Assran, et al., 2025. V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning, presented by Steven Gong
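To ground the discussion, here is a minimal sketch of the planning recipe world-model papers like V-JEPA 2 build on: roll candidate action sequences through a latent dynamics model and execute the first action of the cheapest one (random-shooting MPC). The `dynamics` and `cost` functions are toy stand-ins, not the paper's trained modules.

```python
# Minimal sketch: planning with a latent world model via random shooting.
import numpy as np

rng = np.random.default_rng(0)

def dynamics(z, a):
    """Stand-in latent predictor z' = f(z, a); a trained model in practice."""
    return 0.9 * z + 0.5 * a

def cost(z, z_goal):
    """Distance to the goal's latent embedding."""
    return np.sum((z - z_goal) ** 2, axis=-1)

def plan(z0, z_goal, horizon=5, n_samples=512):
    """Sample action sequences, roll them out in latent space, and return the
    first action of the cheapest sequence (replanned every step in MPC)."""
    actions = rng.normal(size=(n_samples, horizon, z0.shape[-1]))
    z = np.broadcast_to(z0, (n_samples, z0.shape[-1])).copy()
    total = np.zeros(n_samples)
    for t in range(horizon):
        z = dynamics(z, actions[:, t])
        total += cost(z, z_goal)
    return actions[np.argmin(total), 0]

z0, z_goal = np.zeros(8), np.ones(8)
print("first action:", plan(z0, z_goal)[:4])
```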
Additional Reading List:
- Wu, et al., 2022. DayDreamer: World Models for Physical Robot Learning.
- Feng, et al., 2023. Finetuning Offline World Models in the Real World.
- Bar, et al., 2024. Navigation World Models.
- Bardes, et al., 2024. Revisiting Feature Prediction for Learning Visual Representations from Video.
- Richens, et al., 2024. Robust agents learn causal world models.
- ByteDance Seed, 2025. GR-3 Technical Report.
- Goff, et al., 2025. Learning to Drive from a World Model.
- Zhou, et al., 2025. DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning.
- Sobal, et al., 2025. Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models.
- NVIDIA, 2025. Cosmos-Drive-Dreams: Scalable Synthetic Driving Data Generation with World Foundation Models.
- NVIDIA, 2025. Cosmos World Foundation Model Platform for Physical AI.
- NVIDIA, 2025. Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning.
- Hu, et al., 2025. Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations.
- Wayve, 2025. GAIA-2: A Controllable Multi-View Generative World Model for Autonomous Driving.
- Kong, et al., 2025. 3D and 4D World Modeling: A Survey.
- Guo, et al., 2025. MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft.
- Hafner, et al., 2025. Mastering Diverse Domains through World Models.
- Bruce, et al., 2025. Genie: Generative Interactive Environments.
Workshops:
- CoRL 2025 https://simulatingrobotworlds.github.io/submit.html (see “Related Publications” section)
- https://physical-world-modeling.github.io/#schedule
- https://sites.google.com/view/worldmodel-iclr2025/home
- Recordings https://iclr.cc/virtual/2025/workshop/24000
- https://www.worldmodelworkshop.org/
- https://world-model-tutorial.github.io/#schedule
Week 1 - Robot Foundation Models
- 6:00pm - Physical Intelligence, 2024, π0: A Vision-Language-Action Flow Model for General Robot Control (flow-matching sketch after this schedule)
- 6:20pm - TRI LBM Team, 2025, A Careful Examination of Large Behavior Models for Multitask Dexterous Manipulation
- 6:40pm - Google, 2025, Gemini Robotics: Bringing AI into the Physical World
- 7:00pm - Open floor discussion on future directions
- 7:15pm - Wrap up and social
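Since π0 generates action chunks with flow matching, here is a toy numpy sketch of that family of training objective: interpolate from noise to an expert action chunk along a straight line and regress the constant velocity. The `velocity_model` here is an untrained stand-in; the real model conditions a VLM backbone on images and language, and this simplification is not the paper's exact recipe.

```python
# Toy sketch of a conditional flow-matching training objective.
import numpy as np

rng = np.random.default_rng(0)

def velocity_model(x_t, t, obs):
    """Stand-in for the learned velocity field v_theta(x_t, t | obs)."""
    return rng.normal(size=x_t.shape)  # untrained, for shapes only

# One training step's targets: interpolate noise -> expert action chunk.
actions = rng.normal(size=(32, 16))    # batch of expert action chunks
noise = rng.normal(size=actions.shape)
t = rng.uniform(size=(32, 1))
x_t = (1.0 - t) * noise + t * actions  # straight-line interpolant
v_target = actions - noise             # its constant velocity

obs = None                             # placeholder conditioning
loss = np.mean((velocity_model(x_t, t, obs) - v_target) ** 2)
print(f"flow-matching loss: {loss:.3f}")

# Inference integrates the learned field from noise to actions,
# e.g. Euler steps: x <- x + (1/K) * v_theta(x, t, obs) for t = 0 ... 1.
```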
Additional Reading List
- Brohan, et al., 2022, RT-1: Robotics Transformer for Real-World Control at Scale
- Driess, et al., 2023, PaLM-E: An Embodied Multimodal Language Model.
- Brohan, et al., 2023, RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
- Open X-Embodiment Collaboration, 2023, Open X-Embodiment: Robotic Learning Datasets and RT-X Models
- Chi, et al., 2023, Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
- Liu, et al., 2024, RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
- Etukuru, et al., 2024, Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments
- Kim, et al., 2024, OpenVLA: An Open-Source Vision-Language-Action Model
- Cheang, et al., 2024, GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
- Octo Model Team, 2024, Octo: An Open-Source Generalist Robot Policy
- Fang, et al., 2025, Robix: A Unified Model for Robot Interaction, Reasoning and Planning
- NVIDIA, 2025, GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
- Yang, et al., 2025, FP3: A 3D Foundation Policy for Robotic Manipulation
- https://arxiv.org/abs/2508.07917
- Lee, et al., 2025, MolmoAct: Action Reasoning Models that can Reason in Space
Additional Resources
- Short Blog on VLAs by Chris Paxton
- U of T Robotics Institute Seminar on Robotics Foundation Models by Sergey Levine