DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills
Core idea: “Minimize tracking error”
Very FUNDAMENTAL paper; it really set the path for making locomotion work.
- A limitation is that it's a single-motion policy; later papers (and current work) try to address this
Mentioned in BeyondMimic.
DeepMimic project page / blog post:
https://bair.berkeley.edu/blog/2018/04/10/virtual-stuntman/
Where did they get the mocap from?
“Each skill is learned from approximately 0.5-5s of mocap data collected from http://mocap.cs.cmu.edu and http://mocap.cs.sfu.ca.”
The rewards:
- The imitation reward combines pose, velocity, end-effector, and center-of-mass tracking terms, and the gains are slightly tuned
- Is there normalization applied to those rewards?
- Each reward term is something along the lines of an exponentiated negative tracking error, exp(-k * error) (see the sketch below)
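A minimal sketch of that reward shape, assuming plain arrays for the character state rather than the paper's quaternion pose representation (the gains and weights mirror the values reported in the paper, but the function and key names here are my own):

```python
import numpy as np

def tracking_term(ref, sim, gain):
    """Exponentiated negative tracking error: exp(-gain * sum of squared errors)."""
    err = np.sum((np.asarray(ref) - np.asarray(sim)) ** 2)
    return np.exp(-gain * err)

def imitation_reward(ref, sim):
    """DeepMimic-style imitation reward: weighted sum of pose, velocity,
    end-effector, and center-of-mass tracking terms. `ref` and `sim` are
    dicts of arrays for the reference motion and the simulated character
    at the current time step (my naming, not the paper's)."""
    r_pose = tracking_term(ref["joint_pos"], sim["joint_pos"], gain=2.0)
    r_vel  = tracking_term(ref["joint_vel"], sim["joint_vel"], gain=0.1)
    r_ee   = tracking_term(ref["end_eff"],   sim["end_eff"],   gain=40.0)
    r_com  = tracking_term(ref["com"],       sim["com"],       gain=10.0)
    # Term weights from the paper: 0.65 / 0.1 / 0.15 / 0.1
    return 0.65 * r_pose + 0.1 * r_vel + 0.15 * r_ee + 0.1 * r_com
```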
Some insights (mostly from the blog post):
- Reference state initialization (RSI) is very important to get it to work
- Early termination: end the episode as soon as the character falls, otherwise the training data is dominated by failure states. "This is analogous to the class imbalance problem encountered by other methodologies such as supervised learning." (Both tricks are sketched below.)
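A rough sketch of how those two tricks typically slot into an episode rollout; the env, policy, and reference-motion interfaces here are hypothetical placeholders, not DeepMimic's actual code:

```python
import numpy as np

def run_episode(env, policy, ref_motion, max_steps=1000):
    """One rollout with reference state initialization (RSI) and early
    termination. `env`, `policy`, and `ref_motion` are assumed interfaces."""
    # RSI: start at a random phase of the reference motion instead of always
    # starting at t=0, so the policy sees every part of the clip early in training.
    phase = np.random.uniform(0.0, 1.0)
    state = env.reset(pose=ref_motion.pose_at(phase), vel=ref_motion.vel_at(phase))

    transitions = []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, info = env.step(action)
        transitions.append((state, action, reward, next_state))

        # Early termination: cut the episode when the character falls
        # (e.g. torso or head touches the ground), so rollouts aren't
        # dominated by states where the character is lying on the floor.
        if info["fallen"]:
            break
        state = next_state
    return transitions
```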