DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills

Core idea: “Minimize tracking error”

Very FUNDAMENTAL paper, really set the path for making locomotion work.

  • Limitation is that it’s a single-motion policy; later papers (and current work) try to address this

Mentioned in BeyondMimic.

DeepMimic project page:

Blog: https://bair.berkeley.edu/blog/2018/04/10/virtual-stuntman/

Where did they get the mocap from?

“Each skill is learned from approximately 0.5-5s of mocap data collected from http://mocap.cs.cmu.edu and http://mocap.cs.sfu.ca.”

The rewards:

  • The gains in the reward terms are slightly tuned

Is there normalization applied to those rewards?

Where each reward term is something along the lines of an exponential of a negative, weighted tracking error (sketch below):
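A hedged reconstruction of the imitation reward from the paper (the exact gains and the term weights, roughly 0.65 / 0.1 / 0.15 / 0.1, are from memory and may be slightly off; hatted quantities are the reference-motion targets):

```latex
% Imitation reward: weighted sum of exponentiated tracking errors
% r^I_t = w^p r^p_t + w^v r^v_t + w^e r^e_t + w^c r^c_t
\begin{align*}
r^p_t &= \exp\!\Big[-2\,\textstyle\sum_j \|\hat{q}^j_t \ominus q^j_t\|^2\Big]       && \text{joint orientations}\\
r^v_t &= \exp\!\Big[-0.1\,\textstyle\sum_j \|\hat{\dot q}^j_t - \dot q^j_t\|^2\Big] && \text{joint velocities}\\
r^e_t &= \exp\!\Big[-40\,\textstyle\sum_e \|\hat{p}^e_t - p^e_t\|^2\Big]            && \text{end-effector positions}\\
r^c_t &= \exp\!\Big[-10\,\|\hat{p}^c_t - p^c_t\|^2\Big]                             && \text{center-of-mass position}
\end{align*}
```

Each term is an exponential of a negative error, so it lies in (0, 1]; that already keeps the terms on a comparable scale without extra normalization, which may be the answer to the normalization question above.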

Some insights (mostly came from blog):

  • Reference state initialization is very important to get it to work
  • Early termination (both are sketched in the rollout loop after this list)
    • “This is analogous to the class imbalance problem encountered by other methodologies such as supervised learning.”
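A minimal sketch of how these two tricks fit into an episode rollout. `env`, `policy`, and `reference_motion` are hypothetical stand-ins, not the actual DeepMimic code:

```python
import numpy as np

def collect_episode(env, policy, reference_motion, max_steps=300, fall_height=0.3):
    # Reference state initialization (RSI): start each episode at a random phase
    # of the reference clip, so the policy gets to experience (and be rewarded
    # for) later parts of the motion without first having to learn everything
    # leading up to them.
    phase = np.random.uniform(0.0, 1.0)
    state = env.reset_to_reference(reference_motion, phase)

    trajectory = []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, info = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state

        # Early termination: cut the episode as soon as the character falls, so
        # the training data is not dominated by useless "lying on the ground"
        # samples (the class-imbalance analogy from the blog).
        if info["torso_height"] < fall_height:
            break

    return trajectory
```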