Imitation Learning

Diffusion Policy

Diffusion policy presents a novel method for generating robot behavior by representing visuomotor policies as conditional denoising diffusion processes.
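Concretely, the policy samples an action sequence by starting from Gaussian noise and iteratively denoising it, conditioned on the observation. A toy sketch of this DDPM-style sampling loop (the schedule values and function names here are my assumptions, not the paper's actual code, which uses a trained noise-prediction network and a proper scheduler):

```python
import torch

def sample_action(noise_pred_net, obs_cond, action_shape, K=100):
    # Assumed linear beta schedule (hypothetical values)
    betas = torch.linspace(1e-4, 0.02, K)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(action_shape)  # start from pure Gaussian noise
    for k in reversed(range(K)):
        # network predicts the noise added at step k, given the observation
        eps = noise_pred_net(x, k, obs_cond)
        # DDPM posterior mean update
        x = (x - betas[k] / torch.sqrt(1 - alpha_bars[k]) * eps) / torch.sqrt(alphas[k])
        if k > 0:
            x = x + torch.sqrt(betas[k]) * torch.randn_like(x)
    return x

# dummy zero-noise network, just to check shapes
out = sample_action(lambda x, k, c: torch.zeros_like(x), None, (1, 16, 2))
print(out.shape)  # torch.Size([1, 16, 2])
```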

Advantages:

  • handling multimodal action distributions
  • suitability for high-dimensional action spaces
  • stable training

The paper also discusses Implicit Behavior Cloning.

Jason Ma says that these example notebooks are pretty good:

Sample Notebook Example

They define a 1D U-Net architecture, ConditionalUnet1D, as the noise prediction network. Components:

  • SinusoidalPosEmb: positional encoding for the diffusion iteration k
  • Downsample1d: strided convolution to reduce temporal resolution
  • Upsample1d: transposed convolution to increase temporal resolution
  • Conv1dBlock: Conv1d → GroupNorm → Mish
  • ConditionalResidualBlock1D: takes two inputs, x and cond.
    x is passed through 2 Conv1dBlocks stacked together with a residual connection; cond is applied to x with FiLM conditioning
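
The conditional residual block above can be sketched as follows. This is a minimal reconstruction from the component list, not the notebook's exact code; the kernel size, group count, and the shape of the FiLM projection are my assumptions:

```python
import torch
import torch.nn as nn

class Conv1dBlock(nn.Module):
    """Conv1d -> GroupNorm -> Mish, as listed above."""
    def __init__(self, in_ch, out_ch, kernel_size=3, n_groups=8):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
            nn.GroupNorm(n_groups, out_ch),
            nn.Mish(),
        )

    def forward(self, x):
        return self.block(x)

class ConditionalResidualBlock1D(nn.Module):
    """Two stacked Conv1dBlocks with a residual connection; cond modulates
    the features via FiLM (per-channel scale and bias)."""
    def __init__(self, in_ch, out_ch, cond_dim, kernel_size=3, n_groups=8):
        super().__init__()
        self.block1 = Conv1dBlock(in_ch, out_ch, kernel_size, n_groups)
        self.block2 = Conv1dBlock(out_ch, out_ch, kernel_size, n_groups)
        # FiLM: predict a scale and bias per channel from cond
        self.cond_encoder = nn.Sequential(nn.Mish(), nn.Linear(cond_dim, out_ch * 2))
        # 1x1 conv so the residual path matches the output channels
        self.residual_conv = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x, cond):
        out = self.block1(x)
        scale, bias = self.cond_encoder(cond).chunk(2, dim=-1)
        out = scale.unsqueeze(-1) * out + bias.unsqueeze(-1)  # FiLM conditioning
        out = self.block2(out)
        return out + self.residual_conv(x)

block = ConditionalResidualBlock1D(in_ch=4, out_ch=8, cond_dim=32)
out = block(torch.randn(2, 4, 16), torch.randn(2, 32))
print(out.shape)  # torch.Size([2, 8, 16])
```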

Is this the same as the original U-Net?

This is the sinusoidal positional embedding code (review Positional Encoding)

import math

import torch
import torch.nn as nn

class SinusoidalPosEmb(nn.Module):
    """Sinusoidal embedding of the diffusion iteration k, as in Transformers/DDPM."""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        device = x.device
        half_dim = self.dim // 2
        # log-spaced frequencies from 1 down to 1/10000
        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)
        # outer product: (batch,) x (half_dim,) -> (batch, half_dim)
        emb = x[:, None] * emb[None, :]
        # concatenate sin and cos halves -> (batch, dim)
        emb = torch.cat((emb.sin(), emb.cos()), dim=-1)
        return emb
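
A quick sanity check of the embedding (class repeated here so the snippet runs standalone; for k = 0 the sin half is all zeros and the cos half is all ones):

```python
import math
import torch
import torch.nn as nn

class SinusoidalPosEmb(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        device = x.device
        half_dim = self.dim // 2
        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)
        emb = x[:, None] * emb[None, :]
        emb = torch.cat((emb.sin(), emb.cos()), dim=-1)
        return emb

# embed a batch of diffusion timesteps k
pos_emb = SinusoidalPosEmb(dim=64)
k = torch.tensor([0.0, 10.0, 99.0])
out = pos_emb(k)
print(out.shape)  # torch.Size([3, 64])
```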

Implicit Behavior Cloning uses an implicit policy (an energy-based model over actions), which diffusion policy is compared against.