Diffusion Policy
Diffusion policy presents a novel method for generating robot behavior by representing visuomotor policies as conditional denoising diffusion processes.
Advantages:
- handling multimodal action distributions
- suitability for high-dimensional action spaces
- stable training
The paper also discusses Implicit Behavior Cloning.
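Concretely, generating actions with such a policy means running the DDPM reverse process conditioned on recent observations. A minimal NumPy sketch of that sampling loop (the linear beta schedule, the `eps_model` signature, and the step count here are illustrative assumptions; the notebooks use `diffusers`' scheduler instead):

```python
import numpy as np

def sample_actions(eps_model, obs_cond, horizon, action_dim, n_steps=100, seed=0):
    """DDPM reverse process: iteratively denoise a Gaussian sample into actions."""
    rng = np.random.default_rng(seed)
    # Linear beta schedule (illustrative; other schedules work too)
    betas = np.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    a = rng.standard_normal((horizon, action_dim))  # a_K ~ N(0, I)
    for k in reversed(range(n_steps)):
        eps = eps_model(a, k, obs_cond)  # predicted noise at iteration k
        # Posterior mean of a_{k-1} given a_k and the predicted noise
        a = (a - betas[k] / np.sqrt(1.0 - alpha_bars[k]) * eps) / np.sqrt(alphas[k])
        if k > 0:
            a = a + np.sqrt(betas[k]) * rng.standard_normal(a.shape)
    return a
```

With a trained noise-prediction network as `eps_model`, the returned `(horizon, action_dim)` array is the denoised action sequence.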
Jason Ma says that these example notebooks are pretty good:
- https://colab.research.google.com/drive/1gxdkgRVfM55zihY9TFLja97cSVZOZq2B?usp=sharing#scrollTo=X-XRB_g3vsgf - State-based notebook
- https://colab.research.google.com/drive/18GIHeOQ5DyjMN8iIRZL2EKZ0745NLIpg?usp=sharing - vision-based notebook


Sample Notebook Example
They define a 1D U-Net architecture, `ConditionalUnet1D`, as the noise prediction network:
Components
- `SinusoidalPosEmb`: positional encoding for the diffusion iteration k
- `Downsample1d`: strided convolution to reduce temporal resolution
- `Upsample1d`: transposed convolution to increase temporal resolution
- `Conv1dBlock`: Conv1d → GroupNorm → Mish
- `ConditionalResidualBlock1D`: takes two inputs, `x` and `cond`. `x` is passed through two stacked `Conv1dBlock`s with a residual connection; `cond` is applied to `x` with FiLM conditioning
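FiLM (feature-wise linear modulation) predicts a per-channel scale and bias from the conditioning vector and applies them to the feature map. A minimal sketch, assuming a plain linear projection from the conditioning embedding (the `W`, `b` projection here is an illustrative stand-in for the block's learned conditioning encoder):

```python
import numpy as np

def film(x, cond, W, b):
    """Apply FiLM: per-channel scale and bias predicted from `cond`.

    x:    (channels, time) feature map from a Conv1dBlock
    cond: (cond_dim,) conditioning vector (diffusion step + observation embedding)
    W, b: linear projection mapping cond to 2 * channels outputs (scale, bias)
    """
    channels = x.shape[0]
    scale_bias = W @ cond + b                  # (2 * channels,)
    scale = scale_bias[:channels][:, None]     # broadcast over time
    bias = scale_bias[channels:][:, None]
    return scale * x + bias
```

If the projection outputs scale = 1 and bias = 0 for every channel, the block reduces to the identity, which is why FiLM composes cleanly with the residual connection.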
Is this the same as the original U-Net?
This is the sinusoidal positional embedding code (review Positional Encoding):
import math

import torch
from torch import nn

class SinusoidalPosEmb(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        device = x.device
        half_dim = self.dim // 2
        # log-spaced frequencies, as in the transformer positional encoding
        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)
        # outer product of timesteps and frequencies, then sin/cos halves
        emb = x[:, None] * emb[None, :]
        emb = torch.cat((emb.sin(), emb.cos()), dim=-1)
        return emb

They use an implicit policy.
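To see what this embedding produces, the same computation can be written in NumPy, assuming the same log-10000 frequency spacing as the torch code above. For diffusion step k = 0 the sin half is all zeros and the cos half all ones:

```python
import math

import numpy as np

def sinusoidal_pos_emb(steps, dim):
    """NumPy equivalent of SinusoidalPosEmb for a batch of diffusion steps."""
    half_dim = dim // 2
    freqs = np.exp(np.arange(half_dim) * -(math.log(10000) / (half_dim - 1)))
    angles = np.asarray(steps, dtype=float)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)

emb = sinusoidal_pos_emb([0, 5], dim=8)  # shape (2, 8)
```

Each diffusion iteration thus gets a unique, smooth `dim`-dimensional code that the U-Net consumes as part of `cond`.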