Imitation Learning

Diffusion Policy

Diffusion policy presents a novel method for generating robot behavior by representing visuomotor policies as conditional denoising diffusion processes.
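Concretely, the policy samples an action sequence by starting from Gaussian noise and iteratively denoising it, conditioned on the observation. A toy sketch of this DDPM-style sampling loop (the schedule values and function names here are my assumptions, not the paper's actual code, which uses a trained noise-prediction network and a proper scheduler):

```python
import torch

def sample_action(noise_pred_net, obs_cond, action_shape, K=100):
    # Assumed linear beta schedule (hypothetical values)
    betas = torch.linspace(1e-4, 0.02, K)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(action_shape)  # start from pure Gaussian noise
    for k in reversed(range(K)):
        # network predicts the noise added at step k, given the observation
        eps = noise_pred_net(x, k, obs_cond)
        # DDPM posterior mean update
        x = (x - betas[k] / torch.sqrt(1 - alpha_bars[k]) * eps) / torch.sqrt(alphas[k])
        if k > 0:
            x = x + torch.sqrt(betas[k]) * torch.randn_like(x)
    return x

# dummy zero-noise network, just to check shapes
out = sample_action(lambda x, k, c: torch.zeros_like(x), None, (1, 16, 2))
print(out.shape)  # torch.Size([1, 16, 2])
```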

Advantages:

  • handling multimodal action distributions
  • suitability for high-dimensional action spaces
  • stable training

The paper also discusses Implicit Behavior Cloning.

Jason Ma says that these example notebooks are pretty good:

Sample Notebook Example

They define a 1D U-Net architecture, ConditionalUnet1D, as the noise prediction network. Components:

  • SinusoidalPosEmb: positional encoding for the diffusion iteration k
  • Downsample1d: strided convolution to reduce temporal resolution
  • Upsample1d: transposed convolution to increase temporal resolution
  • Conv1dBlock: Conv1d → GroupNorm → Mish
  • ConditionalResidualBlock1D: takes two inputs, x and cond.
    x is passed through 2 Conv1dBlocks stacked together with a residual connection; cond is applied to x with FiLM conditioning
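
The conditional residual block above can be sketched as follows. This is a minimal reconstruction from the component list, not the notebook's exact code; the kernel size, group count, and the shape of the FiLM projection are my assumptions:

```python
import torch
import torch.nn as nn

class Conv1dBlock(nn.Module):
    """Conv1d -> GroupNorm -> Mish, as listed above."""
    def __init__(self, in_ch, out_ch, kernel_size=3, n_groups=8):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
            nn.GroupNorm(n_groups, out_ch),
            nn.Mish(),
        )

    def forward(self, x):
        return self.block(x)

class ConditionalResidualBlock1D(nn.Module):
    """Two stacked Conv1dBlocks with a residual connection; cond modulates
    the features via FiLM (per-channel scale and bias)."""
    def __init__(self, in_ch, out_ch, cond_dim, kernel_size=3, n_groups=8):
        super().__init__()
        self.block1 = Conv1dBlock(in_ch, out_ch, kernel_size, n_groups)
        self.block2 = Conv1dBlock(out_ch, out_ch, kernel_size, n_groups)
        # FiLM: predict a scale and bias per channel from cond
        self.cond_encoder = nn.Sequential(nn.Mish(), nn.Linear(cond_dim, out_ch * 2))
        # 1x1 conv so the residual path matches the output channels
        self.residual_conv = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x, cond):
        out = self.block1(x)
        scale, bias = self.cond_encoder(cond).chunk(2, dim=-1)
        out = scale.unsqueeze(-1) * out + bias.unsqueeze(-1)  # FiLM conditioning
        out = self.block2(out)
        return out + self.residual_conv(x)

block = ConditionalResidualBlock1D(in_ch=4, out_ch=8, cond_dim=32)
out = block(torch.randn(2, 4, 16), torch.randn(2, 32))
print(out.shape)  # torch.Size([2, 8, 16])
```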

Is this the same as the original U-Net?

This is the sinusoidal positional embedding code (review Positional Encoding)

import math

import torch
import torch.nn as nn

class SinusoidalPosEmb(nn.Module):
    """Sinusoidal embedding of the diffusion iteration k, as in Transformers/DDPM."""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        device = x.device
        half_dim = self.dim // 2
        # log-spaced frequencies from 1 down to 1/10000
        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)
        # outer product: (batch,) x (half_dim,) -> (batch, half_dim)
        emb = x[:, None] * emb[None, :]
        # concatenate sin and cos halves -> (batch, dim)
        emb = torch.cat((emb.sin(), emb.cos()), dim=-1)
        return emb
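
A quick sanity check of the embedding (class repeated here so the snippet runs standalone; for k = 0 the sin half is all zeros and the cos half is all ones):

```python
import math
import torch
import torch.nn as nn

class SinusoidalPosEmb(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        device = x.device
        half_dim = self.dim // 2
        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)
        emb = x[:, None] * emb[None, :]
        emb = torch.cat((emb.sin(), emb.cos()), dim=-1)
        return emb

# embed a batch of diffusion timesteps k
pos_emb = SinusoidalPosEmb(dim=64)
k = torch.tensor([0.0, 10.0, 99.0])
out = pos_emb(k)
print(out.shape)  # torch.Size([3, 64])
```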

Implicit Behavior Cloning uses an implicit policy (an energy-based model over actions), which diffusion policy is compared against.