Imitation Learning

Diffusion Policy

https://diffusion-policy.cs.columbia.edu/diffusion_policy_2023.pdf

Diffusion Policy is a method for generating robot behavior that represents a visuomotor policy as a conditional denoising diffusion process: actions are produced by iteratively denoising Gaussian noise, conditioned on visual observations. It demonstrates superior performance across various robot manipulation tasks, with an average improvement of 46.9% over existing methods. Key advantages include handling multimodal action distributions, suitability for high-dimensional action spaces, and stable training.
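
As a rough sketch of how inference works (a minimal example, assuming a trained noise-prediction network with the notebook's forward(sample, timestep, global_cond) signature and HuggingFace's diffusers DDPMScheduler; shapes and names are illustrative):

import torch
from diffusers import DDPMScheduler

# Illustrative shapes: predict a short action sequence
# conditioned on an observation embedding.
pred_horizon, action_dim = 16, 2

noise_scheduler = DDPMScheduler(
    num_train_timesteps=100,
    beta_schedule='squaredcos_cap_v2',
)

@torch.no_grad()
def sample_actions(noise_pred_net, obs_cond, num_inference_steps=100):
    # start from pure Gaussian noise in action space
    B = obs_cond.shape[0]
    naction = torch.randn(B, pred_horizon, action_dim, device=obs_cond.device)
    noise_scheduler.set_timesteps(num_inference_steps)
    for k in noise_scheduler.timesteps:
        # predict the noise present at diffusion iteration k
        noise_pred = noise_pred_net(naction, k, global_cond=obs_cond)
        # one reverse-diffusion step: remove part of the predicted noise
        naction = noise_scheduler.step(noise_pred, k, naction).prev_sample
    return naction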

The paper also discusses Implicit Behavioral Cloning (IBC) as a point of comparison.

Jason Ma says that these example notebooks are pretty good:

Sample Notebook Example

They define a 1D U-Net architecture, ConditionalUnet1D, as the noise prediction network. Components:

  • SinusoidalPosEmb: positional encoding for the diffusion iteration k
  • Downsample1d: strided convolution to reduce temporal resolution
  • Upsample1d: transposed convolution to increase temporal resolution
  • Conv1dBlock: Conv1d → GroupNorm → Mish
  • ConditionalResidualBlock1D: takes two inputs, x and cond.
    x is passed through two stacked Conv1dBlocks with a residual connection; cond is applied to x via FiLM conditioning (see the sketch after this list)
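
FiLM (feature-wise linear modulation) predicts a per-channel scale and bias from the conditioning vector and applies them to the feature map. A minimal sketch of the idea (not the exact block from the repo; names are illustrative):

import torch
import torch.nn as nn

class FiLM(nn.Module):
    """Scale and shift each channel of x using parameters
    predicted from the conditioning vector."""
    def __init__(self, cond_dim, channels):
        super().__init__()
        # one (scale, bias) pair per channel
        self.proj = nn.Linear(cond_dim, channels * 2)

    def forward(self, x, cond):
        # x: (B, C, T) feature map, cond: (B, cond_dim)
        scale, bias = self.proj(cond).chunk(2, dim=-1)
        return scale.unsqueeze(-1) * x + bias.unsqueeze(-1)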

Is this the same as the original U-Net?


This is the sinusoidal positional embedding code (review Positional Encoding)

import math
import torch
import torch.nn as nn

class SinusoidalPosEmb(nn.Module):
    """Sinusoidal embedding of the diffusion iteration k
    (same scheme as the Transformer positional encoding)."""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        # x: (B,) tensor of diffusion timestep indices
        device = x.device
        half_dim = self.dim // 2
        # log-spaced frequencies from 1 down to 1/10000
        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)
        # outer product of timesteps and frequencies -> (B, half_dim) angles
        emb = x[:, None] * emb[None, :]
        # concatenate sin and cos halves -> (B, dim)
        emb = torch.cat((emb.sin(), emb.cos()), dim=-1)
        return emb
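
A quick sanity check of the output shape (values illustrative): a batch of diffusion iteration indices maps to dim-dimensional vectors, sines in the first half and cosines in the second.

pos_emb = SinusoidalPosEmb(dim=256)
k = torch.tensor([0, 10, 99])   # a batch of diffusion iteration indices
emb = pos_emb(k)
print(emb.shape)                # torch.Size([3, 256])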

Note: the policy is implicit in the sense that actions are obtained by iteratively denoising noise samples, rather than regressed directly in a single forward pass.