Diffusion Policy
https://diffusion-policy.cs.columbia.edu/diffusion_policy_2023.pdf
The paper presents a novel method for generating robot behavior by representing visuomotor policies as conditional denoising diffusion processes. Diffusion Policy demonstrates superior performance across various robot manipulation tasks, with an average improvement of 46.9% over existing methods. Key advantages include handling multimodal action distributions, suitability for high-dimensional action spaces, and stable training.
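To make "conditional denoising" concrete, here is a minimal sketch of DDPM-style inference, assuming a noise-prediction network with signature noise_pred_net(a, k, obs_cond): start the action sequence from Gaussian noise and iteratively denoise it, conditioning on the observation features. The function name, signature, and linear beta schedule are illustrative assumptions, not the paper's exact implementation (the official notebooks, I believe, use a scheduler from the diffusers library rather than a hand-rolled loop like this).

import torch

@torch.no_grad()
def sample_actions(noise_pred_net, obs_cond, horizon, action_dim,
                   num_diffusion_iters=100, device="cpu"):
    # Standard DDPM variance schedule (linear betas here; other schedules work too)
    betas = torch.linspace(1e-4, 0.02, num_diffusion_iters, device=device)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    # A^K ~ N(0, I): start from pure noise over the action horizon
    a = torch.randn(1, horizon, action_dim, device=device)
    for k in reversed(range(num_diffusion_iters)):
        eps = noise_pred_net(a, k, obs_cond)               # predicted noise at step k
        coef = betas[k] / torch.sqrt(1.0 - alpha_bars[k])
        a = (a - coef * eps) / torch.sqrt(alphas[k])       # DDPM posterior mean
        if k > 0:                                          # add noise except at the final step
            a = a + torch.sqrt(betas[k]) * torch.randn_like(a)
    return a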
The paper also discusses Implicit Behavioral Cloning (IBC) as a baseline.
Jason Ma says that these example notebooks are pretty good:
- https://colab.research.google.com/drive/1gxdkgRVfM55zihY9TFLja97cSVZOZq2B?usp=sharing#scrollTo=X-XRB_g3vsgf - State-based notebook
- https://colab.research.google.com/drive/18GIHeOQ5DyjMN8iIRZL2EKZ0745NLIpg?usp=sharing - Vision-based notebook
Sample Notebook Example
They define a 1D U-Net architecture, ConditionalUnet1D, as the noise prediction network:
Components
- SinusoidalPosEmb: positional encoding for the diffusion iteration k
- Downsample1d: strided convolution to reduce temporal resolution
- Upsample1d: transposed convolution to increase temporal resolution
- Conv1dBlock: Conv1d → GroupNorm → Mish
- ConditionalResidualBlock1D: takes two inputs, x and cond. x is passed through two Conv1dBlocks stacked together with a residual connection; cond is applied to x with FiLM conditioning (see the sketch after this list).
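FiLM (feature-wise linear modulation) is the key conditioning mechanism: the conditioning vector is projected to a per-channel scale and bias that modulate the convolutional features. Below is a minimal sketch of such a block; the class and argument names are mine, and the notebook's ConditionalResidualBlock1D may differ in details (e.g., exactly where the modulation is applied).

import torch
from torch import nn

class Conv1dBlock(nn.Module):
    # Conv1d -> GroupNorm -> Mish, as listed above
    def __init__(self, in_ch, out_ch, kernel_size=3, n_groups=8):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
            nn.GroupNorm(n_groups, out_ch),
            nn.Mish(),
        )

    def forward(self, x):
        return self.block(x)

class FiLMResidualBlock1D(nn.Module):
    # Two Conv1dBlocks with a residual connection; cond produces a
    # per-channel (scale, bias) pair applied between them (FiLM).
    def __init__(self, in_ch, out_ch, cond_dim, kernel_size=3, n_groups=8):
        super().__init__()
        self.block1 = Conv1dBlock(in_ch, out_ch, kernel_size, n_groups)
        self.block2 = Conv1dBlock(out_ch, out_ch, kernel_size, n_groups)
        self.cond_proj = nn.Linear(cond_dim, out_ch * 2)  # FiLM generator
        # 1x1 conv so the skip path matches the output channel count
        self.residual = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x, cond):
        # x: (B, in_ch, T) action features; cond: (B, cond_dim) obs/timestep embedding
        out = self.block1(x)
        scale, bias = self.cond_proj(cond).chunk(2, dim=-1)
        out = scale.unsqueeze(-1) * out + bias.unsqueeze(-1)  # FiLM modulation
        out = self.block2(out)
        return out + self.residual(x)

For example, FiLMResidualBlock1D(64, 128, cond_dim=256) maps a (B, 64, T) input plus a 256-dim conditioning vector to (B, 128, T).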
Is this the same as the original U-Net? (Not exactly: the original U-Net is a 2D image-segmentation network; this variant is 1D over the action sequence's time axis and adds FiLM conditioning.)
This is the sinusoidal positional embedding code (review Positional Encoding)
import math
import torch
from torch import nn

class SinusoidalPosEmb(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        # x: (B,) tensor of diffusion iteration indices
        device = x.device
        half_dim = self.dim // 2
        # geometric frequency progression, as in Transformer positional encoding
        emb = math.log(10000) / (half_dim - 1)
        emb = torch.exp(torch.arange(half_dim, device=device) * -emb)
        # outer product: (B, 1) * (1, half_dim) -> (B, half_dim)
        emb = x[:, None] * emb[None, :]
        # concatenate sin and cos halves -> (B, dim)
        emb = torch.cat((emb.sin(), emb.cos()), dim=-1)
        return emb
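A quick shape sanity check, assuming the class above is defined:

# Embed a batch of diffusion iteration indices into 128-dim vectors
pos_emb = SinusoidalPosEmb(dim=128)
k = torch.tensor([0, 10, 99])
print(pos_emb(k).shape)  # torch.Size([3, 128])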
Note on "implicit policy": Implicit Behavioral Cloning represents the policy implicitly as an energy-based model, while Diffusion Policy instead learns the gradient field (score) of the action distribution via noise prediction, which sidesteps the EBM's intractable normalization constant and trains more stably.
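That stability comes from the training objective: the network is fit with a plain MSE regression onto sampled noise. A minimal sketch of the standard DDPM epsilon-prediction loss, with the same assumed noise_pred_net signature as above (here k is a batch of iteration indices):

import torch
import torch.nn.functional as F

def ddpm_loss(noise_pred_net, actions, obs_cond, alpha_bars):
    # actions: (B, horizon, action_dim); alpha_bars: (K,) cumulative alpha products
    B = actions.shape[0]
    k = torch.randint(0, alpha_bars.shape[0], (B,), device=actions.device)
    eps = torch.randn_like(actions)
    ab = alpha_bars[k].view(B, 1, 1)
    noisy = ab.sqrt() * actions + (1.0 - ab).sqrt() * eps  # forward (noising) process
    return F.mse_loss(noise_pred_net(noisy, k, obs_cond), eps)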