Adaptive Layer Normalization (AdaLN)
AdaLN is a conditioning mechanism for transformer blocks: instead of LayerNorm's fixed learned affine parameters, a small network regresses the per-channel scale and shift from a conditioning signal (e.g., a timestep, class, or modality embedding).
Used in the DiT paper (as adaLN-Zero) and in the pi0 paper.
Normally, when you add a new input modality (e.g., image embeddings, action tokens, etc.) into a pretrained LLM backbone, you need to inject this new information in a way that doesn’t disrupt the pretrained distribution too much.
adaLN-Zero zero-initializes the final layer of the MLP that regresses the modulation parameters from the conditioning signal, so at the start of training:
- Scale = 0 (applied as (1 + scale), i.e., identity scaling)
- Shift = 0
- Gate = 0 (the per-channel scaling applied to the block's output before the residual connection)
Because the gate is zero, the new conditioning path contributes nothing to the model's computation at initialization: each block reduces to the identity, and the backbone behaves like the pretrained model.
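A minimal sketch of one adaLN-Zero block, written in plain numpy to stay framework-agnostic. All names (`AdaLNZero`, `W_mod`, `W_block`) are illustrative, not from any paper's code; the point is the zero-initialized modulation MLP, which makes the block an exact identity map before any training step:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # LayerNorm without learned affine parameters; the affine part
    # comes from the conditioning signal instead.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class AdaLNZero:
    """Hypothetical minimal adaLN-Zero block (sketch, not DiT's actual code)."""

    def __init__(self, dim, cond_dim, rng):
        # Stand-in for the block's sublayer (attention / MLP in a real model).
        self.W_block = rng.normal(size=(dim, dim)) * 0.02
        # Modulation MLP (single linear layer here), zero-initialized:
        # this is the "Zero" in adaLN-Zero.
        self.W_mod = np.zeros((cond_dim, 3 * dim))
        self.b_mod = np.zeros(3 * dim)

    def __call__(self, x, c):
        # Regress shift, scale, and gate from the conditioning vector c.
        shift, scale, gate = np.split(c @ self.W_mod + self.b_mod, 3, axis=-1)
        # Modulate the normalized activations; scale enters as (1 + scale),
        # so scale = 0 means identity scaling.
        h = layer_norm(x) * (1 + scale) + shift
        # gate = 0 at init, so the block output equals its input x.
        return x + gate * (h @ self.W_block)

rng = np.random.default_rng(0)
layer = AdaLNZero(dim=16, cond_dim=32, rng=rng)
x = rng.normal(size=(2, 8, 16))   # (batch, tokens, dim)
c = rng.normal(size=(2, 1, 32))   # conditioning, broadcast over tokens
out = layer(x, c)
assert np.allclose(out, x)  # identity at initialization
```

As training updates `W_mod` and `b_mod`, the conditioning path gradually turns on, which is what lets the new modality be injected without disturbing the pretrained distribution at step 0.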