Vision Transformer

Adaptive Layer Normalization (AdaLN)

AdaLN (adaptive layer normalization) is a conditioning technique for transformer blocks: instead of learning fixed scale and shift parameters in LayerNorm, the scale and shift are regressed from a conditioning signal (e.g., a timestep or class embedding), so the normalization adapts to the condition.

Used in the pi0 paper and DiT.

Links:

Normally, when you add a new input modality (e.g., image embeddings, action tokens, etc.) into a pretrained LLM backbone, you need to inject this new information in a way that doesn’t disrupt the pretrained distribution too much.

adaLN-Zero takes this a step further: the small network that predicts the scale and shift from the conditioning signal has its output layer initialized to zero, so at initialization the predicted modulation is:

  • Scale = 0
  • Shift = 0

This means that at the very start of training, the new conditioning path contributes nothing to the model's computation: the modulated LayerNorm reduces to a plain LayerNorm, and the model behaves like the pretrained LLM. As training proceeds, the zero-initialized layer learns nonzero outputs and the conditioning gradually takes effect.
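The identity-at-initialization property can be sketched in a few lines of NumPy. This is a minimal illustration, not DiT's actual implementation: the names `AdaLNZero` and `layer_norm` are mine, and I use the common `x * (1 + scale) + shift` modulation convention so that a zero-initialized predictor leaves the input unchanged.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Plain LayerNorm without learned affine parameters.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

class AdaLNZero:
    """Sketch of adaLN-Zero: scale/shift regressed from conditioning c.

    The linear layer mapping c -> (scale, shift) is zero-initialized,
    so at the start of training the modulation is the identity.
    """
    def __init__(self, dim, cond_dim):
        # Zero init: predicted scale = 0 and shift = 0 for any c.
        self.W = np.zeros((cond_dim, 2 * dim))
        self.b = np.zeros(2 * dim)

    def __call__(self, x, c):
        scale, shift = np.split(c @ self.W + self.b, 2, axis=-1)
        # (1 + scale) convention: zero scale means multiply by 1.
        return layer_norm(x) * (1 + scale) + shift
```

At initialization, `AdaLNZero(dim, cond_dim)(x, c)` returns exactly `layer_norm(x)` for any conditioning vector `c`, which is the "zero effect" behavior described above.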