Positional Encoding

Rotary Positional Encoding (RoPE)

RoPE applies a rotation to the query and key vectors in multi-head attention, with the rotation angle determined by each token's position index.

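The rotation is computed using predefined sin and cos functions, very similar in spirit to sinusoidal encoding. Instead of adding or concatenating a position vector, however, RoPE encodes position by rotating the vectors themselves: each pair of dimensions is treated as a point in a complex plane and rotated by an angle that depends on the position.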
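To make the rotation concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not a reference implementation: the function name rope_rotate is hypothetical, the frequency schedule with base 10000 is borrowed from sinusoidal encoding, and pairing adjacent dimensions (0,1), (2,3), ... is one possible layout; real implementations may pair dimensions differently.

```python
import numpy as np

def rope_rotate(x, positions, base=10000.0):
    """Rotate each row of x by angles that depend on its position.

    x:         array of shape (seq_len, dim), dim must be even
    positions: integer array of shape (seq_len,)
    base:      assumed frequency base (same convention as sinusoidal encoding)
    """
    seq_len, dim = x.shape
    if dim % 2 != 0:
        raise ValueError("dim must be even so dimensions can be paired")

    # One rotation frequency per pair of dimensions.
    freqs = base ** (-np.arange(0, dim, 2) / dim)      # shape (dim/2,)
    angles = positions[:, None] * freqs[None, :]       # shape (seq_len, dim/2)

    # Treat each adjacent pair (x[2i], x[2i+1]) as one complex number.
    x_complex = x[:, 0::2] + 1j * x[:, 1::2]

    # Multiplying by exp(i * angle) rotates the point in the complex plane.
    rotated = x_complex * np.exp(1j * angles)

    # Unpack the complex numbers back into interleaved real coordinates.
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = rotated.real
    out[:, 1::2] = rotated.imag
    return out
```

In use, the same function would be applied to the queries and to the keys before the attention dot product. Because both are rotated, the score between a query at position m and a key at position n depends only on the offset m - n, and the rotation leaves vector norms unchanged.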