PyTorch MultiheadAttention
See this first:
Transformers building blocks.
https://pytorch.org/tutorials/prototype/nestedtensor.html#why-nested-tensor
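Below is a minimal sketch of calling torch.nn.MultiheadAttention on a padded batch. It assumes a PyTorch version that supports `batch_first=True` (1.9 or later). The sizes, the sequence lengths, and the variable names are arbitrary illustration values, not taken from the tutorial. Two sequences of different lengths are padded to a common length, and `key_padding_mask` tells attention to ignore the padded keys.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

embed_dim, num_heads = 16, 4  # embed_dim must be divisible by num_heads
mha = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# Illustrative batch: two sequences of 3 and 5 tokens, zero-padded to length 5.
lengths = [3, 5]
max_len = max(lengths)
batch = torch.zeros(len(lengths), max_len, embed_dim)
for i, n in enumerate(lengths):
    batch[i, :n] = torch.randn(n, embed_dim)

# key_padding_mask has shape (batch, seq_len); True marks padded positions to ignore.
key_padding_mask = torch.arange(max_len)[None, :] >= torch.tensor(lengths)[:, None]

# Self-attention: query, key and value are the same tensor.
out, weights = mha(batch, batch, batch, key_padding_mask=key_padding_mask)

print(out.shape)      # (batch, seq_len, embed_dim)
print(weights.shape)  # (batch, seq_len, seq_len), averaged over heads by default
```

The padded query positions of the shorter sequence are still computed and later discarded. That wasted work on padding is the overhead that nested tensors are meant to avoid.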