Masked Attention
https://www.youtube.com/watch?v=bCz4OMemCcA
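How is masked attention implemented? Just use a lower-triangular matrix, right? Attention scores above the diagonal are set to -inf before the softmax, so each position attends only to itself and to earlier positions.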
- Andrej Karpathy shows how this is implemented
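A minimal sketch of the lower-triangular idea in plain Python (no framework), assuming single-head scaled dot-product attention where queries, keys and values are lists of equal-length vectors. The helper names `causal_mask`, `softmax` and `masked_attention` are illustrative, not taken from the video. Masked positions (j > i) get a score of -inf, which the softmax turns into a weight of exactly 0. In PyTorch the same mask is commonly built with `torch.tril` and applied with `Tensor.masked_fill`.

```python
import math

def causal_mask(n):
    # Lower-triangular boolean matrix: mask[i][j] is True when position i may attend to j (j <= i).
    return [[j <= i for j in range(n)] for i in range(n)]

def softmax(row):
    # Numerically stable softmax; math.exp(-inf) == 0.0, so masked entries get zero weight.
    m = max(row)  # finite, because the diagonal entry is never masked
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention(q, k, v):
    # Causal scaled dot-product attention: returns (outputs, attention weights).
    n, d = len(q), len(q[0])
    mask = causal_mask(n)
    outputs, weights = [], []
    for i in range(n):
        # Scaled dot-product scores; future positions are replaced by -inf.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) if mask[i][j] else float("-inf")
            for j in range(n)
        ]
        w = softmax(scores)
        weights.append(w)
        # Output i is the weighted sum of value vectors at positions 0..i only.
        outputs.append([sum(w[j] * v[j][c] for j in range(n)) for c in range(len(v[0]))])
    return outputs, weights

q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = masked_attention(q, q, [[1.0], [2.0], [3.0]])
print(w[0])    # [1.0, 0.0, 0.0]: the first position can only see itself
print(out[0])  # [1.0]
```

Every weight above the diagonal is zero, so no position can draw information from later tokens. That is exactly what the lower-triangular matrix enforces.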