Warp Matrix Multiply Accumulate (WMMA)
Resources
- https://developer.nvidia.com/blog/programming-tensor-cores-cuda-9/
- https://developer.download.nvidia.com/video/gputechconf/gtc/2020/presentations/s21745-developing-cuda-kernels-to-push-tensor-cores-to-the-absolute-limit-on-nvidia-a100.pdf