Corner Turning

Remember that matrices are generally stored in Row-Major Layout. This is inefficient for matrix-matrix multiplication: the second matrix is also stored in row-major order but is accessed column-wise, so consecutive elements of a column sit a full row apart in memory.

This means that no Memory Coalescing can be done for those accesses, so the multiplication is much slower.

The solution? Store the second input matrix in a column-major layout, so that the elements of each column are contiguous in memory. This technique is called corner turning.
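To make the indexing concrete, here is a minimal CPU-side sketch in C++. It only illustrates the layout change, not a GPU kernel, and it assumes square n x n matrices held in flat arrays. The helper names `to_column_major` and `matmul_b_col_major` are hypothetical, made up for this note.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: convert an n x n row-major matrix to column-major.
// Element (r, c) moves from index r * n + c to index c * n + r.
std::vector<float> to_column_major(const std::vector<float>& row_major,
                                   std::size_t n) {
    assert(row_major.size() == n * n);
    std::vector<float> col_major(n * n);
    for (std::size_t r = 0; r < n; ++r) {
        for (std::size_t c = 0; c < n; ++c) {
            col_major[c * n + r] = row_major[r * n + c];
        }
    }
    return col_major;
}

// Hypothetical helper: C = A * B, with A row-major and B column-major.
// The result C is written in row-major order.
std::vector<float> matmul_b_col_major(const std::vector<float>& a,
                                      const std::vector<float>& b_col,
                                      std::size_t n) {
    assert(a.size() == n * n && b_col.size() == n * n);
    std::vector<float> c(n * n, 0.0f);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                // Both operands advance by one element per step of k:
                // a walks along a row, b_col walks along a stored column.
                sum += a[row * n + k] * b_col[col * n + k];
            }
            c[row * n + col] = sum;
        }
    }
    return c;
}
```

With a row-major B the inner loop would read `b[k * n + col]`, jumping n elements per step. With the column-major copy it reads `b_col[col * n + k]`, one element per step.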

🛠️ Steven Gong
