Row-Major Layout

This is the layout that makes more intuitive sense to me.

In CUDA

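CUDA C++ inherits its array layout from C and C++, so a matrix kept in a plain array or a flat buffer is stored in row-major order by default: the elements of each row sit next to each other in memory, and the rows follow one after another. Some libraries are an exception; cuBLAS, for example, expects column-major data.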
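To make the indexing concrete, here is a minimal sketch of how element (row, col) maps to a flat offset in a row-major buffer. The names `row_major_index`, `make_matrix`, and `HOST_DEVICE` are my own for illustration and do not come from any CUDA API. The macro only lets the same helper compile both with a regular C++ compiler and with nvcc, where it could also be called from a kernel.

```cpp
#include <cstddef>
#include <vector>

// Under nvcc, make the helper callable from kernels as well;
// with a plain C++ compiler the qualifier expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: flat offset of element (row, col) in a row-major
// matrix with `cols` columns. Elements of the same row are contiguous.
HOST_DEVICE inline std::size_t row_major_index(std::size_t row,
                                               std::size_t col,
                                               std::size_t cols) {
    return row * cols + col;
}

// Hypothetical helper: build a rows x cols matrix in a flat buffer where
// element (r, c) holds 10 * r + c, so the layout is easy to read off.
inline std::vector<int> make_matrix(std::size_t rows, std::size_t cols) {
    std::vector<int> m(rows * cols);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            m[row_major_index(r, c, cols)] = static_cast<int>(10 * r + c);
        }
    }
    return m;
}
```

For a 2 x 3 matrix, `make_matrix(2, 3)` lays the values out in memory as 0, 1, 2, 10, 11, 12: all of row 0 first, then all of row 1. A kernel reading the same buffer through a device pointer would use the same `row * cols + col` expression.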