Matrix Multiplication (Compute)

Need to understand this thoroughly from a compute perspective.

Matrix multiplication is one of the most fundamental building blocks of fast computation.

Matrix multiplication resources

Wow, this is actually kind of complicated.

The simplest implementation

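template <int rows, int columns, int inners>
inline void matmulImplNaive(const float *left, const float *right,
                            float *result) {
  // left is rows x inners, right is inners x columns, result is rows x columns,
  // all row-major. result must be zero-initialized because the loop accumulates.
  for (int row = 0; row < rows; row++) {
    for (int col = 0; col < columns; col++) {
      for (int inner = 0; inner < inners; inner++) {
        // The row stride of left is inners, not columns.
        result[row * columns + col] +=
            left[row * inners + inner] * right[inner * columns + col];
      }
    }
  }
}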
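
A small worked example helps check the indexing. The sketch below is self-contained: it restates the naive triple loop under the assumption that all three matrices are row-major, with left of shape rows x inners and right of shape inners x columns, so the row stride of left is inners. The names matmulNaiveSketch and exampleProduct are hypothetical, introduced only for this illustration. The shapes are deliberately non-square (2x3 times 3x2), because with square matrices a wrong stride would go unnoticed.

```cpp
#include <array>
#include <cassert>

// Naive triple loop (illustrative sketch, assumes row-major storage):
//   result (rows x columns) += left (rows x inners) * right (inners x columns)
// The caller must zero result first, since the loop only accumulates.
template <int rows, int columns, int inners>
inline void matmulNaiveSketch(const float *left, const float *right,
                              float *result) {
  assert(left != nullptr && right != nullptr && result != nullptr);
  for (int row = 0; row < rows; row++) {
    for (int col = 0; col < columns; col++) {
      for (int inner = 0; inner < inners; inner++) {
        result[row * columns + col] +=
            left[row * inners + inner] * right[inner * columns + col];
      }
    }
  }
}

// Hypothetical helper: multiplies a fixed 2x3 matrix by a fixed 3x2 matrix.
inline std::array<float, 4> exampleProduct() {
  // left  = [1 2 3]      right = [ 7  8]
  //         [4 5 6]              [ 9 10]
  //                              [11 12]
  const std::array<float, 6> left = {1, 2, 3, 4, 5, 6};
  const std::array<float, 6> right = {7, 8, 9, 10, 11, 12};
  std::array<float, 4> result{};  // value-initialized to all zeros
  matmulNaiveSketch<2, 2, 3>(left.data(), right.data(), result.data());
  return result;
}
```

Working it by hand, the top-left entry is 1*7 + 2*9 + 3*11 = 58, and the full product is [58 64; 139 154]. The loop performs rows * columns * inners multiply-adds, so 12 for this example.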