Matrix Multiplication (Compute)
Need to really understand this from a compute perspective.
Matrix multiplication is one of the most fundamental building blocks for fast computation.
Matrix multiplication resources
- https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#shared-memory
- https://siboehm.com/articles/22/Fast-MMM-on-CPU (CPU level multiplication)
- https://siboehm.com/articles/22/CUDA-MMM (GPU level multiplication)
Wow, this is actually kind of complicated.
The simplest implementation:
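// Naive row-major matmul: result (rows x columns) += left (rows x inners) * right (inners x columns).
// The caller must zero-initialize result, since the loop accumulates with +=.
template <int rows, int columns, int inners>
inline void matmulImplNaive(const float *left, const float *right,
                            float *result) {
  for (int row = 0; row < rows; row++) {
    for (int col = 0; col < columns; col++) {
      for (int inner = 0; inner < inners; inner++) {
        // left has `inners` elements per row, so its row stride is inners, not columns.
        result[row * columns + col] +=
            left[row * inners + inner] * right[inner * columns + col];
      }
    }
  }
}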
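
To sanity-check the triple loop, here is a minimal usage sketch on a non-square case. It assumes row-major storage, with left as rows x inners and right as inners x columns. The names matmulNaive and demo2x3Times3x2 are made up for illustration and are not from the linked articles. The kernel is restated so the snippet compiles on its own, and left is indexed with a row stride of inners. A non-square input is used deliberately: with square matrices, a row stride of columns for left gives the same answer and hides the indexing mistake.

```cpp
#include <array>
#include <cassert>

// Same naive triple loop as in the notes, restated so this snippet is
// self-contained. left is indexed by its own row stride (inners).
template <int rows, int columns, int inners>
inline void matmulNaive(const float *left, const float *right, float *result) {
  for (int row = 0; row < rows; row++) {
    for (int col = 0; col < columns; col++) {
      for (int inner = 0; inner < inners; inner++) {
        result[row * columns + col] +=
            left[row * inners + inner] * right[inner * columns + col];
      }
    }
  }
}

// Hypothetical helper: multiplies a 2x3 matrix by a 3x2 matrix.
inline std::array<float, 4> demo2x3Times3x2() {
  const std::array<float, 6> left = {1, 2, 3,
                                     4, 5, 6};   // 2 rows, 3 inners
  const std::array<float, 6> right = {7,  8,
                                      9,  10,
                                      11, 12};   // 3 inners, 2 columns
  // Value-initialized to zero: the kernel accumulates with +=.
  std::array<float, 4> result = {};
  matmulNaive<2, 2, 3>(left.data(), right.data(), result.data());
  return result;
}

// Checks the result against the product worked out by hand.
inline void checkDemo() {
  const std::array<float, 4> r = demo2x3Times3x2();
  assert(r[0] == 58.0f && r[1] == 64.0f);    // first row of the 2x2 result
  assert(r[2] == 139.0f && r[3] == 154.0f);  // second row of the 2x2 result
}
```

Calling checkDemo() from main exercises the kernel. The dimensions are template parameters, so the compiler knows all loop bounds at compile time, but every distinct shape instantiates its own copy of the function.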