# Memory-Bound Program §

Notes copied from Chapter 4 of PMPP.

Memory-Bound Program

Memory-bound programs are programs whose execution speed is limited by memory access throughput.

One of the fundamental concept is the compute-to-global-memory-access ratio.

Memory is expensive

Compute is a lot faster than memory access, so you want this number to be as high as possible.

They motivate this through matrix multiplication.

__global__ void MatrixMulKernel(float* M, float* N, float* P,
int Width) {
// Calculate the row index of the P element and M
// Calculate the column index of P and N
if ((Row < Width) && (Col < Width)) {
float Pvalue = 0;
// each thread computes one element of the block sub-matrix
for (int k = 0; k < Width; ++k) {
Pvalue += M[Row*Width+k]*N[k*Width+Col];
}
P[Row*Width+Col] = Pvalue;
}
}

The above is slow

Every single time you are reading into M[...], N[...], or P[...], that is a read from global memory.

EXERCISE: Calculate the compute-to-memory-acess ratio. It is 1.0