CUDA Memory

CUDA Shared Memory

I learned about this through the PMPP book (Programming Massively Parallel Processors).

CUDA Shared Variables

Shared variables in CUDA are stored in the shared memory.

This is fundamental; the motivating example is matrix multiplication.

You need to understand it in order to make kernels run fast.
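A minimal sketch of the matrix multiplication example, assuming square row-major n x n matrices and a tile width of 16. The names tiledMatMul, matMulOnGpu and TILE are made up for illustration and are not taken from the book. Each block copies one tile of A and one tile of B into shared memory, so every value loaded from global memory is reused TILE times.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 16;  // assumed tile width (one block = TILE x TILE threads)

// Computes C = A * B for square n x n row-major matrices.
__global__ void tiledMatMul(const float* A, const float* B, float* C, int n) {
    // One tile of A and one tile of B, shared by all threads in the block.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Each thread loads one element per tile; out-of-range slots become 0.
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // wait until the whole tile is loaded

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // wait before the tile is overwritten
    }
    if (row < n && col < n) C[row * n + col] = acc;
}

// Hypothetical host helper: copies inputs to the GPU, runs the kernel,
// and returns the result. Error checking is omitted for brevity.
std::vector<float> matMulOnGpu(const std::vector<float>& A,
                               const std::vector<float>& B, int n) {
    size_t bytes = sizeof(float) * n * n;
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    tiledMatMul<<<grid, block>>>(dA, dB, dC, n);

    std::vector<float> C(n * n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

Without the shared tiles, each thread would read a full row of A and a full column of B straight from global memory.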

Resources

Limit on Shared Memory

CUDA shared memory is limited per thread block: by default, a block can use at most 48 KB.
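The actual limit can be read from the device instead of being assumed. The sketch below uses cudaGetDeviceProperties and the sharedMemPerBlock and sharedMemPerBlockOptin fields of cudaDeviceProp; the two helper function names are hypothetical. On newer architectures a kernel can request more than the default through dynamic shared memory and cudaFuncSetAttribute with cudaFuncAttributeMaxDynamicSharedMemorySize, which is what the opt-in value refers to.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: default per-block shared memory limit in bytes
// for `device`, or 0 if the query fails.
size_t defaultSharedMemPerBlock(int device = 0) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return 0;
    return prop.sharedMemPerBlock;
}

// Hypothetical helper: the larger per-block limit a kernel may opt in to,
// in bytes, or 0 if the query fails.
size_t optInSharedMemPerBlock(int device = 0) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return 0;
    return prop.sharedMemPerBlockOptin;
}
```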

If a variable declaration is preceded by the __shared__ keyword, it declares a shared variable in CUDA.

Shared variables reside in shared memory.
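As a small illustration of the declaration syntax, the sketch below reverses an array inside a single block in two ways: with a statically sized __shared__ array, and with an extern __shared__ array whose size is given as the third kernel launch parameter. The kernel and helper names are made up for this example, and it assumes the array has at most BLOCK elements.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int BLOCK = 64;  // assumed block size; inputs must have n <= BLOCK

// Static shared memory: the size is fixed at compile time.
__global__ void reverseStatic(int* d, int n) {
    __shared__ int s[BLOCK];
    int t = threadIdx.x;
    if (t < n) s[t] = d[t];
    __syncthreads();  // all writes to s must finish before any thread reads
    if (t < n) d[t] = s[n - 1 - t];
}

// Dynamic shared memory: the size in bytes is passed at launch time.
__global__ void reverseDynamic(int* d, int n) {
    extern __shared__ int s[];
    int t = threadIdx.x;
    if (t < n) s[t] = d[t];
    __syncthreads();
    if (t < n) d[t] = s[n - 1 - t];
}

// Hypothetical host helper: reverses v on the GPU with one of the kernels.
// Error checking is omitted for brevity.
std::vector<int> reverseOnGpu(std::vector<int> v, bool useDynamic) {
    int n = static_cast<int>(v.size());
    size_t bytes = n * sizeof(int);
    int* d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice);
    if (useDynamic)
        reverseDynamic<<<1, BLOCK, bytes>>>(d, n);  // third argument: shared bytes
    else
        reverseStatic<<<1, BLOCK>>>(d, n);
    cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return v;
}
```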

Shared Variable Scope

The scope of a shared variable is a thread block: all threads in the block see the same copy, and each block gets its own separate copy.
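A small sketch of block scope, with hypothetical names countPerBlock and countThreadsPerBlock: every thread increments a shared counter, and thread 0 of each block writes the result out. Because each block has its own copy of the counter, every output entry equals the number of threads in one block, not the number of threads in the whole grid.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void countPerBlock(int* out) {
    __shared__ int counter;             // one copy per thread block
    if (threadIdx.x == 0) counter = 0;  // shared variables need explicit initialization
    __syncthreads();
    atomicAdd(&counter, 1);             // every thread in this block adds 1
    __syncthreads();
    if (threadIdx.x == 0) out[blockIdx.x] = counter;
}

// Hypothetical host helper: launches `blocks` blocks of `threads` threads
// and returns the counter value seen by each block. No error checking.
std::vector<int> countThreadsPerBlock(int blocks, int threads) {
    int* d = nullptr;
    cudaMalloc(&d, blocks * sizeof(int));
    countPerBlock<<<blocks, threads>>>(d);
    std::vector<int> out(blocks);
    cudaMemcpy(out.data(), d, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return out;
}
```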

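Shared memory can be thought of as similar to the L1 cache on a CPU: small, on-chip, and much faster than main (global) memory. The difference is that it is managed explicitly by the programmer rather than automatically by the hardware.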

How is shared memory implemented?

The shared memory of a CUDA device is implemented with ?