CUDA Shared Memory
CUDA Shared Variables
Shared variables in CUDA are stored in shared memory.
Shared memory is fundamental to CUDA performance; the classic motivating example is matrix multiplication.
Understanding it is essential for writing kernels that run fast.
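To make the matrix multiplication motivation concrete, here is a minimal sketch of a tiled kernel. Each block copies a tile of A and a tile of B into shared memory once, and every thread in the block then reuses those values instead of re-reading global memory. The names `matmulTiled`, `matmul`, and the tile width `TILE = 16` are assumptions chosen for illustration, the matrices are assumed square and row-major, and error checking is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int TILE = 16;  // hypothetical tile width: one block computes a TILE x TILE tile of C

// Computes C = A * B for square n x n row-major matrices.
__global__ void matmulTiled(const float* A, const float* B, float* C, int n) {
    __shared__ float tileA[TILE][TILE];  // tile of A shared by the whole block
    __shared__ float tileB[TILE][TILE];  // tile of B shared by the whole block

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Each thread loads one element of each tile; out-of-range elements become 0.
        tileA[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        tileB[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // both tiles must be fully loaded before anyone reads them

        for (int k = 0; k < TILE; ++k)
            acc += tileA[threadIdx.y][k] * tileB[k][threadIdx.x];
        __syncthreads();  // finish reading before the next iteration overwrites the tiles
    }
    if (row < n && col < n) C[row * n + col] = acc;
}

// Host wrapper (hypothetical helper): copies inputs to the device, runs the kernel, returns C.
std::vector<float> matmul(const std::vector<float>& A, const std::vector<float>& B, int n) {
    size_t bytes = size_t(n) * n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, A.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    matmulTiled<<<grid, block>>>(dA, dB, dC, n);

    std::vector<float> C(size_t(n) * n);
    cudaMemcpy(C.data(), dC, bytes, cudaMemcpyDeviceToHost);  // also waits for the kernel
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    return C;
}
```

With tiling, each element of A and B is read from global memory roughly n / TILE times instead of n times, which is where the speedup comes from.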
Limit on Shared Memory
Each thread block can use only a limited amount of shared memory: 48 KB by default.
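The limit for a particular GPU can be checked at runtime. A minimal sketch, assuming device 0 is the GPU of interest; the helper name `sharedMemPerBlock` is made up for this example, while `cudaGetDeviceProperties` and the `sharedMemPerBlock` field of `cudaDeviceProp` are part of the CUDA runtime API.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Returns the per-block shared memory limit in bytes for the given device,
// or 0 if the device properties could not be queried.
size_t sharedMemPerBlock(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return 0;
    return prop.sharedMemPerBlock;
}
```

Calling `sharedMemPerBlock(0)` from host code gives the value for the first GPU in the system.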
If a variable declaration is preceded by the __shared__ keyword, it declares a shared variable in CUDA.
Shared variables reside in shared memory.
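A small sketch of a __shared__ declaration in use: one thread block reverses an array by staging it in a shared buffer. The names `reverseInBlock` and `reverseOnDevice` and the length `N = 64` are assumptions for illustration, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int N = 64;  // hypothetical array length; must equal the block size used below

// Reverses an N-element array in place using a single thread block.
__global__ void reverseInBlock(int* data) {
    __shared__ int buf[N];   // shared variable: lives in shared memory
    int t = threadIdx.x;
    buf[t] = data[t];        // each thread stages one element
    __syncthreads();         // wait until every thread has written its element
    data[t] = buf[N - 1 - t];
}

// Host wrapper (hypothetical helper). Assumes host.size() == N.
std::vector<int> reverseOnDevice(std::vector<int> host) {
    int* d;
    cudaMalloc(&d, N * sizeof(int));
    cudaMemcpy(d, host.data(), N * sizeof(int), cudaMemcpyHostToDevice);
    reverseInBlock<<<1, N>>>(d);
    cudaMemcpy(host.data(), d, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return host;
}
```

The `__syncthreads()` call matters: without it a thread could read a slot of `buf` that another thread has not written yet.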
Shared Variable Scope
The scope of a shared variable is within a thread block.
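Shared memory is loosely comparable to the L1/L2 cache on a CPU: small, fast memory close to the cores, with the difference that the programmer manages it explicitly.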
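Block scope can be seen in a short sketch: every block gets its own instance of the shared variable, so a value written in one block is not visible in another. The names `blockTag` and `runBlockTag` are made up for this example, and error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void blockTag(int* out) {
    __shared__ int tag;                      // a separate instance exists for each block
    if (threadIdx.x == 0) tag = blockIdx.x;  // only thread 0 of this block writes it
    __syncthreads();                         // make the write visible to the whole block
    out[blockIdx.x * blockDim.x + threadIdx.x] = tag;
}

// Host wrapper (hypothetical helper): launches the kernel and returns what each thread saw.
std::vector<int> runBlockTag(int blocks, int threadsPerBlock) {
    int n = blocks * threadsPerBlock;
    int* d;
    cudaMalloc(&d, n * sizeof(int));
    blockTag<<<blocks, threadsPerBlock>>>(d);
    std::vector<int> host(n);
    cudaMemcpy(host.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return host;
}
```

Every thread reads the tag written by thread 0 of its own block, never the tag of a different block.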
How is shared memory implemented?
The shared memory of a CUDA device is implemented with ?