CUDA Shared Memory
I learned about this through the PMPP book (Programming Massively Parallel Processors).
CUDA Shared Variables
Shared variables in CUDA are stored in shared memory.
This is fundamental. The motivating example is tiled matrix multiplication, where the threads of a block stage data in shared memory instead of re-reading it from global memory.
You need to understand it in order to write fast kernels.
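To make the matrix multiplication example concrete, here is a minimal sketch of a tiled kernel. The tile width `TILE`, the kernel name `tiledMatMul`, and the host helper `tiledMatMulMaxError` are illustrative names chosen for this note, not something taken from the book verbatim. Each block loads one tile of A and one tile of B into shared memory, so every value fetched from global memory is reused `TILE` times.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cmath>
#include <vector>

#define TILE 16  // tile width; an illustrative choice, not a required value

// C = A * B for square n x n row-major matrices.
__global__ void tiledMatMul(const float* A, const float* B, float* C, int n) {
    // One tile of A and one tile of B, shared by all threads in the block.
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        int aCol = t * TILE + threadIdx.x;
        int bRow = t * TILE + threadIdx.y;
        // Each thread loads one element of each tile (zero-padded at the edges).
        As[threadIdx.y][threadIdx.x] = (row < n && aCol < n) ? A[row * n + aCol] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (bRow < n && col < n) ? B[bRow * n + col] : 0.0f;
        __syncthreads();  // wait until the whole tile is loaded

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // wait before the tile is overwritten
    }
    if (row < n && col < n) C[row * n + col] = acc;
}

// Hypothetical host helper: runs the kernel and returns the largest
// absolute difference against a plain CPU reference.
float tiledMatMulMaxError(int n) {
    std::vector<float> hA(n * n), hB(n * n), hC(n * n);
    for (int i = 0; i < n * n; ++i) {
        hA[i] = (i % 7) * 0.5f;
        hB[i] = (float)(i % 5) - 2.0f;
    }
    size_t bytes = (size_t)n * n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    tiledMatMul<<<grid, block>>>(dA, dB, dC, n);
    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);

    float maxErr = 0.0f;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float ref = 0.0f;
            for (int k = 0; k < n; ++k) ref += hA[i * n + k] * hB[k * n + j];
            maxErr = std::max(maxErr, std::fabs(ref - hC[i * n + j]));
        }
    return maxErr;
}
```

Calling `tiledMatMulMaxError(37)` from a `main` exercises the edge handling, since 37 is not a multiple of the tile width. Error checking on the CUDA calls is left out to keep the sketch short.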
Resources
Limit on Shared Memory
CUDA shared memory is limited per thread block: 48 KB by default. Some newer GPUs allow a kernel to opt in to more, but only for dynamically allocated shared memory.
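The limit can be checked on a given machine rather than assumed. A small sketch using `cudaGetDeviceProperties` from the CUDA runtime API; the helper name `sharedMemPerBlockBytes` is made up for this note.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical helper: returns the default per-block shared memory limit
// in bytes for the given device, or 0 if the query fails.
size_t sharedMemPerBlockBytes(int device) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return 0;
    return prop.sharedMemPerBlock;  // typically 49152 (48 KB)
}
```

Printing `sharedMemPerBlockBytes(0)` from `main` shows the value for the first GPU in the system.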
If a variable declaration is preceded by the __shared__
keyword, it declares a shared variable in CUDA.
Shared variables reside in shared memory.
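A minimal sketch of such a declaration, assuming a single block of `N` threads: the kernel copies an array into a `__shared__` buffer and writes it back reversed. The names `reverseInBlock`, `reverseWorks`, and `N` are illustrative.

```cuda
#include <cuda_runtime.h>

#define N 64  // array length and block size; an illustrative choice

__global__ void reverseInBlock(int* data) {
    __shared__ int buf[N];  // declared with __shared__, so it lives in shared memory
    int t = threadIdx.x;
    buf[t] = data[t];
    __syncthreads();        // make sure every element is in buf before reading
    data[t] = buf[N - 1 - t];
}

// Hypothetical host helper: returns true if the array comes back reversed.
bool reverseWorks() {
    int h[N];
    for (int i = 0; i < N; ++i) h[i] = i;

    int* d;
    cudaMalloc(&d, sizeof(h));
    cudaMemcpy(d, h, sizeof(h), cudaMemcpyHostToDevice);
    reverseInBlock<<<1, N>>>(d);  // one block of N threads
    cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < N; ++i)
        if (h[i] != N - 1 - i) return false;
    return true;
}
```

The `__syncthreads()` call matters here: without it a thread could read a slot of `buf` that another thread has not written yet.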
Shared Variable Scope
The scope of a shared variable is within a thread block: every thread in the block sees the same copy, and each block gets its own private copy.
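Shared memory can be thought of as similar to a CPU's L1 cache in that it is small, on-chip, and fast, except that the programmer manages its contents explicitly instead of the hardware.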
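Block scope can be seen in a per-block sum. In the sketch below every block declares the same `partial` array, yet each block reduces only its own slice of the input, because each block has a separate copy. The names `blockSum`, `blockSums`, and `BLOCK` are assumptions for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK 128  // threads per block; must be a power of two for this reduction

__global__ void blockSum(const int* in, int* out) {
    __shared__ int partial[BLOCK];  // one private copy per block
    int t = threadIdx.x;
    partial[t] = in[blockIdx.x * BLOCK + t];
    __syncthreads();

    // Tree reduction within the block.
    for (int s = BLOCK / 2; s > 0; s >>= 1) {
        if (t < s) partial[t] += partial[t + s];
        __syncthreads();
    }
    if (t == 0) out[blockIdx.x] = partial[0];
}

// Hypothetical host helper: fills block b's slice with the value b + 1
// and returns the sum computed by each block.
std::vector<int> blockSums(int numBlocks) {
    std::vector<int> hIn(numBlocks * BLOCK), hOut(numBlocks);
    for (int i = 0; i < numBlocks * BLOCK; ++i) hIn[i] = i / BLOCK + 1;

    int *dIn, *dOut;
    cudaMalloc(&dIn, hIn.size() * sizeof(int));
    cudaMalloc(&dOut, hOut.size() * sizeof(int));
    cudaMemcpy(dIn, hIn.data(), hIn.size() * sizeof(int), cudaMemcpyHostToDevice);
    blockSum<<<numBlocks, BLOCK>>>(dIn, dOut);
    cudaMemcpy(hOut.data(), dOut, hOut.size() * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return hOut;
}
```

With four blocks, block b should report `BLOCK * (b + 1)`, so the blocks do not interfere with one another even though they all write to a variable named `partial`.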
How is shared memory implemented?
The shared memory of a CUDA device is implemented with ?