CUDA Memory Alignment
- cudaMallocPitch
Understanding how memory is aligned is fundamental to making CUDA code run much faster, since aligned accesses can be served with fewer memory transactions.
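As a rough illustration of what a pitched allocation does: cudaMallocPitch pads each row of a 2D array so that every row starts at an aligned address, and it returns the padded row width in bytes (the pitch). The sketch below is host-side C++ only, so it runs without a GPU. `padded_pitch` and `pitched_element` are hypothetical helper names, and the alignment passed in is an assumption for illustration; the real pitch is chosen by the CUDA runtime and should always be taken from cudaMallocPitch itself.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round a row width (in bytes) up to the next multiple
// of `alignment`. This only mimics the idea behind a pitch; the actual value
// is decided by cudaMallocPitch, not by this formula.
std::size_t padded_pitch(std::size_t width_bytes, std::size_t alignment) {
    // The bit trick below requires a power-of-two alignment.
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (width_bytes + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: address of element (row, col) in a pitched buffer.
// Rows are `pitch` bytes apart, so the row offset is computed in bytes
// before the pointer is reinterpreted as T*.
template <typename T>
T* pitched_element(void* base, std::size_t pitch, std::size_t row, std::size_t col) {
    char* row_start = static_cast<char*>(base) + row * pitch;
    return reinterpret_cast<T*>(row_start) + col;
}
```

For example, a row of 100 floats is 400 bytes wide; with an assumed 512-byte alignment, `padded_pitch(400, 512)` gives 512, so each row carries 112 bytes of padding and every row start stays aligned.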
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-memory-accesses
“When a warp executes an instruction that accesses global memory, it coalesces the memory accesses of the threads within the warp into one or more of these memory transactions depending on the size of the word accessed by each thread and the distribution of the memory addresses across the threads”
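To make the quote concrete, here is a small host-side C++ sketch that counts how many memory segments a single warp touches for a given access pattern. It is a simplified model, not how the hardware is specified: `segments_touched` is a hypothetical name, the 32-byte segment size is an assumption (the actual transaction sizes depend on the device), and each thread `t` is assumed to read one word at `base + t * stride_bytes`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical model: count the distinct segments of `segment_bytes` bytes
// touched when every thread of a warp reads `word_bytes` bytes at
// base + t * stride_bytes. The default segment size is an assumption.
std::size_t segments_touched(std::uint64_t base,
                             std::size_t word_bytes,
                             std::size_t stride_bytes,
                             std::size_t segment_bytes = 32,
                             int warp_size = 32) {
    assert(word_bytes > 0 && segment_bytes > 0);
    std::set<std::uint64_t> segments;
    for (int t = 0; t < warp_size; ++t) {
        std::uint64_t first = base + static_cast<std::uint64_t>(t) * stride_bytes;
        std::uint64_t last = first + word_bytes - 1;
        // A word that straddles a segment boundary touches both segments.
        for (std::uint64_t s = first / segment_bytes; s <= last / segment_bytes; ++s) {
            segments.insert(s);
        }
    }
    return segments.size();
}
```

Under this model, 32 threads reading consecutive 4-byte words from an aligned base (`segments_touched(0, 4, 4)`) touch 4 segments. Shifting the base by 4 bytes (`segments_touched(4, 4, 4)`) touches 5, and a 128-byte stride (`segments_touched(0, 4, 128)`) touches 32, one per thread. That is the effect of "the distribution of the memory addresses across the threads" in the quote.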
Also see CUDA Memory.