CUDA Memory Allocation
When I first learned CUDA, I was introduced to `cudaMallocManaged`, which allocates unified memory that is accessible from both the host and the device.
CPU version
GPU version
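The same computation sketched with `cudaMallocManaged`; the kernel name and launch configuration are illustrative choices, not a definitive implementation:

```cuda
#include <iostream>
#include <cuda_runtime.h>

// GPU kernel: each thread handles one element.
__global__ void add(int n, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Managed (unified) memory: one pointer usable on host and device.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    const int blockSize = 256;
    const int numBlocks = (n + blockSize - 1) / blockSize;
    add<<<numBlocks, blockSize>>>(n, x, y);
    cudaDeviceSynchronize();   // wait for the kernel before reading y on the host

    std::cout << y[0] << "\n"; // every element should now be 3

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```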
Manual Device Memory Management
However, it is often better to manage memory manually, since that gives you finer control:
- `cudaMalloc` allocates memory directly on the active GPU
- `cudaMallocHost` allocates pinned (page-locked) memory on the host (CPU)
- `cudaMemcpy` copies (does not move) memory, either from host to device or from device to host
Example
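A sketch that ties the three calls together (the buffer names `h_x`/`d_x` and the `add` kernel are my own, assumed for illustration):

```cuda
#include <iostream>
#include <cuda_runtime.h>

__global__ void add(int n, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Pinned (page-locked) host memory: enables fast DMA transfers.
    float *h_x, *h_y;
    cudaMallocHost(&h_x, bytes);
    cudaMallocHost(&h_y, bytes);
    for (int i = 0; i < n; ++i) { h_x[i] = 1.0f; h_y[i] = 2.0f; }

    // Device memory on the active GPU.
    float *d_x, *d_y;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);

    // Copy host -> device, run the kernel, copy the result back.
    cudaMemcpy(d_x, h_x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, h_y, bytes, cudaMemcpyHostToDevice);
    add<<<(n + 255) / 256, 256>>>(n, d_x, d_y);
    cudaMemcpy(h_y, d_y, bytes, cudaMemcpyDeviceToHost);

    std::cout << h_y[0] << "\n"; // every element should now be 3

    cudaFree(d_x);  cudaFree(d_y);
    cudaFreeHost(h_x);  cudaFreeHost(h_y);
    return 0;
}
```

Note that pinned host buffers are released with `cudaFreeHost`, not `cudaFree` or `delete[]`.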
What is the point of `cudaMallocHost` when we have the `new` keyword?

In CUDA programming, when you are optimizing for performance, you would typically use `cudaMallocHost` to allocate memory that you plan to transfer between the CPU and GPU frequently. For memory that you do not intend to transfer, or for a typical C++ application that never touches a GPU, you would use `new`.

The `new` keyword is also more flexible, since it supports object construction via constructors, unlike `cudaMallocHost`.
Can you use `cudaMemcpy` if the memory was allocated with the `new` keyword?

Yes, you can use `cudaMemcpy` with memory allocated via `new`, but that memory is pageable, so the transfer will not be as efficient as with pinned memory.
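A minimal sketch of that case, copying from `new`-allocated (pageable) memory; buffer names are illustrative:

```cuda
#include <cuda_runtime.h>

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float* pageable = new float[n];  // ordinary, pageable host memory
    float* d_buf;
    cudaMalloc(&d_buf, bytes);

    // This works, but the driver first stages the data through an
    // internal pinned buffer, so it is slower than copying directly
    // from memory allocated with cudaMallocHost.
    cudaMemcpy(d_buf, pageable, bytes, cudaMemcpyHostToDevice);

    cudaFree(d_buf);
    delete[] pageable;
    return 0;
}
```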