CUDA Memory Allocation
When I first learned CUDA, I was introduced to cudaMallocManaged, which allocates unified memory accessible from both the CPU and the GPU.
CPU Version
int N = 2<<20;
size_t size = N * sizeof(int);
int *a;
a = (int *)malloc(size);
... // Do stuff
free(a);
GPU Version
int N = 2<<20;
size_t size = N * sizeof(int);
int *a;
cudaMallocManaged(&a, size);
... // Do stuff
cudaFree(a);
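With managed memory, the "do stuff" part usually means launching a kernel and then synchronizing before the CPU reads the results. Here is a minimal sketch of a complete program; the doubleElements kernel and the 256×256 launch configuration are my own illustration, not from the original:

```cuda
#include <cstdio>

__global__ void doubleElements(int *a, int N) {
    // Grid-stride loop so any launch configuration covers all N elements
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
         i += gridDim.x * blockDim.x)
        a[i] *= 2;
}

int main() {
    int N = 2<<20;
    size_t size = N * sizeof(int);

    int *a;
    cudaMallocManaged(&a, size);          // Accessible from host and device

    for (int i = 0; i < N; ++i) a[i] = i; // Initialize on the CPU

    doubleElements<<<256, 256>>>(a, N);
    cudaDeviceSynchronize();              // Required before the CPU reads again

    printf("a[10] = %d\n", a[10]);        // 20

    cudaFree(a);
}
```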
Manual Device Memory Management
However, it is often better to manage memory manually, because it gives you finer control over when and how data moves:
- cudaMalloc allocates memory directly on the active GPU
- cudaMallocHost allocates pinned (page-locked) memory on the CPU
- cudaMemcpy can copy (not transfer) memory, either from host to device or from device to host
Example
int N = 2<<20;
size_t size = N * sizeof(int);

int *host_a, *device_a;
cudaMalloc(&device_a, size);   // Device allocation
cudaMallocHost(&host_a, size); // Host allocation (pinned)

cudaMemcpy(device_a, host_a, size, cudaMemcpyHostToDevice);
kernel<<<blocks, threads, 0, someStream>>>(device_a, N); // blocks, threads, someStream defined elsewhere
cudaMemcpy(host_a, device_a, size, cudaMemcpyDeviceToHost);

cudaFree(device_a);
cudaFreeHost(host_a); // Pinned host memory is freed with cudaFreeHost, not free()
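None of the calls above are checked for errors. In real code every CUDA runtime call returns a cudaError_t worth inspecting; one common pattern is a small checking macro (the CUDA_CHECK name here is my own convention, not part of the CUDA API):

```cuda
#include <cstdio>
#include <cstdlib>

// Illustrative helper macro: aborts with file/line info on any CUDA error
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Usage:
// CUDA_CHECK(cudaMalloc(&device_a, size));
// CUDA_CHECK(cudaMemcpy(device_a, host_a, size, cudaMemcpyHostToDevice));
```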
What is the point of cudaMallocHost when we have the new keyword?

In CUDA programming, when you're optimizing for performance, you'd typically use cudaMallocHost to allocate memory that you plan to transfer between the CPU and GPU frequently. cudaMallocHost returns pinned (page-locked) memory, which the GPU's DMA engine can read and write directly, making transfers faster. For memory that you don't intend to transfer, or for a typical C++ application that does not interface with a GPU, you would use new.
The new keyword is also more flexible, since it allows for object construction with constructors, unlike cudaMallocHost.
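If you do need constructed C++ objects in pinned memory, one option (my own illustration, not from the original) is to combine the two: allocate raw pinned bytes with cudaMallocHost, then run constructors with placement new:

```cuda
#include <new>

struct Particle {
    float x, y, z;
    Particle() : x(0), y(0), z(0) {}
};

int main() {
    int N = 1024;
    Particle *p;
    cudaMallocHost(&p, N * sizeof(Particle));     // Raw pinned allocation
    for (int i = 0; i < N; ++i)
        new (&p[i]) Particle();                   // Placement new runs the constructor

    // ... use p, transfer with cudaMemcpy, etc. ...

    for (int i = 0; i < N; ++i) p[i].~Particle(); // Destroy before freeing
    cudaFreeHost(p);
}
```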
Can you do cudaMemcpy if the memory was created using the new keyword?

Yes, you can use cudaMemcpy with memory allocated using the new keyword, but that memory is pageable, so the driver must first stage the data through an internal pinned buffer, and the transfer will not be as efficient as with pinned memory.
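A quick sketch of the pageable case (variable names are illustrative):

```cuda
int N = 2<<20;
size_t size = N * sizeof(int);

int *host_a = new int[N]; // Pageable host memory
int *device_a;
cudaMalloc(&device_a, size);

// Works, but the driver stages the data through an internal pinned buffer,
// so it is slower than copying from cudaMallocHost memory
cudaMemcpy(device_a, host_a, size, cudaMemcpyHostToDevice);

cudaFree(device_a);
delete[] host_a;
```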