# CUDA Memory Allocation
When I first learned CUDA, I was introduced to `cudaMallocManaged`.
## CPU version

```cpp
int N = 2 << 20;
size_t size = N * sizeof(int);
int *a;
a = (int *)malloc(size);
... // Do stuff
free(a);
```

## GPU Version

```cpp
int N = 2 << 20;
size_t size = N * sizeof(int);
int *a;
cudaMallocManaged(&a, size);
... // Do stuff
cudaFree(a);
```

## Manual Device Memory Management
However, it is often better to manage memory manually, because it gives you finer control over where allocations live and when transfers happen:
- `cudaMalloc` will allocate memory directly on the active GPU
- `cudaMallocHost` will allocate pinned (page-locked) memory on the CPU
- `cudaMemcpy` can copy (not transfer) memory, either from host to device or from device to host
## Example

```cpp
int *host_a, *device_a;
cudaMalloc(&device_a, size);    // Device allocation
cudaMallocHost(&host_a, size);  // Host (pinned) allocation
cudaMemcpy(device_a, host_a, size, cudaMemcpyHostToDevice);
kernel<<<blocks, threads, 0, someStream>>>(device_a, N);
cudaMemcpy(host_a, device_a, size, cudaMemcpyDeviceToHost);
cudaFree(device_a);
cudaFreeHost(host_a); // Free pinned memory like this
```

## What is the point of `cudaMallocHost` when we have the `new` keyword?

In CUDA programming, when you are optimizing for performance, you would typically use `cudaMallocHost` to allocate memory that you plan to transfer between the CPU and GPU frequently. For memory that you don't intend to transfer, or for a typical C++ application that does not interface with a GPU, you would use `new`.
The `new` keyword is also more flexible, since it allows for object construction with constructors, unlike `cudaMallocHost`.
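A minimal sketch of that difference (the `Point` struct here is illustrative, not from the example above): `new` runs constructors, while `cudaMallocHost` hands back raw, uninitialized pinned bytes.

```cpp
// Illustrative struct with a default constructor.
struct Point {
    float x, y;
    Point() : x(0.0f), y(0.0f) {}  // zeroes the fields
};

Point *objs = new Point[N];  // constructors run: every Point starts as (0, 0)

Point *pinned;
cudaMallocHost(&pinned, N * sizeof(Point));  // pinned, but no constructors run
// You must initialize `pinned` yourself before reading from it.

delete[] objs;
cudaFreeHost(pinned);
```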
## Can you do `cudaMemcpy` if the memory was created using the `new` keyword?

Yes, you can use `cudaMemcpy` with memory allocated using the `new` keyword, but that memory is pageable, so the transfer will not be as efficient as with pinned memory: the driver must first stage the pageable data through a pinned buffer before it can be DMA'd to the GPU.
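A sketch of the two paths, reusing `device_a` and `size` from the example above (error handling elided, not a benchmark):

```cpp
// Pageable path: memory from `new` works with cudaMemcpy, just more slowly,
// because the driver stages it through an internal pinned buffer.
int *pageable = new int[N];
cudaMemcpy(device_a, pageable, size, cudaMemcpyHostToDevice);

// Pinned path: cudaMallocHost memory can be transferred directly by DMA.
int *pinned;
cudaMallocHost(&pinned, size);
cudaMemcpy(device_a, pinned, size, cudaMemcpyHostToDevice);

delete[] pageable;
cudaFreeHost(pinned);
```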