CUDA Best Practices

Tips

From https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#memory-optimizations

High Priority: Minimize data transfer between the host and the device, even if it means running some kernels on the device that do not show performance gains when compared with running them on the host CPU.

  • Peak theoretical bandwidth between device memory and the GPU is 898 GB/s (e.g., on an NVIDIA Tesla V100)
  • Peak theoretical bandwidth between host memory and device memory is 16 GB/s (PCIe x16 Gen3)
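
A minimal sketch of what this looks like in practice, assuming two hypothetical kernels `scale` and `addOffset`: even though the second stage does trivially little work and would show no speedup over the CPU, running it on the device keeps the intermediate data in device memory and avoids an extra round trip over PCIe.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Stage 1: element-wise scale, a good fit for the GPU.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Stage 2: trivial work that would show no speedup over the CPU,
// but running it on the device avoids a device->host->device round trip.
__global__ void addOffset(float *data, float offset, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += offset;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_data = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h_data[i] = 1.0f;

    float *d_data;
    cudaMalloc(&d_data, bytes);

    // One transfer in over PCIe (~16 GB/s) ...
    cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // ... intermediate results stay in device memory (~898 GB/s on a V100)
    // instead of crossing the PCIe bus between the two stages ...
    scale<<<blocks, threads>>>(d_data, 2.0f, n);
    addOffset<<<blocks, threads>>>(d_data, 0.5f, n);

    // ... and one transfer back out.
    cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);

    printf("h_data[0] = %f\n", h_data[0]);  // expect 2.5

    cudaFree(d_data);
    free(h_data);
    return 0;
}
```

The alternative (copying the intermediate array back to the host, adding the offset on the CPU, then copying it to the device again) would cross the much slower host-device link twice more, which is exactly the pattern the guideline warns against.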

https://blog.logicalincrements.com/2018/08/data-transfer-rates-bandwidth-cpu-ram-pcie-m-2-sata-usb-hdmi/