GPU Optimization

In my own words: Optimizing GPUs.

There are some fundamental ideas behind optimization:

  • Principle of Locality for GPUs
    • We want to keep data in registers as much as possible, for as long as possible. It’s
  • Latency Hiding
    • memcpy often carries quite a big overhead. We can frequently hide this latency behind computation, for example by issuing the copy and the kernel on 2 different CUDA streams so that they overlap.
    • TODO: insert profiling results from before and after the change
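
A minimal sketch of the latency-hiding idea above, assuming the work can be split into two independent halves. The kernel `scale` and the helper `scale_overlapped` are hypothetical names made up for illustration; only the CUDA runtime calls are real API. Each half gets its own stream, so the copy for one half can overlap with the kernel for the other. This only works as intended if the host buffer is pinned (allocated with `cudaMallocHost`), otherwise the async copies may fall back to synchronous behaviour.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Hypothetical kernel: multiplies each element by `factor`.
__global__ void scale(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Scales n floats in h_data in place. The work is split across two streams so
// the memcpy of one half can overlap with the kernel of the other half.
// Assumption: h_data is pinned host memory (cudaMallocHost).
bool scale_overlapped(float* h_data, size_t n, float factor) {
    if (n == 0) return true;

    const int kStreams = 2;
    const unsigned kThreads = 256;

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return false;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    size_t chunk = (n + kStreams - 1) / kStreams;  // elements per stream
    for (int s = 0; s < kStreams; ++s) {
        size_t offset = (size_t)s * chunk;
        if (offset >= n) break;
        size_t count = (offset + chunk <= n) ? chunk : n - offset;
        size_t bytes = count * sizeof(float);

        // Copy in, compute, copy out: ordered within this stream,
        // but free to overlap with the other stream's operations.
        cudaMemcpyAsync(d_data + offset, h_data + offset, bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        unsigned blocks = (unsigned)((count + kThreads - 1) / kThreads);
        scale<<<blocks, kThreads, 0, streams[s]>>>(d_data + offset, count, factor);
        cudaMemcpyAsync(h_data + offset, d_data + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for both streams to finish before touching h_data on the host.
    cudaError_t err = cudaDeviceSynchronize();

    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d_data);
    return err == cudaSuccess;
}
```

To use it, allocate the host buffer with `cudaMallocHost`, fill it, and call `scale_overlapped(h, n, 2.0f)`. Whether the copy and the kernel actually overlap depends on the device and the chunk sizes, so it is worth confirming in a profiler timeline rather than assuming it.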

Memory Bottleneck (within GPU):

Compute Bottleneck:

Kernel Overhead: