GPU Optimization
In my own words: Optimizing GPUs.

- Source: Jane Street ML Talk
There are some fundamental ideas behind optimization:
- Principle of Locality for GPUs
- We want to keep data in Registers as much as possible for as long as possible. It’s
- Latency Hiding
- There is often a large overhead in memcpy. We can often hide this latency behind computation, for example by running the copy and the compute on 2 different CUDA streams.
- TODO: insert before-and-after profiling results
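
To make the locality point concrete, here is a minimal sketch of keeping a running value in a register instead of in global memory. The kernel names (`row_sum_global`, `row_sum_register`), the sizes, and the use of managed memory are my own assumptions for illustration and are not from the talk. One thread per row is not a coalesced access pattern; it is only there to keep the register-versus-global contrast easy to see.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Naive version: the running sum lives in global memory, so every iteration
// does a global read-modify-write on out[row]. Without __restrict__, the
// compiler generally has to assume `in` and `out` may alias.
__global__ void row_sum_global(const float* in, float* out, int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    out[row] = 0.0f;
    for (int c = 0; c < cols; ++c) {
        out[row] += in[row * cols + c];
    }
}

// Register version: the running sum is a local scalar that the compiler can
// keep in a register; global memory is written once at the end.
__global__ void row_sum_register(const float* in, float* out, int rows, int cols) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    float acc = 0.0f;
    for (int c = 0; c < cols; ++c) {
        acc += in[row * cols + c];
    }
    out[row] = acc;
}

// Runs both kernels; returns the number of wrong rows, or -1 on a CUDA error.
int run_row_sum_demo() {
    const int rows = 1024, cols = 256;
    float *in = nullptr, *out_global = nullptr, *out_register = nullptr;
    if (cudaMallocManaged(&in, rows * cols * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&out_global, rows * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&out_register, rows * sizeof(float)) != cudaSuccess) return -1;
    for (int i = 0; i < rows * cols; ++i) in[i] = 1.0f;

    const int threads = 128;
    const int blocks = (rows + threads - 1) / threads;
    row_sum_global<<<blocks, threads>>>(in, out_global, rows, cols);
    row_sum_register<<<blocks, threads>>>(in, out_register, rows, cols);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    // Every row holds `cols` ones, so each sum should equal cols exactly.
    int mismatches = 0;
    for (int r = 0; r < rows; ++r) {
        if (out_global[r] != (float)cols || out_register[r] != (float)cols) ++mismatches;
    }
    cudaFree(in);
    cudaFree(out_global);
    cudaFree(out_register);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_row_sum_demo());
    return 0;
}
```

Both kernels compute the same result; the difference should only show up in a profiler as fewer global memory transactions for the register version.
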
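
The latency-hiding idea with 2 CUDA streams can be sketched roughly as follows. This is an illustrative example under my own assumptions (the `scale` kernel, the buffer size, and splitting the work into two equal chunks are all hypothetical). It uses pinned host memory because asynchronous copies generally need it in order to overlap with kernel execution.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for "real" compute: multiply each element by a factor.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Splits the work across two streams so that one chunk's copy can overlap
// with the other chunk's kernel. Returns wrong-element count, or -1 on error.
int run_overlap_demo() {
    const int n = 1 << 22;
    const int kStreams = 2;
    const int chunk = n / kStreams;
    const size_t chunk_bytes = chunk * sizeof(float);

    float *h = nullptr, *d = nullptr;
    // Pinned (page-locked) host memory, so cudaMemcpyAsync can be truly async.
    if (cudaMallocHost(&h, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1;
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    cudaStream_t streams[kStreams];
    for (int s = 0; s < kStreams; ++s) cudaStreamCreate(&streams[s]);

    // Work within one stream runs in order; work in different streams may overlap.
    for (int s = 0; s < kStreams; ++s) {
        const int off = s * chunk;
        cudaMemcpyAsync(d + off, h + off, chunk_bytes, cudaMemcpyHostToDevice, streams[s]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + off, chunk, 2.0f);
        cudaMemcpyAsync(h + off, d + off, chunk_bytes, cudaMemcpyDeviceToHost, streams[s]);
    }
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h[i] != 2.0f) ++mismatches;
    }
    for (int s = 0; s < kStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
    cudaFreeHost(h);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_overlap_demo());
    return 0;
}
```

Whether the copies and kernels actually overlap depends on the hardware and driver, so the timeline is worth checking in a profiler rather than assuming it.
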
Memory bottleneck (within GPU):
Compute Bottleneck:
Kernel Overhead:
- Kernel Fusion
- Leverage CUDA Graphs
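
As a sketch of what kernel fusion looks like in practice, the example below computes an add followed by a ReLU in two ways: as two separate kernels with an intermediate buffer, and as one fused kernel. The operations and all names (`add`, `relu`, `add_relu_fused`) are assumptions chosen for illustration. The fused version saves one kernel launch and one round trip through global memory for the intermediate result.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Unfused: two launches, with `tmp` written to and read back from global memory.
__global__ void add(const float* a, const float* b, float* tmp, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tmp[i] = a[i] + b[i];
}

__global__ void relu(const float* tmp, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fmaxf(tmp[i], 0.0f);
}

// Fused: one launch, and the intermediate sum stays in a register.
__global__ void add_relu_fused(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fmaxf(a[i] + b[i], 0.0f);
}

// Returns how many elements differ between the two versions, or -1 on error.
int run_fusion_demo() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr, *tmp = nullptr;
    float *out_unfused = nullptr, *out_fused = nullptr;
    if (cudaMallocManaged(&a, bytes) != cudaSuccess) return -1;
    if (cudaMallocManaged(&b, bytes) != cudaSuccess) return -1;
    if (cudaMallocManaged(&tmp, bytes) != cudaSuccess) return -1;
    if (cudaMallocManaged(&out_unfused, bytes) != cudaSuccess) return -1;
    if (cudaMallocManaged(&out_fused, bytes) != cudaSuccess) return -1;

    // Mix of negative and positive inputs so the ReLU actually clamps something.
    for (int i = 0; i < n; ++i) {
        a[i] = (float)(i - n / 2);
        b[i] = 1.0f;
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    add<<<blocks, threads>>>(a, b, tmp, n);
    relu<<<blocks, threads>>>(tmp, out_unfused, n);
    add_relu_fused<<<blocks, threads>>>(a, b, out_fused, n);
    if (cudaDeviceSynchronize() != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (out_unfused[i] != out_fused[i]) ++mismatches;
    }
    cudaFree(a);
    cudaFree(b);
    cudaFree(tmp);
    cudaFree(out_unfused);
    cudaFree(out_fused);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_fusion_demo());
    return 0;
}
```
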
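
For CUDA Graphs, a minimal sketch using stream capture might look like the following. It records a short sequence of kernel launches once and then replays it with a single `cudaGraphLaunch` per replay, which is where the launch-overhead saving comes from. The `add_one` kernel and the counts are hypothetical, and the example assumes CUDA 11.4 or newer for `cudaGraphInstantiateWithFlags`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical tiny kernel: small enough that launch overhead would dominate.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Captures 10 launches into a graph and replays it 100 times.
// Returns the number of wrong elements, or -1 on a CUDA error.
int run_graph_demo() {
    const int n = 1 << 16;
    const int kKernelsPerGraph = 10;
    const int kReplays = 100;

    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return -1;
    cudaMemset(d, 0, n * sizeof(float));  // all-zero bytes represent 0.0f
    cudaDeviceSynchronize();

    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Record the launch sequence. Kernels are captured here, not executed.
    cudaGraph_t graph;
    cudaGraphExec_t exec;
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int k = 0; k < kKernelsPerGraph; ++k) {
        add_one<<<(n + 255) / 256, 256, 0, stream>>>(d, n);
    }
    if (cudaStreamEndCapture(stream, &graph) != cudaSuccess) return -1;
    if (cudaGraphInstantiateWithFlags(&exec, graph, 0) != cudaSuccess) return -1;

    // Each replay submits all captured kernels with one API call.
    for (int r = 0; r < kReplays; ++r) {
        cudaGraphLaunch(exec, stream);
    }
    if (cudaStreamSynchronize(stream) != cudaSuccess) return -1;

    std::vector<float> h(n);
    cudaMemcpy(h.data(), d, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Each element is incremented kKernelsPerGraph * kReplays times in total.
    const float expected = (float)(kKernelsPerGraph * kReplays);
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h[i] != expected) ++mismatches;
    }

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
    cudaFree(d);
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", run_graph_demo());
    return 0;
}
```
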
