Kernel Fusion
Kernel fusion is an optimization technique in GPU computing that combines multiple small GPU operations (kernels), for example several separate CUDA kernels, into a single, more efficient kernel.
GPUs execute operations in parallel, but every round trip through global memory slows execution down. In deep learning, models often perform sequences of operations, many of them element-wise, such as:
- Matrix multiplications
- Activation functions (ReLU, Softmax, etc.)
- Normalization (LayerNorm, BatchNorm, etc.)
- Scaling and element-wise additions
Each of these steps traditionally launches separate GPU kernels, leading to:
- Excessive memory read/write operations (global memory bandwidth bottleneck)
- High launch overhead (each kernel launch adds latency)
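As a minimal sketch (the kernel names here are hypothetical, not from any particular library), consider fusing a scaling step with a ReLU activation. Unfused, each element of the array is read from and written back to global memory twice, across two kernel launches; fused, it is read and written once, in a single launch:

```cuda
// Unfused version: two kernels, two launches, two full passes
// over global memory.
__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = a * x[i];          // pass 1: read, scale, write
}

__global__ void relu(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = fmaxf(x[i], 0.0f); // pass 2: read, clamp, write
}

// Fused version: one kernel, one launch, one read and one write
// per element.
__global__ void scale_relu(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = fmaxf(a * x[i], 0.0f);
}
```

The fused kernel produces the same result while halving both the global-memory traffic and the number of kernel launches, which is exactly where the savings described above come from.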