Kernel Fusion
Kernel fusion is an optimization technique in GPU computing that combines multiple small GPU operations (kernels), for example several separate CUDA kernels, into a single, more efficient kernel.
GPUs execute operations in parallel, but every round trip through global memory slows execution down. In deep learning, models often perform sequences of operations, many of them element-wise, such as:
- Matrix multiplications
- Activation functions (ReLU, Softmax, etc.)
- Normalization (LayerNorm, BatchNorm, etc.)
- Scaling and element-wise additions
Each of these steps traditionally launches separate GPU kernels, leading to:
- Excessive memory read/write operations (global memory bandwidth bottleneck)
- High launch overhead (each kernel launch adds latency)
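As a minimal sketch (the kernel names here are hypothetical, not from any particular library), consider fusing a scaling step with a ReLU activation. Unfused, each element of the array is read from and written back to global memory twice, across two kernel launches; fused, it is read and written once, in a single launch:

```cuda
// Unfused version: two kernels, two launches, two full passes
// over global memory.
__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = a * x[i];          // pass 1: read, scale, write
}

__global__ void relu(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = fmaxf(x[i], 0.0f); // pass 2: read, clamp, write
}

// Fused version: one kernel, one launch, one read and one write
// per element.
__global__ void scale_relu(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = fmaxf(a * x[i], 0.0f);
}
```

The fused kernel produces the same result while halving both the global-memory traffic and the number of kernel launches, which is exactly where the savings described above come from.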