Heterogeneous Programming

Heterogeneous programming means writing code for systems that mix processor kinds, typically a CPU plus a GPU. Introduced in ECE459 L21. Examples include the PS3 Cell (a PowerPC core plus 8 SIMD coprocessors) [Ent08], CUDA, and OpenCL. The PS4 moved back to a CPU and GPU on a single AMD chip.

Why the split?

GPU cores are individually slower (~1.8 GHz on ecetesla2 vs ~3.6 GHz for its CPU), but there are far more of them (1920 CUDA cores). Offloading pays off when the parallel work outweighs the setup and data-transfer cost.

Programming model

The same shape works across Cell, CUDA, and OpenCL:

  1. Write the massively-parallel code (kernel) separately from the main code
  2. At runtime, set up the input
  3. Transfer data to the GPU
  4. Wait while the GPU runs the kernel
  5. Transfer results back
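In CUDA terms, the five steps above might look like the following host-side sketch. The kernel name `square` and the problem size are made up for illustration; the runtime calls (`cudaMalloc`, `cudaMemcpy`, `cudaDeviceSynchronize`) are the standard CUDA runtime API.

```
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Step 1: the massively-parallel kernel, written separately.
// Hypothetical example: square each element, one thread per element.
__global__ void square(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * data[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Step 2: at runtime, set up the input on the host.
    float *host = (float *) malloc(bytes);
    for (int i = 0; i < n; i++) host[i] = (float) i;

    // Step 3: transfer data to the GPU.
    float *dev;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    // Step 4: run the kernel and wait for it to finish.
    square<<<(n + 255) / 256, 256>>>(dev, n);
    cudaDeviceSynchronize();

    // Step 5: transfer results back.
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    printf("host[3] = %f\n", host[3]);

    cudaFree(dev);
    free(host);
    return 0;
}
```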

Data parallelism is the central feature: evaluate the same kernel at every point of a set (the index space). CUDA also supports task parallelism (different kernels running in parallel, each over a one-point index space), but the course sticks to data parallelism.
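To sketch the index-space idea in CUDA terms: the launch configuration defines the index space, and each thread uses its coordinates to pick out its one point. The 2D matrix-add kernel below is illustrative, not from the course.

```
// Illustrative 2D index space: one thread per matrix element.
// The launch <<<grid, block>>> defines the index space; each
// thread computes its (row, col) coordinates within it.
__global__ void matrixAdd(const float *a, const float *b,
                          float *c, int rows, int cols) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < rows && col < cols) {
        int i = row * cols + col;
        c[i] = a[i] + b[i];  // the kernel evaluated at one index point
    }
}

// Launched over a rows x cols index space, e.g.:
//   dim3 block(16, 16);
//   dim3 grid((cols + 15) / 16, (rows + 15) / 16);
//   matrixAdd<<<grid, block>>>(d_a, d_b, d_c, rows, cols);
```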

See the drive-vs-fly analogy in GPU Programming for when offload pays off.