Data Parallelism

Data parallelism is a parallel execution pattern in which multiple threads perform the same operation on separate data items. Covered in ECE459 L17.

Why split it this way?

The same computation runs many times, so the work divides cleanly by input. Assign slices to threads, and each thread does exactly the same thing on its slice.

Analogy: a call center where everyone handles support calls the same way. Example: doubling every element of a big array by giving each thread a slice.
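The slice-per-thread idea can be sketched in Rust with scoped threads. This is a minimal illustration, not the course's reference code; the function name, thread count, and the doubling operation are illustrative choices.

```rust
use std::thread;

// Data parallelism: split one array across threads; every thread
// runs the same operation (doubling) on its own disjoint slice.
fn parallel_double(data: &mut [u64], n_threads: usize) {
    // Round up so all elements are covered even when the length
    // does not divide evenly by the thread count.
    let chunk = (data.len() + n_threads - 1) / n_threads;
    // Scoped threads may borrow disjoint mutable slices of `data`.
    thread::scope(|s| {
        for slice in data.chunks_mut(chunk) {
            s.spawn(move || {
                for x in slice.iter_mut() {
                    *x *= 2; // identical work, different data
                }
            });
        }
    });
}

fn main() {
    let mut data: Vec<u64> = (0..1_000_000).collect();
    parallel_double(&mut data, 4);
    assert_eq!(data[3], 6);
    println!("doubled {} elements", data.len());
}
```

Note that no locking is needed: `chunks_mut` hands each thread a non-overlapping slice, so there is no shared mutable state to protect.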

Contrast this with task parallelism, where each thread performs a different operation, like stations on an assembly line.

SIMD is data parallelism inside a single core: one instruction applies the same operation to every element of a short vector. Combining multicore threading with SIMD exploits both levels at once.
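The vector-lane view can be sketched as below. This scalar Rust code only mimics the shape of SIMD, processing a fixed-width "lane" of 4 elements per step; a real SIMD instruction would handle the whole lane at once, and optimizing compilers routinely auto-vectorize loops of exactly this shape (actual codegen depends on target and optimization level). The function name and lane width are illustrative assumptions.

```rust
// Lane-at-a-time view of doubling: 4 elements per step.
// Hardware SIMD would perform each lane in a single instruction.
fn double_in_lanes(data: &mut [u64]) {
    let mut chunks = data.chunks_exact_mut(4);
    for lane in &mut chunks {
        for x in lane.iter_mut() {
            *x *= 2; // same operation on every lane element
        }
    }
    // Scalar cleanup for elements that don't fill a full lane.
    for x in chunks.into_remainder() {
        *x *= 2;
    }
}

fn main() {
    let mut v: Vec<u64> = (0..10).collect();
    double_in_lanes(&mut v);
    println!("{:?}", v);
}
```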

ML training context

Related to Distributed Machine Learning. Two main strategies:

- Data parallelism: replicate the model on every worker, split each training batch across workers, and combine (e.g., average) the resulting gradients.
- Model parallelism: split the model itself across workers, with each worker holding part of the parameters.