CUDA

SIMT Architecture

SIMT (Single-Instruction, Multiple-Thread) Architecture.

Links

The multiprocessor creates, manages, schedules, and executes threads in groups of 32 parallel threads called warps.

Individual threads composing a warp start together at the same program address, but they have their own instruction address counter and register state and are therefore free to branch and execute independently.
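A minimal sketch of how this warp grouping shows up in code (the kernel name `warpDemo` and the launch configuration are my own, not from the guide): each thread derives its lane index (position within its warp, 0..31) and its warp index within the block from its thread index, using the built-in `warpSize`.

```cuda
#include <cstdio>

// Threads in a block are split into warps of 32 consecutive threads.
// laneId = position inside the warp (0..31), warpId = which warp of the block.
__global__ void warpDemo()
{
    int tid    = threadIdx.x;        // thread index within the block
    int laneId = tid % warpSize;     // lane within the warp
    int warpId = tid / warpSize;     // warp within the block

    // Only lane 0 of each warp prints, to keep the output readable.
    if (laneId == 0)
        printf("block %d, warp %d starts at thread %d\n",
               blockIdx.x, warpId, tid);
}

int main()
{
    warpDemo<<<2, 128>>>();          // 2 blocks of 128 threads = 4 warps per block
    cudaDeviceSynchronize();
    return 0;
}
```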

SIMD vs. SIMT?

Stephen Jones answered this in his talk; I should revisit it.

SIMT extends the SIMD concept to multiple threads and is found in GPU architectures. Unlike SIMD, where all lanes execute the same operation in lockstep, SIMT lets each thread handle conditional operations independently, so it adapts better when execution paths diverge based on the data. (Yes, this is Control Divergence: threads in the same warp take different branches, and the warp executes the paths serially with non-participating threads masked off.)
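A small sketch of the kind of data-dependent branch that causes this divergence (the kernel name `clampNegatives`, the input pattern, and the launch sizes are illustrative, not from any source): lanes in the same warp disagree on the condition, so the warp runs both branches one after the other.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Data-dependent branch: lanes in the same warp may disagree on the
// condition, so the warp runs both paths serially with inactive lanes
// masked off (branch / control divergence).
__global__ void clampNegatives(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (in[i] < 0.0f)        // some lanes take this path...
        out[i] = 0.0f;
    else                     // ...the remaining lanes take this one
        out[i] = in[i];
}

int main()
{
    const int n = 64;
    float h_in[n], h_out[n];
    for (int i = 0; i < n; ++i)
        h_in[i] = (i % 2 == 0) ? -1.0f : 1.0f;   // alternate signs to force divergence

    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    clampNegatives<<<1, 64>>>(d_in, d_out, n);   // one warp-divergent block of 2 warps
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("out[0]=%.1f out[1]=%.1f\n", h_out[0], h_out[1]);  // expect 0.0 and 1.0

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```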