Compiler Optimization

Loop Optimization

Loop optimization is the family of compiler transformations that restructure loops to reduce overhead, improve locality, or expose parallelism. Covered in ECE459 L18.

Why focus on loops?

Programs spend most of their time looping, especially loops with high iteration counts. Per-iteration wins multiply. Profiling tells you which loops matter.

Transformations

  • Induction variable elimination: an induction variable varies per iteration. Remove redundant ones that are computable from a primary induction variable
  • Scalar replacement: replace repeated a[i] with temp = a[i] and use temp. Needs proof that a[i] doesn’t change between reads
  • Bounds check elimination: sane languages check array bounds. If the loop never exceeds them, drop the checks. Idiomatic Rust uses IntoIterator and avoids the question
  • Loop unrolling: for i in &[1,2,3,4] { f(*i); } becomes f(0); f(1); f(2); f(3);. Less branching, exposes parallelism. Synergizes with software pipelining (overlap iterations) and SIMD
  • Loop interchange: swap nesting to match memory layout. Rust is row-major (a[1][1] next to a[1][2]). Iterate with the fast-changing index innermost so each step reads contiguous memory. OpenGL is column-major
  • Loop fusion: two loops over 0..100 merge into one. Saves loop overhead and reuses cache lines
  • Loop fission: inverse of fusion, sometimes better because loop bodies become simpler or more vectorizable
  • Loop-invariant code motion (hoisting): pull s = x * y out of the loop if neither x nor y change

Counter-intuitive trick from L18

For a large enough matrix, iterating “wrong-way” (column-major over row-major storage) can be faster. You trigger all page faults up front, populating the cache, then proceed unimpeded. Used in some matrix-multiply implementations.

Compilers do many of these automatically when they can prove safety. Aliasing ambiguity is the usual blocker. Rust’s &mut uniqueness helps, though LLVM may not fully exploit it.