Value Speculation

Value speculation runs a dependent computation in parallel with its predecessor by guessing the predecessor’s result. On a correct guess the two overlap; on a miss, the dependent work is redone with the real value. Covered in ECE459 L13.

Why does this work?

Most values in programs are predictable, just like most branches. If you can guess right most of the time, the rare recomputation is cheaper than always waiting.

How is this different from speculative execution?

Both run work in parallel on a guess, but they break different dependencies:

  • Speculative execution is control speculation: the two calls are independent and you guess whether the second is needed
  • Value speculation is data speculation: the second depends on the first’s return and you guess what value to feed it

Mispredict cost differs too: a mispredict in speculative execution only wastes work that ran in parallel, while a mispredict in value speculation pays to redo the dependent call serially.
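For contrast, a sketch of control speculation, using hypothetical maybe_needed and decide functions (not from the notes): the guess is whether the work will be needed at all, not what value feeds it.

```rust
use std::thread;

fn decide(x: i32) -> bool { x > 0 }      // hypothetical branch condition
fn maybe_needed(x: i32) -> i32 { x * 2 } // hypothetical possibly-needed work

fn control_speculated(x: i32) -> Option<i32> {
    // Start the possibly-needed work before the branch outcome is known.
    let t = thread::spawn(move || maybe_needed(x));
    let needed = decide(x);        // runs in parallel with maybe_needed
    let result = t.join().unwrap();
    if needed { Some(result) } else { None } // miss: discard the parallel work
}
```

On a miss we simply throw the result away; nothing has to be recomputed.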

Original, with a true data dependency between long_calculation and its successor:

fn do_other_work(x: i32, y: i32) -> i32 {
    let val = long_calculation(x, y);
    return second_long_calculation(val);
}

Speculated with two spawned threads, feeding last_value as the guess and falling back on a miscompare:

fn do_other_work(x: i32, y: i32, last_value: i32) -> i32 {
    // Run both calculations in parallel, feeding the guess to the second.
    let t1 = thread::spawn(move || long_calculation(x, y));
    let t2 = thread::spawn(move || second_long_calculation(last_value));
    let val = t1.join().unwrap();
    let v2 = t2.join().unwrap();
    if val == last_value {
        v2 // guess was right: the work overlapped
    } else {
        second_long_calculation(val) // miss: redo serially with the real value
    }
}

Speculated with one spawned thread, running long_calculation on the current thread to skip a spawn/join pair:

fn do_other_work(x: i32, y: i32, last_value: i32) -> i32 {
    // Only the speculated call gets a thread; the first call runs here.
    let t = thread::spawn(move || second_long_calculation(last_value));
    let val = long_calculation(x, y);
    let v2 = t.join().unwrap();
    if val == last_value {
        v2
    } else {
        second_long_calculation(val) // miss: redo serially
    }
}

The two-thread version is more uniform when you’re spawning many of these; the one-thread version has lower overhead for small counts. last_value comes from wherever your prediction does: a cached prior result, a known-common default, or the last observed value of the upstream computation.
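Putting the one-thread version together: a minimal runnable sketch, with sleep-based stand-ins for the two calculations (the workloads and delays are assumed), that feeds the last observed upstream value back in as the next prediction.

```rust
use std::thread;
use std::time::Duration;

// Stand-ins for the expensive calls; hypothetical workloads.
fn long_calculation(x: i32, y: i32) -> i32 {
    thread::sleep(Duration::from_millis(20));
    x + y
}

fn second_long_calculation(val: i32) -> i32 {
    thread::sleep(Duration::from_millis(20));
    val * 2
}

// Returns the observed upstream value along with the result, so the
// caller can use it as the next prediction.
fn do_other_work(x: i32, y: i32, last_value: i32) -> (i32, i32) {
    let t = thread::spawn(move || second_long_calculation(last_value));
    let val = long_calculation(x, y);
    let speculated = t.join().unwrap();
    let result = if val == last_value {
        speculated // hit: both calls overlapped
    } else {
        second_long_calculation(val) // miss: redo serially
    };
    (val, result)
}

fn main() {
    let mut prediction = 0;
    for x in [1, 1, 1, 2] {
        let (val, result) = do_other_work(x, 1, prediction);
        prediction = val; // last observed value becomes the next guess
        println!("{result}");
    }
}
```

With this input, the first and last iterations mispredict and pay the serial redo; the middle two hit and overlap both calls.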

Where it shows up

  • 90% of customers use the standard billing plan: assume it, correct on miss
  • Software cache over a rarely-changing DB value, skipping the lookup unless the guess turns out stale

Similar to memoization, but with parallelization layered on top.

Timing model

Original runtime:

    T = T₁ + T₂

Speculative runtime:

    T = max(T₁, T₂) + S + p · T₂

where:

  • T₁ = time for the first computation
  • T₂ = time for the second (dependent) computation
  • S = synchronization overhead
  • p = mispredict probability
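Plugging in illustrative numbers (assumed, not from the notes): with T₁ = 100 ms, T₂ = 80 ms, S = 1 ms, and p = 0.1, speculation cuts 180 ms to 109 ms.

```rust
fn main() {
    let (t1, t2, s, p) = (100.0_f64, 80.0, 1.0, 0.1);
    let original = t1 + t2;                    // 100 + 80 = 180 ms
    let speculative = t1.max(t2) + s + p * t2; // 100 + 1 + 8 = 109 ms
    println!("original = {original} ms, speculative = {speculative} ms");
}
```

The win shrinks as p grows or as S approaches the time saved by overlapping; at p = 1 speculation is strictly worse than the original.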

Same caveats as speculative execution

second_long_calculation must not depend on side effects of long_calculation (the speculated call may run before or alongside them), and long_calculation’s return value must be deterministic, or comparing it against the guess proves nothing.