Value Speculation
Value speculation runs a dependent computation in parallel with its predecessor by guessing the predecessor's result. On a correct guess the work overlaps; on a miss, the dependent work must be redone. Covered in ECE459 L13.
Why does this work?
Most values in programs are predictable, just like most branches. If you can guess right most of the time, the rare recomputation is cheaper than always waiting.
How is this different from speculative execution?
Both run work in parallel on a guess, but they break different dependencies:
- Speculative execution is control speculation: the two calls are independent and you guess whether the second is needed
- Value speculation is data speculation: the second depends on the first’s return and you guess what value to feed it
Mispredict cost differs too. Speculative execution throws away the wasted parallel work; value speculation pays to redo the dependent call serially.
Original, with a true data dependency between long_calculation and its successor:
```rust
fn do_other_work(x: i32, y: i32) -> i32 {
    let val = long_calculation(x, y);
    second_long_calculation(val)
}
```

Speculated with two spawned threads, feeding `last_value` as the guess and falling back on a miscompare:
```rust
fn do_other_work(x: i32, y: i32, last_value: i32) -> i32 {
    let t1 = thread::spawn(move || long_calculation(x, y));
    let t2 = thread::spawn(move || second_long_calculation(last_value));
    let val = t1.join().unwrap();
    let v2 = t2.join().unwrap();
    if val == last_value {
        return v2;
    }
    second_long_calculation(val)
}
```

Speculated with one spawned thread, running `long_calculation` on the current thread to skip a spawn/join pair:
```rust
fn do_other_work(x: i32, y: i32, last_value: i32) -> i32 {
    let t = thread::spawn(move || second_long_calculation(last_value));
    let val = long_calculation(x, y);
    let v2 = t.join().unwrap();
    if val == last_value {
        return v2;
    }
    second_long_calculation(val)
}
```

The two-thread version is more uniform when you're spawning many of these; the one-thread version has lower overhead for small counts. The `last_value` argument is whatever your prediction source supplies: a cached prior result, a known-common default, or the last observed value of the upstream computation.
Where it shows up
- 90% of customers use the standard billing plan: assume it, correct on miss
- Software cache over a rarely-changing DB value, skipping the lookup unless the guess turns out stale
Similar to memoization, but with parallelization layered on top.
Timing model
Original runtime:

T = T_1 + T_2

Speculative runtime:

T = max(T_1, T_2) + S + p * T_2

where:
- T_1 = time for the first computation
- T_2 = time for the second (dependent) computation
- S = synchronization (spawn/join) overhead
- p = mispredict probability
Same caveats as speculative execution
`second_long_calculation` must not depend on `long_calculation`'s side effects, and `long_calculation`'s return value must be deterministic.