Self-Optimizing Software

Self-optimizing software changes itself at runtime to run faster. Introduced in ECE459 L20. A conventional compiler can only optimize at compile time, but a sufficiently smart program can adapt once real data arrives.

Why bother at runtime?

Compile-time decisions are made without seeing the real workload. Runtime observation lets the program swap in a better decision once it knows what it is actually doing.

Three tiers, increasing in ambition:

  1. Change what’s in memory: caching. Track popular exchange rates in memory; let CAD-EUR become hot automatically, and swap in CAD-GBP when that pair takes over. Your program should already be doing this
  2. Observe and change configuration. Pick among alternatives based on measured behaviour
  3. Change the binary itself
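Tier 1 can be sketched in a few lines. This is a minimal illustration (the struct, the hit/miss counters, and the hard-coded rates are all invented here, not from the lecture): a cache keyed by currency pair, where whichever pair is popular right now stays hot automatically.

```rust
use std::collections::HashMap;

// Tier 1 sketch: cache computed exchange rates in memory so the popular
// pair (CAD-EUR, CAD-GBP, ...) becomes hot without any explicit tuning.
struct RateCache {
    rates: HashMap<String, f64>,
    hits: u64,
    misses: u64,
}

impl RateCache {
    fn new() -> Self {
        RateCache { rates: HashMap::new(), hits: 0, misses: 0 }
    }

    // Return the cached rate, or fetch it (stubbed below) on a miss.
    fn get(&mut self, pair: &str) -> f64 {
        if let Some(&r) = self.rates.get(pair) {
            self.hits += 1;
            return r;
        }
        self.misses += 1;
        let r = fetch_rate(pair); // imagine a slow network call here
        self.rates.insert(pair.to_string(), r);
        r
    }
}

// Stand-in for the expensive lookup the cache avoids; values are made up.
fn fetch_rate(pair: &str) -> f64 {
    match pair {
        "CAD-EUR" => 0.68,
        "CAD-GBP" => 0.58,
        _ => 1.0,
    }
}

fn main() {
    let mut cache = RateCache::new();
    for _ in 0..1000 {
        cache.get("CAD-EUR"); // the popular pair stays hot in memory
    }
    cache.get("CAD-GBP");
    println!("hits: {}, misses: {}", cache.hits, cache.misses);
}
```

If demand shifts from CAD-EUR to CAD-GBP, nothing needs to change: the new pair simply starts winning the hit counter.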

Tier 2 examples:

  • Query optimizer notices a different plan is faster and adopts it
  • Reorganize on-disk storage to match the dominant access pattern (Excel analogy: sort by student ID during the term, by user ID afterwards)
  • External services: remember RTT per server, send 8/10 messages to the current best and 1/10 each to alternatives to keep the estimate fresh
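The external-services example can be sketched as follows. The server names, the EWMA weight, and the deterministic 10-slot cycle (rather than random choice) are assumptions made here for illustration; the idea from the text is just: route 8 of every 10 messages to the current best server, 1 each to the alternatives, so their RTT estimates never go stale.

```rust
// Track an RTT estimate per server; probe the alternatives occasionally so
// we notice when the "best" server degrades or a rival improves.
struct Server {
    name: &'static str,
    rtt_ms: f64, // exponentially weighted moving average of observed RTT
}

fn best(servers: &[Server]) -> usize {
    servers
        .iter()
        .enumerate()
        .min_by(|a, b| a.1.rtt_ms.partial_cmp(&b.1.rtt_ms).unwrap())
        .map(|(i, _)| i)
        .unwrap()
}

// Deterministic 10-message cycle: slots 0..7 go to the best server,
// slots 8 and 9 probe the two alternatives.
fn pick(servers: &[Server], slot: u32) -> usize {
    let b = best(servers);
    match slot % 10 {
        8 => (b + 1) % servers.len(),
        9 => (b + 2) % servers.len(),
        _ => b,
    }
}

// Blend a new measurement into the running estimate (EWMA, alpha = 0.2).
fn observe(servers: &mut [Server], idx: usize, measured_ms: f64) {
    let s = &mut servers[idx];
    s.rtt_ms = 0.8 * s.rtt_ms + 0.2 * measured_ms;
}

fn main() {
    let mut servers = vec![
        Server { name: "a", rtt_ms: 10.0 },
        Server { name: "b", rtt_ms: 50.0 },
        Server { name: "c", rtt_ms: 50.0 },
    ];
    // Server "a" degrades; the probes notice and the best choice flips.
    for slot in 0..100 {
        let idx = pick(&servers, slot);
        let measured = if servers[idx].name == "a" { 200.0 } else { 30.0 };
        observe(&mut servers, idx, measured);
    }
    println!("best is now {}", servers[best(&servers)].name);
}
```

The 1-in-10 probes are the point: without them, a degraded estimate for an alternative server would never be corrected.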

Genetic algorithms for parameter tuning

A genetic algorithm is inspired by natural selection [Whi98]. Generate random candidate solutions, score them with a fitness function, let the best reproduce (combine/mutate) into the next generation, and repeat until good enough or out of budget.

Requirements:

  • A parameter space, not too small (else brute force), not trivial
  • A continuous or discrete fitness function (binary pass/fail can’t rank solutions)

A GA is not guaranteed to find the global optimum; it often settles on a local maximum. It shines on non-linear problems where parameters interact. Google-Maps-style example: tune the number of routes, the heuristic, staleness decay, search radius, and time-of-day sensitivity. Too many interacting knobs to tune by hand, so let the GA explore.
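The generate/score/select/recombine loop fits in a short sketch. Everything concrete here is invented for illustration: instead of the map-routing knobs, it tunes two hypothetical parameters against a made-up fitness function whose optimum is at (5.0, -2.0), and a tiny LCG stands in for a real random-number generator to keep the sketch dependency-free.

```rust
// Minimal linear congruential generator so the example needs no crates.
struct Lcg(u64);
impl Lcg {
    fn next_f64(&mut self) -> f64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

// Higher is better. The GA only ever calls this, so any measurable score
// (requests/second, route quality, ...) could plug in instead.
fn fitness(p: &[f64; 2]) -> f64 {
    -((p[0] - 5.0).powi(2) + (p[1] + 2.0).powi(2))
}

fn evolve(generations: usize, seed: u64) -> [f64; 2] {
    let mut rng = Lcg(seed);
    // Random initial population of candidate parameter vectors in [-10, 10]^2.
    let mut pop: Vec<[f64; 2]> = (0..20)
        .map(|_| [rng.next_f64() * 20.0 - 10.0, rng.next_f64() * 20.0 - 10.0])
        .collect();
    for _ in 0..generations {
        // Score everyone and keep the fitter half.
        pop.sort_by(|a, b| fitness(b).partial_cmp(&fitness(a)).unwrap());
        pop.truncate(10);
        // Refill by combining two parents and mutating the child slightly.
        for i in 0..10 {
            let (a, b) = (pop[i], pop[(i + 1) % 10]);
            pop.push([
                (a[0] + b[0]) / 2.0 + (rng.next_f64() - 0.5),
                (a[1] + b[1]) / 2.0 + (rng.next_f64() - 0.5),
            ]);
        }
    }
    pop.sort_by(|a, b| fitness(b).partial_cmp(&fitness(a)).unwrap());
    pop[0]
}

fn main() {
    let best = evolve(50, 42);
    println!("best parameters: {:?} (fitness {})", best, fitness(&best));
}
```

Note the fitness function here is smooth and unimodal, so the GA converges easily; the payoff on real problems is that the same loop works when the knobs interact non-linearly.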

HotSpot (JVM JIT)

HotSpot is the canonical runtime optimizer. The JVM has two JITs: client (fast compile, less-optimized code) and server (slower compile, better code). Client programs start often and so pay startup costs repeatedly; servers run long enough to amortize the heavier JIT.

The big advantage: runtime behaviour lets the JIT change its decisions. If inlining should have happened but didn’t, it can be swapped in. This is useless for something like branch prediction, though, since the hardware already adapts.

Key JIT tricks [Str18]:

  • Escape analysis: if an object doesn’t escape a method, stack-allocate instead of heap, cutting GC pressure. Also unlocks lock optimizations
  • Lock elision: a synchronized block only ever reached by one thread drops the lock
  • Lock coarsening: merge sequential blocks holding the same lock
  • Nested locks: fold repeated acquire/release in recursion
  • On-stack replacement (OSR): recompile a hot method and swap the running frame for the new version. Brief pause, possibly recompile on another thread. Also what lets you edit code under a JVM debugger and continue
  • Intrinsics: hand-written, platform-optimized implementations (precompiled for x86 etc.) swapped in for specific functions. Originally implemented at the C++ level; Graal is a newer approach that compiles bytecode straight to machine code

Rust has no JVM-style runtime and can’t do this natively. But…

Rewriting the binary (C / Rust)

Two approaches:

Multi-variant precompiled code. Compile several versions with different trade-offs, default to one, and overwrite it in memory with another when observation demands. Limits: you must decide how many variants to ship, coordinate swap-outs so the currently-running code isn’t the target, and, since the variants were chosen at compile time, you’re still guessing.
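Actually patching instruction bytes in memory needs unsafe, platform-specific page-permission tricks, so here is the same idea one level up, as a safe Rust sketch: ship two precompiled variants and atomically swap which one a dispatch table points at once runtime observation favours the other. The variants, the size threshold, and the "observer" are all invented for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Variant tuned for small inputs: a plain linear scan.
fn sum_simple(data: &[u64]) -> u64 {
    data.iter().sum()
}

// Variant tuned for large inputs: chunked traversal; same result.
fn sum_chunked(data: &[u64]) -> u64 {
    data.chunks(8).map(|c| c.iter().sum::<u64>()).sum()
}

// Which variant is live right now: 0 = simple, 1 = chunked.
static ACTIVE: AtomicUsize = AtomicUsize::new(0);
static VARIANTS: [fn(&[u64]) -> u64; 2] = [sum_simple, sum_chunked];

// All callers go through this; the swap is a single atomic store.
fn sum(data: &[u64]) -> u64 {
    VARIANTS[ACTIVE.load(Ordering::Relaxed)](data)
}

// The "observer": if inputs turn out to be large, flip the default variant.
fn observe(len: usize) {
    if len > 1024 {
        ACTIVE.store(1, Ordering::Relaxed);
    }
}

fn main() {
    let small: Vec<u64> = (0..10).collect();
    println!("{}", sum(&small)); // dispatches to sum_simple

    let big: Vec<u64> = (0..10_000).collect();
    observe(big.len()); // observation flips the active variant
    println!("{}", sum(&big)); // now dispatches to sum_chunked
}
```

Dispatching through an atomic index sidesteps the hardest limit above: no thread can be executing "the old bytes" mid-overwrite, because nothing is overwritten, only re-pointed.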

True dynamic recompilation. Take the bytes of a segment and feed them to a compiler on the target system: lower to IR, optimize with runtime information, emit a new binary, swap it in. This requires a compiler on the deploy machine, which isn’t always available (an end-user Windows box, or servers with security policies banning compilers). The upside: it can inline or rewrite library functions, specializing them for the actual use case. [EHS19] show recompilation pays off on a 649×649 matrix, 20000 iterations, once the recompile cost is amortized.

Program sus

Self-rewriting binaries look like viruses to antivirus software. This is a long-running arms race: malware mutates itself (shuffling instructions, injecting NOPs) to dodge binary-pattern detection, and heuristic AV flags any program that rewrites itself. A legitimate self-optimizer shipped to end users may be blocked or quarantined.