Queueing Theory

The mathematics of waiting lines. Given an arrival process, a service process, and some number of servers, predicts steady-state metrics: average queue length, wait time, utilization. From ECE459 L31.

Why care?

Shows up everywhere queues do: call centres, routers, supermarket tills, CPU schedulers. It’s how telcos decide staffing levels. The non-obvious payoff: the latency-vs-utilization curve is a hockey stick, not a line, which is why over-provisioning isn’t wasteful but is literally the cost of predictable latency.

Kendall notation: A/S/c/K/N/D

  • A: arrival process (M = Markov/Poisson, D = deterministic, G = general)
  • S: service-time distribution
  • c: number of servers
  • K: queue capacity (default āˆž)
  • N: population size (default āˆž)
  • D: scheduling discipline (default FIFO)

Common models: M/M/1, M/M/k, M/M/1/K, M/G/1.

Key quantities

  • Ī»: arrival rate
  • μ: service rate per server
  • ρ = Ī»/(cμ): utilization, must be < 1 for stability
  • L: average number of items in system
  • W: average time in system
  • Little’s Law: L = Ī»W, holds for any stable system
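The quantities above can be sketched in a few lines. This is a minimal illustration using the standard M/M/1 closed forms (ρ = Ī»/μ, L = ρ/(1 āˆ’ ρ), W = 1/(μ āˆ’ Ī»)); the rates are made up for the example:

```python
def mm1_metrics(lam, mu):
    """Steady-state (rho, L, W) for a stable M/M/1 queue."""
    assert lam < mu, "unstable: need lambda < mu"
    rho = lam / mu           # utilization
    L = rho / (1 - rho)      # mean number in system
    W = 1 / (mu - lam)       # mean time in system
    return rho, L, W

# Illustrative rates, not from the notes.
rho, L, W = mm1_metrics(lam=9.0, mu=10.0)
print(rho, L, W)             # ā‰ˆ 0.9, 9.0, 1.0
assert abs(L - 9.0 * W) < 1e-9   # Little's Law: L = lambda * W
```

Note that Little’s Law itself needs none of the M/M/1 assumptions; it holds for any stable system.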

Vocabulary [Liu09]

  • Server (teller), customer (requester)
  • Wait time (in line), service time (at teller), response time = wait + service
  • Residence time: total response time accumulated across multiple visits
  • Throughput: rate of completed requests

Stability

If Ī» ≄ μ (ρ ≄ 1), queue length grows without bound [HB13].

These are long-run averages, not instant-by-instant guarantees. Short overshoots recover.

The hockey stick

Plot W against ρ. For M/M/1, W = 1/(μ āˆ’ Ī») = (1/μ)/(1 āˆ’ ρ): near-linear at low utilization, effectively vertical as ρ → 1. A server at 99% utilization waits ~10Ɨ longer than at 90%, and ~100Ɨ longer than at 50%. Production systems target ρ well below 1.
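A quick table makes the hockey stick visible. Sketch using the M/M/1 formula W = 1/(μ āˆ’ Ī»), with μ = 1 as an arbitrary normalization so W is in units of mean service time:

```python
# Tabulate M/M/1 time-in-system vs utilization (mu = 1.0 is arbitrary).
mu = 1.0
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    lam = rho * mu
    W = 1 / (mu - lam)       # mean time in system
    print(f"rho={rho:.2f}  W={W:6.1f}x service time")
```

The jump from 0.9 to 0.99 alone multiplies W by ten; the curve is not remotely linear near saturation.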

Doubling both Ī» and μ halves response time

Not unchanged: halved. For M/M/1, W = 1/(μ āˆ’ Ī»), so doubling both rates doubles the gap and halves W. If the boss doubles the arrival rate, the CPU doesn’t need to be 2Ɨ faster to preserve response time; it needs less, because W is sensitive to the gap μ āˆ’ Ī», not the ratio Ī»/μ.
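Checking both claims numerically, with made-up rates (Ī» = 8, μ = 10, so the gap is 2):

```python
def W(lam, mu):
    """M/M/1 mean time in system."""
    return 1 / (mu - lam)

# Illustrative rates, not from the notes.
base = W(lam=8.0, mu=10.0)      # gap = 2 -> W = 0.5
doubled = W(lam=16.0, mu=20.0)  # gap = 4 -> W = 0.25
assert doubled == base / 2      # doubling both rates halves W

# To hold W fixed after lambda doubles, only the gap must be preserved:
# mu' = 2*lam - (mu - lam) shortfall... i.e. mu' = lam + mu = 18, not 20.
needed_mu = 8.0 + 10.0
assert W(16.0, needed_mu) == base
assert needed_mu < 2 * 10.0     # strictly less than a 2x-faster CPU
```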

One fast server vs. many slow ones

With non-preemptible jobs, it depends:

  • High variability in job sizes: many servers, ā€œthe guy with 85 itemsā€ doesn’t block the milk-and-eggs line
  • Low load: one fast server, don’t leave slower ones idle
  • Preemptible jobs: one fast server can simulate the many slow ones by time-slicing
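The low-load bullet can be checked with the textbook Erlang C formula for M/M/k. The sketch below (function names and the rates Ī» = 1, μ = 1, k = 4 are all invented for illustration) compares k slow servers against one server k times as fast; since service here is exponential, it only illustrates the load argument, not the variability one:

```python
from math import factorial

def erlang_c(k, a):
    """P(arriving job must wait) for M/M/k, offered load a = lam/mu < k."""
    top = (a**k / factorial(k)) * (k / (k - a))
    bottom = sum(a**i / factorial(i) for i in range(k)) + top
    return top / bottom

def w_mmk(lam, mu, k):
    """Mean response time, k servers each at rate mu."""
    a = lam / mu
    return erlang_c(k, a) / (k * mu - lam) + 1 / mu

def w_fast(lam, mu, k):
    """Mean response time, one M/M/1 server at rate k*mu."""
    return 1 / (k * mu - lam)

# Low load (rho = 0.25): the single fast server wins decisively,
# because service itself finishes k times sooner.
lam, mu, k = 1.0, 1.0, 4
print(w_fast(lam, mu, k), w_mmk(lam, mu, k))   # ā‰ˆ 0.33 vs ā‰ˆ 1.01
```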

Closed vs. open systems

  • Open: arrivals are independent of departures. Server upgrades help as long as that server isn’t already the bottleneck
  • Closed: a fixed number of jobs in flight. The bottleneck device dominates; upgrading a non-bottleneck device can do nothing until the bottleneck itself is fixed. Intuition from open systems fails here

Improving μ doesn’t always improve throughput

If arrivals are the limit, adding service capacity raises the maximum throughput, not the realized throughput: ā€œcomplete six assignments but the prof only gave you four.ā€ In closed/batch systems throughput tracks μ directly, because there is always work waiting.
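A toy open-system simulation (a Lindley-style recursion; the rates and seed are invented) makes this concrete: with arrivals fixed at Ī» = 5, a server ten times faster barely changes realized throughput:

```python
import random

def simulate_throughput(lam, mu, n=200_000, seed=1):
    """Completions/sec for an open M/M/1 FIFO queue, toy simulation."""
    rng = random.Random(seed)
    t_arrive = 0.0
    t_done = 0.0
    for _ in range(n):
        t_arrive += rng.expovariate(lam)     # next Poisson arrival
        start = max(t_arrive, t_done)        # FIFO: wait for the server
        t_done = start + rng.expovariate(mu) # exponential service
    return n / t_done                        # realized throughput

# Illustrative rates: same arrivals, 10x more capacity.
slow = simulate_throughput(lam=5.0, mu=8.0)
fast = simulate_throughput(lam=5.0, mu=80.0)
print(slow, fast)   # both ā‰ˆ lam = 5.0: the upgrade raised the ceiling,
                    # not the realized throughput
```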

Measuring honestly [HB13]

Naive single-job benchmarks miss caching and concurrency effects:

  • Open system: ramp Ī» until the completion rate flatlines; the plateau is the maximum throughput
  • Closed system: drive think time to zero (always-on workload), measure completions/sec