Queueing Theory

The mathematics of waiting lines. Given an arrival process, a service process, and some number of servers, predicts steady-state metrics: average queue length, wait time, utilization. From ECE459 L31.

Why care?

Shows up everywhere queues do: call centres, routers, supermarket tills, CPU schedulers. It’s how telcos decide staffing levels. The non-obvious payoff: the latency-vs-utilization curve is a hockey stick, not a line, which is why over-provisioning isn’t wasteful but is literally the cost of predictable latency.

Kendall notation: A/S/c/K/N/D

  • A: arrival process (M = Markov/Poisson, D = deterministic, G = general)
  • S: service-time distribution
  • c: number of servers
  • K: queue capacity (default āˆž)
  • N: population size (default āˆž)
  • D: scheduling discipline (default FIFO)

Common models: M/M/1, M/M/k, M/M/1/K, M/G/1.

Key quantities

  • Ī»: arrival rate
  • μ: service rate per server
  • ρ = Ī»/(cμ): utilization, must be < 1 for stability
  • L: average number of items in system
  • W: average time in system
  • Little’s Law: L = Ī»W, holds for any stable system
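The quantities above can be sketched in a few lines. This is a minimal illustration using the standard M/M/1 closed forms (ρ = Ī»/μ, L = ρ/(1 āˆ’ ρ), W = 1/(μ āˆ’ Ī»)); the rates are made up for the example:

```python
def mm1_metrics(lam, mu):
    """Steady-state (rho, L, W) for a stable M/M/1 queue."""
    assert lam < mu, "unstable: need lambda < mu"
    rho = lam / mu           # utilization
    L = rho / (1 - rho)      # mean number in system
    W = 1 / (mu - lam)       # mean time in system
    return rho, L, W

# Illustrative rates, not from the notes.
rho, L, W = mm1_metrics(lam=9.0, mu=10.0)
print(rho, L, W)             # ā‰ˆ 0.9, 9.0, 1.0
assert abs(L - 9.0 * W) < 1e-9   # Little's Law: L = lambda * W
```

Note that Little’s Law itself needs none of the M/M/1 assumptions; it holds for any stable system.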

Vocabulary [Liu09]

  • Server (teller), customer (requester)
  • Wait time (in line), service time (at teller), response time = wait + service
  • Residence time: total response time accumulated across multiple visits
  • Throughput: rate of completed requests

Stability

If Ī» ≄ μ (ρ ≄ 1), queue length grows without bound [HB13].

These are long-run averages, not instant-by-instant guarantees. Short overshoots recover.

The hockey stick

Plot W against ρ. For M/M/1, W = 1/(μ āˆ’ Ī») = (1/μ)/(1 āˆ’ ρ): near-linear at low utilization, effectively vertical as ρ → 1. A server at 99% utilization waits ~10Ɨ longer than at 90%, and ~100Ɨ longer than at 50%. Production systems target ρ well below 1.
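A quick table makes the hockey stick visible. Sketch using the M/M/1 formula W = 1/(μ āˆ’ Ī»), with μ = 1 as an arbitrary normalization so W is in units of mean service time:

```python
# Tabulate M/M/1 time-in-system vs utilization (mu = 1.0 is arbitrary).
mu = 1.0
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    lam = rho * mu
    W = 1 / (mu - lam)       # mean time in system
    print(f"rho={rho:.2f}  W={W:6.1f}x service time")
```

The jump from 0.9 to 0.99 alone multiplies W by ten; the curve is not remotely linear near saturation.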

Doubling both Ī» and μ halves response time

Not unchanged: halved. For M/M/1, W = 1/(μ āˆ’ Ī»), so doubling both rates doubles the gap and halves W. If the boss doubles the arrival rate, the CPU doesn’t need to be 2Ɨ faster to preserve response time; it needs less, because W is sensitive to the gap μ āˆ’ Ī», not the ratio Ī»/μ.
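Checking both claims numerically, with made-up rates (Ī» = 8, μ = 10, so the gap is 2):

```python
def W(lam, mu):
    """M/M/1 mean time in system."""
    return 1 / (mu - lam)

# Illustrative rates, not from the notes.
base = W(lam=8.0, mu=10.0)      # gap = 2 -> W = 0.5
doubled = W(lam=16.0, mu=20.0)  # gap = 4 -> W = 0.25
assert doubled == base / 2      # doubling both rates halves W

# To hold W fixed after lambda doubles, only the gap must be preserved:
# mu' = 2*lam - (mu - lam) shortfall... i.e. mu' = lam + mu = 18, not 20.
needed_mu = 8.0 + 10.0
assert W(16.0, needed_mu) == base
assert needed_mu < 2 * 10.0     # strictly less than a 2x-faster CPU
```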

One fast server vs. many slow ones

With non-preemptible jobs, it depends:

  • High variability in job sizes: many servers, ā€œthe guy with 85 itemsā€ doesn’t block the milk-and-eggs line
  • Low load: one fast server, don’t leave slower ones idle
  • Preemptible jobs: one fast server can simulate the many slow ones by time-slicing
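The low-load bullet can be checked with the textbook Erlang C formula for M/M/k. The sketch below (function names and the rates Ī» = 1, μ = 1, k = 4 are all invented for illustration) compares k slow servers against one server k times as fast; since service here is exponential, it only illustrates the load argument, not the variability one:

```python
from math import factorial

def erlang_c(k, a):
    """P(arriving job must wait) for M/M/k, offered load a = lam/mu < k."""
    top = (a**k / factorial(k)) * (k / (k - a))
    bottom = sum(a**i / factorial(i) for i in range(k)) + top
    return top / bottom

def w_mmk(lam, mu, k):
    """Mean response time, k servers each at rate mu."""
    a = lam / mu
    return erlang_c(k, a) / (k * mu - lam) + 1 / mu

def w_fast(lam, mu, k):
    """Mean response time, one M/M/1 server at rate k*mu."""
    return 1 / (k * mu - lam)

# Low load (rho = 0.25): the single fast server wins decisively,
# because service itself finishes k times sooner.
lam, mu, k = 1.0, 1.0, 4
print(w_fast(lam, mu, k), w_mmk(lam, mu, k))   # ā‰ˆ 0.33 vs ā‰ˆ 1.01
```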

Closed vs. open systems

  • Open: arrivals are independent of departures. Server upgrades help as long as that server isn’t already the bottleneck
  • Closed: a fixed number of jobs in flight. The bottleneck device dominates; upgrading a non-bottleneck device can do nothing until the bottleneck itself is fixed. Intuition from open systems fails here

Improving μ doesn’t always improve throughput

If arrivals are the limit, adding service capacity raises the maximum throughput, not the realized throughput: ā€œcomplete six assignments but the prof only gave you four.ā€ In closed/batch systems throughput tracks μ directly, because there is always work waiting.
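A toy open-system simulation (a Lindley-style recursion; the rates and seed are invented) makes this concrete: with arrivals fixed at Ī» = 5, a server ten times faster barely changes realized throughput:

```python
import random

def simulate_throughput(lam, mu, n=200_000, seed=1):
    """Completions/sec for an open M/M/1 FIFO queue, toy simulation."""
    rng = random.Random(seed)
    t_arrive = 0.0
    t_done = 0.0
    for _ in range(n):
        t_arrive += rng.expovariate(lam)     # next Poisson arrival
        start = max(t_arrive, t_done)        # FIFO: wait for the server
        t_done = start + rng.expovariate(mu) # exponential service
    return n / t_done                        # realized throughput

# Illustrative rates: same arrivals, 10x more capacity.
slow = simulate_throughput(lam=5.0, mu=8.0)
fast = simulate_throughput(lam=5.0, mu=80.0)
print(slow, fast)   # both ā‰ˆ lam = 5.0: the upgrade raised the ceiling,
                    # not the realized throughput
```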

Measuring honestly [HB13]

Naive single-job benchmarks miss caching and concurrency effects:

  • Open system: ramp Ī» until the completion rate flatlines; the plateau is the maximum throughput
  • Closed system: drive think time to zero (always-on workload), measure completions/sec