Rate limiting caps the number of requests a client can send to a service in a given time window. Requests above the cap are rejected, typically with HTTP 429 Too Many Requests. Covered in ECE459 L16.
Why do services limit us?
Every request has a cost: literal money (cloud CPU), opportunity cost (resources diverted from legitimate traffic), and exposure (DoS/DDoS, scraping). A rate limit is the blunt instrument that keeps costs bounded.
Real limits can be multi-threshold (A/hr or B/day) and segmented by request type. Responses do not have to be outright rejection: a service can delay or deprioritize instead.
Why limits exist
Cost per request: literal money (cloud CPU) or opportunity cost (resources taken from legitimate requests)
DoS / DDoS mitigation: many invalid requests can crash a service, exhaust resources, or make it too slow to use
Scraping prevention: early-2021 Parler was scraped easily because posts had sequential IDs and there was no rate limit. The same concern applies to sites being scraped for LLM training
War stories from lecture
Payment processor called our service via webhook, then rate-limited us when we validated the webhook back. HTTP 429
An Ontario-wide letter-writing campaign at a climbing gym got the gym’s IP rate-limited because many people sent letters from the same wifi
Unity’s 2023 per-install fee proposal raised an “install-bomb” DoS risk and was walked back
Dealing with limits (client side)
L16 rejects “nothing we can do” as defeatist. A candidate who gave this answer in a 2022 interview did not get an offer. Options:
Do less work: eliminate redundant calls, often found during code review
Caching: remember previous answers. Use write-through or write-back for updates, or a Redis sidecar. Domain knowledge helps (an exchange-rate quote valid for 20 min can be cached for 20 min)
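The TTL idea above can be sketched as a minimal in-process cache (hypothetical shapes; a real deployment might use a Redis sidecar instead). Entries expire after a fixed duration, e.g. 20 minutes for an exchange-rate quote that is valid for 20 minutes:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Minimal TTL cache sketch. A miss (expired or absent) means the
/// caller must spend a real request against the rate limit.
struct TtlCache<V> {
    ttl: Duration,
    entries: HashMap<String, (Instant, V)>,
}

impl<V: Clone> TtlCache<V> {
    fn new(ttl: Duration) -> Self {
        TtlCache { ttl, entries: HashMap::new() }
    }

    /// Return the cached value only if it has not expired.
    fn get(&self, key: &str) -> Option<V> {
        self.entries.get(key).and_then(|(stored_at, v)| {
            if stored_at.elapsed() < self.ttl { Some(v.clone()) } else { None }
        })
    }

    fn put(&mut self, key: &str, value: V) {
        self.entries.insert(key.to_string(), (Instant::now(), value));
    }
}
```

Domain knowledge sets the TTL: caching longer than the quote's validity serves stale data, caching shorter wastes requests.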
Group up: combine multiple requests into one (batch update of 5 employees). Needs server support. Complicates error handling when one item in the batch fails
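The error-handling wrinkle can be made concrete with a sketch of hypothetical batch request/response shapes (the real API is whatever the server supports): one call carries several updates, so the response must report success or failure per item, since one item can fail while the other four succeed.

```rust
/// Hypothetical per-item batch shapes, not a real API.
struct EmployeeUpdate {
    id: u32,
    new_title: String,
}

enum ItemResult {
    Ok(u32),             // employee id updated
    Failed(u32, String), // employee id + error message
}

/// Stand-in for the server side: applies each update independently
/// and returns one result per item, so one bad item does not reject
/// the whole batch.
fn apply_batch(batch: &[EmployeeUpdate]) -> Vec<ItemResult> {
    batch
        .iter()
        .map(|u| {
            if u.new_title.is_empty() {
                ItemResult::Failed(u.id, "title must not be empty".to_string())
            } else {
                ItemResult::Ok(u.id)
            }
        })
        .collect()
}
```

The client then has to walk the per-item results, which is exactly the complexity the bullet warns about: a plain "one request, one status code" model no longer applies.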
Patience (queue): add to a queue and drain at a controlled rate. Better delayed than denied. Rust’s ratelimit crate:
```rust
use ratelimit::Ratelimiter;
use std::time::Duration;

// 1000 tokens refilled per hour; burst capacity of 1000, all available at start.
let rl = Ratelimiter::builder(1000, Duration::from_secs(3600))
    .max_tokens(1000)
    .initial_available(1000)
    .build()
    .unwrap();
```
Also consider moving batch work to off-peak hours (invoicing overnight)
Roll persuasion: pay for a higher tier, or negotiate a higher limit
When it still happens
Honour the Retry-After header if the server provides one
Otherwise use exponential backoff: wait, retry, wait longer, repeat with a cap on max retries
Jitter matters. Without it, all clients retry at the same tick and keep colliding. Adding randomness (X+7 vs X+9) spreads them out. Crossbeam’s backoff implementation does not include jitter
Exponential backoff with jitter fits many-independent-clients scenarios. A single client hammering one resource is more like TCP congestion control [Aeo19]
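The retry schedule can be sketched as a small delay function (the 1 s base, cap, and jitter range are illustrative, not from any particular library). Attempt 0 waits about 1 s, attempt 1 about 2 s, doubling up to a cap; the jitter spreads clients out so they do not all retry on the same tick.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Exponential backoff with jitter: base delay of 1 s doubled per
/// attempt, capped at `max_delay`, plus up to `jitter_ms` milliseconds
/// of randomness. `jitter_ms` must be nonzero.
fn backoff_delay(attempt: u32, max_delay: Duration, jitter_ms: u64) -> Duration {
    let base = Duration::from_secs(1)
        .checked_mul(2u32.saturating_pow(attempt))
        .unwrap_or(max_delay)
        .min(max_delay);
    // Cheap jitter without an RNG crate: derive pseudo-random
    // milliseconds from the system clock's subsecond nanos.
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos() as u64)
        .unwrap_or(0);
    base + Duration::from_millis(nanos % jitter_ms)
}
```

The caller sleeps for `backoff_delay(attempt, ...)` between retries and gives up after a fixed number of attempts, matching the "cap on max retries" above. Crossbeam's backoff, as noted, would need the jitter added on top.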