Query Optimization

Query optimization is the database’s runtime process of picking a good plan for executing a SQL query. Introduced in ECE459 L19, framed as the three steps of running any query [SKS11]:

Parse and translate the SQL into relational algebra
Optimize, pick how to carry out the query
Evaluate, execute the chosen plan

Why at runtime?

The cheapest path depends on data the optimizer can only see now: table sizes, existing indexes, data distribution. A fixed compile-time plan can’t adapt.

Inputs and outputs

SQL becomes relational algebra as query blocks, one SELECT ... FROM ... WHERE ... (+ GROUP BY / HAVING). Nested queries are separate blocks and don’t need the same plan
An evaluation primitive is an algebraic op plus annotations (which algorithm, which index)
A query execution plan is a sequence of primitives
The optimizer is a heuristic chooser, not a true optimum-finder, it trades accuracy for time

Why this is hard: join ordering

The optimizer generates equivalent plans via algebraic rewrites (commute/associate joins, push selections down) and picks the cheapest. The combinatorial explosion is dominated by join order:

$orderings (n) = \frac{( 2 ( n - 1 ))!}{( n - 1 )!}$

where:

$n$ is the number of relations being joined

For $n = 5$ , that’s 1680 orderings. Symmetry trims it a bit, but factorial growth still wins. So the optimizer can’t enumerate exhaustively and must itself be cheap.

Heuristics that make it tractable

Push selection down: filter tuples before they feed anything else
Push projection down: drop unused columns early, but not before a join that needs them
Dynamic programming on join subsets: memoize the best order for each group of relations, so a 5-way join reuses answers from 3-way subproblems. Locally optimal, not globally
Interesting sort orders: a subresult already sorted on a later join’s attribute is cheaper to reuse, so the optimizer tracks sortedness as part of the plan state
Common-subexpression reuse keeps the plan representation small [SKS11]
Time-bound the optimizer itself: cheap query skips further search, expensive query gets more budget
Plan caching: identical query shapes (different bind values) reuse prior plans, with observed runtime feeding back into estimates

"Push selection down" isn't always right

If r is small, only s’s columns are wanted, and s has an index on the join attribute but not the selection attribute, early selection on s forces a full scan instead of an index lookup. Heuristics are defaults, not laws.

Broader lesson

This isn’t just databases. Programmatically generate, evaluate, and choose alternative plans at runtime whenever performance depends on data-dependent choice.

🛠️ Steven Gong

Table of Contents

Query Optimization

Inputs and outputs

Why this is hard: join ordering

Heuristics that make it tractable

Graph View

Backlinks

🛠️ Steven Gong

Table of Contents

Query Optimization

Inputs and outputs

Why this is hard: join ordering

Heuristics that make it tractable

Related

Graph View

Backlinks