Query Optimization
Query optimization is the database’s runtime process of picking a good plan for executing a SQL query. Introduced in ECE459 L19, framed as the three steps of running any query [SKS11]:
- Parse and translate the SQL into relational algebra
- Optimize, pick how to carry out the query
- Evaluate, execute the chosen plan
Why at runtime?
The cheapest path depends on data the optimizer can only see now: table sizes, existing indexes, data distribution. A fixed compile-time plan can’t adapt.
Inputs and outputs
- SQL becomes relational algebra as query blocks, one
SELECT ... FROM ... WHERE ...(+GROUP BY/HAVING). Nested queries are separate blocks and don’t need the same plan - An evaluation primitive is an algebraic op plus annotations (which algorithm, which index)
- A query execution plan is a sequence of primitives
- The optimizer is a heuristic chooser, not a true optimum-finder, it trades accuracy for time
Why this is hard: join ordering
The optimizer generates equivalent plans via algebraic rewrites (commute/associate joins, push selections down) and picks the cheapest. The combinatorial explosion is dominated by join order:
where:
- is the number of relations being joined
For , that’s 1680 orderings. Symmetry trims it a bit, but factorial growth still wins. So the optimizer can’t enumerate exhaustively and must itself be cheap.
Heuristics that make it tractable
- Push selection down: filter tuples before they feed anything else
- Push projection down: drop unused columns early, but not before a join that needs them
- Dynamic programming on join subsets: memoize the best order for each group of relations, so a 5-way join reuses answers from 3-way subproblems. Locally optimal, not globally
- Interesting sort orders: a subresult already sorted on a later join’s attribute is cheaper to reuse, so the optimizer tracks sortedness as part of the plan state
- Common-subexpression reuse keeps the plan representation small [SKS11]
- Time-bound the optimizer itself: cheap query skips further search, expensive query gets more budget
- Plan caching: identical query shapes (different bind values) reuse prior plans, with observed runtime feeding back into estimates
"Push selection down" isn't always right
If
ris small, onlys’s columns are wanted, andshas an index on the join attribute but not the selection attribute, early selection onsforces a full scan instead of an index lookup. Heuristics are defaults, not laws.
Broader lesson
This isn’t just databases. Programmatically generate, evaluate, and choose alternative plans at runtime whenever performance depends on data-dependent choice.