Query Cost
Query cost is the DBMS’s estimate of how much work a query plan will take. Introduced in ECE459 L19. The dominant term is disk I/O; CPU time is nonzero but ignored for simplicity, as in [SKS11].
Why estimate at all?
The optimizer needs a number to compare alternative plans. Actually running a query to measure it defeats the point of planning, so we guess from metadata and statistics.
Two measurables of interest:
- Block transfers: data moved in or out of memory
- Disk seeks: repositioning on the disk
[SKS11] estimates assume the worst case: only one block per table fits in memory at a time. If reality is better, the actual cost is lower than estimated, which is the less dangerous direction to be wrong.
Things the estimate doesn’t capture
- System load (concurrent operations may block)
- Buffer contents (data already in memory skips disk ops)
- Data layout (packed data means fewer seeks, distributed data enables parallel reads)
The lowest-cost plan is not necessarily the fastest in wall-clock terms. The DB optimizer is a “hypermiler”: it minimizes disk reads, even if a path with more reads but more parallelism would finish sooner.
Metadata used in estimates [SKS11, EN11]
- : rows in table
- : blocks containing
- : bytes in , equal to blocks times block size
- : rows per block, equal to
- : distinct values of attribute in . Equals when is a key
- : height of index on
Histograms divide a value range into buckets with counts. Distributions are rarely even. For “salary ≥ 150k), interpolate within the bucket.
Stats can go stale between updates, but “close enough, more often than not” suffices. The optimization only needs to beat doing nothing.