Data Partitioning

Learned in SE464.

Data partitioning divides by rows: this is database sharding when done manually at the application layer.

Data partitioning is a more general approach than functional partitioning: instead of splitting by tables, rows are distributed across nodes.

Because each row is stored once:

If the sharding key is chosen carefully:

✘ Cannot use plain SQL; queries must be manually adapted to match sharding
✘ If the sharding key is chosen poorly, shard load will be imbalanced (by capacity or traffic)
✘ Some queries will involve all shards, so their throughput is limited by a single machine

Partitioning should minimize references between partitions. It can be treated as a graph partitioning problem.

🛠️ Steven Gong