Data Partitioning
Learned in SE464.
Data partitioning divides a table by rows. When this is done manually at the application layer, it is called database sharding.
Data partitioning is a more general approach than functional partitioning: instead of splitting by tables, rows are distributed across nodes.
Pros
Because each row is stored once:
- ✓ Capacity scales
- ✓ Data is consistent
If the sharding key is chosen carefully:
- ✓ Data will be balanced across nodes
- ✓ Many queries will involve only one or a few shards (no central bottleneck)
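To make the single-shard case concrete, here is a minimal sketch of hash-based routing on a sharding key. Everything in it is an assumption for illustration: the shard count, the in-memory dicts standing in for database nodes, and the helper names `shard_for`, `put`, and `get`. Hashing is only one possible scheme; range-based or directory-based routing would work too.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of nodes

def shard_for(key, num_shards=NUM_SHARDS):
    """Map a sharding key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Each shard is modelled as a plain dict; a real system would use one database per shard.
shards = [dict() for _ in range(NUM_SHARDS)]

def put(key, row):
    """Store the row on exactly one shard, chosen by the sharding key."""
    shards[shard_for(key)][key] = row

def get(key):
    """A lookup by sharding key touches only the one shard that owns the key."""
    return shards[shard_for(key)].get(key)

put("user:42", {"name": "Ada"})
print(get("user:42"))
```

Because `shard_for` is deterministic, reads and writes for the same key always land on the same node, and no central coordinator is needed for key lookups.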
Cons
- ✘ Cannot use plain SQL; queries must be manually adapted to match the sharding scheme
- ✘ If the sharding key is chosen poorly, shard load will be imbalanced (by capacity or traffic)
- ✘ Some queries will involve all shards; since every machine must process each such query, their throughput is limited to what a single machine can handle
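The all-shards case can be sketched as a scatter-gather query. This is an illustrative toy, not a real database client: the three in-memory shards, the `country` field, and the `scatter_gather` helper are all hypothetical. The data is sharded by `user_id`, so a filter on any other column cannot be routed to a single shard.

```python
# Three hypothetical shards, keyed by user_id (the sharding key).
shards = [
    {"u1": {"user_id": "u1", "country": "CA"}},
    {"u2": {"user_id": "u2", "country": "US"}},
    {"u3": {"user_id": "u3", "country": "CA"}},
]

def scatter_gather(shards, predicate):
    """Run the same filter on every shard and merge the partial results."""
    results = []
    shards_touched = 0
    for shard in shards:
        shards_touched += 1  # every shard does work for this single query
        results.extend(row for row in shard.values() if predicate(row))
    return results, shards_touched

# Filtering on a non-key column forces a visit to all shards.
canadians, touched = scatter_gather(shards, lambda row: row["country"] == "CA")
print(len(canadians), "rows from", touched, "shards")
```

Since each such query costs work on every node, adding nodes does not raise the rate at which these queries can be served.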
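Partitioning should minimize references between partitions. It can be treated as a graph partitioning problem: rows are vertices, references between rows are edges, and the goal is to minimize the number of edges that cross partitions while keeping the partitions balanced.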
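As a rough illustration of the graph view, the sketch below counts how many references cross partitions for two candidate assignments. The rows (`u1`, `o1`, ...), the edges, and the `cut_size` helper are made up for the example; it only evaluates a given assignment and does not search for an optimal one.

```python
def cut_size(edges, assignment):
    """Count references (edges) whose two endpoints live in different partitions."""
    return sum(1 for a, b in edges if assignment[a] != assignment[b])

# Hypothetical rows: users (u*) and orders (o*); each edge is a reference between two rows.
edges = [("u1", "o1"), ("u1", "o2"), ("u2", "o3"), ("o1", "o2")]

# Assignment A: keep each user's orders on the same partition as the user.
by_user = {"u1": 0, "o1": 0, "o2": 0, "u2": 1, "o3": 1}

# Assignment B: rows spread without regard to references.
naive = {"u1": 0, "o1": 1, "o2": 0, "u2": 1, "o3": 0}

print(cut_size(edges, by_user))  # 0 cross-partition references
print(cut_size(edges, naive))    # 3 cross-partition references
```

A lower cut size means fewer queries that have to follow a reference onto another node.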