Database Sharding

Database sharding enables horizontal scaling.

Sharding splits a table by rows and stores different groups of rows on different machines.
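A minimal sketch of how rows might be routed to machines, assuming hash-based sharding on a user ID. The names `NUM_SHARDS`, `shard_for`, `insert_row`, and `get_row` are hypothetical, and each in-memory dict stands in for a table on a separate machine:

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a sharding key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dict stands in for the same table hosted on a different machine.
shards = [dict() for _ in range(NUM_SHARDS)]


def insert_row(user_id: str, row: dict) -> None:
    """Store the row on exactly one shard, chosen by its sharding key."""
    shards[shard_for(user_id)][user_id] = row


def get_row(user_id: str):
    """Look up a row by contacting only the shard that owns its key."""
    return shards[shard_for(user_id)].get(user_id)


for i in range(1000):
    insert_row(f"user-{i}", {"id": f"user-{i}", "name": f"User {i}"})
```

A stable hash (rather than Python's built-in `hash`, which is randomized per process for strings) is used so that every client maps the same key to the same shard.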

Sharding disperses data across various databases or servers, while partitioning segregates data within a single database instance into subsets.

Partitioning is used to make data easier to manage and to improve query performance within a single database.

Resources

https://www.pingcap.com/article/sharding-vs-partitioning-a-detailed-comparison/

Why sharding?

Database sharding is needed to address scalability, performance, and availability challenges in large-scale systems.

As the size of a database grows, so does the time it takes to perform certain operations (e.g., indexing, querying, and updating records). Additionally, a single database server may struggle to handle a large volume of concurrent read and write operations.

From SE464:

Pros of sharding

Because each row is stored once:

✓Capacity scales.

✓Data is consistent.

If the sharding key is chosen carefully:

✓Data will be balanced.

✓Many queries will involve only one or a few shards. There is no central bottleneck for these.
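The effect of the sharding key on balance can be illustrated with a small experiment. This is only a sketch on made-up data: the 10,000 rows, the `country` column, and the helper names `shard_for` and `shard_load` are all assumptions, not part of the course material.

```python
import hashlib
from collections import Counter


def shard_for(key: str, num_shards: int) -> int:
    """Map a sharding key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_load(keys, num_shards: int) -> list:
    """Count how many rows land on each shard for the given keys."""
    counts = Counter(shard_for(k, num_shards) for k in keys)
    return [counts.get(i, 0) for i in range(num_shards)]


# Hypothetical table: 10,000 users, 90% of them in one country.
rows = [
    {"user_id": f"user-{i}", "country": "CA" if i % 10 else "US"}
    for i in range(10_000)
]

# High-cardinality key: rows spread roughly evenly over the 4 shards.
by_user = shard_load((r["user_id"] for r in rows), 4)

# Low-cardinality, skewed key: at most 2 shards are used at all,
# and one of them holds at least 9,000 of the rows.
by_country = shard_load((r["country"] for r in rows), 4)

print("sharded by user_id:", by_user)
print("sharded by country:", by_country)
```

A key with many distinct, evenly used values tends to balance well, while a key with few values or a skewed distribution concentrates rows (and usually traffic) on a few shards.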

Cons

✘Cannot use plain SQL.

✘Queries must be manually adapted to match sharding.

✘If the sharding key is chosen poorly, shard load will be imbalanced, either by capacity or traffic.

✘Some queries will involve all the shards. The capacity for handling such queries is limited by each single machine’s speed.
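A query that filters on something other than the sharding key has to be sent to every shard and the partial results merged by the application. The following is a rough sketch of that scatter-gather pattern, assuming toy in-memory shards keyed by `user_id`; the data and the function names `query_shard` and `scatter_gather` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy shards: rows are sharded by user_id, so a filter on "total"
# cannot be routed to a single shard.
shards = [
    [{"user_id": 1, "total": 30}, {"user_id": 5, "total": 12}],
    [{"user_id": 2, "total": 99}],
    [{"user_id": 3, "total": 45}, {"user_id": 7, "total": 8}],
]


def query_shard(shard, min_total):
    """Run the filter locally on one shard (stands in for a SQL query)."""
    return [row for row in shard if row["total"] >= min_total]


def scatter_gather(min_total):
    """Send the query to every shard in parallel, then merge and sort."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: query_shard(s, min_total), shards))
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda r: r["total"], reverse=True)


result = scatter_gather(20)
print([r["user_id"] for r in result])  # prints [2, 3, 1]
```

Every shard does work for every such query, so adding shards does not increase how many of these queries the system can handle per second; the merge and sort step is also logic the application must now write by hand.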

🛠️ Steven Gong
