NoSQL

Learned in SE464.

Review of sharding: splits data across machines, accepts writes on all machines, but partitioning is manual and works well only for queries handled within a single shard. If we keep the relational model with normalized data, many queries still involve all nodes, so scaling is limited.
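To make the single-shard limitation concrete, here is a minimal sketch. It assumes four in-memory shards and modulo partitioning on a user ID; `NUM_SHARDS`, `shard_for`, `insert_user`, `get_user`, and `users_in_region` are hypothetical names for illustration, not any real database's API.

```python
NUM_SHARDS = 4  # assumed cluster size, for illustration only

# Each dict stands in for one machine holding part of the users table.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(user_id: int) -> int:
    """Manual partitioning rule written in application code."""
    return user_id % NUM_SHARDS


def insert_user(user_id: int, name: str, region: str) -> None:
    shards[shard_for(user_id)][user_id] = {"name": name, "region": region}


def get_user(user_id: int):
    """Lookup by the partition key: exactly one shard is contacted."""
    return shards[shard_for(user_id)].get(user_id)


def users_in_region(region: str):
    """Query on a non-partition column: every shard must be contacted."""
    touched = 0
    names = []
    for shard in shards:
        touched += 1
        names.extend(u["name"] for u in shard.values() if u["region"] == region)
    return sorted(names), touched
```

`get_user` scales with the number of shards because each call lands on one machine; `users_in_region` does not, because every call costs work on all of them.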

NoSQL databases solve this by denormalizing data: duplicating it so that queries can be answered by a single node.

NoSQL

Removing the ability to create references gives us a NoSQL database. Instead of following references with JOINs, we store denormalized data with copies of referenced data.
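A small sketch of the difference, using plain dicts as stand-ins for tables and for a key-value store. The records and the function names (`profile_normalized`, `profile_denormalized`) are made up for illustration.

```python
# Normalized: the user row holds a reference (region_id) to another table.
regions = {91: {"name": "Greater Seattle Area"}}
users = {1: {"name": "Ada", "region_id": 91}}


def profile_normalized(user_id: int) -> dict:
    user = users[user_id]
    # Following the reference is a second lookup (the JOIN), which in a
    # partitioned system may land on a different node.
    region = regions[user["region_id"]]
    return {"name": user["name"], "region": region["name"]}


# Denormalized: the referenced data is copied into the record itself.
profiles = {1: {"name": "Ada", "region": "Greater Seattle Area"}}


def profile_denormalized(user_id: int) -> dict:
    # One lookup by key answers the whole query.
    return profiles[user_id]
```

Both functions return the same profile; the denormalized version needs only one lookup, at the cost of storing the region name in every profile that uses it.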

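At their core, the NoSQL DBs covered here are key-value stores: each record is stored and looked up by a single key. Hashing that key decides which node holds the record, which is the basis of distributed NoSQL; see Distributed Hash Table.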
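A minimal sketch of hash-based partitioning, assuming three fixed nodes and simple modulo placement. Real distributed hash tables usually use consistent hashing so that adding or removing a node moves few keys; that is left out here. `NODES`, `node_for`, and `KVStore` are hypothetical names.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # assumed, fixed cluster membership


def node_for(key: str) -> str:
    """Map a key to a node using a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class KVStore:
    """Toy key-value store; one dict per node stands in for one machine."""

    def __init__(self) -> None:
        self.data = {node: {} for node in NODES}

    def put(self, key: str, value) -> None:
        self.data[node_for(key)][key] = value

    def get(self, key: str):
        # Only the node that owns the key is consulted.
        return self.data[node_for(key)].get(key)
```

Usage: `store = KVStore(); store.put("user:1", {"name": "Ada"}); store.get("user:1")`. Both the write and the read touch a single node, chosen purely from the key.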

Downsides

- Only one indexed column (the key), because the index is built with hash-based partitioning; a lookup by any other field generally has to scan every node
- Denormalized data is duplicated
  - Wastes space
  - Cannot be edited in one place (e.g., “Greater Seattle Area” repeated in many user profiles instead of a single reference such as region: 91)
- References are possible, but:
  - Following the reference requires another query, probably to another node
  - No constraint checking (refs can become invalid after delete)
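The duplication and dangling-reference downsides can be sketched with a plain dict standing in for the store. The records, the `manager` field, and the function names are invented for illustration.

```python
profiles = {
    "user:1": {"name": "Ada", "region": "Greater Seattle Area"},
    "user:2": {"name": "Lin", "region": "Greater Seattle Area", "manager": "user:1"},
    "user:3": {"name": "Sam", "region": "Greater Toronto Area"},
}


def rename_region(store: dict, old: str, new: str) -> int:
    """No single place to edit: scan every record and rewrite each copy."""
    changed = 0
    for record in store.values():
        if record.get("region") == old:
            record["region"] = new
            changed += 1
    return changed


def manager_name(store: dict, key: str):
    """Following a reference costs a second lookup and may find nothing."""
    manager_key = store[key].get("manager")
    manager = store.get(manager_key)  # nothing guarantees the target exists
    return None if manager is None else manager["name"]
```

Renaming the region rewrites every profile that copied it, and any copy missed by the scan stays inconsistent. Deleting `user:1` succeeds even though `user:2` still points at it, so `manager_name(profiles, "user:2")` returns `None` afterwards.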

Recap

- Data partitioning is necessary to divide write load among nodes
- SQL sharding is a special case of data partitioning, done in app code
- NoSQL databases make partitioning easy by eliminating references
- Without references, data becomes denormalized; duplicated data consumes more space and can become inconsistent
- The result is very scalable, but provides only a simple key-value abstraction

🛠️ Steven Gong
