Database
A database is an organized collection of data stored and accessed electronically. I formally studied this in CS348.
Two main types of database:
Database
A large and persistent collection of metadata and data organized in a way that facilitates efficient retrieval and revision.
Database Management System (DBMS)
A DBMS is a set of programs that implements a data model to manage a database.
A DBMS manages two kinds of information: metadata and data.
- metadata tells you how the data is organized
- Ex: There are employee entities that have a name and a salary.
- data is the actual data that you want to store
- Ex: Mary Smith is an employee who earns $92,000 per year.
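To make the metadata/data split concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite and the `employee` table layout are my own assumptions for illustration, not something from the course. The `CREATE TABLE` statement is metadata, the `INSERT` is data, and the DBMS stores both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Metadata: there are employee entities that have a name and a salary.
conn.execute("CREATE TABLE employee (name TEXT NOT NULL, salary INTEGER NOT NULL)")

# Data: Mary Smith is an employee who earns $92,000 per year.
conn.execute(
    "INSERT INTO employee (name, salary) VALUES (?, ?)", ("Mary Smith", 92000)
)

# SQLite keeps the metadata in its own catalog table, sqlite_master.
schema_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'employee'"
).fetchone()[0]

# The data is retrieved separately, through an ordinary query.
rows = conn.execute("SELECT name, salary FROM employee").fetchall()

print(schema_sql)
print(rows)
```

Other systems expose their catalog differently, so `sqlite_master` is specific to SQLite.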
Data Model
A data model determines the nature of the metadata and how retrieval and revision are expressed. See Data Model.
Benefits that Database Systems bring (slides 10-11 of chapter 1)
- Reliability
- Concurrency
- Data Integrity (Integrity Constraint)
- Security: Restrictions on who can access and update
- Productivity
- Data is Structured
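As a rough illustration of the data integrity benefit, the sketch below declares a constraint once and lets the DBMS reject any write that violates it. It uses Python's `sqlite3`; the table and the `salary >= 0` rule are hypothetical examples of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The integrity constraint lives in the schema, not in application code.
conn.execute(
    "CREATE TABLE employee ("
    " name TEXT NOT NULL,"
    " salary INTEGER NOT NULL CHECK (salary >= 0))"
)

# A valid row is accepted.
conn.execute("INSERT INTO employee VALUES (?, ?)", ("Mary Smith", 92000))

# An invalid row is rejected by the DBMS itself.
try:
    conn.execute("INSERT INTO employee VALUES (?, ?)", ("Bad Row", -1))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, row_count)
```

Every application that writes to the table gets the same check without reimplementing it.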
General properties of a DBMS
- A DBMS adopts some data model for managing structured data via an interface with two sub-languages: a data definition language (DDL) and a data manipulation language (DML)
- A DBMS supports physical and logical Data Independence
- A DBMS supports concurrent data manipulation (through Transactions)
- A DBMS guarantees data is reliably recorded and can be recovered in case of hardware or software failure (through Transactions)
- A DBMS provides access control to information via data access permissions relating to users and roles
- A DBMS provides utilities for database monitoring and maintenance.
- A DBMS supports a variety of users
Fundamentally
Three big ideas underlie a DBMS:
- physical data independence,
- data manipulation that is declarative, and
- interaction via transactions.
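Two of these ideas, declarative manipulation and transactions, can be sketched in a few lines. This is only an illustration under my own assumptions: it uses SQLite via Python's `sqlite3`, and the `account` table and `transfer` helper are hypothetical. The SQL statements say what should change, not how to find the rows, and the `with conn:` block makes the two updates all-or-nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " owner TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction (hypothetical helper)."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE owner = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is undone

balances = dict(conn.execute("SELECT owner, balance FROM account ORDER BY owner"))
print(ok, failed, balances)
```

The failed transfer credits bob first on purpose, to show that a partial update does not survive the rollback.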
Why are databases hard?
- Data redundancy and inconsistency
- Concurrent-access anomalies
- Security issues
Other misc. learnings
From SE464
Why not just use filesystems as database?
Filesystems don’t give us analysis, integrity, or deduplication properties. Other shortcomings:
- Indexing – efficient access in just one dimension – the path/filename.
- Concurrency – multiple apps can read/write, but lacks transactions
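The indexing point can be sketched as follows, assuming SQLite via Python's `sqlite3` and a hypothetical `file` table that mirrors filesystem entries. A filesystem looks things up efficiently only by path, whereas a DBMS can add an index on any other attribute, here `owner`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table standing in for filesystem entries.
conn.execute("CREATE TABLE file (path TEXT PRIMARY KEY, owner TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO file VALUES (?, ?, ?)",
    [
        ("/home/mary/report.txt", "mary", 1200),
        ("/home/bob/notes.txt", "bob", 300),
        ("/srv/data/dump.bin", "mary", 900000),
    ],
)

# A second access dimension: look files up by owner instead of by path.
conn.execute("CREATE INDEX idx_file_owner ON file (owner)")

# Ask SQLite how it would run the query; the last column describes each step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT path FROM file WHERE owner = ?", ("mary",)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

marys_files = sorted(
    row[0] for row in conn.execute("SELECT path FROM file WHERE owner = ?", ("mary",))
)
print(plan_text)
print(marys_files)
```

The exact wording of the query plan varies between SQLite versions, but it should name `idx_file_owner` rather than a full table scan.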