Data Independence
Physical Data Independence
A property of application code devoid of any mention of the physical schema.
- “DBA (database administration) is free to replace or augment the PAYROLL emp-file B-tree without any need for PAT to revise PAYROLL source code”
Logical Data Independence
A property of pairs of external schema in which there is no common logical metadata.
- “PAT can work on developing PAYROLL without any need to learn about INVENTORY or ONLINE-ORDER.”
Data Dependency
Functional Dependency
Superkey
Given a relation with attributes and subset , is a superkey of if the FD holds on any instance of .
Candidate Key
Given a relation with attributes and subset , is a candidate key of if the FD holds on any instance of , and no strict subset of is a superkey.
“A candidate key is a minimal superkey“.
Primary Key
Given a relation with attributes the primary key of is a distinguished candidate key of chosen by the DBA group.
Superkey Theorem for Functional Dependencies
is a superkey of a relation with schema R if and only if .
I don’t really understand this From ChatGPT, they say that
- A superkey is a set of one or more attributes (columns) that can be used to uniquely identify a tuple (row) in a relation.
Explanation by ChatGPT
Explanation 1
Superkey: In a relational database, a superkey is a set of one or more columns (attributes) that can uniquely identify a row in a table. The uniqueness is an important aspect: no two rows can have the same combination of values for the columns that form a superkey.
Let’s say we have a student table with attributes (columns): {StudentID, FirstName, LastName, DOB (date of birth), Email}. Here are a few examples of superkeys:
- {StudentID}
- {StudentID, FirstName}
- {StudentID, LastName, DOB}
- {Email}
In all these cases, the values in the set of columns will be unique across all rows of the table.
Candidate Key: A candidate key is a minimal superkey, meaning it’s a set of columns that can still uniquely identify a row, but you can’t remove any columns from it without losing that property. In other words, no proper subset of a candidate key is a superkey.
From the previous example, the candidate keys would be:
- {StudentID}
- {Email}
While {StudentID, FirstName} is a superkey, it’s not a candidate key because we can remove FirstName and still have a set of columns ({StudentID}) that can uniquely identify a row.
Primary Key:
A primary key is simply a candidate key that has been chosen (typically by the database designer or administrator) to be the main way of identifying records in a table. In a given table, you can have multiple candidate keys, but usually only one of them will be selected as the primary key.
In our student table, we might choose {StudentID} to be the primary key, because it’s likely the most convenient way to identify students. Email could also uniquely identify a student, but StudentID is likely a better choice because it’s shorter, guaranteed not to change over time, and doesn’t contain any special characters.
I hope this helps make these definitions a little clearer! It’s important to understand that keys in a relational database are all about uniquely identifying records, and these different types of keys are just different approaches to doing that.
Example 2
Sure, let’s take an example of a Student table in a university database. The Student table has the following attributes:
- StudentID
- FirstName
- LastName
- DateOfBirth
- Address
Here, the StudentID and Email can uniquely identify a student in the database because these values are unique to each student.
So, {StudentID}, {Email} are candidate keys for this Student table.
A superkey could be any combination of attributes that includes the StudentID or Email. For instance:
- {StudentID, FirstName}
- {StudentID, LastName, DateOfBirth}
- {Email, Address}
- {StudentID, Email, FirstName, LastName, DateOfBirth, Address}
All of these are superkeys because they include at least one attribute (either StudentID or Email) that can uniquely identify a student in the table. However, they’re not candidate keys because they include extra attributes that are not necessary for unique identification.
In practice, the university might choose StudentID as the primary key for the Student table because it is short, numeric, and easy to use in queries.