Rules given to and enforced by the DBMS that ensure values stored as foreign keys actually exist in the table where they are the primary key.
A small database containing Student, Course, and Schedule tables.
Student describes students, Course courses, and Schedule the courses a student takes.
Here is an Entity Relationship Diagram (ERD) for this database -- ../demos/studentCourseScheduleERD.png
Here is the creation script – ../demos/studentCourseScheduleDDL.sql
Save to your P: drive.
Open MySQL Workbench
File | Open SQL script
Execute
After creating the tables, add a Student record and a two Course records.
Then add two Schedule records to denote that the student is scheduled for both courses.
Try to add a schedule record with an ID that does not exist in the student table. Foreign key constraint fails.
In MySQL, you can enter a ‘Default value’ for each field if you like.
A functional dependency is a way to say the value of one field (or the concatenated values of multiple fields) uniquely determines values present in another field or fields. The field (or concatenation of fields) that determines values is called a determinant, the field whose value is dependent on the determinant is called the dependent.
We notate functional dependencies as follows:
determinant -> dependent
read “determinant determines dependent”
Functional dependencies are the way we formally document assumptions we make to answer ‘how many’ questions in the database. We call these functional dependencies, because when you give a function a particular input, you get only one output.
Student:
ID |
Name |
Major |
1 |
JimBob |
Music |
2 |
JimBob |
Music |
3 |
JimBob |
Biology |
We are assuming every student has a unique ID, but not a unique name.
Name is functionally dependent on ID, i.e., given an ID, this ID determines a particular name.
Notice that given a name, I cannot necessarily find a unique ID corresponding to that name (consider two different students with the same name), so ID is NOT functionally dependent on name.
To formally describe this assumption as a functional dependency, we write:
ID -> Name
When the determinant of a FD is as small as it can be, we say it is irreducible. Consider
ID -> Name
ID -> Major
Notice:
ID, Name -> Name
ID, Name -> Major
(the left side for these FDs is the composite field ID concatenated with Name)
are also FDs, but they are reducible.
ID, Name -> Name, but also, from above, ID -> Name
So, the Name field is not needed as part of the determinant.
If there are multiple determinants (left side) that determine the same dependents (right side), the determinants that have no proper subsets are irreducible.
As a shortcut, we often write “ID -> Name, Major”, read ID determines Name and Major, instead of writing
ID -> Name
ID -> Major
Note: two fields appearing on the left of the -> operator does not mean the same thing as two fields appearing on the right of this operator. In the reducible example above, we might have written:
ID, Name -> Name, Major
Meaning the composite field ID concatenated with Name determines both the fields Name and Major.
On the left side, the comma means concatenate, but on the right side, the comma is a list separator.
I will avoid writing commas on the right side whenever possible, because an FD with a comma on the right side can always be written as two FDs without commas on the right sides.
Consider the following Student relation:
Name |
EmergencyContact |
Major |
Doe, Jane |
Austen, Jane |
Music |
Doe, Jimbob |
Austen, Jane |
Music |
Doe, Jimbob |
Beam, Jim |
Biology |
Since this is a relation, we know Name -> EmergencyContact is FALSE
We also know EmergencyContact -> Name is FALSE.
Name, EmergencyContact -> Major is TRUE
When we write this FD, we are stating the combination of Name and EmergencyContact uniquely identifies a Major, and since there are no other fields in this relation, Name and EmergencyContact uniquely identify an instance of Student, so Name, EmergencyContact is a candidate key for this version of Student
Notice given the data above it looks like we could write: Major -> EmergencyContact
If we do this, we are stating all Students with the same Major have the same EmergencyContact, which is probably not want we want, but it does hold in this example!
So, just because a FD holds on a particular set of data, if we don’t want it to hold in general we need to generate sample rows that eliminate it as a possibility, e.g.,
Doe, Tim |
Beam, Jim |
Music |
In order to prevent duplicate data – week01notes01 -- good database design produces a set of relations where each relation contains data for one theme (or entity). Any particular theme will likely contain references to other themes, e.g., Student refers to Address.
In a good DB design, all relations are well-formed from the outset, and you will learn to think instinctively in terms of well-formed relations.
Normalization is the process of creating well-formed relations. This is an important tool for structuring previously unstructured data.
In the book’s notation for a relation, e.g., ANIMAL(name, owner, breed, sex)
Underlined fields are the primary key. If more than one field is underlined, the primary key is a composite key.
A composite key means there may be duplicate values in one of the fields that makes up the key. A legal relation with the schema shown above would be:
Name |
Owner |
Breed |
sex |
Fido |
Jimbob |
Lab |
M |
Feora |
Jimbob |
siamese |
F |
In a well formed relation:
1. Every determinant must be a candidate key
2. Any relation that is not well formed must be broken into two or more relations that are well formed.
To make a set of well-formed relations from an unstructured relation:
· place all attributes of the FD (determinant and dependents) in their own relation and give this new relation a name. Make the determinant the key for this new relation.
· remove the dependents of the FD from the original relation, but leave the determinant in the original relation as a foreign key to the new relation
· create a referential integrity constraint between the foreign key in original relation and the key of new relation
Notes:
1. The algorithm above is slightly different than what is shown on p. 68 of the book. The book scatters the ‘assumptions’ step throughout the other steps
Use the well-formed relation algorithm to create well-formed relations from the VetOffice table shown in figure 1-26 in the text (p. 47)
Here is a link to this figure: Kroenke5eVetOfficeV1.png
· Owners do not ‘overload’ pet names, i.e., no owner will give two of his/her pets the same name.
· Database stores only one email and one phone per owner. (Note, two different owners may share an email or phone.)
· (OwnerLastName, OwnerFirstName) will uniquely identify the owner (it is a smallish veterinarian’s office.)
· Dogs and Cats do not share Breed names
(PetName, OwnerLastName, OwnerFirstName)
Is the only candidate key I could find, so it is the primary key for the VetOffice table.
Remember: Candidate keys determine every other attribute, and are irreducible.
Each of these is irreducible and determines all other fields.
Notice:
1. The algorithm does not necessarily choose a primary key for the ‘original’ relation, it only identifies candidate keys. If there are multiple candidate keys for the original relation, choosing a primary key from among the candidate keys helps.
2. The original relation stores data for both owners and pets – two different entities or themes.
VetOffice(PetName, PetType, PetBreed, PetDOB, OwnerLastName, OwnerFirstName, OwnerPhone, OwnerEmail)
PetBreed -> PetType
OwnerLastName, OwnerFirstName -> OwnerEmail, OwnerPhone
1. PetBreed is not a candidate key in VetOffice, so
Create relation Breed(PetBreed, PetType)
Leave PetBreed in VetOffice relation
The VetOffice relation now has the following definition:
VetOffice(PetName, PetBreed, PetDOB, OwnerLastName, OwnerFirstName, OwnerPhone, OwnerEmail)
Make referential integrity constraint:
VetOffice.PetBreed refers to Breed.PetBreed
2. (OwnerLastName, OwnerFirstName) is not a candidate key in VetOffice, so
Create relation Owner(OwnerLastName, OwnerFirstName, OwnerPhone, OwnerEmail)
Leave (OwnerLastName, OwnerFirstName) in VetOffice table
The VetOfficeTable now has the following definition:
VetOffice(PetName, PetBreed, PetDOB, OwnerLastName, OwnerFirstName)
Make referential integrity constraint:
VetOffice.OwnerLastName refers to Owner.OwnerLastName
VetOffice.OwnerFirstName refers to Owner.OwnerFirstName
Suppose we are given:
a relation, R, that has attributes A, B, C, D, E.
The following functional dependencies that hold in R:
A,E -> B,C,D
A -> B
E -> C,D
Create a set of well-formed relations from R.
Step 1 – We have no idea what the theme described by R is or what any of the attributes in R ‘mean’. Therefore, we can make no assumptions about the data.
Step 2 – A candidate key is a functional dependency where some irreducible set of attributes determines all the other attributes in the relation. The FD: A,E -> B,C,D meets this criteria, so it is a candidate key. It is also the only candidate key – see step 3 below.
We can now write R as: R(A, B, C, D, E)
Step 3: The FDs A -> B and E -> C, D are not candidate keys, because their determinants do not determine all of the attributes in R. Their determinants are irreducible, so they each qualify as one ‘other’ FD’s identified in this step.
Step 4.1: For A -> B:
· Make a new relation RA(A, B)
· Remove B from R, so R is now:
R(A, C, D, E) where A is a foreign key to RA
· Make a referential integrity constraint from R.A to RA.A. We show a referential integrity constraint using italics. R is now:
R(A, C, D, E) A is FK to RA.A
Step 4.2: For E -> C,D:
· Make a new relation RE(E, C, D)
· Remove C and D from R, so R is now:
R(A, E) where E is a foreign key to RE
· Make a referential integrity constraint from R.E to RE.E. R is now:
R(A, E) A is FK to RA.A, E is FK to RE.E
The final set of relations is:
R(A, E) A is FK to RA.A, E is FK to RE.E
RA(A, B)
RE(E, C, D)