Week 02 Notes 01

Referential Integrity

Rules given to and enforced by the DBMS that ensure values stored as foreign keys actually exist in the table where they are the primary key.

Example

A small database containing Student, Course, and Schedule tables.

Student describes students, Course courses, and Schedule the courses a student takes.

Here is an Entity Relationship Diagram (ERD) for this database -- ../demos/studentCourseScheduleERD.png

Here is the creation script – ../demos/studentCourseScheduleDDL.sql

Save to your P: drive.

Open MySQL Workbench

File | Open SQL script

Execute

After creating the tables, add a Student record and a two Course records.

Then add two Schedule records to denote that the student is scheduled for both courses.

Try to add a schedule record with an ID that does not exist in the student table. Foreign key constraint fails.

Null values

Occur under the following conditions:

There is no appropriate value for a particular field in the current record, e.g., address line 2 in a 1 line address. Possibly caused by poor database design – too many fields in a table, and can probably be worked around, e.g., use only one address field, but enter a ‘;’ in the middle of two line addresses.
The value for a particular field is unknown.

DBMS has options for how to handle

In MySQL, you can enter a ‘Default value’ for each field if you like.

Functional Dependencies

A functional dependency is a way to say the value of one field (or the concatenated values of multiple fields) uniquely determines values present in another field or fields. The field (or concatenation of fields) that determines values is called a determinant, the field whose value is dependent on the determinant is called the dependent.

We notate functional dependencies as follows:

determinant -> dependent

read “determinant determines dependent”

Functional dependencies are the way we formally document assumptions we make to answer ‘how many’ questions in the database. We call these functional dependencies, because when you give a function a particular input, you get only one output.

Example 1

Student:

ID	Name	Major
1	JimBob	Music
2	JimBob	Music
3	JimBob	Biology

We are assuming every student has a unique ID, but not a unique name.

Name is functionally dependent on ID, i.e., given an ID, this ID determines a particular name.

Notice that given a name, I cannot necessarily find a unique ID corresponding to that name (consider two different students with the same name), so ID is NOT functionally dependent on name.

To formally describe this assumption as a functional dependency, we write:

ID -> Name

Irreducibility

When the determinant of a FD is as small as it can be, we say it is irreducible. Consider

ID -> Name

ID -> Major

Notice:

ID, Name -> Name

ID, Name -> Major

(the left side for these FDs is the composite field ID concatenated with Name)

are also FDs, but they are reducible.

ID, Name -> Name, but also, from above, ID -> Name

So, the Name field is not needed as part of the determinant.

If there are multiple determinants (left side) that determine the same dependents (right side), the determinants that have no proper subsets are irreducible.

A Note on Notation

As a shortcut, we often write “ID -> Name, Major”, read ID determines Name and Major, instead of writing

ID -> Name

ID -> Major

Note: two fields appearing on the left of the -> operator does not mean the same thing as two fields appearing on the right of this operator. In the reducible example above, we might have written:

ID, Name -> Name, Major

Meaning the composite field ID concatenated with Name determines both the fields Name and Major.

On the left side, the comma means concatenate, but on the right side, the comma is a list separator.

I will avoid writing commas on the right side whenever possible, because an FD with a comma on the right side can always be written as two FDs without commas on the right sides.

Example 2

Consider the following Student relation:

Name	EmergencyContact	Major
Doe, Jane	Austen, Jane	Music
Doe, Jimbob	Austen, Jane	Music
Doe, Jimbob	Beam, Jim	Biology

Since this is a relation, we know Name -> EmergencyContact is FALSE

We also know EmergencyContact -> Name is FALSE.

Name, EmergencyContact -> Major is TRUE

When we write this FD, we are stating the combination of Name and EmergencyContact uniquely identifies a Major, and since there are no other fields in this relation, Name and EmergencyContact uniquely identify an instance of Student, so Name, EmergencyContact is a candidate key for this version of Student

Notice given the data above it looks like we could write: Major -> EmergencyContact

If we do this, we are stating all Students with the same Major have the same EmergencyContact, which is probably not want we want, but it does hold in this example!

So, just because a FD holds on a particular set of data, if we don’t want it to hold in general we need to generate sample rows that eliminate it as a possibility, e.g.,

Doe, Tim

Beam, Jim

Music

Normalization

In order to prevent duplicate data – week01notes01 -- good database design produces a set of relations where each relation contains data for one theme (or entity). Any particular theme will likely contain references to other themes, e.g., Student refers to Address.

In a good DB design, all relations are well-formed from the outset, and you will learn to think instinctively in terms of well-formed relations.

Normalization is the process of creating well-formed relations. This is an important tool for structuring previously unstructured data.

A note on notation

In the book’s notation for a relation, e.g., ANIMAL(name, owner, breed, sex)

Underlined fields are the primary key. If more than one field is underlined, the primary key is a composite key.

A composite key means there may be duplicate values in one of the fields that makes up the key. A legal relation with the schema shown above would be:

Name	Owner	Breed	sex
Fido	Jimbob	Lab	M
Feora	Jimbob	siamese	F

Well-formed relations

In a well formed relation:

1. Every determinant must be a candidate key

2. Any relation that is not well formed must be broken into two or more relations that are well formed.

Normalization algorithm

To make a set of well-formed relations from an unstructured relation:

Make reasonable assumptions about data dependencies and document these assumptions.
Identify candidate keys
Identify other irreducible functional dependencies (FDs)
For each FD whose determinant is not a candidate key:

· place all attributes of the FD (determinant and dependents) in their own relation and give this new relation a name. Make the determinant the key for this new relation.

· remove the dependents of the FD from the original relation, but leave the determinant in the original relation as a foreign key to the new relation

· create a referential integrity constraint between the foreign key in original relation and the key of new relation

Notes:

1. The algorithm above is slightly different than what is shown on p. 68 of the book. The book scatters the ‘assumptions’ step throughout the other steps

Example:

Use the well-formed relation algorithm to create well-formed relations from the VetOffice table shown in figure 1-26 in the text (p. 47)

Here is a link to this figure: Kroenke5eVetOfficeV1.png

Make reasonable assumptions and write them down:

· Owners do not ‘overload’ pet names, i.e., no owner will give two of his/her pets the same name.

· Database stores only one email and one phone per owner. (Note, two different owners may share an email or phone.)

· (OwnerLastName, OwnerFirstName) will uniquely identify the owner (it is a smallish veterinarian’s office.)

· Dogs and Cats do not share Breed names

Identify Candidate Keys given the assumptions:

(PetName, OwnerLastName, OwnerFirstName)

Is the only candidate key I could find, so it is the primary key for the VetOffice table.

Remember: Candidate keys determine every other attribute, and are irreducible.

Each of these is irreducible and determines all other fields.

Notice:

1. The algorithm does not necessarily choose a primary key for the ‘original’ relation, it only identifies candidate keys. If there are multiple candidate keys for the original relation, choosing a primary key from among the candidate keys helps.

2. The original relation stores data for both owners and pets – two different entities or themes.

VetOffice(PetName, PetType, PetBreed, PetDOB, OwnerLastName, OwnerFirstName, OwnerPhone, OwnerEmail)

Identify non-candidate key FDs:

PetBreed -> PetType

OwnerLastName, OwnerFirstName -> OwnerEmail, OwnerPhone

Walk non-candidate key FDs to create new relations

1. PetBreed is not a candidate key in VetOffice, so

Create relation Breed(PetBreed, PetType)

Leave PetBreed in VetOffice relation

The VetOffice relation now has the following definition:

VetOffice(PetName, PetBreed, PetDOB, OwnerLastName, OwnerFirstName, OwnerPhone, OwnerEmail)

Make referential integrity constraint:

VetOffice.PetBreed refers to Breed.PetBreed

2. (OwnerLastName, OwnerFirstName) is not a candidate key in VetOffice, so

Create relation Owner(OwnerLastName, OwnerFirstName, OwnerPhone, OwnerEmail)

Leave (OwnerLastName, OwnerFirstName) in VetOffice table

The VetOfficeTable now has the following definition:

VetOffice(PetName, PetBreed, PetDOB, OwnerLastName, OwnerFirstName)

Make referential integrity constraint:

VetOffice.OwnerLastName refers to Owner.OwnerLastName

VetOffice.OwnerFirstName refers to Owner.OwnerFirstName

Example 2:

Suppose we are given:

a relation, R, that has attributes A, B, C, D, E.

The following functional dependencies that hold in R:

A,E -> B,C,D

A -> B

E -> C,D

Create a set of well-formed relations from R.

Step 1 – We have no idea what the theme described by R is or what any of the attributes in R ‘mean’. Therefore, we can make no assumptions about the data.

Step 2 – A candidate key is a functional dependency where some irreducible set of attributes determines all the other attributes in the relation. The FD: A,E -> B,C,D meets this criteria, so it is a candidate key. It is also the only candidate key – see step 3 below.

We can now write R as: R(A, B, C, D, E)

Step 3: The FDs A -> B and E -> C, D are not candidate keys, because their determinants do not determine all of the attributes in R. Their determinants are irreducible, so they each qualify as one ‘other’ FD’s identified in this step.

Step 4.1: For A -> B:

· Make a new relation RA(A, B)

· Remove B from R, so R is now:

R(A, C, D, E) where A is a foreign key to RA

· Make a referential integrity constraint from R.A to RA.A. We show a referential integrity constraint using italics. R is now:

R(A, C, D, E) A is FK to RA.A

Step 4.2: For E -> C,D:

· Make a new relation RE(E, C, D)

· Remove C and D from R, so R is now:

R(A, E) where E is a foreign key to RE

· Make a referential integrity constraint from R.E to RE.E. R is now:

R(A, E) A is FK to RA.A, E is FK to RE.E

The final set of relations is:

R(A, E) A is FK to RA.A, E is FK to RE.E

RA(A, B)

RE(E, C, D)