Week 04 Notes 01 – Joins

Subqueries work as long as selected results come from a single table, but not if results must come from more than one table.  Enter the join operation.

 

A join is the way to join two tables together.  The technique is based on the Cartesian product algorithm.

Cartesian Product

A Cartesian product of two tables results in a third table containing all the columns of both tables.

 

The data in the Cartesian product is defined as follows:

 

For each row in table1

            For each row in table2

                        Create a new row in the result table

 

Example:

select *

from Student

 

select *

from Course

 

select *

from Student, Course

 

Note: 5 rows * 3 rows == 15 rows in the product.

Join operation

Usually, we want only row combinations where two fields that have the same ‘meaning’ have the same value.

 

Select <FieldNames>

From <tableNames>

Where <Table1>.<Field1> = <Table2>.<Field2>

 

Joins are computed as follows:

For each row in table1

            For each row in table2 where (value of table1.joinfield == value of table2.joinfield)

                        Create a new row in the result table

Examples

../demos/queryExamples.sql  contains several examples showing the techniques we have been talking about. 

../demos/maxEnrollmentQueriesDemo.sql

 

Here is a discussion of the

StudentScheduleJoin query, which shows student IDs and course IDs for students who are taking courses.

 

SELECT S.*, H.*

FROM Student AS S, Schedule AS H

WHERE S.id=H.id;

 

Result is obtained as follows:

id for first record in Student is 0

There are no records in Schedule where studentID == 0, so do not output any rows

 

id for 2nd record in Student is 1

1st record in Schedule has StudentID == 1, so output row:

            1, Jane, 1, 0 (Record 2 from Student X Record 1 from Schedule)

2nd record in Schedule also has StudentID == 1, so output row:

            1, Jane, 1, 1 (Record 2 from Student X Record 2 from Schedule)

No other records in Schedule have StudentID == 1, so move on.

 

Id for 3rd record in student is 2

3rd record in Schedule has StudentID == 2, so output row:

            2, Herman, 2, 1 (Record 3 from Student X Record 3 from Schedule)

4th record in Schedule has StudentID == 2, so output row:

            2, Herman, 2, 2 (Record 3 from Student X Record 4 from Schedule)

No other records in Schedule have StudentID == 2, so move on

 

And so on for records 4 and 5 in Student.

 

 

Now to the join we just did, join the course table to show names of student and the names of the courses they are taking.  (StudentJoinScheduleJoinCourse query).

 

SELECT S.name, C.name

From Student as S, Schedule as H, Course as C

Where S.id = H.id and C.crn= H.crn

 

Join on syntax

This last query is a little more clear using the ‘join … on’ syntax shown starting on p. 141 in the book.

 

SELECT S.name, C.name

From (Student as S INNER JOIN  Schedule as H ON S.id = H.id)

INNER JOIN  Course as C ON  C.crn = H.crn

 

(MS Access requires the parenthesis around the first join.)

 

Example

Names of students and names of the courses they are taking for all students taking more than 4 credits.

(Hint: requires a Having clause, and a join of a join.)

 

Answer: Don’t look here until you’ve tried it:

 

 

Get students taking more than 4 credits:

select ST.id

from Student as ST, Course as C, Schedule as SC

where

ST.id = SC.id and C.crn = SC.crn

group by ST.id

having sum(credits) > 4;

 

 

Get names of students and courses for all students taking courses:

select ST.name, C.name

from Student as ST, Course as C, Schedule as SC

where

ST.id = SC.id and C.crn = SC.crn

 

Use data from first query as an enumeration of the students we want from the second query.

 

select ST.name, C.name

from Student as ST, Course as C, Schedule as SC

where

ST.id = SC.id and C.crn = SC.crn

-- add the inner query to the above query

and ST.id in

(select ST.id

from Student as ST, Course as C, Schedule as SC

where

ST.id = SC.id and C.crn = SC.crn

group by ST.id

having sum(credits) > 4

)

 

 

Subquery vs Join

Any query that can be accomplished with a subquery can also be accomplished with a join. 

 

However, some queries you can do with join cannot be done with a subquery, so joins are more powerful than subqueries.

 

Subqueries will work only if all the fields you want to output come from a single table.

 

On Efficiency

The database has a query execution engine that breaks the query into underlying record retrieval operations.  It also has a query optimizer, which does its best to translate the SQL query you write into the most efficient set of underlying operations. 

 

Syntactic equivalences

Three-way join as two two-way joins

A three-way join can always be written as a combination of two-way joins.

 

Here is the outer query from the problem above written as  two two-way joins:

select DT.name, C.name

from (

     select ST.*, SC.crn

     from Student as ST, Schedule as SC

     where ST.id=SC.id

) as DT, Course as C

where

     DT.crn = C.crn

 

Notice the use of the derived table DT, and the ‘scope’ of the aliases. (ST and SC are valid only inside the ()’s that enclose their definitions.)

 

 Join on syntax

select DT.name, C.name

from (

     select ST.*, SC.crn

     from Student as ST join Schedule as SC

     on ST.id=SC.id

) as DT join Course as C

on

     DT.crn = C.crn