Subqueries work as long as selected results come from a single table, but not if results must come from more than one table. Enter the join operation.
A join is the way to join two tables together. The technique is based on the Cartesian product algorithm.
A Cartesian product of two tables results in a third table containing all the columns of both tables.
The data in the Cartesian product is defined as follows:
For each row in table1
For each row in table2
Create a new row in the result table
select * from Student |
select * from Course |
|
|
select *
from Student, Course
Note: 5 rows * 3 rows == 15 rows in the product.
Usually, we want only row combinations where two fields that have the same ‘meaning’ have the same value.
Select <FieldNames>
From <tableNames>
Where <Table1>.<Field1> = <Table2>.<Field2>
Joins are computed as follows:
For each row in table1
For each row in table2 where (value of table1.joinfield == value of table2.joinfield)
Create a new row in the result table
../demos/queryExamples.sql contains several examples showing the techniques we have been talking about.
../demos/maxEnrollmentQueriesDemo.sql
Here is a discussion of the
StudentScheduleJoin query, which shows student IDs and course IDs for students who are taking courses.
SELECT S.*, H.*
FROM Student AS S, Schedule AS H
WHERE S.id=H.id;
Result is obtained as follows:
id for first record in Student is 0
There are no records in Schedule where studentID == 0, so do not output any rows
id for 2nd record in Student is 1
1st record in Schedule has StudentID == 1, so output row:
1, Jane, 1, 0 (Record 2 from Student X Record 1 from Schedule)
2nd record in Schedule also has StudentID == 1, so output row:
1, Jane, 1, 1 (Record 2 from Student X Record 2 from Schedule)
No other records in Schedule have StudentID == 1, so move on.
Id for 3rd record in student is 2
3rd record in Schedule has StudentID == 2, so output row:
2, Herman, 2, 1 (Record 3 from Student X Record 3 from Schedule)
4th record in Schedule has StudentID == 2, so output row:
2, Herman, 2, 2 (Record 3 from Student X Record 4 from Schedule)
No other records in Schedule have StudentID == 2, so move on
And so on for records 4 and 5 in Student.
Now to the join we just did, join the course table to show names of student and the names of the courses they are taking. (StudentJoinScheduleJoinCourse query).
SELECT S.name, C.name
From Student as S, Schedule as H, Course as C
Where S.id = H.id and C.crn= H.crn
This last query is a little more clear using the ‘join … on’ syntax shown starting on p. 141 in the book.
SELECT S.name, C.name
From (Student as S INNER JOIN Schedule as H ON S.id = H.id)
INNER JOIN Course as C ON C.crn = H.crn
(MS Access requires the parenthesis around the first join.)
Names of students and names of the courses they are taking for all students taking more than 4 credits.
(Hint: requires a Having clause, and a join of a join.)
Answer: Don’t look here until you’ve tried it:
Get students taking more than 4 credits:
select ST.id
from Student as ST, Course as C, Schedule as SC
where
ST.id = SC.id and C.crn = SC.crn
group by ST.id
having sum(credits) > 4;
Get names of students and courses for all students taking courses:
select ST.name, C.name
from Student as ST, Course as C, Schedule as SC
where
ST.id = SC.id and C.crn = SC.crn
Use data from first query as an enumeration of the students we want from the second query.
select ST.name, C.name
from Student as ST, Course as C, Schedule as SC
where
ST.id = SC.id and C.crn = SC.crn
-- add the inner query to the above query
and ST.id in
(select ST.id
from Student as ST, Course as C, Schedule as SC
where
ST.id = SC.id and C.crn = SC.crn
group by ST.id
having sum(credits) > 4
)
Any query that can be accomplished with a subquery can also be accomplished with a join.
However, some queries you can do with join cannot be done with a subquery, so joins are more powerful than subqueries.
Subqueries will work only if all the fields you want to output come from a single table.
The database has a query execution engine that breaks the query into underlying record retrieval operations. It also has a query optimizer, which does its best to translate the SQL query you write into the most efficient set of underlying operations.
A three-way join can always be written as a combination of two-way joins.
Here is the outer query from the problem above written as two two-way joins:
select DT.name, C.name
from (
select ST.*, SC.crn
from Student as ST, Schedule as SC
where ST.id=SC.id
) as DT, Course as C
where
DT.crn = C.crn
Notice the use of the derived table DT, and the ‘scope’ of the aliases. (ST and SC are valid only inside the ()’s that enclose their definitions.)
select DT.name, C.name
from (
select ST.*, SC.crn
from Student as ST join Schedule as SC
on ST.id=SC.id
) as DT join Course as C
on
DT.crn = C.crn