2. Data Models
• A Database models some
portion of the real world.
• Data Model is link between
user’s view of the world and
bits stored in computer.
• Many models have been
proposed.
• We will concentrate on the
Relational Model.
Student (sid: string, name: string, login:
string, age: integer, gpa:real)
3. Describing Data:
Data Models
• A data model is a collection of concepts for
describing data.
• A database schema is a description of a
particular collection of data, using a given data
model.
• The relational model of data is the most widely
used model today.
o Main concept: relation, basically a table with rows and columns.
o Every relation has a schema, which describes the columns, or fields.
4. Need to design a
data model
Data Model
A data schema
Need to model the business
5. Relational Query
Languages
• Query languages:
o Allow manipulation and retrieval of data from a database.
• Relational model supports simple,
powerful QLs:
o Strong formal foundation based on logic.
o Allows for much optimization.
• Query Languages != programming
languages!
o QLs not expected to be “Turing complete”.
o QLs not intended to be used for complex calculations.
o QLs support easy, efficient access to large data sets.
6. Formal Relational Query
Languages
Two mathematical Query Languages form the
basis for “real” languages (e.g. SQL), and for
implementation:
• Relational Algebra: More operational, very
useful for representing execution plans.
• Relational Calculus: Lets users describe what
they want, rather than how to compute it.
(Non-operational, declarative.)
• Understanding Algebra & Calculus is key to
understanding SQL, query processing!
7. Relational Database:
Definitions
• Relational database: a set of relations.
• Relation: made up of 2 parts:
o Schema : specifies name of relation, plus name
and type of each column.
• E.g. Students(sid: string, name: string, login:
string, age: integer, gpa: real)
o Instance : a table, with rows and columns.
• #rows = cardinality
• #fields = degree / arity
• Can think of a relation as a set of rows or tuples.
o i.e., all rows are distinct
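A minimal Python sketch of these definitions, assuming a relation instance is stored as a set of tuples. The sample rows are invented for illustration; only the Students schema comes from the slide.

```python
# Schema of Students: field names, in order (types omitted in this sketch).
schema = ("sid", "name", "login", "age", "gpa")

# Hypothetical instance; the rows are made up for illustration.
students = {
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
    ("53666", "Jones", "jones@cs", 18, 3.4),  # repeated row: a set keeps one copy
}

cardinality = len(students)  # number of rows
degree = len(schema)         # number of fields (arity)
print(cardinality, degree)   # 3 5
```

Because the instance is a set, the repeated row is stored once, which is what "all rows are distinct" means.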
8. Set and Bag
A set of objects….
Formal distinction
Set:
All objects in the “set” are unique
If the objects are not unique, then it is a
Bag
9. Preliminaries
• A query is applied to relation instances, and the
result of a query is also a relation instance.
o Schemas of input relations for a query are fixed (but query will run
regardless of instance!)
o The schema for the result of a given query is also fixed! Determined by
definition of query language constructs.
• Positional vs. named-field notation:
o Positional notation easier for formal definitions, named-field notation more
readable.
o Both used in SQL
10. Algebra
• In math, algebraic operations like +, -, x, /.
• Operate on numbers: input are numbers, output are
numbers.
• Can also do Boolean algebra on sets, using union,
intersect, difference.
• Focus on algebraic identities, e.g.
o x (y+z) = xy + xz.
• (Relational algebra lies between propositional and 1st-order logic.)
11. Relational Algebra
• Every operator takes one or two relation instances
• Result is also a relation
A relational algebra expression is a relation
Algebra is closed
F( R ) -> R
F(R1,R2) -> R
12. Relational Algebra in a DBMS
[Figure: query processing inside the DBMS: SQL query → parser → relational algebra expression → query optimizer → optimized relational algebra expression → query execution plan → code generator → executable code]
13. Introduction to Relational Algebra
• Introduced by E. F.
Codd in 1970.
• Codd proposed such
an algebra as a basis
for database query
languages.
14. Terminology
• Relation - a set of tuples.
• Tuple - a collection of attributes which describe
some real world entity.
• Attribute - a real world role played by a named
domain.
• Domain - a set of atomic values.
• Set - a mathematical definition for a collection of
objects which contains no duplicates.
15. Relational Algebra
• Basic operations:
o Selection ( 𝛔) Selects a subset of rows from relation.
o Projection ( π) Deletes unwanted columns from relation.
o Cross-product ( X ) Allows us to combine two relations.
o Set-difference ( - ) Tuples in reln. 1, but not in reln. 2.
o Union ( U ) Tuples in reln. 1 or in reln. 2.
• Additional operations:
o Intersection, join, division, renaming: Not essential, but (very!) useful.
16. Closed Algebra
Since each operation returns a relation, operations
can be composed! (Algebra is “closed”.)
All these operations take relation instances as input
and give a relation instance as output
18. Projection
R1 := PROJ_L (R2)
R1 := π_L (R2)
• L is a list of attributes from the schema of R2.
• R1 is constructed by looking at each tuple of R2,
extracting the attributes on list L, in the order
specified, and creating from those components a
tuple for R1.
• Eliminate duplicate tuples, if any.
19. Projection
• Deletes attributes that are not in projection list.
• Schema of result contains exactly the fields in the
projection list, with the same names that they had in
the (only) input relation.
π_{sname,rating}(S2)
Schema: Result(sname,rating)
20. Projection
• Deletes attributes that are not in projection list.
S2
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
π_{sname,rating}(S2)
sname rating
yuppy 9
lubber 8
guppy 5
rusty 10
Schema: Result(sname,rating)
21. Projection
• Deletes attributes that are not in projection list.
S2
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
π_age(S2)
age
35.0
55.5
Schema: Result(age)
Duplicates are eliminated
(sets not bags)
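The projection rule can be sketched in Python as follows, assuming relations are sets of tuples with a separate schema tuple. The helper name `project` is chosen here for illustration.

```python
def project(rows, schema, attrs):
    """Set-semantics projection: keep only attrs, in the order given."""
    positions = [schema.index(a) for a in attrs]
    # Building a set drops duplicate result tuples automatically.
    return {tuple(row[p] for p in positions) for row in rows}

S2_SCHEMA = ("sid", "sname", "rating", "age")
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

ages = project(S2, S2_SCHEMA, ["age"])
names_ratings = project(S2, S2_SCHEMA, ["sname", "rating"])
print(sorted(ages))        # [(35.0,), (55.5,)]
print(len(names_ratings))  # 4
```

Three input rows share age 35.0, but the result holds it once.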
22. Selection
R1 := SELECT_C (R2)
R1 := 𝛔_C (R2)
• C is a condition (as in “if” statements) that refers to
attributes of R2.
• R1 is all those tuples of R2 that satisfy C.
24. Selection
S2
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
𝛔_{rating>8}(S2)
sid sname rating age
28 yuppy 9 35.0
58 rusty 10 35.0
Schema of result identical to schema of input relation:
Result(sid,sname,rating,age)
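A small Python sketch of selection, assuming the same set-of-tuples representation. The helper name `select` and the use of a lambda for the condition are illustrative choices.

```python
def select(rows, schema, condition):
    """Selection: keep the rows for which condition(row as dict) is true."""
    return {row for row in rows if condition(dict(zip(schema, row)))}

S2_SCHEMA = ("sid", "sname", "rating", "age")
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

high_rated = select(S2, S2_SCHEMA, lambda t: t["rating"] > 8)
print(sorted(high_rated))  # [(28, 'yuppy', 9, 35.0), (58, 'rusty', 10, 35.0)]
```

The rows come back unchanged, so the result has the same schema as the input.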
25. Composite
We have two operations
Each operation, 𝛔 and π, has a relation as input
Each operation has a relation as output
i.e., Relational Algebra is closed
Thus we can combine them into composite functions
π_{sname,rating}(𝛔_{rating>8}(S2))
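Closure can be illustrated with plain Python set comprehensions, using positional notation (rating is field 2, sname is field 1). This is only a sketch: the output of the selection is fed directly into the projection.

```python
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

# Selection on rating > 8: the result is again a set of 4-field tuples.
selected = {t for t in S2 if t[2] > 8}

# Projection onto (sname, rating), applied to the selection's output.
result = {(t[1], t[2]) for t in selected}
print(sorted(result))  # [('rusty', 10), ('yuppy', 9)]
```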
31. Difference
S1(sid,sname,rating,age)
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S2(sid,sname,rating,age)
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
Compute: S1 - S2
• S1 and S2 are union compatible (the same schema).
• Take away the tuples of S1 that are duplicated in S2.
Result(sid,sname,rating,age)
sid sname rating age
22 dustin 7 45.0
32. Union, Intersection, Set-
Difference
S1
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S2
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
S1 ∪ S2
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
44 guppy 5 35.0
28 yuppy 9 35.0
S1 ∩ S2
sid sname rating age
31 lubber 8 55.5
58 rusty 10 35.0
S1 − S2
sid sname rating age
22 dustin 7 45.0
All have the same schema
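With relations stored as Python sets of tuples (an assumption of this sketch), the three set operators map directly onto Python's built-in set operators:

```python
S1 = {
    (22, "dustin", 7, 45.0),
    (31, "lubber", 8, 55.5),
    (58, "rusty", 10, 35.0),
}
S2 = {
    (28, "yuppy", 9, 35.0),
    (31, "lubber", 8, 55.5),
    (44, "guppy", 5, 35.0),
    (58, "rusty", 10, 35.0),
}

union = S1 | S2          # tuples in S1 or in S2
intersection = S1 & S2   # tuples in both
difference = S1 - S2     # tuples in S1 but not in S2

print(len(union))            # 5
print(sorted(intersection))  # [(31, 'lubber', 8, 55.5), (58, 'rusty', 10, 35.0)]
print(difference)            # {(22, 'dustin', 7, 45.0)}
```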
33. Cross-Product
R3 := R1 * R2
• Pair each tuple t1 of R1 with each tuple t2 of R2.
• The concatenation t1t2 is a tuple of R3.
• Schema of R3 is the attributes of R1 and then R2, in order.
• But beware an attribute A of the same name in R1 and R2:
use R1.A and R2.A (rename)
36. Cross-Product
• Each row of S1 is paired with each row of R1.
• Result schema has one field per field of S1 and R1,
with field names `inherited’ if possible.
• Conflict: Both S1 and R1 have a field called sid.
• Renaming operator: ρ(C(1 → sid1, 5 → sid2), S1 × R1)
(sid) sname rating age (sid) bid day
22 dustin 7 45.0 22 101 10/10/96
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 22 101 10/10/96
31 lubber 8 55.5 58 103 11/12/96
58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0 58 103 11/12/96
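A sketch of the cross-product in Python, assuming set-of-tuples relations. The helper `cross` is hypothetical; it resolves the name conflict by qualifying clashing attributes as `S1.sid` and `R1.sid`, which is one possible renaming convention.

```python
from itertools import product

S1_SCHEMA = ("sid", "sname", "rating", "age")
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
R1_SCHEMA = ("sid", "bid", "day")
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

def cross(rows1, schema1, name1, rows2, schema2, name2):
    """Cross-product; attribute names present in both inputs are qualified."""
    clash = set(schema1) & set(schema2)

    def qualify(name, schema):
        return tuple(f"{name}.{a}" if a in clash else a for a in schema)

    schema = qualify(name1, schema1) + qualify(name2, schema2)
    rows = {t1 + t2 for t1, t2 in product(rows1, rows2)}  # every pairing
    return rows, schema

rows, schema = cross(S1, S1_SCHEMA, "S1", R1, R1_SCHEMA, "R1")
print(schema)
print(len(rows))  # 6 = 3 rows of S1 times 2 rows of R1
```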
37. Renaming
• The RENAME operator gives a new schema to a
relation.
• R1 := RENAME_{R1(A1,…,An)}(R2) makes R1 be a relation
with attributes A1,…,An and the same tuples as
R2.
• Simplified notation: R1(A1,…,An) := R2.
39. Composite Functions
• Projection
• Selection
• Product
• Union
• Intersection
• Difference
Relational algebra is closed
Can form composite
function:
as our example before: π_{sname,rating}(𝛔_{rating>8}(S2))
This is where the power of relational algebra
comes into play
Can form useful composite functions,
such as
Joins and Division
40. Conditional Joins
(Theta Join)
Select out rows of a cross product given a certain
condition
• Result schema same as that of cross-product.
• Sometimes called a theta-join.
R ⋈_c S = 𝛔_c(R × S)
(𝛔: selection; ×: cross product)
41. Joins
R ⋈_c S = 𝛔_c(R × S)
S1 ⋈_{S1.sid < R1.sid} R1
1. Perform the cross product S1 x R1
2. Then perform the selection S1.sid < R1.sid
44. Joins
• Condition Join: R ⋈_c S = 𝛔_c(R × S)
• Result schema same as that of cross-product.
• Fewer tuples than cross-product, might be able
to compute more efficiently
• Sometimes called a theta-join.
S1 ⋈_{S1.sid < R1.sid} R1
(sid) sname rating age (sid) bid day
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 58 103 11/12/96
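The definition of the condition join as a selection over a cross-product can be sketched in Python like this. The helper name `theta_join` is illustrative, and the condition is given positionally (sid is field 0 in both inputs).

```python
from itertools import product

S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

def theta_join(rows1, rows2, condition):
    """Keep the concatenated pairs (t1, t2) that satisfy the condition."""
    return {t1 + t2 for t1, t2 in product(rows1, rows2) if condition(t1, t2)}

# Join condition: S1.sid < R1.sid
joined = theta_join(S1, R1, lambda s, r: s[0] < r[0])
for row in sorted(joined):
    print(row)
```

This naive version still enumerates all six pairs. A real system would try to avoid materializing the full cross-product.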
45. Conditional Joins
(Special Case: Equi-Join)
The condition is equality
Selects out those rows where the attributes are the same
• (for example, two primary keys)
• Again, result schema same as that of cross-product.
R ⋈_c S = 𝛔_c(R × S)
(𝛔: selection; ×: cross product)
46. Joins
R ⋈_c S = 𝛔_c(R × S)
S1 ⋈_sid R1
1. Perform the cross product S1 x R1
2. Then perform the selection R1.sid = S1.sid
49. Equi-Join
• Equi-Join: A special case of condition join where the
condition c contains only equalities.
• Result schema similar to cross-product,
but only one copy of fields for which equality is
specified.
S1 ⋈_sid R1
sid sname rating age bid day
22 dustin 7 45.0 101 10/10/96
58 rusty 10 35.0 103 11/12/96
50. Natural Join
• Natural Join: Equijoin on all common fields.
S1(sid,sname,rating,age)
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
R1(sid,bid,day)
sid bid day
22 101 10/10/96
58 103 11/12/96
S1 ⋈ R1
sid sname rating age bid day
22 dustin 7 45.0 101 10/10/96
58 rusty 10 35.0 103 11/12/96
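A Python sketch of the natural join, assuming set-of-tuples relations with explicit schemas. The helper `natural_join` is a hypothetical name; it matches on every field name the two schemas share and keeps one copy of each.

```python
S1_SCHEMA = ("sid", "sname", "rating", "age")
S1 = {(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)}
R1_SCHEMA = ("sid", "bid", "day")
R1 = {(22, 101, "10/10/96"), (58, 103, "11/12/96")}

def natural_join(rows1, schema1, rows2, schema2):
    """Equijoin on all common fields, keeping a single copy of each."""
    common = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in common]
    out_schema = tuple(schema1) + tuple(extra)
    out_rows = set()
    for t1 in rows1:
        d1 = dict(zip(schema1, t1))
        for t2 in rows2:
            d2 = dict(zip(schema2, t2))
            if all(d1[a] == d2[a] for a in common):
                out_rows.add(t1 + tuple(d2[a] for a in extra))
    return out_rows, out_schema

joined, joined_schema = natural_join(S1, S1_SCHEMA, R1, R1_SCHEMA)
print(joined_schema)
for row in sorted(joined):
    print(row)
```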
52. Division
• Not supported as a primitive operator, but useful for
expressing queries like:
Find sailors who have reserved all boats.
• Let A have 2 fields, x and y; B have only field y:
o A/B = { ⟨x⟩ ∈ π_x(A) | ∀ ⟨y⟩ ∈ B : ⟨x,y⟩ ∈ A }
o i.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there
is an xy tuple in A.
o Or: If the set of y values (boats) associated with an x value (sailor) in A contains all
y values in B, the x value is in A/B.
• In general, x and y can be any lists of fields; y is the list of
fields in B, and x ∪ y is the list of fields of A.
54. Examples of Division A/B
A
sno pno
s1 p1
s1 p2
s1 p3
s1 p4
s2 p1
s2 p2
s3 p2
s4 p2
s4 p4
B1
pno
p2
A/B1 (which sno have p2 in A)
sno
s1
s2
s3
s4
55. Examples of Division A/B
A
sno pno
s1 p1
s1 p2
s1 p3
s1 p4
s2 p1
s2 p2
s3 p2
s4 p2
s4 p4
B2
pno
p2
p4
A/B2 (which sno have both p2 and p4)
sno
s1
s4
56. Examples of Division A/B
A
sno pno
s1 p1
s1 p2
s1 p3
s1 p4
s2 p1
s2 p2
s3 p2
s4 p2
s4 p4
B3
pno
p1
p2
p4
A/B3 (which sno has p1, p2 and p4)
sno
s1
57. Expressing A/B Using
Basic Operators
• Division is not essential op; just a useful shorthand.
o (Also true of joins, but joins are so common that systems implement joins
specially.)
• Idea: For A/B, compute all x values that are not
`disqualified’ by some y value in B.
o x value is disqualified if by attaching y value from B, we obtain an xy tuple that
is not in A.
Disqualified x values: π_x((π_x(A) × B) − A)
A/B: π_x(A) − all disqualified tuples
58. Expressing A/B Using
Basic Operators
π_x((π_x(A) × B) − A)
A
sno pno
s1 p1
s1 p2
s1 p3
s1 p4
s2 p1
s2 p2
s3 p2
s4 p2
s4 p4
B
pno
p2
p4
1. Project out sno from A: π_x(A)
(x is the attributes unique to A, not in B; note that only unique values are kept)
2. Cross with B: the result has the same schema as A
3. Subtract the rows that are also in A
4. Project out sno
This is the set of “disqualified” rows
59. Expressing A/B Using
Basic Operators
π_x(A) − π_x((π_x(A) × B) − A)
A
sno pno
s1 p1
s1 p2
s1 p3
s1 p4
s2 p1
s2 p2
s3 p2
s4 p2
s4 p4
B
pno
p2
p4
π_x((π_x(A) × B) − A) is the set of “disqualified” x values
Subtract out the disqualified tuples from π_x(A)
If something remains,
then it is in the answer
A/B
sno
s1
s4
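The construction of A/B from projection, cross-product and difference can be sketched in Python and checked against the three division examples. The function name `divide` is illustrative, and A is assumed to have exactly the two fields (x, y).

```python
from itertools import product

A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"),
    ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {("p2",)}
B2 = {("p2",), ("p4",)}
B3 = {("p1",), ("p2",), ("p4",)}

def divide(a, b):
    """A/B for A(x, y) and B(y), built only from basic operators."""
    xs = {(x,) for x, _ in a}                         # project A onto x
    candidates = {x + y for x, y in product(xs, b)}   # cross with B
    missing = candidates - a                          # xy pairs not in A
    disqualified = {(x,) for x, _ in missing}         # project onto x
    return xs - disqualified                          # what remains is A/B

print(sorted(divide(A, B1)))  # [('s1',), ('s2',), ('s3',), ('s4',)]
print(sorted(divide(A, B2)))  # [('s1',), ('s4',)]
print(sorted(divide(A, B3)))  # [('s1',)]
```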
60. SQL and
Relational Algebra
• Project
o SELECT X FROM TABLE
• Select
o select * from E where salary < 200
• Product
o select * from E, D
• Union
o UNION
• Intersection
o INTERSECT
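The correspondence can be tried out with Python's built-in sqlite3 module. The tables E and D and their rows are hypothetical sample data invented for this sketch; only the query shapes come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE E (name TEXT, salary INTEGER, dept TEXT);
    CREATE TABLE D (dept TEXT, city TEXT);
    INSERT INTO E VALUES ('ann', 150, 'hr'), ('bob', 250, 'it'), ('cy', 150, 'it');
    INSERT INTO D VALUES ('hr', 'Oslo'), ('it', 'Rome');
""")

def q(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()

projection = q("SELECT DISTINCT salary FROM E")       # project (DISTINCT for set semantics)
selection = q("SELECT * FROM E WHERE salary < 200")   # select
cross = q("SELECT * FROM E, D")                       # product
union = q("SELECT dept FROM E UNION SELECT dept FROM D")
intersection = q("SELECT dept FROM E INTERSECT SELECT dept FROM D")

print(sorted(projection), len(selection), len(cross))
print(sorted(union), sorted(intersection))
```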
61. Schemas for Results
• Union, intersection, and difference: the schemas of
the two operands must be the same, so use that
schema for the result.
• Selection: schema of the result is the same as the
schema of the operand.
• Projection: list of attributes tells us the schema.
62. Schemas for Results (2)
• Product: schema is the attributes of both relations.
o Use R.A, etc., to distinguish two attributes named A.
• Theta-join: same as product.
• Natural join: union of the attributes of the two
relations.
• Renaming: the operator tells the schema.
67. Find names of sailors who’ve reserved a
red boat
• Information about boat color only available in Boats;
so need an extra join:
π_sname((𝛔_{color='red'} Boats) ⋈ Reserves ⋈ Sailors)
• A more efficient solution:
π_sname(π_sid((π_bid 𝛔_{color='red'} Boats) ⋈ Reserves) ⋈ Sailors)
• A query optimizer can find this given the first solution!
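A small Python sketch comparing the two plans on made-up data. The instances below are hypothetical and use simplified schemas: Sailors(sid, sname), Reserves(sid, bid), Boats(bid, color). Both plans return the same names; the second one carries only bid and sid values through the intermediate steps.

```python
# Hypothetical sample instances with simplified schemas.
Sailors = {(22, "dustin"), (31, "lubber"), (58, "rusty")}   # (sid, sname)
Reserves = {(22, 101), (22, 102), (31, 103), (58, 103)}     # (sid, bid)
Boats = {(101, "red"), (102, "green"), (103, "blue")}       # (bid, color)

# Plan 1: select red boats, join with Reserves and Sailors, project sname.
plan1 = {
    sname
    for bid, color in Boats if color == "red"
    for rsid, rbid in Reserves if rbid == bid
    for sid, sname in Sailors if sid == rsid
}

# Plan 2: project down to bid, then to sid, before the final join.
red_bids = {bid for bid, color in Boats if color == "red"}
red_sids = {sid for sid, bid in Reserves if bid in red_bids}
plan2 = {sname for sid, sname in Sailors if sid in red_sids}

print(plan1, plan2)  # both contain only 'dustin'
```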
68. Find sailors who’ve reserved a red or a
green boat
• Can identify all red or green boats, then find
sailors who have reserved one of these boats:
ρ(Tempboats, (𝛔_{color='red' ∨ color='green'} Boats))
π_sname(Tempboats ⋈ Reserves ⋈ Sailors)
• What happens if ∨ is replaced by ∧ in this query?
69. Find sailors who’ve reserved a red and a
green boat
• Previous approach won’t work! Must identify
sailors who’ve reserved red boats, sailors who’ve
reserved green boats, then find the intersection
(note that sid is a key for Sailors):
ρ(Tempred, π_sid((𝛔_{color='red'} Boats) ⋈ Reserves))
ρ(Tempgreen, π_sid((𝛔_{color='green'} Boats) ⋈ Reserves))
π_sname((Tempred ∩ Tempgreen) ⋈ Sailors)
70. Find the names of sailors who’ve reserved all
boats
• Uses division; schemas of the input relations to /
must be carefully chosen:
ρ(Tempsids, (π_{sid,bid} Reserves) / (π_bid Boats))
π_sname(Tempsids ⋈ Sailors)
• To find sailors who’ve reserved all ‘Interlake’ boats:
..... / π_bid(𝛔_{bname='Interlake'} Boats)
71. Duplicates
• Duplicate rows not allowed in a relation
• However, duplicate elimination from query result is
costly and not automatically done; it must be
explicitly requested:
SELECT DISTINCT …..
FROM …..
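The effect of DISTINCT can be seen with Python's built-in sqlite3 module. This sketch loads the S2 instance used in the projection slides into an in-memory table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S2 (sid INTEGER, sname TEXT, rating INTEGER, age REAL)")
conn.executemany(
    "INSERT INTO S2 VALUES (?, ?, ?, ?)",
    [(28, "yuppy", 9, 35.0), (31, "lubber", 8, 55.5),
     (44, "guppy", 5, 35.0), (58, "rusty", 10, 35.0)],
)

# Without DISTINCT the result is a bag: one output row per input row.
as_bag = conn.execute("SELECT age FROM S2").fetchall()

# With DISTINCT the duplicates are removed, as in relational algebra.
as_set = conn.execute("SELECT DISTINCT age FROM S2").fetchall()

print(len(as_bag), len(as_set))  # 4 2
```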
72. Operations on Bags
• Selection applies to each tuple, so its effect on
bags is like its effect on sets.
• Projection also applies to each tuple, but as a bag
operator, we do not eliminate duplicates.
• Products and joins are done on each pair of tuples,
so duplicates in bags have no effect on how we
operate.
73. Beware: Bag Laws != Set
Laws
• Some, but not all algebraic laws that hold for sets
also hold for bags.
• Example: the commutative law for union (R UNION
S = S UNION R ) does hold for bags.
o Since addition is commutative, adding the number of times x appears in R
and S doesn’t depend on the order of R and S.
74. Relational Algebra
• Relational Algebra and Relational Calculus have
substantial expressive power. In particular, they can
express
• Natural Join
• Quotient
• Unions of conjunctive queries
• …
• However, they cannot express recursive queries.
75. Equivalences
The same relational algebraic expression can be
written in many different ways. The order in which
tuples appear in relations is never significant.
• A ⋈ B <=> B ⋈ A
• A ∪ B <=> B ∪ A
• A ∩ B <=> B ∩ A
• (A - B) is not the same as (B - A)
• 𝛔_c1(𝛔_c2(A)) <=> 𝛔_c2(𝛔_c1(A)) <=> 𝛔_{c1 ^ c2}(A)
• π_a1(A) <=> π_a1(π_{a1,etc}(A)), where etc is any
attributes of A.
• ...
76. Operations on Bags
(and why we care)
• Union: {a,b,b,c} U {a,b,b,b,e,f,f} =
{a,a,b,b,b,b,b,c,e,f,f}
o add the number of occurrences
• Difference: {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b}
o subtract the number of occurrences
• Intersection: {a,b,b,b,c,c}∩{b,b,c,c,c,c,d} = {b,b,c,c}
o minimum of the two numbers of occurrences
• Selection: preserve the number of occurrences
• Projection: preserve the number of occurrences (no
duplicate elimination)
• Cartesian product, join: no duplicate elimination
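Python's `collections.Counter` behaves like a bag, so the counting rules for union, difference and intersection can be checked directly. This is a sketch: `+` adds occurrence counts, `-` subtracts them and drops anything that falls to zero or below, and `&` takes the minimum.

```python
from collections import Counter

# Union: add the numbers of occurrences.
union = Counter("abbc") + Counter("abbbeff")

# Difference: subtract the numbers of occurrences (never below zero).
difference = Counter("abbbcc") - Counter("bcccd")

# Intersection: minimum of the two numbers of occurrences.
intersection = Counter("abbbcc") & Counter("bbccccd")

print(sorted(union.elements()))
print(sorted(difference.elements()))    # ['a', 'b', 'b']
print(sorted(intersection.elements()))  # ['b', 'b', 'c', 'c']
```

An element that appears only in the second bag, such as d in the difference, cannot appear in the result because its count in the first bag is zero.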
78. Summary
• The relational model has rigorously defined query
languages that are simple and powerful.
• Relational algebra is more operational; useful as
internal representation for query evaluation plans.
• Several ways of expressing a given query; a query
optimizer should choose the most efficient version.