2. Design Guidelines for Relational Databases
What is relational database design?
The grouping of attributes to form "good" relation
schemas
Two levels of relation schemas
The logical "user view" level
The storage "base relation" level
Design is concerned mainly with base relations
What are the criteria for "good" base relations?
3. 1. Semantics of the Relation Attributes
Each tuple in a relation should represent one entity
or relationship instance. (Applies to individual
relations and their attributes.)
Attributes of different entities should not be mixed
in the same relation
Only foreign keys should be used to refer to other
entities
Entity and relationship attributes should be kept
apart as much as possible.
4. 2. Redundancy and Data Anomalies
Redundant data is the same "information" stored
more than once, i.e., the redundant data
could be removed without the loss of information.
Wastes storage
Causes problems with update anomalies
Insertion anomalies
Deletion anomalies
Modification anomalies
Design a schema that does not suffer from the
insertion, deletion and update anomalies.
5. Example: the following relation contains staff
and department details:
staffNo  job       dept  dname       city
SL10     Salesman  10    Sales       Stratford
SA51     Manager   20    Accounts    Barking
DS40     Clerk     20    Accounts    Barking
OS45     Clerk     30    Operations  Barking
Such "redundancy" could lead to the following "anomalies".
6. • Insert Anomaly: Occurs when a value for one attribute cannot
be stored because the value for another attribute is unknown.
• We can't insert a dept without inserting a member of
staff that works in that department.
• Update Anomaly: Occurs when a change to a single
attribute in one record requires changes in multiple records.
• We could change the name of the dept that SA51 works
in without simultaneously changing it in the record for DS40,
who works in the same dept.
• Deletion Anomaly: Occurs when the removal of a record
results in a loss of important information about an entity.
• By removing employee SL10 we have removed all
information pertaining to the Sales dept.
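The deletion and update anomalies above can be reproduced on the example relation with a few lines of Python. This is only an illustrative sketch: the rows come from the staff/department example, while the replacement name "Finance" and the helper `known_departments` are assumptions introduced here.

```python
# Staff/department rows from the example relation.
staff_dept = [
    {"staffNo": "SL10", "job": "Salesman", "dept": 10, "dname": "Sales", "city": "Stratford"},
    {"staffNo": "SA51", "job": "Manager", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffNo": "DS40", "job": "Clerk", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffNo": "OS45", "job": "Clerk", "dept": 30, "dname": "Operations", "city": "Barking"},
]

def known_departments(rows):
    """Return the set of department names the relation still records."""
    return {row["dname"] for row in rows}

# Deletion anomaly: removing SL10 also removes the only record of Sales.
after_delete = [row for row in staff_dept if row["staffNo"] != "SL10"]
sales_lost = "Sales" not in known_departments(after_delete)

# Update anomaly: renaming dept 20 in SA51's row only leaves two names for one dept.
after_update = [dict(row) for row in staff_dept]
after_update[1]["dname"] = "Finance"  # hypothetical new name, for illustration only
names_for_dept_20 = {row["dname"] for row in after_update if row["dept"] == 20}

# Splitting into separate staff and dept relations stores each department fact once.
staff = [{"staffNo": r["staffNo"], "job": r["job"], "dept": r["dept"]} for r in staff_dept]
dept = {r["dept"]: {"dname": r["dname"], "city": r["city"]} for r in staff_dept}
staff = [s for s in staff if s["staffNo"] != "SL10"]  # delete SL10 again
sales_kept = dept[10]["dname"] == "Sales"

print(sales_lost, sorted(names_for_dept_20), sales_kept)
```

In the split design the department details survive the deletion of SL10, and a rename touches exactly one entry.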
7. 3. Null Values in Tuples
Relations should be designed such that their tuples
will have as few NULL values as possible
Attributes that are NULL frequently could be placed
in separate relations (with the primary key)
Reasons for nulls:
Attribute not applicable or invalid
Attribute value unknown (may exist)
Value known to exist, but unavailable
8. Purpose of Normalization
To avoid redundancy by storing each "fact" within the
database only once.
To put data into a form that conforms to relational
principles - no repeating groups.
To put the data into a form that is more able to accurately
accommodate change.
To avoid certain updating "anomalies".
To facilitate the enforcement of data constraints.
9. Normalization
"Normalization" refers to the process of creating an
efficient, reliable, flexible, and appropriate "relational"
structure for storing information.
Normalized data must be in a "relational" data structure.
Usually involves dividing a database into two or more
tables and defining relationships between the tables.
The objective is to isolate data so that additions, deletions,
and modifications of a field can be made in just one table
and then propagated through the rest of the database via
the defined relationships
10. The Process of Normalization
• Normalization is often executed as a series of steps.
• Each step corresponds to a specific normal form that
has known properties.
• As normalization proceeds,
• the relations become progressively more restricted in
format, and
• less vulnerable to update anomalies.
11. Stages of Normalisation
Unnormalised form (UNF)
  ↓ Remove repeating groups
First normal form (1NF)
  ↓ Remove partial dependencies
Second normal form (2NF)
  ↓ Remove transitive dependencies
Third normal form (3NF)
  ↓ Remove remaining functional dependency anomalies
Boyce-Codd normal form (BCNF)
  ↓ Remove multivalued dependencies
Fourth normal form (4NF)
  ↓ Remove remaining anomalies
Fifth normal form (5NF)
12. Unnormalized Form (UNF)
Definition: A relation is unnormalized when it has not
had any normalization rules applied to it, and it suffers
from various anomalies.
It typically results from capturing attributes into a "Universal Relation"
from a screen layout, manual report, manual
document, etc.
13. ClientRental relation in UNF
Unnormalized form (UNF)
A table that contains one or more repeating groups.
Repeating group = (propertyNo, pAddress,
rentStart, rentFinish, rent, ownerNo, oName)
clientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
                         PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
                         PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
                         PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
Figure ClientRental unnormalized table
14. First Normal Form (1NF)
Definition: A relation is in 1NF if, and only if, all its
underlying attributes contain atomic values only.
That is, the intersection of each row and column contains one and only
one value.
Remove repeating groups into a new relation
1NF disallows having
a set of values,
a tuple of values, or
a combination of both as an attribute value for a
single tuple.
15. 1NF
There are two approaches to removing repeating groups from
unnormalized tables:
1. Removes the repeating groups by entering appropriate
data in the empty columns of rows containing the
repeating data.
2. Removes the repeating group by placing the repeating
data, along with a copy of the original key attribute(s), in
a separate relation. A primary key is identified for the
new relation.
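The two approaches can be sketched in Python on a cut-down ClientRental. This is an illustration under stated assumptions: only propertyNo and rentStart are kept from the repeating group to keep the example short, and the names `flatten`, `split` and the nested "rentals" field are hypothetical.

```python
# One UNF record per client; "rentals" holds the repeating group.
unf = [
    {"clientNo": "CR76", "cName": "John Kay", "rentals": [
        {"propertyNo": "PG4", "rentStart": "1-Jul-00"},
        {"propertyNo": "PG16", "rentStart": "1-Sep-02"},
    ]},
    {"clientNo": "CR56", "cName": "Aline Stewart", "rentals": [
        {"propertyNo": "PG4", "rentStart": "1-Sep-99"},
        {"propertyNo": "PG36", "rentStart": "10-Oct-00"},
        {"propertyNo": "PG16", "rentStart": "1-Nov-02"},
    ]},
]

def flatten(records, group):
    """Approach 1: copy the non-repeating attributes into every group row."""
    rows = []
    for rec in records:
        base = {k: v for k, v in rec.items() if k != group}
        for item in rec[group]:
            rows.append({**base, **item})
    return rows

def split(records, key, group):
    """Approach 2: keep the parent attributes in one relation and move the
    repeating group, with a copy of the key, into a second relation."""
    parent = [{k: v for k, v in rec.items() if k != group} for rec in records]
    child = [{key: rec[key], **item} for rec in records for item in rec[group]]
    return parent, child

flat = flatten(unf, "rentals")
client, rental = split(unf, "clientNo", "rentals")
print(len(flat), len(client), len(rental))
```

Approach 1 yields one wide relation with the client data repeated on every row; approach 2 yields two relations linked by clientNo.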
16. 1NF ClientRental relation with the first
approach
With the first approach, we remove the repeating group
(property rented details) by entering the appropriate client
data into each row.
The ClientRental relation is defined as follows:
ClientRental (clientNo, propertyNo, cName, pAddress, rentStart,
rentFinish, rent, ownerNo, oName)
clientNo  propertyNo  cName          pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         John Kay       6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        John Kay       5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         Aline Stewart  6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        Aline Stewart  2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        Aline Stewart  5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
Figure 1NF ClientRental relation with the first approach
17. 1NF ClientRental relation with the
second approach
With the second approach, we remove the repeating group
(property rented details) by placing the repeating data along with
a copy of the original key attribute (clientNo) in a separate relation.
Client (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart,
rentFinish, rent, ownerNo, oName)
Client
clientNo  cName
CR76      John Kay
CR56      Aline Stewart
PropertyRentalOwner
clientNo  propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw
Figure 1NF ClientRental relation with the second approach
19. Second Normal Form (2NF)
A database table is said to be in 2NF if
it is in 1NF and
every non-key field/column is fully
functionally dependent on the primary key.
Converting to 2NF removes any partial dependency
of a non-key field on part of the primary key.
Note:
It is still possible for a table in 2NF to exhibit transitive
dependency; that is, one or more attributes may be
functionally dependent on nonkey attributes.
20. The process of converting the database
table into 2NF:
Identify the primary key for the 1NF relation.
Identify the functional dependencies in the
relation.
If partial dependencies exist on the primary key,
remove them by placing them in a new relation
along with a copy of their determinant.
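The second and third steps can be sketched with an attribute-closure routine. The functional dependencies below are an assumption: they are the ones implied by the ClientRental 2NF relations in this deck, not a list given explicitly in the slides. The helper names `fd`, `closure` and `partial_dependencies` are hypothetical.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Build an FD from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds):
    """Map each proper subset of the key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) - set(key)
            if determined:
                found[subset] = determined
    return found

# Assumed FDs for the 1NF ClientRental relation (inferred, see lead-in).
key = {"clientNo", "propertyNo"}
fds = [
    fd("clientNo", "cName"),
    fd("propertyNo", "pAddress rent ownerNo oName"),
    fd("clientNo propertyNo", "rentStart rentFinish"),
]
partial = partial_dependencies(key, fds)
for determinant, attrs in partial.items():
    print(determinant, "->", sorted(attrs))
```

Each determinant found this way, together with the attributes it determines, becomes a candidate new relation; the attributes that depend on the whole key stay with the key.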
22. 2NF ClientRental relation
After removing the partial dependencies, three new relations
are created, called Client, Rental, and PropertyOwner:
Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)
Client
clientNo  cName
CR76      John Kay
CR56      Aline Stewart
Rental
clientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03
PropertyOwner
propertyNo  pAddress                rent  ownerNo  oName
PG4         6 Lawrence St, Glasgow  350   CO40     Tina Murphy
PG16        5 Novar Dr, Glasgow     450   CO93     Tony Shaw
PG36        2 Manor Rd, Glasgow     370   CO93     Tony Shaw
Figure 2NF ClientRental relation
23. Third Normal Form (3NF)
Transitive dependency
A condition where A, B, and C are attributes of a relation such that
if A → B and B → C, then C is transitively dependent on A via B
(provided that A is not functionally dependent on B or C).
24. Third normal form (3NF)
A relation that is in first and second normal form,
and in which no non-primary-key attribute is
transitively dependent on the primary key.
The normalization of 2NF relations to 3NF
involves the removal of transitive dependencies
by placing the attribute(s) in a new relation along
with a copy of the determinant.
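As a sketch of this step, consider the PropertyOwner relation from the 2NF example. The assumption here, consistent with its three rows but not stated in the slides, is that ownerNo → oName holds, so oName depends transitively on propertyNo via ownerNo. The relation names `property_for_rent` and `owner` and the helper `project` are hypothetical.

```python
# PropertyOwner rows from the 2NF example.
columns = ("propertyNo", "pAddress", "rent", "ownerNo", "oName")
property_owner = [
    ("PG4", "6 Lawrence St, Glasgow", 350, "CO40", "Tina Murphy"),
    ("PG16", "5 Novar Dr, Glasgow", 450, "CO93", "Tony Shaw"),
    ("PG36", "2 Manor Rd, Glasgow", 370, "CO93", "Tony Shaw"),
]

def project(rows, cols, keep):
    """Project rows onto the attributes in keep, dropping duplicate tuples."""
    idx = [cols.index(c) for c in keep]
    seen = []
    for row in rows:
        t = tuple(row[i] for i in idx)
        if t not in seen:
            seen.append(t)
    return seen

# The transitively dependent attribute (oName) moves to a new relation,
# together with a copy of its determinant (ownerNo).
property_for_rent = project(property_owner, columns,
                            ("propertyNo", "pAddress", "rent", "ownerNo"))
owner = project(property_owner, columns, ("ownerNo", "oName"))

print(owner)
```

After the split, each owner name is stored once, and ownerNo in the first relation acts as a foreign key to the second.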
28. Boyce-Codd Normal Form (BCNF)
A relation is in BCNF if, and only if, every
determinant is a candidate key.
BCNF is a refinement of third normal form.
A relation schema R is in Boyce-Codd Normal Form
(BCNF) if whenever a nontrivial FD X → A holds in R, then X
is a superkey of R.
That is, every relation in BCNF is also in 3NF, but a
relation in 3NF is not necessarily in BCNF.
29. 3NF to BCNF
Identify all candidate keys in the relation.
Identify all functional dependencies in the relation.
If functional dependencies exist in the relation
whose determinants are not candidate keys for
the relation, remove those functional dependencies by
placing them in a new relation along with a copy of
their determinant.
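The check in the last step can be sketched as follows. The relation R(A, B, C) with AB → C and C → B is a hypothetical textbook-style example, not one taken from these slides; `closure` and `bcnf_violations` are illustrative helper names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return the nontrivial FDs whose determinant is not a superkey of relation."""
    violations = []
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        is_superkey = closure(lhs, fds) >= relation
        if not trivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations

# Hypothetical relation R(A, B, C) with AB -> C and C -> B.
R = frozenset("ABC")
fds = [
    (frozenset("AB"), frozenset("C")),
    (frozenset("C"), frozenset("B")),
]
violations = bcnf_violations(R, fds)
print(violations)
```

Here AB is a superkey, but C is not, so C → B is reported as the dependency to move into a new relation.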
38. What is Decomposition?
Decomposition – the process of breaking something down into parts
or elements.
Decomposition in a database means breaking tables down
into multiple tables.
From a database perspective, it means moving to a higher
normal form.
The aim is to break relations into smaller ones so that
the data model conforms to a normal form and avoids
redundancy.
39. Decomposition of relation schema
Suppose R is a relation schema
R = {A1, A2, A3, …, An}
R is decomposed into a set of relation
schemas
D = {R1, R2, R3, …, Rm}, such that
Ri ⊆ R for 1 <= i <= m
and R1 ⋃ R2 ⋃ R3 … ⋃ Rm = R
Ex: gradeInfo(rollNo, studName, course, grade)
R1 : gradeInfo(rollNo, course, grade)
R2 : studInfo(rollNo, studName)
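The two conditions (each Ri is a subset of R, and the Ri together cover R) can be checked mechanically. A minimal sketch on the gradeInfo example; the function name `is_valid_decomposition` is an assumption introduced here.

```python
def is_valid_decomposition(relation, parts):
    """True if every part is a subset of relation and the parts together cover it."""
    relation = set(relation)
    parts = [set(p) for p in parts]
    return all(p <= relation for p in parts) and set().union(*parts) == relation

R = {"rollNo", "studName", "course", "grade"}
R1 = {"rollNo", "course", "grade"}
R2 = {"rollNo", "studName"}

ok = is_valid_decomposition(R, [R1, R2])
# Dropping grade from R1 leaves an attribute of R uncovered.
missing_attr = is_valid_decomposition(R, [{"rollNo", "course"}, R2])
print(ok, missing_attr)
```

Passing this check only means no attribute is lost; it says nothing yet about whether the original tuples can be reconstructed.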
42. Problem with Decomposition
Given instances of the decomposed relations,
we may not be able to reconstruct the corresponding
instance of the original relation – information loss
43. Example : Problem with Decomposition
R
Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon
R1
Model Name  Category
a11         Canon
s20         Nikon
a70         Canon
R2
Price  Category
100    Canon
200    Nikon
150    Canon
44. Example : Problem with Decomposition
R1 ⋈ R2 (natural join on Category)
Model Name  Price  Category
a11         100    Canon
a11         150    Canon
s20         200    Nikon
a70         100    Canon
a70         150    Canon
R (original)
Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon
45. Lossy decomposition
In the previous example, additional tuples are obtained
along with the original tuples.
Although there are more tuples, this leads to less
information: we can no longer tell which tuples were in R.
Due to this loss of information, the decomposition in the
previous example is called a lossy decomposition or
lossy-join decomposition.
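The spurious tuples can be reproduced by projecting the camera relation and joining the projections back. A minimal sketch; `project` and `natural_join` are hypothetical helpers written for this illustration, and the attribute names are shortened to model, price and category.

```python
def project(rows, attrs):
    """Project dict rows onto attrs, dropping duplicates."""
    out = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in out:
            out.append(t)
    return out

def natural_join(r1, r2):
    """Join two lists of dict rows on all attribute names they share."""
    joined = []
    for a in r1:
        for b in r2:
            common = a.keys() & b.keys()
            if all(a[c] == b[c] for c in common):
                row = {**a, **b}
                if row not in joined:
                    joined.append(row)
    return joined

R = [
    {"model": "a11", "price": 100, "category": "Canon"},
    {"model": "s20", "price": 200, "category": "Nikon"},
    {"model": "a70", "price": 150, "category": "Canon"},
]
R1 = project(R, ["model", "category"])
R2 = project(R, ["price", "category"])

rejoined = natural_join(R1, R2)
spurious = [row for row in rejoined if row not in R]
print(len(rejoined), spurious)
```

The join returns five tuples instead of three; the two extra ones pair each Canon model with the other Canon model's price.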
46. Lossy decomposition (more example)
T
Employee Project Branch
Brown Mars L.A.
Green Jupiter San Jose
Green Venus San Jose
Hoskins Saturn San Jose
Hoskins Venus San Jose
Functional dependencies:
Employee → Branch, Project → Branch
47. Lossy decomposition
Decomposition of the previous relation
T1
Employee  Branch
Brown     L.A.
Green     San Jose
Hoskins   San Jose
T2
Project  Branch
Mars     L.A.
Jupiter  San Jose
Saturn   San Jose
Venus    San Jose
48. Lossy decomposition
Original Relation
Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose
After Natural Join
Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose
Green     Saturn   San Jose
Hoskins   Jupiter  San Jose
After the natural join, we get two extra tuples (the last two rows). Thus, there is a loss of information.
49. What is lossless?
Lossless means functioning without a loss.
In other words, retain everything.
Important for databases to have this feature.
50. Lossless Decomposition Property
R : relation
F : set of functional dependencies on R
X, Y : decomposition of R
Decomposition is lossless if:
X ∩ Y → X, that is: all attributes common to both X and
Y functionally determine ALL the attributes in X
OR
X ∩ Y → Y, that is: all attributes common to both X and
Y functionally determine ALL the attributes in Y
In other words, if X ∩ Y forms a superkey of either X or
Y, the decomposition of R is a lossless decomposition
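This test can be sketched with an attribute closure. The first call uses the Employee/Project/Branch decomposition from the earlier slides; the second split, on Employee, is a hypothetical alternative added for contrast. The names `closure` and `is_lossless` are assumptions of this sketch.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(x, y, fds):
    """Binary test: the common attributes must determine all of X or all of Y."""
    common = set(x) & set(y)
    determined = closure(common, fds)
    return determined >= set(x) or determined >= set(y)

fds = [
    (frozenset({"Employee"}), frozenset({"Branch"})),
    (frozenset({"Project"}), frozenset({"Branch"})),
]

# Decomposition from the slides: T1(Employee, Branch), T2(Project, Branch).
lossy = is_lossless({"Employee", "Branch"}, {"Project", "Branch"}, fds)

# Hypothetical alternative: (Employee, Branch) and (Employee, Project).
lossless = is_lossless({"Employee", "Branch"}, {"Employee", "Project"}, fds)
print(lossy, lossless)
```

Branch alone determines neither T1 nor T2, which is why the join on Branch produced extra tuples; Employee determines Branch, so the alternative split passes the test.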
51. Why lossless?
Ensures that attributes involved in the natural join
(X ∩ Y) are a superkey for at least one of the two
relations.
This ensures we can never get the situation where
false tuples are generated,
as for any value on the join attributes there will
be a unique tuple in one of the relations.
52. Lossless Decomposition
A decomposition is lossless if we can recover:
R(A,B,C)
Decompose
R1(A,B) R2(A,C)
Recover
R'(A,B,C) should be the same as
R(A,B,C)
Must ensure R' = R
53. Lossless Decomposition example
• Sometimes the same set of data is reproduced:
Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB
Name    Price
Word    100
Oracle  1000
Access  100
Name    Category
Word    WP
Oracle  DB
Access  DB
• (Word, 100) + (Word, WP) → (Word, 100, WP)
• (Oracle, 1000) + (Oracle, DB) → (Oracle, 1000, DB)
• (Access, 100) + (Access, DB) → (Access, 100, DB)
54. Lossy Decomposition
• Sometimes it's not:
Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB
Category  Name
WP        Word
DB        Oracle
DB        Access
Category  Price
WP        100
DB        1000
DB        100
What's wrong?
• (Word, WP) + (100, WP) = (Word, 100, WP)
• (Oracle, DB) + (1000, DB) = (Oracle, 1000, DB)
• (Oracle, DB) + (100, DB) = (Oracle, 100, DB)
• (Access, DB) + (1000, DB) = (Access, 1000, DB)
• (Access, DB) + (100, DB) = (Access, 100, DB)
55. Ensuring lossless decomposition
R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)
R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)
If A1, ..., An → B1, ..., Bm or A1, ..., An → C1, ..., Cp
Then the decomposition is lossless
Note: don't need both
56. Dependency preservation
Dependency preservation refers to a specific case of
lossless decomposition, such that the normalized
relvars are independent of each other
Some lossless decompositions do not exhibit
dependency preservation
Consider a relation R(A,B,C,D) whose dependencies F
include A ➙ B and A ➙ C
decomposition: R1(A,B), R2(B,C,D)
A ➙ C cannot be preserved using only one relation,
because A and C never appear together.
57. It is not necessary for each and every
dependency in F to appear in a decomposed relation,
but the dependencies that are preserved must be equivalent to
F
F: set of dependencies of relation R
R is decomposed into R1, R2, …, Rn
F1, F2, …, Fn are the subsets of dependencies that
involve only attributes of R1, R2, …, Rn respectively
The decomposition preserves dependencies if
(F1 ⋃ F2 ⋃ … ⋃ Fn)+ = F+
If the decomposition does not preserve the dependencies,
then
the decomposed relations do not satisfy F or
58. Dependency Preserving Decompositions (Contd.)
Decomposition of R into X and Y is dependency
preserving
if (FX ⋃ FY)+ = F+
i.e., if we consider only dependencies in the closure F+ that can
be checked in X without considering Y, and in Y without
considering X, these imply all dependencies in F+.
Important to consider F+ in this definition:
R(A, B, C) with A → B, B → C, C → A, decomposed into AB and BC.
Is this dependency preserving? Is C → A preserved?
note: F+ contains F ⋃ {A → C, B → A, C → B}, so…
FAB contains A → B and B → A; FBC contains B → C and C → B
so (FAB ⋃ FBC)+ contains C → A (from C → B and B → A): it is preserved
59. Dependency Preservation
Example: decompose supplier, city, status where
supplier implies city and status, and city and status imply
each other
Dependency is preserved in this projection:
SC {S#, CITY}
CS {CITY, STATUS}
Dependency is not preserved in this one:
SC {S#, CITY}
CS {S#, STATUS}
Although the second is nonloss, you still cannot update
60. Dependency Preservation
Ensures we can "easily" check whether an FD X → Y
is violated during an update to a database:
The projection of an FD set F onto a set of attributes
Z, written FZ, is
{X → Y | X → Y ∈ F+, X ⋃ Y ⊆ Z}
i.e., it is those FDs local to Z's attributes
A decomposition R1, …, Rk is dependency preserving
if
F+ = (FR1 ⋃ ... ⋃ FRk)+
The decomposition hasn't "lost" any essential FDs, so
we can check without doing a join
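One way to test this property without enumerating F+ is to grow, for each FD X → Y, the set of attributes reachable from X using only closures restricted to one decomposed relation at a time. The sketch below applies that standard procedure to the ABC example and to the two supplier decompositions from the earlier slides; the FDs for the supplier case are read off the slide text, and the function names are assumptions of this sketch.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(fd, parts, fds):
    """Check one FD lhs -> rhs using only dependencies local to each part."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            gained = closure(z & part, fds) & part
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z

def is_dependency_preserving(parts, fds):
    return all(preserves(fd, parts, fds) for fd in fds)

# ABC example: A -> B, B -> C, C -> A, decomposed into AB and BC.
abc_fds = [(frozenset("A"), frozenset("B")),
           (frozenset("B"), frozenset("C")),
           (frozenset("C"), frozenset("A"))]
abc_ok = is_dependency_preserving([frozenset("AB"), frozenset("BC")], abc_fds)

# Supplier example: S# -> CITY, STATUS; CITY and STATUS imply each other.
sup_fds = [(frozenset({"S#"}), frozenset({"CITY", "STATUS"})),
           (frozenset({"CITY"}), frozenset({"STATUS"})),
           (frozenset({"STATUS"}), frozenset({"CITY"}))]
first = is_dependency_preserving(
    [frozenset({"S#", "CITY"}), frozenset({"CITY", "STATUS"})], sup_fds)
second = is_dependency_preserving(
    [frozenset({"S#", "CITY"}), frozenset({"S#", "STATUS"})], sup_fds)
print(abc_ok, first, second)
```

In the second supplier decomposition CITY and STATUS never share a relation, so CITY → STATUS can only be checked by joining the two relations.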
61. Example of Lossless and
Dependency-Preserving Decompositions
Given relation scheme
R(cno, name, street, city, st, zip, item, price)
And FD set cno → name
name → street, city
street, city → st
street, city → zip
name, item → price
Consider the decomposition
R1(cno, name, street, city, st, zip) and R2(cno, name, item,
price)
Is it lossless?
Is it dependency preserving?
What if we replaced the first FD by name, street → city?
62. Comparison of BCNF and 3NF
It is always possible to decompose a relation into a
set of relations that are in 3NF such that:
the decomposition is lossless
the dependencies are preserved
It is always possible to decompose a relation into a
set of relations that are in BCNF such that:
the decomposition is lossless
it may not be possible to preserve dependencies.