Database Design Guidelines and Normalization Explained

Relational database design
Normalization
Prepared by Vaishali Kalaria

Design Guidelines for Relational Databases
 What is relational database design?
 The grouping of attributes to form "good" relation
schemas

 Two levels of relation schemas
 The logical "user view" level
 The storage "base relation" level

 Design is concerned mainly with base relations

 What are the criteria for "good" base relations?

1. Semantics of the Relation Attributes

 Each tuple in a relation should represent one entity
or relationship instance. (Applies to individual
relations and their attributes.)

 Attributes of different entities should not be mixed
in the same relation

 Only foreign keys should be used to refer to other
entities

 Entity and relationship attributes should be kept
apart as much as possible.

2. Redundancy and Data Anomalies

 Redundant data is where we have stored the same
'information' more than once, i.e., the redundant data
could be removed without the loss of information.

 Wastes storage

 Causes problems with update anomalies
 Insertion anomalies
 Deletion anomalies
 Modification anomalies

 Design a schema that does not suffer from the
insertion, deletion and update anomalies.

 Example: the following relation that contains staff
and department details:

staffNo  job       dept  dname       city
SL10     Salesman  10    Sales       Stratford
SA51     Manager   20    Accounts    Barking
DS40     Clerk     20    Accounts    Barking
OS45     Clerk     30    Operations  Barking

Such 'redundancy' could lead to the following 'anomalies':

• Insert Anomaly: Need to store a value for an attribute but
cannot because the value for another attribute is unknown.
• We can't insert a dept without inserting a member of
staff that works in that department

 Update Anomaly: Occurs when a change of a single
attribute in one record requires changes in multiple records
• We could change the name of the dept that SA51 works
in without simultaneously changing the dept that DS40
works in.

 Deletion Anomaly: Occurs when the removal of a record
results in a loss of important information about an entity.
• By removing employee SL10 we have removed all
information pertaining to the Sales dept.
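
The three anomalies can be reproduced on the example rows with a few lines of Python. This is only an illustrative sketch: the list-of-dicts layout, the helper name dept_names, the new department name "Finance" and the extra department 40 are assumptions made for the demonstration, not part of the original example.

```python
# Flat relation from the example: staff and department details in one table.
staff_dept = [
    {"staffNo": "SL10", "job": "Salesman", "dept": 10, "dname": "Sales", "city": "Stratford"},
    {"staffNo": "SA51", "job": "Manager", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffNo": "DS40", "job": "Clerk", "dept": 20, "dname": "Accounts", "city": "Barking"},
    {"staffNo": "OS45", "job": "Clerk", "dept": 30, "dname": "Operations", "city": "Barking"},
]


def dept_names(rows, dept):
    """Return the set of names stored for one department number."""
    return {r["dname"] for r in rows if r["dept"] == dept}


# Update anomaly: the name of dept 20 is changed in SA51's row only.
updated = [dict(r) for r in staff_dept]
for row in updated:
    if row["staffNo"] == "SA51":
        row["dname"] = "Finance"  # hypothetical new name
inconsistent_names = dept_names(updated, 20)  # two names for one department

# Deletion anomaly: removing SL10 removes the only record of the Sales dept.
after_delete = [r for r in staff_dept if r["staffNo"] != "SL10"]
sales_still_known = any(r["dept"] == 10 for r in after_delete)

# Decomposed design: department facts are stored once, keyed by dept number.
staff = [{"staffNo": r["staffNo"], "job": r["job"], "dept": r["dept"]} for r in staff_dept]
department = {r["dept"]: {"dname": r["dname"], "city": r["city"]} for r in staff_dept}

department[20]["dname"] = "Finance"  # one change, nothing left to go out of sync
staff = [s for s in staff if s["staffNo"] != "SL10"]
sales_kept = 10 in department  # the Sales dept survives the deletion
# Insert a dept that has no staff yet (hypothetical values).
department[40] = {"dname": "Marketing", "city": "Stratford"}

print(inconsistent_names, sales_still_known, sales_kept)
```

In the flat relation the same department ends up with two names and the Sales department disappears with its last employee; in the split design neither can happen.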

3. Null Values in Tuples

 Relations should be designed such that their tuples
will have as few NULL values as possible

 Attributes that are NULL frequently could be placed
in separate relations (with the primary key)

 Reasons for nulls:
 Attribute not applicable or invalid
 Attribute value unknown (may exist)
 Value known to exist, but unavailable

Purpose of Normalization

 To avoid redundancy by storing each 'fact' within the
database only once.

 To put data into a form that conforms to relational
principles - no repeating groups.

 To put the data into a form that is more able to accurately
accommodate change.

 To avoid certain updating 'anomalies'.

 To facilitate the enforcement of data constraints.

Normalization

 "Normalization" refers to the process of creating an
efficient, reliable, flexible, and appropriate "relational"
structure for storing information.
 Normalized data must be in a "relational" data structure.

 Usually involves dividing a database into two or more
tables and defining relationships between the tables.

 The objective is to isolate data so that additions, deletions,
and modifications of a field can be made in just one table
and then propagated through the rest of the database via
the defined relationships

The Process of Normalization

• Normalization is often executed as a series of steps.
• Each step corresponds to a specific normal form that
has known properties.

• As normalization proceeds,
• the relations become progressively more restricted in
format, and
• less vulnerable to update anomalies.

Stages of Normalisation
Unnormalised form (UNF)
   ↓ Remove repeating groups
First normal form (1NF)
   ↓ Remove partial dependencies
Second normal form (2NF)
   ↓ Remove transitive dependencies
Third normal form (3NF)
   ↓ Remove remaining functional dependency anomalies
Boyce-Codd normal form (BCNF)
   ↓ Remove multivalued dependencies
Fourth normal form (4NF)
   ↓ Remove remaining anomalies
Fifth normal form (5NF)

Unnormalized Form (UNF)

 Definition: A relation is unnormalized when it has not
had any normalization rules applied to it, and it suffers
from various anomalies.

 the capturing of attributes to a ‘Universal Relation’
from a screen layout, manual report, manual
document, etc...

ClientRental relation in UNF
Repeating group = (propertyNo, pAddress,
rentStart, rentFinish, rent, ownerNo, oName)
Unnormalized form (UNF)
A table that contains one or more repeating groups.

ClientNo  cName          propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      John Kay       PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
                         PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      Aline Stewart  PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
                         PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
                         PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure ClientRental unnormalized table

First Normal Form (1NF)

 Definition: A relation is in 1NF if, and only if, all its
underlying attributes contain atomic values only.
 the intersection of each row and column contains one and only
one value.
Remove repeating groups into a new relation

 1NF disallows having
 a set of values,
 a tuple of values, or
 a combination of both as an attribute value for a
single tuple.

1NF

There are two approaches to removing repeating groups from
unnormalized tables:

1. Removes the repeating groups by entering appropriate
data in the empty columns of rows containing the
repeating data.

2. Removes the repeating group by placing the repeating
data, along with a copy of the original key attribute(s), in
a separate relation. A primary key is identified for the
new relation.
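
Both approaches can be sketched in Python on a cut-down version of the ClientRental data. The nested "rentals" list stands in for the repeating group, only two of the property attributes are kept for brevity, and the function names flatten and split are hypothetical.

```python
# UNF: each client row carries a repeating group of rentals (a nested list).
client_rental_unf = [
    {"clientNo": "CR76", "cName": "John Kay", "rentals": [
        {"propertyNo": "PG4", "rentStart": "1-Jul-00"},
        {"propertyNo": "PG16", "rentStart": "1-Sep-02"},
    ]},
    {"clientNo": "CR56", "cName": "Aline Stewart", "rentals": [
        {"propertyNo": "PG4", "rentStart": "1-Sep-99"},
        {"propertyNo": "PG36", "rentStart": "10-Oct-00"},
        {"propertyNo": "PG16", "rentStart": "1-Nov-02"},
    ]},
]


def flatten(unf):
    """Approach 1: copy the client data into every row of the repeating group."""
    return [
        {"clientNo": c["clientNo"], "cName": c["cName"], **r}
        for c in unf
        for r in c["rentals"]
    ]


def split(unf):
    """Approach 2: move the repeating group to its own relation,
    together with a copy of the original key attribute (clientNo)."""
    client = [{"clientNo": c["clientNo"], "cName": c["cName"]} for c in unf]
    rental = [{"clientNo": c["clientNo"], **r} for c in unf for r in c["rentals"]]
    return client, rental


flat = flatten(client_rental_unf)
client, rental = split(client_rental_unf)
print(len(flat), len(client), len(rental))
```

The first approach yields one wide relation with redundant client names; the second yields two relations in which each client name appears once.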

1NF ClientRental relation with the first
approach

With the first approach, we remove the repeating group
(property rented details) by entering the appropriate client
data into each row.

The ClientRental relation is defined as follows,
ClientRental ( clientNo, propertyNo, cName, pAddress, rentStart,
rentFinish, rent, ownerNo, oName)

ClientNo  propertyNo  cName          pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         John Kay       6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        John Kay       5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         Aline Stewart  6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        Aline Stewart  2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        Aline Stewart  5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure 1NF ClientRental relation with the first approach

1NF ClientRental relation with the
second approach

With the second approach, we remove the repeating group
(property rented details) by placing the repeating data along with
a copy of the original key attribute (clientNo) in a separate relation.

Client (clientNo, cName)
PropertyRentalOwner (clientNo, propertyNo, pAddress, rentStart,
rentFinish, rent, ownerNo, oName)

Client
ClientNo  cName
CR76      John Kay
CR56      Aline Stewart

PropertyRentalOwner
ClientNo  propertyNo  pAddress                rentStart  rentFinish  rent  ownerNo  oName
CR76      PG4         6 Lawrence St, Glasgow  1-Jul-00   31-Aug-01   350   CO40     Tina Murphy
CR76      PG16        5 Novar Dr, Glasgow     1-Sep-02   1-Sep-02    450   CO93     Tony Shaw
CR56      PG4         6 Lawrence St, Glasgow  1-Sep-99   10-Jun-00   350   CO40     Tina Murphy
CR56      PG36        2 Manor Rd, Glasgow     10-Oct-00  1-Dec-01    370   CO93     Tony Shaw
CR56      PG16        5 Novar Dr, Glasgow     1-Nov-02   1-Aug-03    450   CO93     Tony Shaw

Figure 1NF ClientRental relation with the second approach

Second Normal Form (2NF)

 A database table is said to be in 2NF if
 it is in 1NF and
 every non-key field/column is fully functionally
dependent on the primary key.

 Converting to 2NF removes the partial dependency
of any non-key field on part of the primary key.

 Note:
 It is still possible for a table in 2NF to exhibit transitive
dependency; that is, one or more attributes may be
functionally dependent on nonkey attributes.

The process of converting the database
table into 2NF:

 Identify the primary key for the 1NF relation.

 Identify the functional dependencies in the
relation.

 If partial dependencies exist on the primary key,
remove them by placing them in a new relation
along with a copy of their determinant.

2NF ClientRental relation

The ClientRental relation has the following functional
dependencies:

fd1  clientNo, propertyNo -> rentStart, rentFinish       (Primary key)
fd2  clientNo -> cName                                   (Partial dependency)
fd3  propertyNo -> pAddress, rent, ownerNo, oName        (Partial dependency)
fd4  ownerNo -> oName                                    (Transitive dependency)
fd5  clientNo, rentStart -> propertyNo, pAddress,
     rentFinish, rent, ownerNo, oName                    (Candidate key)
fd6  propertyNo, rentStart -> clientNo, cName, rentFinish
                                                         (Candidate key)


Removing the partial dependencies creates three
new relations called Client, Rental, and PropertyOwner:

Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo, oName)

Client
ClientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental
ClientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03

PropertyOwner
propertyNo  pAddress                rent  ownerNo  oName
PG4         6 Lawrence St, Glasgow  350   CO40     Tina Murphy
PG16        5 Novar Dr, Glasgow     450   CO93     Tony Shaw
PG36        2 Manor Rd, Glasgow     370   CO93     Tony Shaw

Figure 2NF ClientRental relation
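
The 2NF step for ClientRental can be sketched in Python as follows. It is a simplified check: it only looks at the dependencies fd1 to fd6 as they are listed, not at dependencies that could be derived from them, and it takes (clientNo, propertyNo) as the primary key. The helper fd and the variable names are hypothetical.

```python
def fd(lhs, rhs):
    """Build a functional dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


ATTRS = frozenset(
    "clientNo propertyNo cName pAddress rentStart rentFinish rent ownerNo oName".split()
)
KEY = frozenset({"clientNo", "propertyNo"})  # primary key of the 1NF relation

FDS = [
    fd("clientNo propertyNo", "rentStart rentFinish"),                             # fd1
    fd("clientNo", "cName"),                                                       # fd2
    fd("propertyNo", "pAddress rent ownerNo oName"),                               # fd3
    fd("ownerNo", "oName"),                                                        # fd4
    fd("clientNo rentStart", "propertyNo pAddress rentFinish rent ownerNo oName"), # fd5
    fd("propertyNo rentStart", "clientNo cName rentFinish"),                       # fd6
]

# A listed FD is a partial dependency when its determinant is a
# proper subset of the primary key.
partial = [(lhs, rhs) for lhs, rhs in FDS if lhs < KEY]

# Each partial dependency moves to a new relation with a copy of its determinant.
new_relations = [set(lhs | rhs) for lhs, rhs in partial]

# What stays behind: the original attributes minus the moved dependents.
moved = frozenset().union(*(rhs for _, rhs in partial))
remaining = set(ATTRS - moved)

print(new_relations)
print(remaining)
```

The two new relations correspond to Client and PropertyOwner, and the remaining attributes form Rental.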

Third Normal Form (3NF)
Transitive dependency
A condition where A, B, and C are attributes of a relation such that
if A -> B and B -> C, then C is transitively dependent on A via B
(provided that A is not functionally dependent on B or C).

Third normal form (3NF)

 A relation that is in first and second normal form,
and in which no non-primary-key attribute is
transitively dependent on the primary key.

 The normalization of 2NF relations to 3NF
involves the removal of transitive dependencies
by placing the attribute(s) in a new relation along
with a copy of the determinant.

The functional dependencies for the Client, Rental and
PropertyOwner relations are as follows:

Client
fd2  clientNo -> cName                                  (Primary key)

Rental
fd1  clientNo, propertyNo -> rentStart, rentFinish      (Primary key)
fd5  clientNo, rentStart -> propertyNo, rentFinish      (Candidate key)
fd6  propertyNo, rentStart -> clientNo, rentFinish      (Candidate key)

PropertyOwner
fd3  propertyNo -> pAddress, rent, ownerNo, oName       (Primary key)
fd4  ownerNo -> oName                                   (Transitive dependency)


The resulting 3NF relations have the forms:

Client (clientNo, cName)
Rental (clientNo, propertyNo, rentStart, rentFinish)
PropertyOwner (propertyNo, pAddress, rent, ownerNo)
Owner (ownerNo, oName)


Client
ClientNo  cName
CR76      John Kay
CR56      Aline Stewart

Rental
ClientNo  propertyNo  rentStart  rentFinish
CR76      PG4         1-Jul-00   31-Aug-01
CR76      PG16        1-Sep-02   1-Sep-02
CR56      PG4         1-Sep-99   10-Jun-00
CR56      PG36        10-Oct-00  1-Dec-01
CR56      PG16        1-Nov-02   1-Aug-03

PropertyOwner
propertyNo  pAddress                rent  ownerNo
PG4         6 Lawrence St, Glasgow  350   CO40
PG16        5 Novar Dr, Glasgow     450   CO93
PG36        2 Manor Rd, Glasgow     370   CO93

Owner
ownerNo  oName
CO40     Tina Murphy
CO93     Tony Shaw

Figure 3NF ClientRental relation
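
At the data level, the 3NF step is a pair of projections. The sketch below assumes the rows are held as plain Python tuples and that the variable names are free to choose. It splits the 2NF PropertyOwner rows and then rejoins them to confirm that nothing was lost.

```python
# 2NF PropertyOwner rows: (propertyNo, pAddress, rent, ownerNo, oName)
property_owner_2nf = [
    ("PG4", "6 Lawrence St, Glasgow", 350, "CO40", "Tina Murphy"),
    ("PG16", "5 Novar Dr, Glasgow", 450, "CO93", "Tony Shaw"),
    ("PG36", "2 Manor Rd, Glasgow", 370, "CO93", "Tony Shaw"),
]

# Remove the transitive dependency ownerNo -> oName:
# oName moves to a new relation along with a copy of its determinant.
property_owner = [
    (prop, addr, rent, owner_no)
    for prop, addr, rent, owner_no, _name in property_owner_2nf
]
owner = sorted({(owner_no, name) for _p, _a, _r, owner_no, name in property_owner_2nf})

# Joining on ownerNo gives back the original rows.
owner_name = dict(owner)
rejoined = [
    (prop, addr, rent, owner_no, owner_name[owner_no])
    for prop, addr, rent, owner_no in property_owner
]

print(owner)
print(rejoined == property_owner_2nf)
```

Tony Shaw's name is now stored once in Owner instead of once per property.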

Boyce-Codd Normal Form (BCNF)
 A relation is in BCNF if, and only if, every
determinant is a candidate key.

 BCNF is a refinement of third normal form.

 A relation schema R is in Boyce-Codd Normal Form
(BCNF) if whenever a non-trivial FD X -> A holds in R,
then X is a superkey of R

 That is, every relation in BCNF is also in 3NF, but a
relation in 3NF is not necessarily in BCNF.

3NF to BCNF

 Identify all candidate keys in the relation.

 Identify all functional dependencies in the relation.

 If functional dependencies exist in the relation
whose determinants are not candidate keys for
the relation, remove them by
placing them in a new relation along with a copy of
their determinant.

Example of BCNF

fd1  clientNo, interviewDate -> interviewTime, staffNo, roomNo   (Primary key)
fd2  staffNo, interviewDate, interviewTime -> clientNo           (Candidate key)
fd3  roomNo, interviewDate, interviewTime -> clientNo, staffNo   (Candidate key)
fd4  staffNo, interviewDate -> roomNo                            (not a candidate key)

ClientInterview
ClientNo  interviewDate  interviewTime  staffNo  roomNo
CR76      13-May-02      10.30          SG5      G101
CR75      13-May-02      12.00          SG5      G101
CR74      13-May-02      12.00          SG37     G102
CR56      1-Jul-02       10.30          SG5      G102

Figure ClientInterview relation

Example of BCNF(2)
To transform the ClientInterview relation to BCNF, we must remove the violating
functional dependency by creating two new relations called Interview and
StaffRoom as shown below:

Interview (clientNo, interviewDate, interviewTime, staffNo)
StaffRoom(staffNo, interviewDate, roomNo)

Interview
ClientNo interviewDate interviewTime staffNo
CR76 13-May-02 10.30 SG5
CR75 13-May-02 12.00 SG5
CR74 13-May-02 12.00 SG37
CR56 1-Jul-02 10.30 SG5

StaffRoom
staffNo interviewDate roomNo
SG5 13-May-02 G101
SG37 13-May-02 G102
SG5 1-Jul-02 G102

Figure BCNF Interview and StaffRoom relations
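
The BCNF test ("every determinant is a candidate key") can be sketched in Python with the standard attribute-closure computation. The function names closure, fd and bcnf_violations are hypothetical, and the check only covers the four dependencies listed for ClientInterview.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Build a functional dependency from two space-separated attribute lists."""
    return frozenset(lhs.split()), frozenset(rhs.split())


ATTRS = frozenset("clientNo interviewDate interviewTime staffNo roomNo".split())
FDS = [
    fd("clientNo interviewDate", "interviewTime staffNo roomNo"),  # fd1
    fd("staffNo interviewDate interviewTime", "clientNo"),         # fd2
    fd("roomNo interviewDate interviewTime", "clientNo staffNo"),  # fd3
    fd("staffNo interviewDate", "roomNo"),                         # fd4
]


def bcnf_violations(attrs, fds):
    """List the FDs whose determinant does not determine the whole relation."""
    return [(lhs, rhs) for lhs, rhs in fds if not attrs <= closure(lhs, fds)]


violations = bcnf_violations(ATTRS, FDS)

# Decompose on the violating FD: determinant plus dependents go to a new
# relation, and the dependents are dropped from the original.
lhs, rhs = violations[0]
staff_room = set(lhs | rhs)
interview = set(ATTRS - rhs)

print(violations)
print(staff_room, interview)
```

Only fd4 is reported, and splitting on it gives the attribute sets of StaffRoom and Interview.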

Example - 1NF to 2NF
1NF:
Property_Inspection (Property_No, IDate, ITime,
Paddress, Comments, Staff_No, Sname, Car_Reg)

 Full Functional Dependency:
(Property_No+IDate)->(ITime, Comments, Staff_No,Sname,
Car_Reg)

 Partial Dependency:
Property_No -> PAddress (PAddress depends on only part of
the key (Property_No, IDate))

2NF:
 Prop (Property_No, Paddress)
 Prop_Inspection (Property_No, IDate, ITime, Comments,
Staff_No, Sname, Car_Reg)

Example - 2NF to 3NF

 Transitive Dependency in Prop_Inspection:
 (Property_No+IDate) -> Staff_No
 Staff_No -> Sname

3NF:
 Staff (Staff_No, Sname)
 Prop_Inspection (Property_No, IDate, ITime,
Comments, Staff_No, Car_Reg)

Example - 3NF to BCNF
 Prop (Property_No,Paddress)
 Prop_Inspection (Property_No, IDate, ITime, Comments,
Staff_No, Car_Reg)

 Prop and Staff are already in BCNF.

 FDs of Prop_Inspection:
 (Property_No, IDate) -> (ITime, Comments, Staff_No,
Car_Reg)
 (Staff_No, IDate) -> Car_Reg (the determinant is not a
candidate key, so this FD violates BCNF)
 (Car_Reg, IDate, ITime) -> (Property_No, Comments,
Staff_No)
 (Staff_No, IDate, ITime) -> (Property_No, Comments)

Example – BCNF

 Prop (Property_No,Paddress)


 Inspection (Property_No, IDate, ITime,
Comments, Staff_No)

 Staff_Car (Staff_No, IDate, Car_Reg)

What is Decomposition?
 Decomposition: the process of breaking something down
into parts or elements.

 Decomposition in a database means breaking a table down
into multiple tables.

 From a database perspective, it means moving to a higher
normal form.

 The relations are broken into smaller ones so that the
data model conforms to a normal form and avoids
redundancy.

Decomposition of relation schema
 Suppose R is a relation schema
 R = {A1,A2,A3,….An}

 R is decomposed into a set of relation
schemas
 D = {R1,R2,R3,…Rm }, such that
 Ri ⊆ R for 1 <= i <= m
 and R1 ⋃ R2 ⋃ R3….⋃ Rm = R

 Ex: gradeInfo(rollNo, studName, course, grade)
 R1 : gradeInfo(rollNo, course, grade)
 R2 : studInfo(rollNo, studName)

Decomposition
It is important that decompositions are “good”.

Two Characteristics of Good Decompositions

1) Lossless

2) Preserve dependencies

Problem with Decomposition

 Given instances of the decomposed relations,
we may not be able to reconstruct the corresponding
instance of the original relation – information loss

Example : Problem with Decomposition
R
Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon

R1
Model Name  Category
a11         Canon
s20         Nikon
a70         Canon

R2
Price  Category
100    Canon
200    Nikon
150    Canon

Example : Problem with Decomposition
R1 ⋈ R2 (natural join on Category)
Model Name  Price  Category
a11         100    Canon
a11         150    Canon
s20         200    Nikon
a70         100    Canon
a70         150    Canon

R (original)
Model Name  Price  Category
a11         100    Canon
s20         200    Nikon
a70         150    Canon

Lossy decomposition

 In previous example, additional tuples are obtained
along with original tuples

 Although there are more tuples, this leads to less
information

 Due to the loss of information, decomposition for
previous example is called lossy decomposition or
lossy-join decomposition

Lossy decomposition (more example)

T
Employee Project Branch
Brown Mars L.A.
Green Jupiter San Jose
Green Venus San Jose
Hoskins Saturn San Jose
Hoskins Venus San Jose

Functional dependencies:

Employee -> Branch, Project -> Branch

Lossy decomposition

Decomposition of the previous relation
T1
Employee  Branch
Brown     L.A.
Green     San Jose
Hoskins   San Jose

T2
Project  Branch
Mars     L.A.
Jupiter  San Jose
Saturn   San Jose
Venus    San Jose

Lossy decomposition
Original Relation
Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose

After Natural Join
Employee  Project  Branch
Brown     Mars     L.A.
Green     Jupiter  San Jose
Green     Venus    San Jose
Hoskins   Saturn   San Jose
Hoskins   Venus    San Jose
Green     Saturn   San Jose
Hoskins   Jupiter  San Jose

After Natural Join, we get two extra tuples. Thus, there is loss of information
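
The join can be replayed in a few lines of Python. This is a sketch that models each relation as a set of tuples; the variable names are arbitrary.

```python
# Original relation T(Employee, Project, Branch)
T = {
    ("Brown", "Mars", "L.A."),
    ("Green", "Jupiter", "San Jose"),
    ("Green", "Venus", "San Jose"),
    ("Hoskins", "Saturn", "San Jose"),
    ("Hoskins", "Venus", "San Jose"),
}

# Projections T1(Employee, Branch) and T2(Project, Branch)
T1 = {(employee, branch) for employee, _project, branch in T}
T2 = {(project, branch) for _employee, project, branch in T}

# Natural join of T1 and T2 on the shared attribute Branch
joined = {
    (employee, project, branch)
    for employee, branch in T1
    for project, branch2 in T2
    if branch == branch2
}

# Tuples that appear after the join but were never in T
spurious = joined - T

print(len(joined))
print(sorted(spurious))
```

Every original tuple is still present, but the join can no longer tell which employee/project pairs were real.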

What is lossless?
 Lossless means functioning without a loss.
In other words, retain everything.

 Important for databases to have this feature.

Lossless Decomposition Property

R : relation
F : set of functional dependencies on R
X,Y : decomposition of R
Decomposition is lossless if:
 X ∩ Y -> X, that is: all attributes common to both X and
Y functionally determine ALL the attributes in X
 OR
 X ∩ Y -> Y, that is: all attributes common to both X and
Y functionally determine ALL the attributes in Y

 In other words, if X ∩ Y forms a superkey of either X or
Y, the decomposition of R is a lossless decomposition
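
This test is easy to automate with an attribute-closure routine. The Python sketch below uses hypothetical function names (closure, is_lossless) and applies the test to the Employee/Project/Branch relation from the earlier slides; the second decomposition tried is an assumption added for contrast.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(x, y, fds):
    """Binary decomposition test: X ∩ Y must determine all of X or all of Y."""
    determined = closure(x & y, fds)
    return x <= determined or y <= determined


# FDs of T(Employee, Project, Branch): Employee -> Branch, Project -> Branch
FDS = [
    (frozenset({"Employee"}), frozenset({"Branch"})),
    (frozenset({"Project"}), frozenset({"Branch"})),
]

# Decomposition from the slides: the common attribute Branch determines nothing else.
slides_split = is_lossless({"Employee", "Branch"}, {"Project", "Branch"}, FDS)

# Alternative split (an assumption for contrast): the common attribute
# Employee determines Branch, which is all of the second relation.
alternative_split = is_lossless({"Employee", "Project"}, {"Employee", "Branch"}, FDS)

print(slides_split, alternative_split)
```

The first call reports the lossy decomposition seen earlier; the second shows a split of the same relation that passes the test.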

Why lossless?
Ensures that attributes involved in the natural join
(X ∩ Y) are a candidate key for at least one of the two
relations.

This ensures we can never get the situation where
false tuples are generated,
as for any value on the join attributes there will
be a unique tuple in one of the relations.

Lossless Decomposition
A decomposition is lossless if we can recover:
R(A,B,C)
Decompose

R1(A,B) R2(A,C)

Recover

R'(A,B,C) should be the same as
R(A,B,C)
Must ensure R' = R

Lossless Decomposition example
• Sometimes the same set of data is reproduced:
Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB

Name    Price
Word    100
Oracle  1000
Access  100

Name    Category
Word    WP
Oracle  DB
Access  DB

• (Word, 100) + (Word, WP) -> (Word, 100, WP)
• (Oracle, 1000) + (Oracle, DB) -> (Oracle, 1000, DB)
• (Access, 100) + (Access, DB) -> (Access, 100, DB)

Lossy Decomposition
• Sometimes it's not:
Name    Price  Category
Word    100    WP
Oracle  1000   DB
Access  100    DB

What's wrong?

Category  Name
WP        Word
DB        Oracle
DB        Access

Category  Price
WP        100
DB        1000
DB        100

• (Word, WP) + (100, WP) = (Word, 100, WP)
• (Oracle, DB) + (1000, DB) = (Oracle, 1000, DB)
• (Oracle, DB) + (100, DB) = (Oracle, 100, DB)
• (Access, DB) + (1000, DB) = (Access, 1000, DB)
• (Access, DB) + (100, DB) = (Access, 100, DB)

Ensuring lossless decomposition

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

R1(A1, ..., An, B1, ..., Bm) R2(A1, ..., An, C1, ..., Cp)

If A1, ..., An -> B1, ..., Bm or A1, ..., An -> C1, ..., Cp
Then the decomposition is lossless

Note: don't need both

Dependency preservation

 Dependency preservation is a property of a
decomposition: the functional dependencies can be enforced
on the decomposed relations individually, so the normalized
relvars are independent of each other

 Some lossless decompositions do not exhibit
dependency preservation

 Let R(A,B,C,D) be a relation whose dependencies F
include A ➙ B and A ➙ C
 decomposition: R1(A,B), R2(B,C,D)
 A ➙ C cannot be checked using only one relation,
because A and C do not appear together in R1 or R2.

 It may not be possible to keep each and every
dependency in F within a single relation
 But the dependencies that are preserved should be
equivalent to F

 Let F be the set of dependencies of relation R
 R is decomposed into R1,R2,….Rn
 F1,F2,….,Fn are the dependencies that only
involve attributes of R1,R2,..,Rn respectively

 The decomposition preserves dependencies if
F1 ⋃ F2 ⋃ .. ⋃ Fn implies F, i.e. (F1 ⋃ F2 ⋃ .. ⋃ Fn)+ = F+

 If the decomposition does not preserve the dependencies,
then
 the decomposed relations do not satisfy F or ...

Dependency Preserving Decompositions (Contd.)

 Decomposition of R into X and Y is dependency
preserving
 if (F_X ∪ F_Y)+ = F+
 i.e., if we consider only dependencies in the closure F+ that can
be checked in X without considering Y, and in Y without
considering X, these imply all dependencies in F+.

 Important to consider F+ in this definition:
 R = ABC with A -> B, B -> C, C -> A, decomposed into AB and BC.
 Is this dependency preserving? Is C -> A preserved?
 note: F+ contains F ∪ {A -> C, B -> A, C -> B}, so…

 F_AB contains A -> B and B -> A; F_BC contains B -> C and
C -> B
 C -> B and B -> A together imply C -> A, so C -> A is in
(F_AB ∪ F_BC)+ and the decomposition is dependency preserving.

Dependency Preservation
 Example: decompose supplier, city, status where
supplier implies city and status, and city and status imply
each other

 Dependency is preserved in this projection:
SC {S#, CITY}
CS {CITY, STATUS}

 Dependency is not preserved in this one:
SC {S#, CITY}
CS {S#, STATUS}

 Although the second is nonloss, you still cannot update

Dependency Preservation
Ensures we can "easily" check whether an FD X -> Y
is violated during an update to a database:

 The projection of an FD set F onto a set of attributes
Z, F_Z, is
{X -> Y | X -> Y ∈ F+, X ∪ Y ⊆ Z}
i.e., it is those FDs local to Z's attributes
 A decomposition R1, …, Rk is dependency preserving
if
F+ = (F_R1 ∪ ... ∪ F_Rk)+

The decomposition hasn't "lost" any essential FDs, so
we can check without doing a join
Example of Lossless and
Dependency-Preserving Decompositions
Given relation scheme
R(cno, name, street, city, st, zip, item, price)
And FD set
cno -> name
name -> street, city
street, city -> st
street, city -> zip
name, item -> price
Consider the decomposition
R1(cno, name, street, city, st, zip) and R2(cno, name, item,
price)
 Is it lossless?
 Is it dependency preserving?
What if we replaced the first FD by name, street -> city?

Comparison of BCNF and 3NF

 It is always possible to decompose a relation into a
set of relations that are in 3NF such that:
 the decomposition is lossless
 the dependencies are preserved

 It is always possible to decompose a relation into a
set of relations that are in BCNF such that:
 the decomposition is lossless
 it may not be possible to preserve dependencies.
