1. DATABASE NORMALIZATION
Dan D’Urso
Orange Coast Database Associates
Laguna Niguel, Orange County, California
www.ocdatabases.com
slides.1@ocdatabases.com
Twitter: @ocdatabases
Database Normalization 1
2. DATABASE NORMALIZATION ACTIVITY
• Activity extracted from a college textbook
• Shows how to put a database into 3rd normal form
• Explains the three normal forms, and
• The normalization process
• Additional exercises with answers at the end
3. DATABASE NORMALIZATION ACTIVITY
• Tracks large industrial equipment placed in large chemical
plants in the South
• Each piece of equipment has a unique identifier based on the
plant and equipment type
• Starting point is a flat Excel spreadsheet
4. NORMALIZATION
• 1st normal form
• 2nd normal form (implies 1st)
• 3rd normal form (implies 2nd)
• 3rd normal form is the usual goal; normal forms go all the way to 5th,
but we will stick to 3rd for this brief introductory exercise
5. TABLE DESCRIPTIONS
• This is a description of the original table using a common text-only
notation (table name followed in parentheses by the fields, with primary
keys underlined, required fields bolded, foreign keys italicized)
• Equipment (plant_name, eqpt_name, plant_mgr, eqpt_mfgr,
mfgr_addr)
• This is our 3rd normal form goal…
• Equipment (plant_name, eqpt_name, eqpt_mfgr)
• Plants (plant_name, plant_mgr)
• Manufacturers (eqpt_mfgr, mfgr_addr)
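The 3rd normal form goal above could be declared roughly as follows. This is a minimal sketch using Python's built-in sqlite3 module; the column types, the NOT NULL constraints, and the foreign key clauses are assumptions for illustration, since the notation on the slide gives only table and field names.

```python
import sqlite3

# In-memory database; table and column names follow the slide's notation.
# Column types and constraints are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Plants (
    plant_name TEXT PRIMARY KEY,
    plant_mgr  TEXT NOT NULL
);
CREATE TABLE Manufacturers (
    eqpt_mfgr TEXT PRIMARY KEY,
    mfgr_addr TEXT NOT NULL
);
CREATE TABLE Equipment (
    plant_name TEXT NOT NULL REFERENCES Plants(plant_name),
    eqpt_name  TEXT NOT NULL,
    eqpt_mfgr  TEXT NOT NULL REFERENCES Manufacturers(eqpt_mfgr),
    PRIMARY KEY (plant_name, eqpt_name)
);
""")

# List the tables that were created.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The composite primary key on Equipment reflects that a piece of equipment is identified by its plant together with its equipment name.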
6. 1ST NORMAL FORM
• All rows unique
• All values in same column have same data type
• All columns have a unique name
• All cells atomic
• No repeating groups or multi-valued attributes
• Order of rows and columns does not matter.
7. REPEATING GROUPS
• Although called groups the concept could apply to single columns
as well. For example phone numbers: work, home, cell, etc.
• Here is a simplified example with student grades
• Anomaly: what happens if there is a 3rd subject?
• You have to add two more columns, forcing redesign of forms, queries
and reports.
• Subjects and grades should be moved to a separate table with a foreign
key referencing StudentID.
StudentID | Subj1 | Grade1 | Subj2 | Grade2
120 | Math | B | French | A-
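As a rough sketch of moving the repeating group into its own table, the Python below turns each Subj/Grade column pair into a separate row keyed by StudentID. The function name `unpivot_grades` and the third subject ("History", "B+") are made up for illustration.

```python
# The wide row from the slide: one column pair per subject.
wide_rows = [
    {"StudentID": 120, "Subj1": "Math", "Grade1": "B",
     "Subj2": "French", "Grade2": "A-"},
]

def unpivot_grades(rows, max_subjects=2):
    """Turn each Subj<n>/Grade<n> pair into one (StudentID, Subject, Grade) row."""
    grades = []
    for row in rows:
        for n in range(1, max_subjects + 1):
            subject = row.get(f"Subj{n}")
            if subject:  # skip empty subject slots
                grades.append((row["StudentID"], subject, row[f"Grade{n}"]))
    return grades

grades = unpivot_grades(wide_rows)

# A 3rd subject is now one more row, not two more columns.
# "History" / "B+" is made-up data, not from the slide.
grades.append((120, "History", "B+"))

for grade_row in grades:
    print(grade_row)
```

With this layout, forms, queries and reports that read the grades table do not need to change when a student takes another subject.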
8. MULTI-PART FIELDS
• NOT part of normalization, per se
• But you may want to consider them while looking at 1st normal form
• Example: customer_name : Bob Smith
• Split into customer_first_name: Bob, customer_last_name:
Smith
• This makes searching and sorting by last name possible (or first
name)
• Generally you would want to split the parts in a database and
recombine them via a query for reports. But this depends on
your business rules. You may want to preserve the field as is.
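A small sketch of the split-and-recombine idea in Python follows. Splitting on the last space is a naive assumption that real names will often break, and the two extra customers are invented so that the sort has something to do.

```python
def split_name(full_name):
    """Split on the last space: the text before it becomes the first name.

    A naive rule for illustration only; suffixes and multi-word surnames
    would need business rules that the slide leaves open.
    """
    first, _, last = full_name.rpartition(" ")
    return {"customer_first_name": first, "customer_last_name": last}

# "Bob Smith" is from the slide; the other two names are made up.
customers = [split_name(n) for n in ["Bob Smith", "Ann Jones", "Al Adams"]]

# Sorting by last name is now a simple key lookup.
by_last = sorted(customers, key=lambda c: c["customer_last_name"])

# Recombine the parts for a report.
report = [f'{c["customer_first_name"]} {c["customer_last_name"]}' for c in by_last]
print(report)
```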
9. DATABASE NORMALIZATION
Violates 1st normal form. Why?
Plant Name | Eqpt Name | Plant Mgr | Eqpt Mfgr | Mfgr Addr
ethylene | Final cooler, feed heater | Jim Smith | ABC Exchanger | 1247 Locust
styrene | Final cooler | Bill Gunn | Delta Supply | 88 Canal
styrene | Feed heater | Bill Gunn | ABC Exchanger | 1247 Locust
Primary key (underlined): Plant Name + Eqpt Name
10. 1ST NORMAL FORM PROBLEM
• Violates 1st normal form
• There is a cell containing both feed heater and final cooler, which
violates the rule that all cells must be atomic
Plant Name | Eqpt Name
ethylene | Final cooler, feed heater
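One way to repair the non-atomic cell is to emit one row per equipment name, sketched below in Python. Treating the comma as the separator and normalizing the capitalization are assumptions; only the two columns shown above are handled, since the other columns still have to be checked row by row against the source data.

```python
# The offending row: two equipment names packed into one cell.
rows = [("ethylene", "Final cooler, feed heater")]

def to_atomic(rows):
    """Emit one (plant_name, eqpt_name) row per comma-separated equipment name."""
    atomic = []
    for plant_name, eqpt_cell in rows:
        for eqpt_name in eqpt_cell.split(","):
            # Trim whitespace and capitalize the first letter for consistency.
            atomic.append((plant_name, eqpt_name.strip().capitalize()))
    return atomic

atomic_rows = to_atomic(rows)
print(atomic_rows)
```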
11. DATABASE NORMALIZATION
Plant Name | Eqpt Name | Plant Mgr | Eqpt Mfgr | Mfgr Addr
ethylene | Final cooler | Jim Smith | ABC Exchanger | 1247 Locust
ethylene | Feed heater | Jim Smith | XYZ Pumps | 432 Broadway
styrene | Final cooler | Bill Gunn | Delta Supply | 88 Canal
styrene | Feed heater | Bill Gunn | XYZ Pumps | 432 Broadway
1st normal form satisfied
12. 2ND NORMAL FORM
• All non-key fields must depend on the entire key, not just part of
the key
• A dependency on only part of the key is called a partial key dependency
• Partial key dependency example (composite PK is SKU, Loc) in an
inventory database which tracks quantity on hand by warehouse:
• In this example the description and price fields depend only on the SKU.
The QOH field can remain as it represents the quantity on hand in a
particular warehouse.
• The descr and price fields should be moved into their own table with a
primary key of SKU.
SKU | Loc | QOH | Descr | Price
12a | TU | 15 | Bolts | 5.95
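The split described above might look like this in Python. It is a sketch only: the second warehouse row ("LA", 40) is hypothetical and was added so the redundancy is visible, and the names `items` and `stock` are illustrative.

```python
inventory = [
    # (SKU, Loc, QOH, Descr, Price)
    ("12a", "TU", 15, "Bolts", 5.95),  # row from the slide
    ("12a", "LA", 40, "Bolts", 5.95),  # hypothetical second warehouse
]

# Items: one entry per SKU, holding the fields that depend on SKU alone.
items = {sku: (descr, price) for sku, _loc, _qoh, descr, price in inventory}

# Stock: keeps the composite key (SKU, Loc) and the one field that
# depends on both parts of it.
stock = {(sku, loc): qoh for sku, loc, qoh, _descr, _price in inventory}

print(items)
print(stock)
```

The description and price now appear once per SKU, however many warehouses carry it.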
13. DATABASE NORMALIZATION
Plant Name | Eqpt Name | Plant Mgr | Eqpt Mfgr | Mfgr Addr
ethylene | Final cooler | Jim Smith | ABC Exchanger | 1247 Locust
ethylene | Feed heater | Jim Smith | XYZ Pumps | 432 Broadway
styrene | Final cooler | Bill Gunn | Delta Supply | 88 Canal
styrene | Feed heater | Bill Gunn | XYZ Pumps | 432 Broadway
1st normal form satisfied
Still violates 2nd normal form. Why?
14. 2ND NORMAL FORM PROBLEM
• Violates 2nd normal form rule: contains a partial key dependency
• Plant manager depends only on plant name
• Plant manager should be moved to a separate table with plant name as PK
Plant Name | Eqpt Name | Plant Mgr
ethylene | Final cooler | Jim Smith
ethylene | Feed heater | Jim Smith
styrene | Final cooler | Bill Gunn
styrene | Feed heater | Bill Gunn
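The partial key dependency can be spotted in the sample rows with a small check, sketched here in Python. The helper name `determines` is hypothetical. Sample data can only rule a dependency out, not prove it; the real rule comes from how the business assigns plant managers.

```python
rows = [
    {"plant_name": "ethylene", "eqpt_name": "Final cooler", "plant_mgr": "Jim Smith"},
    {"plant_name": "ethylene", "eqpt_name": "Feed heater", "plant_mgr": "Jim Smith"},
    {"plant_name": "styrene", "eqpt_name": "Final cooler", "plant_mgr": "Bill Gunn"},
    {"plant_name": "styrene", "eqpt_name": "Feed heater", "plant_mgr": "Bill Gunn"},
]

def determines(rows, lhs, rhs):
    """True if equal values in the lhs columns always come with the same rhs value."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        # Remember the first rhs value seen for this key; any later mismatch
        # means lhs does not determine rhs in this data.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

plant_alone = determines(rows, ["plant_name"], "plant_mgr")
eqpt_alone = determines(rows, ["eqpt_name"], "plant_mgr")
print(plant_alone, eqpt_alone)
```

Plant name on its own is enough to fix the plant manager, which is exactly the partial key dependency described above.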
15. DATABASE NORMALIZATION
Plant Name | Eqpt Name | Eqpt Mfgr | Mfgr Addr
ethylene | Final cooler | ABC Exchanger | 1247 Locust
ethylene | Feed heater | XYZ Pumps | 432 Broadway
styrene | Final cooler | Delta Supply | 88 Canal
styrene | Feed heater | XYZ Pumps | 432 Broadway

Plant Name | Plant Mgr
ethylene | Jim Smith
styrene | Bill Gunn

2nd OK - no partial key dependencies.
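The two tables above can be derived from the 1st normal form table by keeping a subset of columns and dropping duplicate rows. A minimal Python sketch follows; the helper name `project` is illustrative.

```python
first_nf = [
    # (plant_name, eqpt_name, plant_mgr, eqpt_mfgr, mfgr_addr)
    ("ethylene", "Final cooler", "Jim Smith", "ABC Exchanger", "1247 Locust"),
    ("ethylene", "Feed heater", "Jim Smith", "XYZ Pumps", "432 Broadway"),
    ("styrene", "Final cooler", "Bill Gunn", "Delta Supply", "88 Canal"),
    ("styrene", "Feed heater", "Bill Gunn", "XYZ Pumps", "432 Broadway"),
]

def project(rows, indexes):
    """Keep only the given column positions and drop duplicate rows, preserving order."""
    result = []
    for row in rows:
        picked = tuple(row[i] for i in indexes)
        if picked not in result:
            result.append(picked)
    return result

equipment = project(first_nf, [0, 1, 3, 4])  # everything except plant_mgr
plants = project(first_nf, [0, 2])           # plant_name, plant_mgr

print(equipment)
print(plants)
```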
16. 3RD NORMAL FORM
• Columns must not be dependent on a non-key column
• Such a dependency is called a transitive dependency
• Transitive dependency example (orderID is PK)
• In this example the customer name is dependent on the customerID
which is outside the key.
• It should be moved to a separate table with customerID as the PK column
(and other columns such as address and city in a more complete example).
OrderID | Order Date | Order Amount | CustomerID | Customer Name
1021 | 12/12/2012 | 150.00 | CA88 | Bob Smith
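A sketch of the order example after the customer name is moved out, using Python's built-in sqlite3 module. The column types and the ISO date format are assumptions made for this example; a join puts the original row back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID   TEXT PRIMARY KEY,
    CustomerName TEXT NOT NULL
);
CREATE TABLE Orders (
    OrderID     INTEGER PRIMARY KEY,
    OrderDate   TEXT NOT NULL,   -- stored as ISO text (an assumption)
    OrderAmount REAL NOT NULL,
    CustomerID  TEXT NOT NULL REFERENCES Customers(CustomerID)
);
INSERT INTO Customers VALUES ('CA88', 'Bob Smith');
INSERT INTO Orders VALUES (1021, '2012-12-12', 150.00, 'CA88');
""")

# Recombine the two tables to get the original flat row back.
row = conn.execute("""
    SELECT o.OrderID, o.OrderDate, o.OrderAmount, c.CustomerID, c.CustomerName
    FROM Orders AS o
    JOIN Customers AS c ON c.CustomerID = o.CustomerID
""").fetchone()
print(row)
```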
17. DATABASE NORMALIZATION
Plant Name | Eqpt Name | Eqpt Mfgr | Mfgr Addr
ethylene | Final cooler | ABC Exchanger | 1247 Locust
ethylene | Feed heater | XYZ Pumps | 432 Broadway
styrene | Final cooler | Delta Supply | 88 Canal
styrene | Feed heater | XYZ Pumps | 432 Broadway

Plant Name | Plant Mgr
ethylene | Jim Smith
styrene | Bill Gunn

2nd OK - no partial key dependencies.
Why does it violate 3rd normal form?
18. 3RD NORMAL FORM PROBLEM
• Violates 3rd normal form
• Contains a transitive dependency
• If you know the equipment mfgr you know the address
• Mfgr address should be moved to a separate table with eqpt mfgr as PK
Eqpt Mfgr | Mfgr Addr
ABC Exchanger | 1247 Locust
XYZ Pumps | 432 Broadway
Delta Supply | 88 Canal
XYZ Pumps | 432 Broadway
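One practical cost of the transitive dependency is that a manufacturer's address is stored once per piece of equipment. The Python sketch below counts the rows an address change would touch before the split and shows the single update needed after it. The new address "500 Main" is made up for illustration.

```python
equipment = [
    # (plant_name, eqpt_name, eqpt_mfgr, mfgr_addr)
    ("ethylene", "Final cooler", "ABC Exchanger", "1247 Locust"),
    ("ethylene", "Feed heater", "XYZ Pumps", "432 Broadway"),
    ("styrene", "Final cooler", "Delta Supply", "88 Canal"),
    ("styrene", "Feed heater", "XYZ Pumps", "432 Broadway"),
]

# Before the split: every equipment row from XYZ Pumps carries the address.
rows_to_update = [r for r in equipment if r[2] == "XYZ Pumps"]
print(len(rows_to_update))

# After the split: one entry per manufacturer, keyed by eqpt_mfgr.
manufacturers = {mfgr: addr for _plant, _eqpt, mfgr, addr in equipment}
print(len(manufacturers))

# The address change is now a single update ("500 Main" is made-up data).
manufacturers["XYZ Pumps"] = "500 Main"
```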
19. DATABASE NORMALIZATION
Plant Name | Eqpt Name | Eqpt Mfgr
ethylene | Final cooler | ABC Exchanger
ethylene | Feed heater | XYZ Pumps
styrene | Final cooler | Delta Supply
styrene | Feed heater | XYZ Pumps

Plant Name | Plant Mgr
ethylene | Jim Smith
styrene | Bill Gunn

Eqpt Mfgr | Mfgr Addr
ABC Exchanger | 1247 Locust
Delta Supply | 88 Canal
XYZ Pumps | 432 Broadway

Satisfies 3rd normal form: no transitive dependencies
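As a quick sanity check, the three tables can be joined back together to confirm that no information was lost by the split. This is a sketch in plain Python, with dictionaries standing in for the Plants and Manufacturers tables.

```python
equipment = [
    # (plant_name, eqpt_name, eqpt_mfgr)
    ("ethylene", "Final cooler", "ABC Exchanger"),
    ("ethylene", "Feed heater", "XYZ Pumps"),
    ("styrene", "Final cooler", "Delta Supply"),
    ("styrene", "Feed heater", "XYZ Pumps"),
]
plants = {"ethylene": "Jim Smith", "styrene": "Bill Gunn"}
manufacturers = {
    "ABC Exchanger": "1247 Locust",
    "Delta Supply": "88 Canal",
    "XYZ Pumps": "432 Broadway",
}

# Follow each foreign key to rebuild the flat 1st normal form rows.
rejoined = [
    (plant, eqpt, plants[plant], mfgr, manufacturers[mfgr])
    for plant, eqpt, mfgr in equipment
]

first_nf = [
    ("ethylene", "Final cooler", "Jim Smith", "ABC Exchanger", "1247 Locust"),
    ("ethylene", "Feed heater", "Jim Smith", "XYZ Pumps", "432 Broadway"),
    ("styrene", "Final cooler", "Bill Gunn", "Delta Supply", "88 Canal"),
    ("styrene", "Feed heater", "Bill Gunn", "XYZ Pumps", "432 Broadway"),
]
print(rejoined == first_nf)
```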
20. NORMALIZATION EXERCISES
• Normalize the following…
• Scenario: recording labor time spent on factory work orders
• Labor(badgeno, workorderno, full name, salary, workorderno,
workorder_description, workorder_Budget) [hint: is there a partial key
dependency? Is there another action you might want to take?]
• Scenario: admitting patients to a single hospital
• Patient(patientno, firstname, lastname, admission_date,
assigned_doctorno, doctor’s specialty(ies), symptom_code,
symptom_description) [hint: look for repeating groups]
• Scenario: tracking parts inventory by part # and warehouse
• Inventory(partno, warehouseid, quantity on hand, reorder qty, part
description, warehouse location)
• Do it in steps: 1st normal form, 2nd normal, 3rd
21. CONCLUSION
• 3rd normal form is a common goal of normalization
• Three stages
• 1st normal form – eliminate non-atomic cells and repeating groups
• 2nd normal – remove partial key dependencies
• 3rd normal – eliminate transitive dependencies