Rushdi Shams, Dept of CSE, KUET
Database Systems
Data Warehousing
Version 1.0
The Advent of Data Warehousing
• The existing database models could not meet all of the requirements placed on them.
• These requirements fall into two categories:
1. Operational Use
2. Decision Support Use
Operational Use
• Requires a precise, accurate, and instant picture of the database
• Day-to-day business, for example:
1. A customer comes in
2. The customer orders parts
   1. Search for the parts
   2. Book/purchase the parts
   3. Add dates
      1. Bank transactions on the purchase/booking
      2. Invoice
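The operational workflow above can be sketched as one short OLTP transaction. This is a minimal illustration, not the author's design: it uses Python's built-in `sqlite3` module with an in-memory database, and the tables `part`, `orders`, and `invoice` and the function `place_order` are hypothetical. Each step touches only a few rows and everything commits together, which is what keeps the operational picture instant and accurate.

```python
import sqlite3
from datetime import date

# Hypothetical operational (OLTP) schema: small, normalized tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE part    (part_id INTEGER PRIMARY KEY, name TEXT UNIQUE, price INTEGER, stock INTEGER);
CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, customer TEXT,
                      part_id INTEGER REFERENCES part, qty INTEGER, order_date TEXT);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY,
                      order_id INTEGER REFERENCES orders, amount INTEGER);
INSERT INTO part (name, price, stock) VALUES ('DDR RAM', 40, 10);
""")


def place_order(conn: sqlite3.Connection, customer: str, part_name: str,
                qty: int, order_date: date) -> int:
    """Search for a part, book it, record the date, and invoice it in one transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        row = conn.execute(
            "SELECT part_id, price, stock FROM part WHERE name = ?", (part_name,)
        ).fetchone()
        if row is None or row[2] < qty:
            raise ValueError(f"{part_name!r} is not available in quantity {qty}")
        part_id, price, _stock = row
        # Book/purchase the part by decrementing stock.
        conn.execute("UPDATE part SET stock = stock - ? WHERE part_id = ?", (qty, part_id))
        # Record the order together with its date.
        order_id = conn.execute(
            "INSERT INTO orders (customer, part_id, qty, order_date) VALUES (?, ?, ?, ?)",
            (customer, part_id, qty, order_date.isoformat()),
        ).lastrowid
        # Issue the invoice for the amount charged.
        conn.execute("INSERT INTO invoice (order_id, amount) VALUES (?, ?)",
                     (order_id, price * qty))
        return order_id


order_id = place_order(conn, "Mr. X", "DDR RAM", 2, date(2023, 12, 5))
```

The bank-payment step is left out; in a fuller sketch it would be one more statement inside the same `with conn:` block, so that booking, payment, and invoice succeed or fail together.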
Operational Use
• Direct customer-company interaction
• All information is processed instantaneously (or almost instantaneously)
Decision Support Use
• Operational use broadens the scope: which customer, where he lives, what his phone number is, which part he bought, how much he paid, what the date was, and so on.
• Decision support use narrows the scope to business-related issues only: which customer, which part he bought, how much he paid, and what the date was.
Decision Support Use
• ... and the benefits are:
• In December, the company may need to stock more DDR RAM than HDDs.
• SATA HDDs sell more than PATA HDDs.
• Mr. X is a valued customer who bought most of the RAM, and Mr. Y is a valued customer who bought most of the SATA HDDs.
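Insights like these come from aggregating many narrow decision-support records rather than from looking up any single transaction. A minimal sketch in plain Python; the records below are made-up illustrative data (customer, part, units, amount paid, date), and the helper names `units_sold` and `top_customer_per_part` are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

# Made-up decision-support records: only the business-relevant attributes.
sales = [
    ("Mr. X", "DDR RAM", 3, 120, date(2023, 12, 3)),
    ("Mr. X", "DDR RAM", 2, 80, date(2023, 12, 18)),
    ("Mr. Y", "SATA HDD", 2, 100, date(2023, 12, 9)),
    ("Mr. Y", "SATA HDD", 4, 200, date(2023, 6, 1)),
    ("Mr. Z", "PATA HDD", 1, 45, date(2023, 6, 2)),
    ("Mr. Z", "DDR RAM", 1, 40, date(2023, 6, 20)),
]


def units_sold(records, month=None):
    """Total units per part, optionally restricted to one calendar month."""
    totals = Counter()
    for _customer, part, units, _paid, day in records:
        if month is None or day.month == month:
            totals[part] += units
    return totals


def top_customer_per_part(records):
    """For each part, the customer who paid the most for it in total."""
    spent = defaultdict(Counter)
    for customer, part, _units, paid, _day in records:
        spent[part][customer] += paid
    return {part: buyers.most_common(1)[0][0] for part, buyers in spent.items()}


december = units_sold(sales, month=12)    # what to stock for December
overall = units_sold(sales)               # SATA vs. PATA demand
top_buyers = top_customer_per_part(sales) # most valuable customer per part
```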
And The War Begins…
• So, the conflict between lightning-fast transactional applications (OLTP) and slow, long-range future predictions led to the advent of data warehousing.
Relational Databases
• Highly granular: data is split into many small pieces
• Large queries take longer to process because they must join those small pieces back together
• Very effective for front-end applications that many people access very frequently
• Require relatively modest hardware
Data warehousing
• Processes large amounts of information
• Very few users (basically the owners)
• Mainly for reporting and analysis
• Hardware requirements are huge
The relation between them
• A data warehouse is the simplest form of relational database
• Try only to add and remove data, because changing existing data usually requires processing huge volumes of it
• It is easy to make a mistake with keys even when changing just two records; here you are dealing with millions of records, so think carefully before modifying data
The relation between them
• The one-to-many, many-to-many, and many-to-one relationships and the key constraints of the relational model are still present in data warehousing
The Dimensional Data Model
• So, if a data warehouse needs a data model other than the relational model, what would it be?
• The answer is the dimensional data model
The Dimensional Data Model
• It contains:
1. Facts
2. Dimensions
• The fact table contains transactions, for example the invoices of all customers for the last 5 years.
• The dimension tables describe the facts in the fact table.
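The split between facts and dimensions can be sketched with Python dataclasses. This is an illustration only: the class names `InvoiceFact`, `CustomerDim`, and `PartDim`, the helper `describe`, their fields, and the sample rows are hypothetical. The fact row holds the numeric measures of one transaction plus keys pointing at the dimensions; each dimension row describes who or what the transaction involved.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CustomerDim:          # dimension: describes the customer
    customer_key: int
    name: str
    city: str


@dataclass(frozen=True)
class PartDim:              # dimension: describes the part
    part_key: int
    description: str
    category: str


@dataclass(frozen=True)
class InvoiceFact:          # fact: one transaction
    customer_key: int       # foreign key into CustomerDim
    part_key: int           # foreign key into PartDim
    invoice_date: date
    quantity: int           # numeric measure
    amount: int             # numeric measure


def describe(fact: InvoiceFact, customers: dict, parts: dict) -> str:
    """Resolve a fact's keys through the dimension tables into readable text."""
    customer = customers[fact.customer_key]
    part = parts[fact.part_key]
    return (f"{customer.name} ({customer.city}) bought {fact.quantity} x "
            f"{part.description} for {fact.amount} on {fact.invoice_date}")


customers = {1: CustomerDim(1, "Mr. X", "Khulna")}
parts = {10: PartDim(10, "DDR RAM", "Memory")}
fact = InvoiceFact(1, 10, date(2023, 12, 3), 3, 120)
```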
The Dimensional Data Model
[Figure: dimensional data model, with parts labelled "Static Data" and "Dynamic Data"]
The Star Schema
• The most effective approach to modelling data with the dimensional data model is the Star Schema
The Star Schema: Equivalent Diagram
The Star Schema: Properties
• So, a star schema contains a fact table, which keeps growing as time goes by: it is very dynamic and changes all the time
• A star schema also contains dimension tables, which are static and change very little as time goes by
• A star schema keeps queries that join the bulky fact table with the dimension tables simple and fast, because every dimension table is only one join away from the fact table
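A star join can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sale`, `dim_customer`, `dim_part`, `dim_date`) and rows are hypothetical; the point is that every dimension sits exactly one join away from the fact table, so an aggregate over the whole fact table needs only one join per dimension it uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: small and mostly static.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE dim_part     (part_key INTEGER PRIMARY KEY, description TEXT, category TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, month INTEGER, year INTEGER);

-- Fact table: large and growing; one foreign key per dimension plus numeric measures.
CREATE TABLE fact_sale (
    customer_key INTEGER REFERENCES dim_customer,
    part_key     INTEGER REFERENCES dim_part,
    date_key     INTEGER REFERENCES dim_date,
    quantity     INTEGER,
    amount       INTEGER
);

INSERT INTO dim_customer VALUES (1, 'Mr. X', 'Khulna'), (2, 'Mr. Y', 'Dhaka');
INSERT INTO dim_part VALUES (10, 'DDR RAM', 'Memory'), (20, 'SATA HDD', 'Storage');
INSERT INTO dim_date VALUES (20230601, '2023-06-01', 6, 2023), (20231203, '2023-12-03', 12, 2023);
INSERT INTO fact_sale VALUES
    (1, 10, 20231203, 3, 120), (2, 20, 20231203, 2, 100),
    (2, 20, 20230601, 4, 200), (1, 10, 20230601, 1, 40);
""")

# Star join: each dimension is joined directly to the fact table, one level deep.
STAR_QUERY = """
SELECT d.month, p.category, SUM(f.quantity) AS units, SUM(f.amount) AS revenue
FROM fact_sale AS f
JOIN dim_part AS p ON p.part_key = f.part_key
JOIN dim_date AS d ON d.date_key = f.date_key
GROUP BY d.month, p.category
ORDER BY d.month, p.category
"""
rows = conn.execute(STAR_QUERY).fetchall()
```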
The Snowflake Schema
• A normalized star schema
• Only the dimension tables are normalized
• The result is a fact table connected directly to some dimension tables, while other dimension tables are connected only to those dimension tables
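Normalizing a dimension adds a layer. In the hypothetical sketch below (again Python's `sqlite3` module, with made-up tables and rows), the category attribute has been moved out of `dim_part` into its own `dim_category` table, so a query that groups sales by category must pass through two joins: fact to part, then part to category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (snowflaked) dimension: category split out of dim_part.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_part (part_key INTEGER PRIMARY KEY, description TEXT,
                       category_key INTEGER REFERENCES dim_category);
CREATE TABLE fact_sale (part_key INTEGER REFERENCES dim_part, quantity INTEGER, amount INTEGER);

INSERT INTO dim_category VALUES (1, 'Memory'), (2, 'Storage');
INSERT INTO dim_part VALUES (10, 'DDR RAM', 1), (20, 'SATA HDD', 2), (30, 'PATA HDD', 2);
INSERT INTO fact_sale VALUES (10, 3, 120), (20, 2, 100), (30, 1, 45), (20, 4, 200);
""")

# The category is now two hops from the fact table: fact -> part -> category.
SNOWFLAKE_QUERY = """
SELECT c.category, SUM(f.amount) AS revenue
FROM fact_sale AS f
JOIN dim_part     AS p ON p.part_key = f.part_key
JOIN dim_category AS c ON c.category_key = p.category_key
GROUP BY c.category
ORDER BY c.category
"""
rows = conn.execute(SNOWFLAKE_QUERY).fetchall()
```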
The Snowflake Schema
[Figure: snowflake schema diagram, labelled "Fact Table", "Normalized Dimension", and "Dimension"]
The Snowflake Schema: Equivalent View
The Problem
• Not too many tables, but too many layers
• The most frequently used relational algebra operation in a dimensional database is the Join
• The more tables in a join, the more overhead
• There are not many tables here,
• but there are too many layers: joining one table requires joining other related tables as well
• And if one of those tables (the fact table) has trillions of records, the query becomes hopelessly slow!
The Problem
• If the SALE fact table has 1 million records and all dimensions contain 10 records each, a Cartesian product would return $10^{6} \times 10^{9}$ records, where $10^{9}$ corresponds to nine dimension tables of 10 records each. That makes for $10^{15}$ records.
The Solution
• Convert the snowflake schema into a star schema.
The Solution
• A join now occurs between one fact table and six dimension tables. That is a Cartesian product of $10^{6} \times 10^{6}$ (six dimension tables of 10 records each), resulting in $10^{12}$ records returned.
The Difference
• The difference between $10^{12}$ and $10^{15}$ is three orders of magnitude.
• Three orders of magnitude does not mean just three extra zeroes, i.e. a difference of 1,000 records. The difference is actually $10^{15} - 10^{12}$ = 1,000,000,000,000,000 - 1,000,000,000,000 = 999,000,000,000,000 records.
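The arithmetic on the last few slides can be checked with a few lines of Python. `cartesian_rows` is a hypothetical helper that multiplies the fact-table size by the size of every dimension table, i.e. the worst case in which the join degenerates into a full Cartesian product. Nine 10-record dimensions reproduce the snowflake figure and six reproduce the star figure.

```python
import math


def cartesian_rows(fact_rows: int, dimension_sizes: list[int]) -> int:
    """Rows in the full Cartesian product of a fact table and its dimension tables."""
    return fact_rows * math.prod(dimension_sizes)


snowflake = cartesian_rows(10**6, [10] * 9)  # 10^6 x 10^9 = 10^15
star = cartesian_rows(10**6, [10] * 6)       # 10^6 x 10^6 = 10^12
difference = snowflake - star
print(f"{difference:,}")                     # prints 999,000,000,000,000
```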
Types of Dimension Tables
• The dimension tables shown so far are not sufficient
• Typically, there are some conventions for dimension tables.
• For example, dates and locations are two common dimension tables in data warehouses.
• Why? Most businesses share two common concerns: the date of a transaction and the place of shipment/delivery
Types of Dimension Tables: Dates
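A date dimension is usually generated rather than loaded from an operational system. A minimal sketch, assuming one row per calendar day; the function `build_date_dimension`, the column set (`date_key`, `full_date`, `weekday`, `month`, `quarter`, `year`), and the YYYYMMDD integer key are illustrative choices, not the columns shown in the original slides.

```python
from datetime import date, timedelta


def build_date_dimension(start: date, end: date) -> list[dict]:
    """One row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": day.year * 10000 + day.month * 100 + day.day,  # e.g. 20231230
            "full_date": day.isoformat(),
            "weekday": day.isoweekday(),          # Monday = 1 ... Sunday = 7
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows


dim_date = build_date_dimension(date(2023, 12, 30), date(2024, 1, 2))
```

A fact table would then store only `date_key`, and queries can filter or group by month, quarter, or weekday without any date arithmetic.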
Types of Dimension Tables: Locations
• Locations, states, countries, continents, etc.
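A location dimension often stores the whole geographic hierarchy in one denormalized row, so facts can be rolled up to any level with a single lookup. A minimal sketch; `LocationDim`, `roll_up`, and all sample rows are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationDim:
    location_key: int
    city: str
    state: str
    country: str
    continent: str


locations = {
    1: LocationDim(1, "Toronto", "Ontario", "Canada", "North America"),
    2: LocationDim(2, "Halifax", "Nova Scotia", "Canada", "North America"),
    3: LocationDim(3, "Khulna", "Khulna Division", "Bangladesh", "Asia"),
}
# Hypothetical shipment facts: (location_key, units shipped).
shipments = [(1, 5), (2, 3), (3, 7), (1, 2)]


def roll_up(facts, dimension, level: str) -> Counter:
    """Sum units at one hierarchy level: 'city', 'state', 'country' or 'continent'."""
    totals = Counter()
    for location_key, units in facts:
        totals[getattr(dimension[location_key], level)] += units
    return totals


by_country = roll_up(shipments, locations, "country")
```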
Let’s Create a Data Warehouse Model
The Relational Model
Step 1
• Identify the fact table
• The fact table (mostly) contains transactions that occur on a day-to-day basis, that involve money, or that otherwise represent the main purpose of the business
Step 1: Finding the Fact Table
Step 1
• So, in this case, our fact table would be Royalty
Step 2: Find Dimension Tables
• Find the tables that are static rather than dynamic; the dynamic one is the fact table.
• We will look at both the static (dimension) tables and the dynamic (fact) table once we finish Step 3
Step 3
• Develop a snowflake schema with the fact and dimension tables
Step 3: Snowflake Schema
Step 4
• Develop a star schema by denormalizing the snowflake schema
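Step 4 can be sketched in SQL run through Python's `sqlite3` module. The original slides apply it to the Royalty model; the tables here (`dim_part`, `dim_category`, `dim_part_star`) and their rows are hypothetical stand-ins. `CREATE TABLE ... AS SELECT` joins the normalized layers once, at load time, so later queries read a single flat dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked part dimension: category lives in its own table.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_part (part_key INTEGER PRIMARY KEY, description TEXT,
                       category_key INTEGER REFERENCES dim_category);
INSERT INTO dim_category VALUES (1, 'Memory'), (2, 'Storage');
INSERT INTO dim_part VALUES (10, 'DDR RAM', 1), (20, 'SATA HDD', 2), (30, 'PATA HDD', 2);

-- Denormalize: fold the category layer into one flat part dimension.
CREATE TABLE dim_part_star AS
SELECT p.part_key AS part_key, p.description AS description, c.category AS category
FROM dim_part AS p
JOIN dim_category AS c ON c.category_key = p.category_key;
""")

rows = conn.execute(
    "SELECT part_key, description, category FROM dim_part_star ORDER BY part_key"
).fetchall()
```

In SQLite, `CREATE TABLE ... AS SELECT` does not copy primary-key or other constraints, so a real load would more likely declare the star dimension explicitly and fill it with `INSERT ... SELECT`.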
Step 4: Star Schema
Surrogate Key: An Important Key in a Data Warehouse
• A customer is identified in table 1 by customer name
• The same person is identified in table 2 by telephone number
• The same person is identified in table 3 by SSN
• If tables 1, 2, and 3 all become dimension tables, the fact table will not be able to recognize that the three foreign keys coming from those tables refer to the same person
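A surrogate key addresses this by giving the person one warehouse-generated key that every source identifier maps to. A minimal sketch; `SurrogateKeyRegistry` and the source labels are hypothetical, and it assumes the matching itself (deciding that a name, a phone number, and an SSN belong to the same person) has already been done.

```python
from itertools import count


class SurrogateKeyRegistry:
    """Maps (source, natural identifier) pairs to warehouse-generated surrogate keys."""

    def __init__(self) -> None:
        self._next_key = count(1)               # 1, 2, 3, ... never reused
        self._keys: dict[tuple[str, str], int] = {}

    def link(self, *natural_ids: tuple[str, str]) -> int:
        """Record that all natural_ids identify one person; return that person's key."""
        known = {self._keys[nid] for nid in natural_ids if nid in self._keys}
        if len(known) > 1:
            raise ValueError("identifiers already map to different surrogate keys")
        key = known.pop() if known else next(self._next_key)
        for nid in natural_ids:
            self._keys[nid] = key
        return key

    def lookup(self, source: str, value: str) -> int:
        return self._keys[(source, value)]


registry = SurrogateKeyRegistry()
key_x = registry.link(("table1_name", "Mr. X"),
                      ("table2_phone", "555-0100"),
                      ("table3_ssn", "000-00-0001"))
```

A fact row derived from any of the three source tables can then store `key_x` as its single customer foreign key.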
Understanding the Fact Table
• Facts are numeric values
• Facts are not the foreign key fields
• The foreign keys provide more detail about a fact, and the facts are the main focus of the business
Reference
• Beginning Database Design by Gavin Powell, Wrox Publications, 2005
47

Mais conteúdo relacionado

Semelhante a L16 l17 Data Warehousing

OLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseOLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseAtScale
 
Sql on hadoop the secret presentation.3pptx
Sql on hadoop  the secret presentation.3pptxSql on hadoop  the secret presentation.3pptx
Sql on hadoop the secret presentation.3pptxPaulo Alonso
 
Hadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopHadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopAdam Muise
 
Next Generation Hadoop Introduction
Next Generation Hadoop IntroductionNext Generation Hadoop Introduction
Next Generation Hadoop IntroductionAdam Muise
 
Thu-310pm-Impetus-SachinAndAjay
Thu-310pm-Impetus-SachinAndAjayThu-310pm-Impetus-SachinAndAjay
Thu-310pm-Impetus-SachinAndAjayAjay Shriwastava
 
What Your Database Query is Really Doing
What Your Database Query is Really DoingWhat Your Database Query is Really Doing
What Your Database Query is Really DoingDave Stokes
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerDataWorks Summit
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report Tom Donoghue
 
Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big DataJean-Marc Desvaux
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Denodo
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesAshraf Uddin
 
Data ware house design
Data ware house designData ware house design
Data ware house designSayed Ahmed
 

Semelhante a L16 l17 Data Warehousing (20)

Big data analysis concepts and references
Big data analysis concepts and referencesBig data analysis concepts and references
Big data analysis concepts and references
 
OLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure SynapseOLAP on the Cloud with Azure Databricks and Azure Synapse
OLAP on the Cloud with Azure Databricks and Azure Synapse
 
ITReady DW Day2
ITReady DW Day2ITReady DW Day2
ITReady DW Day2
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Sql on hadoop the secret presentation.3pptx
Sql on hadoop  the secret presentation.3pptxSql on hadoop  the secret presentation.3pptx
Sql on hadoop the secret presentation.3pptx
 
Hadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of HadoopHadoop at the Center: The Next Generation of Hadoop
Hadoop at the Center: The Next Generation of Hadoop
 
Next Generation Hadoop Introduction
Next Generation Hadoop IntroductionNext Generation Hadoop Introduction
Next Generation Hadoop Introduction
 
Big Data: hype or necessity?
Big Data: hype or necessity?Big Data: hype or necessity?
Big Data: hype or necessity?
 
Thu-310pm-Impetus-SachinAndAjay
Thu-310pm-Impetus-SachinAndAjayThu-310pm-Impetus-SachinAndAjay
Thu-310pm-Impetus-SachinAndAjay
 
What Your Database Query is Really Doing
What Your Database Query is Really DoingWhat Your Database Query is Really Doing
What Your Database Query is Really Doing
 
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services LayerLogical Data Warehouse: How to Build a Virtualized Data Services Layer
Logical Data Warehouse: How to Build a Virtualized Data Services Layer
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report
 
Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big Data
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Data Warehouse-Final
Data Warehouse-FinalData Warehouse-Final
Data Warehouse-Final
 
Chapter29.ppt
Chapter29.pptChapter29.ppt
Chapter29.ppt
 
Data Mining
Data MiningData Mining
Data Mining
 
Data ware house design
Data ware house designData ware house design
Data ware house design
 

Mais de Rushdi Shams

Research Methodology and Tips on Better Research
Research Methodology and Tips on Better ResearchResearch Methodology and Tips on Better Research
Research Methodology and Tips on Better ResearchRushdi Shams
 
Common evaluation measures in NLP and IR
Common evaluation measures in NLP and IRCommon evaluation measures in NLP and IR
Common evaluation measures in NLP and IRRushdi Shams
 
Machine learning with nlp 101
Machine learning with nlp 101Machine learning with nlp 101
Machine learning with nlp 101Rushdi Shams
 
Semi-supervised classification for natural language processing
Semi-supervised classification for natural language processingSemi-supervised classification for natural language processing
Semi-supervised classification for natural language processingRushdi Shams
 
Natural Language Processing: Parsing
Natural Language Processing: ParsingNatural Language Processing: Parsing
Natural Language Processing: ParsingRushdi Shams
 
Types of machine translation
Types of machine translationTypes of machine translation
Types of machine translationRushdi Shams
 
L1 l2 l3 introduction to machine translation
L1 l2 l3  introduction to machine translationL1 l2 l3  introduction to machine translation
L1 l2 l3 introduction to machine translationRushdi Shams
 
Syntax and semantics
Syntax and semanticsSyntax and semantics
Syntax and semanticsRushdi Shams
 
Propositional logic
Propositional logicPropositional logic
Propositional logicRushdi Shams
 
Probabilistic logic
Probabilistic logicProbabilistic logic
Probabilistic logicRushdi Shams
 
Knowledge structure
Knowledge structureKnowledge structure
Knowledge structureRushdi Shams
 
Knowledge representation
Knowledge representationKnowledge representation
Knowledge representationRushdi Shams
 
L5 understanding hacking
L5  understanding hackingL5  understanding hacking
L5 understanding hackingRushdi Shams
 
L2 Intrusion Detection System (IDS)
L2  Intrusion Detection System (IDS)L2  Intrusion Detection System (IDS)
L2 Intrusion Detection System (IDS)Rushdi Shams
 

Mais de Rushdi Shams (20)

Research Methodology and Tips on Better Research
Research Methodology and Tips on Better ResearchResearch Methodology and Tips on Better Research
Research Methodology and Tips on Better Research
 
Common evaluation measures in NLP and IR
Common evaluation measures in NLP and IRCommon evaluation measures in NLP and IR
Common evaluation measures in NLP and IR
 
Machine learning with nlp 101
Machine learning with nlp 101Machine learning with nlp 101
Machine learning with nlp 101
 
Semi-supervised classification for natural language processing
Semi-supervised classification for natural language processingSemi-supervised classification for natural language processing
Semi-supervised classification for natural language processing
 
Natural Language Processing: Parsing
Natural Language Processing: ParsingNatural Language Processing: Parsing
Natural Language Processing: Parsing
 
Types of machine translation
Types of machine translationTypes of machine translation
Types of machine translation
 
L1 l2 l3 introduction to machine translation
L1 l2 l3  introduction to machine translationL1 l2 l3  introduction to machine translation
L1 l2 l3 introduction to machine translation
 
Syntax and semantics
Syntax and semanticsSyntax and semantics
Syntax and semantics
 
Propositional logic
Propositional logicPropositional logic
Propositional logic
 
Probabilistic logic
Probabilistic logicProbabilistic logic
Probabilistic logic
 
L15 fuzzy logic
L15  fuzzy logicL15  fuzzy logic
L15 fuzzy logic
 
Knowledge structure
Knowledge structureKnowledge structure
Knowledge structure
 
Knowledge representation
Knowledge representationKnowledge representation
Knowledge representation
 
First order logic
First order logicFirst order logic
First order logic
 
Belief function
Belief functionBelief function
Belief function
 
L5 understanding hacking
L5  understanding hackingL5  understanding hacking
L5 understanding hacking
 
L4 vpn
L4  vpnL4  vpn
L4 vpn
 
L3 defense
L3  defenseL3  defense
L3 defense
 
L2 Intrusion Detection System (IDS)
L2  Intrusion Detection System (IDS)L2  Intrusion Detection System (IDS)
L2 Intrusion Detection System (IDS)
 
L1 phishing
L1  phishingL1  phishing
L1 phishing
 

Último

Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding Team9 Steps For Building Winning Founding Team
9 Steps For Building Winning Founding TeamAdam Moalla
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 

Último (20)

Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf

L16-L17 Data Warehousing

  • 1. Rushdi Shams, Dept of CSE, KUET. Database Systems: Data Warehousing. Version 1.0
  • 2. The Advent of Data Warehousing. The existing database models were not suitable for meeting the requirements, which fall into two categories: (1) operational use and (2) decision support use.
  • 3. Operational Use. Requires a precise, accurate, and instant picture of the database. Day-to-day business: (1) a customer comes in; (2) the customer orders parts, which means searching for the parts, booking/purchasing them, and recording the dates; (3) the bank transaction for the purchase/booking is processed; (4) an invoice is issued.
  • 4. Operational Use. Direct customer-company interaction. All information is processed instantaneously (or almost instantaneously).
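The operational workflow above (a customer orders a part, the booking and payment are recorded, an invoice is issued) can be sketched as one short OLTP transaction. This is a minimal illustration using Python's built-in `sqlite3` module; the `part`, `orders`, and `invoice` tables and the `place_order` helper are hypothetical and not taken from the slides.

```python
import sqlite3

# Hypothetical operational (OLTP) schema; names are illustrative only.
SCHEMA = """
CREATE TABLE part    (part_id INTEGER PRIMARY KEY, name TEXT, stock INTEGER NOT NULL);
CREATE TABLE orders  (order_id INTEGER PRIMARY KEY, customer TEXT, part_id INTEGER,
                      qty INTEGER, order_date TEXT);
CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, order_id INTEGER, amount REAL);
"""


def place_order(conn, customer, part_id, qty, unit_price, order_date):
    """Book parts, record the order, and issue an invoice as one atomic unit."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        (stock,) = conn.execute(
            "SELECT stock FROM part WHERE part_id = ?", (part_id,)).fetchone()
        if stock < qty:
            raise ValueError("not enough stock")
        conn.execute("UPDATE part SET stock = stock - ? WHERE part_id = ?", (qty, part_id))
        cur = conn.execute(
            "INSERT INTO orders (customer, part_id, qty, order_date) VALUES (?, ?, ?, ?)",
            (customer, part_id, qty, order_date))
        conn.execute("INSERT INTO invoice (order_id, amount) VALUES (?, ?)",
                     (cur.lastrowid, qty * unit_price))
        return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO part VALUES (1, 'DDR RAM 4GB', 10)")

order_id = place_order(conn, "Mr. X", 1, 2, 40.0, "2023-12-03")
stock_left = conn.execute("SELECT stock FROM part WHERE part_id = 1").fetchone()[0]
print(order_id, stock_left)  # 1 8
```

Wrapping the statements in `with conn:` makes the stock update, order, and invoice succeed or fail together, which is one way to meet the "precise, accurate, and instant" requirement of operational use.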
  • 5. Decision Support Use. Operational use magnifies the scope: which customer, where he lives, what his phone number is, which part he bought, how much he paid, what the date was, and so on. Decision support use narrows the scope to the business-related issues only: which customer, which part he bought, how much he paid, and what the date was.
  • 6. Decision Support Use. The benefits are, for example: in December, the company may need to stock more DDR RAM than HDDs; SATA HDDs sell more than PATA HDDs; Mr. X is a valued customer who bought most of the RAM, and Mr. Y is a valued customer who bought most of the SATA HDDs.
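A sketch of the narrower, decision-support view: keep only the business columns (customer, part, amount, date) and aggregate them to answer questions like the ones above. The `sale` table and its rows are made-up sample data, again using Python's `sqlite3`.

```python
import sqlite3

# Made-up sales history holding only the business-relevant columns;
# address and phone number are deliberately left out.
rows = [
    ("Mr. X", "DDR RAM", 40.0, "2023-12-02"),
    ("Mr. X", "DDR RAM", 40.0, "2023-12-15"),
    ("Mr. Y", "SATA HDD", 60.0, "2023-12-20"),
    ("Mr. Y", "SATA HDD", 60.0, "2023-11-05"),
    ("Mr. Z", "PATA HDD", 35.0, "2023-11-09"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (customer TEXT, part TEXT, amount REAL, sale_date TEXT)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", rows)

# Which parts sell most in December?
december = conn.execute("""
    SELECT part, COUNT(*) AS units
    FROM sale
    WHERE strftime('%m', sale_date) = '12'
    GROUP BY part
    ORDER BY units DESC, part
""").fetchall()

# How many units of each part did each customer buy?
top_buyers = conn.execute("""
    SELECT part, customer, COUNT(*) AS units
    FROM sale
    GROUP BY part, customer
    ORDER BY part, units DESC
""").fetchall()

print(december)    # [('DDR RAM', 2), ('SATA HDD', 1)]
print(top_buyers)  # [('DDR RAM', 'Mr. X', 2), ('PATA HDD', 'Mr. Z', 1), ('SATA HDD', 'Mr. Y', 2)]
```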
  • 7. And The War Begins… The conflict between lightspeed applications (OLTP) and slow, long-running predictions about the future led to the advent of data warehousing.
  • 8. Relational Databases. Too granular: the data is split into many small pieces. Processing takes longer for larger transactions because those small pieces must be joined. Very effective for front-end applications that are accessed by many people very frequently. Requires less powerful hardware.
  • 9. Data Warehousing. Processes large amounts of information. Very few users (basically the owners). Mainly for reporting and analysis. Hardware requirements are huge.
  • 10. The Relation Between Them. Data warehousing is the simplest form of relational database. Try only to add and remove data, because changing data most often requires huge amounts of processing. It is easy to make a key mistake even when editing just two records; here you are dealing with millions of records, so think carefully before modifying data.
  • 11. The Relation Between Them. The one-to-many, many-to-many, and many-to-one relationships and the key constraints of the relational model are still present in data warehousing.
  • 12. The Dimensional Data Model. If a data warehouse needs a data model other than the relational model, what would it be? The answer is the dimensional data model.
  • 13. The Dimensional Data Model. It contains (1) facts and (2) dimensions. The fact table contains transactions, for example the invoices of all customers for the last 5 years. The dimension tables describe the fact table.
  • 14. The Dimensional Data Model (figure: static data vs. dynamic data)
  • 15. The Star Schema. The most effective approach to modeling data with the dimensional data model is the star schema.
  • 16. The Star Schema
  • 17. The Star Schema: Equivalent Diagram
  • 18. The Star Schema: Properties. A star schema contains a fact table, which keeps growing as time goes by, is very dynamic, and changes all the time. It also contains dimension tables, which are static and change very little over time. The star schema keeps queries that join a bulky fact table with the dimension tables simple and not computationally expensive.
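A minimal star schema, assuming a hypothetical computer-parts retailer: one fact table whose foreign keys each point directly to a single dimension table, queried with a star join. The table names, columns, and rows are illustrative assumptions, not the schema shown in the slides' figures.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, month INTEGER, year INTEGER);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE fact_sale (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    quantity     INTEGER,  -- numeric measure
    amount       REAL      -- numeric measure
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)",
                 [(20231201, "2023-12-01", 12, 2023), (20240115, "2024-01-15", 1, 2024)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "DDR RAM 4GB", "RAM"), (2, "SATA HDD 1TB", "HDD")])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                 [(1, "Mr. X", "Khulna"), (2, "Mr. Y", "Dhaka")])
conn.executemany("INSERT INTO fact_sale VALUES (?, ?, ?, ?, ?)",
                 [(20231201, 1, 1, 3, 120.0),
                  (20231201, 2, 2, 1, 60.0),
                  (20240115, 1, 2, 2, 80.0)])

# Star join: every dimension is exactly one join away from the fact table.
report = conn.execute("""
    SELECT d.year, p.category, SUM(f.quantity) AS units, SUM(f.amount) AS revenue
    FROM fact_sale f
    JOIN dim_date d     ON d.date_key = f.date_key
    JOIN dim_product p  ON p.product_key = f.product_key
    JOIN dim_customer c ON c.customer_key = f.customer_key
    GROUP BY d.year, p.category
    ORDER BY d.year, p.category
""").fetchall()
print(report)  # [(2023, 'HDD', 1, 60.0), (2023, 'RAM', 3, 120.0), (2024, 'RAM', 2, 80.0)]
```

The fact table holds only keys and numeric measures; all descriptive text lives in the small dimension tables, so the join stays a single layer however large `fact_sale` grows.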
  • 19. The Snowflake Schema. A normalized star schema: only the dimensions are normalized. The result is a fact table connected directly to some dimension tables, with other dimension tables connected to those dimension tables.
  • 20. The Snowflake Schema (figure: fact table, normalized dimension, dimension)
  • 21. The Snowflake Schema: Equivalent View
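For contrast, a sketch of the snowflake variant under the same hypothetical retailer assumption: the product dimension is normalized so that the category lives in its own table, and reaching it from the fact table takes an extra join through `dim_product`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT,
                           category_key INTEGER REFERENCES dim_category(category_key));
CREATE TABLE fact_sale    (product_key INTEGER REFERENCES dim_product(product_key),
                           quantity INTEGER, amount REAL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO dim_category VALUES (?, ?)", [(1, "RAM"), (2, "HDD")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "DDR RAM 4GB", 1), (2, "SATA HDD 1TB", 2), (3, "PATA HDD 80GB", 2)])
conn.executemany("INSERT INTO fact_sale VALUES (?, ?, ?)",
                 [(1, 3, 120.0), (2, 1, 60.0), (3, 1, 35.0)])

# Revenue per category needs two layers of joins: fact -> product -> category.
by_category = conn.execute("""
    SELECT c.category, SUM(f.amount)
    FROM fact_sale f
    JOIN dim_product  p ON p.product_key  = f.product_key
    JOIN dim_category c ON c.category_key = p.category_key
    GROUP BY c.category
    ORDER BY c.category
""").fetchall()
print(by_category)  # [('HDD', 95.0), ('RAM', 120.0)]
```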
  • 22. The Problem. Not too many tables, but too many layers. The most frequently used relational algebra operation in a dimensional database is the join, and the more tables in a join, the more overhead. There are not many tables here, but there are too many layers: joining one table requires joining other related tables as well. And if one of those tables (the fact table) has trillions of rows, the join becomes prohibitively expensive.
  • 23. The Problem. If the SALE fact table has 1 million ($10^{6}$) records and every dimension table contains 10 records (nine dimension tables here, hence $10^{9}$), a Cartesian product would return $10^{6} \times 10^{9} = 10^{15}$ records.
  • 24. The Solution. Convert the snowflake schema into a star schema.
  • 25. The Solution
  • 26. The Solution. Now a join occurs between one fact table and six dimension tables. That is a Cartesian product of $10^{6} \times 10^{6}$, resulting in $10^{12}$ records returned.
  • 27. The Difference. The difference between $10^{12}$ and $10^{15}$ is three orders of magnitude. That is not just three more zeroes, i.e. 1,000 more records: the difference is actually $10^{15} - 10^{12}$ = 1,000,000,000,000,000 - 1,000,000,000,000 = 999,000,000,000,000 records.
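The arithmetic on the last few slides can be checked in a few lines. This sketch treats the join as a worst-case Cartesian product, as the slides do; the count of nine snowflake dimension tables is inferred from the $10^{9}$ figure at 10 rows per table, and `cartesian_rows` is a hypothetical helper. Real joins on matching keys usually return far fewer rows, so these numbers are an upper bound.

```python
from math import prod


def cartesian_rows(fact_rows, dimension_sizes):
    """Worst-case row count if the fact table were cross-joined with every dimension."""
    return fact_rows * prod(dimension_sizes)


snowflake = cartesian_rows(10**6, [10] * 9)  # nine dimension tables of 10 rows each
star = cartesian_rows(10**6, [10] * 6)       # six dimension tables of 10 rows each

print(f"{snowflake:,}")         # 1,000,000,000,000,000
print(f"{star:,}")              # 1,000,000,000,000
print(f"{snowflake - star:,}")  # 999,000,000,000,000
```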
  • 28. Types of Dimension Tables. The dimension tables shown so far are inadequate. Typically there are some conventions for dimension tables; for example, dates and locations are two common dimension tables in data warehouses. Why? Most businesses share two common concerns: the date of a transaction and the place of shipment/delivery.
  • 29. Types of Dimension Tables: Dates
  • 30. Types of Dimension Tables: Dates
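A sketch of how a date dimension is commonly generated: one row per calendar day, with the attributes that reports group by. The column set and the `YYYYMMDD` integer key are common conventions assumed here for illustration, and `build_date_dimension` is a hypothetical helper.

```python
from datetime import date, timedelta

# Fixed English names so the output does not depend on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def build_date_dimension(start, end):
    """Return one row (a dict) per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # e.g. 20231230
            "full_date": day.isoformat(),
            "day_name": DAY_NAMES[day.weekday()],     # weekday(): Monday == 0
            "day_of_month": day.day,
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
        })
        day += timedelta(days=1)
    return rows


dim_date = build_date_dimension(date(2023, 12, 30), date(2024, 1, 2))
for row in dim_date:
    print(row["date_key"], row["day_name"], "Q%d" % row["quarter"], row["year"])
# 20231230 Saturday Q4 2023
# 20231231 Sunday Q4 2023
# 20240101 Monday Q1 2024
# 20240102 Tuesday Q1 2024
```

Because the date dimension is generated once and rarely changes, it fits the "static" role the slides assign to dimension tables.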
  • 31. Types of Dimension Tables: Locations. Locations, states, countries, continents, etc.
  • 32. Let's Create a Data Warehouse Model
  • 33. The Relational Model
  • 34. Step 1. Identify the fact table. The fact table contains (mostly) transactions that occur on a day-to-day basis, that involve money, or that otherwise represent the main purpose of the business.
  • 35. Step 1: Finding the Fact Table
  • 36. Step 1. So, in this case, our fact table would be Royalty.
  • 37. Step 2: Find Dimension Tables. Find the tables that are static, not dynamic; the dynamic one is the fact table. We will look at both the static (dimension) tables and the dynamic (fact) table once we finish step 3.
  • 38. Step 3. Develop a snowflake schema with the fact and dimension tables.
  • 39. Step 3: Snowflake Schema
  • 40. Step 3: Snowflake Schema
  • 41. Step 4. Develop a star schema by denormalizing the snowflake schema.
  • 42. Step 4: Star Schema
  • 43. Step 4: Star Schema
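Step 4's denormalization can be sketched on a tiny hypothetical snowflaked product dimension: the category attribute is copied into a flat product dimension with `CREATE TABLE ... AS SELECT`. Table and column names are illustrative assumptions, not the Royalty model from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical snowflaked dimension: product -> category.
CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, name TEXT, category_key INTEGER);
INSERT INTO dim_category VALUES (1, 'RAM'), (2, 'HDD');
INSERT INTO dim_product  VALUES (1, 'DDR RAM 4GB', 1), (2, 'SATA HDD 1TB', 2);

-- Denormalize: copy the category attribute into a flat product dimension,
-- so the fact table reaches it with one join (star) instead of two (snowflake).
CREATE TABLE dim_product_star AS
SELECT p.product_key, p.name, c.category
FROM dim_product p
JOIN dim_category c ON c.category_key = p.category_key;
""")

star_rows = conn.execute(
    "SELECT product_key, name, category FROM dim_product_star ORDER BY product_key").fetchall()
print(star_rows)  # [(1, 'DDR RAM 4GB', 'RAM'), (2, 'SATA HDD 1TB', 'HDD')]
```

The trade-off is redundancy: the category text is repeated for every product, which is acceptable for small, rarely changing dimension tables. Note that SQLite's `CREATE TABLE ... AS SELECT` does not copy the `PRIMARY KEY` constraint, so a real load would declare the target table explicitly.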
  • 44. Surrogate Key: Important Key in Data Warehouse. A customer is identified in table 1 by customer name. The same person is identified in table 2 by telephone number and in table 3 by SSN. If tables 1, 2, and 3 become dimension tables, the fact table will not be able to recognize these three foreign keys as the same person; a single surrogate key assigned by the warehouse avoids this.
  • 45. Surrogate Key: Important Key in Data Warehouse
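A minimal sketch of the surrogate-key idea: each real-world customer gets one warehouse-generated integer key, and every natural key (name, phone, SSN) maps to it. The `SurrogateKeyRegistry` class and all identifiers are hypothetical; it assumes we already know which natural keys belong to the same person, a matching step that in practice happens while cleansing data during loading.

```python
class SurrogateKeyRegistry:
    """Hands out one warehouse-generated (surrogate) key per real-world customer."""

    def __init__(self):
        self._next_key = 1
        self._by_natural_key = {}  # (key_type, value) -> surrogate key

    def register(self, *natural_keys):
        # Reuse the surrogate key if any of these natural keys is already known.
        key = next((self._by_natural_key[nk] for nk in natural_keys
                    if nk in self._by_natural_key), None)
        if key is None:
            key = self._next_key
            self._next_key += 1
        for nk in natural_keys:
            self._by_natural_key[nk] = key
        return key

    def lookup(self, natural_key):
        return self._by_natural_key[natural_key]


registry = SurrogateKeyRegistry()
# Hypothetical identifiers for one customer, as seen by three source tables.
mr_x = registry.register(("name", "Mr. X"), ("phone", "555-0101"), ("ssn", "000-00-0001"))
mr_y = registry.register(("name", "Mr. Y"), ("phone", "555-0102"))

# Rows from table 1 (by name), table 2 (by phone) and table 3 (by SSN)
# all resolve to the same surrogate key, which is what the fact table stores.
keys_for_mr_x = {registry.lookup(("name", "Mr. X")),
                 registry.lookup(("phone", "555-0101")),
                 registry.lookup(("ssn", "000-00-0001"))}
print(mr_x, mr_y, keys_for_mr_x)  # 1 2 {1}
```

This sketch does not handle the case where the given natural keys already point to two different surrogate keys (two records later found to be the same person); a real pipeline would need a merge rule for that.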
  • 46. Understanding the Fact Table. Facts are numeric values. Facts are not the foreign key fields. The foreign keys are used to provide more detail about a fact; the facts are the main focus of the business.
  • 47. Reference. Gavin Powell, Beginning Database Design, Wrox Publications, 2005.