SlideShare uma empresa Scribd logo
1 de 33
Data Warehousing

1
Overview
•
•
•
•
•
•

What is data warehouse?
Why data warehouse?
Data reconciliation – ETL process
Data warehouse architectures
Star schema – dimensional modeling
Data analysis

2
What is Data Warehouse?
• Defined in many different ways, but not rigorously.
– A decision support database that is maintained separately from
the organization’s operational database
– Support information processing by providing a solid platform of
consolidated, historical data for analysis.

• “A data warehouse is a subject-oriented, integrated, timevariant, and nonvolatile collection of data in support of
management’s decision-making process.”—W. H. Inmon
• Data warehousing:
– The process of constructing and using data warehouses

3
Data Warehouse—SubjectOriented
• Organized around major subjects, such as
customer, product, sales
• Focusing on the modeling and analysis of data
for decision makers, not on daily operations or
transaction processing
• Provide a simple and concise view around
particular subject issues by excluding data that
are not useful in the decision support process
4
Data Warehouse—Integrated
• Constructed by integrating multiple,
heterogeneous data sources
– relational databases, flat files, on-line transaction
records

• Data cleaning and data integration techniques
are applied.
– Ensure consistency in naming conventions,
encoding structures, attribute measures, etc. among
different data sources
• E.g., Hotel price: currency, tax, breakfast covered, etc.

– When data is moved to the warehouse, it is
converted.
5
Data Warehouse—Time Variant
• The time horizon for the data warehouse is
significantly longer than that of operational
systems
– Operational database: current value data
– Data warehouse data: provide information from a
historical perspective (e.g., past 5-10 years)

• Every key structure in the data warehouse
– Contains an element of time, explicitly or implicitly
– But the key of operational data may or may not
6
contain “time element”
Data Warehouse—Nonvolatile
• A physically separate store of data transformed
from the operational environment
• Operational update of data does not occur in the
data warehouse environment
– Does not require transaction processing, recovery,
and concurrency control mechanisms
– Requires only two operations in data accessing:
• initial loading of data and access of data
7
Trends in Organisations that encourage
the need for data warehousing
• No single system of record
• Multiple systems are not synchronized
• Organisations want to analyse the activities in
a balanced way
• Customer relationship management
• Supplier relationship management

8
Need for Data Warehousing
• Integrated, company-wide view of high-quality
information (from different databases)
• Separation of operational and informational systems
and data (for improved performance)

9
Operational & Informational System
The need to separate operational and informational
systems is based on three primary factors:
• A data warehouse centralizes data that are scattered
throughout disparate operational systems and make them
a available for decision support applications
• A properly designed data warehouse adds value to data
by improving their quality
• A separate data warehouse eliminates much of contention
for resources that result when informational application
confounded with operational processing
10
11
Data Reconciliation
• Typical operational data is:
– Transient – not historical
– Not normalised (perhaps due to denormalisation for
performance)
– Restricted in scope – not comprehensive
– Sometimes poor quality – inconsistencies and errors

• After ETL (Extract, Transform, Load), data
should be:
–
–
–
–
–

Detailed – not summarized yet
Historical – periodic
Normalised – 3rd normal form or higher
Comprehensive – enterprise-wide perspective
Timely – data should be current enough to assist decisionmaking
– Quality controlled – accurate with full integrity

12
The ETL Process/ Data
Reconciliation Main Steps
•
•
•
•

Capture/Extract
Scrub or data cleansing
Transform
Load and Index

13
Static extract = capturing a

Incremental extract =

snapshot of the source data at a point
in time

capturing changes that have
occurred since the last static extract
14
Fixing errors: misspellings,

Also: decoding, reformatting, time

erroneous dates, incorrect field usage,
mismatched addresses, missing data,
duplicate data, inconsistencies

stamping, conversion, key generation,
merging, error detection/logging, locating
15
missing data
Record-level:

Field-level:

Selection – data partitioning
Joining – data combining
Aggregation – data summarization

single-field – from one field to one field
multi-field – from many fields to one, or
16
one field to many
Refresh mode: bulk rewriting of
target data at periodic intervals

Update mode: only changes in
source data are written to data
17
warehouse
Data Warehouse Architectures
• Generic Two-Level Architecture
• Independent Data Mart
• Dependent Data Mart and Operational
Data Store
• Logical Data Mart and @ctive
Warehouse

18
Generic two-level architecture

L
One companywide warehouse

T
E
Periodic extraction  data is not completely current in warehouse
19
Independent data mart

Data marts:
Mini-warehouses, limited in scope

L

T
E

Separate ETL for each
independent data mart

Data access complexity
due to multiple data marts
20
Dependent data mart with
operational data store

ODS provides option for
obtaining current data

L

T
E
Single ETL for
enterprise data warehouse (EDW)

Dependent data marts
loaded from21
EDW
ODS and data warehouse
are one and the same

L

T
E
Near real-time ETL for
@active Data Warehouse

Data marts are NOT separate
databases, but logical views of the
data warehouse
22
 Easier to create new data marts
Data Characteristics
Status vs. Event Data
Status

Event – a database action
(create/update/delete) that
results from a transaction

Status
23
Data Characteristics
Transient vs.
Periodic Data

Changes to existing
records are written
over previous
records, thus
destroying the
previous data content
Data are never
physically altered or
deleted once they
have been added to
the store

24
star schema
Fact tables contain
factual or quantitative
data

1:N relationship
between dimension
tables and fact
tables

Dimension tables
are denormalized to
maximize
performance

Dimension tables contain
descriptions about the
subjects of the business

Star Schema: Simple database design in
which dimensional data are separated from
fact data. Excellent for queries, but bad for
25
online transaction processing
Star schema example
Fact table provides statistics for sales broken
down by product, period and store dimensions

26
27
On-Line Analytical Processing (OLAP)
• The use of a set of graphical tools that
provides users with multidimensional views of
their data and allows them to analyze the
data using simple windowing techniques
• Relational OLAP (ROLAP)
– Traditional relational representation

• Multidimensional OLAP (MOLAP)
– Cube structure

• OLAP Operations
– Cube slicing – come up with 2-D view of data
– Drill-down – going from summary to more
detailed views
28
Data Warehouse vs. Operational
DBMS
• OLTP (on-line transaction processing)
– Major task of traditional relational DBMS
– Day-to-day operations: purchasing, inventory, banking,
manufacturing, payroll, registration, accounting, etc.

• OLAP (on-line analytical processing)
– Major task of data warehouse system
– Data analysis and decision making

• Distinct features (OLTP vs. OLAP):
– User and system orientation: customer vs. market
– Data contents: current, detailed vs. historical, consolidated
– Database design: ER + application vs. star + subject
– View: current, local vs. evolutionary, integrated
– Access patterns: update vs. read-only but complex queries
29
OLTP vs. OLAP
OLTP

OLAP

users

clerk, IT professional

knowledge worker

function

day to day operations

decision support

DB design

application-oriented

subject-oriented

data

current, up-to-date
detailed, flat relational
isolated
repetitive

historical,
summarized, multidimensional
integrated, consolidated
ad-hoc
lots of scans

unit of work

read/write
index/hash on prim. key
short, simple transaction

# records accessed

tens

millions

#users

thousands

hundreds

DB size

100MB-GB

100GB-TB

metric

transaction throughput

query throughput, response

usage
access

complex query

30
Slicing a data cube

31
Summary report

Example:
Drill-down
Drill-down with color added

32
Data Warehouse Usage
• Three kinds of data warehouse applications
– Information processing
• supports querying, basic statistical analysis, and reporting
using crosstabs, tables, charts and graphs
– Analytical processing
• multidimensional analysis of data warehouse data
• supports basic OLAP operations, slice-dice, drilling, pivoting
– Data mining
• knowledge discovery from hidden patterns
• supports associations, constructing analytical models,
performing classification and prediction, and presenting the
mining results using visualization tools

33

Mais conteúdo relacionado

Mais procurados

ETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingVibrant Event
 
Data warehousing
Data warehousingData warehousing
Data warehousingVarun Jain
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data martAmit Sarkar
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architectureuncleRhyme
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data WarehousingEyad Manna
 
Data Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyData Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyAnkita Dubey
 
Business intelligence and data warehouses
Business intelligence and data warehousesBusiness intelligence and data warehouses
Business intelligence and data warehousesDhani Ahmad
 
Basics of Microsoft Business Intelligence and Data Integration Techniques
Basics of Microsoft Business Intelligence and Data Integration TechniquesBasics of Microsoft Business Intelligence and Data Integration Techniques
Basics of Microsoft Business Intelligence and Data Integration TechniquesValmik Potbhare
 
data warehouse , data mart, etl
data warehouse , data mart, etldata warehouse , data mart, etl
data warehouse , data mart, etlAashish Rathod
 
Classification of data mart
Classification of data martClassification of data mart
Classification of data martkhush_boo31
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technologyDataminingTools Inc
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehousekiran14360
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALASaikiran Panjala
 

Mais procurados (19)

Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data warehousing ppt
Data warehousing pptData warehousing ppt
Data warehousing ppt
 
ETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testingETL Testing - Introduction to ETL testing
ETL Testing - Introduction to ETL testing
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data mart
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Data Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyData Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubey
 
Datawarehouse org
Datawarehouse orgDatawarehouse org
Datawarehouse org
 
Business intelligence and data warehouses
Business intelligence and data warehousesBusiness intelligence and data warehouses
Business intelligence and data warehouses
 
Basics of Microsoft Business Intelligence and Data Integration Techniques
Basics of Microsoft Business Intelligence and Data Integration TechniquesBasics of Microsoft Business Intelligence and Data Integration Techniques
Basics of Microsoft Business Intelligence and Data Integration Techniques
 
data warehouse , data mart, etl
data warehouse , data mart, etldata warehouse , data mart, etl
data warehouse , data mart, etl
 
Classification of data mart
Classification of data martClassification of data mart
Classification of data mart
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technology
 
Data warehousing unit 1
Data warehousing unit 1Data warehousing unit 1
Data warehousing unit 1
 
introduction to datawarehouse
introduction to datawarehouseintroduction to datawarehouse
introduction to datawarehouse
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 

Semelhante a Ch1 data-warehousing

Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Vibrant Technologies & Computers
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introductionMurli Jha
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxSalehaMariyam
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptxhqlm1
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biA P
 
Various Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.pptVarious Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.pptRafiulHasan19
 
Data Mining Concept & Technique-ch04.ppt
Data Mining Concept & Technique-ch04.pptData Mining Concept & Technique-ch04.ppt
Data Mining Concept & Technique-ch04.pptMutiaSari53
 
SAP HANA Architecture Overview | SAP HANA Tutorial
SAP HANA Architecture Overview | SAP HANA TutorialSAP HANA Architecture Overview | SAP HANA Tutorial
SAP HANA Architecture Overview | SAP HANA TutorialZaranTech LLC
 
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptChapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptSubrata Kumer Paul
 
Master data management and data warehousing
Master data management and data warehousingMaster data management and data warehousing
Master data management and data warehousingZahra Mansoori
 
Introduction to data warehouse dmbi
Introduction to data warehouse dmbiIntroduction to data warehouse dmbi
Introduction to data warehouse dmbiShaishavShah8
 

Semelhante a Ch1 data-warehousing (20)

Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptx
 
Datawarehousing
DatawarehousingDatawarehousing
Datawarehousing
 
Chpt2.ppt
Chpt2.pptChpt2.ppt
Chpt2.ppt
 
dataWarehouse.pptx
dataWarehouse.pptxdataWarehouse.pptx
dataWarehouse.pptx
 
DATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptxDATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptx
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
 
OLAP technology
OLAP technologyOLAP technology
OLAP technology
 
Various Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.pptVarious Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.ppt
 
DW (1).ppt
DW (1).pptDW (1).ppt
DW (1).ppt
 
Data Mining Concept & Technique-ch04.ppt
Data Mining Concept & Technique-ch04.pptData Mining Concept & Technique-ch04.ppt
Data Mining Concept & Technique-ch04.ppt
 
SAP HANA Architecture Overview | SAP HANA Tutorial
SAP HANA Architecture Overview | SAP HANA TutorialSAP HANA Architecture Overview | SAP HANA Tutorial
SAP HANA Architecture Overview | SAP HANA Tutorial
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.pptChapter 4. Data Warehousing and On-Line Analytical Processing.ppt
Chapter 4. Data Warehousing and On-Line Analytical Processing.ppt
 
Master data management and data warehousing
Master data management and data warehousingMaster data management and data warehousing
Master data management and data warehousing
 
Introduction to data warehouse dmbi
Introduction to data warehouse dmbiIntroduction to data warehouse dmbi
Introduction to data warehouse dmbi
 
2. olap warehouse
2. olap warehouse2. olap warehouse
2. olap warehouse
 

Último

ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxruthvilladarez
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxElton John Embodo
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 

Último (20)

YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
TEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docxTEACHER REFLECTION FORM (NEW SET........).docx
TEACHER REFLECTION FORM (NEW SET........).docx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 

Ch1 data-warehousing

  • 2. Overview • • • • • • What is data warehouse? Why data warehouse? Data reconciliation – ETL process Data warehouse architectures Star schema – dimensional modeling Data analysis 2
  • 3. What is Data Warehouse? • Defined in many different ways, but not rigorously. – A decision support database that is maintained separately from the organization’s operational database – Support information processing by providing a solid platform of consolidated, historical data for analysis. • “A data warehouse is a subject-oriented, integrated, timevariant, and nonvolatile collection of data in support of management’s decision-making process.”—W. H. Inmon • Data warehousing: – The process of constructing and using data warehouses 3
  • 4. Data Warehouse—SubjectOriented • Organized around major subjects, such as customer, product, sales • Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing • Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process 4
  • 5. Data Warehouse—Integrated • Constructed by integrating multiple, heterogeneous data sources – relational databases, flat files, on-line transaction records • Data cleaning and data integration techniques are applied. – Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources • E.g., Hotel price: currency, tax, breakfast covered, etc. – When data is moved to the warehouse, it is converted. 5
  • 6. Data Warehouse—Time Variant • The time horizon for the data warehouse is significantly longer than that of operational systems – Operational database: current value data – Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) • Every key structure in the data warehouse – Contains an element of time, explicitly or implicitly – But the key of operational data may or may not 6 contain “time element”
  • 7. Data Warehouse—Nonvolatile • A physically separate store of data transformed from the operational environment • Operational update of data does not occur in the data warehouse environment – Does not require transaction processing, recovery, and concurrency control mechanisms – Requires only two operations in data accessing: • initial loading of data and access of data 7
  • 8. Trends in Organisations that encourage the need for data warehousing • No single system of record • Multiple systems are not synchronized • Organisations want to analyse the activities in a balanced way • Customer relationship management • Supplier relationship management 8
  • 9. Need for Data Warehousing • Integrated, company-wide view of high-quality information (from different databases) • Separation of operational and informational systems and data (for improved performance) 9
  • 10. Operational & Informational System The need to separate operational and informational systems is based on three primary factors: • A data warehouse centralizes data that are scattered throughout disparate operational systems and make them a available for decision support applications • A properly designed data warehouse adds value to data by improving their quality • A separate data warehouse eliminates much of contention for resources that result when informational application confounded with operational processing 10
  • 11. 11
  • 12. Data Reconciliation • Typical operational data is: – Transient – not historical – Not normalised (perhaps due to denormalisation for performance) – Restricted in scope – not comprehensive – Sometimes poor quality – inconsistencies and errors • After ETL (Extract, Transform, Load), data should be: – – – – – Detailed – not summarized yet Historical – periodic Normalised – 3rd normal form or higher Comprehensive – enterprise-wide perspective Timely – data should be current enough to assist decisionmaking – Quality controlled – accurate with full integrity 12
  • 13. The ETL Process/ Data Reconciliation Main Steps • • • • Capture/Extract Scrub or data cleansing Transform Load and Index 13
  • 14. Static extract = capturing a Incremental extract = snapshot of the source data at a point in time capturing changes that have occurred since the last static extract 14
  • 15. Fixing errors: misspellings, Also: decoding, reformatting, time erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies stamping, conversion, key generation, merging, error detection/logging, locating 15 missing data
  • 16. Record-level: Field-level: Selection – data partitioning Joining – data combining Aggregation – data summarization single-field – from one field to one field multi-field – from many fields to one, or 16 one field to many
  • 17. Refresh mode: bulk rewriting of target data at periodic intervals Update mode: only changes in source data are written to data 17 warehouse
  • 18. Data Warehouse Architectures • Generic Two-Level Architecture • Independent Data Mart • Dependent Data Mart and Operational Data Store • Logical Data Mart and @ctive Warehouse 18
  • 19. Generic two-level architecture L One companywide warehouse T E Periodic extraction  data is not completely current in warehouse 19
  • 20. Independent data mart Data marts: Mini-warehouses, limited in scope L T E Separate ETL for each independent data mart Data access complexity due to multiple data marts 20
  • 21. Dependent data mart with operational data store ODS provides option for obtaining current data L T E Single ETL for enterprise data warehouse (EDW) Dependent data marts loaded from21 EDW
  • 22. ODS and data warehouse are one and the same L T E Near real-time ETL for @active Data Warehouse Data marts are NOT separate databases, but logical views of the data warehouse 22  Easier to create new data marts
  • 23. Data Characteristics Status vs. Event Data Status Event – a database action (create/update/delete) that results from a transaction Status 23
  • 24. Data Characteristics Transient vs. Periodic Data Changes to existing records are written over previous records, thus destroying the previous data content Data are never physically altered or deleted once they have been added to the store 24
  • 25. star schema Fact tables contain factual or quantitative data 1:N relationship between dimension tables and fact tables Dimension tables are denormalized to maximize performance Dimension tables contain descriptions about the subjects of the business Star Schema: Simple database design in which dimensional data are separated from fact data. Excellent for queries, but bad for 25 online transaction processing
  • 26. Star schema example Fact table provides statistics for sales broken down by product, period and store dimensions 26
  • 27. 27
  • 28. On-Line Analytical Processing (OLAP) • The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques • Relational OLAP (ROLAP) – Traditional relational representation • Multidimensional OLAP (MOLAP) – Cube structure • OLAP Operations – Cube slicing – come up with 2-D view of data – Drill-down – going from summary to more detailed views 28
  • 29. Data Warehouse vs. Operational DBMS • OLTP (on-line transaction processing) – Major task of traditional relational DBMS – Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc. • OLAP (on-line analytical processing) – Major task of data warehouse system – Data analysis and decision making • Distinct features (OLTP vs. OLAP): – User and system orientation: customer vs. market – Data contents: current, detailed vs. historical, consolidated – Database design: ER + application vs. star + subject – View: current, local vs. evolutionary, integrated – Access patterns: update vs. read-only but complex queries 29
  • 30. OLTP vs. OLAP OLTP OLAP users clerk, IT professional knowledge worker function day to day operations decision support DB design application-oriented subject-oriented data current, up-to-date detailed, flat relational isolated repetitive historical, summarized, multidimensional integrated, consolidated ad-hoc lots of scans unit of work read/write index/hash on prim. key short, simple transaction # records accessed tens millions #users thousands hundreds DB size 100MB-GB 100GB-TB metric transaction throughput query throughput, response usage access complex query 30
  • 31. Slicing a data cube 31
  • 33. Data Warehouse Usage • Three kinds of data warehouse applications – Information processing • supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs – Analytical processing • multidimensional analysis of data warehouse data • supports basic OLAP operations, slice-dice, drilling, pivoting – Data mining • knowledge discovery from hidden patterns • supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools 33