Data Warehousing
presented by Murli
Data Warehousing
•Aims of information technology:
• To help workers in their everyday business activity
and improve their productivity – clerical data
processing tasks
• To help knowledge workers (executives,
managers, analysts) make faster and better decisions
– decision support systems
•Two types of applications:
• Operational applications
• Analytical applications
Data Warehousing (Contd..)
•In most organizations, data about specific parts of the business already exists: lots and lots of data, somewhere, in some form.
•Data is available, but information is not, and certainly not the right information at the right time.
•There is a need to
• bring the information together
• off-load decision support applications from the on-line transaction processing system
Data Warehouse
•“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” --- W. H. Inmon
•Collection of data that is used primarily in organizational
decision making
•A decision support database that is maintained separately
from the organization’s operational database
Data Warehouse - Subject Oriented
•Data gives information about a particular subject (e.g., customer, product, sales) rather than about day-to-day operations.
•Data is organized for modeling and analysis.
•Provides a simple and concise view of particular subject issues by excluding data that is not useful in the decision support process.
Data Warehouse – Integrated
•Constructed by integrating multiple, heterogeneous data sources.
•Data cleaning and data integration techniques are applied.
•When data is moved to the warehouse, it is converted.
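As a minimal sketch of this conversion step, assume two hypothetical sources ("crm" and "billing") that encode the same customer attributes differently: gender as "M"/"F" vs. 1/0, and birth dates in different string formats. Each record is mapped to one warehouse representation before loading. The source names, field names and codes are illustrative assumptions, not part of the original slides.

```python
from datetime import datetime

# Hypothetical source-specific codes for the same attribute.
GENDER_CODES = {
    "crm": {"M": "male", "F": "female"},
    "billing": {1: "male", 0: "female"},
}
# Hypothetical source-specific date formats.
DATE_FORMATS = {"crm": "%d/%m/%Y", "billing": "%Y-%m-%d"}


def to_warehouse(source: str, record: dict) -> dict:
    """Convert one source record into the single warehouse representation."""
    return {
        # Keys become trimmed strings regardless of the source type.
        "customer_id": str(record["id"]).strip(),
        "gender": GENDER_CODES[source][record["gender"]],
        # Every date is stored as an ISO-8601 string, whatever the source format.
        "birth_date": datetime.strptime(record["birth"], DATE_FORMATS[source]).date().isoformat(),
    }


crm_row = {"id": " 42", "gender": "F", "birth": "05/03/1990"}
billing_row = {"id": 42, "gender": 0, "birth": "1990-03-05"}
print(to_warehouse("crm", crm_row) == to_warehouse("billing", billing_row))  # True
```

Once both sources pass through the same mapping, the warehouse sees one customer with one set of values instead of two conflicting encodings.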
Data Warehouse - Time Variant
•Data is stable in a data warehouse.
•It holds historical as well as current data.
•Every key structure in the data warehouse contains an element of time, explicitly or implicitly.
•By contrast, the key of operational data may or may not contain a time element.
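A minimal sketch of the difference, assuming a hypothetical customer-city attribute: the operational store is keyed by customer_id alone and overwrites the old value, while the warehouse keys each row by (customer_id, as_of) so the history survives. The function and variable names are illustrative assumptions.

```python
from datetime import date

# Operational store: the key has no time element, so an update overwrites history.
operational = {}
# Warehouse store: time is part of the key, so each change is kept as a new row.
warehouse = {}


def record_address(customer_id: str, city: str, as_of: date) -> None:
    operational[customer_id] = city
    warehouse[(customer_id, as_of)] = city


def city_as_of(customer_id: str, when: date):
    """Return the warehouse value that was valid on the given date, or None."""
    dates = [d for (cid, d) in warehouse if cid == customer_id and d <= when]
    return warehouse[(customer_id, max(dates))] if dates else None


record_address("c1", "Pune", date(2022, 1, 1))
record_address("c1", "Delhi", date(2023, 6, 1))

print(operational["c1"])                      # Delhi (old value lost)
print(city_as_of("c1", date(2022, 12, 31)))  # Pune (history kept)
```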
Data Warehouse - Non-Volatile
• A physically separate store of data transformed from the operational environment.
• No updates or deletes on historical data.
•Operational updates of data do not occur in the data warehouse; new data is only appended.
• Only two operations are needed: initial loading of data and access of data.
Data modifications & schema
design
• A data warehouse is updated on a regular
basis by the ETL process (run nightly or weekly)
using bulk data modification techniques.
• Data warehouses often use denormalized or
partially denormalized schemas (such as a star
schema) to optimize query performance.
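A minimal sketch of such a periodic batch, assuming Python's built-in sqlite3 module stands in for the warehouse DBMS and a hypothetical sales_fact table: each night's extract is appended in one transaction with executemany, and existing rows are never updated or deleted, which matches the non-volatile, load-and-read pattern of the previous slide. Table and column names are assumptions.

```python
import sqlite3


def nightly_load(conn: sqlite3.Connection, batch_date: str, extract: list) -> int:
    """Append one day's extract in a single transaction; return the number of rows loaded."""
    rows = [(batch_date, r["product"], r["amount"]) for r in extract]
    with conn:  # commits on success, rolls back the whole batch on error
        conn.executemany(
            "INSERT INTO sales_fact (load_date, product, amount) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (load_date TEXT, product TEXT, amount REAL)")

nightly_load(conn, "2024-01-01", [{"product": "p1", "amount": 10.0},
                                  {"product": "p2", "amount": 5.5}])
nightly_load(conn, "2024-01-02", [{"product": "p1", "amount": 7.0}])

# Earlier batches are still there: loads only ever append.
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM sales_fact").fetchone())  # (3, 22.5)
```

Loading a whole batch in one transaction means a failed night leaves the warehouse exactly as it was, rather than half-loaded.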
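Why Separate Data Warehouse?
•Decision support needs separate, historical data.
•Complex decision-support queries would degrade the performance of operational systems.
•Missing data: operational databases typically do not keep the history that decision support needs.
•Data consolidation: decision support needs data aggregated and summarized from heterogeneous sources.
•Data quality: different sources use inconsistent representations, codes and formats that must be reconciled.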
Advantages of Data Warehousing
•High query performance
•Queries not visible outside warehouse
•Local processing at sources unaffected
•Can operate when sources unavailable
•Can query data not stored in a DBMS
•Extra information at warehouse
• Modify, summarize (store aggregates)
• Add historical information
Decision Support System
• Information technology to help knowledge workers
(executives, managers, analysts) make faster and
better decisions
• OLAP is an element of decision support systems
• Data mining is a powerful, high-performance data
analysis tool for decision support.
Three-Tier Decision Support Systems
•Warehouse database server
• Almost always a relational DBMS, rarely flat files
•OLAP servers
• Relational OLAP (ROLAP): extended relational DBMS that
maps operations on multidimensional data to standard
relational operators
• Multidimensional OLAP (MOLAP): special-purpose server
that directly implements multidimensional data and operations
•Clients
• Query and reporting tools
• Analysis tools
• Data mining tools
The Complete Decision Support System
[Figure: Information sources (operational DBs, semistructured sources) feed the data warehouse server (Tier 1) and its data marts through extract, transform, load, refresh, etc. The warehouse serves the OLAP servers (Tier 2, e.g., MOLAP, ROLAP), which serve the clients (Tier 3): OLAP, query/reporting and data mining tools.]
Data Sources
•Data sources are often the operational systems,
providing the lowest level of data.
•Data sources are designed for operational use, not for
decision support, and the data reflect this fact.
•Multiple data sources are often from different systems,
run on a wide range of hardware and much of the
software is built in-house or highly customized.
•Multiple data sources introduce a large number of issues, such as semantic conflicts.
Creating and Maintaining a
Warehouse
•A data warehouse needs several tools that automate or support tasks
such as:
• Data extraction from different external data sources,
operational databases, files of standard applications
• Data cleaning (finding and resolving inconsistency in the
source data)
• inconsistent field lengths, inconsistent descriptions, inconsistent value
assignments, missing entries and violation of integrity constraints.
• optional fields in data entry are significant sources of inconsistent data.
• Integration and transformation of data (between different data
formats, languages, etc.)
• Data loading (loading the data into the data warehouse)
• checking integrity constraints, sorting, summarizing, etc.
• Data replication (replicating source database into the data
warehouse)
• used to incrementally refresh a warehouse when sources change
• Data refreshment
• propagating updates on source data to the data stored in the warehouse
• Periodically or immediately
• Data archiving
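A minimal sketch of the cleaning checks listed above, assuming hypothetical customer records with a fixed maximum name length, a gender code that should be "M" or "F", and a branch_id that must reference a known branch. Each row is normalized and returned with a list of problems, which is one common way to separate loadable rows from rejected ones; the limits and valid codes here are assumptions, and real cleaning rules depend on the sources.

```python
MAX_NAME_LEN = 30              # assumed warehouse field length
VALID_GENDERS = {"M", "F"}     # assumed value assignment
KNOWN_BRANCHES = {"B1", "B2"}  # assumed keys of the referenced dimension


def clean(record: dict):
    """Return (cleaned_record, problems); an empty problems list means the row is loadable."""
    problems = []
    name = (record.get("name") or "").strip()
    if not name:
        problems.append("missing name")  # missing entry
    cleaned = {
        "name": name[:MAX_NAME_LEN],  # inconsistent field lengths: truncate to warehouse width
        # Inconsistent value assignments: 'male', 'm', 'M' all become 'M'.
        "gender": (record.get("gender") or "").strip().upper()[:1],
        "branch_id": record.get("branch_id"),
    }
    if cleaned["gender"] not in VALID_GENDERS:
        problems.append("invalid gender")
    if cleaned["branch_id"] not in KNOWN_BRANCHES:
        problems.append("unknown branch_id (integrity constraint)")
    return cleaned, problems


print(clean({"name": " Asha ", "gender": "female", "branch_id": "B1"}))
print(clean({"name": "", "gender": None, "branch_id": "B9"}))
```

The second record shows how an optional field left empty at data entry (gender) surfaces as an inconsistency during cleaning.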
The Data Warehousing Models
•Enterprise Warehouse
• collects all the information about subjects spanning the entire
organization
•Data Mart
• a subset of corporate-wide data that is of value to a specific
group of users
• its scope is confined to specific, selected groups, such as
marketing data mart
• Independent vs. dependent data marts (a dependent data mart is sourced directly from the warehouse)
•Virtual warehouse
• a set of views over operational databases
• only some summary views are materialized
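A minimal sketch of a virtual warehouse, assuming Python's sqlite3 module and a hypothetical operational orders table: most summaries are plain views computed on demand, and a selected one is materialized. SQLite has no materialized views, so CREATE TABLE ... AS SELECT stands in for one here; the view and table names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical operational table.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)",
                 [("north", 100.0), ("north", 40.0), ("south", 60.0)])

# Virtual summary: a view, recomputed from the operational table on every query.
conn.execute("""CREATE VIEW v_sales_by_region AS
                SELECT region, SUM(amount) AS total FROM orders GROUP BY region""")

# Materialized summary: a stored copy that must be refreshed explicitly.
conn.execute("""CREATE TABLE m_sales_by_region AS
                SELECT region, SUM(amount) AS total FROM orders GROUP BY region""")

# A new operational transaction arrives.
conn.execute("INSERT INTO orders (region, amount) VALUES ('south', 15.0)")

print(conn.execute("SELECT * FROM v_sales_by_region ORDER BY region").fetchall())
# [('north', 140.0), ('south', 75.0)]  (the view is current)
print(conn.execute("SELECT * FROM m_sales_by_region ORDER BY region").fetchall())
# [('north', 140.0), ('south', 60.0)]  (the materialized copy is stale until refreshed)
```

The trade-off is the one the slide hints at: views cost nothing to store but load the operational system at query time, while materialized summaries are fast to read but must be kept in sync.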
Physical Structure of Data
Warehouse
•There are four basic architectures for constructing a
data warehouse:
• Centralized
• Distributed
• Federated
• Tiered
•The data warehouse is distributed for: load
balancing, scalability and higher availability
Physical Structure of Data Warehouse (Contd..)
•The logical data warehouse is only virtual.
•The central data warehouse is physical.
•There are local data marts on different tiers, which store copies or summaries of the previous tier.
Data Processing Models
•There are two basic data processing models:
• OLTP (On-Line Transaction Processing)
• Describes processing at operational sites
• aim is reliable and efficient processing of a large number
of transactions and ensuring data consistency.
• OLAP (On-Line Analytical Processing)
• Describes processing at warehouse
• aim is efficient multidimensional processing of large data
volumes.
OLTP vs. OLAP
Feature | OLTP | OLAP
Users | Clerk, IT professional | Knowledge worker
Function | Day-to-day operations | Decision support
DB design | Application-oriented | Subject-oriented
Data | Current, up-to-date; detailed, flat relational; isolated | Historical, summarized; multidimensional; integrated, consolidated
Usage | Repetitive | Ad hoc
Access | Read/write; index/hash on primary key | Lots of scans
Unit of work | Short, simple transaction | Complex query
Records accessed | Tens | Millions
Number of users | Thousands | Hundreds
DB size | 100 MB to GB | 100 GB to TB
Metric | Transaction throughput | Query throughput, response time
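A minimal sketch of the two units of work in the table, assuming Python's sqlite3 module and a hypothetical account table: the OLTP side is a short transaction that reads and writes one row through its primary key, and the OLAP side is a read-only query that scans every row and aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, branch TEXT, balance REAL)")
conn.executemany("INSERT INTO account (id, branch, balance) VALUES (?, ?, ?)",
                 [(1, "B1", 500.0), (2, "B1", 300.0), (3, "B2", 200.0)])
conn.commit()

# OLTP: short, simple transaction touching one record via its primary key.
with conn:
    conn.execute("UPDATE account SET balance = balance - 50 WHERE id = ?", (1,))

# OLAP: read-only query that scans all records and summarizes them.
report = conn.execute(
    "SELECT branch, COUNT(*), SUM(balance) FROM account GROUP BY branch ORDER BY branch"
).fetchall()
print(report)  # [('B1', 2, 750.0), ('B2', 1, 200.0)]
```

At three rows the difference is invisible, but the access patterns are the ones that separate the two workloads at scale: an index lookup per transaction versus a scan over millions of records.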
OLAP
•Main goal: support ad hoc but complex querying performed by business analysts
•Interactive process of creating, managing, analyzing and reporting on data
•Extends spreadsheet-like analysis to work with huge amounts of data in a data warehouse
•Data exploration and aggregation in various ways
•Typical applications include assessing the effectiveness of a marketing campaign, forecasting product sales and spotting trends
OLAP (Contd..)
•Allows a sophisticated user to analyze data using complex, multidimensional views
•Places key performance indicators (measures) into context (dimensions)
• Measures are pre-aggregated
• Data retrieval is significantly faster
•The cube is made available to business analysts, who can browse the data using a variety of tools for ad hoc, interactive analytical processing
OLAP Server Architectures
•Relational OLAP (ROLAP):
• Use relational or extended-relational DBMS to store and manage
warehouse data and OLAP middleware to support missing pieces
• Include optimization of DBMS backend, implementation of
aggregation navigation logic, and additional tools and services
• Greater scalability
• schema design: Star, Snowflake, Fact Constellation
•Multidimensional OLAP (MOLAP):
• Array based multidimensional storage engine (sparse matrix
techniques)
• Fast indexing to pre-computed summarized data
• Schema design: Cube
•Hybrid OLAP (HOLAP):
• User flexibility: low-level (detailed) data stored relationally, high-level (aggregated) data stored in arrays
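A minimal sketch of the storage difference, assuming a tiny hypothetical cube with two dimensions (product and quarter): ROLAP keeps one relational tuple per non-empty cell and finds a cell by scanning or indexing tuples, while MOLAP stores a dense array whose cell position is computed directly from the dimension values. Real MOLAP engines add sparse-matrix compression, which this sketch omits.

```python
products = ["p1", "p2"]              # hypothetical dimension members
quarters = ["Q1", "Q2", "Q3", "Q4"]

# ROLAP-style storage: (product, quarter, sales) rows, as in a fact table.
rows = [("p1", "Q1", 10), ("p1", "Q3", 5), ("p2", "Q1", 7)]

# MOLAP-style storage: a dense 2-D array indexed by dimension positions.
p_index = {p: i for i, p in enumerate(products)}
q_index = {q: i for i, q in enumerate(quarters)}
cube = [[0] * len(quarters) for _ in products]
for p, q, sales in rows:
    cube[p_index[p]][q_index[q]] += sales


def rolap_cell(p, q):
    """Relational lookup: scan the tuples for a matching row."""
    return sum(s for rp, rq, s in rows if rp == p and rq == q)


def molap_cell(p, q):
    """Array lookup: compute the cell position directly."""
    return cube[p_index[p]][q_index[q]]


print(rolap_cell("p1", "Q1"), molap_cell("p1", "Q1"))  # 10 10
print(sum(cube[p_index["p1"]]))                         # 15 (p1 over all quarters)
# The array holds 8 cells for only 3 non-empty ones: the sparsity MOLAP engines must compress.
```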
ROLAP
•Special schema design: snowflake
•Special indexes: bitmap, multi-table join
•Proven technology (relational models, DBMS)
• Tends to outperform specialized multidimensional databases (MDDBs), especially
on large data sets
•Products
• IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
Measures and Dimensions
•Measures: key performance indicators that you want to
evaluate
• Typically numerical, including volume, sales and cost
• A rule of thumb: if a number makes business sense
when aggregated, then it is a measure
• Examples
• Aggregating daily volume to month, quarter and year makes sense, so volume is a measure
• Aggregating telephone numbers would not make sense, so they are not measures
• Affects what should be stored in the data warehouse
Measures and Dimensions
(Contd..)
•Dimensions: categories of data analysis
• Typical dimensions include product, time,
region
• A rule of thumb: when a report is requested
“by” something, that something is usually a
dimension
• Example
• Sales report: view sales by month, by region
• Two dimensions needed are time and region
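A minimal sketch of the sales report example, assuming hypothetical detail rows: month and region are the dimensions the report is grouped "by", and sales is the measure that is summed. A customer phone number is carried along to show an attribute that is never aggregated.

```python
from collections import defaultdict

# Hypothetical detail rows: (month, region, customer_phone, sales)
facts = [
    ("2024-01", "north", "555-0101", 120.0),
    ("2024-01", "north", "555-0102", 80.0),
    ("2024-01", "south", "555-0103", 50.0),
    ("2024-02", "north", "555-0101", 30.0),
]

# "Sales by month, by region": group on the two dimensions, sum the measure.
report = defaultdict(float)
for month, region, _phone, sales in facts:  # the phone number is not a measure
    report[(month, region)] += sales

for (month, region), total in sorted(report.items()):
    print(month, region, total)
# 2024-01 north 200.0
# 2024-01 south 50.0
# 2024-02 north 30.0
```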
Conceptual Modeling
•Star schema.
•Snowflake schema.
•Fact constellation, or galaxy schema.
Star Schema
•Fact table
•Dimension tables
•Measures
•A single fact table, with one dimension table for each dimension
•Does not capture hierarchies directly
Example - Star Schema
[Figure: Sales fact table with keys time_key, item_key, branch_key, location_key and measures units_sold, dollars_sold, avg_sales, joined directly to four dimension tables:
• time (time_key, day, day_of_the_week, month, quarter, year)
• item (item_key, item_name, brand, type, supplier_type)
• branch (branch_key, branch_name, branch_type)
• location (location_key, street, city, province_or_state, country)]
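A minimal sketch of part of this star schema, assuming Python's sqlite3 module, made-up rows, and only some of the example's columns (time, item and location dimensions; branch is omitted for brevity). The "_dim" table-name suffix is an assumption. A typical star query joins the fact table to each dimension it needs once and groups by dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- Fact table: one foreign key per dimension plus the numeric measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim,
    item_key     INTEGER REFERENCES item_dim,
    location_key INTEGER REFERENCES location_dim,
    units_sold   INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 'Jan', 'Q1', 2024), (2, 'Apr', 'Q2', 2024);
INSERT INTO item_dim VALUES (10, 'TV', 'BrandA'), (11, 'Radio', 'BrandB');
INSERT INTO location_dim VALUES (100, 'Toronto', 'Canada'), (101, 'Chicago', 'USA');
INSERT INTO sales_fact VALUES (1, 10, 100, 2, 800.0), (1, 11, 101, 5, 250.0), (2, 10, 101, 1, 400.0);
""")

# Star join: each dimension table is joined directly to the fact table.
query = """
SELECT t.quarter, l.country, SUM(f.dollars_sold)
FROM sales_fact f
JOIN time_dim t     ON f.time_key = t.time_key
JOIN location_dim l ON f.location_key = l.location_key
GROUP BY t.quarter, l.country
ORDER BY t.quarter, l.country
"""
print(conn.execute(query).fetchall())
# [('Q1', 'Canada', 800.0), ('Q1', 'USA', 250.0), ('Q2', 'USA', 400.0)]
```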
Dimension Hierarchies
[Figure: store rolls up to sType (store type) and to city; city rolls up to region.]
•snowflake schema
•constellations
Example of Snowflake Schema
•Represent dimensional hierarchy directly by normalizing tables.
•Easy to maintain and saves storage
[Figure: Sales fact table with keys time_key, item_key, branch_key, location_key and measures units_sold, dollars_sold, avg_sales. Dimension tables:
• time (time_key, day, day_of_the_week, month, quarter, year)
• item (item_key, item_name, brand, type, supplier_key), normalized into supplier (supplier_key, supplier_type)
• branch (branch_key, branch_name, branch_type)
• location (location_key, street, city_key), normalized into city (city_key, city, province_or_state, country)]
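A minimal sketch of the snowflaked location dimension, assuming Python's sqlite3 module and made-up rows: city attributes move into their own table referenced by city_key, so each city's province and country are stored once, and a roll-up to country needs one extra join compared with the star version. The "_dim" table names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized hierarchy: several street-level locations share one city row.
CREATE TABLE city_dim     (city_key INTEGER PRIMARY KEY, city TEXT, province_or_state TEXT, country TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, street TEXT,
                           city_key INTEGER REFERENCES city_dim);
CREATE TABLE sales_fact   (location_key INTEGER REFERENCES location_dim, dollars_sold REAL);
INSERT INTO city_dim VALUES (1, 'Toronto', 'Ontario', 'Canada'), (2, 'Vancouver', 'British Columbia', 'Canada');
INSERT INTO location_dim VALUES (10, 'King St', 1), (11, 'Queen St', 1), (12, 'Main St', 2);
INSERT INTO sales_fact VALUES (10, 100.0), (11, 50.0), (12, 25.0);
""")

# Rolling up to country walks the hierarchy: fact -> location -> city.
query = """
SELECT c.country, SUM(f.dollars_sold)
FROM sales_fact f
JOIN location_dim l ON f.location_key = l.location_key
JOIN city_dim c     ON l.city_key = c.city_key
GROUP BY c.country
"""
print(conn.execute(query).fetchall())  # [('Canada', 175.0)]
```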
Example of Fact Constellation
•Multiple fact tables that share many dimension tables
[Figure: Two fact tables. Sales fact table: keys time_key, item_key, branch_key, location_key; measures units_sold, dollars_sold, avg_sales. Shipping fact table: keys time_key, item_key, shipper_key, from_location, to_location; measures dollars_cost, units_shipped. Dimension tables:
• time (time_key, day, day_of_the_week, month, quarter, year), shared
• item (item_key, item_name, brand, type, supplier_type), shared
• location (location_key, street, city, province_or_state, country), shared
• branch (branch_key, branch_name, branch_type), used by sales
• shipper (shipper_key, shipper_name, location_key, shipper_type), used by shipping]
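A minimal sketch of a fact constellation, assuming Python's sqlite3 module, made-up rows and only a few of the example's columns: the sales and shipping fact tables share the time dimension, so each fact table is summarized by time_key first and the two summaries are then combined on the shared dimension. Table names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim      (time_key INTEGER PRIMARY KEY, quarter TEXT);
-- Two fact tables that share the time dimension.
CREATE TABLE sales_fact    (time_key INTEGER REFERENCES time_dim, dollars_sold REAL);
CREATE TABLE shipping_fact (time_key INTEGER REFERENCES time_dim, units_shipped INTEGER);
INSERT INTO time_dim VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO sales_fact VALUES (1, 800.0), (1, 250.0), (2, 400.0);
INSERT INTO shipping_fact VALUES (1, 6), (2, 1), (2, 3);
""")

# Aggregate each fact table separately, then join the results on the shared dimension.
query = """
WITH s AS (SELECT time_key, SUM(dollars_sold) AS sold FROM sales_fact GROUP BY time_key),
     h AS (SELECT time_key, SUM(units_shipped) AS shipped FROM shipping_fact GROUP BY time_key)
SELECT t.quarter, s.sold, h.shipped
FROM time_dim t
JOIN s ON s.time_key = t.time_key
JOIN h ON h.time_key = t.time_key
ORDER BY t.quarter
"""
print(conn.execute(query).fetchall())  # [('Q1', 1050.0, 6), ('Q2', 400.0, 4)]
```

Summarizing each fact table before joining matters: joining the raw fact tables on time_key would pair every sale with every shipment in the same period and inflate both totals.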
Aggregates
• Add up amounts for day 1
In SQL: SELECT sum(amt) FROM SALE
WHERE date = 1
Aggregates (Contd..)
• Add up amounts by day
In SQL: SELECT date, sum(amt) FROM SALE
GROUP BY date
• Add up amounts by day, product
In SQL: SELECT date, prodId, sum(amt) FROM SALE
GROUP BY date, prodId
• Going from totals by day to totals by day and product is a drill-down; going back is a roll-up.
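The aggregate queries on these slides can be tried against a small made-up SALE table, assuming Python's sqlite3 module; the rows are invented for illustration and are not the table from the original slide. The per-product query selects prodId as well, so each result row is labeled with its product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALE (prodId TEXT, date INTEGER, amt REAL)")
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?)", [
    ("p1", 1, 12.0), ("p2", 1, 11.0), ("p1", 1, 50.0),
    ("p1", 2, 8.0), ("p2", 2, 44.0),
])

# Total for day 1.
print(conn.execute("SELECT sum(amt) FROM SALE WHERE date = 1").fetchone())
# (73.0,)

# Rolled-up level: one total per day.
print(conn.execute("SELECT date, sum(amt) FROM SALE GROUP BY date ORDER BY date").fetchall())
# [(1, 73.0), (2, 52.0)]

# Drill-down: one total per (day, product).
print(conn.execute(
    "SELECT date, prodId, sum(amt) FROM SALE GROUP BY date, prodId ORDER BY date, prodId"
).fetchall())
# [(1, 'p1', 62.0), (1, 'p2', 11.0), (2, 'p1', 8.0), (2, 'p2', 44.0)]
```

Summing the drill-down rows for each day gives back the per-day totals, which is exactly the roll-up relationship between the two levels.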
Points to Note about ROLAP
•Defines complex, multidimensional data with a simple model
•Reduces the number of joins a query has to process
•Allows the data warehouse to evolve with relatively low maintenance
•Can contain both detailed and summarized data
•ROLAP is based on familiar, proven, and already selected technologies
•But: multidimensional calculations must be expressed in SQL, which is not well suited to them
Thank You

Mais conteúdo relacionado

Mais procurados

data warehousing
data warehousingdata warehousing
data warehousing143sohil
 
Data Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyData Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyAnkita Dubey
 
Enterprise Data Warehousing Positioning
Enterprise Data Warehousing PositioningEnterprise Data Warehousing Positioning
Enterprise Data Warehousing PositioningEdenH6
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biA P
 
Data warehouse system and its concepts
Data warehouse system and its conceptsData warehouse system and its concepts
Data warehouse system and its conceptsGaurav Garg
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overviewashok kumar
 
Business intelligence and data warehouses
Business intelligence and data warehousesBusiness intelligence and data warehouses
Business intelligence and data warehousesDhani Ahmad
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Vibrant Technologies & Computers
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysNEWYORKSYS-IT SOLUTIONS
 
Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse Lesa Cote
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data WarehousingEyad Manna
 

Mais procurados (20)

Oltp vs olap
Oltp vs olapOltp vs olap
Oltp vs olap
 
data warehousing
data warehousingdata warehousing
data warehousing
 
Data Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubeyData Warehouses & Deployment By Ankita dubey
Data Warehouses & Deployment By Ankita dubey
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Datawarehouse and OLAP
Datawarehouse and OLAPDatawarehouse and OLAP
Datawarehouse and OLAP
 
Enterprise Data Warehousing Positioning
Enterprise Data Warehousing PositioningEnterprise Data Warehousing Positioning
Enterprise Data Warehousing Positioning
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Manish tripathi-ea-dw-bi
Manish tripathi-ea-dw-biManish tripathi-ea-dw-bi
Manish tripathi-ea-dw-bi
 
Data warehouse system and its concepts
Data warehouse system and its conceptsData warehouse system and its concepts
Data warehouse system and its concepts
 
Datawarehouse Overview
Datawarehouse OverviewDatawarehouse Overview
Datawarehouse Overview
 
Business intelligence and data warehouses
Business intelligence and data warehousesBusiness intelligence and data warehouses
Business intelligence and data warehouses
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 
data warehousing
data warehousingdata warehousing
data warehousing
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Chapter 5 data resource management
Chapter 5  data resource managementChapter 5  data resource management
Chapter 5 data resource management
 
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ NewyorksysWhat is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
What is OLAP -Data Warehouse Concepts - IT Online Training @ Newyorksys
 
Oracle: DW Design
Oracle: DW DesignOracle: DW Design
Oracle: DW Design
 
Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse Role of Database Management System in A Data Warehouse
Role of Database Management System in A Data Warehouse
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
OLAP
OLAPOLAP
OLAP
 

Destaque (13)

Tong fen mu jian fa
Tong fen mu   jian faTong fen mu   jian fa
Tong fen mu jian fa
 
Coordination
CoordinationCoordination
Coordination
 
Tong fen mu jian fa
Tong fen mu   jian faTong fen mu   jian fa
Tong fen mu jian fa
 
Yifenmu jia fa
Yifenmu jia faYifenmu jia fa
Yifenmu jia fa
 
Tarea 1- Tecn. aplicadas a la Educacion.
Tarea 1- Tecn. aplicadas a la Educacion.Tarea 1- Tecn. aplicadas a la Educacion.
Tarea 1- Tecn. aplicadas a la Educacion.
 
1
11
1
 
matematik pecahan
matematik pecahanmatematik pecahan
matematik pecahan
 
Yifenmu jia fa
Yifenmu jia faYifenmu jia fa
Yifenmu jia fa
 
Multipart verbs
Multipart verbsMultipart verbs
Multipart verbs
 
Cas ti
Cas tiCas ti
Cas ti
 
Tong fen mu
Tong fen muTong fen mu
Tong fen mu
 
Tarea
TareaTarea
Tarea
 
Akamai
AkamaiAkamai
Akamai
 

Semelhante a Data warehouse introduction

Various Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.pptVarious Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.pptRafiulHasan19
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data martAmit Sarkar
 
ETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptxETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptxParnalSatle
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptxAnusuya123
 
Data warehouse - Nivetha Durganathan
Data warehouse - Nivetha DurganathanData warehouse - Nivetha Durganathan
Data warehouse - Nivetha DurganathanNivetha Durganathan
 
Ch1 data-warehousing
Ch1 data-warehousingCh1 data-warehousing
Ch1 data-warehousingAhmad Shlool
 
Ch1 data-warehousing
Ch1 data-warehousingCh1 data-warehousing
Ch1 data-warehousingAhmad Shlool
 
Management information system database management
Management information system database managementManagement information system database management
Management information system database managementOnline
 
Building Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerBuilding Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerAntonios Chatzipavlis
 

Semelhante a Data warehouse introduction (20)

DW (1).ppt
DW (1).pptDW (1).ppt
DW (1).ppt
 
Various Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.pptVarious Applications of Data Warehouse.ppt
Various Applications of Data Warehouse.ppt
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Lecture1
Lecture1Lecture1
Lecture1
 
Datawarehouse org
Datawarehouse orgDatawarehouse org
Datawarehouse org
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data mart
 
data warehousing
data warehousingdata warehousing
data warehousing
 
DATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptxDATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptx
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
Data Warehouse
Data WarehouseData Warehouse
Data Warehouse
 
kalyani.ppt
kalyani.pptkalyani.ppt
kalyani.ppt
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
ETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptxETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptx
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptx
 
Data warehouse - Nivetha Durganathan
Data warehouse - Nivetha DurganathanData warehouse - Nivetha Durganathan
Data warehouse - Nivetha Durganathan
 
Ch1 data-warehousing
Ch1 data-warehousingCh1 data-warehousing
Ch1 data-warehousing
 
Ch1 data-warehousing
Ch1 data-warehousingCh1 data-warehousing
Ch1 data-warehousing
 
Management information system database management
Management information system database managementManagement information system database management
Management information system database management
 
Building Data Warehouse in SQL Server
Building Data Warehouse in SQL ServerBuilding Data Warehouse in SQL Server
Building Data Warehouse in SQL Server
 

Último

Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 

Último (20)

Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 

Data warehouse introduction

  • 2. Data Warehousing •Aims of information technology: • To help workers in their everyday business activity and improve their productivity – clerical data processing tasks • To help knowledge Employee (executives, managers, analysts) make faster and better decisions – decision support systems •Two types of applications: • Operational applications • Analytical applications
  • 3. •In most organizations, data about specific parts of business is there - lots and lots of data, somewhere, in some form. •Data is available but not information -- and not the right information at the right time. •There is a need to • bring together information . • off-load decision support applications from the on-line transaction system Data Warehousing (Contd..)
  • 4. Data Warehouse •“A data warehouse is a subject-oriented, integrated, time- variant, and nonvolatile collection of data in support of management’s decision-making process.” --- W. H. Inmon •Collection of data that is used primarily in organizational decision making •A decision support database that is maintained separately from the organization’s operational database
  • 5. Data Warehouse - Subject Oriented •Data that gives information about a particular subject. •Data for Model& Analysis. •Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
  • 6. Data Warehouse – Integrated •It Constructed by integrating multiple, heterogeneous data sources. •Data cleaning and data integration techniques are applied. •When data is moved to the warehouse, it is converted - •
  • 7. Data Warehouse - Time Variant •Data is stable in a data warehouse. •Its adds historical as well as current data. •Every key structure in the data warehouse - Contains an element of time, explicitly or implicitly •But the key of operational data may or may not contain “time element”.
  • 8. Data Warehouse - Non-Volatile • A physically separate store of data transformed from the operational environment. • No update & delete on historical data . •Operational update of data does not occur in the data warehouse •Appended • Initial loading of data and access of data.
  • 9. Data modifications & schema design • A data warehouse is updated on a regular basis by the ETL process (run nightly or weekly) using bulk data modification techniques. • Data warehouses often use denormalized or partially denormalized schemas (such as a star schema) to optimize query performance.
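  • 10. Why Separate Data Warehouse? •Decision support requires historical data, which operational databases typically do not maintain. •Complex decision support queries would degrade the performance of operational systems. •Missing data: decision support needs data that may be absent from operational databases. •Data consolidation: decision support needs data consolidated (aggregated, summarized) from heterogeneous sources. •Data quality: different sources use inconsistent representations, codes, and formats that must be reconciled.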
  • 11. Advantages of Data Warehousing •High query performance •Queries not visible outside warehouse •Local processing at sources unaffected •Can operate when sources unavailable •Can query data not stored in a DBMS •Extra information at warehouse • Modify, summarize (store aggregates) • Add historical information
  • 12. Decision Support System •Information technology to help knowledge employees (executives, managers, analysts) make faster and better decisions •OLAP is an element of decision support systems •Data mining is a powerful, high-performance data analysis tool for decision support
  • 13. Three-Tier Decision Support Systems •Warehouse database server • Almost always a relational DBMS, rarely flat files •OLAP servers • Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operators • Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations •Clients • Query and reporting tools • Analysis tools • Data mining tools
  • 14. The Complete Decision Support System [Figure: information sources (operational DBs, semistructured sources) feed the Data Warehouse Server (Tier 1: data warehouse and data marts) through extract, transform, load, refresh, etc.; the warehouse serves the OLAP Servers (Tier 2: e.g., MOLAP, ROLAP), which serve the Clients (Tier 3: OLAP, query/reporting, data mining).]
  • 15. Data Sources •Data sources are often the operational systems, providing the lowest level of data. •Data sources are designed for operational use, not for decision support, and the data reflect this fact. •Multiple data sources often come from different systems running on a wide range of hardware, and much of the software is built in-house or highly customized. •Multiple data sources introduce many issues, such as semantic conflicts.
  • 16. Creating and Maintaining a Warehouse •A data warehouse needs several tools that automate or support tasks such as: • Data extraction from different external data sources, operational databases, and files of standard applications • Data cleaning (finding and resolving inconsistencies in the source data) • inconsistent field lengths, inconsistent descriptions, inconsistent value assignments, missing entries, and violations of integrity constraints • optional fields in data entry are significant sources of inconsistent data
  • 17. Creating and Maintaining a Warehouse (Contd..) • Integration and transformation of data (between different data formats, languages, etc.) • Data loading (loading the data into the data warehouse) • checking integrity constraints, sorting, summarizing, etc. • Data replication (replicating source databases into the data warehouse) • used to incrementally refresh a warehouse when sources change • Data refreshment • propagating updates on source data to the data stored in the warehouse • periodically or immediately • Data archiving
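The cleaning problems listed above (inconsistent field lengths and value assignments, missing entries, integrity violations) can be handled record by record before loading. A minimal Python sketch, assuming hypothetical gender-code mappings, a 20-character name field, and an age-range constraint; real rules come from analysing each source.

```python
# Hypothetical rules; real mappings come from analysing each source system.
GENDER_CODES = {"m": "M", "male": "M", "1": "M", "f": "F", "female": "F", "2": "F"}
NAME_LENGTH = 20        # assumed common field length in the warehouse
AGE_RANGE = (0, 130)    # assumed integrity constraint


def clean(record):
    """Return (cleaned_record, problems) for one source record given as a dict."""
    problems = []
    name = (record.get("name") or "").strip()[:NAME_LENGTH]  # unify field length
    if not name:
        problems.append("missing name")
    raw = str(record.get("gender", "")).strip().lower()
    gender = GENDER_CODES.get(raw)                           # unify value codes
    if gender is None:
        problems.append(f"unknown gender code {raw!r}")
    age = record.get("age")
    if age is not None and not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        problems.append("age violates integrity constraint")
        age = None
    return {"name": name, "gender": gender, "age": age}, problems


# Two hypothetical sources that encode the same attributes differently.
source_a = [{"name": "Ana Silva", "gender": "F", "age": 34}]
source_b = [{"name": "Rui Costa ", "gender": "1", "age": 41},
            {"name": "", "gender": "x", "age": 200}]

loaded, rejected = [], []
for rec in source_a + source_b:
    cleaned, problems = clean(rec)
    if problems:
        rejected.append((cleaned, problems))   # route to manual review
    else:
        loaded.append(cleaned)                 # ready for the load step
```

Rejecting rather than silently fixing unknown codes is one design choice among several; some pipelines map them to an explicit "unknown" value instead.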
  • 18. The Data Warehousing Models •Enterprise warehouse • collects all the information about subjects spanning the entire organization •Data mart • a subset of corporate-wide data that is of value to a specific group of users • its scope is confined to specific, selected groups, such as a marketing data mart • independent vs. dependent data marts (a dependent mart is sourced directly from the warehouse) •Virtual warehouse • a set of views over operational databases • only some of the summary views are materialized
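A virtual warehouse can be approximated with database views. This sketch uses Python's `sqlite3` module; SQLite has no materialized views, so the one materialized summary is emulated with a table that is refreshed explicitly. All table, view, and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical operational tables
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE order_lines (order_id INTEGER, product TEXT, qty INTEGER, price REAL);
    INSERT INTO orders VALUES (1, 'North'), (2, 'South'), (3, 'North');
    INSERT INTO order_lines VALUES (1, 'pen', 2, 1.5), (2, 'ink', 1, 4.0), (3, 'pen', 4, 1.5);

    -- Virtual: a plain view, evaluated against operational data on every query
    CREATE VIEW sales_by_line AS
        SELECT o.region, l.product, l.qty * l.price AS revenue
        FROM orders o JOIN order_lines l ON o.order_id = l.order_id;

    -- Materialized: a stored summary table
    CREATE TABLE sales_by_region_mat AS
        SELECT region, SUM(revenue) AS revenue FROM sales_by_line GROUP BY region;
""")


def refresh(conn):
    # Recompute the stored summary from the view.
    with conn:
        conn.execute("DELETE FROM sales_by_region_mat")
        conn.execute("INSERT INTO sales_by_region_mat "
                     "SELECT region, SUM(revenue) FROM sales_by_line GROUP BY region")


# A new operational order arrives.
with conn:
    conn.execute("INSERT INTO orders VALUES (4, 'South')")
    conn.execute("INSERT INTO order_lines VALUES (4, 'ink', 2, 4.0)")

query = "SELECT region, revenue FROM sales_by_region_mat"
virtual = dict(conn.execute(
    "SELECT region, SUM(revenue) FROM sales_by_line GROUP BY region"))
stale = dict(conn.execute(query))    # summary not yet refreshed
refresh(conn)
fresh = dict(conn.execute(query))
```

The view always reflects the operational data but recomputes the join each time; the materialized summary is cheap to read but goes stale between refreshes.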
  • 19. Physical Structure of Data Warehouse •The basic architectures for constructing a data warehouse are: • Centralized • Distributed • Federated • Tiered •A data warehouse is distributed for load balancing, scalability, and higher availability.
  • 20. Physical Structure of Data Warehouse (Contd..) •The logical data warehouse is only virtual. •The central data warehouse is physical. •Local data marts exist on different tiers, each storing copies or summarizations of the previous tier.
  • 21. Data Processing Models •There are two basic data processing models: • OLTP (On-Line Transaction Processing) • Describes processing at operational sites • aim is reliable and efficient processing of a large number of transactions and ensuring data consistency. • OLAP (On-Line Analytical Processing) • Describes processing at warehouse • aim is efficient multidimensional processing of large data volumes.
  • 22. OLTP vs. OLAP
    Feature | OLTP | OLAP
    Users | clerk, IT professional | knowledge worker
    Function | day-to-day operations | decision support
    DB design | application-oriented | subject-oriented
    Data | current, up-to-date; detailed, flat relational; isolated | historical, summarized; multidimensional; integrated, consolidated
    Usage | repetitive | ad hoc
    Access | read/write; index/hash on primary key | lots of scans
    Unit of work | short, simple transaction | complex query
    # records accessed | tens | millions
    # users | thousands | hundreds
    DB size | 100 MB-GB | 100 GB-TB
    Metric | transaction throughput | query throughput, response time
  • 23. OLAP •Main goal: support ad hoc but complex querying performed by business analysts •Interactive process of creating, managing, analyzing, and reporting on data •Extends spreadsheet-like analysis to work with huge amounts of data in a data warehouse •Data exploration and aggregation in various ways •Typical applications include assessing the effectiveness of a marketing campaign, forecasting product sales, and spotting trends
  • 24. OLAP (Contd..) •Allows a sophisticated user to analyse data using complex, multi-dimensional views •Places key performance indicators (measures) into context (dimensions) • Measures are pre-aggregated • so data retrieval is significantly faster •The resulting cube is made available to business analysts, who can browse the data using a variety of tools for ad hoc, interactive analytical processing
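Pre-aggregating measures means computing every combination of dimensions ahead of time, so that a query at any level becomes a lookup. A minimal Python sketch, assuming facts stored as dictionaries and a hypothetical `build_cube` helper; it materializes all 2^n group-bys, which grows quickly as dimensions are added, so larger systems often precompute only a subset.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marks a dimension that has been aggregated away


def build_cube(facts, dimensions, measure):
    """Pre-aggregate `measure` for every subset of `dimensions` (2**n group-bys)."""
    cube = defaultdict(float)
    n = len(dimensions)
    for row in facts:
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(row[d] if i in kept else ALL
                            for i, d in enumerate(dimensions))
                cube[key] += row[measure]
    return dict(cube)


facts = [
    {"product": "pen", "region": "North", "sales": 3.0},
    {"product": "pen", "region": "South", "sales": 5.0},
    {"product": "ink", "region": "North", "sales": 4.0},
]
cube = build_cube(facts, ["product", "region"], "sales")

# Queries at any level become dictionary lookups instead of scans.
print(cube[("pen", ALL)])    # 8.0
print(cube[(ALL, "North")])  # 7.0
print(cube[(ALL, ALL)])      # 12.0
```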
  • 25. OLAP Server Architectures •Relational OLAP (ROLAP): • Uses a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support the missing pieces • Includes optimization of the DBMS backend, implementation of aggregation navigation logic, and additional tools and services • Greater scalability • Schema design: star, snowflake, fact constellation •Multidimensional OLAP (MOLAP): • Array-based multidimensional storage engine (sparse matrix techniques) • Fast indexing to pre-computed summarized data • Schema design: cube •Hybrid OLAP (HOLAP): • User flexibility: low-level (detailed) data is stored relationally, high-level (aggregated) data in arrays
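The difference between ROLAP and MOLAP storage can be shown on the same three facts. In this Python sketch (hypothetical data), ROLAP keeps the facts in an `sqlite3` table and answers a slice with `WHERE` plus an aggregate, while MOLAP stores them in an array indexed by dimension-member positions. Real MOLAP engines use the sparse-matrix techniques mentioned above rather than the dense list used here.

```python
import sqlite3

# Hypothetical dimension members and facts: (product, region, amount).
PRODUCTS = ["pen", "ink"]
REGIONS = ["North", "South"]
FACTS = [("pen", "North", 3.0), ("pen", "South", 5.0), ("ink", "North", 4.0)]

# ROLAP: facts live in a relational table; a slice plus roll-up is WHERE + SUM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", FACTS)
rolap_north = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'North'").fetchone()[0]

# MOLAP: facts live in an array addressed by dimension-member positions.
# This dense list stores 0.0 for empty cells; real engines compress sparse cells.
p_idx = {p: i for i, p in enumerate(PRODUCTS)}
r_idx = {r: j for j, r in enumerate(REGIONS)}
molap = [[0.0] * len(REGIONS) for _ in PRODUCTS]
for product, region, amount in FACTS:
    molap[p_idx[product]][r_idx[region]] += amount
molap_north = sum(row[r_idx["North"]] for row in molap)

print(rolap_north, molap_north)  # 7.0 7.0
```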
  • 26. ROLAP •Special schema design: snowflake •Special indexes: bitmap, multi-table join •Proven technology (relational model, DBMS) • Tends to outperform specialized multidimensional databases (MDDB), especially on large data sets •Products • IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
  • 27. Measures and Dimensions •Measures: key performance indicators that you want to evaluate • Typically numerical, including volume, sales, and cost • A rule of thumb: if a number makes business sense when aggregated, then it is a measure • Examples • Aggregating daily volume to month, quarter, and year makes sense, so volume is a measure • Aggregating telephone numbers would not make sense, so telephone numbers are not measures • This distinction affects what should be stored in the data warehouse
  • 28. Measures and Dimensions (Contd..) •Dimensions: categories of data analysis • Typical dimensions include product, time, region • A rule of thumb: when a report is requested “by” something, that something is usually a dimension • Example • Sales report: view sales by month, by region • Two dimensions needed are time and region
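The two rules of thumb can be combined in one small report function: the measure (sales) is summed, and each attribute that follows "by" in the request names a dimension level to group on. A minimal Python sketch, assuming hypothetical daily facts and a day, month, quarter, year time hierarchy derived from the date; an attribute such as a telephone number would never be passed in as the summed value.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily facts: time and region dimensions plus a numeric measure.
DAILY_SALES = [
    (date(2024, 1, 15), "North", 100.0),
    (date(2024, 2, 3), "North", 50.0),
    (date(2024, 2, 20), "South", 70.0),
    (date(2024, 4, 1), "South", 30.0),
]


def time_levels(d):
    # Derive the coarser levels of the time hierarchy from the day.
    return {
        "day": d.isoformat(),
        "month": d.strftime("%Y-%m"),
        "quarter": f"{d.year}-Q{(d.month - 1) // 3 + 1}",
        "year": d.year,
    }


def report(rows, by):
    """Sum the sales measure 'by' the named dimension attributes."""
    totals = defaultdict(float)
    for day, region, amount in rows:
        attrs = {**time_levels(day), "region": region}
        totals[tuple(attrs[a] for a in by)] += amount
    return dict(totals)


print(report(DAILY_SALES, ["quarter"]))  # {('2024-Q1',): 220.0, ('2024-Q2',): 30.0}
print(report(DAILY_SALES, ["month", "region"]))  # sales by month, by region
```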
  • 29. Conceptual Modeling •Star schema •Snowflake schema •Fact constellation, or galaxy schema
  • 30. Star
  • 31. Star Schema •Fact table •Dimension tables •Measures •A single fact table, with one dimension table for each dimension •Does not capture hierarchies directly
  • 32. Example - Star Schema
    Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
    time: time_key, day, day_of_the_week, month, quarter, year
    item: item_key, item_name, brand, type, supplier_type
    branch: branch_key, branch_name, branch_type
    location: location_key, street, city, province_or_state, country
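The star schema above can be expressed directly as tables. This Python `sqlite3` sketch keeps the time, item, and location dimensions (branch and some attributes are omitted for brevity), adds hypothetical `_dim`/`_fact` suffixes to the table names, and uses hypothetical rows. The star join groups a measure by attributes drawn from two dimensions, each joined straight to the fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day TEXT, month INTEGER,
                           quarter TEXT, year INTEGER);
    CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
    CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT,
                               province_or_state TEXT, country TEXT);
    -- One fact table: a foreign key to each dimension plus the numeric measures.
    CREATE TABLE sales_fact (
        time_key INTEGER REFERENCES time_dim,
        item_key INTEGER REFERENCES item_dim,
        location_key INTEGER REFERENCES location_dim,
        units_sold INTEGER, dollars_sold REAL);

    INSERT INTO time_dim VALUES (1, '2024-01-10', 1, 'Q1', 2024),
                                (2, '2024-04-02', 4, 'Q2', 2024);
    INSERT INTO item_dim VALUES (1, 'pen', 'Acme'), (2, 'lamp', 'Lumo');
    INSERT INTO location_dim VALUES (1, 'Toronto', 'Ontario', 'Canada'),
                                    (2, 'Vancouver', 'British Columbia', 'Canada');
    INSERT INTO sales_fact VALUES (1, 1, 1, 10, 15.0), (1, 2, 2, 2, 80.0),
                                  (2, 1, 2, 5, 7.5);
""")

# Star join: the fact table joins directly to each dimension table it needs.
rows = conn.execute("""
    SELECT t.quarter, l.city, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON f.time_key = t.time_key
    JOIN location_dim l ON f.location_key = l.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter, l.city
""").fetchall()
print(rows)  # [('Q1', 'Toronto', 15.0), ('Q1', 'Vancouver', 80.0), ('Q2', 'Vancouver', 7.5)]
```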
  • 34. Example of Snowflake Schema
    Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
    time: time_key, day, day_of_the_week, month, quarter, year
    item: item_key, item_name, brand, type, supplier_key
    supplier: supplier_key, supplier_type
    branch: branch_key, branch_name, branch_type
    location: location_key, street, city_key
    city: city_key, city, province_or_state, country
    •Represents dimensional hierarchies directly by normalizing the dimension tables. •Easy to maintain and saves storage.
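Normalizing the location dimension into `location` and `city` tables shows the snowflake trade-off: city attributes are stored once, but a roll-up to country needs an extra join. A Python `sqlite3` sketch with hypothetical rows, showing only the location branch of the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- City attributes live in their own table instead of being repeated per location.
    CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT,
                       province_or_state TEXT, country TEXT);
    CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                           city_key INTEGER REFERENCES city);
    CREATE TABLE sales_fact (location_key INTEGER REFERENCES location,
                             dollars_sold REAL);

    INSERT INTO city VALUES (1, 'Toronto', 'Ontario', 'Canada'),
                            (2, 'Chicago', 'Illinois', 'USA');
    INSERT INTO location VALUES (1, '1 King St', 1), (2, '9 Queen St', 1),
                                (3, '5 Lake Dr', 2);
    INSERT INTO sales_fact VALUES (1, 10.0), (2, 20.0), (3, 5.0);
""")

# Rolling up to country needs one extra join (fact -> location -> city).
by_country = conn.execute("""
    SELECT c.country, SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN location l ON f.location_key = l.location_key
    JOIN city c ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
print(by_country)  # [('Canada', 30.0), ('USA', 5.0)]
```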
  • 35. Example of Fact Constellation
    Sales Fact Table: time_key, item_key, branch_key, location_key; measures: units_sold, dollars_sold, avg_sales
    Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location; measures: dollars_cost, units_shipped
    time: time_key, day, day_of_the_week, month, quarter, year
    item: item_key, item_name, brand, type, supplier_type
    branch: branch_key, branch_name, branch_type
    location: location_key, street, city, province_or_state, country
    shipper: shipper_key, shipper_name, location_key, shipper_type
    •Multiple fact tables that share many dimension tables
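A common query over a fact constellation compares measures from two fact tables through a shared dimension. In this Python `sqlite3` sketch (hypothetical rows, only the shared item dimension kept), each fact table is aggregated before the join; joining the two fact tables row by row would pair every sales row with every shipping row for the same item and inflate both sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
    -- Two fact tables share the item dimension (time, location, etc. omitted).
    CREATE TABLE sales_fact (item_key INTEGER REFERENCES item, units_sold INTEGER);
    CREATE TABLE shipping_fact (item_key INTEGER REFERENCES item, units_shipped INTEGER);

    INSERT INTO item VALUES (1, 'pen'), (2, 'lamp');
    INSERT INTO sales_fact VALUES (1, 10), (1, 5), (2, 3);
    INSERT INTO shipping_fact VALUES (1, 12), (2, 3), (2, 1);
""")

# Aggregate each fact table separately, then join the results on the shared dimension.
rows = conn.execute("""
    SELECT i.item_name, s.units_sold, sh.units_shipped
    FROM item i
    JOIN (SELECT item_key, SUM(units_sold) AS units_sold
          FROM sales_fact GROUP BY item_key) s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS units_shipped
          FROM shipping_fact GROUP BY item_key) sh ON sh.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
print(rows)  # [('lamp', 3, 4), ('pen', 15, 12)]
```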
  • 36. Aggregates • Add up amounts for day 1 In SQL: SELECT sum(amt) FROM SALE WHERE date = 1 • Result: 81
  • 37. Aggregates (Contd..) • Add up amounts by day In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date
  • 38. Aggregates (Contd..) • Add up amounts by day, product In SQL: SELECT date, prodId, sum(amt) FROM SALE GROUP BY date, prodId • Going from grouping by date to grouping by date and product is a drill-down; going back is a roll-up
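The three aggregate queries, plus a roll-up from the finer result back to the coarser level, can be run against a small `sale` table. This Python `sqlite3` sketch uses hypothetical rows, not the data behind the slides' example, so its totals are different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt REAL)")
conn.executemany("INSERT INTO sale VALUES (?, ?, ?, ?)", [
    ("p1", "s1", 1, 10.0), ("p2", "s1", 1, 6.0), ("p1", "s2", 1, 4.0),
    ("p1", "s1", 2, 7.0), ("p2", "s2", 2, 3.0),
])

# Total for one day.
total_day1 = conn.execute("SELECT SUM(amt) FROM sale WHERE date = 1").fetchone()[0]

# Totals by day.
by_day = conn.execute(
    "SELECT date, SUM(amt) FROM sale GROUP BY date ORDER BY date").fetchall()

# Drill-down: add prodId to the grouping to see finer detail.
by_day_product = conn.execute(
    "SELECT date, prodId, SUM(amt) FROM sale GROUP BY date, prodId "
    "ORDER BY date, prodId").fetchall()

# Roll-up: re-aggregate the finer result back to the coarser (by-day) level.
rolled_up = {}
for day, _prod, amount in by_day_product:
    rolled_up[day] = rolled_up.get(day, 0.0) + amount
```

Because `SUM` is distributive, rolling up the by-day-and-product result gives the same totals as grouping by day directly.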
  • 39. Points to be noticed about ROLAP •Defines complex, multi-dimensional data with a simple model •Reduces the number of joins a query has to process •Allows the data warehouse to evolve with relatively low maintenance •Can contain both detailed and summarized data •ROLAP is based on familiar, proven, and already selected technologies •BUT!!! •SQL is poorly suited to multi-dimensional manipulations and calculations.

Speaker Notes

  1. Data modifications: The end users of a data warehouse do not directly update the data warehouse.