1. DWH-Ahsan AbdullahDWH-Ahsan Abdullah
11
Data WarehousingData Warehousing
Lecture-4Lecture-4
Introduction and BackgroundIntroduction and Background
Virtual University of PakistanVirtual University of Pakistan
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
FAST National University of Computers & Emerging Sciences, IslamabadFAST National University of Computers & Emerging Sciences, Islamabad
3. DWH-Ahsan Abdullah
3
How is it Different?How is it Different?
Starts with a 6x12 availability requirement ...Starts with a 6x12 availability requirement ...
but 7x24 usually becomes the goal.but 7x24 usually becomes the goal.
Decision makers typically don’t work 24 hrs a day and 7
days a week. An ATM system does.
Once decision makers start using the DWH, and start
reaping the benefits, they start liking it…
Start using the DWH more often, till want it available
100% of the time.
4. DWH-Ahsan Abdullah
4
How is it Different?How is it Different?
Starts with a 6x12 availability requirement ...Starts with a 6x12 availability requirement ...
but 7x24 usually becomes the goal.but 7x24 usually becomes the goal.
For business across the globe, 50% of the world may be
sleeping at any one time, but the businesses are up 100%
of the time.
100% availability not a trivial task, need to take into
account loading strategies, refresh rates etc.
5. DWH-Ahsan Abdullah
5
How is it Different?How is it Different?
Does not follows the traditional developmentDoes not follows the traditional development
modelmodel
Classical SDLC
Requirements gathering
Analysis
Design
Programming
Testing
Integration
Implementation
Requirements
Program
6. DWH-Ahsan Abdullah
6
How is it Different?How is it Different?
Does not follows the traditional developmentDoes not follows the traditional development
modelmodel
DWH SDLC (CLDS)
Implement warehouse
Integrate data
Test for biasness
Program w.r.t data
Design DSS system
Analyze results
Understand requirement
Requirements
Program
DWH
7. DWH-Ahsan Abdullah
7
Data Warehouse Vs. OLTP
OLTP (On Line Transaction Processing)OLTP (On Line Transaction Processing)
Select tx_date, balance from tx_table
Where account_ID = 23876;
8. DWH-Ahsan Abdullah
8
Data Warehouse Vs. OLTPData Warehouse Vs. OLTP
DWHDWH
Select balance, age, sal, gender from
customer_table, tx_table
Where age between (30 and 40) and
Education = ‘graduate’ and
CustID.customer_table =
Customer_ID.tx_table;
9. DWH-Ahsan Abdullah
9
Data Warehouse Vs. OLTPData Warehouse Vs. OLTP
OLTP DWH
Primary key used Primary key NOT used
No concept of Primary Index Primary index used
Few rows returned Many rows returned
May use a single table Uses multiple tables
High selectivity of query Low selectivity of query
Indexing on primary key
(unique)
Indexing on primary index
(non-unique)
10. DWH-Ahsan Abdullah
10
Data Warehouse Vs. OLTPData Warehouse Vs. OLTP
Data Warehouse OLTP
Scope * Application –Neutral
* Single source of “truth”
* Evolves over time
* How to improve business
* Application specific
* Multiple databases with repetition
* Off the shelf application
* Runs the business
Data
Perspective
* Historical, detailed data
* Some summary
* Lightly denormalized
* Operational data
* No summary
* Fully normalized
Queries * Hardly uses PK
* Number of results
returned in thousands
* Based on PK
* Number of results returned in
hundreds
Time factor * Minutes to hours
* Typical availability 6x12
* Sub seconds to seconds
* Typical availability 24x7
OLTP: OnLine Transaction Processing (MIS or Database System)
11. DWH-Ahsan Abdullah
11
Comparison of Response TimesComparison of Response Times
On-line analytical processing (OLAP) queries mustOn-line analytical processing (OLAP) queries must
be executed in a small number of seconds.be executed in a small number of seconds.
Often requires denormalizationOften requires denormalization and/or sampling.and/or sampling.
Complex query scripts and large list selections canComplex query scripts and large list selections can
generally be executed in a small number ofgenerally be executed in a small number of
minutes.minutes.
Sophisticated clustering algorithms (e.g., dataSophisticated clustering algorithms (e.g., data
mining) can generally be executed in a smallmining) can generally be executed in a small
number of hours (even for hundreds of thousandsnumber of hours (even for hundreds of thousands
of customers).of customers).
12. DWH-Ahsan Abdullah
12
Data Warehouse Server
(Tier 1)
Data
Warehouse
Operational
Data Bases
Semistructured
Sources Query/Reporting
Data Marts
MOLAP
ROLAP
Clients
(Tier 3)
Tools
Meta
Data
Data sources
Data
(Tier 0)
IT
Users
Business
Users
Business Users
Data Mining
Archived
data
Analysis
OLAP Servers
(Tier 2)
Extract
Transform
Load
(ETL)
www data
Putting the pieces togetherPutting the pieces together