In computing, a data warehouse (DW, DWH), or enterprise data warehouse (EDW), is a database used for reporting (1) and data analysis (2). It is a central repository created by integrating data from one or more disparate sources. Data warehouses store current and historical data and are used to create trend reports for senior management, such as annual and quarterly comparisons.
3. What is Data Warehousing?
A data warehouse is a subject-oriented, integrated, non-volatile, and time-variant collection of data in support of management’s decisions.
6. The architecture
Figure: Typical architecture of a data warehouse. Components shown:
• Operational data sources 1 to n and an operational data store (ODS)
• Load manager, warehouse manager, and query manager
• The warehouse DBMS, holding meta-data, detailed data, lightly summarized data, highly summarized data, and archive/backup data
• End-user access tools: reporting, query, application development, and EIS (executive information system) tools; OLAP (online analytical processing) tools; data mining
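The division of work among the load, warehouse, and query managers can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the record layouts, the sample data, and the function names `load_manager`, `warehouse_manager`, and `query_manager` are hypothetical, and a real warehouse would use a DBMS and ETL tooling rather than in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical operational sources: lists of sale records (layout is an assumption).
SOURCE_1 = [{"date": "1996-01-15", "product": "Power supply", "amount": 120.0}]
SOURCE_2 = [{"date": "1996-01-20", "product": "Motherboard", "amount": 300.0},
            {"date": "1996-02-03", "product": "Power supply", "amount": 80.0}]

def load_manager(*sources):
    """Extract records from every source and transform them into the
    warehouse's detailed-data format (here: derive a 'month' field)."""
    detailed = []
    for source in sources:
        for rec in source:
            detailed.append({"month": rec["date"][:7],
                             "product": rec["product"],
                             "amount": rec["amount"]})
    return detailed

def warehouse_manager(detailed):
    """Build lightly summarized (month, product) and highly summarized
    (month) data from the detailed data."""
    light = defaultdict(float)
    high = defaultdict(float)
    for rec in detailed:
        light[(rec["month"], rec["product"])] += rec["amount"]
        high[rec["month"]] += rec["amount"]
    return dict(light), dict(high)

def query_manager(high, month):
    """Answer a monthly-total query from the highly summarized level,
    so the detailed data need not be scanned."""
    return high.get(month, 0.0)

detailed = load_manager(SOURCE_1, SOURCE_2)
light, high = warehouse_manager(detailed)
print(query_manager(high, "1996-01"))  # 420.0
```

Keeping summaries precomputed is what lets the query manager answer common questions without touching the detailed data.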
7. The benefits of data warehousing
• The potential benefits of data warehousing are:
– high returns on investment;
– substantial competitive advantage;
– increased productivity of corporate decision-makers.
9. Subject-Oriented
• Data is organized around the major subjects of the business rather than around the applications that process it.
• Example for an insurance company:
– Applications area: Auto and Fire Policy Processing Systems; Commercial and Life Insurance Systems; Accounting System; Billing System; Claims Processing System
– Data warehouse subjects: Customer, Policy, Premium, Losses
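The regrouping from application-centric to subject-centric data can be illustrated with a small Python sketch. The record layouts, policy and claim IDs, and the function `by_customer` are hypothetical; the point is only that records from several insurance applications are reorganized around the subject "customer".

```python
from collections import defaultdict

# Hypothetical records, organized the way each application stores them.
auto_fire_policies = [{"cust": "C1", "policy": "AF-10", "premium": 500}]
life_policies = [{"cust": "C1", "policy": "L-7", "premium": 900},
                 {"cust": "C2", "policy": "L-8", "premium": 400}]
claims = [{"cust": "C1", "claim": "CL-1", "loss": 2500}]

def by_customer(*applications):
    """Regroup records from every application around the subject 'customer'."""
    subject = defaultdict(lambda: {"policies": [], "premium": 0, "losses": 0})
    for records in applications:
        for rec in records:
            view = subject[rec["cust"]]
            if "policy" in rec:  # records from policy-processing systems
                view["policies"].append(rec["policy"])
                view["premium"] += rec["premium"]
            if "loss" in rec:    # records from the claims-processing system
                view["losses"] += rec["loss"]
    return dict(subject)

customers = by_customer(auto_fire_policies, life_policies, claims)
print(customers["C1"])  # {'policies': ['AF-10', 'L-7'], 'premium': 1400, 'losses': 2500}
```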
10. Integrated
• Data is stored once, in a single integrated location.
• Example (insurance company, subject = Customer): customer data stored in several operational databases (Auto Policy Processing System, Fire Policy Processing System, and the FACTS, LIFE, Commercial, and Accounting applications) is integrated into a single data warehouse database.
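Integration usually means reconciling different field names and encodings for the same customer before storing it once. The sketch below assumes two hypothetical systems (an auto system and a fire system) with invented field names and codes; the converter functions and the "first source wins" rule in `integrate` are illustrative assumptions, not a prescribed method.

```python
# Hypothetical records for the same customer, held by two systems
# with their own field names and encodings.
auto_system = [{"cust_no": "00042", "name": "SMITH, JOHN", "sex": "M"}]
fire_system = [{"customer_id": 42, "full_name": "John Smith", "gender": "male"}]

# One agreed encoding for gender in the warehouse.
GENDER = {"M": "male", "F": "female", "male": "male", "female": "female"}

def from_auto(rec):
    """Convert an auto-system record to the warehouse format."""
    last, first = [part.strip() for part in rec["name"].split(",")]
    return {"customer_id": int(rec["cust_no"]),
            "name": f"{first.title()} {last.title()}",
            "gender": GENDER[rec["sex"]]}

def from_fire(rec):
    """Convert a fire-system record to the warehouse format."""
    return {"customer_id": rec["customer_id"],
            "name": rec["full_name"],
            "gender": GENDER[rec["gender"]]}

def integrate(*converted_sources):
    """Store each customer once, keyed by a single customer_id
    (assumption: the first source that supplies a customer wins)."""
    customers = {}
    for records in converted_sources:
        for rec in records:
            customers.setdefault(rec["customer_id"], rec)
    return customers

warehouse = integrate([from_auto(r) for r in auto_system],
                      [from_fire(r) for r in fire_system])
print(warehouse)  # {42: {'customer_id': 42, 'name': 'John Smith', 'gender': 'male'}}
```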
11. Time-Variant
• Data is stored as a series of snapshots, each recording the data as it was at a point in time.
• Each data warehouse record consists of a key, an element of time, and the data.
• Data is tagged with an element of time, such as a creation date or an as-of date.
• Data is kept online for long periods, for example five years or more, for trend analysis and forecasting.
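A record made of key, time element, and data can be modeled directly, which makes "as of" lookups and trend extraction straightforward. This is a minimal sketch; the `Snapshot` class, the customer key, and the balance values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Snapshot:
    key: str      # business key, e.g. a customer id (hypothetical)
    as_of: date   # element of time the record is tagged with
    data: dict    # the values as they were on that date

history = [
    Snapshot("C1", date(2019, 12, 31), {"balance": 100}),
    Snapshot("C1", date(2020, 12, 31), {"balance": 250}),
    Snapshot("C1", date(2021, 12, 31), {"balance": 180}),
]

def as_of(history, key, when):
    """Return the latest snapshot for `key` taken on or before `when`, or None."""
    candidates = [s for s in history if s.key == key and s.as_of <= when]
    return max(candidates, key=lambda s: s.as_of, default=None)

def trend(history, key, field):
    """Values of one field over time, oldest first, for trend analysis."""
    ordered = sorted(history, key=lambda s: s.as_of)
    return [s.data[field] for s in ordered if s.key == key]

print(as_of(history, "C1", date(2021, 6, 30)).data)  # {'balance': 250}
print(trend(history, "C1", "balance"))               # [100, 250, 180]
```

New snapshots are appended rather than overwriting old ones, so every earlier state stays queryable.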
12. Non-Volatile
• Existing data in the warehouse is not overwritten or updated.
• In the production environment, production applications update, insert, and delete data in the production databases.
• In the data warehouse environment, data from the production databases and external sources is loaded into the data warehouse database, which is then read-only.
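The "load and read only" access pattern can be enforced in code by offering no update or delete operation at all. The `WarehouseTable` class below is a hypothetical sketch; it stores each loaded row as a read-only mapping (`types.MappingProxyType`) so rows cannot be changed after loading.

```python
from types import MappingProxyType

class WarehouseTable:
    """Hypothetical warehouse table that supports only load (append) and read."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # New data is appended as a batch; existing rows are never modified.
        self._rows.extend(MappingProxyType(dict(r)) for r in rows)

    def read(self):
        # A tuple of read-only mappings: callers cannot alter stored data.
        return tuple(self._rows)

table = WarehouseTable()
table.load([{"month": "01-1996", "net_sales": 30000}])
table.load([{"month": "02-1996", "net_sales": 23000}])
print(len(table.read()))  # 2
```

Attempting `table.read()[0]["net_sales"] = 0` raises `TypeError`, because a `MappingProxyType` does not support item assignment.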
13. Comparison of OLTP systems and data warehousing systems
OLTP systems | Data warehousing systems
Hold current data | Hold historical data
Store detailed data | Store detailed, lightly summarized, and highly summarized data
Data is dynamic | Data is largely static
Repetitive processing | Ad hoc, unstructured, and heuristic processing
High level of transaction throughput | Medium to low level of transaction throughput
Predictable pattern of usage | Unpredictable pattern of usage
Transaction-driven | Analysis-driven
Application-oriented | Subject-oriented
Support day-to-day decisions | Support strategic decisions
Serve a large number of clerical/operational users | Serve a relatively low number of managerial users
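The contrast between transaction-driven and analysis-driven access can be seen in two queries against the same small table, using Python's built-in `sqlite3`. This is a simplified sketch: in practice the OLTP data and the warehouse would live in separate systems, and the `orders` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, year INTEGER, "
             "region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders (year, region, amount) VALUES (?, ?, ?)", [
    (2022, "NE", 100.0), (2022, "SW", 50.0),
    (2023, "NE", 120.0), (2023, "SW", 80.0),
])
conn.commit()

# OLTP style: a short, predictable transaction touching one current row.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE orders SET amount = amount + 10 WHERE id = ?", (4,))

# Warehouse style: an ad hoc, analysis-driven query that scans and aggregates history.
rows = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(rows)  # [(2022, 150.0), (2023, 210.0)]
```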
15. Online Transaction Processing
• What is a transaction?
– A logical unit of work
– Examples: withdrawing money from a bank account; booking a seat on an airline
16. Transactions
• A transaction is a unit of program execution that accesses and possibly updates various data items.
• A transaction is a logical unit of work that performs some useful function for a user.
• At the end of the transaction, the system must be either:
– in its prior state (if the transaction fails), or
– in a state that reflects the successful completion of the transaction (if it succeeds).
• A transaction may take the database from one consistent state to another.
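The "prior state on failure, completed state on success" rule can be sketched with a Python context manager over an in-memory dictionary, using the airline-seat example. This is a single-threaded illustration of atomicity only; the `transaction` and `book` functions and the seat data are hypothetical, and a real DBMS would also handle concurrency and durability.

```python
import copy
from contextlib import contextmanager

@contextmanager
def transaction(store):
    """Treat the body as one unit of work on `store` (a dict): if it raises,
    restore the state the store had before the transaction began."""
    before = copy.deepcopy(store)
    try:
        yield store
    except Exception:
        store.clear()
        store.update(before)  # back to the prior state
        raise                 # let the caller know the transaction failed

def book(seats, seat_ids, passenger):
    """Hypothetical booking: reserve all requested seats or none of them."""
    with transaction(seats) as s:
        for sid in seat_ids:
            if s[sid] is not None:
                raise ValueError(f"seat {sid} is already taken")
            s[sid] = passenger

seats = {"12A": None, "12B": None, "12C": "Lee"}
book(seats, ["12A"], "Kim")              # succeeds
try:
    book(seats, ["12B", "12C"], "Park")  # 12B is assigned, then 12C fails
except ValueError:
    pass
print(seats)  # {'12A': 'Kim', '12B': None, '12C': 'Lee'}
```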
22. HOLAP
• HOLAP (hybrid online analytical processing)
• Used for data modeling
• Supports both MOLAP (multidimensional OLAP) and ROLAP (relational OLAP)
• Tools: Framework Manager, Query Studio
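A common way to combine the two approaches is to keep aggregates in a multidimensional structure and leave the detail rows in relational tables. The sketch below assumes that split: a dictionary stands in for the MOLAP cube and an in-memory SQLite table stands in for the ROLAP store. The `sales` table, `total`, and `drill_down` are hypothetical names, and the rows reuse the net-sales figures from the star schema example.

```python
import sqlite3
from collections import defaultdict

# ROLAP side: detailed rows kept in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("01-1996", "SW", "Power supply", 30000.0),
    ("02-1996", "NE", "Motherboard", 23000.0),
    ("03-1996", "SW", "Co-processor", 32000.0),
])

# MOLAP side: aggregates precomputed into a cube keyed by (region, month).
cube = defaultdict(float)
for month, region, amount in conn.execute("SELECT month, region, amount FROM sales"):
    cube[(region, month)] += amount
    cube[(region, None)] += amount  # None stands for "all months"

def total(region, month=None):
    """Summary query: answered from the precomputed cube."""
    return cube.get((region, month), 0.0)

def drill_down(region, month):
    """Detail query: answered from the relational table."""
    return conn.execute(
        "SELECT product, amount FROM sales WHERE region = ? AND month = ? "
        "ORDER BY product", (region, month)).fetchall()

print(total("SW"))                     # 62000.0
print(drill_down("SW", "03-1996"))     # [('Co-processor', 32000.0)]
```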
25. Facts
• A fact table contains measures and the IDs of the related dimensions.
• Examples of measures: revenue, cost, amount.
26. Measure Types
• Additive measures: can be added across all dimensions.
• Non-additive measures: cannot be added across any dimension.
• Semi-additive measures: can be added across some dimensions but not across others.
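The three measure types behave differently when aggregated. The sketch below uses hypothetical data: revenue and cost as additive measures, an account balance as a semi-additive measure (summable across accounts, not across days), and a margin ratio as a non-additive measure that must be recomputed from its additive parts.

```python
# Hypothetical sales facts: (region, revenue, cost). Revenue and cost are additive.
sales = [("NE", 200.0, 150.0), ("SW", 100.0, 50.0)]
# Hypothetical balance snapshots: (day, account, balance). Balance is semi-additive.
balances = [(1, "acct1", 100), (1, "acct2", 50),
            (2, "acct1", 120), (2, "acct2", 40)]

# Additive: summing across any dimension (here, region) is meaningful.
total_revenue = sum(rev for _, rev, _ in sales)
total_cost = sum(cost for _, _, cost in sales)

# Semi-additive: balances can be summed across accounts for one day...
day2_total = sum(bal for day, _, bal in balances if day == 2)

# ...but not across days; take the closing (latest-day) value instead.
acct1_closing = max((day, bal) for day, acct, bal in balances if acct == "acct1")[1]

# Non-additive: a margin ratio cannot be summed or averaged across regions;
# recompute it from the additive components.
margin_by_region = {reg: (rev - cost) / rev for reg, rev, cost in sales}
overall_margin = (total_revenue - total_cost) / total_revenue

print(total_revenue, day2_total, acct1_closing, margin_by_region, round(overall_margin, 4))
# 300.0 160 120 {'NE': 0.25, 'SW': 0.5} 0.3333
```

Summing the regional margins would give 0.75 and averaging them 0.375; neither matches the true overall margin of one third.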
27. Schemas in Data Warehousing
• Star schema
• Snowflake schema
• Starflake schema
28. Star Schema
Dimension tables

Region_Dimension_Table
region_id | region_desc
NE | Northeast
NW | Northwest
SE | Southeast
SW | Southwest

Product_Dimension_Table
prod_grp_id | prod_id | prod_grp_desc | prod_desc
10 | 100 | Fewer devices | Power supply
20 | 140 | Circuit boards | Motherboard
30 | 220 | Components | Co-processor

Account_Dimension_Table
account_id | account_desc
100000 | ABC Electronics
110000 | Midway Electric
120000 | Victor Components
130000 | Washburn, Inc.
140000 | Zerox

Time_Dimension_Table
month | mo_in_fiscal_yr | month_name
01-1996 | 4 | January
02-1996 | 5 | February
03-1996 | 6 | March

Vendor_Dimension_Table
vend_id | vendor_desc
100 | PowerAge, Inc.
200 | Advanced Micro Devices
300 | Farad Incorporated

Fact table

Monthly_Sales_Summary_Table
month | prod_id | region_id | account_id | vend_id | net_sales | gross_sales
01-1996 | 100 | SW | 100000 | 100 | 30,000 | 50,000
02-1996 | 140 | NE | 110000 | 200 | 23,000 | 42,000
03-1996 | 220 | SW | 100000 | 300 | 32,000 | 49,000
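A star join over this schema can be tried with Python's built-in `sqlite3`. For brevity the sketch creates only the region and time dimensions plus the fact table (the product, account, and vendor dimensions would be joined the same way); the column name `region_desc` follows the reconstruction of the figure above and is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Region_Dimension_Table (region_id TEXT PRIMARY KEY, region_desc TEXT);
CREATE TABLE Time_Dimension_Table (month TEXT PRIMARY KEY,
                                   mo_in_fiscal_yr INTEGER, month_name TEXT);
CREATE TABLE Monthly_Sales_Summary_Table (
    month TEXT REFERENCES Time_Dimension_Table(month),
    prod_id INTEGER,
    region_id TEXT REFERENCES Region_Dimension_Table(region_id),
    account_id INTEGER, vend_id INTEGER,
    net_sales REAL, gross_sales REAL);
""")
conn.executemany("INSERT INTO Region_Dimension_Table VALUES (?, ?)",
                 [("NE", "Northeast"), ("NW", "Northwest"),
                  ("SE", "Southeast"), ("SW", "Southwest")])
conn.executemany("INSERT INTO Time_Dimension_Table VALUES (?, ?, ?)",
                 [("01-1996", 4, "January"), ("02-1996", 5, "February"),
                  ("03-1996", 6, "March")])
conn.executemany("INSERT INTO Monthly_Sales_Summary_Table VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("01-1996", 100, "SW", 100000, 100, 30000, 50000),
    ("02-1996", 140, "NE", 110000, 200, 23000, 42000),
    ("03-1996", 220, "SW", 100000, 300, 32000, 49000),
])

# Star join: the central fact table is joined to each dimension it needs,
# filtered on dimension attributes, then aggregated.
query = """
SELECT r.region_desc, SUM(f.net_sales)
FROM Monthly_Sales_Summary_Table AS f
JOIN Region_Dimension_Table AS r ON f.region_id = r.region_id
JOIN Time_Dimension_Table AS t ON f.month = t.month
WHERE t.mo_in_fiscal_yr BETWEEN 4 AND 6
GROUP BY r.region_desc
ORDER BY r.region_desc
"""
print(conn.execute(query).fetchall())  # [('Northeast', 23000.0), ('Southwest', 62000.0)]
```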
Consider a transaction that transfers money from one account to another. The transaction has to perform two updates (debit one account and credit the other), but this should be transparent to the end user: to the user, either the transfer goes through or it doesn’t.
Before and after the transaction, the database should be in a consistent state.
Each transaction should be made to feel that it is the only transaction executing at that instant.
After the transaction completes, the changes it made to the database should be visible to other transactions.
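The two-update transfer can be demonstrated with Python's `sqlite3`, where `with conn:` commits on success and rolls back on an exception. In this sketch the `account` table, the `CHECK` constraint, and the `transfer` and `total` helpers are hypothetical; the credit is applied first on purpose, so a failing debit shows the credit being undone. It illustrates atomicity and consistency only; isolation and durability are not exercised by an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) violated
        return False

def total(conn):
    """Consistency check: transfers must not change the sum of balances."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]

ok = transfer(conn, "A", "B", 30)       # both updates applied
failed = transfer(conn, "A", "B", 500)  # debit would go negative: credit undone too
print(ok, failed, dict(conn.execute("SELECT id, balance FROM account ORDER BY id")), total(conn))
# True False {'A': 70, 'B': 80} 150
```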