This document provides an overview of key concepts related to data warehousing:
- It defines data warehousing and lists some of its key components like dimensional data modeling, star and snowflake schemas, and slowly changing dimensions.
- It also discusses conceptual, logical, and physical data models and how they relate to each other.
- Finally, it briefly introduces the concepts of data integrity, OLAP, and different OLAP server types like MOLAP, ROLAP, and HOLAP.
2. Data Warehousing Concepts
What is Data Warehousing?
Dimensional Data Model
Star Schema
Snowflake Schema
Slowly Changing Dimension
Conceptual Data Model
Logical Data Model
Physical Data Model
Conceptual, Logical, and Physical Data Model
Data Integrity
What is OLAP?
MOLAP, ROLAP, and HOLAP
3. What is Data Warehousing?
Different people have different definitions for a data warehouse. The most popular
definition came from Bill Inmon, who provided the following:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile
collection of data in support of management's decision making process.
A process of transforming data into information and making it available to users in a timely enough manner to make a difference.
4. To summarize ...
• OLTP systems are used to “run” a business
• The data warehouse helps to “optimize” the business
5. Corporate Data
It includes
• human resource data
• financial data
• facilities data
• sales data
• marketing expense data
• production planning cost
• manufacturing cost
• service delivery cost
• inventory management
• shipping and payment data
What is enterprise-wide corporate data?
How is business intelligence used in retail banking or in the retail industry?
6. KPIs
A KPI (Key Performance Indicator) can be used as a performance measurement tool.
KPIs in retail banking:
Total cash deposits held in a month
Average annual deposits held
Growth in the average number of deposits per retail bank
Average withdrawals made by each depositor
Ratio of active to dormant depositors
Average number of defaulting borrowers in a year
Average number of credit cards issued by the retail bank
Rate of borrowing risk
Rate of default risk
Average number of customers served in a day
Average number of closed bank accounts
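A few of these KPIs are simple aggregations over transaction data. The sketch below is only an illustration: the `Transaction` record, its `kind` encoding ("deposit"/"withdrawal"), the "YYYY-MM" month label, and the "active"/"dormant" status values are all assumptions, and "average withdrawals" is read as the average number of withdrawal transactions per depositor.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    depositor_id: str
    kind: str      # assumed encoding: "deposit" or "withdrawal"
    amount: float
    month: str     # assumed "YYYY-MM" period label

def total_deposits_in_month(txns, month):
    """Total cash deposited in the given month."""
    return sum(t.amount for t in txns if t.kind == "deposit" and t.month == month)

def avg_withdrawals_per_depositor(txns):
    """Average number of withdrawal transactions per depositor seen in txns."""
    depositors = {t.depositor_id for t in txns}
    withdrawals = sum(1 for t in txns if t.kind == "withdrawal")
    return withdrawals / len(depositors) if depositors else 0.0

def active_to_dormant_ratio(status_by_depositor):
    """Ratio of active to dormant depositors (status values are assumed)."""
    statuses = list(status_by_depositor.values())
    dormant = statuses.count("dormant")
    return statuses.count("active") / dormant if dormant else float("inf")

# Hypothetical sample data
txns = [
    Transaction("A", "deposit", 500.0, "2012-01"),
    Transaction("A", "withdrawal", 100.0, "2012-01"),
    Transaction("B", "deposit", 300.0, "2012-01"),
    Transaction("B", "withdrawal", 50.0, "2012-02"),
    Transaction("B", "withdrawal", 20.0, "2012-02"),
]
print(total_deposits_in_month(txns, "2012-01"))   # 800.0
print(avg_withdrawals_per_depositor(txns))        # 1.5
print(active_to_dormant_ratio({"A": "active", "B": "active", "C": "dormant"}))  # 2.0
```

In a real warehouse these would be SQL aggregates over a deposit fact table; plain Python keeps the arithmetic visible.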
7. KPIs
A KPI (Key Performance Indicator) can be used as a performance measurement tool.
KPIs in the retail industry:
• Sales compared to Budget & Target
• Sales compared to last year (or any other period)
• Wage cost recovery
• Average sale per customer/transaction
• Units per customer/transaction
• Sales per hour
• Sales & Gross Margin
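Most of the retail KPIs above are ratios of daily store totals. This is a minimal sketch assuming the totals are already available; the function names and the one-day figures are hypothetical, and "gross margin" is taken here as (sales - cost of goods sold) / sales.

```python
def average_sale_per_transaction(sales, transactions):
    return sales / transactions

def units_per_transaction(units, transactions):
    return units / transactions

def sales_per_hour(sales, trading_hours):
    return sales / trading_hours

def percent_vs(actual, reference):
    """Percent above (+) or below (-) a reference such as budget, target, or last year."""
    return (actual - reference) * 100 / reference

def gross_margin_percent(sales, cost_of_goods_sold):
    return (sales - cost_of_goods_sold) * 100 / sales

# Hypothetical one-day figures for a single store
sales, transactions, units, hours = 1200.0, 80, 200, 8
print(average_sale_per_transaction(sales, transactions))  # 15.0
print(units_per_transaction(units, transactions))         # 2.5
print(sales_per_hour(sales, hours))                       # 150.0
print(percent_vs(sales, 1000.0))                          # 20.0 (vs. last year)
print(gross_margin_percent(sales, 720.0))                 # 40.0
```

The same `percent_vs` function covers "compared to budget", "compared to target", and "compared to last year"; only the reference value changes.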
8. KPIs (Key Performance Indicators)
Examples of common departmental KPIs
Sales Growth
Analyze the pace at which your organization's sales revenue is growing and use that information in strategic decision-making.
Marketing
Financial
Measures your organization's financial health by analyzing readily available resources that could be used to meet any short-term obligations.
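The financial KPI described above matches the standard liquidity ratios. A minimal sketch, assuming the current ratio (current assets / current liabilities) and its stricter variant, the quick ratio (which excludes inventory), are the measures intended; the balance-sheet figures are hypothetical.

```python
def current_ratio(current_assets, current_liabilities):
    """Current assets available per unit of short-term obligations."""
    return current_assets / current_liabilities

def quick_ratio(current_assets, inventory, current_liabilities):
    """Stricter variant: excludes inventory, which may be slow to turn into cash."""
    return (current_assets - inventory) / current_liabilities

# Hypothetical balance-sheet figures
print(current_ratio(250_000, 125_000))          # 2.0
print(quick_ratio(250_000, 50_000, 125_000))    # 1.6
```

A ratio above 1 means short-term assets exceed short-term obligations.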
12. Data Quality
• Duplicate data
• Inconsistent values
• Missing data
• Unexpected use of fields
• Impossible or wrong values
Validations for Data Cleansing
• Data-type constraints
• Range constraints
• Mandatory constraints
• Unique constraints
• Set-membership constraints
• Foreign-key constraints
• Regular expression patterns
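Each validation type above can be written as a simple rule applied to incoming records before they are loaded. The sketch below assumes a hypothetical customer record; the field names, the allowed gender codes, the set of valid store IDs (standing in for a parent table), the 0-120 age range, and the 5-digit ZIP pattern are all illustrative assumptions.

```python
import re

# Hypothetical rules for a customer record; adjust to the real source system.
MANDATORY_FIELDS = ("customer_id", "store_id", "zip")
ALLOWED_GENDERS = {"M", "F", "U"}         # set-membership rule (assumed codes)
VALID_STORE_IDS = {1, 2, 3}               # stands in for the parent Store table
ZIP_PATTERN = re.compile(r"\d{5}")        # assumed 5-digit ZIP format

def validate_row(row):
    """Return a list of rule violations for one record (empty list = clean)."""
    errors = []
    for field in MANDATORY_FIELDS:                          # mandatory constraint
        if row.get(field) in (None, ""):
            errors.append(f"{field}: missing")
    age = row.get("age")
    if not isinstance(age, int) or isinstance(age, bool):   # data-type constraint
        errors.append("age: not an integer")
    elif not 0 <= age <= 120:                               # range constraint
        errors.append("age: out of range")
    if row.get("gender") not in ALLOWED_GENDERS:            # set-membership constraint
        errors.append("gender: not in allowed set")
    if row.get("store_id") not in VALID_STORE_IDS:          # foreign-key constraint
        errors.append("store_id: no matching store")
    if not ZIP_PATTERN.fullmatch(str(row.get("zip", ""))):  # regular expression pattern
        errors.append("zip: bad format")
    return errors

def duplicate_keys(rows, key="customer_id"):
    """Unique constraint: return key values that occur more than once."""
    seen, dupes = set(), set()
    for row in rows:
        value = row.get(key)
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return dupes

good = {"customer_id": 10, "store_id": 1, "zip": "12345", "age": 34, "gender": "F"}
bad = {"customer_id": 11, "store_id": 9, "zip": "12A45", "age": 150, "gender": "X"}
print(validate_row(good))                # []
print(validate_row(bad))
# ['age: out of range', 'gender: not in allowed set', 'store_id: no matching store', 'zip: bad format']
print(duplicate_keys([good, bad, good]))  # {10}
```

Records that fail are typically routed to an error table for review rather than silently dropped.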
13. Views for Designing a Data Warehouse
• The top-down view
• The data source view
• The data warehouse view
• The business query view
Which approach is better for designing a data warehouse?
16. Data Warehousing Design
• Requirement Gathering
• Physical Environment Setup
• Data Modeling
• ETL
• OLAP Cube Design
• Front End Development
• Report Development
• Performance Tuning
• Query Optimization
• Quality Assurance
• Rolling out to Production
• Production Maintenance
• Incremental Enhancements
17. Why Data Warehousing?
Need to see the daily, weekly, monthly, and quarterly profit of each store.
Comparison of sales and profit across various time periods.
Comparison of sales across various time bands of the day.
Need to know which product is in the most demand at which location.
Need to study the trend of sales by time of day over the week, month, and year.
On which day are sales highest?
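Questions like these become GROUP BY queries once sales are stored with a date, a store, and a product. A minimal sketch using Python's built-in `sqlite3`; the single `sales` table, its columns, and the rows are hypothetical, and a real warehouse would use a star schema rather than one flat table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, product TEXT, sale_date TEXT, amount REAL, cost REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    ("Store 1", "Burger", "2012-01-01", 100.0, 60.0),
    ("Store 1", "Fries",  "2012-01-02", 40.0, 15.0),
    ("Store 2", "Burger", "2012-01-01", 80.0, 50.0),
    ("Store 2", "Burger", "2012-02-05", 90.0, 55.0),
])

# Monthly profit of each store (use '%Y-%m-%d' instead for daily profit)
monthly_profit = conn.execute("""
    SELECT store, strftime('%Y-%m', sale_date) AS month, SUM(amount - cost)
    FROM sales GROUP BY store, month ORDER BY store, month
""").fetchall()

# Sales by day of week; SQLite's '%w' gives '0' for Sunday through '6' for Saturday
sales_by_weekday = conn.execute("""
    SELECT strftime('%w', sale_date) AS weekday, SUM(amount)
    FROM sales GROUP BY weekday ORDER BY SUM(amount) DESC
""").fetchall()

# Demand for each product at each location, highest first within a store
demand = conn.execute("""
    SELECT store, product, SUM(amount)
    FROM sales GROUP BY store, product ORDER BY store, SUM(amount) DESC
""").fetchall()

print(monthly_profit)    # [('Store 1', '2012-01', 65.0), ('Store 2', '2012-01', 30.0), ('Store 2', '2012-02', 35.0)]
print(sales_by_weekday)  # [('0', 270.0), ('1', 40.0)]
```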
18. Phases of Data Warehousing Project
1. Identify and collect requirements
Need to see the daily, weekly, monthly, and quarterly profit of each store.
Comparison of sales and profit across various time periods.
Comparison of sales across various time bands of the day.
Need to know which product is in the most demand at which location.
Need to study the trend of sales by time of day over the week, month, and year.
On which day are sales highest?
Who collects the requirements?
Business analysts and leads.
19. Phases of Data Warehousing Project
2. Design the dimensional model
Fact table: Pharmacy_Claims_Fact
Drug_Id (FK), Org_Id (FK), Practitioner_Id (FK), Product_Id (FK), Time_ID (FK), Claim_status_Id (FK), Provider_Id (FK), Subscriber_id (FK), Demographic_key (FK), InsuranceType_Id (FK), Incurred_Date, Claim_Date, Claim_Settled_Date, Days_Supply, Dispensing_Fee, Incentive_Savings_Amount, Incentive_Fee_Paid_Amount, Amount_Claimed, Amount_Paid, Amount_Pending, Amount_Adjusted, CoPayment_Amount, CoInsurance_Amount, Deductible, Refill_Indicator, Claim_Production_Key, Claim_Production_Txn_No, Status_Change_Date, Last_Record_Flag
Dimension tables:
Practitioner: Practitioner_Id, Practitioner_Name, Practitioner_Type, practitioner_type_desc, Qualification, Specialisation, ssn, Medical_Assoc_Enroll_No
Organisation: Org_Id, Org_prod_id, Org_Name, Address, City, County, State, Zip, Industry_Classification
Subscriber: Subscriber_id, Subscriber_prod_key, Member_prod_key, Member_Name, Date_of_Birth, Subscriber_type, Address, City, County, State, Zip, Hobby1, Hobby2, Smoker_YN, Alcoholic_YN, Pre_Existing_Ailments
Demographics: Demographic_key, Age_group, Income_group, Race, Country_of_birth, Marital_status, Gender, Citizenship_status
Provider: Provider_Id, Provider_Name, Provider_Type, Address, City, County, State, Zip, Service_Area, Network_Provider
Insurance_Type: InsuranceType_Id, InsuranceType_Name, InsuranceType_Desc
Product: Product_Id, Product_Name, Product_Category, LoB
Claim_Status: Claim_status_Id, Claim_Status_Reason, Claim_stat_catg
Time: Time_ID, Day, Week, Month, Quarter, Year, Season
Drugs: Drug_Id, Drug_Name_Generic, Drug_Name_Trade, National_Drug_Code, Drug_Description, Drug_Category, Formulary, Manufacturer
The data model is designed by data modelers.
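To make the star shape concrete, here is a sketch of a small subset of this model in SQLite: the fact table with two of its foreign keys and two of its measures, plus the Drugs and Time dimensions. The column data types, the reduced column lists, and the sample rows ("Drug A", "Category X", and so on) are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Drugs (
    Drug_Id INTEGER PRIMARY KEY,
    Drug_Name_Generic TEXT,
    Drug_Category TEXT
);
CREATE TABLE Time (
    Time_ID INTEGER PRIMARY KEY,
    Day INTEGER, Month INTEGER, Quarter INTEGER, Year INTEGER
);
-- Subset of Pharmacy_Claims_Fact: foreign keys to the dimensions plus numeric measures
CREATE TABLE Pharmacy_Claims_Fact (
    Drug_Id INTEGER REFERENCES Drugs(Drug_Id),
    Time_ID INTEGER REFERENCES Time(Time_ID),
    Amount_Claimed REAL,
    Amount_Paid REAL
);
""")
conn.executemany("INSERT INTO Drugs VALUES (?, ?, ?)",
                 [(1, "Drug A", "Category X"), (2, "Drug B", "Category Y")])
conn.executemany("INSERT INTO Time VALUES (?, ?, ?, ?, ?)",
                 [(1, 15, 1, 1, 2012), (2, 20, 4, 2, 2012)])
conn.executemany("INSERT INTO Pharmacy_Claims_Fact VALUES (?, ?, ?, ?)",
                 [(1, 1, 120.0, 100.0), (1, 2, 80.0, 80.0), (2, 1, 50.0, 40.0)])

# Typical star-join: slice the fact measures by attributes of the dimensions
totals = conn.execute("""
    SELECT d.Drug_Category, t.Year, SUM(f.Amount_Paid)
    FROM Pharmacy_Claims_Fact f
    JOIN Drugs d ON f.Drug_Id = d.Drug_Id
    JOIN Time t ON f.Time_ID = t.Time_ID
    GROUP BY d.Drug_Category, t.Year
    ORDER BY d.Drug_Category
""").fetchall()
print(totals)  # [('Category X', 2012, 180.0), ('Category Y', 2012, 40.0)]
```

Every query follows the same pattern: join the fact table to the dimensions it references, filter or group by dimension attributes, and aggregate the measures.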
20. Phases of Data Warehousing Project
3. Create and Maintain the tables
The database is maintained by DBAs.
21. Phases of Data Warehousing Project
4. Loading Data into the Data Warehouse and Data Marts
Handled by the ETL team.
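The ETL team's work follows an extract, transform, load pattern. The sketch below is a toy version under stated assumptions: the source rows, the dd/mm/yyyy source date format, the cleansing rules (trim, title-case, reject rows with no quantity), and the table names `food_dim`, `store_dim`, `day_dim`, and `sales_fact` are all hypothetical. It shows the surrogate-key lookup step, where each business value is replaced by the key of its dimension row.

```python
import sqlite3
from datetime import datetime

# Hypothetical source extract (e.g., rows read from an OLTP system or a CSV file)
source_rows = [
    {"store": " store 1 ", "food": "burger", "sold_on": "01/01/2012", "qty": "120"},
    {"store": "Store 2", "food": "FRIES", "sold_on": "01/01/2012", "qty": "75"},
    {"store": "Store 2", "food": "fries", "sold_on": "02/01/2012", "qty": ""},  # rejected
]

def transform(row):
    """Cleanse one source row; return None if it fails validation."""
    if not row["qty"].strip():
        return None
    return {
        "store": row["store"].strip().title(),
        "food": row["food"].strip().title(),
        "day": datetime.strptime(row["sold_on"], "%d/%m/%Y").date().isoformat(),
        "qty": int(row["qty"]),
    }

def get_or_create_key(conn, table, name):
    """Surrogate-key lookup; table names come from a fixed internal list only."""
    found = conn.execute(f"SELECT dim_key FROM {table} WHERE name = ?", (name,)).fetchone()
    if found:
        return found[0]
    return conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,)).lastrowid

def run_etl(conn, rows):
    for table in ("food_dim", "store_dim", "day_dim"):
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} "
                     "(dim_key INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute("CREATE TABLE IF NOT EXISTS sales_fact "
                 "(food_key INTEGER, store_key INTEGER, day_key INTEGER, qty_sold INTEGER)")
    rejected = 0
    for raw in rows:                    # Extract: iterate over the source rows
        clean = transform(raw)          # Transform: cleanse and standardize
        if clean is None:
            rejected += 1
            continue
        conn.execute("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", (  # Load
            get_or_create_key(conn, "food_dim", clean["food"]),
            get_or_create_key(conn, "store_dim", clean["store"]),
            get_or_create_key(conn, "day_dim", clean["day"]),
            clean["qty"],
        ))
    conn.commit()
    return rejected

conn = sqlite3.connect(":memory:")
rejected = run_etl(conn, source_rows)
print(rejected)                                             # 1
print(conn.execute("SELECT * FROM sales_fact").fetchall())  # [(1, 1, 1, 120), (2, 2, 1, 75)]
```

Production ETL tools add scheduling, incremental loads, and error tables, but the extract/cleanse/key-lookup/load sequence is the same.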
23. Phases of Data Warehousing Project
5. Develop Reports / Dashboards
Handled by the reporting team.
24. Phases of Data Warehousing Project
6. Testing ETL Mappings and Reports / Dashboards
Handled by the QA department.
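One common check when testing ETL mappings is source-to-target reconciliation: comparing row counts, keys, and column totals between the source extract and the loaded table. A minimal sketch; the function name, the `claim_id` key, the `amount` field, and the sample rows are hypothetical.

```python
import math

def reconcile(source_rows, target_rows, key_field, amount_field):
    """Compare counts, keys, and totals between a source extract and the loaded target."""
    src_keys = {r[key_field] for r in source_rows}
    tgt_keys = {r[key_field] for r in target_rows}
    src_total = sum(r[amount_field] for r in source_rows)
    tgt_total = sum(r[amount_field] for r in target_rows)
    return {
        "count_match": len(source_rows) == len(target_rows),
        "missing_in_target": sorted(src_keys - tgt_keys),
        "unexpected_in_target": sorted(tgt_keys - src_keys),
        "amount_match": math.isclose(src_total, tgt_total),
    }

source = [{"claim_id": 1, "amount": 100.0}, {"claim_id": 2, "amount": 50.0},
          {"claim_id": 3, "amount": 25.0}]
target = [{"claim_id": 1, "amount": 100.0}, {"claim_id": 2, "amount": 50.0}]
print(reconcile(source, target, "claim_id", "amount"))
# {'count_match': False, 'missing_in_target': [3], 'unexpected_in_target': [], 'amount_match': False}
```

If the ETL intentionally rejects rows, the tester compares the target against the source minus the documented rejects.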
7. Deploying to Production and Maintenance by the Production Team
Handled by the production department.
25. Phases of Data Warehousing Project
Where do we fit after completing this training?
We can work as:
1. ETL Developer
2. ETL Administrator
3. ETL Tester
27. What is Data Modeling?
• A data model defines the data and the relationships between data elements.
• The dimensional data model is the model most often used in data warehousing systems.
• Data modeling is the process of learning about the data.
Data models are designed by data modelers.
28. What is Dimensional Modeling?
• It helps us store data in a way that is optimized for querying and analysis
Goals and benefits of Dimensional Modeling
• Faster data retrieval
• Better understandability
• Extensibility
It has two distinct categories:
• Dimensions
• Measures
29. Scenarios of Dimensional Data Modeling
McDonald’s client:
I want to store information about how many burgers and fries are sold per day at a single McDonald’s outlet.
What is a dimension and what is a measure in this example?
Step 1: Identify the dimensions
1. Food (e.g., burgers and fries)
2. Store (McDonald’s)
3. Day
Step 2: Identify the measures
The number of burgers/fries sold is a measure.
The fact table captures the data that measures the organization's business operations.
30. Scenarios of Dimensional Data Modeling
Step 3: Identify the attributes or properties of the dimensions
Food dimension
KEY  NAME
1    Burger
2    Fries
Store dimension
KEY  NAME
1    Store 1
2    Store 2
...  ...
Day dimension
KEY  DAY
1    01 Jan 2012
2    02 Jan 2012
3    03 Jan 2012
...  ...
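Putting Steps 1 to 3 together gives a small star schema: three dimension tables keyed as above and one fact table holding the measure. A sketch in SQLite; the table and column names (`food_dim`, `sales_fact`, `units_sold`, ...), the ISO date format, and the sales quantities are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE food_dim  (food_key  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE day_dim   (day_key   INTEGER PRIMARY KEY, day  TEXT);
-- Fact table: one row per food item, per store, per day; units_sold is the measure
CREATE TABLE sales_fact (
    food_key   INTEGER REFERENCES food_dim(food_key),
    store_key  INTEGER REFERENCES store_dim(store_key),
    day_key    INTEGER REFERENCES day_dim(day_key),
    units_sold INTEGER
);
INSERT INTO food_dim  VALUES (1, 'Burger'), (2, 'Fries');
INSERT INTO store_dim VALUES (1, 'Store 1'), (2, 'Store 2');
INSERT INTO day_dim   VALUES (1, '2012-01-01'), (2, '2012-01-02'), (3, '2012-01-03');
INSERT INTO sales_fact VALUES (1, 1, 1, 120), (2, 1, 1, 200), (1, 2, 1, 90), (1, 1, 2, 110);
""")

# "How many burgers did each store sell on 01 Jan 2012?"
burgers = conn.execute("""
    SELECT s.name, SUM(f.units_sold)
    FROM sales_fact f
    JOIN food_dim fd ON f.food_key = fd.food_key
    JOIN store_dim s ON f.store_key = s.store_key
    JOIN day_dim d  ON f.day_key = d.day_key
    WHERE fd.name = 'Burger' AND d.day = '2012-01-01'
    GROUP BY s.name ORDER BY s.name
""").fetchall()
print(burgers)  # [('Store 1', 120), ('Store 2', 90)]
```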
31. Scenarios of Dimensional Data Modeling
Step 4: Identify the granularity of the measures
What is meant by "Granularity"?
Granularity refers to the lowest (most detailed) level of information stored in a table. In this example, the grain is the number of items sold per food item, per store, per day.
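The grain limits which questions the fact table can answer: data stored per day can always be rolled up to coarser levels (month, store), but it cannot be broken down below a day (for example, by hour). A minimal sketch with hypothetical day-grain rows and a hypothetical `roll_up` helper:

```python
from collections import defaultdict

# Fact rows at day grain: (store, food, day, units_sold); values are made up
daily_facts = [
    ("Store 1", "Burger", "2012-01-01", 120),
    ("Store 1", "Burger", "2012-01-02", 110),
    ("Store 1", "Fries",  "2012-01-01", 200),
    ("Store 2", "Burger", "2012-01-01", 90),
]

def roll_up(facts, key_fn):
    """Aggregate units_sold to a coarser grain defined by key_fn."""
    totals = defaultdict(int)
    for store, food, day, units in facts:
        totals[key_fn(store, food, day)] += units
    return dict(totals)

# Coarser grain: store x food x month (day[:7] keeps "YYYY-MM")
monthly = roll_up(daily_facts, lambda store, food, day: (store, food, day[:7]))
# Coarsest grain used here: store only
per_store = roll_up(daily_facts, lambda store, food, day: store)
print(monthly)    # {('Store 1', 'Burger', '2012-01'): 230, ('Store 1', 'Fries', '2012-01'): 200, ('Store 2', 'Burger', '2012-01'): 90}
print(per_store)  # {'Store 1': 430, 'Store 2': 90}
```

Nothing in these rows records the hour of a sale, so an hourly question would need the fact table to be loaded at a finer grain.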
32. Scenarios of Dimensional Data Modeling
Step 5: History Preservation (Optional)
If the history of changes to dimension attributes must be preserved, design the dimension tables as "slowly changing dimensions" (SCDs).
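One widely used slowly-changing-dimension technique is Type 2: instead of overwriting a changed attribute, the current row is closed with an end date and a new row with a new surrogate key is added. The slide does not say which SCD type it means, so treat this as one option. The sketch keeps the dimension as an in-memory list of dicts; the housekeeping column names (`surrogate_key`, `start_date`, `end_date`, `is_current`) and the sample customer are assumptions.

```python
from datetime import date

# Housekeeping columns added by the Type 2 pattern (names are assumptions)
META = {"surrogate_key", "start_date", "end_date", "is_current"}

def apply_scd2(dimension, natural_key, new_attrs, change_date):
    """Record a change to one dimension member without overwriting its history."""
    current = next(
        (r for r in dimension if r["natural_key"] == natural_key and r["is_current"]),
        None,
    )
    if current is not None:
        if all(current.get(k) == v for k, v in new_attrs.items()):
            return  # nothing changed, so no new version is needed
        current["end_date"] = change_date   # close the old version
        current["is_current"] = False
        attrs = {k: v for k, v in current.items() if k not in META}
    else:
        attrs = {"natural_key": natural_key}  # first time this member is seen
    attrs.update(new_attrs)
    next_key = max((r["surrogate_key"] for r in dimension), default=0) + 1
    dimension.append({"surrogate_key": next_key, **attrs,
                      "start_date": change_date, "end_date": None, "is_current": True})

customers = [{"surrogate_key": 1, "natural_key": "C001", "name": "Christina",
              "state": "Illinois", "start_date": date(2010, 1, 1),
              "end_date": None, "is_current": True}]
apply_scd2(customers, "C001", {"state": "California"}, date(2012, 1, 15))
for row in customers:
    print(row["surrogate_key"], row["state"], row["start_date"], row["end_date"], row["is_current"])
# 1 Illinois 2010-01-01 2012-01-15 False
# 2 California 2012-01-15 None True
```

Fact rows loaded before the change keep pointing at surrogate key 1, so old sales still report under Illinois while new ones report under California.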
Entities:
Entities are the things about which you want to store information.
For example: EMPLOYEE
33. Scenarios of Dimensional Data Modeling
Cardinalities:
Cardinality describes how many instances on one side of a relationship can be related to a single instance on the other side.
For example:
• How many customers belong to 1 sale?
• How many sales belong to 1 customer?
• How many sales take place in 1 shop?
Customers --> Sales: 1 customer can buy something several times
Sales --> Customers: each sale is always made by exactly 1 customer
Customers --> Products: 1 customer can buy multiple products
Products --> Customers: 1 product can be purchased by multiple customers
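The cardinality rules above can be checked against sample data by counting, for each value on one side, how many distinct values it pairs with on the other side. A sketch with a hypothetical `cardinality` helper and made-up sales rows. Note that sample data can only reveal violations: seeing no "many" does not prove a relationship is one-to-one in general.

```python
from collections import defaultdict

def cardinality(pairs):
    """Classify the observed relationship between left and right values in (left, right) pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    left_many = any(len(s) > 1 for s in rights_per_left.values())   # a left value has several rights
    right_many = any(len(s) > 1 for s in lefts_per_right.values())  # a right value has several lefts
    if left_many and right_many:
        return "many-to-many"
    if left_many:
        return "one-to-many"
    if right_many:
        return "many-to-one"
    return "one-to-one"

# Hypothetical sales: (sale_id, customer, product)
sales = [("S1", "Alice", "Burger"), ("S2", "Alice", "Fries"), ("S3", "Bob", "Burger")]
print(cardinality((c, s) for s, c, p in sales))  # one-to-many (customers --> sales)
print(cardinality((c, p) for s, c, p in sales))  # many-to-many (customers <--> products)
```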