Data Warehouse Concepts | Data Warehouse Tutorial | Data Warehousing | Edureka
Copyright © 2017, edureka and/or its affiliates. All rights reserved.
Agenda
▪ What Is Data Warehousing?
▪ Data Warehousing Concepts:
▪ OLAP (On-Line Analytical Processing)
▪ Types Of OLAP Cubes
▪ Dimensions, Facts & Measures
▪ Schemas
What Is Data Warehousing?
Let’s first understand what Data Warehousing is, why it’s needed, and what benefits it adds.
What Is A Data Warehouse?
➢ A Data Warehouse is like a relational database, but designed for analytical rather than transactional needs.
➢ It functions on the basis of OLAP (Online Analytical Processing).
➢ It is a central location where consolidated data from multiple locations (databases) is stored.
[Diagram: Data Warehouse supporting Data Analysis & Visualization]
What Is Data Warehousing?
➢ Data Warehousing is the act of organizing & storing data so that its retrieval is efficient and insightful.
➢ It is also described as the process of transforming data into information.
Data Warehousing Concepts
Now let’s look at the key concepts around Data Warehousing: OLAP, Dimensions, Facts & Schemas.
OLAP (Online Analytical Processing)
➢ OLAP is a flexible way to perform complex analysis of multidimensional data.
➢ A data warehouse (DWH) is modeled on the concept of OLAP, whereas operational databases (DBs) are modeled on the concept of OLTP (Online Transaction Processing).
➢ OLTP systems use data stored in the form of two-dimensional tables, with rows and columns.
[Diagram: OLTP vs. OLAP]
Advantages Of OLAP Over OLTP
1. Opens up new views of looking at data.
2. Supports filtering/sorting of data.
3. Data can be refined.
Types Of OLAP Cubes
1. MOLAP (Multidimensional OLAP) is a form of OLAP that processes and stores the data directly in a multidimensional database.
Advantage: Excellent performance; can perform complex calculations.
Disadvantage: Only a limited amount of data can be handled.
2. ROLAP (Relational OLAP) is a form of OLAP that performs dynamic multidimensional analysis of data stored in a relational database rather than in a multidimensional database.
Advantage: A greater amount of data can be processed.
Disadvantage: Requires more processing time/disk space.
3. HOLAP (Hybrid OLAP) combines the advantages of MOLAP and ROLAP.
Advantage: HOLAP can "drill through" from the cube into the underlying relational data.
OLAP Operations:- Roll-up
Roll-up performs aggregation on a data cube by either:
1. Climbing up a concept hierarchy for a dimension
2. Dimension reduction
The following diagram illustrates how roll-up works.
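Both forms of roll-up can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cube contents, the city-to-country hierarchy and the function names are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, quarter, product) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 10,
    ("Toronto", "Q2", "Mobile"): 15,
    ("Vancouver", "Q1", "Mobile"): 5,
    ("Chicago", "Q1", "Modem"): 20,
    ("New York", "Q2", "Modem"): 30,
}

# Assumed concept hierarchy for the location dimension: city -> country.
city_to_country = {
    "Toronto": "Canada",
    "Vancouver": "Canada",
    "Chicago": "USA",
    "New York": "USA",
}

def roll_up_location(cells):
    """Climb the hierarchy: aggregate city-level cells to country level."""
    result = defaultdict(int)
    for (city, quarter, product), units in cells.items():
        result[(city_to_country[city], quarter, product)] += units
    return dict(result)

def roll_up_drop_quarter(cells):
    """Dimension reduction: aggregate away the quarter dimension."""
    result = defaultdict(int)
    for (location, _quarter, product), units in cells.items():
        result[(location, product)] += units
    return dict(result)

by_country = roll_up_location(cube)
by_city_product = roll_up_drop_quarter(cube)
```

In both cases the result has fewer, coarser cells than the input: the first keeps three dimensions but at country level, the second keeps city-level detail but loses the time dimension.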
OLAP Operations:- Drill-down
Drill-down is the reverse operation of roll-up.
It is performed by either:
1. Stepping down a concept hierarchy for a dimension
2. Introducing a new dimension.
The following diagram illustrates how drill-down works.
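A minimal sketch of stepping down a hierarchy, assuming month-level detail records are available underneath a quarter-level view. The records, the month-to-quarter mapping and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail records: (city, month, units sold).
records = [
    ("Toronto", "Jan", 4),
    ("Toronto", "Feb", 3),
    ("Toronto", "Mar", 3),
    ("Toronto", "Apr", 6),
    ("Chicago", "Jan", 8),
    ("Chicago", "May", 7),
]

# Assumed concept hierarchy for the time dimension: month -> quarter.
month_to_quarter = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
}

def view_by_quarter(rows):
    """Summary view: units per (city, quarter)."""
    totals = defaultdict(int)
    for city, month, units in rows:
        totals[(city, month_to_quarter[month])] += units
    return dict(totals)

def drill_down(rows, city, quarter):
    """Step down the hierarchy: break one (city, quarter) cell into months."""
    detail = defaultdict(int)
    for row_city, month, units in rows:
        if row_city == city and month_to_quarter[month] == quarter:
            detail[month] += units
    return dict(detail)

summary = view_by_quarter(records)
toronto_q1 = drill_down(records, "Toronto", "Q1")
```

The monthly values returned by `drill_down` add up to the corresponding cell of the quarterly summary, which is what makes drill-down the reverse of roll-up.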
OLAP Operations:- Slice
The slice operation selects a single value along one particular dimension of a given cube, producing a new sub-cube.
Consider the following diagram that shows how slice works.
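As a rough sketch, a slice can be written as a filter that fixes one dimension to a single value. The cube data and the `slice_cube` helper below are hypothetical, chosen only for illustration.

```python
# Hypothetical cube cells: (city, quarter, product) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 10,
    ("Toronto", "Q2", "Mobile"): 15,
    ("Vancouver", "Q1", "Mobile"): 5,
    ("Chicago", "Q1", "Modem"): 20,
    ("New York", "Q2", "Modem"): 30,
}

def slice_cube(cells, axis, value):
    """Fix one dimension (by position) to a single value and drop it from the keys."""
    return {
        key[:axis] + key[axis + 1:]: units
        for key, units in cells.items()
        if key[axis] == value
    }

# Slice on the quarter dimension (position 1) for Q1.
q1_slice = slice_cube(cube, 1, "Q1")
```

The resulting sub-cube has one dimension fewer, because the sliced dimension now holds a single fixed value.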
OLAP Operations:- Dice
The dice operation selects values on two or more dimensions of a given cube, producing a new sub-cube.
Consider the following diagram that shows the dice operation.
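A dice can be sketched the same way as a filter over several dimensions at once. The cube data, the `dice` helper and its `criteria` argument are hypothetical names used only for this example.

```python
# Hypothetical cube cells: (city, quarter, product) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 10,
    ("Toronto", "Q2", "Mobile"): 15,
    ("Vancouver", "Q1", "Mobile"): 5,
    ("Chicago", "Q1", "Modem"): 20,
    ("New York", "Q2", "Modem"): 30,
}

def dice(cells, criteria):
    """Keep cells whose value on every listed dimension is in the allowed set.

    criteria maps a dimension position to the set of values to keep.
    """
    return {
        key: units
        for key, units in cells.items()
        if all(key[axis] in allowed for axis, allowed in criteria.items())
    }

# Dice on two dimensions: city in {Toronto, Vancouver} and quarter in {Q1}.
sub_cube = dice(cube, {0: {"Toronto", "Vancouver"}, 1: {"Q1"}})
```

Unlike the slice, the diced sub-cube keeps all of its dimensions; only the range of values on the selected dimensions shrinks.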
OLAP Operations:- Pivot
The pivot operation is also known as the rotation operation.
It rotates the axes of the cube to provide an alternative presentation of the data.
Consider the following diagram that shows the pivot operation.
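For a two-dimensional view, pivoting amounts to swapping rows and columns. The sketch below assumes a hypothetical view stored as nested dictionaries; the data and the `pivot` function are illustrative only.

```python
# Hypothetical 2-D view: rows are cities, columns are quarters.
view = {
    "Toronto": {"Q1": 10, "Q2": 15},
    "Chicago": {"Q1": 20, "Q2": 0},
}

def pivot(table):
    """Rotate the view so that column labels become row labels and vice versa."""
    rotated = {}
    for row_label, row in table.items():
        for col_label, value in row.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated

by_quarter = pivot(view)
```

No values change during a pivot; only the orientation of the presentation does, so pivoting twice returns the original view.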
Dimensions
➢ The tables that describe the dimensions involved are called Dimension tables.
➢ Dividing a Data Warehouse project into dimensions provides structured information for analysis & reporting.
Subject: E-commerce Company
Dimensions: Customer, Product, Date
Attributes:
Customer: ID, Name, Address
Product: ID, Name, Type
Date: Order date, Shipment date, Delivery date
Dimensions
➢ End users run queries against these dimension tables, which contain descriptive information.
E-commerce Company

Customer             | Product            | Date
ID  Name  Address    | ID   Name  Type    | Order date  Shipment date  Delivery date
1   Rita  ABC        | 001  CD    1A      | 1/06/14     3/06/14        5/06/14
2   John  XYZ        | 002  AC    2B      | 6/06/14     9/06/14        11/06/14
3   Paul  PQR        | 003  TV    3C      | 10/06/14    14/06/14       16/06/14

[Diagram labels: Query, Result]
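A query against a dimension table might look like the following sketch, which loads the Customer rows from the table above into an in-memory SQLite database. The table and column names are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Customer dimension table with the rows shown on the slide.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, address TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Rita", "ABC"), (2, "John", "XYZ"), (3, "Paul", "PQR")],
)

# Query: look up the descriptive attributes of customer 2.
result = conn.execute(
    "SELECT name, address FROM customer WHERE id = ?", (2,)
).fetchall()
```

The result contains only descriptive attributes; numeric measures live in the fact table.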
Facts & Measures
➢ A fact is a measure that can be summed, averaged or manipulated.
➢ A Fact table contains two kinds of data: dimension keys and measures.
➢ Every Dimension table is linked to a Fact table.
Dimension: Product
Fact Table:
Product_ID: dimension key
Number of units sold: measure
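The link between a dimension key and a measure can be sketched with an in-memory SQLite database. The table names, column names and sales figures below are hypothetical; only the product IDs and names reuse the earlier example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id TEXT PRIMARY KEY, name TEXT);
-- Fact table: a dimension key (product_id) plus a measure (units_sold).
CREATE TABLE sales_fact (
    product_id TEXT REFERENCES product(product_id),
    units_sold INTEGER
);
INSERT INTO product VALUES ('001', 'CD'), ('002', 'AC'), ('003', 'TV');
INSERT INTO sales_fact VALUES ('001', 5), ('001', 7), ('002', 2), ('003', 4), ('003', 6);
""")

# The measure is summed and averaged, grouped by a dimension attribute.
totals = conn.execute("""
    SELECT p.name, SUM(f.units_sold), AVG(f.units_sold)
    FROM sales_fact AS f
    JOIN product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```

The dimension key is what lets the measure be grouped by any attribute of the Product dimension.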
Schemas
➢ A schema gives the logical description of the entire database.
➢ It gives details about the constraints placed on the tables, the key values present & how the key values are linked between the different tables.
➢ A database uses the relational model, while a data warehouse uses the Star, Snowflake or Fact Constellation schema.
Employee
ID    First Name  Last Name  Age  Dept_ID
1234  Rita        Joe        25   0674
4321  John        Smith      35   0825
5678  Paul        Brady      45   0752
7890  Rose        Michael    65   0825

Department
Dept_ID  Dept_Name
0674     Sales
0752     HR
0825     Production

The two tables are linked through Dept_ID.
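The Employee and Department example can be expressed as a small relational schema. The sketch below uses an in-memory SQLite database; the exact column types and the foreign-key declaration are assumptions, since the slide only shows the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    Dept_ID   TEXT PRIMARY KEY,   -- TEXT keeps the leading zero
    Dept_Name TEXT NOT NULL
);
CREATE TABLE employee (
    ID         INTEGER PRIMARY KEY,
    First_Name TEXT,
    Last_Name  TEXT,
    Age        INTEGER,
    Dept_ID    TEXT REFERENCES department(Dept_ID)  -- the link between the tables
);
INSERT INTO department VALUES ('0674', 'Sales'), ('0752', 'HR'), ('0825', 'Production');
INSERT INTO employee VALUES
    (1234, 'Rita', 'Joe', 25, '0674'),
    (4321, 'John', 'Smith', 35, '0825'),
    (5678, 'Paul', 'Brady', 45, '0752'),
    (7890, 'Rose', 'Michael', 65, '0825');
""")

# Follow the key link from employee to department.
production_staff = conn.execute("""
    SELECT e.First_Name, d.Dept_Name
    FROM employee AS e
    JOIN department AS d ON d.Dept_ID = e.Dept_ID
    WHERE d.Dept_Name = 'Production'
    ORDER BY e.ID
""").fetchall()

# The constraint rejects an employee that points at a non-existent department.
try:
    conn.execute("INSERT INTO employee VALUES (1111, 'Test', 'User', 30, '9999')")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True
```

The schema records both pieces of information named above: the constraint on Dept_ID and how the key values connect the two tables.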
Types Of Schemas:- Star Schema
➢ Each dimension in a star schema is represented by a single dimension table, which contains a set of attributes.
➢ The fact table is at the center. It contains keys to every dimension table & measures such as units sold and revenue.
Revenue (Fact Table): Dealer_ID, Model_ID, Branch_ID, Date_ID, Units_Sold, Revenue
Dealer (Dimension Table): Dealer_ID, Location_ID, Country_ID, Dealer_NM, Dealer_CNTCT
Branch Dim (Dimension Table): Branch_ID, Name, Address, Country
Date Dim (Dimension Table): Date_ID, Year, Month, Quarter, Date
Product (Dimension Table): Product_ID, Product_Name, Model_ID, Variant_ID
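A typical star-schema query joins the fact table directly to each dimension it needs. The sketch below keeps only two of the dimensions and a few of their columns; the table names `revenue_fact` and `date_dim` and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dealer (Dealer_ID INTEGER PRIMARY KEY, Dealer_NM TEXT);
CREATE TABLE date_dim (Date_ID INTEGER PRIMARY KEY, Year INTEGER, Quarter TEXT);
-- Fact table at the center: one key per dimension plus the measures.
CREATE TABLE revenue_fact (
    Dealer_ID  INTEGER REFERENCES dealer(Dealer_ID),
    Date_ID    INTEGER REFERENCES date_dim(Date_ID),
    Units_Sold INTEGER,
    Revenue    REAL
);
INSERT INTO dealer VALUES (1, 'North Motors'), (2, 'South Motors');
INSERT INTO date_dim VALUES (10, 2016, 'Q4'), (11, 2017, 'Q1');
INSERT INTO revenue_fact VALUES
    (1, 10, 3, 300.0),
    (1, 11, 2, 200.0),
    (2, 11, 5, 500.0),
    (2, 11, 1, 100.0);
""")

# Every dimension is exactly one join away from the fact table.
report = conn.execute("""
    SELECT d.Dealer_NM, t.Year, SUM(f.Units_Sold), SUM(f.Revenue)
    FROM revenue_fact AS f
    JOIN dealer AS d ON d.Dealer_ID = f.Dealer_ID
    JOIN date_dim AS t ON t.Date_ID = f.Date_ID
    GROUP BY d.Dealer_NM, t.Year
    ORDER BY d.Dealer_NM, t.Year
""").fetchall()
```

Because each dimension is a single table, any attribute can be reached with one join from the fact table.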
Types Of Schemas:- Snowflake Schema
➢ Dimension tables in the Snowflake schema are normalized (split into additional tables).
➢ Here, Location & Country are split out of the Dealer dimension table, and Variant is split out of the Product dimension table.
Revenue (Fact Table): Dealer_ID, Model_ID, Branch_ID, Date_ID, Units_Sold, Revenue
Dealer (Dimension Table): Dealer_ID, Location_ID, Country_ID, Dealer_NM, Dealer_CNTCT
Branch Dim (Dimension Table): Branch_ID, Name, Address, Country
Date Dim (Dimension Table): Date_ID, Year, Month, Quarter, Date
Product (Dimension Table): Product_ID, Product_Name, Model_ID, Variant_ID
Location (Dimension Table): Location_ID, Region
Country (Dimension Table): Country_ID, Country_Name
Variant (Dimension Table): Variant_ID, Variant_Name, Fuel type
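In a snowflake schema, reaching a normalized attribute takes an extra join. The sketch below follows the Dealer to Location link to group revenue by Region; the table name `revenue_fact`, the reduced column set and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized out of the Dealer dimension.
CREATE TABLE location (Location_ID INTEGER PRIMARY KEY, Region TEXT);
CREATE TABLE dealer (
    Dealer_ID   INTEGER PRIMARY KEY,
    Location_ID INTEGER REFERENCES location(Location_ID),
    Dealer_NM   TEXT
);
CREATE TABLE revenue_fact (
    Dealer_ID INTEGER REFERENCES dealer(Dealer_ID),
    Revenue   REAL
);
INSERT INTO location VALUES (1, 'East'), (2, 'West');
INSERT INTO dealer VALUES (1, 1, 'Dealer A'), (2, 1, 'Dealer B'), (3, 2, 'Dealer C');
INSERT INTO revenue_fact VALUES (1, 100.0), (2, 50.0), (3, 70.0), (3, 30.0);
""")

# Region is two joins away from the fact table: fact -> dealer -> location.
by_region = conn.execute("""
    SELECT l.Region, SUM(f.Revenue)
    FROM revenue_fact AS f
    JOIN dealer AS d ON d.Dealer_ID = f.Dealer_ID
    JOIN location AS l ON l.Location_ID = d.Location_ID
    GROUP BY l.Region
    ORDER BY l.Region
""").fetchall()
```

Normalization stores each Region once instead of repeating it per dealer, at the cost of the additional join.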
Types Of Schemas:- Galaxy Schema
➢ Also known as the Fact Constellation schema. It contains more than one fact table.
➢ Below, there are two fact tables: Revenue and Product.
➢ Dimensions that are shared between fact tables are called Conformed Dimensions.
Revenue (Fact Table): Dealer_ID, Branch_ID, Date_ID, Units_Sold, Revenue
Dealer (Dimension Table): Dealer_ID, Location_ID, Country_ID, Dealer_NM, Dealer_CNTCT
Branch Dim (Dimension Table): Branch_ID, Name, Address, Country
Date Dim (Dimension Table): Date_ID, Year, Month, Quarter, Date
Product (Dimension Table): Product_ID, Product_Name, Model_ID, Variant_ID
Product (Fact Table): Product_ID, Product_Name, Variant_ID
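The role of a conformed dimension can be sketched with two fact tables that share one date dimension. Note the assumptions: the second fact table here is a hypothetical `shipments_fact`, not the Product table from the slide, and all table names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conformed dimension shared by both fact tables.
CREATE TABLE date_dim (Date_ID INTEGER PRIMARY KEY, Year INTEGER);
CREATE TABLE revenue_fact (
    Date_ID    INTEGER REFERENCES date_dim(Date_ID),
    Units_Sold INTEGER
);
-- Hypothetical second fact table (an assumption, not from the slide).
CREATE TABLE shipments_fact (
    Date_ID       INTEGER REFERENCES date_dim(Date_ID),
    Units_Shipped INTEGER
);
INSERT INTO date_dim VALUES (10, 2016), (11, 2017);
INSERT INTO revenue_fact VALUES (10, 3), (11, 2), (11, 6);
INSERT INTO shipments_fact VALUES (10, 4), (11, 7);
""")

# Aggregate each fact table separately along the shared dimension.
sold = dict(conn.execute("""
    SELECT t.Year, SUM(f.Units_Sold)
    FROM revenue_fact AS f JOIN date_dim AS t ON t.Date_ID = f.Date_ID
    GROUP BY t.Year
""").fetchall())
shipped = dict(conn.execute("""
    SELECT t.Year, SUM(f.Units_Shipped)
    FROM shipments_fact AS f JOIN date_dim AS t ON t.Date_ID = f.Date_ID
    GROUP BY t.Year
""").fetchall())

# Combine the two results on the conformed dimension's attribute.
combined = {
    year: (sold.get(year, 0), shipped.get(year, 0))
    for year in sorted(set(sold) | set(shipped))
}
```

Aggregating each fact table on its own before combining avoids double counting, which can happen when two fact tables are joined to each other row by row.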
Session In A Minute
What Is Data Warehousing?
Dimensions, Facts & Measures
OLAP
Schemas