A company of Daimler AG
LECTURE @DHBW: DATA WAREHOUSE
PART II: DWH DATA MODELING & OLAP
ANDREAS BUCKENHOFER, DAIMLER TSS
ABOUT ME
https://de.linkedin.com/in/buckenhofer
https://twitter.com/ABuckenhofer
https://www.doag.org/de/themen/datenbank/in-memory/
http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/
https://www.xing.com/profile/Andreas_Buckenhofer2
Andreas Buckenhofer
Senior DB Professional
andreas.buckenhofer@daimler.com
Since 2009 at Daimler TSS
Department: Big Data
Business Unit: Analytics
DAIMLER TSS. IT EXCELLENCE: COMPREHENSIVE, INNOVATIVE, CLOSE.
We're a specialist and strategic business partner for innovative IT Solutions within Daimler –
not just another supplier!
As a 100% subsidiary of Daimler, we live the culture of excellence and aspire to take an
innovative and technological lead.
With our outstanding technological and methodical competence we are a competent provider of
services that help those who benefit from them to stand out from the competition. When it
comes to demanding IT questions we create impetus, especially in the core fields car IT and
mobility, information security, analytics, shared services and digital customer
experience.
Data Warehouse / DHBWDaimler TSS GmbH 3
TSS 2020: ALWAYS ON THE MOVE.
LOCATIONS
Daimler TSS China
Hub Beijing
6 Employees
Daimler TSS Malaysia
Hub Kuala Lumpur
38 Employees
Daimler TSS India
Hub Bangalore
16 Employees
Daimler TSS Germany
6 Locations
More than 1000 Employees
Ulm (Headquarters)
Stuttgart Area
Böblingen, Echterdingen,
Leinfelden, Möhringen
Berlin
After the end of this lecture you will be able to
Understand differences in data modeling between OLTP and OLAP
Understand why data modeling is important
Understand data modeling in the Core Warehouse Layer and Data Mart Layer
• Data Vault
• Dimensional Model / Star schema
Understand dimensions and facts
Understand ROLAP & MOLAP
WHAT YOU WILL LEARN TODAY
 Requirements
• Efficient update and delete operations
• Efficient read operations
• Avoid contradiction in the data – don’t store data twice or multiple times
• Easy maintenance of the data model
As little redundancy as possible in the data model
DATA MODELING FOR OLTP APPLICATIONS
First Normal Form (1NF):
• A relation/table is in first normal form if
• the domain of each attribute contains only atomic (simple, indivisible) values.
• the value of any attribute in a tuple/row must be a single value from the domain
of that attribute, i.e. no attribute values can be sets
CODD‘S NORMAL FORMS FOR DB RELATIONS: 1NF
CODD‘S NORMAL FORMS FOR DB RELATIONS: 1NF
Before (not in 1NF: the Titles column holds several values per row):
CD_ID | Album | Founded | Titles
11 | Anastacia – Not that kind | 1999 | 1. Not that kind, 2. I'm outta love, 3. Cowboys & Kisses
12 | Pink Floyd – Wish you were here | 1964 | 1. Shine on you crazy diamond
13 | Anastacia – Freak of Nature | 1999 | 1. Paid my dues
After (1NF):
CD_ID | Album | Performer | Founded | Track | Title
11 | Not that kind | Anastacia | 1999 | 1 | Not that kind
11 | Not that kind | Anastacia | 1999 | 2 | I'm outta love
11 | Not that kind | Anastacia | 1999 | 3 | Cowboys & Kisses
12 | Wish you were here | Pink Floyd | 1964 | 1 | Shine on you crazy diamond
13 | Freak of Nature | Anastacia | 1999 | 1 | Paid my dues
Second Normal Form (2NF):
• In 1st normal form
• Every non-key attribute is fully dependent on the key. There are no dependencies
between a partial key and a non-key field.
CODD‘S NORMAL FORMS FOR DB RELATIONS: 2NF
CODD‘S NORMAL FORMS FOR DB RELATIONS: 2NF
Before (1NF, key: CD_ID + Track; Album, Performer and Founded depend on CD_ID only):
CD_ID | Album | Performer | Founded | Track | Title
11 | Not that kind | Anastacia | 1999 | 1 | Not that kind
11 | Not that kind | Anastacia | 1999 | 2 | I'm outta love
11 | Not that kind | Anastacia | 1999 | 3 | Cowboys & Kisses
12 | Wish you were here | Pink Floyd | 1964 | 1 | Shine on you crazy diamond
13 | Freak of Nature | Anastacia | 1999 | 1 | Paid my dues
After (2NF):
CD_ID | Track | Title
11 | 1 | Not that kind
11 | 2 | I'm outta love
11 | 3 | Cowboys & Kisses
12 | 1 | Shine on you crazy diamond
13 | 1 | Paid my dues
CD_ID | Album | Performer | Founded
11 | Not that kind | Anastacia | 1999
12 | Wish you were here | Pink Floyd | 1964
13 | Freak of Nature | Anastacia | 1999
Third Normal Form (3NF):
• In 2nd normal form
• No functional dependencies between non-key fields: a non-key attribute
depends only on the primary key
CODD‘S NORMAL FORMS FOR DB RELATIONS: 3NF
CODD‘S NORMAL FORMS FOR DB RELATIONS: 3NF
Before (2NF; Founded depends on the non-key attribute Performer):
CD_ID | Album | Performer | Founded
11 | Not that kind | Anastacia | 1999
12 | Wish you were here | Pink Floyd | 1964
13 | Freak of Nature | Anastacia | 1999
After (3NF):
CD_ID | Album | Performer
11 | Not that kind | Anastacia
12 | Wish you were here | Pink Floyd
13 | Freak of Nature | Anastacia
Performer | Founded
Anastacia | 1999
Pink Floyd | 1964
The track table is unchanged:
CD_ID | Track | Title
11 | 1 | Not that kind
11 | 2 | I'm outta love
11 | 3 | Cowboys & Kisses
12 | 1 | Shine on you crazy diamond
13 | 1 | Paid my dues
CODD‘S NORMAL FORMS - SUMMARY FROM 1NF TO 3NF
Unnormalized:
CD_ID | Album | Founded | Titles
11 | Anastacia – Not that kind | 1999 | 1. Not that kind, 2. I'm outta love, 3. Cowboys & Kisses
12 | Pink Floyd – Wish you were here | 1964 | 1. Shine on you crazy diamond
13 | Anastacia – Freak of Nature | 1999 | 1. Paid my dues
3NF (one performer has n CDs, one CD has n tracks):
Performer | Founded
Anastacia | 1999
Pink Floyd | 1964
CD_ID | Album | Performer
11 | Not that kind | Anastacia
12 | Wish you were here | Pink Floyd
13 | Freak of Nature | Anastacia
CD_ID | Track | Title
11 | 1 | Not that kind
11 | 2 | I'm outta love
11 | 3 | Cowboys & Kisses
12 | 1 | Shine on you crazy diamond
13 | 1 | Paid my dues
WHY DATA MODELING?
“Data modeling is the process of learning about the data, and regardless of technology,
this process must be performed for a successful application.”
• Learn about the data and promote collective data understanding
• Derive security classification and measures
• Design for performance
• Accelerate development
• Improve software quality
• Reduce maintenance costs
• Generate code
• NoSQL Schema-on-read: understand model versions after years
WHY DATA MODELING?
Source of quote: Steve Hoberman: Data Modeling for MongoDB, Technics Publications, 2014
The diagram shows a typical OLTP data model
• Customers and products have unique
ids and some descriptive attributes
• A customer can place an order
on a specific date
• The order contains
one or more products
EXERCISE: OLTP DATA MODEL FOR DWH
Now consider DWH requirements like non-volatile and time-variant data
• Customer Bush marries and takes
her husband’s last name
• Product number 5 gets a price
increase
How would you solve such
requirements in a data model
for the Core Warehouse Layer?
EXERCISE: OLTP DATA MODEL FOR DWH
Possible solutions:
• Add timestamp column as part of the primary key
• For all tables, not only for specific tables (e.g. product, customer)
• New tables with head and version data to avoid redundancy
• Head table contains static data that does not change (e.g. customer id, birthdate)
• Version table contains data that changes (e.g. last name, comments)
A DWH needs its own data modeling approaches for the Core Warehouse Layer
and the Mart Layer
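The head and version approach listed above can be sketched with SQLite from the Python standard library. This is only an illustration under assumptions: the table names customer_head and customer_version, the column names, the dates and the new last name are hypothetical and not taken from the exercise diagram.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Head table: static attributes that never change (hypothetical columns).
CREATE TABLE customer_head (
    customer_id INTEGER PRIMARY KEY,
    birthdate   TEXT NOT NULL
);
-- Version table: changing attributes; the timestamp is part of the key.
CREATE TABLE customer_version (
    customer_id INTEGER NOT NULL REFERENCES customer_head(customer_id),
    valid_from  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    PRIMARY KEY (customer_id, valid_from)
);
INSERT INTO customer_head VALUES (1, '1980-05-17');
INSERT INTO customer_version VALUES (1, '2015-01-01', 'Bush');
-- The customer marries: add a new version instead of updating in place.
INSERT INTO customer_version VALUES (1, '2016-10-15', 'Miller');
""")


def last_name_as_of(customer_id, day):
    """Return the last name that was valid on the given ISO date, or None."""
    row = con.execute(
        """SELECT last_name FROM customer_version
           WHERE customer_id = ? AND valid_from <= ?
           ORDER BY valid_from DESC LIMIT 1""",
        (customer_id, day),
    ).fetchone()
    return row[0] if row else None


name_2015 = last_name_as_of(1, "2015-06-30")
name_2017 = last_name_as_of(1, "2017-01-01")
```

Because old versions are never overwritten, the data stays non-volatile and time-variant: both the old and the new last name remain queryable.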
EXERCISE: OLTP DATA MODEL FOR DWH
LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE
[Figure: logical standard data warehouse architecture. Internal and external data sources (OLTP) on the left; the data warehouse with backend and frontend. Layers shown: Staging Layer (Input Layer), Integration Layer (Cleansing Layer), Core Warehouse Layer (Storage Layer), Aggregation Layer, Mart Layer (Output Layer, Reporting Layer). Additional components: Metadata Management, Security, DWH Manager incl. Monitor.]
DATA MODELING IN THE CORE WAREHOUSE LAYER
DATA MODELS IN THE DWH
Layer | Characteristics | Data Model
Staging Layer | Temporary storage; ingest of source data | Normally 1:1 copy of source table structure, usually without constraints and indexes
Core Warehouse Layer | Historization / bitemporal data; integration; tool-independent; non-redundant data storage | 3NF with historization; Head and Version modeling; Data Vault; Anchor modeling; Dimensional model with historization (possible)
Data Mart Layer | Performance for end user queries required; tool-dependent; lots of joins necessary to answer complex questions | Flat structures, esp. Dimensional model (ROLAP / MOLAP / HOLAP)
DATA MODELING: 3NF, STAR SCHEMA, DATA VAULT
[Figure: the same vehicle data (Vehicle identifier, Manufacturer, Model, Type, Plant, Delivery Date, Production Date, Price, Source system, Load date/time, Buyer, Salesperson, Engine) modeled in 3NF, as a star schema (Dimensions, Facts) and as a Data Vault (Hub, Link, Sat), classified into business key, relationships, and context and history.]
DATA VAULT - ARCHITECTURE, METHODOLOGY, MODEL
Lecture part 2:
DWH Architectures
Lecture part 3:
DWH Data Modeling
Architecture
• Multi-Tier
• Scalable
• Supports NoSQL
Methodology
• Repeatable
• Measurable
• Agile
Model
• Flexible
• Hash based
• Hub & Spoke
Implementation: Automation,
Pattern based, High speed
HUB: unique identification by natural keys (Business Keys)
STRUCTURE HUB TABLES
HUB TABLES: TYPICAL CHARACTERISTICS
Business Keys should be natural keys used by the business (e.g. Vehicle
Identifier, Serial number)
Business Keys should stand alone and have meaning to the business
Business Keys should never change, have the same semantic meaning and
the same granularity
Each Hub must have at least one SAT table
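The slides describe the Data Vault model as hash based. A minimal sketch of how a Hub row could be derived from a business key is shown here; the choice of MD5, the normalization (trim and upper-case), the column names and the sample identifier are assumptions for illustration, not a prescribed standard.

```python
import hashlib
from datetime import datetime, timezone


def hub_hash_key(business_key: str) -> str:
    """Derive a deterministic hash key from a normalized business key."""
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def hub_row(business_key: str, record_source: str) -> dict:
    """Build a Hub row: hash key, business key, load date/time, source system."""
    return {
        "h_vehicle_key": hub_hash_key(business_key),
        "vehicle_identifier": business_key.strip(),
        "load_date": datetime.now(timezone.utc).isoformat(),
        "record_source": record_source,
    }


# The same vehicle delivered by two source systems with different formatting
# ends up with the same hash key, so it is stored only once in the Hub.
row_a = hub_row("wdb1234567", "SALES")
row_b = hub_row(" WDB1234567 ", "PRODUCTION")
```

Since the key is computed from the business key alone, Hubs, Links and SAT tables can be loaded independently of each other without key lookups.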
TYING BUSINESS PROCESSES TO BUSINESS KEYS
[Figure: business processes along a timeline (Procurement, Sales, Contracts, Delivery, Planning, Manufacturing, Finance, Customer Contact, Revenue). Keys shown: SLS123 and *P123MFG. One step is an Excel spreadsheet / manual process, marked NO VISIBILITY!]
© Copyright 1990-2017, Dan Linstedt, all rights reserved
LINK: unique relationships between Business Keys (HUBs)
STRUCTURE LINK TABLES
LINK TABLES: TYPICAL CHARACTERISTICS
A LINK models a relationship between 2 or more HUBs
The relationship is always n:m
The composed key must be unique. One of the foreign keys is the driving key
Link to Link allowed but should be avoided in a physical implementation
due to load dependency
• Relationships / Associations
• Foreign Keys in OLTP systems
• Hierarchies and Redefinitions
• Hierarchical relationships are modeled by one link and two connections to HUBs:
HAL (parent-child LINK) and SAL (same-as LINK)
• Transactions and events are often modeled as link (could also be a Hub)
• E.g. sales order or sensor data
• Intensive discussions about modeling as Hub or Link at conferences or on social
media (the modeling solution depends on requirements, context, etc.)
CANDIDATES FOR LINKS
SAT: descriptive, detailed, current and historized data
STRUCTURE SAT TABLES
SAT TABLES: TYPICAL CHARACTERISTICS
Contains all non-key attributes
Is connected to exactly one Hub or Link
HUB or LINK tables can (should) have several SAT tables, e.g. by source
system
In the extreme case a SAT table contains only one column (but it can have any
number of columns)
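The Hub, Link and SAT characteristics above can be put together in a small runnable sketch using SQLite. The table names h_vehicle, h_engine, l_plugged_into and s_engine follow the names used in the query exercise later in this lecture; all column definitions, the plain-text keys and the sample rows are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Hubs: one row per business key (keys are plain text here for brevity).
CREATE TABLE h_vehicle (
    h_vehicle_key      TEXT PRIMARY KEY,
    vehicle_identifier TEXT NOT NULL UNIQUE,
    loaddate           TEXT NOT NULL,
    record_source      TEXT NOT NULL
);
CREATE TABLE h_engine (
    h_engine_key  TEXT PRIMARY KEY,
    engine_number TEXT NOT NULL UNIQUE,
    loaddate      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: n:m relationship between the hubs, unique per key combination.
CREATE TABLE l_plugged_into (
    h_vehicle_key TEXT NOT NULL REFERENCES h_vehicle(h_vehicle_key),
    h_engine_key  TEXT NOT NULL REFERENCES h_engine(h_engine_key),
    loaddate      TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (h_vehicle_key, h_engine_key)
);
-- SAT: descriptive attributes of exactly one hub, historized by loaddate.
CREATE TABLE s_engine (
    h_engine_key TEXT NOT NULL REFERENCES h_engine(h_engine_key),
    loaddate     TEXT NOT NULL,
    performance  INTEGER,
    PRIMARY KEY (h_engine_key, loaddate)
);
INSERT INTO h_vehicle VALUES ('V1', 'VIN-0001', '2015-01-01', 'PRODUCTION');
INSERT INTO h_engine  VALUES ('E1', 'ENG-0001', '2015-01-01', 'PRODUCTION');
INSERT INTO l_plugged_into VALUES ('V1', 'E1', '2015-01-01', 'PRODUCTION');
-- Insert-only history: a changed value becomes a new SAT row.
INSERT INTO s_engine VALUES ('E1', '2015-01-01', 90);
INSERT INTO s_engine VALUES ('E1', '2016-03-01', 105);
""")

# Current engine performance of a vehicle: follow hub -> link -> latest SAT row.
current_performance = con.execute("""
    SELECT s.performance
    FROM h_vehicle v
    JOIN l_plugged_into l ON l.h_vehicle_key = v.h_vehicle_key
    JOIN s_engine s       ON s.h_engine_key = l.h_engine_key
    WHERE v.vehicle_identifier = 'VIN-0001'
      AND s.loaddate = (SELECT MAX(x.loaddate) FROM s_engine x
                        WHERE x.h_engine_key = s.h_engine_key)
""").fetchone()[0]
history_rows = con.execute("SELECT COUNT(*) FROM s_engine").fetchone()[0]
```

The SAT key (hub key plus load date) is what makes historization possible: both performance values are kept, and the current one is selected at query time.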
Different criteria to design SAT tables (separate data into different SAT
tables)
• Source system
• Rate of change
• Data types (e.g. separate CLOBs or other lengthy textual fields)
SAT TABLE DESIGN
SAT TABLE DESIGN
Rate of change in order to avoid redundant storage of data
[Figure: vehicle attributes split into SAT tables by rate of change, ranging from data that change often to data that do not change. Attributes shown: Delivery date, SW version, Control unit, Theft message, Color, Comments, Interior.]
EXERCISE DATA VAULT
The following data model shows
vehicle sales with entities
• Person (sales_person and owner)
• Vehicle
• Production_plant
Architect a Data Vault model for the
Core Warehouse Layer
SAMPLE SOLUTION DATA VAULT
DIMENSIONAL DATA MODELING IN THE MART LAYER:
ROLAP AND MOLAP
Design technique to present data
• in a standard, intuitive framework
• Easily understandable for end users
• Logical data model
• Physical data model: Not necessarily relational
• for high performance end user access
• Analysis / Reporting of numerical measures (metrics) by different attributes
DIMENSIONAL MODELING
DIMENSIONAL MODEL – IMPLEMENTATION TYPES
Implementation types of dimensional models
Star Schema = Relational model (ROLAP), consists of
• Fact Tables
• Dimension Tables
Cube = Multidimensional model (MOLAP), consists of
• Edges = Attributes
• Cells = Measures (facts)
Dimensions
• Are entities that contain descriptive textual attributes for analysis
• E.g. Car (model, manufacturer, etc), Time period (day, week, month, year)
Facts
• Contain key numerical figures – “Measures” – “Metrics”
• E.g. Sales amount
(for dimensions: product X in region y and time period z)
DIMENSIONAL MODEL
DIMENSIONAL MODEL – LOGICAL VIEW
[Figure: logical view of a dimensional model. Measures in the fact table / cube: Sales, Inventory Stock, #Items, Price. Dimensions: Store → City → Country; Customer; Product → Productgroup; Day → Month → Year.]
SAMPLE PRODUCT HIERARCHY
Dimensions can be organized in hierarchies
• e.g. product hierarchy
Other hierarchies:
• Date → Month/Year → Quarter/Year → Year
• Customer → Company → Industry
• City → County/Landkreis → State → Country → Continent
Arbitrary number of hierarchy levels
Purpose:
• group and structure data
• enable view on data at different levels of granularity
• Hierarchies define aggregations on measures
HIERARCHIES
ROLAP
Physical data structure: relational tables
• Advantage: can use well-engineered, reliable and high-performance database systems
and query languages
Special table structure
• Star / Snowflake Schema
• Dimension tables with textual attributes
• Fact table with measures and foreign keys to the dimension tables
ROLAP
Special table structure (continued)
• Storage size depends mainly on the number of facts
• One row per fact
• Size of a row approx. (#dimensions + #measures) * column size
• Aggregated totals are computed dynamically in general
• Longer response times
ROLAP
Dimensions
• Relational table for each dimension like product, region, time period
• Primary key (surrogates) identifies each dimension element
• Additional fields contain descriptive information like product name
• E.g. Dimensions: Product, Region, Time period (day, week, month, year)
Facts
• Relational table containing key figures – “Measures”
• Stores foreign keys to dimension tables
• The other fields contain the values of the key figures/measures
• E.g. Sales amount (for product X in region y and time period z)
RELATIONAL DATA MODEL
RELATIONAL MODEL: STAR SCHEMA
Sales Fact: Time_key (FK), Product_key (FK), Location_key (FK), Branch_key (FK), Sales_amount, Discount
Time Dimension: Time_key (PK), Date, Day, Month, Quarter, Year
Product Dimension: Product_key (PK), Product_name, Supplier_Name
Branch Dimension: Branch_key (PK), Branch_name
Location Dimension: Location_key (PK), Street, City, Country
Each dimension table has a 1:n relationship to the fact table.
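A reduced version of this star schema can be tried out with SQLite. The sketch below keeps only the time and product dimensions; the lower-case table names, the column calendar_date and all sample rows are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE time_dim (
    time_key INTEGER PRIMARY KEY, calendar_date TEXT, month INTEGER, year INTEGER
);
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY, product_name TEXT, supplier_name TEXT
);
-- Fact table: foreign keys to the dimensions plus the measures.
CREATE TABLE sales_fact (
    time_key     INTEGER NOT NULL REFERENCES time_dim(time_key),
    product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
    sales_amount REAL,
    discount     REAL
);
INSERT INTO time_dim VALUES (1, '2015-12-31', 12, 2015),
                            (2, '2016-01-15', 1, 2016),
                            (3, '2016-02-01', 2, 2016);
INSERT INTO product_dim VALUES (1, 'Cabriolet', 'Supplier A'),
                               (2, 'SUV', 'Supplier B');
INSERT INTO sales_fact VALUES (1, 1, 100.0, 0.0), (2, 1, 200.0, 10.0),
                              (3, 1, 50.0, 0.0),  (3, 2, 300.0, 0.0);
""")

# Typical star join: filter and group by dimension attributes, sum a measure.
sales_by_year_and_product = con.execute("""
    SELECT t.year, p.product_name, SUM(f.sales_amount)
    FROM sales_fact f
    JOIN time_dim t    ON t.time_key = f.time_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY t.year, p.product_name
    ORDER BY t.year, p.product_name
""").fetchall()
```

Every analytical query has the same shape: the fact table in the middle, one join per dimension that is used for filtering or grouping.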
Denormalized Dimensions
• 1 Table with all hierarchy levels
• Advantage: 
• Efficient aggregations
• Performance
• Disadvantage:
• Complex updates if hierarchies change
DATA MODELS FOR HIERARCHIES
Normalized Dimensions
• 1 table for each hierarchy level
• Advantage: 
• Minimal updates for changes
in the hierarchies
• Disadvantage:
• More complex queries
when computing aggregations
• Multiple joins
DATA MODELS FOR HIERARCHIES
RELATIONAL MODEL: SNOWFLAKE SCHEMA WITH
NORMALIZED DIMENSIONS
Sales Fact: Time_key (FK), Product_key (FK), Location_key (FK), Branch_key (FK), Sales_amount, Discount
Time Dimension: Time_key (PK), Date, Day, Month, Quarter, Year
Product Dimension: Product_key (PK), Product_name, Supplier_Key (FK)
Supplier Dimension: Supplier_key (PK), Supplier_Name
Branch Dimension: Branch_key (PK), Branch_name
Location Dimension: Location_key (PK), Street, City_key (FK)
City Dimension: City_key (PK), City, Country_Key (FK)
Country Dimension: Country_key (PK), Country
Each dimension table has a 1:n relationship to the fact table; Supplier, City and Country have 1:n relationships to the tables that reference them.
EXERCISE STAR SCHEMA
The following data model shows
vehicle sales with entities
• Person (sales_person and owner)
• Vehicle
• Production_plant
Architect a Star Schema for the
Data Mart Layer
SAMPLE SOLUTION STAR SCHEMA
Used for accelerating data warehouse queries in general
• Precomputation of aggregated values
• Materialized views / query tables store data physically
• Relational Columnar (in-memory) databases
ROLAP ENHANCEMENTS
Query processing in the Mart Layer
• SQL statements can become complex, e.g. many joins
• SQL statements can become slow if many rows are aggregated
• E.g. sum of sales amount for city X AND product Y AND year 2016 compared to city X
AND product Y AND year 2015
• If aggregated values are stored in Fact tables, new data from the Core Warehouse layer
have to be integrated into such aggregated fact tables
PRECOMPUTATION OF AGGREGATED TOTALS
The DBMS takes care of solving these problems
• The user defines views containing aggregated values for certain hierarchy levels
• These views are materialized as tables
• Update options
• immediate
• deferred
• When performing a query against a fact table the DB optimizer takes advantage of
these materialized views, i.e., no special queries have to be written for this by a user or
application program
• The user does not have to rewrite the original query to use the materialized views
MATERIALIZED VIEWS/QUERY TABLES
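Example Oracle statement to precompute values (DB2 and other RDBMS are similar)
-- Every expression in the select list gets a column alias.
-- DATE is a reserved word in Oracle, so the column is called sales_date here.
-- REFRESH FAST needs, among other things, materialized view logs on the base tables.
CREATE MATERIALIZED VIEW sales_agg
BUILD IMMEDIATE
REFRESH FAST
ON DEMAND
AS
SELECT p.productname, s.city, EXTRACT(MONTH FROM s.sales_date) AS sales_month
, SUM(s.sales_amount) AS sum_sales_amount
, SUM(s.no_items) AS sum_no_items
FROM product p
JOIN sales s ON p.productid = s.productid
GROUP BY p.productname, s.city, EXTRACT(MONTH FROM s.sales_date);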
MATERIALIZED VIEWS / MATERIALIZED QUERY TABLES
Row-oriented storage
• Data of a relational table is stored row wise:
<values of Row 1><values of Row 2> … <values of Row N>
Column-oriented storage
• The values of each column are stored separately:
<values of Column 1><values of Column 2> … <values of Column M>
RELATIONAL COLUMNAR DATABASES
ROW AND COLUMN ORIENTED DB BLOCK STORAGE
Id | Name | Birthdate
1 | Bush | 1967
2 | Schmitt | 1980
3 | Bush | 1993
4 | Berger | 1980
5 | Miller | 1967
6 | Bush | 1970
7 | Miller | 1980
Row-oriented storage (DB page/block): 1, Bush, 1967, 2, Schmitt, 1980, 3, Bush, 1993, 4, Berger, 1980, 5, Miller, 1967, 6, Bush, 1970, …
Column-oriented storage (DB pages/blocks): 1, 2, 3, 4, 5, 6, 7, … | Bush, Schmitt, Bush, Berger, Miller, Bush, Miller, …
Row-oriented storage
• Data of one row is grouped on disk and can be retrieved through one read operation
• Single values can be retrieved through efficient index and offset computations
• Good Insert, update and delete operations performance
• Suited for OLTP systems
ROW VS COLUMN ORIENTED STORAGE
Column-oriented storage
• Data of one column is grouped on disk and can be retrieved with far fewer read
operations than for row-oriented storage
• This makes computation of aggregations much faster in particular for tables with a lot
of columns
• In general better suited for queries involving partial table scans
• Bad Insert, update and delete operations performance
• Normally excellent compression as identical data types are stored in same blocks
• Products: SAP HANA, HP Vertica, Exasol, IBM DB2 BLU, Oracle In-Memory Option, SQL
Server (Columnar Indexes), etc
• Suited for OLAP systems
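The difference between the two layouts can be illustrated with plain Python lists, using the Id / Name / Birthdate table from the block storage slide. This is a conceptual sketch only: real databases work on pages and blocks, and the run-length encoding at the end is just one simple compression technique chosen here as an assumption to show why columns compress well.

```python
from itertools import groupby

rows = [
    (1, "Bush", 1967), (2, "Schmitt", 1980), (3, "Bush", 1993),
    (4, "Berger", 1980), (5, "Miller", 1967), (6, "Bush", 1970),
    (7, "Miller", 1980),
]

# Row-oriented: all values of row 1, then all values of row 2, ...
row_store = [value for row in rows for value in row]

# Column-oriented: all values of column 1, then all values of column 2, ...
column_store = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "birthdate": [r[2] for r in rows],
}

# Aggregating one column touches 7 values in the column store,
# but a scan of the row store has to pass over all 21 values.
oldest_birth_year = min(column_store["birthdate"])
values_touched_column_store = len(column_store["birthdate"])
values_touched_row_store = len(row_store)


def run_length_encode(values):
    """Collapse runs of equal neighbouring values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Values of one column share a data type and often repeat, so they compress well.
compressed_birthdates = run_length_encode(sorted(column_store["birthdate"]))
```

The same layout explains the weak point of column stores: inserting one new row means appending to every column list instead of writing one contiguous record.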
ROW VS COLUMN ORIENTED STORAGE
Data changes, e.g.
• new employees
• employees change departments
• employees leave
• whole department reorganisations, etc
How are the changes handled? Insert-only approach in the Core Warehouse
Layer, but choices in the Mart Layer (reduce data amount to what end user
needs)
• What does the business want to see? (Reporting Scenarios)
• How is data inserted / updated in dimensions? (Slowly Changing Dimensions)
HOW TO COVER DATA CHANGES IN THE MART?
• As-is scenario
• As-of scenario
• As-posted scenario
• As-posted with comparable data scenario
REPORTING SCENARIOS
DATA MART – EXAMPLE BASELINE
Assumption: current year is 2016
EmployeeDimension2015:
Employee | Organisation
Miller | DWH
Rogers | DWH
Douglas | Database
Powell | Database
EmployeeDimension2016:
Employee | Organisation
Miller | DWH
Rogers | DWH
Powell | DWH (other department)
Douglas | Database
Bush | Database (new employee)
Facts:
Employee | Year | #Projects
Miller | 2015 | 10
Rogers | 2015 | 10
Douglas | 2015 | 10
Powell | 2015 | 10
Miller | 2016 | 10
Rogers | 2016 | 10
Powell | 2016 | 10
Douglas | 2016 | 10
Bush | 2016 | 10
Reporting uses current structure
AS-IS SCENARIO
Uses EmployeeDimension2016 and the facts from the example baseline.
Organisation | #Projects '15 | #Projects '16
DWH | 30 | 30
Database | 10 | 20
Reporting uses structure as demanded
e.g. requested for 2015
AS-OF SCENARIO
Uses EmployeeDimension2015 and the facts from the example baseline.
Organisation | #Projects '15 | #Projects '16
DWH | 20 | 20
Database | 20 | 20
Reporting uses „historical truth“
AS-POSTED SCENARIO
Organisation | #Projects '15 | #Projects '16
DWH | 20 | 30
Database | 20 | 20
AS-POSTED WITH COMPARABLE DATA SCENARIO
Reporting uses the "historical truth" for identical dimension data
Organisation | #Projects '15 | #Projects '16
DWH | 20 | 20
Database | 10 | 10
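The four reporting scenarios can be reproduced with plain SQL on the baseline data. The sketch below uses SQLite; storing both dimension snapshots in one table employee_dim with a year column, and the helper names, are modeling assumptions made for this illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee_dim (year INTEGER, employee TEXT, organisation TEXT);
CREATE TABLE facts (employee TEXT, year INTEGER, projects INTEGER);
INSERT INTO employee_dim VALUES
  (2015, 'Miller', 'DWH'), (2015, 'Rogers', 'DWH'),
  (2015, 'Douglas', 'Database'), (2015, 'Powell', 'Database'),
  (2016, 'Miller', 'DWH'), (2016, 'Rogers', 'DWH'), (2016, 'Powell', 'DWH'),
  (2016, 'Douglas', 'Database'), (2016, 'Bush', 'Database');
INSERT INTO facts VALUES
  ('Miller', 2015, 10), ('Rogers', 2015, 10), ('Douglas', 2015, 10),
  ('Powell', 2015, 10), ('Miller', 2016, 10), ('Rogers', 2016, 10),
  ('Powell', 2016, 10), ('Douglas', 2016, 10), ('Bush', 2016, 10);
""")

# One query template; only the choice of dimension snapshot differs.
PIVOT = """
SELECT d.organisation,
       SUM(CASE WHEN f.year = 2015 THEN f.projects ELSE 0 END),
       SUM(CASE WHEN f.year = 2016 THEN f.projects ELSE 0 END)
FROM facts f
JOIN employee_dim d ON d.employee = f.employee
WHERE {condition}
GROUP BY d.organisation
"""

# Employees whose organisation is identical in both years.
UNCHANGED = """f.employee IN (
    SELECT a.employee
    FROM employee_dim a
    JOIN employee_dim b
      ON a.employee = b.employee AND a.organisation = b.organisation
    WHERE a.year = 2015 AND b.year = 2016)"""


def report(condition):
    """Return {organisation: (#projects 2015, #projects 2016)}."""
    rows = con.execute(PIVOT.format(condition=condition)).fetchall()
    return {org: (p15, p16) for org, p15, p16 in rows}


as_is = report("d.year = 2016")            # current structure
as_of_2015 = report("d.year = 2015")       # structure as requested
as_posted = report("d.year = f.year")      # historical truth
as_posted_comparable = report("d.year = f.year AND " + UNCHANGED)
```

The facts never change; each scenario only decides which version of the dimension is joined to them.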
Dimensions must absorb changes
Slowly changing dimensions according to Kimball / Ross (2002):
• SCD Type 0
• no changes, new data is ignored
• SCD Type 1 - 3
• See next slides
• And some more SCD types
• Rarely relevant
SLOWLY CHANGING DIMENSIONS
Changes:
• New data added: Albert, DWH
• Powell marries and has new name Parker
SLOWLY CHANGING DIMENSIONS – EXAMPLE BASELINE
Employee Dimension:
ID | Employee | Organisation
1 | Miller | DWH
2 | Powell | Database
SLOWLY CHANGING DIMENSION TYPE 1
• No history
• Dimension attributes always contain current data
Changes:
• New data added: Albert, DWH
• Powell marries and has the new name Parker
Employee Dimension before:
ID | Employee | Organisation
1 | Miller | DWH
2 | Powell | Database
Employee Dimension after:
ID | Employee | Organisation
1 | Miller | DWH
2 | Parker | Database
3 | Albert | DWH
SLOWLY CHANGING DIMENSION TYPE 2
• Full historization
• Dimension contains timestamps with NULLs or a future date like 31.12.2999
Changes:
• New data added: Albert, DWH
• Powell marries and has the new name Parker
Employee Dimension before:
ID | Employee | Organisation
1 | Miller | DWH
2 | Powell | Database
Employee Dimension after:
ID | Employee | Organisation | Valid From | Valid To
1 | Miller | DWH | 01.01.2015 | NULL
2 | Powell | Database | 21.12.2014 | 15.10.2016
3 | Albert | DWH | 05.03.2014 | NULL
2 | Parker | Database | 15.10.2016 | NULL
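The Type 2 behaviour can be sketched as a small load routine with SQLite: close the currently valid version and insert a new one. The function name apply_scd2, the extra surrogate key column sk and the ISO date format are assumptions for this illustration; the employee data follows the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employee_dim (
    sk           INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key per version
    id           INTEGER NOT NULL,                   -- employee ID from the source
    employee     TEXT,
    organisation TEXT,
    valid_from   TEXT NOT NULL,
    valid_to     TEXT                                -- NULL = currently valid
);
INSERT INTO employee_dim (id, employee, organisation, valid_from, valid_to)
VALUES (1, 'Miller', 'DWH', '2015-01-01', NULL),
       (2, 'Powell', 'Database', '2014-12-21', NULL);
""")


def apply_scd2(emp_id, employee, organisation, change_date):
    """Insert a new version and close the previous one if anything changed."""
    current = con.execute(
        "SELECT sk, employee, organisation FROM employee_dim "
        "WHERE id = ? AND valid_to IS NULL",
        (emp_id,),
    ).fetchone()
    if current and current[1:] == (employee, organisation):
        return  # nothing changed, keep the current version
    if current:
        con.execute(
            "UPDATE employee_dim SET valid_to = ? WHERE sk = ?",
            (change_date, current[0]),
        )
    con.execute(
        "INSERT INTO employee_dim (id, employee, organisation, valid_from, valid_to) "
        "VALUES (?, ?, ?, ?, NULL)",
        (emp_id, employee, organisation, change_date),
    )


apply_scd2(3, "Albert", "DWH", "2014-03-05")       # new employee
apply_scd2(2, "Parker", "Database", "2016-10-15")  # Powell marries

versions = con.execute(
    "SELECT id, employee, valid_from, valid_to FROM employee_dim "
    "ORDER BY id, valid_from"
).fetchall()
```

Fact rows would reference the surrogate key sk rather than the employee ID, so each fact stays attached to the dimension version that was valid when it occurred.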
SLOWLY CHANGING DIMENSION TYPE 3
• Historization of the latest change only
• And storage of the current value
Changes:
• New data added: Albert, DWH
• Powell marries and has the new name Parker
Employee Dimension before:
ID | Employee | Organisation
1 | Miller | DWH
2 | Powell | Database
Employee Dimension after:
ID | Employee Name | Previous Name | Organisation | Previous Organisation
1 | Miller | NULL | DWH | NULL
2 | Parker | Powell | Database | NULL
3 | Albert | NULL | DWH | NULL
• Conformed dimension
• Junk dimension
• Role-Playing dimension
• Degenerate dimension
• Transactional fact
• Periodic fact
• Accumulating fact
DIMENSION AND FACT TABLE TYPES
• Dimension that is used in several fact tables
• Fact tables can be connected by using conformed dimensions
DIMENSION TYPES: CONFORMED DIMENSION
Sales
Fact
Inventory
Fact
Product Dimension
Location Dimension
Kimball: Enterprise DWH Bus Matrix is a “design tool” to document the
organization’s processes
DIMENSION TYPES: CONFORMED DIMENSION
Date Product Location Customer Promotion
Sales Fact X X X X X
Inventory Fact X X X
Customer Returns Fact X X X X
Sales Forecast Fact X X X
Collection of lookup data / codes that could also form its own dimension
DIMENSION TYPES: JUNK DIMENSION
ID | MaritalStatus | Gender
1 | Single | Male
2 | Single | Female
3 | Married | Male
4 | Married | Female
A single dimension is referenced several times by the same fact table
• E.g. several dates in fact table reference Date Dimension
DIMENSION TYPES: ROLE-PLAYING DIMENSION
ID OrderDate DeliveryDate ProductionDate
1 .. .. ..
2 .. .. ..
3 .. .. ..
4 .. .. ..
• A dimension without its own dimension table. Data are stored in the fact
table only.
• Used e.g. for drill-through in reports
• E.g. OrderNumber in sales fact table
DIMENSION TYPES: DEGENERATE DIMENSION
ID OrderNumber
1 A51273 .. ..
2 72841 .. ..
3 732GT5 .. ..
4 624TR5K .. ..
Transactional
• Most common
• Usually one row per line/event in a transaction
• Most detailed level
• The grain must (should) be the same for all rows
• Measures can usually be aggregated: “additive measure” (e.g. sum over sales amount)
• E.g. fact table for sales data
TYPES OF FACT TABLES - TRANSACTIONAL
Periodic snapshots
• Picture of the state at a point in time
• Often computed from transactional fact table, e.g. aggregated by month
• Measures can usually not be aggregated (e.g. sum over inventory does not make sense
as inventory is already snapshot / sum for a day)
• The grain must (should) be the same for all rows
• E.g. fact table for inventory data (summed up for each day)
TYPES OF FACT TABLES - PERIODIC
Accumulating snapshots
• Shows activity of a process/event over time
• The data is not complete at the beginning and is updated as soon as new data arrives
(e.g. delivery date can be unknown at the beginning)
• The grain must (should) be the same for all rows
• E.g. fact table for processing an order
TYPES OF FACT TABLES - ACCUMULATING
How many cabriolets (D_Model.model) were built in January and February 2016?
Assume SCD1 and no history in fact tables
EXERCISE: QUERIES 1
Expected result: one row per month (01/2016, 02/2016) with its Count
How many cabriolets (D_Model.model) were built in January and February 2016?
SELECT d.month, d.year, SUM(f.count)
FROM f_vehicle_built f
JOIN d_model m ON m.modelid = f.modelid
JOIN d_production_date d ON d.prod_date = f.prod_date
WHERE m.model = 'Cabriolet'
AND d.month IN (1, 2) AND d.year = 2016
GROUP BY d.month, d.year
EXERCISE: QUERIES 1
How many different models (D_Model.model) currently have a performance of
105 PS (D_ENGINE.performance)?
Assume SCD1 and no history in fact tables
EXERCISE: QUERIES 2
Expected result: one row per model (Cabriolet, SUV, …) with its Count
How many different models (D_Model.model) currently have a performance of
105 PS (D_ENGINE.performance)?
SELECT m.model, SUM(f.count)
FROM f_vehicle_built f
JOIN d_model m ON m.modelid = f.modelid
JOIN d_engine e ON e.engineid = f.engineid
WHERE e.performance = 105
GROUP BY m.model
EXERCISE: QUERIES 2
How many different models (D_Model.model) currently have a performance of
105 PS (D_ENGINE.performance)?
EXERCISE: QUERIES 3
Expected result: one row per model (Cabriolet, SUV, …) with its Count
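How many different models (D_Model.model) currently have a performance of
105 PS (D_ENGINE.performance)?
-- Each view returns only the latest SAT row per hub key.
CREATE VIEW v_vehicle_sat AS
SELECT s.h_vehicle_key, s.loaddate, s.model
FROM s_vehicle_base s
WHERE s.loaddate = (SELECT MAX(x.loaddate) FROM s_vehicle_base x
WHERE x.h_vehicle_key = s.h_vehicle_key);
CREATE VIEW v_engine_sat AS
SELECT s.h_engine_key, s.loaddate, s.performance
FROM s_engine s
WHERE s.loaddate = (SELECT MAX(x.loaddate) FROM s_engine x
WHERE x.h_engine_key = s.h_engine_key);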
EXERCISE: QUERIES 3
How many different models (D_Model.model) currently have a performance of
105 PS (D_ENGINE.performance)?
SELECT v.model, COUNT(*)
FROM v_vehicle_sat v
JOIN l_plugged_into l ON l.h_vehicle_key = v.h_vehicle_key
JOIN v_engine_sat e ON l.h_engine_key = e.h_engine_key
JOIN s_engine s ON s.h_engine_key = e.h_engine_key
AND s.loaddate = e.loaddate
WHERE s.performance = 105
GROUP BY v.model;
EXERCISE: QUERIES 3
Many other solutions are possible, e.g. using a WITH clause
instead of views or using window functions, all
depending on the DB vendor/version
MOLAP
Edges of a cube (“Dimension”)
• Attributes like Product, Region, Time period (day, week, month, year)
Cells of a cube (“Measures”)
• Key figures (e.g. sales amount, profit)
• For every combination of attribute values one value of each key figure, e.g. Sales
amount for product X in region y and time period z
• Can be NULL and is stored as empty cell
MULTIDIMENSIONAL DATA MODEL
A database specially designed to handle the organization of data in multiple
dimensions
• Good for DWH requirements only but not generally suited like a relational DBMS
• E.g. IBM Cognos TM1, Oracle Essbase, Microsoft Analysis Services, Oracle OLAP
Option, IBM Cognos Powerplay
Holds data cells in blocks that constitute a virtual cube
Optimized to handle numeric data
• Aggregated totals often precalculated
• Not intended for textual data
MOLAP - MULTIDIMENSIONAL DATABASES
Linearization of the cells in a cube into a one-dimensional array
Memory amount: #(dim1) x #(dim2) x ... x #(dimN)
Depends on the number of dimensions and their cardinality, not on
the number of facts
Example:
• Cube with two dimensions of 3 elements each and one dimension of 2 elements
• Memory amount = size = 3*3*2 = 18 cells
• The numbers in the cube cells indicate the position in
the array
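The linearization can be written down in a few lines of Python. The sketch assumes row-major order (the last dimension varies fastest); the slide does not state which order the figure uses, and the function names are hypothetical.

```python
from math import prod


def cell_count(dim_sizes):
    """Number of cells = product of the dimension cardinalities."""
    return prod(dim_sizes)


def linear_index(coords, dim_sizes):
    """Map cube coordinates to a position in the one-dimensional array."""
    index = 0
    for coord, size in zip(coords, dim_sizes):
        if not 0 <= coord < size:
            raise IndexError(f"coordinate {coord} outside dimension of size {size}")
        index = index * size + coord
    return index


sizes = (3, 3, 2)                   # the example cube: 3 x 3 x 2
total_cells = cell_count(sizes)     # 18, independent of the number of facts
first_cell = linear_index((0, 0, 0), sizes)
last_cell = linear_index((2, 2, 1), sizes)
```

Every combination of dimension elements gets a slot whether or not a fact exists for it, which is why the size depends on the cardinalities and sparse cubes waste space unless the database compresses empty cells.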
MULTIDIMENSIONAL STORAGE
Cube with 3 dimensions
• Product – 4 values – p1, p2, p3, p4
• Store – 3 values – s1, s2, s3
• Time (year) – 2 values - y1, y2
Number of cells in the cube: 4 x 3 x 2 = 24
EXAMPLE
Sales in year y2
EXAMPLE
Sales of store s1 in year y2
EXAMPLE
Sales of product p2 in year y1
EXAMPLE
Selection
• Definition of a filter
• Select data of a single cell with a condition for each dimension
• For instance:
• time = 'January 2006'
• location = 'Stuttgart'
• product = 'ThinkPad T60'
MULTIDIMENSIONAL OPERATIONS - SELECTION
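A minimal sketch of a selection, assuming the cube is held as a Python dictionary keyed by (time, location, product). The sales figures and the second product are invented for the example.

```python
# Hypothetical sales cube: (time, location, product) -> sales amount.
cube = {
    ("January 2006", "Stuttgart", "ThinkPad T60"): 12,
    ("January 2006", "Munich", "ThinkPad T60"): 7,
    ("February 2006", "Stuttgart", "ThinkPad T60"): 9,
    ("January 2006", "Stuttgart", "ThinkPad X60"): 4,
}


def select(cube, time, location, product):
    """Selection: one condition per dimension, so at most one cell matches."""
    return cube.get((time, location, product))   # None if the cell is empty


print(select(cube, "January 2006", "Stuttgart", "ThinkPad T60"))  # 12
print(select(cube, "March 2006", "Stuttgart", "ThinkPad T60"))    # None
```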
EXAMPLE SELECTION
Slice
• Definition of a filter
• Condition for one single dimension
• Select a new cube with one fewer dimension
• For instance
• product = 'ThinkPad T60'
MULTIDIMENSIONAL OPERATIONS - SLICE
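A slice can be sketched in the same spirit: fixing one dimension to a single member yields a cube with one dimension fewer. The dictionary layout, the helper name `slice_cube` and all figures are assumptions for illustration.

```python
DIMENSIONS = ("time", "location", "product")

# Hypothetical sales cube: (time, location, product) -> sales amount.
cube = {
    ("January 2006", "Stuttgart", "ThinkPad T60"): 12,
    ("January 2006", "Munich", "ThinkPad T60"): 7,
    ("February 2006", "Stuttgart", "ThinkPad T60"): 9,
    ("January 2006", "Stuttgart", "ThinkPad X60"): 4,
}


def slice_cube(cube, dimension, member):
    """Slice: fix one dimension to a single member and drop that dimension."""
    axis = DIMENSIONS.index(dimension)
    return {
        key[:axis] + key[axis + 1:]: value
        for key, value in cube.items()
        if key[axis] == member
    }


t60 = slice_cube(cube, "product", "ThinkPad T60")
print(len(t60))                              # 3 cells remain
print(t60[("January 2006", "Stuttgart")])    # 12, addressed by (time, location) only
```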
EXAMPLE SLICE
Dice
• Definition of intervals/sets as filter
• Pick specific values of multiple dimensions
• Select a smaller cube
• Conditions for instance
• time = 1st quarter (January, February, March)
• location = region south (Stuttgart, Frankfurt, Munich)
MULTIDIMENSIONAL OPERATIONS - DICE
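A dice keeps all dimensions but restricts several of them to sets of members. The sketch below assumes a dictionary-based cube; the helper name `dice`, the cities outside region south and all figures are invented for the example.

```python
DIMENSIONS = ("time", "location", "product")

# Hypothetical sales cube: (time, location, product) -> sales amount.
cube = {
    ("January", "Stuttgart", "ThinkPad T60"): 12,
    ("February", "Munich", "ThinkPad T60"): 7,
    ("March", "Hamburg", "ThinkPad T60"): 5,
    ("April", "Stuttgart", "ThinkPad T60"): 9,
    ("March", "Frankfurt", "ThinkPad X60"): 4,
}


def dice(cube, **allowed):
    """Dice: keep the cells whose members lie in the given set per dimension.

    Dimensions without a condition stay unrestricted.
    """
    axes = {DIMENSIONS.index(name): set(members) for name, members in allowed.items()}
    return {
        key: value
        for key, value in cube.items()
        if all(key[axis] in members for axis, members in axes.items())
    }


q1_south = dice(
    cube,
    time={"January", "February", "March"},
    location={"Stuttgart", "Frankfurt", "Munich"},
)
print(len(q1_south))            # 3 cells in the smaller cube
print(sum(q1_south.values()))   # 23
```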
EXAMPLE DICE
Rotate/Pivot
• Rotate cube along its axes
• Get different view on data cube
• # of views on cube = (# of dimensions)!
• 2 dimensions, 2 views (2! = 2*1)
• 3 dimensions, 6 views (3! = 3*2*1)
• 4 dimensions, 24 views (4! = 4*3*2*1)
• ...
MULTIDIMENSIONAL OPERATIONS – ROTATE/PIVOT
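The view count can be checked with a short sketch: each view corresponds to one ordering of the dimensions, so there are n! of them. The two-cell cube and the helper name `pivot` are assumptions for illustration.

```python
from itertools import permutations
from math import factorial

dims = ("product", "store", "year")

# Hypothetical cube with two occupied cells.
cube = {("p1", "s1", "y1"): 10, ("p2", "s1", "y2"): 20}


def pivot(cube, order):
    """Rotate the cube: reorder the coordinates of every cell, values unchanged."""
    return {tuple(key[i] for i in order): value for key, value in cube.items()}


views = list(permutations(range(len(dims))))
print(len(views), factorial(len(dims)))   # 6 6

rotated = pivot(cube, (2, 0, 1))          # new axis order: year, product, store
print(rotated[("y1", "p1", "s1")])        # 10
```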
EXAMPLE ROTATE/PIVOT
Roll-up & Drill-down
• Prerequisites:
• Hierarchies defined
• Aggregated data for all hierarchy levels available
• Roll up: change hierarchy level "upwards":
• get less detailed data (= higher aggregation)
• Drill down: change hierarchy level "downwards":
• get more detailed data (= lower aggregation)
MULTIDIMENSIONAL OPERATIONS – ROLL-UP/DRILL-DOWN
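Roll-up and drill-down along a time hierarchy can be sketched as follows. The hierarchy month, quarter, year and all figures are assumptions; the totals are computed on the fly here, whereas a MOLAP system would typically hold them precalculated for every level.

```python
from collections import defaultdict

# Hypothetical time hierarchy: month -> quarter -> year.
MONTH_TO_QUARTER = {
    "2006-01": "2006-Q1", "2006-02": "2006-Q1",
    "2006-03": "2006-Q1", "2006-04": "2006-Q2",
}
QUARTER_TO_YEAR = {"2006-Q1": "2006", "2006-Q2": "2006"}

sales_by_month = {"2006-01": 100, "2006-02": 120, "2006-03": 90, "2006-04": 110}


def roll_up(values, parent_of):
    """Roll up: aggregate every member into its parent on the next level."""
    totals = defaultdict(int)
    for member, value in values.items():
        totals[parent_of[member]] += value
    return dict(totals)


def drill_down(values, parent_of, parent):
    """Drill down: return the more detailed members below one parent."""
    return {m: v for m, v in values.items() if parent_of[m] == parent}


sales_by_quarter = roll_up(sales_by_month, MONTH_TO_QUARTER)
sales_by_year = roll_up(sales_by_quarter, QUARTER_TO_YEAR)

print(sales_by_quarter)   # {'2006-Q1': 310, '2006-Q2': 110}
print(sales_by_year)      # {'2006': 420}
print(drill_down(sales_by_month, MONTH_TO_QUARTER, "2006-Q1"))
```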
EXAMPLE ROLL-UP & DRILL-DOWN
ROLAP: SQL is the standard query language
MOLAP: MDX (Multidimensional Expressions)
• De facto industry standard developed by Microsoft
• Very complex
• SQL-like syntax
• Language elements
• Scalar: data type "string" or "number"
• Dimension, Hierarchy, Level, Member
• …
MDX - OLAP QUERY LANGUAGE
SELECT
{ [Measures].[Store Sales] } ON COLUMNS,
{ [Date].[2002], [Date].[2003] } ON ROWS
FROM Sales
WHERE ( [Store].[USA].[CA] )
This query defines the following result set information:
• The SELECT clause sets the query axes: the Store Sales (amount) member of the Measures
dimension on the columns axis, and the 2002 and 2003 members of the Date dimension on the rows axis
• The FROM clause indicates that the data source is the Sales cube
• The WHERE clause defines the "slicer axis" as the California (CA) member of the Store dimension
MDX SAMPLE QUERY
Result:
      Store Sales
2002  95863.66
2003  99764.01
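For comparison, a ROLAP system would answer the same question with SQL over a star schema. The sketch below uses Python's built-in sqlite3 module; the table and column names (`fact_sales`, `dim_date`, `dim_store`) and all rows are hypothetical and do not reproduce the figures of the MDX result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical star schema: one fact table, two dimension tables.
    CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, country TEXT, state TEXT);
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        store_sales REAL
    );
    INSERT INTO dim_date  VALUES (1, 2002), (2, 2003), (3, 2004);
    INSERT INTO dim_store VALUES (1, 'USA', 'CA'), (2, 'USA', 'WA');
    INSERT INTO fact_sales VALUES
        (1, 1, 100.0), (1, 1, 50.0), (2, 1, 70.0),
        (1, 2, 999.0), (3, 1, 5.0);
""")

# Slicer (WHERE ... CA), row members (years 2002 and 2003), measure (SUM of sales).
rows = conn.execute("""
    SELECT d.year, SUM(f.store_sales) AS store_sales
    FROM fact_sales f
    JOIN dim_date  d ON d.date_key  = f.date_key
    JOIN dim_store s ON s.store_key = f.store_key
    WHERE s.country = 'USA' AND s.state = 'CA'
      AND d.year IN (2002, 2003)
    GROUP BY d.year
    ORDER BY d.year
""").fetchall()

print(rows)   # [(2002, 150.0), (2003, 70.0)]
```

The joins and the GROUP BY are what the cube avoids at query time, at the price of precalculating and storing the aggregates.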
MOLAP - ROLAP
                MOLAP                                   ROLAP
Database type   Multidimensional                        Relational
Data storage    Special storage engines for cube data   Star schema (special relational data model)
Size            100s of gigabytes                       10s of terabytes
Query language  MDX                                     SQL
MOLAP - ROLAP
Advantages of MOLAP
• special database products optimized for multidimensional analysis
• short response times, e.g. no joins
• suitable storage schema and query processing for multidimensional data
Advantages of ROLAP
• can use existing, well-established DBMS
• easy data import and update
• user access, backup and security mechanisms of the DBMS can be used
Disadvantages of MOLAP
• problems with sparsity (ratio of occupied to unoccupied cells): "null" is stored in
a field with the same length as any value
• limited data volume: 5-6 dimensions
• cube data is read-only for end users
• expensive update operations
Disadvantages of ROLAP
• complex SQL queries for processing OLAP requests, hence longer response times
(solution: materialized views and in-memory columnar databases)
Combines the advantages of ROLAP and MOLAP
Relational DBMS for storage of sparse, historic data
• Data of highest granularity level
Multidimensional DBMS for efficient storage of dense data cubes
• Multidimensional cache for aggregated totals
Complex architecture and maintenance processes
No uniform OLAP query processing
HOLAP – HYBRID OLAP
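The division of labour in HOLAP can be sketched with a toy class: detail rows stand in for the relational storage, and a dictionary stands in for the multidimensional cache of aggregated totals. The class `HolapStore`, its methods and the data are purely illustrative assumptions, not the architecture of any particular product.

```python
class HolapStore:
    """Toy HOLAP store: detail rows stay 'relational', totals are cached."""

    def __init__(self, detail_rows):
        self.detail_rows = list(detail_rows)   # stands in for the relational DBMS
        self.aggregate_cache = {}              # stands in for the multidimensional cache

    def total_by(self, dimension):
        """Aggregated totals: served from the cache, filled on first use."""
        if dimension not in self.aggregate_cache:
            totals = {}
            for row in self.detail_rows:
                member = row[dimension]
                totals[member] = totals.get(member, 0) + row["amount"]
            self.aggregate_cache[dimension] = totals
        return self.aggregate_cache[dimension]

    def detail(self, **conditions):
        """Highest-granularity data: always read from the detail rows."""
        return [
            row for row in self.detail_rows
            if all(row[name] == value for name, value in conditions.items())
        ]


store = HolapStore([
    {"year": 2002, "state": "CA", "amount": 100},
    {"year": 2002, "state": "WA", "amount": 40},
    {"year": 2003, "state": "CA", "amount": 70},
])
print(store.total_by("year"))      # {2002: 140, 2003: 70}
print(store.detail(state="CA"))    # the two CA detail rows
```

The two access paths in this sketch mirror the point that HOLAP has no uniform query processing: aggregated and detailed requests are answered by different components.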
The following is a data model used by a supermarket chain to analyze their
business:
EXERCISE: OLAP
With each transaction, an average of 20 different articles is bought.
The data warehouse collects sales transaction data over 2 years.
There are 1000 stores with 2000 transactions per store per day.
Questions:
• 1. What are the columns of the ROLAP fact table?
• 2. How many records are stored in the fact table?
• 3. What is the size of the cube (number of cells) that stores the aggregated values at
the most detailed level?
• 4. Compute the respective cube sizes for the other 3 (higher) hierarchy levels.
EXERCISE: OLAP
1. What are the columns of the ROLAP fact table?
• Trans. No. (FK to dimension)
• Date (FK to dimension)
• Location (FK to dimension)
• Article (FK to dimension)
• Number of articles (measure) and article price (measure)
2. How many records are stored in the fact table?
• One record per transaction and article (with quantity and price)
• 2 years * 365 days/year * 1000 stores * 2000 transactions/(store*day) * 20
articles/transaction = 29,200,000,000 records
EXERCISE: OLAP
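The record count from answer 2 can be reproduced with a few lines of Python; the constant names are chosen for the example.

```python
YEARS = 2
DAYS_PER_YEAR = 365
STORES = 1000
TRANSACTIONS_PER_STORE_AND_DAY = 2000
ARTICLES_PER_TRANSACTION = 20   # average from the exercise text

# One fact record per transaction and article.
rows_per_day = STORES * TRANSACTIONS_PER_STORE_AND_DAY * ARTICLES_PER_TRANSACTION
fact_rows = YEARS * DAYS_PER_YEAR * rows_per_day

print(f"{rows_per_day:,}")   # 40,000,000
print(f"{fact_rows:,}")      # 29,200,000,000
```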
3. What is the size of the cube (number of cells) that stores the aggregated
values at the most detailed level?
• 2 years * 365 [days]/year * 2000 [transactions] * 1000 [stores] * 50000 [articles]
= 73,000,000,000,000 cells
4. Compute the respective cube sizes for the other 3 hierarchy levels.
• Level 2: 2 years * 12 [months]/year * 500 [cities] * 2000 [product groups]
= 24,000,000 cells
• Level 3: 2 years * 4 [quarters]/year * 20 [regions] * 200 [product categories]
= 32,000 cells
• Level 4: 2 [years] * 5 [regions] * 10 [product departments] = 100 cells
EXERCISE: OLAP
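The cube sizes from answers 3 and 4 follow directly from the dimension cardinalities. The sketch below recomputes them and, as an addition not asked for in the exercise, relates the number of fact records (29,200,000,000, from answer 2) to the number of cells on the most detailed level to show how sparse that cube is.

```python
from math import prod

# Cardinalities per hierarchy level, taken from the answers above.
levels = {
    1: {"days": 2 * 365, "transactions": 2000, "stores": 1000, "articles": 50000},
    2: {"months": 2 * 12, "cities": 500, "product groups": 2000},
    3: {"quarters": 2 * 4, "regions": 20, "product categories": 200},
    4: {"years": 2, "regions": 5, "product departments": 10},
}

cells = {level: prod(sizes.values()) for level, sizes in levels.items()}
for level, count in cells.items():
    print(f"Level {level}: {count:,} cells")

fact_rows = 29_200_000_000              # result of answer 2
density = fact_rows / cells[1]          # share of occupied cells on level 1
print(f"{density:.2%}")                 # 0.04%
```

Only about 0.04% of the level-1 cells would be occupied, which illustrates the sparsity problem of storing the most detailed level in a cube.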
• Data modeling in the Core Warehouse Layer
• Choices like Data Vault
• Data modeling in the Mart Layer
• Dimensional Modeling
• ROLAP (Star Schema with fact and dimension tables)
• MOLAP (Cubes)
SUMMARY
Daimler TSS GmbH
Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99
tss@daimler.com / Internet: www.daimler-tss.com/ Intranet-Portal-Code: @TSS
Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle
THANK YOU
Online Analytical Processing (OLAP)
• Term introduced by E. F. Codd in 1993 in a white paper commissioned by Arbor Software, the vendor of Essbase
• 12 criteria for OLAP systems, such as
• Multi-dimensionality
• Transparency
• Constant response times
• Multi-user support
• Flexible definition of reports
• No limits on dimensions and hierarchy levels
OLAP – 12 CRITERIA BY CODD
FASMI – Fast Analysis of Shared Multidimensional Information
Criteria by Pendse/Creeth (1995)
• Fast
• maximum response time of 5 seconds for regular queries and no more than
20 seconds for complex queries
• Analysis
• intuitive analysis, easy/no programming
• flexible: queries may contain arbitrary computations
OLAP – FASMI CRITERIA
• Shared
• Multi user capable: Shared usage and access control
• Multidimensional
• Multidimensional view on the data
• regardless of the underlying data model
• Full support of hierarchies
• Information
• Users must be able to access all data without restrictions imposed by the OLAP system,
and without restrictions on scalability
OLAP – FASMI CRITERIA
Part 2 - Data Warehousing Lecture at BW Cooperative State University (DHBW)

  • 1. A company of Daimler AG LECTURE @DHBW: DATA WAREHOUSE PART II: DWH DATA MODELING & OLAP ANDREAS BUCKENHOFER, DAIMLER TSS
  • 3. DAIMLER TSS. IT EXCELLENCE: COMPREHENSIVE, INNOVATIVE, CLOSE. We're a specialist and strategic business partner for innovative IT Solutions within Daimler – not just another supplier! As a 100% subsidiary of Daimler, we live the culture of excellence and aspire to take an innovative and technological lead. With our outstanding technological and methodical competence we are a competent provider of services that help those who benefit from them to stand out from the competition. When it comes to demanding IT questions we create impetus, especially in the core fields car IT and mobility, information security, analytics, shared services and digital customer experience. Data Warehouse / DHBWDaimler TSS GmbH 3 TSS 2 0 2 0 ALWAYS ON THE MOVE.
  • 4. Daimler TSS GmbH 4 LOCATIONS Data Warehouse / DHBW Daimler TSS China Hub Beijing 6 Employees Daimler TSS Malaysia Hub Kuala Lumpur 38 Employees Daimler TSS India Hub Bangalore 16 Employees Daimler TSS Germany 6 Locations More than1000 Employees Ulm (Headquarters) Stuttgart Area Böblingen, Echterdingen, Leinfelden, Möhringen Berlin
  • 5. After the end of this lecture you will be able to Understand differences in data modeling between OLTP and OLAP Understand why data modeling is important Understand data modeling in the Core Warehouse Layer and Data Mart Layer • Data Vault • Dimensional Model / Star schema Understand dimensions and facts Understand ROLAP & MOLAP WHAT YOU WILL LEARN TODAY Data Warehouse / DHBWDaimler TSS 5
  • 6.  Requirements • Efficient update and delete operations • Efficient read operations • Avoid contradiction in the data – don’t store data twice or multiple times • Easy maintenance of the data model As little redundancy as possible in the data model DATA MODELING FOR OLTP APPLICATIONS Data Warehouse / DHBWDaimler TSS 6
  • 7. CODD'S NORMAL FORMS FOR DB RELATIONS: 1NF
    First Normal Form (1NF): a relation/table is in first normal form if
    • the domain of each attribute contains only atomic (simple, indivisible) values
    • the value of any attribute in a tuple/row is a single value from the domain of that attribute, i.e. no attribute value can be a set
  • 8. CODD'S NORMAL FORMS FOR DB RELATIONS: 1NF
    Not in 1NF:
    CD_ID | Album | Founded | Titles
    11 | Anastacia – Not that kind | 1999 | 1. Not that kind, 2. I'm outta love, 3. Cowboys & Kisses
    12 | Pink Floyd – Wish you were here | 1964 | 1. Shine on you crazy diamond
    13 | Anastacia – Freak of Nature | 1999 | 1. Paid my dues
    In 1NF:
    CD_ID | Album | Performer | Founded | Track | Title
    11 | Not that kind | Anastacia | 1999 | 1 | Not that kind
    11 | Not that kind | Anastacia | 1999 | 2 | I'm outta love
    11 | Not that kind | Anastacia | 1999 | 3 | Cowboys & Kisses
    12 | Wish you were here | Pink Floyd | 1964 | 1 | Shine on you crazy diamond
    13 | Freak of Nature | Anastacia | 1999 | 1 | Paid my dues
  • 9. CODD'S NORMAL FORMS FOR DB RELATIONS: 2NF
    Second Normal Form (2NF):
    • The relation is in 1st normal form
    • Every non-key attribute is fully dependent on the key: there are no dependencies between a part of the key and a non-key field
  • 10. CODD'S NORMAL FORMS FOR DB RELATIONS: 2NF
    The 1NF table (key: CD_ID, Track) is split into two tables.
    Track titles (key: CD_ID, Track):
    CD_ID | Track | Title
    11 | 1 | Not that kind
    11 | 2 | I'm outta love
    11 | 3 | Cowboys & Kisses
    12 | 1 | Shine on you crazy diamond
    13 | 1 | Paid my dues
    Albums (key: CD_ID):
    CD_ID | Album | Performer | Founded
    11 | Not that kind | Anastacia | 1999
    12 | Wish you were here | Pink Floyd | 1964
    13 | Freak of Nature | Anastacia | 1999
  • 11. CODD'S NORMAL FORMS FOR DB RELATIONS: 3NF
    Third Normal Form (3NF):
    • The relation is in 2nd normal form
    • No functional dependencies between non-key fields: a non-key attribute depends on the primary key only
  • 12. CODD'S NORMAL FORMS FOR DB RELATIONS: 3NF
    The track title table stays unchanged. In the album table, Founded depends on Performer and not on CD_ID, so the table is split again.
    Albums (key: CD_ID):
    CD_ID | Album | Performer
    11 | Not that kind | Anastacia
    12 | Wish you were here | Pink Floyd
    13 | Freak of Nature | Anastacia
    Performers (key: Performer):
    Performer | Founded
    Anastacia | 1999
    Pink Floyd | 1964
  • 13. CODD'S NORMAL FORMS - SUMMARY FROM 1NF TO 3NF
    The original table (CD_ID, Album, Founded, Titles) ends up as three tables in 3NF: track titles (CD_ID, Track, Title), albums (CD_ID, Album, Performer) and performers (Performer, Founded). One performer has n albums, and one album has n tracks.
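The decomposition can be checked with a small sketch. It assumes SQLite via Python's standard `sqlite3` module; the table names `performer`, `cd` and `track` are chosen here for illustration and do not come from the slides. Joining the three 3NF tables should give back the five 1NF rows without storing any value twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE performer (              -- hypothetical table names
    performer TEXT PRIMARY KEY,
    founded   INTEGER
);
CREATE TABLE cd (
    cd_id     INTEGER PRIMARY KEY,
    album     TEXT,
    performer TEXT REFERENCES performer(performer)
);
CREATE TABLE track (
    cd_id INTEGER REFERENCES cd(cd_id),
    track INTEGER,
    title TEXT,
    PRIMARY KEY (cd_id, track)
);
""")
conn.executemany("INSERT INTO performer VALUES (?, ?)",
                 [("Anastacia", 1999), ("Pink Floyd", 1964)])
conn.executemany("INSERT INTO cd VALUES (?, ?, ?)", [
    (11, "Not that kind", "Anastacia"),
    (12, "Wish you were here", "Pink Floyd"),
    (13, "Freak of Nature", "Anastacia"),
])
conn.executemany("INSERT INTO track VALUES (?, ?, ?)", [
    (11, 1, "Not that kind"), (11, 2, "I'm outta love"),
    (11, 3, "Cowboys & Kisses"), (12, 1, "Shine on you crazy diamond"),
    (13, 1, "Paid my dues"),
])

# Joining the 3NF tables reconstructs the 1NF rows.
rows_1nf = conn.execute("""
    SELECT c.cd_id, c.album, c.performer, p.founded, t.track, t.title
    FROM track t
    JOIN cd c ON c.cd_id = t.cd_id
    JOIN performer p ON p.performer = c.performer
    ORDER BY c.cd_id, t.track
""").fetchall()
```

The founding year of a performer is now stored once, so correcting it is a single-row update instead of one update per track.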
  • 14. WHY DATA MODELING?
  • 15. WHY DATA MODELING?
    “Data modeling is the process of learning about the data, and regardless of technology, this process must be performed for a successful application.”
    Source quote: Steve Hoberman: Data Modeling for MongoDB, Technics Publications 2014
    • Learn about the data and promote collective data understanding
    • Derive security classification and measures
    • Design for performance
    • Accelerate development
    • Improve software quality
    • Reduce maintenance costs
    • Generate code
    • NoSQL schema-on-read: understand model versions after years
  • 16. EXERCISE: OLTP DATA MODEL FOR DWH
    The diagram shows a typical OLTP data model
    • Customers and products have unique IDs and some descriptive attributes
    • A customer can place an order on a specific date
    • The order contains one or more products
  • 17. EXERCISE: OLTP DATA MODEL FOR DWH
    Now consider DWH requirements like non-volatile and time-variant data
    • Customer Bush marries and takes her husband's last name
    • Product number 5 gets a price increase
    How would you solve such requirements in a data model for the Core Warehouse Layer?
  • 18. EXERCISE: OLTP DATA MODEL FOR DWH
    Possible solutions:
    • Add a timestamp column as part of the primary key
      • For all tables, not only for specific tables (e.g. product, customer)
    • New tables with head and version data to avoid redundancy
      • The head table contains static data that does not change (e.g. customer id, birthdate)
      • The version table contains data that changes (e.g. last name, comments)
    A DWH needs its own data modeling approaches for the Core Warehouse Layer and the Mart Layer
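A minimal sketch of the head/version idea, assuming SQLite via Python's `sqlite3`. The table and column names, the birthdate, the load dates and the new last name "Miller" are all hypothetical; the slides only say that customer Bush takes her husband's name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_head (          -- static data, one row per customer
    customer_id INTEGER PRIMARY KEY,
    birthdate   TEXT
);
CREATE TABLE customer_version (       -- changing data, one row per version
    customer_id INTEGER REFERENCES customer_head(customer_id),
    valid_from  TEXT,                 -- timestamp is part of the primary key
    last_name   TEXT,
    PRIMARY KEY (customer_id, valid_from)
);
""")
conn.execute("INSERT INTO customer_head VALUES (1, '1980-05-17')")
conn.execute("INSERT INTO customer_version VALUES (1, '2015-01-01', 'Bush')")
# The marriage adds a new version; the old row is never updated (non-volatile).
conn.execute("INSERT INTO customer_version VALUES (1, '2016-10-15', 'Miller')")

def last_name_as_of(customer_id, day):
    """Return the last name that was valid on the given ISO date."""
    row = conn.execute("""
        SELECT last_name FROM customer_version
        WHERE customer_id = ? AND valid_from <= ?
        ORDER BY valid_from DESC LIMIT 1
    """, (customer_id, day)).fetchone()
    return row[0] if row else None
```

ISO date strings sort chronologically, which is why a plain string comparison is enough in this sketch; a real warehouse would use a proper date or timestamp type.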
  • 19. LOGICAL STANDARD DATA WAREHOUSE ARCHITECTURE
    [Diagram: internal and external data sources (OLTP) feed the Data Warehouse backend; the frontend sits on top. Layers: Staging Layer (Input Layer), Integration Layer (Cleansing Layer), Core Warehouse Layer (Storage Layer), Aggregation Layer, Mart Layer (Output Layer / Reporting Layer). Cross-cutting components: Metadata Management, Security, DWH Manager incl. Monitor]
  • 20. DATA MODELING IN THE CORE WAREHOUSE LAYER
  • 21. DATA MODELS IN THE DWH
    Layer | Characteristics | Data Model
    Staging Layer | Temporary storage; ingest of source data | Normally 1:1 copy of source table structure, usually without constraints and indexes
    Core Warehouse Layer | Historization / bitemporal data; integration; tool-independent; non-redundant data storage | 3NF with historization; Head and Version modeling; Data Vault; Anchor modeling; Dimensional model with historization (possible)
    Data Mart Layer | Performance for end user queries required; tool-dependent; lots of joins necessary to answer complex questions | Flat structures, esp. Dimensional model (ROLAP / MOLAP / HOLAP)
  • 22. DATA MODELING: 3NF, STAR SCHEMA, DATA VAULT
    [Figure: vehicle data split into business key, relationships, and context and history, compared across 3NF, Star (Dimensions, Facts) and Data Vault (Hub, Link, Sat). Example attributes: Vehicle identifier, Manufacturer, Model, Type, Plant, Delivery Date, Production Date, Price, Source system, Load date/time, Buyer, Salesperson, Engine]
  • 23. DATA VAULT - ARCHITECTURE, METHODOLOGY, MODEL
    Lecture part 2: DWH Architectures. Lecture part 3: DWH Data Modeling.
    • Architecture: Multi-Tier, Scalable, Supports NoSQL
    • Methodology: Repeatable, Measurable, Agile
    • Model: Flexible, Hash based, Hub & Spoke
    • Implementation: Automation, Pattern based, High speed
  • 25. STRUCTURE HUB TABLES
  • 26. HUB TABLES: TYPICAL CHARACTERISTICS
    • Business Keys should be natural keys used by the business (e.g. Vehicle Identifier, Serial number)
    • Business Keys should stand alone and have meaning to the business
    • Business Keys should never change, and should have the same semantic meaning and the same granularity
    • A HUB must have at least 1 SAT table
  • 27. TYING BUSINESS PROCESSES TO BUSINESS KEYS
    [Figure: business processes over time (Customer Contact, Sales, Procurement, Contracts, Planning, Manufacturing, Delivery, Finance, Revenue) tied together by business keys such as SLS123 and *P123MFG. Where a key is handed over by Excel spreadsheet or manual process: NO VISIBILITY!]
    © Copyright 1990-2017, Dan Linstedt, all rights reserved
  • 28. LINK
    Unique relationships between Business Keys (HUBs)
  • 29. STRUCTURE LINK TABLES
  • 30. LINK TABLES: TYPICAL CHARACTERISTICS
    • A LINK models a relationship between 2 or more HUBs
    • The relationship is always n:m
    • The composed key must be unique; one of the foreign keys is the driving key
    • Link to Link is allowed but should be avoided in a physical implementation due to load dependencies
  • 31. CANDIDATES FOR LINKS
    • Relationships / associations
      • Foreign keys in OLTP systems
    • Hierarchies and redefinitions
      • Hierarchical relationships are modeled by one link and two connections to HUBs: HAL (parent-child LINK) and SAL (same-as LINK)
    • Transactions and events are often modeled as a Link (could also be a Hub)
      • E.g. sales order or sensor data
      • Intensive discussions about modeling as Hub or Link at conferences and on social media (the modeling solution depends on requirements, context, etc.)
  • 32. SAT
    Descriptive, detailed, current and historized data
  • 33. STRUCTURE SAT TABLES
  • 34. SAT TABLES: TYPICAL CHARACTERISTICS
    • Contains all non-key attributes
    • Is connected to exactly one Hub or Link
    • HUB or LINK tables can (should) have several SAT tables, e.g. one per source system
    • Can contain any number of columns, in the extreme case only one
  • 35. SAT TABLE DESIGN
    Different criteria to design SAT tables (separate data into different SAT tables)
    • Source system
    • Rate of change
    • Data types (e.g. separate CLOBs or other lengthy textual fields)
  • 36. SAT TABLE DESIGN
    Split by rate of change in order to avoid redundant storage of data
    [Figure: vehicle attributes (Delivery date, SW-Version Controlunit, Theft message, Color, Comments, Interior) divided into "data that change often" and "data that do not change"]
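The interplay of Hub and SAT can be sketched in a few lines of Python. This is an illustration under assumptions, not the lecture's reference implementation: the MD5 hash key, the column names (`load_date`, `record_source`), the business key `VIN001` and the source system `CRM` are hypothetical, and Links are left out. The Hub stores each business key once; the SAT is insert-only and gets a new row only when the descriptive attributes change.

```python
import hashlib

def hash_key(*business_key_parts):
    """Hash the normalized business key (MD5 is an assumption of this sketch)."""
    normalized = "|".join(str(p).strip().upper() for p in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

hub_vehicle = {}   # hash key -> hub row (business key + load metadata)
sat_vehicle = []   # insert-only list of satellite rows

def load_vehicle(vehicle_id, attributes, load_date, record_source):
    """Load one source record; assumes records arrive in chronological order."""
    hk = hash_key(vehicle_id)
    # Hub: a business key is inserted once and never changed.
    hub_vehicle.setdefault(hk, {"vehicle_id": vehicle_id,
                                "load_date": load_date,
                                "record_source": record_source})
    # Sat: add a new version only if the attributes differ from the latest one.
    latest = next((r for r in reversed(sat_vehicle) if r["hub_key"] == hk), None)
    if latest is None or latest["attributes"] != attributes:
        sat_vehicle.append({"hub_key": hk, "load_date": load_date,
                            "record_source": record_source,
                            "attributes": dict(attributes)})

load_vehicle("VIN001", {"color": "red"}, "2016-01-01", "CRM")
load_vehicle("VIN001", {"color": "red"}, "2016-01-02", "CRM")   # unchanged, no new row
load_vehicle("VIN001", {"color": "blue"}, "2016-01-03", "CRM")  # changed, new version
```

Splitting attributes by rate of change matters here: if a frequently changing attribute shared a SAT with stable ones, every change would copy the stable values into a new row.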
  • 37. EXERCISE DATA VAULT
    The following data model shows vehicle sales with entities
    • Person (sales_person and owner)
    • Vehicle
    • Production_plant
    Architect a Data Vault model for the Core Warehouse Layer
  • 38. SAMPLE SOLUTION DATA VAULT
  • 39. DIMENSIONAL DATA MODELING IN THE MART LAYER: ROLAP AND MOLAP
  • 40. DATA MODELS IN THE DWH
    Layer | Characteristics | Data Model
    Staging Layer | Temporary storage; ingest of source data | Normally 1:1 copy of source table structure, usually without constraints and indexes
    Core Warehouse Layer | Historization / bitemporal data; integration; tool-independent; non-redundant data storage | 3NF with historization; Head and Version modeling; Data Vault; Anchor modeling; Dimensional model with historization (possible)
    Data Mart Layer | Performance for end user queries required; tool-dependent; lots of joins necessary to answer complex questions | Flat structures, esp. Dimensional model (ROLAP / MOLAP / HOLAP)
  • 41. DIMENSIONAL MODELING
    Design technique to present data
    • in a standard, intuitive framework
      • Easily understandable for end users
      • Logical data model
      • Physical data model: not necessarily relational
    • for high-performance end user access
      • Analysis / reporting of numerical measures (metrics) by different attributes
  • 42. DIMENSIONAL MODEL – IMPLEMENTATION TYPES
    Implementation types of dimensional models
    • Star Schema = relational model (ROLAP), consists of
      • Fact tables
      • Dimension tables
    • Cube = multidimensional model (MOLAP), consists of
      • Edges = attributes
      • Cells = measures (facts)
  • 43. DIMENSIONAL MODEL
    Dimensions
    • Are entities that contain descriptive textual attributes for analysis
    • E.g. Car (model, manufacturer, etc.), Time period (day, week, month, year)
    Facts
    • Contain key numerical figures (“measures”, “metrics”)
    • E.g. Sales amount (for dimensions: product X in region Y and time period Z)
  • 44. DIMENSIONAL MODEL – LOGICAL VIEW
    [Figure: fact tables / cubes Sales and Inventory with measures (Stock, #Items, Price) and dimensions with their hierarchy levels: Store, City, Country; Customer; Product, Productgroup; Day, Month, Year]
  • 45. SAMPLE PRODUCT HIERARCHY
    Dimensions can be organized in hierarchies
    • e.g. a product hierarchy
  • 46. HIERARCHIES
    Other hierarchies:
    • Date → Month/Year → Quarter/Year → Year
    • Customer → Company → Industry
    • City → County/Landkreis → State → Country → Continent
    Arbitrary number of hierarchy levels
    Purpose:
    • group and structure data
    • enable a view on data at different levels of granularity
    • hierarchies define aggregations on measures
  • 47. ROLAP
  • 48. ROLAP
    Physical data structure: relational tables
    • Advantage: can use well-engineered, reliable and high-performance database systems and query languages
    Special table structure
    • Star / Snowflake Schema
    • Dimension tables with textual attributes
    • Fact table consisting of measures and foreign keys to the dimension tables
  • 49. ROLAP
    Special table structure (continued)
    • Storage size depends mainly on the number of facts
      • One row per fact
      • Size of a row approx. (#dimensions + #measures) * column size
    • Aggregated totals are in general computed dynamically
      • Longer response times
  • 50. RELATIONAL DATA MODEL
    Dimensions
    • Relational table for each dimension like product, region, time period
    • Primary key (surrogate) identifies each dimension element
    • Additional fields contain descriptive information like product name
    • E.g. dimensions: Product, Region, Time period (day, week, month, year)
    Facts
    • Relational table containing key figures (“measures”)
    • Stores foreign keys to dimension tables
    • The other fields contain the values of the key figures/measures
    • E.g. Sales amount (for product X in region Y and time period Z)
  • 51. RELATIONAL MODEL: STAR SCHEMA
    Sales Fact: Time_key (FK), Product_key (FK), Location_key (FK), Branch_key (FK), Sales_amount, Discount
    Time Dimension: Time_key (PK), Date, Day, Month, Quarter, Year
    Product Dimension: Product_key (PK), Product_name, Supplier_Name
    Branch Dimension: Branch_key (PK), Branch_name
    Location Dimension: Location_key (PK), Street, City, Country
    Each of the four dimensions has a 1:n relationship to the Sales Fact table
  • 52. DATA MODELS FOR HIERARCHIES
    Denormalized Dimensions
    • 1 table with all hierarchy levels
    • Advantage:
      • Efficient aggregations
      • Performance
    • Disadvantage:
      • Complex updates if hierarchies change
  • 53. DATA MODELS FOR HIERARCHIES
    Normalized Dimensions
    • 1 table for each hierarchy level
    • Advantage:
      • Minimal updates for changes in the hierarchies
    • Disadvantage:
      • More complex queries when computing aggregations
      • Multiple joins
  • 54. RELATIONAL MODEL: SNOWFLAKE SCHEMA WITH NORMALIZED DIMENSIONS
    Sales Fact: Time_key (FK), Product_key (FK), Location_key (FK), Branch_key (FK), Sales_amount, Discount
    Time Dimension: Time_key (PK), Date, Day, Month, Quarter, Year
    Product Dimension: Product_key (PK), Product_name, Supplier_Key (FK)
    Supplier Dimension: Supplier_key (PK), Supplier_Name
    Branch Dimension: Branch_key (PK), Branch_name
    Location Dimension: Location_key (PK), Street, City_key (FK)
    City Dimension: City_key (PK), City, Country_Key (FK)
    Country Dimension: Country_key (PK), Country
    All relationships are 1:n, from the outer dimension tables towards the Sales Fact table
  • 55. EXERCISE STAR SCHEMA
    The following data model shows vehicle sales with entities
    • Person (sales_person and owner)
    • Vehicle
    • Production_plant
    Architect a Star Schema for the Data Mart Layer
  • 56. SAMPLE SOLUTION STAR SCHEMA
  • 57. ROLAP ENHANCEMENTS
    Used for accelerating data warehouse queries in general
    • Precomputation of aggregated values
    • Materialized views / query tables store data physically
    • Relational columnar (in-memory) databases
  • 58. PRECOMPUTATION OF AGGREGATED TOTALS
    Query processing in the Mart Layer
    • SQL statements can become complex, e.g. many joins
    • SQL statements can become slow if many rows are aggregated
      • E.g. sum of sales amount for city X AND product Y AND year 2016 compared to city X AND product Y AND year 2015
    • If aggregated values are stored in fact tables, new data from the Core Warehouse Layer has to be integrated into these aggregated fact tables
  • 59. MATERIALIZED VIEWS/QUERY TABLES
    The DBMS takes care of solving these problems
    • The user defines views containing aggregated values for certain hierarchy levels
    • These views are materialized as tables
    • Update options
      • immediate
      • deferred
    • When a query runs against a fact table, the DB optimizer takes advantage of these materialized views automatically: neither users nor application programs have to rewrite the original query to use them
  • 60. MATERIALIZED VIEWS / MATERIALIZED QUERY TABLES
    Example statement in Oracle to precompute values (similar in DB2 and other RDBMS):
      -- REFRESH FAST presupposes materialized view logs on product and sales
      CREATE MATERIALIZED VIEW sales_agg
        BUILD IMMEDIATE
        REFRESH FAST ON DEMAND
      AS
      SELECT p.productname,
             s.city,
             -- column renamed from "date", which is a reserved word in Oracle;
             -- expressions in the select list need a column alias
             EXTRACT(MONTH FROM s.sales_date) AS sales_month,
             SUM(s.sales_amount) AS sales_amount,
             SUM(no_items) AS no_items
      FROM product p
      JOIN sales s ON p.productid = s.productid
      GROUP BY p.productname, s.city, EXTRACT(MONTH FROM s.sales_date);
  • 61. RELATIONAL COLUMNAR DATABASES
    Row-oriented storage
    • Data of a relational table is stored row-wise: <values of Row 1><values of Row 2> … <values of Row N>
    Column-oriented storage
    • The values of each column are stored separately: <values of Column 1><values of Column 2> … <values of Column M>
  • 62. ROW AND COLUMN ORIENTED DB BLOCK STORAGE
    Id | Name | Birthdate
    1 | Bush | 1967
    2 | Schmitt | 1980
    3 | Bush | 1993
    4 | Berger | 1980
    5 | Miller | 1967
    6 | Bush | 1970
    7 | Miller | 1980
    Row-oriented storage (DB page/block): 1, Bush, 1967, 2, Schmitt, 1980, 3, Bush, 1993, 4, Berger, 1980, 5, Miller, 1967, 6, Bush, 1970, …
    Column-oriented storage (one DB page/block per column): 1, 2, 3, 4, 5, 6, 7, … | Bush, Schmitt, Bush, Berger, Miller, Bush, Miller, …
  • 63. ROW VS COLUMN ORIENTED STORAGE
    Row-oriented storage
    • Data of one row is grouped on disk and can be retrieved through one read operation
    • Single values can be retrieved through efficient index and offset computations
    • Good performance for insert, update and delete operations
    • Suited for OLTP systems
  • 64. ROW VS COLUMN ORIENTED STORAGE
    Column-oriented storage
    • Data of one column is grouped on disk and can be retrieved with far fewer read operations than with row-oriented storage
    • This makes computing aggregations much faster, in particular for tables with a lot of columns
    • In general better suited for queries involving partial table scans
    • Poor performance for insert, update and delete operations
    • Normally excellent compression, as values of the same data type are stored in the same blocks
    • Products: SAP HANA, HP Vertica, Exasol, IBM DB2 BLU, Oracle In-Memory Option, SQL Server (Columnstore Indexes), etc.
    • Suited for OLAP systems
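The difference between the two layouts can be made concrete with plain Python lists. This is only an in-memory analogy for disk blocks (the variable names are made up for this sketch), using the seven sample rows from the storage slide.

```python
rows = [
    (1, "Bush", 1967), (2, "Schmitt", 1980), (3, "Bush", 1993),
    (4, "Berger", 1980), (5, "Miller", 1967), (6, "Bush", 1970),
    (7, "Miller", 1980),
]

# Row-oriented: all values of one row are adjacent.
row_store = [value for row in rows for value in row]

# Column-oriented: all values of one column are adjacent.
column_store = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "birthdate": [r[2] for r in rows],
}

# An aggregation over one column reads exactly one contiguous list ...
avg_birth_year = sum(column_store["birthdate"]) / len(column_store["birthdate"])

# ... whereas the row layout has to step over the other columns of every row.
avg_from_rows = sum(row_store[2::3]) / len(rows)

# Values of one column repeat often, which is what makes compression effective.
distinct_names = sorted(set(column_store["name"]))
```

Inserting an eighth employee appends one contiguous triple to `row_store` but touches every list in `column_store`, which mirrors the insert/update trade-off listed above.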
  • 65. HOW TO COVER DATA CHANGES IN THE MART?
    Data changes, e.g.
    • new employees
    • employees change departments
    • employees leave
    • whole department reorganisations, etc.
    How are the changes handled? Insert-only approach in the Core Warehouse Layer, but choices in the Mart Layer (reduce the data volume to what the end user needs)
    • What does the business want to see? (Reporting Scenarios)
    • How is data inserted / updated in dimensions? (Slowly Changing Dimensions)
  • 66. REPORTING SCENARIOS
    • As-is scenario
    • As-of scenario
    • As-posted scenario
    • As-posted with comparable data scenario
  • 67. DATA MART – EXAMPLE BASELINE
    Assumption: the current year is 2016
    Employee Dimension 2015:
    Employee | Organisation
    Miller | DWH
    Rogers | DWH
    Douglas | Database
    Powell | Database
    Employee Dimension 2016:
    Employee | Organisation
    Miller | DWH
    Rogers | DWH
    Powell | DWH (other department)
    Douglas | Database
    Bush | Database (new employee)
    Facts:
    Employee | Year | #Projects
    Miller | 2015 | 10
    Rogers | 2015 | 10
    Douglas | 2015 | 10
    Powell | 2015 | 10
    Miller | 2016 | 10
    Rogers | 2016 | 10
    Powell | 2016 | 10
    Douglas | 2016 | 10
    Bush | 2016 | 10
  • 68. AS-IS SCENARIO
    Reporting uses the current structure (Employee Dimension 2016) for all facts
    Organisation | #Projects ´15 | #Projects ´16
    DWH | 30 | 30
    Database | 10 | 20
  • 69. AS-OF SCENARIO
    Reporting uses the structure as demanded, e.g. as requested for 2015 (Employee Dimension 2015) for all facts
    Organisation | #Projects ´15 | #Projects ´16
    DWH | 20 | 20
    Database | 20 | 20
  • 70. AS-POSTED SCENARIO
    Reporting uses the „historical truth“: each year's facts are reported with that year's structure
    Organisation | #Projects ´15 | #Projects ´16
    DWH | 20 | 30
    Database | 20 | 20
  • 71. AS-POSTED WITH COMPARABLE DATA SCENARIO
    Reporting uses the „historical truth“, restricted to identical dimension data (employees whose organisation is the same in both years)
    Organisation | #Projects ´15 | #Projects ´16
    DWH | 20 | 20
    Database | 10 | 10
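The four result tables can be reproduced from the baseline data with a short Python sketch. The function `report` and its parameters are made up for this illustration: each scenario only differs in which dimension version is used to look up an employee's organisation, and in whether employees with changed dimension data are filtered out.

```python
dim_2015 = {"Miller": "DWH", "Rogers": "DWH",
            "Douglas": "Database", "Powell": "Database"}
dim_2016 = {"Miller": "DWH", "Rogers": "DWH", "Powell": "DWH",
            "Douglas": "Database", "Bush": "Database"}
dims = {2015: dim_2015, 2016: dim_2016}

# Fact rows (employee, year, #projects) as on the baseline slide.
facts = ([(e, 2015, 10) for e in dim_2015] +
         [(e, 2016, 10) for e in dim_2016])

def report(pick_dimension, keep=lambda employee: True):
    """Sum #projects per (organisation, year) using the chosen dimension version."""
    totals = {}
    for employee, year, projects in facts:
        organisation = pick_dimension(year).get(employee)
        if organisation is None or not keep(employee):
            continue  # employee unknown in this structure, or filtered out
        key = (organisation, year)
        totals[key] = totals.get(key, 0) + projects
    return totals

as_is = report(lambda year: dim_2016)          # current structure for all years
as_of_2015 = report(lambda year: dim_2015)     # requested structure for all years
as_posted = report(lambda year: dims[year])    # structure valid in each fact year

# Comparable data: only employees whose organisation is identical in both years.
unchanged = {e for e in dim_2015 if dim_2016.get(e) == dim_2015[e]}
as_posted_comparable = report(lambda year: dims[year],
                              keep=lambda e: e in unchanged)
```

In the as-of 2015 report, Bush's 2016 projects disappear because Bush does not exist in the 2015 structure; that is why Database shows 20 and not 30 for 2016.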
  • 72. SLOWLY CHANGING DIMENSIONS
    Dimensions must absorb changes
    Slowly changing dimensions according to Kimball / Ross (2002):
    • SCD Type 0
      • No changes, new data is ignored
    • SCD Type 1 - 3
      • See next slides
    • And some more SCD types
      • Rarely relevant
  • 73. SLOWLY CHANGING DIMENSIONS – EXAMPLE BASELINE
    Employee Dimension:
    ID | Employee | Organisation
    1 | Miller | DWH
    2 | Powell | Database
    Changes:
    • New data added: Albert, DWH
    • Powell marries and has the new name Parker
  • 74. SLOWLY CHANGING DIMENSION TYPE 1
    • No history
    • Dimension attributes always contain current data
    Employee Dimension after the changes from the baseline (Albert added, Powell renamed to Parker):
    ID | Employee | Organisation
    1 | Miller | DWH
    2 | Parker | Database
    3 | Albert | DWH
  • 75. SLOWLY CHANGING DIMENSION TYPE 2
    • Full historization
    • Dimension contains timestamps, with NULLs or a future date like 31.12.2999 for the open end
    Employee Dimension after the changes from the baseline:
    ID | Employee | Organisation | Valid From | Valid To
    1 | Miller | DWH | 01.01.2015 | NULL
    2 | Powell | Database | 21.12.2014 | 15.10.2016
    3 | Albert | DWH | 05.03.2014 | NULL
    2 | Parker | Database | 15.10.2016 | NULL
  • 76. SLOWLY CHANGING DIMENSION TYPE 3
    • Historization of the latest change only
    • And storage of the current value
    Employee Dimension after the changes from the baseline:
    ID | Employee Name | Previous Name | Organisation | Previous Organisation
    1 | Miller | NULL | DWH | NULL
    2 | Parker | Powell | Database | NULL
    3 | Albert | NULL | DWH | NULL
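The three update strategies can be compared side by side in a small Python sketch. The function names and the dictionary layout are assumptions of this illustration; the dates are the ones from the Type 2 slide, written in ISO format.

```python
def apply_scd1(dimension, emp_id, employee, organisation):
    """Type 1: overwrite in place, no history."""
    dimension[emp_id] = {"employee": employee, "organisation": organisation}

def apply_scd2(dimension, emp_id, employee, organisation, change_date):
    """Type 2: close the open version and append a new one (NULL = None)."""
    for row in dimension:
        if row["id"] == emp_id and row["valid_to"] is None:
            if (row["employee"], row["organisation"]) == (employee, organisation):
                return  # nothing changed, keep the open version
            row["valid_to"] = change_date
    dimension.append({"id": emp_id, "employee": employee,
                      "organisation": organisation,
                      "valid_from": change_date, "valid_to": None})

def apply_scd3(dimension, emp_id, employee):
    """Type 3: keep only the current and the previous name."""
    row = dimension[emp_id]
    row["previous_name"], row["employee"] = row["employee"], employee

# Type 1: Albert is added, Powell is overwritten with Parker.
scd1 = {1: {"employee": "Miller", "organisation": "DWH"},
        2: {"employee": "Powell", "organisation": "Database"}}
apply_scd1(scd1, 3, "Albert", "DWH")
apply_scd1(scd1, 2, "Parker", "Database")

# Type 2: the Powell row is closed, a Parker row with the same ID is added.
scd2 = [
    {"id": 1, "employee": "Miller", "organisation": "DWH",
     "valid_from": "2015-01-01", "valid_to": None},
    {"id": 2, "employee": "Powell", "organisation": "Database",
     "valid_from": "2014-12-21", "valid_to": None},
]
apply_scd2(scd2, 3, "Albert", "DWH", "2014-03-05")
apply_scd2(scd2, 2, "Parker", "Database", "2016-10-15")

# Type 3: the old name moves into the "previous" column.
scd3 = {2: {"employee": "Powell", "previous_name": None,
            "organisation": "Database"}}
apply_scd3(scd3, 2, "Parker")
```

With Type 2 the business ID is no longer unique in the dimension, so fact tables usually reference a separate surrogate key per version; the sketch omits that key for brevity.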
  • 77. DIMENSION AND FACT TABLE TYPES
    • Conformed dimension
    • Junk dimension
    • Role-playing dimension
    • Degenerate dimension
    • Transactional fact
    • Periodic fact
    • Accumulating fact
  • 78. DIMENSION TYPES: CONFORMED DIMENSION
    • Dimension that is used in several fact tables
    • Fact tables can be connected by using conformed dimensions
    [Figure: Sales Fact and Inventory Fact share the Product Dimension and the Location Dimension]
  • 79. DIMENSION TYPES: CONFORMED DIMENSION
    Kimball: the Enterprise DWH Bus Matrix is a “design tool” to document the organization's processes
    [Table: bus matrix with the dimensions Date, Product, Location, Customer, Promotion as columns and the fact tables as rows: Sales Fact (5 dimensions marked), Inventory Fact (3), Customer Returns Fact (4), Sales Forecast Fact (3)]
  • 80. DIMENSION TYPES: JUNK DIMENSION
    Collection of lookup data / codes that could also form its own dimension
    ID | MaritalStatus | Gender
    1 | Single | Male
    2 | Single | Female
    3 | Married | Male
    4 | Married | Female
  • 81. DIMENSION TYPES: ROLE-PLAYING DIMENSION
    A single dimension is referenced several times by the same fact table
    • E.g. several dates in the fact table reference the Date Dimension
    ID | OrderDate | DeliveryDate | ProductionDate
    1 | .. | .. | ..
    2 | .. | .. | ..
    3 | .. | .. | ..
    4 | .. | .. | ..
  • 82. DIMENSION TYPES: DEGENERATE DIMENSION
    • A dimension without its own dimension table. Data is stored in the fact table only.
    • Used e.g. for drill-through in reports
    • E.g. OrderNumber in the sales fact table
    ID | OrderNumber | .. | ..
    1 | A51273 | .. | ..
    2 | 72841 | .. | ..
    3 | 732GT5 | .. | ..
    4 | 624TR5K | .. | ..
  • 83. TYPES OF FACT TABLES - TRANSACTIONAL
    Transactional
    • Most common
    • Usually one row per line/event in a transaction
    • Most detailed level
    • The grain must (should) be the same for all rows
    • Measures can usually be aggregated: “additive measure” (e.g. sum over sales amount)
    • E.g. fact table for sales data
  • 84. TYPES OF FACT TABLES - PERIODIC
    Periodic snapshots
    • Picture of a point in time
    • Often computed from a transactional fact table, e.g. aggregated by month
    • Measures can usually not be aggregated over time (e.g. a sum over inventory does not make sense, as inventory is already a snapshot / sum for a day)
    • The grain must (should) be the same for all rows
    • E.g. fact table for inventory data (summed up for each day)
  • 85. TYPES OF FACT TABLES - ACCUMULATING
    Accumulating snapshots
    • Shows the activity of a process/event over time
    • The data is not complete at the beginning and is updated as soon as new data arrives (e.g. the delivery date can be unknown at the beginning)
    • The grain must (should) be the same for all rows
    • E.g. fact table for processing an order
  • 86. EXERCISE: QUERIES 1
    How many cabriolets (D_Model.model) have been built in January and February 2016?
    Assume SCD1 and no history in fact tables
    Expected result: one count each for 01/2016 and 02/2016
  • 87. EXERCISE: QUERIES 1
    How many cabriolets (D_Model.model) have been built in January and February 2016?
      SELECT d.month, d.year, SUM(f.count)
      FROM f_vehicle_built f
      JOIN d_model m ON m.modelid = f.modelid
      JOIN d_production_date d ON d.prod_date = f.prod_date
      WHERE m.model = 'Cabriolet'
        AND d.month IN (1, 2)
        AND d.year = 2016
      GROUP BY d.month, d.year
  • 88. EXERCISE: QUERIES 2
    How many different models (D_Model.model) currently have a performance of 105 PS (D_ENGINE.performance)?
    Assume SCD1 and no history in fact tables
    Expected result: one count per model (Cabriolet, SUV, …)
  • 89. EXERCISE: QUERIES 2
    How many different models (D_Model.model) currently have a performance of 105 PS (D_ENGINE.performance)?
      SELECT m.model, SUM(f.count)
      FROM f_vehicle_built f
      JOIN d_model m ON m.modelid = f.modelid
      JOIN d_engine e ON e.engineid = f.engineid
      WHERE e.performance = 105
      GROUP BY m.model
  • 90. EXERCISE: QUERIES 3
    How many different models (D_Model.model) currently have a performance of 105 PS (D_ENGINE.performance)?
    Expected result: one count per model (Cabriolet, SUV, …)
  • 91. EXERCISE: QUERIES 3
    How many different models (D_Model.model) currently have a performance of 105 PS (D_ENGINE.performance)?
    Views that return the latest satellite row per key:
      CREATE VIEW v_vehicle_sat AS
      SELECT s.h_vehicle_key, s.loaddate, s.model
      FROM s_vehicle_base s
      WHERE s.loaddate = (SELECT MAX(x.loaddate)
                          FROM s_vehicle_base x
                          WHERE x.h_vehicle_key = s.h_vehicle_key);
      CREATE VIEW v_engine_sat AS
      SELECT s.h_engine_key, s.loaddate, s.performance
      FROM s_engine s
      WHERE s.loaddate = (SELECT MAX(x.loaddate)
                          FROM s_engine x
                          WHERE x.h_engine_key = s.h_engine_key);
  • 92. EXERCISE: QUERIES 3
    How many different models (D_Model.model) currently have a performance of 105 PS (D_ENGINE.performance)?
      SELECT model, COUNT(*)
      FROM v_vehicle_sat v
      JOIN l_plugged_into l ON l.h_vehicle_key = v.h_vehicle_key
      JOIN v_engine_sat e ON l.h_engine_key = e.h_engine_key
      JOIN s_engine s ON s.h_engine_key = e.h_engine_key AND s.loaddate = e.loaddate
      WHERE s.performance = 105
      GROUP BY model;
    Many other solutions are possible, e.g. using a WITH clause instead of views or using window functions, all depending on DB vendor/version
  • 93. MOLAP
  • 94. MULTIDIMENSIONAL DATA MODEL
    Edges of a cube (“Dimensions”)
    • Attributes like Product, Region, Time period (day, week, month, year)
    Cells of a cube (“Measures”)
    • Key figures (e.g. sales amount, profit)
    • For every combination of attribute values there is one value of each key figure, e.g. sales amount for product X in region Y and time period Z
    • A value can be NULL and is then stored as an empty cell
  • 95. MOLAP - MULTIDIMENSIONAL DATABASES
    A database specially designed to handle the organization of data in multiple dimensions
    • Good for DWH requirements only, not general-purpose like a relational DBMS
    • E.g. IBM Cognos TM1, Oracle Essbase, Microsoft Analysis Services, Oracle OLAP Option, IBM Cognos Powerplay
    Holds data cells in blocks that constitute a virtual cube
    Optimized to handle numeric data
    • Aggregated totals often precalculated
    • Not intended for textual data
  • 96. MULTIDIMENSIONAL STORAGE
    Linearization of the cells in a cube into a one-dimensional array
    Number of cells: #(dim1) x #(dim2) x ... x #(dimN)
    Depends on the number of dimensions and their cardinality, not on the number of facts
    Example:
    • Cube with 3 dimensions: two with 3 elements each and one with 2 elements
    • Size = 3 * 3 * 2 = 18 cells
    • The numbers in the cube cells indicate the position in the array
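A possible linearization can be sketched in Python. The slide's figure with the cell numbering is not available here, so the row-major order used below (last dimension varies fastest) is an assumption, and the function names are made up for this illustration.

```python
from math import prod

def cube_size(cardinalities):
    """Number of cells: the product of all dimension cardinalities."""
    return prod(cardinalities)

def linear_index(coords, cardinalities):
    """Map 0-based cell coordinates to an array position (row-major order)."""
    index = 0
    for coord, size in zip(coords, cardinalities):
        if not 0 <= coord < size:
            raise IndexError(f"coordinate {coord} outside dimension of size {size}")
        index = index * size + coord
    return index

shape = (3, 3, 2)                       # the example cube from the slide
first = linear_index((0, 0, 0), shape)  # first cell of the array
last = linear_index((2, 2, 1), shape)   # last cell of the array
```

Every cell gets a slot whether or not a fact exists for it, which is why the size depends on the cardinalities and not on the number of facts.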
  • 97. EXAMPLE
    Cube with 3 dimensions
    • Product: 4 values (p1, p2, p3, p4)
    • Store: 3 values (s1, s2, s3)
    • Time (year): 2 values (y1, y2)
    Number of cells in the cube: 4 x 3 x 2 = 24
  • 98. EXAMPLE
    Sales in year y2
  • 99. EXAMPLE
    Sales of store s1 in year y2
  • 100. EXAMPLE
    Sales of product p2 in year y1
  • 101. MULTIDIMENSIONAL OPERATIONS - SELECTION
    Selection
    • Definition of a filter
    • Select the data of a single cell with a condition for each dimension
    • For instance:
      • time = 'January 2006'
      • location = 'Stuttgart'
      • product = 'ThinkPad T60'
  • 102. EXAMPLE SELECTION
  • 103. MULTIDIMENSIONAL OPERATIONS - SLICE
    Slice
    • Definition of a filter
    • Condition for one single dimension
    • Select a new cube with one fewer dimension
    • For instance:
      • product = 'ThinkPad T60'
  • 104. EXAMPLE SLICE
  • 105. MULTIDIMENSIONAL OPERATIONS - DICE
    Dice
    • Definition of intervals/sets as filter
    • Pick specific values of multiple dimensions
    • Select a smaller cube
    • Conditions, for instance:
      • time = 1st quarter (January, February, March)
      • location = region south (Stuttgart, Frankfurt, Munich)
  • 106. EXAMPLE DICE
  • 107. MULTIDIMENSIONAL OPERATIONS – ROTATE/PIVOT
    Rotate/Pivot
    • Rotate the cube along its axes
    • Get a different view on the data cube
    • # of views on a cube = (# of dimensions)!
      • 2 dimensions, 2 views (2! = 2*1)
      • 3 dimensions, 6 views (3! = 3*2*1)
      • 4 dimensions, 24 views (4! = 4*3*2*1)
      • ...
  • 108. EXAMPLE ROTATE/PIVOT
  • 109. MULTIDIMENSIONAL OPERATIONS – ROLL-UP/DRILL-DOWN
    Roll-up & Drill-down
    • Prerequisites:
      • Hierarchies defined
      • Aggregated data for all hierarchy levels available
    • Roll up: change hierarchy level "upwards"
      • get less detailed data (= higher aggregation)
    • Drill down: change hierarchy level "downwards"
      • get more detailed data (= lower aggregation)
  • 110. EXAMPLE ROLL-UP & DRILL-DOWN
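Selection, slice, dice and roll-up can be tried out on a tiny cube held in a Python dictionary. This is a sketch under assumptions: the cube reuses the product/store/year example (4 x 3 x 2 cells), every cell holds the made-up sales value 10, and the product groups `g1` and `g2` are a hypothetical hierarchy level.

```python
from itertools import product as cartesian

products = ["p1", "p2", "p3", "p4"]
stores = ["s1", "s2", "s3"]
years = ["y1", "y2"]

# Cell key = (product, store, year); value = sales (hypothetical constant 10).
cube = {cell: 10 for cell in cartesian(products, stores, years)}

def slice_cube(cube, axis, member):
    """Fix one dimension to a single member; the result has one dimension fewer."""
    return {key[:axis] + key[axis + 1:]: value
            for key, value in cube.items() if key[axis] == member}

def dice_cube(cube, allowed):
    """Keep only cells whose members lie in the given sets (None = no filter)."""
    return {key: value for key, value in cube.items()
            if all(members is None or k in members
                   for k, members in zip(key, allowed))}

def roll_up(cube, axis, parent_of):
    """Aggregate one dimension to the next hierarchy level."""
    result = {}
    for key, value in cube.items():
        parent_key = key[:axis] + (parent_of[key[axis]],) + key[axis + 1:]
        result[parent_key] = result.get(parent_key, 0) + value
    return result

selected = cube[("p2", "s1", "y1")]                     # selection: one cell
sales_y2 = slice_cube(cube, 2, "y2")                    # slice: year = y2
small = dice_cube(cube, [{"p1", "p2"}, {"s1"}, None])   # dice: subsets per dimension
groups = {"p1": "g1", "p2": "g1", "p3": "g2", "p4": "g2"}
by_group = roll_up(cube, 0, groups)                     # roll-up: product -> group
```

Drill-down is the inverse navigation: going from `by_group` back to `cube` requires the detailed cells to be stored, which is the "aggregated data for all hierarchy levels available" prerequisite.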
  • 111. MDX - OLAP QUERY LANGUAGE
    ROLAP: SQL is the standard language
    MOLAP: MDX (Multidimensional Expressions)
    • De-facto industry standard developed by Microsoft
    • Very complex
    • SQL-like syntax
    • Language elements
      • Scalar: data type „string“ or „number“
      • Dimension, Hierarchy, Level, Member
      • …
  • 112. MDX SAMPLE QUERY
      SELECT { [Measures].[Store Sales] } ON COLUMNS,
             { [Date].[2002], [Date].[2003] } ON ROWS
      FROM Sales
      WHERE ( [Store].[USA].[CA] )
    This query defines the following result set information:
    • The SELECT clause sets the query axes: the Store Sales (amount) member on the columns and the 2002 and 2003 members of the Date dimension on the rows
    • The FROM clause indicates that the data source is the Sales cube
    • The WHERE clause defines the "slicer axis" for the member California of the Store dimension
    Result:
    Year | Store Sales
    2002 | 95863,66
    2003 | 99764,01
  • 113. MOLAP - ROLAP
    | MOLAP | ROLAP
    Database type | Multidimensional | Relational
    Data storage | Special storage engines for cube data | Star schema: special relational data model
    Size | 100s of Gigabytes | 10s of Terabytes
    Query language | MDX | SQL
  • 114. MOLAP - ROLAP
    MOLAP advantages
    • special database products optimized for multidimensional analysis
    • short response times, e.g. no joins
    • suitable storage schema and query processing for multidimensional data
    ROLAP advantages
    • can use existing, well-established DBMS
    • easy data import and update
    • user access, backup and security mechanisms of the DBMS can be used
    MOLAP disadvantages
    • problems with sparsity (ratio of occupied to unoccupied cells): "null" is stored in a field with the same length as any value
    • limited data volume: 5-6 dimensions
    • cube data is accessible read-only for end users
    • expensive update operations
    ROLAP disadvantages
    • complex SQL queries for processing OLAP requests
    • longer response times (solution: materialized views and in-memory columnar databases)
  • 115. HOLAP – HYBRID OLAP. Combines the advantages of ROLAP and MOLAP • Relational DBMS for storage of sparse, historical data • Data at the highest granularity level • Multidimensional DBMS for efficient storage of dense data cubes • Multidimensional cache for aggregated totals • Complex architecture and maintenance processes • No uniform OLAP query processing
  • 116. EXERCISE: OLAP. The following is a data model used by a supermarket chain to analyze their business (figure not included in this transcript).
  • 117. EXERCISE: OLAP. With each transaction, an average of 20 different articles are bought. The data warehouse collects sales transaction data over 2 years. There are 1000 stores with 2000 transactions per store per day. Questions: • 1. What are the columns of the ROLAP fact table? • 2. How many records are stored in the fact table? • 3. What is the size of the cube (number of cells) that stores the aggregated values at the most detailed level? • 4. Compute the respective cube sizes for the other 3 (higher) hierarchy levels.
  • 118. EXERCISE: OLAP. 1. What are the columns of the ROLAP fact table? • Trans. No. (FK to dimension) • Date (FK to dimension) • Location (FK to dimension) • Article (FK to dimension) • No. of articles (measure) and article price (measure) 2. How many records are stored in the fact table? • One record per transaction and article (with quantity and price) • 2 years * 365 days/year * 1000 stores * 2000 transactions/(store*day) * 20 articles/transaction = 29,200,000,000 records
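The record count in answer 2 can be re-checked with a few lines of Python. The variable names are chosen here for readability; the factors are the ones given in the exercise.

```python
years = 2
days_per_year = 365
stores = 1000
transactions_per_store_per_day = 2000
articles_per_transaction = 20  # average number of different articles

# One fact row per (transaction, article) combination.
fact_rows = (years * days_per_year * stores
             * transactions_per_store_per_day * articles_per_transaction)

print(f"{fact_rows:,}")  # 29,200,000,000
```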
  • 119. EXERCISE: OLAP. 3. What is the size of the cube (number of cells) that stores the aggregated values at the most detailed level? • 2 years * 365 [days]/year * 2000 [transactions] * 1000 [stores] * 50000 [articles] = 73,000,000,000,000 cells 4. Compute the respective cube sizes for the other 3 hierarchy levels. • Level 2: 2 years * 12 [months]/year * 500 [cities] * 2000 [product groups] = 24,000,000 cells • Level 3: 2 years * 4 [quarters]/year * 20 [regions] * 200 [product categories] = 32,000 cells • Level 4: 2 [years] * 5 [regions] * 10 [product departments] = 100 cells
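The cube sizes in answers 3 and 4 are products of the member counts per dimension. The sketch below recomputes them and, as an additional step that is not part of the original solution, relates the 29,200,000,000 fact records from answer 2 to the number of cells at the most detailed level. This assumes that every fact record occupies exactly one cell. The level labels are paraphrased from the solution.

```python
from math import prod

# Member counts per dimension, as used in the exercise solution.
levels = {
    "level 1 (day, transaction, store, article)": [2 * 365, 2000, 1000, 50000],
    "level 2 (month, city, product group)": [2 * 12, 500, 2000],
    "level 3 (quarter, region, product category)": [2 * 4, 20, 200],
    "level 4 (year, region, product department)": [2, 5, 10],
}

cube_cells = {name: prod(sizes) for name, sizes in levels.items()}
for name, cells in cube_cells.items():
    print(f"{name}: {cells:,} cells")

# Assumption: each of the fact records from answer 2 fills exactly one cell.
fact_rows = 29_200_000_000
density = fact_rows / cube_cells["level 1 (day, transaction, store, article)"]
print(f"occupied share at level 1: {density:.2%}")
```

Under that assumption only about 0.04% of the cells at the most detailed level are occupied, which illustrates why sparsity is listed as a MOLAP problem.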
  • 120. SUMMARY. • Data modeling in the Core Warehouse Layer • Choices like Data Vault • Data modeling in the Mart Layer • Dimensional Modeling • ROLAP (Star Schema with fact and dimension tables) • MOLAP (Cubes)
  • 121. THANK YOU. Daimler TSS GmbH, Wilhelm-Runge-Straße 11, 89081 Ulm / Phone +49 731 505-06 / Fax +49 731 505-65 99 / tss@daimler.com / Internet: www.daimler-tss.com / Intranet-Portal-Code: @TSS / Domicile and Court of Registry: Ulm / HRB-Nr.: 3844 / Management: Christoph Röger (CEO), Steffen Bäuerle
  • 122. OLAP – 12 CRITERIA BY CODD. Online Analytical Processing • Term introduced by E. F. Codd in 1993 in a white paper commissioned by Arbor Software (vendor of Essbase) • 12 criteria for OLAP systems, such as • Multi-dimensionality • Transparency • Constant response times • Multi-user support • Flexible definition of reports • No limits on dimensions and hierarchy levels
  • 123. OLAP – FASMI CRITERIA. FASMI: Fast Analysis of Shared Multidimensional Information. Criteria by Pendse/Creeth (1995) • Fast • maximum response time of 5 seconds for regular queries and no more than 20 seconds for complex queries • Analysis • intuitive analysis, little or no programming • flexible: queries may contain arbitrary computations
  • 124. OLAP – FASMI CRITERIA. • Shared • Multi-user capable: shared usage and access control • Multidimensional • Multidimensional view of the data • regardless of the underlying data model • Full support of hierarchies • Information • Users must be able to get all data without restrictions imposed by the OLAP system in use; no restrictions with regard to scalability

Editor's Notes

  1. Mission: We are a specialist and strategic business partner for innovative end-to-end IT solutions within the Daimler Group: not just another supplier, more than another supplier!
  2. Every attribute of the relation must have an atomic domain, and the relation must be free of repeating groups.
  3. A relation is in second normal form if and only if it is in first normal form and no non-prime attribute (an attribute that is not part of any candidate key) is functionally dependent on a proper subset of a candidate key. In other words: every non-prime attribute depends on each candidate key as a whole, not just on part of a key. The important point is that the non-key attributes really depend fully on every key.
  4. Put simply: a non-key attribute must not depend on a set of non-key attributes. A non-key attribute may therefore depend directly only on a primary key (or a candidate key). Third normal form is reached if and only if the relation schema is in 2NF and no non-key attribute is transitively dependent on a candidate key. An attribute $A_{2}$ is transitively dependent on the candidate key $P_{1}$ if there is an attribute set $A_{1}$ such that $P_{1} \rightarrow A_{1}$ and $A_{1} \rightarrow A_{2}$.
  5. Documentation
  6. What are problems and their solutions?
  7. Procurement (German: Beschaffung). Solving this problem is a central requirement for the DWH.
  8. Non-additive or semi-additive measure
  9. 29.2 billion (29.2 × 10^9)
  10. 73 trillion (73 × 10^12)
  11. Multidimensional conceptual view of the data (the most important criterion for OLAP); transparency (clear separation between the user interface and the underlying architecture); accessibility (base data sourced from external or operational data stores); consistent reporting performance (reporting functionality that is as fast as possible); client-server architecture (load distribution optimized for the intended use); generic dimensionality (all dimensions uniform in structure and functionality); dynamic handling of sparse matrices (dynamic adaptation of the storage structure); multi-user support; unrestricted cross-dimensional operations; intuitive data analysis (direct navigation within the data cubes); flexible reporting (results can be arranged freely in the report); unlimited number of dimensions and consolidation levels (15 to 20 dimensions with any number of aggregation levels).
  12. Fast: queries should take five seconds on average. Simple queries should take no longer than one second, and only a few, more complex queries may need up to 20 seconds of processing time. Analysis: an OLAP system should be able to handle any logic that is required. Defining a more complex analytical query should require little programming effort from the user. Shared: an OLAP system should be designed for multi-user operation, which requires suitable access control mechanisms. Multidimensional: as the main criterion, Pendse and Creeth require a multidimensional structuring of the data with full support for dimension hierarchies. Information: all required data should be transparently available to the user during analysis. An analysis must not be constrained by limitations of the OLAP system.