2. Agenda
What is a Data Warehouse?
What problem does it solve?
Where does Dimensional Modeling fit in?
Basic concept of Dimensional Modeling
Foundation of Design Concepts
Q&A
3. Data Warehouse
“Data warehousing is the design and implementation
of processes, tools, and facilities to manage and
deliver complete, timely, accurate, and
understandable information for decision making. It
includes all the activities that make it possible for an
organization to create, manage, and maintain a data
warehouse or data mart.”
(IBM Data Modeling Techniques for Data Warehousing)
4. Data Warehouse Goals
Easy information access
Consistent information presentation
Adaptive and resilient to change
Information assets protection
Foundation for improved decision making
Accepted by the business community
(The Data Warehouse Toolkit)
5. Data Analysis Techniques
Query
Analyze
Discover
(IBM Data Modeling Techniques for Data Warehousing)
7. Data Presentation Area
Key Considerations…
Dimensional Model Vs Normalized Model
Global Data Warehouse Vs Independent Data Marts
Top-down Vs Bottom-up
Atomic Vs Summarized Data
(The Data Warehouse Toolkit)
8. Dimensional Model Components
A fact is a collection of related data
items, consisting of measures and context
data. A fact contains the information the
business is interested in
A dimension is a collection of members or
units of the same type of views. A
dimension is the window to the
information contained in the facts
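As a rough illustration of how facts and dimensions relate, the following Python sketch builds a tiny star schema in an in-memory SQLite database. The table names (fact_sales, dim_product, dim_store), the columns, and the sample rows are hypothetical and are not taken from the cited books.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive context used to filter and group the facts.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_store (
    store_key  INTEGER PRIMARY KEY,
    store_name TEXT,
    region     TEXT
);
-- Fact: numeric measures plus foreign keys to each dimension.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    quantity     INTEGER,
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Espresso', 'Coffee'), (2, 'Green Tea', 'Tea');
INSERT INTO dim_store VALUES (1, 'Downtown', 'North'), (2, 'Airport', 'South');
INSERT INTO fact_sales VALUES (1, 1, 3, 9.0), (2, 1, 1, 2.5), (1, 2, 2, 6.0);
""")

# The dimension acts as the "window" onto the facts: group measures by an attribute.
sales_by_category = dict(conn.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
""").fetchall())

print(sales_by_category)
```

Swapping dim_product for dim_store in the join gives the same measures viewed by region instead of category, without changing the fact table.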
9. Dimensional Model Myths
Dimensional models and data marts are for
summary data only.
Dimensional models and data marts are
departmental, not enterprise, solutions
Dimensional models and data marts can’t be
integrated
Dimensional models and data marts are not scalable
Dimensional models and data marts are only
appropriate when there is a predictable usage
pattern
(The Data Warehouse Toolkit)
10. Dimensional Model Process
Select business process to model
Declare grain of the business process
Choose dimensions that apply to each fact table
row
Identify numeric facts that will populate each fact
table row
12. Design Concepts 1
Snowflake vs Star Schema
How many dimensions?
Degenerate Dimensions
Surrogate Keys
Null Keys Handling
Date Dimension and its Surrogate Key
Factless Fact Tables
(The Data Warehouse Toolkit)
13. Design Concepts 2
Periodic Snapshots
Semi-additive facts
Accumulating Snapshots
Bus Architecture
Conformed Dimensions
Slowly Changing Dimensions
Overwriting the value
Adding Dimension Row
Adding Dimension Column
(The Data Warehouse Toolkit)
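The three slowly changing dimension techniques listed above (commonly called Type 1, Type 2, and Type 3) can be sketched in Python on an in-memory list of rows. This is a minimal sketch under stated assumptions: the customer dimension, its column names (customer_key, customer_id, valid_from, valid_to, is_current), and the helper functions are all hypothetical.

```python
from datetime import date

def overwrite_value(rows, natural_key, attr, new_value):
    """Type 1: overwrite in place; the old value is lost."""
    for row in rows:
        if row["customer_id"] == natural_key:
            row[attr] = new_value

def add_dimension_row(rows, natural_key, attr, new_value, change_date):
    """Type 2: expire the current row and append a new row with a new surrogate key."""
    current = next(r for r in rows
                   if r["customer_id"] == natural_key and r["is_current"])
    current["is_current"] = False
    current["valid_to"] = change_date
    new_row = dict(current, **{attr: new_value})
    new_row.update(
        customer_key=max(r["customer_key"] for r in rows) + 1,
        valid_from=change_date,
        valid_to=None,
        is_current=True,
    )
    rows.append(new_row)

def add_dimension_column(rows, natural_key, attr, new_value):
    """Type 3: keep only the prior value, in a 'previous_<attr>' column."""
    for row in rows:
        if row["customer_id"] == natural_key:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value

# Hypothetical customer who moves from Lisbon to Porto.
dim_customer = [{
    "customer_key": 1, "customer_id": "C100", "city": "Lisbon",
    "valid_from": date(2020, 1, 1), "valid_to": None, "is_current": True,
}]
add_dimension_row(dim_customer, "C100", "city", "Porto", date(2024, 6, 1))
```

After the call, the dimension holds two rows for customer C100: the expired Lisbon row (surrogate key 1) and the current Porto row (surrogate key 2). Facts loaded before the change keep pointing at key 1, which is how history is preserved.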
14. Design Concepts 3
Role Playing Dimensions
Junk Dimension (Indicators)
Fact Normalisation
Multiple Currencies
Currency Conversion Fact
Header & Line Facts (different granularity)
Multiple UOM
(The Data Warehouse Toolkit)
16. Design Concepts 5
Aggregated Facts as Attributes
Age Groups
Volume Buckets
Spend Buckets, etc.
Dimension Outriggers
Category Dimension (Start Date)
Time Intelligence
YTD, QTD, CY, LY, CM, LM, etc
(The Data Warehouse Toolkit)
17. More Design Concepts…
Partitioning
Rapidly changing dimensions
Bridge Tables (Variable Depth Hierarchies)
ClickStream Analysis
Audit Dimensions
Building Data Warehouse
Basket Analysis
(The Data Warehouse Toolkit)
18. References
Books:
The Data Warehouse Toolkit (Ralph Kimball, Margy Ross)
Mastering Data Warehouse Design (Wiley Press)
Building the Data Warehouse (W. H. Inmon)
Data Modeling Techniques for Data Warehousing (IBM Press)
Internet:
http://www.kimballgroup.com/html/designtips.html
http://www.inmoncif.com/home/
http://inmoninstitute.com/
Editor's Notes
“The type of analysis that will be done with the data warehouse can determine the type of model and the model's contents. Because query and reporting and multidimensional analysis require summarization and explicit metadata, it is important that the model contain these elements. Also, multidimensional analysis usually entails drilling down and rolling up, so these characteristics need to be in the model as well. A clean and clear data warehouse model is a requirement, else the end users' tasks will become too complex, and end users will stop trusting the contents of the data warehouse and the information drawn from it because of highly inconsistent results. Data mining, however, usually works best with the lowest level of detail available. Thus, if the data warehouse is used for data mining, a low level of detail data should be included in the model.” (IBM Data Modeling Techniques for Data Warehousing)
We will focus only on the Data Presentation Area of the data warehouse components.
The emphasis is on intuitive, high-performance retrieval of data; we will come back to this later.
This is the core of dimensional modeling.
The first dimensional model built should be the one with the most impact.
Preferably, you should develop dimensional models for the most atomic information captured by a business process.
Percentages and ratios, such as gross margin, are nonadditive. The numerator and denominator should be stored in the fact table.
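To see why ratios are nonadditive, the short Python sketch below compares a gross margin computed from summed components with one computed by averaging per-row ratios. The two fact rows and their figures are made up for illustration.

```python
# Hypothetical fact rows: revenue and cost are stored as additive components.
fact_rows = [
    {"product": "A", "revenue": 100.0, "cost": 80.0},
    {"product": "B", "revenue": 900.0, "cost": 450.0},
]

# Additive components roll up safely across any set of rows.
total_revenue = sum(r["revenue"] for r in fact_rows)
total_cost = sum(r["cost"] for r in fact_rows)

# Sound: compute the ratio after summing numerator and denominator.
gross_margin_pct = (total_revenue - total_cost) / total_revenue

# Misleading: averaging per-row ratios ignores each row's weight.
row_margins = [(r["revenue"] - r["cost"]) / r["revenue"] for r in fact_rows]
naive_margin_pct = sum(row_margins) / len(row_margins)

print(f"ratio of sums:     {gross_margin_pct:.2f}")
print(f"average of ratios: {naive_margin_pct:.2f}")
```

With these sample rows the ratio of sums is 0.47, while the average of the row ratios is 0.35. Storing revenue and cost in the fact table lets any roll-up recompute the margin correctly.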
Extensibility of the dimensional model:
New dimension attributes
New dimensions
New measured facts
Dimension becoming more granular
Addition of a completely new data source involving existing dimensions as well as unexpected new dimensions
A very large number of dimensions typically is a sign that several dimensions are not completely independent and should be combined into a single dimension. It is a dimensional modeling mistake to represent elements of a hierarchy as separate dimensions in the fact table.
Data warehouses always need an explicit date dimension table. There are many date attributes not supported by the SQL date function, including fiscal periods, seasons, holidays, and weekends. Rather than attempting to determine these nonstandard calendar calculations in a query, we should look them up in a date dimension table.
Drilling down in a data mart is nothing more than adding row headers from the dimension tables. Drilling up is removing row headers. We can drill down or up on attributes from more than one explicit hierarchy and with attributes that are part of no hierarchy.
You must avoid null keys in the fact table. A proper design includes a row in the corresponding dimension table to identify that the dimension is not applicable to the measurement.
Factless fact table
Degenerate dimension: operational control numbers such as order numbers, invoice numbers, and bill-of-lading numbers usually give rise to empty dimensions and are represented as degenerate dimensions (that is, dimension keys without corresponding dimension tables) in fact tables where the grain of the table is the document itself or a line item in the document.
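The date dimension and null-key notes above can be combined in one small Python sketch: it generates date dimension rows with precomputed calendar attributes and reserves a "not applicable" row so that fact rows never carry a null date key. The function name, the column names, the use of key 0 for the placeholder row, the April fiscal-year start, and the sample holiday are all assumptions made for illustration.

```python
from datetime import date, timedelta

def build_date_dimension(start, end, holidays=frozenset(), fiscal_start_month=4):
    """Return date dimension rows, with key 0 reserved for 'Not applicable'."""
    rows = [{
        "date_key": 0, "full_date": None, "day_name": "Not applicable",
        "is_weekend": None, "is_holiday": None, "fiscal_year": None,
    }]
    day, key = start, 1
    while day <= end:
        rows.append({
            "date_key": key,                      # sequential surrogate key
            "full_date": day,
            "day_name": day.strftime("%A"),       # locale-dependent name
            "is_weekend": day.weekday() >= 5,     # Saturday=5, Sunday=6
            "is_holiday": day in holidays,
            # Assumed convention: the fiscal year is named after the year it ends in.
            "fiscal_year": day.year + 1 if day.month >= fiscal_start_month else day.year,
        })
        day += timedelta(days=1)
        key += 1
    return rows

# Hypothetical range spanning a fiscal year boundary, with one sample holiday.
date_dim = build_date_dimension(
    date(2024, 3, 30), date(2024, 4, 1), holidays={date(2024, 4, 1)}
)
for row in date_dim:
    print(row)
```

A fact row whose date is unknown or not applicable would store date_key 0 rather than NULL, so joins to the date dimension never silently drop rows.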
We shouldn’t mix fact granularities (for example, order and order line facts) within a single fact table. Instead, we need to either allocate the higher-level facts to a more detailed level or create two separate fact tables to handle the differently grained facts. Allocation is the preferred approach. Optimally, a finance or business team (not the data warehouse team) spearheads the allocation effort.
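One possible allocation rule, sketched in Python, spreads a header-level amount (for example, an order's shipping charge) across the order lines in proportion to each line's amount. The proportional rule, the function name, and the sample figures are assumptions; in practice the business team would define the actual allocation logic.

```python
from decimal import Decimal, ROUND_HALF_UP

def allocate_header_fact(header_amount, line_amounts):
    """Split header_amount across lines in proportion to line_amounts.

    Assumes at least one line and a nonzero total. Any rounding residue
    is added to the last line so the allocations sum to the header amount.
    """
    total = sum(line_amounts)
    cent = Decimal("0.01")
    allocated = [
        (header_amount * amount / total).quantize(cent, rounding=ROUND_HALF_UP)
        for amount in line_amounts
    ]
    allocated[-1] += header_amount - sum(allocated)
    return allocated

# Hypothetical order: a 10.00 shipping charge over three equal lines.
shipping = Decimal("10.00")
lines = [Decimal("30.00"), Decimal("30.00"), Decimal("30.00")]
allocated_shipping = allocate_header_fact(shipping, lines)
print(allocated_shipping)
```

Here the three lines receive 3.33, 3.33, and 3.34, so the line-level fact table still sums back to the 10.00 header amount.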