The document provides an introduction to Data Vault data modeling and discusses how it enables agile data warehousing. It describes the core structures of a Data Vault model including hubs, links, and satellites. It explains how the Data Vault approach provides benefits such as model agility, productivity, and extensibility. The document also summarizes the key changes in the Data Vault 2.0 methodology.
Agile Data Engineering: Introduction to Data Vault Data Modeling
1. KENT GRAZIANO
AGILE DATA ENGINEERING: INTRODUCTION TO DATA VAULT DATA MODELING
@KentGraziano kentgraziano.com
2. Agenda
Bio
What do we mean by Agile?
What is a Data Vault?
Where does it fit in a DW/BI architecture?
How to design a Data Vault model
Being “agile” with Data Vault
What’s new in DV 2.0
3. My Bio
› Senior Technical Evangelist, Snowflake Computing
› Oracle ACE Director (BI/DW)
› Certified Data Vault Master and DV 2.0 Practitioner
› Data Modeling, Data Architecture and Data Warehouse
Specialist
• 30+ years in IT
• 25+ years of Oracle-related work
• 20+ years of data warehousing experience
› Member – DAMA Houston
› Former Member: Boulder BI Brain Trust
(http://www.boulderbibraintrust.org/)
› Author & Co-Author of a bunch of books
• The Business of Data Vault Modeling
• The Data Model Resource Book (1st Edition)
› Blogger: The Data Warrior
› Past-President of Oracle Development Tools User Group
and Rocky Mountain Oracle User Group
4. Manifesto for Agile Software Development
“We are uncovering better ways of developing software by
doing it and helping others do it.
Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value
the items on the left more.”
http://agilemanifesto.org/
5. Applying the Agile Manifesto to DW
(C) Kent Graziano 5
User Stories instead of requirements documents
Time-boxed iterations
› Iteration has a standard length
› Choose one or more user stories to fit in that
iteration
Rework is part of the game
› There are no “missed requirements”... only those
that haven’t been delivered or discovered yet.
6. Data Vault Definition
TDAN.com Article 6
The Data Vault is a detail oriented, historical tracking and uniquely linked
set of normalized tables that support one or more functional areas of
business.
It is a hybrid approach encompassing the best of breed between 3rd normal
form (3NF) and star schema. The design is flexible, scalable, consistent
and adaptable to the needs of the enterprise.
Architected specifically to meet the needs of today’s
enterprise data warehouses
DAN LINSTEDT: Defining the Data Vault
7. What is Data Vault Trying to Solve?
What are our other Enterprise Data Warehouse
options?
› Third-Normal Form (3NF): Complex primary keys (PKs)
with cascading snapshot dates
› Star Schema (Dimensional): Difficult to reengineer fact
tables for granularity changes
Difficult to get it right the first time
Not adaptable to rapid business change
NOT AGILE!
9. Data Vault Evolution
The work on the Data Vault approach began in the early 1990s and was completed around 1999.
Throughout 1999, 2000, and 2001, the Data Vault design was tested, refined, and deployed
into specific customer sites.
In 2002, the industry thought leaders were asked to review the architecture.
This is when I attended my first DV seminar in Denver and met Dan!
In 2003, Dan began teaching the modeling techniques to the mass public.
In 2014, Dan introduced DV 2.0!
15. 1. Hub = Business Keys
Hubs = Unique Lists of Business Keys
Business Keys are used to TRACK and IDENTIFY key information
New: DV 2.0 uses an MD5 hash of the business key (BK) as the primary key (PK)
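As a rough illustration of the DV 2.0 point above, a hub's primary key can be derived by hashing the business key with MD5. The normalization rules (trimming and upper-casing) and the function name `hub_hash_key` are assumptions made for this sketch; the slides do not specify them.

```python
import hashlib


def hub_hash_key(business_key: str) -> str:
    """Return an MD5 hex digest of a normalized business key.

    Trimming and upper-casing are assumed normalization rules, chosen
    so the same key arriving from different sources hashes alike.
    """
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# The same business key, formatted differently by two sources,
# resolves to one hub primary key.
key_a = hub_hash_key("cust-1001")
key_b = hub_hash_key("  CUST-1001 ")
print(key_a == key_b)  # True
```

Because the key is computed from the business key alone, it can be generated independently in each load process without looking up a sequence value first.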
16. 2: Links = Associations
Links = Transactions and Associations
They are used to hook together multiple sets of information
In DV 2.0 the BK attributes may migrate to the Links for faster queries
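One way to picture a link row is as the hash keys of the hubs it connects, plus its own key. The sketch below assumes the link key is an MD5 hash of the combined business keys; the `||` delimiter, the customer/order example, and all column names are hypothetical choices for illustration.

```python
import hashlib

DELIMITER = "||"  # assumed separator; keeps ("AB", "C") distinct from ("A", "BC")


def md5_key(*business_keys: str) -> str:
    """Hash one or more normalized business keys into a single hex key."""
    normalized = DELIMITER.join(bk.strip().upper() for bk in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def build_link_row(customer_bk: str, order_bk: str) -> dict:
    """Build a hypothetical customer-to-order link row."""
    return {
        "link_hash_key": md5_key(customer_bk, order_bk),
        "customer_hash_key": md5_key(customer_bk),
        "order_hash_key": md5_key(order_bk),
        # Business keys carried on the link so queries can skip the hub joins.
        "customer_bk": customer_bk,
        "order_bk": order_bk,
    }


row = build_link_row("CUST-1001", "ORD-77")
```

Carrying the business keys on the link trades some storage for fewer joins at query time.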
17. Modeling Links - 1:1 or 1:M?
Today: Relationship is a 1:1, so why model a Link?
Tomorrow: The business rule can change to a 1:M. You discover new data later.
With a Link in the Data Vault: No need to change the EDW structure. Existing data is fine. New data is added.
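The 1:1 to 1:M scenario can be sketched with a link held as a plain set of key pairs. The employee and office keys are hypothetical; the point is only that the rule change shows up as new rows, not as a new structure.

```python
# A link is just a set of (hub key, hub key) pairs, so cardinality is
# a property of the data, not of the table structure.
link_employee_office = set()

# Today: each employee has exactly one office (1:1).
link_employee_office.add(("EMP-1", "OFFICE-A"))
link_employee_office.add(("EMP-2", "OFFICE-B"))

# Tomorrow: the rule changes and EMP-1 gets a second office (1:M).
# Only a new row is added; existing rows and the structure are untouched.
link_employee_office.add(("EMP-1", "OFFICE-C"))

offices_for_emp1 = sorted(o for e, o in link_employee_office if e == "EMP-1")
print(offices_for_emp1)  # ['OFFICE-A', 'OFFICE-C']
```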
18. 3. Satellites = Descriptors
Satellites provide context for the Hubs and the Links
Track changes over time, like SCD Type 2
In DV 2.0, use HASH_DIFF to detect changes
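A minimal sketch of HASH_DIFF-style change detection follows, assuming the hash is computed over all descriptive attributes in a fixed (sorted) order and compared with the most recent satellite row. The in-memory list, the column names, and the `||` delimiter are illustrative assumptions, not a prescribed implementation.

```python
import hashlib
from datetime import datetime, timezone


def hash_diff(attributes: dict) -> str:
    """Hash all descriptive attributes in a fixed (sorted) column order."""
    payload = "||".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite: list, hub_hash_key: str, attributes: dict) -> bool:
    """Append a new satellite row only when the attributes changed.

    Returns True if a row was inserted. Rows are never updated in
    place, which gives the SCD Type 2 style trail of changes.
    """
    new_diff = hash_diff(attributes)
    current = [row for row in satellite if row["hub_hash_key"] == hub_hash_key]
    if current and current[-1]["hash_diff"] == new_diff:
        return False  # nothing changed, so nothing is written
    satellite.append({
        "hub_hash_key": hub_hash_key,
        "load_date": datetime.now(timezone.utc),
        "hash_diff": new_diff,
        **attributes,
    })
    return True


sat_customer = []
load_satellite(sat_customer, "abc123", {"name": "Ann", "city": "Denver"})
load_satellite(sat_customer, "abc123", {"name": "Ann", "city": "Denver"})   # no change
load_satellite(sat_customer, "abc123", {"name": "Ann", "city": "Houston"})  # change
print(len(sat_customer))  # 2
```

Comparing one hash per row avoids a column-by-column comparison, which matters more as satellites grow wider.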
19. Data Vault Model Flexibility (Agility)
Goes beyond standard 3NF
Hyper normalized
› Hubs and Links only hold keys and metadata
› Satellites split by rate of change and/or source
Enables Agile data modeling
› Easy to add to the model without having to change existing
structures and load routines
• Relationships (links) can be dropped and created on demand.
› No more reloading history because of a missed requirement
Based on natural business keys
Not system surrogate keys
Allows for integrating data across functions
and source systems more easily
› All data relationships are key driven
20. Data Vault Extensibility
(C) LearnDataVault.com 20
Adding new components to the EDW has NEAR ZERO impact on:
› Existing Loading Processes
› Existing Data Model
› Existing Reporting & BI Functions
› Existing Source Systems
› Existing Star Schemas and Data Marts
21. Data Vault Productivity
› Standardized modeling rules
• Highly repeatable and learnable modeling
technique
• Can standardize load routines
o Delta Driven process
o Re-startable, consistent loading patterns.
• Can standardize extract routines
o Rapid build of new or revised Data Marts
• Can be automated
• Can use a BI-meta layer to virtualize the
reporting structures
o Example: OBIEE Business Model and
Mapping tool
o Example: BOBJ Universe Business Layer
• Can put views on the DV structures as well
o Simulate ODS/3NF or Star Schemas
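The delta-driven, re-startable load pattern listed above can be sketched for a hub as follows. This is an in-memory illustration under stated assumptions: the hub is a dict keyed by hash key, and the column names and the `CRM` record source are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def load_hub(hub: dict, staged_business_keys: list, record_source: str) -> int:
    """Insert only business keys the hub has not seen yet (the delta).

    `hub` maps hash key -> row. Because existing keys are skipped,
    rerunning the same batch after a failure inserts nothing new.
    """
    inserted = 0
    for bk in staged_business_keys:
        normalized = bk.strip().upper()
        hash_key = hashlib.md5(normalized.encode("utf-8")).hexdigest()
        if hash_key in hub:
            continue  # already loaded: not part of the delta
        hub[hash_key] = {
            "business_key": normalized,
            "load_date": datetime.now(timezone.utc),
            "record_source": record_source,
        }
        inserted += 1
    return inserted


hub_customer = {}
batch = ["CUST-1", "CUST-2", "CUST-1"]
first_run = load_hub(hub_customer, batch, "CRM")
rerun = load_hub(hub_customer, batch, "CRM")
print(first_run, rerun)  # 2 0
```

The same routine works for any hub once the table and key names are parameterized, which is what makes the pattern a candidate for automation.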
22. Data Vault Adaptability
› The Data Vault holds granular
historical relationships.
• Holds all history for all time, allowing any
source system feeds to be reconstructed
on-demand
o Easy generation of Audit Trails for data
lineage and compliance.
o Data Mining can discover new
relationships between elements
o Patterns of change emerge from the
historical pictures and linkages.
› The Data Vault can be accessed by
power-users
23. Other Benefits of a Data Vault
› Modeling it as a DV forces integration
of the Business Keys upfront
• Good for organizational alignment
› An integrated data set with raw data
extends its value beyond BI:
• Source for data quality projects
• Source for master data
• Source for data mining
• Source for Data as a Service (DaaS) in
an SOA (Service Oriented Architecture).
24. Other Benefits of a Data Vault
› Upfront Hub integration simplifies the
data integration routines required to
load data marts.
• Helps divide the work a bit.
› It is much easier to implement security
on these granular pieces.
› Granular, re-startable processes enable
pin-point failure correction.
› It is designed and optimized for real-time
loading in its core architecture
(without any tweaks or mods).
25. How to be Agile using DV
Model iteratively
› Use Data Vault data
modeling technique
› Create basic components,
then add over time
Virtualize the Access Layer
› Don’t waste time building
facts and dimensions up
front
ETL and testing take too
long
› “Project” objects using
pattern-based DV model with
database views (or BI meta
layer)
Users see real reports with
real data
› Can always build out for
performance in another
iteration
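To make the idea of "projecting" a dimension through a database view concrete, here is a small sketch using Python's built-in `sqlite3` module. The table, column, and view names (`hub_customer`, `sat_customer`, `dim_customer`) are hypothetical, and the view simply exposes the latest satellite row per hub key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hk TEXT PRIMARY KEY,
    customer_bk TEXT NOT NULL
);
CREATE TABLE sat_customer (
    customer_hk TEXT NOT NULL,
    load_date   TEXT NOT NULL,
    name        TEXT,
    city        TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
-- The "dimension" is only a view: the latest satellite row per hub key.
CREATE VIEW dim_customer AS
SELECT h.customer_bk, s.name, s.city
FROM hub_customer h
JOIN sat_customer s ON s.customer_hk = h.customer_hk
WHERE s.load_date = (
    SELECT MAX(s2.load_date) FROM sat_customer s2
    WHERE s2.customer_hk = h.customer_hk
);
""")
conn.execute("INSERT INTO hub_customer VALUES ('hk1', 'CUST-1001')")
conn.executemany(
    "INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
    [("hk1", "2020-01-01", "Ann", "Denver"),
     ("hk1", "2021-06-01", "Ann", "Houston")],
)
rows = conn.execute("SELECT * FROM dim_customer").fetchall()
print(rows)  # [('CUST-1001', 'Ann', 'Houston')]
```

No data is copied, so the view can be shown to users right away and replaced with a physical table in a later iteration if performance requires it.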
28. Notably…
› In 2008 Bill Inmon stated that the
“Data Vault is the optimal approach for
modeling the EDW in the DW2.0
framework.” (DW2.0)
› The number of Data Vault users in the
US surpassed 500 in 2010 and is growing
rapidly (http://danlinstedt.com/about/dv-customers/)
29. Organizations using Data Vault
› WebMD Health Services
› Anthem Blue-Cross Blue Shield
› MD Anderson Cancer Center
› Denver Public Schools
› Independent Purchasing Cooperative
(IPC, Miami)
• Owner of Subway
› Kaplan
› US Defense Department
› Colorado Springs Utilities
› State Court of Wyoming
› Federal Express
› US Dept. of Agriculture
32. Summary
Data Vault provides a data modeling technique that allows:
01 Model Agility
› Enabling rapid changes and additions
02 Productivity
› Enabling low complexity systems with high value output at a rapid pace
03 So? Agile Data Warehousing?
› Easy projections of dimensional models
33. Shameless Plug:
› Available on Amazon:
http://www.amazon.com/Better-Data-Modeling-Introduction-Engineering-ebook/dp/B018BREV1C/
34. Super Charge Your Data Warehouse
› Available on Amazon.com
› Soft Cover or Kindle Format
› Now also available in PDF at LearnDataVault.com
› Hint: Kent is the Technical Editor
35. New DV 2.0 Book
› Available on Amazon:
http://www.amazon.com/Building-Scalable-Data-Warehouse-Vault/dp/0128025107/