SlideShare uma empresa Scribd logo
1 de 55
Baixar para ler offline
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Y O U R D A T A , N O L I M I T S
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Kent Graziano
Chief Technical Evangelist and Strategic Advisor
Agile Data Engineering
Introduction to Data Vault 2.0
@KentGraziano
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• Bio
• Agile & DW
• What is a Data Vault & Where does it fit?
• How to design a Data Vault model
• Foundational Keys
• Benefits of Data Vault
• Who is using Data Vault?
• References
Agenda
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
My Bio
› Chief Technical Evangelist, Snowflake Computing
› Blogger: The Data Warrior
› Certified Data Vault Master and DV 2.0 Practitioner
› Oracle ACE Director (Alumni)
› OakTable Member
› Member – DAMA Houston & DAMA International
› Data Modeling, Data Architecture and Data
Warehouse Specialist
› 30+ years in IT
› 25+ years of Oracle-related work
› 20+ years of data warehousing experience
› Former-Member: Boulder BI Brain Trust
(http://www.boulderbibraintrust.org/)
› Author & Co-Author of a bunch of books
› Past-President of ODTUG and Rocky Mountain Oracle
User Group
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
3 years in stealth + 3 years GA
600+ employees
Over 2000
customers today
Over $400M in venture
funding from leading
investors
First customers 2014,
general availability
2015
Founded 2012 by
industry veterans
with over 120
database patents
Queries processed in Snowflake per day:
Largest single table:
Largest number of tables single DB:
Single customer most data:
Single customer most users:
60 Million
68 Trillion Rows
200,000
> 40 PB
> 10,000
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
http://agilemanifesto.org
"We are uncovering better ways of developing software by doing it and helping others do it.
Through this work we have come to value:
★ Individuals and interactions over processes and tools
★ Working software over comprehensive documentation
★ Customer collaboration over contract negotiation
★ Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.”
Manifesto for Agile Software Development
Kent Beck
Mike Beedle
Arie van Bennekum
Alistair Cockburn
Ward Cunningham
Martin Fowler
James Grenning
Jim Highsmith
Andrew Hunt
Ron Jeffries
Jon Kern
Brian Marick
Robert C. Martin
Steve Mellor
Ken Schwaber
Jeff Sutherland
Dave Thomas
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• User Stories instead of requirements documents
• Time-based iterations
• Iteration has a standard length
• Choose one or more user stories to fit in that iteration
• Rework is part of the game
• There are no “Missed requirements”…only those that haven’t been discovered yet
Applying the Agile Manifest to DW
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Data Vault Model Definition
The Data Vault is a detail oriented, historical tracking and uniquely linked
set of normalized tables that support one or more functional areas of
business.
It is a hybrid approach encompassing the best of breed between 3rd normal
form (3NF) and star schema. The design is flexible, scalable, consistent
and adaptable to the needs of the enterprise.
Architected specifically to meet the needs of today’s
enterprise data warehouses
Dan Linstedt: Defining the Data Vault
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• What are our other Enterprise
Data Warehouse Options?
• Third-Normal Form (3NF): Complex primary
keys (PK’s) with cascading snapshot
• Star Schema (Dimensional): Difficult to
reengineer fact tables for granularity
changes
• Difficult to get it right the first
time
• Not adaptable to rapid business
change
• NOT AGILE!
What is Data Vault Trying to Solve?
Data Vault Timeline
1960 1970 1980 1990 2000
E.F. Codd invented
relational modeling
Chris Date and Hugh
Harwin refined
modeling concepts 1976: Dr. Peter Chen
created E-R
diagramming
Mid 70’s: AC Nielsen
popularized
Dimension & Fact Terms
1990: Dan Linstedt begins
R&D on Data Vault
Modeling
Early 70’s: Bill Inmon
began discussing Data
Warehousing
Mid 60’s: Dimension & Fact modeling
presented by General Mills and
Dartmouth University
Late 80’s: Barry Devlin and
Dr. Kimball release
“Business Data Warehouse”
Mid 80’s: Bill Inmon
popularized Data
Warehousing
Mid-Late 80’s: Dr. Kimball
popularizes Star Schema
2000: Dan Linstedt release
first 5 articles on Data Vault
Modeling
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• The work on the Data Vault approach began in the early 1990s;
completed around 1999.
• Throughout 1999, 2000, and 2001, the Data Vault design was
tested, refined, and deployed into specific customer sites.
• In 2002, the industry thought leaders were asked to review the
architecture.
• Kent meets Dan at a Lunch & Learn in Denver!
• In 2003, Dan began teaching the modeling techniques to the mass
public.
• Kent & Team take their 1st Data Vault Modeling class
• In 2014, Dan introduced DV 2.0!
Data Vault Evolution
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
WHAT DOES THE DATA VAULT MODEL LOOK LIKE?
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Data Vault: 3 Simple Structures
HUB
LINK
SATELLITE
EDW
Data Vault
1
2
3
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Data Vault Core Architecture
HUBS
-------------------
Unique List of
Business Keys
LINKS
-------------------
Unique List of
Relationships
across Keys
SATS
-------------------
Descriptive
Data
● Satellites have one and only one parent table
● Satellites cannot be parents to other tables
● Hubs cannot be child tables
https://en.wikipedia.org/wiki/Data_vault_modeling#Basic_notions
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Standard Data Vault Model
Hub: List of UNIQUE business keys.
Link: List of UNIQUE relationships across keys
Satellite: Historical descriptive data.
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
1. Hub = Business Keys
Hubs = Unique Lists of Business Keys
Business Keys are used to TRACK and IDENTIFY key information
DV 2.0 uses MD5 Hash of the BK for the PK
Natural Business Keys also acceptable for PK
HUB_ORDERH
*P MD5_HUB_ORDER VARCHAR2 (32)
*U O_ORDERID INTEGER
* LDTS TIMESTAMP
* RSCR VARCHAR2 (256)
O_ORDERKEY INTEGER
HUB_ORDER_PK (MD5_HUB_ORDER)
HUB_ORDER__UN (O_ORDERID)
HUB_CUSTOMERH
*P MD5_HUB_CUSTOMER VARCHAR2 (32)
*U C_NAME VARCHAR2 (80)
LDTS TIMESTAMP
RSCR VARCHAR2 (256)
C_CUSTKEY INTEGER
HUB_CUSTOMER_PK (MD5_HUB_CUSTOMER)
HUB_CUSTOMER__UN (C_NAME)
HUB_SUPPLIERH
*P MD5_HUB_SUPPLIER VARCHAR2 (32)
*U S_NAME VARCHAR2 (80)
* LDTS TIMESTAMP
* RSCR VARCHAR2 (256)
* S_SUPPKEY INTEGER
HUB_SUPPLIER_PK (MD5_HUB_SUPPLIER)
HUB_SUPPLIER__UN (S_NAME)
HUB_PARTH
*P MD5_HUB_PART VARCHAR2 (32)
*U P_NAME VARCHAR2 (80)
*U P_BRAND VARCHAR2 (80)
*U P_TYPE VARCHAR2 (80)
*U P_SIZE INTEGER
*U P_CONTAINER VARCHAR2 (80)
* LDTS TIMESTAMP
* RSCR VARCHAR2 (256)
P_PARTKEY INTEGER
HUB_PART_PK (MD5_HUB_PART)
HUB_PART__UN (P_NAME, P_BRAND, P_TYPE, P_SIZE, P_CONTAINER)
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• MD5 Hash Function – Snowflake
• MD5 (UPPER(RTRIM(RMC.CAFCUSCHN)))
• MD5 hash function – Oracle
• rawtohex(sys.utl_raw.cast_to_raw(dbms_obfuscation_toolkit.md5 (input_string => ...)
• NEW: dbms_crypto.HASH(utl_raw.cast_to_raw(<input string>), 2);
• 2 is for MD5 algorithm option
• MD5 hash function - SQL Server
• CONVERT([Char](32),HASHBYTES('MD5’, UPPER(RTRIM(RMC.CAFCUSCHN))))
• Need to minimize chance of duplicates
• 12||3||45 and 1||2||345 hash to same value
• Need a separator between each
• Example: Col1||’^’||Col2||’^’||Col3
• Need to account for NULLs too
What Does the MD5 Look Like?
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• To generate most consistent string: standardize!
• Convert data types
• If 'NUMBER', 'VARCHAR', ‘TEXT’
• THEN 'TO_CHAR(' || column_name || ‘)‘
• If LIKE 'DATE%‘
• THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD’’)‘
• If LIKE 'TIME%‘
• THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD HH24:MI:SS'')'
Other Considerations
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Final Input String
(
UPPER(TRIM(T1.GENERICNAME)) ||'^'||
UPPER(TRIM(TO_CHAR(T1.MED_STRNG_AMT))) ||'^'||
UPPER(TRIM(T1.UOM_CD)) ||'^'||
UPPER(TRIM(T1.MED_FORM_NM)) ||'^'
)
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
2: Links = Associations
Links = Transactions and Associations
They are used to hook together multiple sets of information
In DV 2.0 the BK attributes may migrate to the Links for faster query
LNK_CUSTOMER_ORDERL
*P MD5_LNK_CUSTOMER_ORDER VARCHAR2 (32)
*UF MD5_HUB_CUSTOMER VARCHAR2 (32)
*UF MD5_HUB_ORDER VARCHAR2 (32)
* C_NAME VARCHAR2 (80)
* O_ORDERID INTEGER
* LDTS TIMESTAMP
* RSCR VARCHAR2 (256)
LNK_CUSTOMER_ORDER_PK (MD5_LNK_CUSTOMER_ORDER)
LNK_CUSTOMER_ORDER__UN (MD5_HUB_CUSTOMER, MD5_HUB_ORDER)
LNK_CUSTOMER_ORDER_HUB_CUSTOMER_FK (MD5_HUB_CUSTOMER)
LNK_CUSTOMER_ORDER_HUB_ORDER_FK (MD5_HUB_ORDER)
HUB_ORDERH
*P MD5_HUB_ORDER VARCHAR2 (32)
*U O_ORDERID INTEGER
* LDTS TIMESTAMP
* RSCR VARCHAR2 (256)
O_ORDERKEY INTEGER
HUB_ORDER_PK (MD5_HUB_ORDER)
HUB_ORDER__UN (O_ORDERID)
HUB_CUSTOMERH
*P MD5_HUB_CUSTOMER VARCHAR2 (32)
*U C_NAME VARCHAR2 (80)
LDTS TIMESTAMP
RSCR VARCHAR2 (256)
C_CUSTKEY INTEGER
HUB_CUSTOMER_PK (MD5_HUB_CUSTOMER)
HUB_CUSTOMER__UN (C_NAME)
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Tommorrow With DV
No need to change
EDW structure
The business rule can
change to a 1:M
Relationship is a 1:1 so
why model a Link?
Modeling Links - 1:1 or 1:M?
Today
You discover new
data later
Existing data is fine
New data is added
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
3. Satellites = Descriptors
Satellites provide context for the Hubs and the Links
Tracks changes over time - Like SCD 2
In DV 2.0 use HASH_DIFF to detect changes
SAT_ORDERS
*PF MD5_HUB_ORDER VARCHAR2 (32)
*P LDTS TIMESTAMP
O_ORDERSTATUS VARCHAR2 (1)
O_TOTALPRICE NUMBER (12,2)
O_ORDERDATE DATE
O_ORDERPRIORITY VARCHAR2 (15)
O_CLERK VARCHAR2 (15)
O_SHIPPRIORITY INTEGER
O_COMMENT VARCHAR2 (79)
* HASH_DIFF VARCHAR2 (32)
* RSRC VARCHAR2 (256)
SAT_ORDER_PK (MD5_HUB_ORDER, LDTS)
SAT_ORDER_HUB_ORDER_FK (MD5_HUB_ORDER)
SAT_LNK_PART_SUPPLIERS
*PF MD5_LNK_PART_SUPPLIER VARCHAR2 (32)
*P LDTS TIMESTAMP
PS_AVAILQTY INTEGER
PS_SUPPLYCOST NUMBER (12,2)
PS_COMMENT VARCHAR2 (199)
* HASH_DIFF VARCHAR2 (32)
* RSRC VARCHAR2 (256)
PK_SAT_LNK_PART_SUPPLIER (LDTS, MD5_LNK_PART_SUPPLIER)
SAT_LNK_PART_SUPPLIER_LNK_PART_SUPPLIER_FK (MD5_LNK_PART_SUPPLIER)
HUB_ORDERH
*P MD5_HUB_ORDER VARCHAR2 (32)
*U O_ORDERID INTEGER
* LDTS TIMESTAMP
* RSRC VARCHAR2 (256)
HUB_ORDER_PK (MD5_HUB_ORDER)
HUB_ORDER__UN (O_ORDERID)
LNK_PART_SUPPLIERL
*P MD5_LNK_PART_SUPPLIER VARCHAR2 (32)
*UF MD5_HUB_SUPPLIER VARCHAR2 (32)
*UF MD5_HUB_PART VARCHAR2 (32)
* S_NAME VARCHAR2 (80)
* P_NAME VARCHAR2 (80)
* P_BRAND VARCHAR2 (80)
* P_TYPE VARCHAR2 (80)
* P_SIZE INTEGER
* P_CONTAINER VARCHAR2 (80)
* LDTS TIMESTAMP
* RSRC VARCHAR2 (256)
LNK_PART_SUPPLIER_PK (MD5_LNK_PART_SUPPLIER)
LNK_PART_SUPPLIER__UN (MD5_HUB_SUPPLIER, MD5_HUB_PART)
LNK_PART_SUPPLIER_HUB_PART_FK (MD5_HUB_PART)
LNK_PART_SUPPLIER_HUB_SUPPLIER_FK (MD5_HUB_SUPPLIER)
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• Think Type 2 SCD (Slowly Changing Dimensions)
• Old Way:
• Compare column by column
• Source value != Current value in DW table
• 20 columns, then 20 compares
• New Way:
• Concatenate all columns to one string
• Convert to one char(32) string with hash function
• Compare to hashed value (HASH_DIFF) in target table
• Does not matter how many columns
MD5-Based Change Detection
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• Use the Hub/Link PK columns
• Filter on LEAD of LOAD_DTS
WHERE LEAD(stg.LOAD_DTS) OVER
(PARTITION BY stg.HUB_KEY
ORDER BY stg.LOAD_DTS) IS NULL
Easily Getting Current Rows
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
BUT… Can’t Use Function In Where
01.
Have to nest the
query with the
function as a
virtual table in the
FROM
02.
Then use
CURR_FLAG
in outer
WHERE
03.
Works in
Oracle, SQL
Server, and
SnowflakeDB
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Example: Current View
SELECT
SAT.HASH_KEY
SAT.Business_Group_Code,
SAT.Code_Key_Numeric,
SAT.System_Value_Type,
…
SAT.Change_Time,
SAT.LOAD_DTS
FROM
(
SELECT
RMC.HASH_KEY AS HASH_KEY
RMC.CODCODNUM AS Business_Group_Code,
RMC.CODKEYNUM AS Code_Key_Numeric,
RMC.CODSYSTYP AS System_Value_Type,
…
RMC.CODCHGTIM AS Change_Time,
RMC.LOAD_DTS AS LOAD_DTS,
CASE
WHEN RANK() OVER (PARTITION BY RMC.HASH_KEY
ORDER BY RMC.LOAD_DTS DESC) = 1
THEN 'Y'
ELSE 'N'
END CURR_FLG –- calculated column
FROM
SAT_RMCODP RMC
) SAT –- nested virtual table
WHERE
SAT.CURR_FLG = 'Y' –filter on calculated column
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Example 2 – Virtual Expire Date!
CASE
WHEN LEAD(stg.LOAD_DTS)
OVER (PARTITION BY stg.CDC_KEY
ORDER BY stg.LOAD_DTS) IS NULL
THEN 'Y'
ELSE 'N'
END CURR_FLG,
LEAD(stg.LOAD_DTS) OVER (PARTITION BY
stg.CDC_KEY ORDER BY stg.LOAD_DTS) EXPR_DTS
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Foundational Keys
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Data Vault Model Flexibility (Agility)
Based on natural
business keys
Highly normalized
› Hubs and Links only hold keys and meta data
› Satellites split by rate of change and/or source
Enables Agile data modeling
› Easy to add to model without having to change existing
structures and load routines
• Relationships (links) can be dropped and created on-demand.
› No more reloading history because of a missed requirement
Not system surrogate keys
Allows for integrating data across functions
and source systems more easily
› All data relationships are key driven
Highly normalized
› Hubs and Links only hold keys and meta data
› Satellites split by rate of change and/or source
Enables Agile data modeling
› Easy to add to model without having to change existing
structures and load routines
• Relationships (links) can be dropped and created on-demand.
› No more reloading history because of a missed requirement
Not system surrogate keys
Allows for integrating data across functions
and source systems more easily
› All data relationships are key driven
Goes beyond
standard 3NF
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Data Vault Agility
Adding new components to
the EDW has NEAR ZERO
impact to existing:
Ø Loading Processes
Ø Data Model
Ø Reports & BI Functions
Ø Downstream Systems
Ø Star Schemas or Data Marts
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Case in Point
Result of flexibility of Data Vault Model allowed them to
merge 3 companies in 90 days - that is ALL systems, ALL
DATA!
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Key: Scalability
Providing no foreseeable barrier to increased
size and scope
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Scalability in the Model Architecture
Scaling is easy, its based on the following
principles:
Hub and spoke design
MPP Shared-Nothing Architecture (original
intent)
Scale Free Networks (i.e., Snowflake)
Can be partitioned vertically and
horizontally to meet performance or
security demands
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Perhaps You Wish to Split for Security Reasons?
From This
DV Physical Partitioning
Single Snowflake Database
DV “Physical” Partitioning
In Snowflake –
might split for
security reasons
To THIS!
DV “Physical” Partitioning
Non-sensitive
data
PII or sensitive data
with a Link to non-
sensitive data
Snowflake DB 1 Snowflake DB 2
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Case in Point:
Result of scalability was to produce a Data Vault Model
that scaled to 3 Petabytes in size, and is still growing
today!
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Key: Productivity
Enabling low complexity systems with high value
output at a rapid pace
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• Standardized modeling rules
• Highly repeatable and learnable modeling technique
• Can standardize load routines
• Delta Driven process
• Re-startable, consistent loading patterns.
• Load multiple objects in parallel!
• Can standardize extract routines
• Rapid build of new or revised Data Marts
• Can be automated (e.g. WhereScape)
• Can use a BI-meta layer to virtualize the reporting structures
• Example: Looker using LookML semantic layer
• Example: BOBJ Universe Business Layer
• Can put views on the DV structures as well
• Simulate ODS/3NF or Star Schemas
Key: Productivity
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• The Data Vault holds granular historical relationships
• Holds all history for all time, allowing any source system feeds to be reconstructed on-
demand
• Easy generation of Audit Trails for data lineage and compliance
• Data Mining can discover new relationships between elements
• Patterns of change emerge from the historical pictures and linkages
• The Data Vault can be accessed by power-users
Key: Productivity
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Key: Productivity - Loading
Deterministic Keys
Non-Deterministic Keys
Less Scheduling
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Key: Productivity
(More Maintenance Before)
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Key: Productivity
(Less Maintenance After)
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• Modeling it as a DV forces integration of the Business
Keys upfront
• Good for organizational alignment
• An integrated data set with raw data extends it’s
value beyond BI:
• Source for data quality projects
• Source for master data
• Source for data mining
• Source for Data as a Service (DaaS) in an SOA
(Service Oriented Architecture)
Other Benefits of a Data Vault
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• Upfront Hub integration simplifies the data integration
routines required to load data marts
• Helps divide the work a bit
• It is much easier to implement security on these
granular pieces
• Granular, re-startable processes enable pin-point failure
correction
• It is designed and optimized for real-time loading in its
core architecture (without any tweaks or mods)
Other Benefits of Data Vault
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
WHAT IS
THE
WORLD'S
SMALLEST
DATA VAULT?
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Worlds Smallest Data Vault
Hub Customer
Hub_Cust_Seq_ID
Hub_Cust_Num
Hub_Cust_Load_DTS
Hub_Cust_Rec_Src
Hub_Cust_Seq_ID
Sat_Cust_Load_DTS
Sat_Cust_Load_End_DTS
Sat_Cust_Name
Sat_Cust_Rec_Src
Satellite
Customer Name
› The Data Vault doesn’t have to be “BIG”.
› A Data Vault can be built incrementally.
› Reverse engineering one component of the
existing models is not uncommon.
› Building one part of the Data Vault, then
changing the marts to feed from that vault is a
best practice.
› The smallest Enterprise Data Warehouse
consists of two tables:
• One Hub,
• One Satellite
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Who’s Using Data Vault?
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• University of Texas, MD Anderson Cancer Center
• Denver Public Schools
• Micron
• Independent Purchasing Cooperative (IPC, Miami)
• Kaplan
• US Defense Department
• Colorado Springs Utilities
• State Court of Wyoming
• Federal Express
• US Dept. of Agriculture
Other Organizations Using Data Vault
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
•Aptus Health
•ResearchNow
•F+W Media
Snowflake Customers using Data Vault
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
• Several Snowflake Partners offer Data Vault classes
• ScaleFree - in EMEA
• PerformacneG2 – in USA
• Empowered Holdings (Dan Linstedt) – globally
• Talk to you Snowflake account rep for contact information
Data Vault Training & Certification
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
The Experts Say…
“The Data Vault is the optimal choice
for modeling the EDW in the DW 2.0
framework.” Bill Inmon
“The Data Vault is foundationally
strong and exceptionally scalable
architecture.” Stephen Brobst
“The Data Vault is a technique which some
industry experts have predicted may spark a
revolution as the next big thing in data modeling
for enterprise warehousing....” Doug Laney
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
More Notables…
“This enables organizations to take control of
their data warehousing destiny, supporting
better and more relevant data warehouses in
less time than before.” Howard Dresner
“[The Data Vault] captures a practical body of
knowledge for data warehouse development
which both agile and traditional practitioners
will benefit from.” Scott Ambler
“In planning to build a CIF infrastructure for Denver Public Schools,
we have determined that the Data Vault approach is what we will use
to build the central EDW. We believe that this data modeling
technique is the best suited for designing a central, historic data
repository because of its flexibility to easily add new subject areas
and attributes. This will allow us to grow the EDW in an organic
manner, over time, as we discover new requirements.”
Kent Graziano
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
References
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
› Available on Amazon:
http://www.amazon.com/Bette
r-Data-Modeling-Introduction-
Engineering-ebook /dp/
B018BREV1C/
Intro ebook by
Kent Graziano:
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
› Available on Amazon.com
› Soft Cover or Kindle Format
› Now also available in PDF at
LearnDataVault.com
› Kent was the Technical Editor
Super Charge
Your Data Warehouse
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
› Available on Amazon:
http://www.amazon.com/Buildi
ng-Scalable-Data-Warehouse-
Vault/dp/0128025107/
New DV 2.0 Book
from Dan Linstedt
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
YOUR DATA, NO LIMITS
© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
Thank You!

Mais conteúdo relacionado

Mais procurados

Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerDatabricks
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaJeffrey T. Pollock
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault ModelingKent Graziano
 
Data Architecture for Data Governance
Data Architecture for Data GovernanceData Architecture for Data Governance
Data Architecture for Data GovernanceDATAVERSITY
 
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScape
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScapeData Vault 2.0 DeMystified with Dan Linstedt and WhereScape
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScapeWhereScape
 
Delivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeDelivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeKent Graziano
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsEllen Friedman
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesEric Kavanagh
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of MetadataDATAVERSITY
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Liberating data with Talend Data Catalog
Liberating data with Talend Data CatalogLiberating data with Talend Data Catalog
Liberating data with Talend Data CatalogJean-Michel Franco
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoDataKitchen
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsDatabricks
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceDenodo
 

Mais procurados (20)

Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
 
Operational Data Vault
Operational Data VaultOperational Data Vault
Operational Data Vault
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault Modeling
 
Data Architecture for Data Governance
Data Architecture for Data GovernanceData Architecture for Data Governance
Data Architecture for Data Governance
 
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScape
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScapeData Vault 2.0 DeMystified with Dan Linstedt and WhereScape
Data Vault 2.0 DeMystified with Dan Linstedt and WhereScape
 
Data Lake,beyond the Data Warehouse
Data Lake,beyond the Data WarehouseData Lake,beyond the Data Warehouse
Data Lake,beyond the Data Warehouse
 
Delivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with SnowflakeDelivering Data Democratization in the Cloud with Snowflake
Delivering Data Democratization in the Cloud with Snowflake
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven Organizations
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
 
The Importance of Metadata
The Importance of MetadataThe Importance of Metadata
The Importance of Metadata
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Liberating data with Talend Data Catalog
Liberating data with Talend Data CatalogLiberating data with Talend Data Catalog
Liberating data with Talend Data Catalog
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps Manifesto
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL Analytics
 
Data Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and GovernanceData Catalog for Better Data Discovery and Governance
Data Catalog for Better Data Discovery and Governance
 

Semelhante a Agile Data Engineering: Introduction to Data Vault 2.0 (2018)

Kent-Graziano-Intro-to-Datavault_short.pdf
Kent-Graziano-Intro-to-Datavault_short.pdfKent-Graziano-Intro-to-Datavault_short.pdf
Kent-Graziano-Intro-to-Datavault_short.pdfabhaybansal43
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWKent Graziano
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDATAVERSITY
 
Modern data integration expert sessions
Modern data integration expert sessionsModern data integration expert sessions
Modern data integration expert sessionsJessicaMurrell3
 
Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar ibi
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsArcadia Data
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation Brett VanderPlaats
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database RoundtableEric Kavanagh
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubCloudera, Inc.
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Logical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesLogical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesDenodo
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopInside Analysis
 
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...Xpand IT
 
Data Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data WarehousingData Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data WarehousingAll Things Open
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksDatabricks
 
Laboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeLaboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeSoftware Guru
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudInside Analysis
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsAli Hodroj
 

Semelhante a Agile Data Engineering: Introduction to Data Vault 2.0 (2018) (20)

Kent-Graziano-Intro-to-Datavault_short.pdf
Kent-Graziano-Intro-to-Datavault_short.pdfKent-Graziano-Intro-to-Datavault_short.pdf
Kent-Graziano-Intro-to-Datavault_short.pdf
 
Demystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFWDemystifying Data Warehousing as a Service - DFW
Demystifying Data Warehousing as a Service - DFW
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data Architecture
 
Modern data integration expert sessions
Modern data integration expert sessionsModern data integration expert sessions
Modern data integration expert sessions
 
Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar Modern Data Integration Expert Session Webinar
Modern Data Integration Expert Session Webinar
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Accelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time AnalyticsAccelerating Data Lakes and Streams with Real-time Analytics
Accelerating Data Lakes and Streams with Real-time Analytics
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Logical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business OutcomesLogical Data Fabric and Data Mesh – Driving Business Outcomes
Logical Data Fabric and Data Mesh – Driving Business Outcomes
 
The Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of HadoopThe Maturity Model: Taking the Growing Pains Out of Hadoop
The Maturity Model: Taking the Growing Pains Out of Hadoop
 
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...
Unconstrained Analytics in the Age of Data – Delivering High-Performance Anal...
 
Data Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data WarehousingData Vault 2.0: Big Data Meets Data Warehousing
Data Vault 2.0: Big Data Meets Data Warehousing
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
datavault2.pptx
datavault2.pptxdatavault2.pptx
datavault2.pptx
 
Laboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeLaboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nube
 
Bridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the CloudBridging the Gap: Analyzing Data in and Below the Cloud
Bridging the Gap: Analyzing Data in and Below the Cloud
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 

Mais de Kent Graziano

Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudKent Graziano
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for DinnerKent Graziano
 
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...Kent Graziano
 
Rise of the Data Cloud
Rise of the Data CloudRise of the Data Cloud
Rise of the Data CloudKent Graziano
 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Kent Graziano
 
Making Sense of Schema on Read
Making Sense of Schema on ReadMaking Sense of Schema on Read
Making Sense of Schema on ReadKent Graziano
 
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsExtreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsKent Graziano
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Kent Graziano
 
Agile Data Warehousing: Using SDDM to Build a Virtualized ODS
Agile Data Warehousing: Using SDDM to Build a Virtualized ODSAgile Data Warehousing: Using SDDM to Build a Virtualized ODS
Agile Data Warehousing: Using SDDM to Build a Virtualized ODSKent Graziano
 
Agile Methods and Data Warehousing (2016 update)
Agile Methods and Data Warehousing (2016 update)Agile Methods and Data Warehousing (2016 update)
Agile Methods and Data Warehousing (2016 update)Kent Graziano
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016Kent Graziano
 
Worst Practices in Data Warehouse Design
Worst Practices in Data Warehouse DesignWorst Practices in Data Warehouse Design
Worst Practices in Data Warehouse DesignKent Graziano
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureKent Graziano
 
Agile Methods and Data Warehousing
Agile Methods and Data WarehousingAgile Methods and Data Warehousing
Agile Methods and Data WarehousingKent Graziano
 
Top Five Cool Features in Oracle SQL Developer Data Modeler
Top Five Cool Features in Oracle SQL Developer Data ModelerTop Five Cool Features in Oracle SQL Developer Data Modeler
Top Five Cool Features in Oracle SQL Developer Data ModelerKent Graziano
 
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault ModelingKent Graziano
 
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachUsing OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachKent Graziano
 

Mais de Kent Graziano (17)

Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data Cloud
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
 
Rise of the Data Cloud
Rise of the Data CloudRise of the Data Cloud
Rise of the Data Cloud
 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)
 
Making Sense of Schema on Read
Making Sense of Schema on ReadMaking Sense of Schema on Read
Making Sense of Schema on Read
 
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsExtreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
Agile Data Warehousing: Using SDDM to Build a Virtualized ODS
Agile Data Warehousing: Using SDDM to Build a Virtualized ODSAgile Data Warehousing: Using SDDM to Build a Virtualized ODS
Agile Data Warehousing: Using SDDM to Build a Virtualized ODS
 
Agile Methods and Data Warehousing (2016 update)
Agile Methods and Data Warehousing (2016 update)Agile Methods and Data Warehousing (2016 update)
Agile Methods and Data Warehousing (2016 update)
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
 
Worst Practices in Data Warehouse Design
Worst Practices in Data Warehouse DesignWorst Practices in Data Warehouse Design
Worst Practices in Data Warehouse Design
 
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
 
Agile Methods and Data Warehousing
Agile Methods and Data WarehousingAgile Methods and Data Warehousing
Agile Methods and Data Warehousing
 
Top Five Cool Features in Oracle SQL Developer Data Modeler
Top Five Cool Features in Oracle SQL Developer Data ModelerTop Five Cool Features in Oracle SQL Developer Data Modeler
Top Five Cool Features in Oracle SQL Developer Data Modeler
 
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
(OTW13) Agile Data Warehousing: Introduction to Data Vault Modeling
 
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile ApproachUsing OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
Using OBIEE and Data Vault to Virtualize Your BI Environment: An Agile Approach
 

Último

Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 

Último (20)

Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Agile Data Engineering: Introduction to Data Vault 2.0 (2018)

  • 1. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Y O U R D A T A , N O L I M I T S © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Kent Graziano Chief Technical Evangelist and Strategic Advisor Agile Data Engineering Introduction to Data Vault 2.0 @KentGraziano
  • 2. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • Bio • Agile & DW • What is a Data Vault & Where does it fit? • How to design a Data Vault model • Foundational Keys • Benefits of Data Vault • Who is using Data Vault? • References Agenda
  • 3. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. My Bio › Chief Technical Evangelist, Snowflake Computing › Blogger: The Data Warrior › Certified Data Vault Master and DV 2.0 Practitioner › Oracle ACE Director (Alumni) › OakTable Member › Member – DAMA Houston & DAMA International › Data Modeling, Data Architecture and Data Warehouse Specialist › 30+ years in IT › 25+ years of Oracle-related work › 20+ years of data warehousing experience › Former-Member: Boulder BI Brain Trust (http://www.boulderbibraintrust.org/) › Author & Co-Author of a bunch of books › Past-President of ODTUG and Rocky Mountain Oracle User Group
  • 4. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. 3 years in stealth + 3 years GA 600+ employees Over 2000 customers today Over $400M in venture funding from leading investors First customers 2014, general availability 2015 Founded 2012 by industry veterans with over 120 database patents Queries processed in Snowflake per day: Largest single table: Largest number of tables single DB: Single customer most data: Single customer most users: 60 Million 68 Trillion Rows 200,000 > 40 PB > 10,000
  • 5. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. http://agilemanifesto.org "We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value: ★ Individuals and interactions over processes and tools ★ Working software over comprehensive documentation ★ Customer collaboration over contract negotiation ★ Responding to change over following a plan That is, while there is value in the items on the right, we value the items on the left more.” Manifesto for Agile Software Development Kent Beck Mike Beedle Arie van Bennekum Alistair Cockburn Ward Cunningham Martin Fowler James Grenning Jim Highsmith Andrew Hunt Ron Jeffries Jon Kern Brian Marick Robert C. Martin Steve Mellor Ken Schwaber Jeff Sutherland Dave Thomas
  • 6. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • User Stories instead of requirements documents • Time-based iterations • Iteration has a standard length • Choose one or more user stories to fit in that iteration • Rework is part of the game • There are no “Missed requirements”…only those that haven’t been discovered yet Applying the Agile Manifest to DW
  • 7. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Data Vault Model Definition The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. Architected specifically to meet the needs of today’s enterprise data warehouses Dan Linstedt: Defining the Data Vault
  • 8. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • What are our other Enterprise Data Warehouse Options? • Third-Normal Form (3NF): Complex primary keys (PK’s) with cascading snapshot • Star Schema (Dimensional): Difficult to reengineer fact tables for granularity changes • Difficult to get it right the first time • Not adaptable to rapid business change • NOT AGILE! What is Data Vault Trying to Solve?
  • 9. Data Vault Timeline 1960 1970 1980 1990 2000 E.F. Codd invented relational modeling Chris Date and Hugh Harwin refined modeling concepts 1976: Dr. Peter Chen created E-R diagramming Mid 70’s: AC Nielsen popularized Dimension & Fact Terms 1990: Dan Linstedt begins R&D on Data Vault Modeling Early 70’s: Bill Inmon began discussing Data Warehousing Mid 60’s: Dimension & Fact modeling presented by General Mills and Dartmouth University Late 80’s: Barry Devlin and Dr. Kimball release “Business Data Warehouse” Mid 80’s: Bill Inmon popularized Data Warehousing Mid-Late 80’s: Dr. Kimball popularizes Star Schema 2000: Dan Linstedt release first 5 articles on Data Vault Modeling © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.
  • 10. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • The work on the Data Vault approach began in the early 1990s; completed around 1999. • Throughout 1999, 2000, and 2001, the Data Vault design was tested, refined, and deployed into specific customer sites. • In 2002, the industry thought leaders were asked to review the architecture. • Kent meets Dan at a Lunch & Learn in Denver! • In 2003, Dan began teaching the modeling techniques to the mass public. • Kent & Team take their 1st Data Vault Modeling class • In 2014, Dan introduced DV 2.0! Data Vault Evolution
  • 11. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. WHAT DOES THE DATA VAULT MODEL LOOK LIKE?
  • 12. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Data Vault: 3 Simple Structures HUB LINK SATELLITE EDW Data Vault 1 2 3
  • 13. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Data Vault Core Architecture HUBS ------------------- Unique List of Business Keys LINKS ------------------- Unique List of Relationships across Keys SATS ------------------- Descriptive Data ● Satellites have one and only one parent table ● Satellites cannot be parents to other tables ● Hubs cannot be child tables https://en.wikipedia.org/wiki/Data_vault_modeling#Basic_notions
  • 14. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Standard Data Vault Model Hub: List of UNIQUE business keys. Link: List of UNIQUE relationships across keys Satellite: Historical descriptive data.
  • 15. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. 1. Hub = Business Keys Hubs = Unique Lists of Business Keys Business Keys are used to TRACK and IDENTIFY key information DV 2.0 uses MD5 Hash of the BK for the PK Natural Business Keys also acceptable for PK HUB_ORDERH *P MD5_HUB_ORDER VARCHAR2 (32) *U O_ORDERID INTEGER * LDTS TIMESTAMP * RSCR VARCHAR2 (256) O_ORDERKEY INTEGER HUB_ORDER_PK (MD5_HUB_ORDER) HUB_ORDER__UN (O_ORDERID) HUB_CUSTOMERH *P MD5_HUB_CUSTOMER VARCHAR2 (32) *U C_NAME VARCHAR2 (80) LDTS TIMESTAMP RSCR VARCHAR2 (256) C_CUSTKEY INTEGER HUB_CUSTOMER_PK (MD5_HUB_CUSTOMER) HUB_CUSTOMER__UN (C_NAME) HUB_SUPPLIERH *P MD5_HUB_SUPPLIER VARCHAR2 (32) *U S_NAME VARCHAR2 (80) * LDTS TIMESTAMP * RSCR VARCHAR2 (256) * S_SUPPKEY INTEGER HUB_SUPPLIER_PK (MD5_HUB_SUPPLIER) HUB_SUPPLIER__UN (S_NAME) HUB_PARTH *P MD5_HUB_PART VARCHAR2 (32) *U P_NAME VARCHAR2 (80) *U P_BRAND VARCHAR2 (80) *U P_TYPE VARCHAR2 (80) *U P_SIZE INTEGER *U P_CONTAINER VARCHAR2 (80) * LDTS TIMESTAMP * RSCR VARCHAR2 (256) P_PARTKEY INTEGER HUB_PART_PK (MD5_HUB_PART) HUB_PART__UN (P_NAME, P_BRAND, P_TYPE, P_SIZE, P_CONTAINER)
  • 16. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • MD5 Hash Function – Snowflake • MD5 (UPPER(RTRIM(RMC.CAFCUSCHN))) • MD5 hash function – Oracle • rawtohex(sys.utl_raw.cast_to_raw(dbms_obfuscation_toolkit.md5 (input_string => ...) • NEW: dbms_crypto.HASH(utl_raw.cast_to_raw(<input string>), 2); • 2 is for MD5 algorithm option • MD5 hash function - SQL Server • CONVERT([Char](32),HASHBYTES('MD5’, UPPER(RTRIM(RMC.CAFCUSCHN)))) • Need to minimize chance of duplicates • 12||3||45 and 1||2||345 hash to same value • Need a separator between each • Example: Col1||’^’||Col2||’^’||Col3 • Need to account for NULLs too What Does the MD5 Look Like?
  • 17. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • To generate most consistent string: standardize! • Convert data types • If 'NUMBER', 'VARCHAR', ‘TEXT’ • THEN 'TO_CHAR(' || column_name || ‘)‘ • If LIKE 'DATE%‘ • THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD’’)‘ • If LIKE 'TIME%‘ • THEN 'TO_CHAR(' || column_name || ', ''YYYY-MM-DD HH24:MI:SS'')' Other Considerations
  • 18. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Final Input String ( UPPER(TRIM(T1.GENERICNAME)) ||'^'|| UPPER(TRIM(TO_CHAR(T1.MED_STRNG_AMT))) ||'^'|| UPPER(TRIM(T1.UOM_CD)) ||'^'|| UPPER(TRIM(T1.MED_FORM_NM)) ||'^' )
  • 19. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. 2: Links = Associations Links = Transactions and Associations They are used to hook together multiple sets of information In DV 2.0 the BK attributes may migrate to the Links for faster query LNK_CUSTOMER_ORDERL *P MD5_LNK_CUSTOMER_ORDER VARCHAR2 (32) *UF MD5_HUB_CUSTOMER VARCHAR2 (32) *UF MD5_HUB_ORDER VARCHAR2 (32) * C_NAME VARCHAR2 (80) * O_ORDERID INTEGER * LDTS TIMESTAMP * RSCR VARCHAR2 (256) LNK_CUSTOMER_ORDER_PK (MD5_LNK_CUSTOMER_ORDER) LNK_CUSTOMER_ORDER__UN (MD5_HUB_CUSTOMER, MD5_HUB_ORDER) LNK_CUSTOMER_ORDER_HUB_CUSTOMER_FK (MD5_HUB_CUSTOMER) LNK_CUSTOMER_ORDER_HUB_ORDER_FK (MD5_HUB_ORDER) HUB_ORDERH *P MD5_HUB_ORDER VARCHAR2 (32) *U O_ORDERID INTEGER * LDTS TIMESTAMP * RSCR VARCHAR2 (256) O_ORDERKEY INTEGER HUB_ORDER_PK (MD5_HUB_ORDER) HUB_ORDER__UN (O_ORDERID) HUB_CUSTOMERH *P MD5_HUB_CUSTOMER VARCHAR2 (32) *U C_NAME VARCHAR2 (80) LDTS TIMESTAMP RSCR VARCHAR2 (256) C_CUSTKEY INTEGER HUB_CUSTOMER_PK (MD5_HUB_CUSTOMER) HUB_CUSTOMER__UN (C_NAME)
  • 20. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Tommorrow With DV No need to change EDW structure The business rule can change to a 1:M Relationship is a 1:1 so why model a Link? Modeling Links - 1:1 or 1:M? Today You discover new data later Existing data is fine New data is added
  • 21. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. 3. Satellites = Descriptors Satellites provide context for the Hubs and the Links Tracks changes over time - Like SCD 2 In DV 2.0 use HASH_DIFF to detect changes SAT_ORDERS *PF MD5_HUB_ORDER VARCHAR2 (32) *P LDTS TIMESTAMP O_ORDERSTATUS VARCHAR2 (1) O_TOTALPRICE NUMBER (12,2) O_ORDERDATE DATE O_ORDERPRIORITY VARCHAR2 (15) O_CLERK VARCHAR2 (15) O_SHIPPRIORITY INTEGER O_COMMENT VARCHAR2 (79) * HASH_DIFF VARCHAR2 (32) * RSRC VARCHAR2 (256) SAT_ORDER_PK (MD5_HUB_ORDER, LDTS) SAT_ORDER_HUB_ORDER_FK (MD5_HUB_ORDER) SAT_LNK_PART_SUPPLIERS *PF MD5_LNK_PART_SUPPLIER VARCHAR2 (32) *P LDTS TIMESTAMP PS_AVAILQTY INTEGER PS_SUPPLYCOST NUMBER (12,2) PS_COMMENT VARCHAR2 (199) * HASH_DIFF VARCHAR2 (32) * RSRC VARCHAR2 (256) PK_SAT_LNK_PART_SUPPLIER (LDTS, MD5_LNK_PART_SUPPLIER) SAT_LNK_PART_SUPPLIER_LNK_PART_SUPPLIER_FK (MD5_LNK_PART_SUPPLIER) HUB_ORDERH *P MD5_HUB_ORDER VARCHAR2 (32) *U O_ORDERID INTEGER * LDTS TIMESTAMP * RSRC VARCHAR2 (256) HUB_ORDER_PK (MD5_HUB_ORDER) HUB_ORDER__UN (O_ORDERID) LNK_PART_SUPPLIERL *P MD5_LNK_PART_SUPPLIER VARCHAR2 (32) *UF MD5_HUB_SUPPLIER VARCHAR2 (32) *UF MD5_HUB_PART VARCHAR2 (32) * S_NAME VARCHAR2 (80) * P_NAME VARCHAR2 (80) * P_BRAND VARCHAR2 (80) * P_TYPE VARCHAR2 (80) * P_SIZE INTEGER * P_CONTAINER VARCHAR2 (80) * LDTS TIMESTAMP * RSRC VARCHAR2 (256) LNK_PART_SUPPLIER_PK (MD5_LNK_PART_SUPPLIER) LNK_PART_SUPPLIER__UN (MD5_HUB_SUPPLIER, MD5_HUB_PART) LNK_PART_SUPPLIER_HUB_PART_FK (MD5_HUB_PART) LNK_PART_SUPPLIER_HUB_SUPPLIER_FK (MD5_HUB_SUPPLIER)
  • 22. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • Think Type 2 SCD (Slowly Changing Dimensions) • Old Way: • Compare column by column • Source value != Current value in DW table • 20 columns, then 20 compares • New Way: • Concatenate all columns to one string • Convert to one char(32) string with hash function • Compare to hashed value (HASH_DIFF) in target table • Does not matter how many columns MD5-Based Change Detection
  • 23. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • Use the Hub/Link PK columns • Filter on LEAD of LOAD_DTS WHERE LEAD(stg.LOAD_DTS) OVER (PARTITION BY stg.HUB_KEY ORDER BY stg.LOAD_DTS) IS NULL Easily Getting Current Rows
  • 24. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. BUT… Can’t Use Function In Where 01. Have to nest the query with the function as a virtual table in the FROM 02. Then use CURR_FLAG in outer WHERE 03. Works in Oracle, SQL Server, and SnowflakeDB
  • 25. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Example: Current View SELECT SAT.HASH_KEY SAT.Business_Group_Code, SAT.Code_Key_Numeric, SAT.System_Value_Type, … SAT.Change_Time, SAT.LOAD_DTS FROM ( SELECT RMC.HASH_KEY AS HASH_KEY RMC.CODCODNUM AS Business_Group_Code, RMC.CODKEYNUM AS Code_Key_Numeric, RMC.CODSYSTYP AS System_Value_Type, … RMC.CODCHGTIM AS Change_Time, RMC.LOAD_DTS AS LOAD_DTS, CASE WHEN RANK() OVER (PARTITION BY RMC.HASH_KEY ORDER BY RMC.LOAD_DTS DESC) = 1 THEN 'Y' ELSE 'N' END CURR_FLG –- calculated column FROM SAT_RMCODP RMC ) SAT –- nested virtual table WHERE SAT.CURR_FLG = 'Y' –filter on calculated column
  • 26. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Example 2 – Virtual Expire Date! CASE WHEN LEAD(stg.LOAD_DTS) OVER (PARTITION BY stg.CDC_KEY ORDER BY stg.LOAD_DTS) IS NULL THEN 'Y' ELSE 'N' END CURR_FLG, LEAD(stg.LOAD_DTS) OVER (PARTITION BY stg.CDC_KEY ORDER BY stg.LOAD_DTS) EXPR_DTS
  • 27. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Foundational Keys
  • 28. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Data Vault Model Flexibility (Agility) Based on natural business keys Highly normalized › Hubs and Links only hold keys and meta data › Satellites split by rate of change and/or source Enables Agile data modeling › Easy to add to model without having to change existing structures and load routines • Relationships (links) can be dropped and created on-demand. › No more reloading history because of a missed requirement Not system surrogate keys Allows for integrating data across functions and source systems more easily › All data relationships are key driven Highly normalized › Hubs and Links only hold keys and meta data › Satellites split by rate of change and/or source Enables Agile data modeling › Easy to add to model without having to change existing structures and load routines • Relationships (links) can be dropped and created on-demand. › No more reloading history because of a missed requirement Not system surrogate keys Allows for integrating data across functions and source systems more easily › All data relationships are key driven Goes beyond standard 3NF
  • 29. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Data Vault Agility Adding new components to the EDW has NEAR ZERO impact to existing: Ø Loading Processes Ø Data Model Ø Reports & BI Functions Ø Downstream Systems Ø Star Schemas or Data Marts
  • 30. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Case in Point Result of flexibility of Data Vault Model allowed them to merge 3 companies in 90 days - that is ALL systems, ALL DATA!
  • 31. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Key: Scalability Providing no foreseeable barrier to increased size and scope
  • 32. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Scalability in the Model Architecture Scaling is easy, its based on the following principles: Hub and spoke design MPP Shared-Nothing Architecture (original intent) Scale Free Networks (i.e., Snowflake) Can be partitioned vertically and horizontally to meet performance or security demands
  • 33. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Perhaps You Wish to Split for Security Reasons? From This DV Physical Partitioning Single Snowflake Database DV “Physical” Partitioning In Snowflake – might split for security reasons To THIS! DV “Physical” Partitioning Non-sensitive data PII or sensitive data with a Link to non- sensitive data Snowflake DB 1 Snowflake DB 2
  • 34. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Case in Point: Result of scalability was to produce a Data Vault Model that scaled to 3 Petabytes in size, and is still growing today!
  • 35. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Key: Productivity Enabling low complexity systems with high value output at a rapid pace
  • 36. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • Standardized modeling rules • Highly repeatable and learnable modeling technique • Can standardize load routines • Delta Driven process • Re-startable, consistent loading patterns. • Load multiple objects in parallel! • Can standardize extract routines • Rapid build of new or revised Data Marts • Can be automated (e.g. WhereScape) • Can use a BI-meta layer to virtualize the reporting structures • Example: Looker using LookML semantic layer • Example: BOBJ Universe Business Layer • Can put views on the DV structures as well • Simulate ODS/3NF or Star Schemas Key: Productivity
  • 37. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • The Data Vault holds granular historical relationships • Holds all history for all time, allowing any source system feeds to be reconstructed on- demand • Easy generation of Audit Trails for data lineage and compliance • Data Mining can discover new relationships between elements • Patterns of change emerge from the historical pictures and linkages • The Data Vault can be accessed by power-users Key: Productivity
  • 38. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Key: Productivity - Loading Deterministic Keys Non-Deterministic Keys Less Scheduling
  • 39. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Key: Productivity (More Maintenance Before)
  • 40. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Key: Productivity (Less Maintenance After)
  • 41. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • Modeling it as a DV forces integration of the Business Keys upfront • Good for organizational alignment • An integrated data set with raw data extends it’s value beyond BI: • Source for data quality projects • Source for master data • Source for data mining • Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture) Other Benefits of a Data Vault
  • 42. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • Upfront Hub integration simplifies the data integration routines required to load data marts • Helps divide the work a bit • It is much easier to implement security on these granular pieces • Granular, re-startable processes enable pin-point failure correction • It is designed and optimized for real-time loading in its core architecture (without any tweaks or mods) Other Benefits of Data Vault
  • 43. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. WHAT IS THE WORLD'S SMALLEST DATA VAULT?
  • 44. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Worlds Smallest Data Vault Hub Customer Hub_Cust_Seq_ID Hub_Cust_Num Hub_Cust_Load_DTS Hub_Cust_Rec_Src Hub_Cust_Seq_ID Sat_Cust_Load_DTS Sat_Cust_Load_End_DTS Sat_Cust_Name Sat_Cust_Rec_Src Satellite Customer Name › The Data Vault doesn’t have to be “BIG”. › A Data Vault can be built incrementally. › Reverse engineering one component of the existing models is not uncommon. › Building one part of the Data Vault, then changing the marts to feed from that vault is a best practice. › The smallest Enterprise Data Warehouse consists of two tables: • One Hub, • One Satellite
  • 45. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Who’s Using Data Vault?
  • 46. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • University of Texas, MD Anderson Cancer Center • Denver Public Schools • Micron • Independent Purchasing Cooperative (IPC, Miami) • Kaplan • US Defense Department • Colorado Springs Utilities • State Court of Wyoming • Federal Express • US Dept. of Agriculture Other Organizations Using Data Vault
  • 47. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. •Aptus Health •ResearchNow •F+W Media Snowflake Customers using Data Vault
  • 48. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. • Several Snowflake Partners offer Data Vault classes • ScaleFree - in EMEA • PerformacneG2 – in USA • Empowered Holdings (Dan Linstedt) – globally • Talk to you Snowflake account rep for contact information Data Vault Training & Certification
  • 49. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. The Experts Say… “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon “The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney
  • 50. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. More Notables… “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from.” Scott Ambler “In planning to build a CIF infrastructure for Denver Public Schools, we have determined that the Data Vault approach is what we will use to build the central EDW. We believe that this data modeling technique is the best suited for designing a central, historic data repository because of its flexibility to easily add new subject areas and attributes. This will allow us to grow the EDW in an organic manner, over time, as we discover new requirements.” Kent Graziano
  • 51. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. References
  • 52. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. › Available on Amazon: http://www.amazon.com/Bette r-Data-Modeling-Introduction- Engineering-ebook /dp/ B018BREV1C/ Intro ebook by Kent Graziano:
  • 53. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. › Available on Amazon.com › Soft Cover or Kindle Format › Now also available in PDF at LearnDataVault.com › Kent was the Technical Editor Super Charge Your Data Warehouse
  • 54. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution.© 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. › Available on Amazon: http://www.amazon.com/Buildi ng-Scalable-Data-Warehouse- Vault/dp/0128025107/ New DV 2.0 Book from Dan Linstedt
  • 55. © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. YOUR DATA, NO LIMITS © 2018 Snowflake Computing Inc. All Rights Reserved. Snowflake Proprietary. Not for Redistribution. Thank You!