1. The document discusses Agile Data Warehousing and the Data Vault approach. It introduces Ronald Damhof as the presenter at a conference on June 18, 2012 in Munich.
2. Damhof discusses different approaches to data management including a centralized department approach, outsourcing to experts, and a do-it-yourself approach. He advocates for an approach that delivers information products on demand against quality criteria aligned with customer expectations.
3. Key principles for a modern data management environment discussed are adaptability, sustainability, compliance, decoupling, effectiveness, standardization, and centralization.
TDWI Agile Data Warehouse - DV, What Is the Buzz About
1. Agile Data Warehousing
Data Vault: what is the buzz about?
TDWI München
June 18, 2012
Ronald Damhof
R.D.Damhof
2. “Our highest priority is to satisfy the
customer through early and continuous
delivery of valuable software”
Agile Manifesto, 2001
Kent Beck, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham,
Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern,
Brian Marick, Robert C. Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland, Dave Thomas
4. Everybody mines their own data
Everybody enriches their own data
Everybody uses their own data
User = Developer
With self-made tools
Data quality determined by the individual
It’s a grind – limited reusability
Lead times unpredictable
No management
5. Let’s ‘order’ an information product
And hire a master/expert
Separation between user and developer
Developer/expert mines the data
The information product = custom made
Data quality mostly depends on the developer/expert
Lead times unpredictable
Still not much reusability
6. A central department that knows what information you need
That assembles information products, ready for you to use
‘I know what you want’ – a black box
Efficiency is the name of the game
At least I got something, but it does not comply, even remotely, with my needs
Even worse: the guild days are still there – the expert is now submerged, but still needed to get the data you actually need.
Introduction of management – you want something? Please apply in triplicate…
7. Creating information products, the
moment they are asked for
Against quality criteria which are in line
with the expectation of the customer
Empower the customer with skills and
facilities to be more self-sufficient
Minimize ‘data’ stock (inventory) as much as possible
Embrace new wishes and changes
required by the customer
The customer is the most important part
of the production process
Stephen Denning (2011) – Radical Management
8. A modern data management environment:
The ‘Supermarket’
The ‘Restaurant’
The ‘Do it yourself buffet’
10. Push characteristics
§ Mass production
§ Known specifications, operational definitions, standards
§ Repeatable, predictable, or better still, a uniform process
§ Part of the system that needs statistical control
§ Inventory allowed/necessary
§ Supply driven
§ Reliability over flexibility
Pull characteristics
§ Just in time
§ Demand driven
§ Build to order
§ Preferably no inventory
§ Flexibility over reliability
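To make the push/pull contrast concrete, here is a minimal Python sketch. The class names `PushProducer` and `PullProducer` and the idea of one zero-argument "recipe" function per information product are hypothetical illustrations, not part of the talk: the push producer builds every specified product up front and serves it from inventory, while the pull producer builds a product only when it is requested and keeps no stock.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical: an information product is built by a zero-argument "recipe" function.
Recipe = Callable[[], str]


@dataclass
class PushProducer:
    """Build to stock: every product with a known specification is produced up front."""
    recipes: Dict[str, Recipe]
    inventory: Dict[str, str] = field(default_factory=dict)

    def produce_all(self) -> None:
        # Supply driven: produce everything, whether or not anyone has asked for it yet.
        for name, recipe in self.recipes.items():
            self.inventory[name] = recipe()

    def get(self, name: str) -> str:
        # Served from inventory; raises KeyError if it was never produced.
        return self.inventory[name]


@dataclass
class PullProducer:
    """Build to order: a product is produced only when a customer asks for it."""
    recipes: Dict[str, Recipe]
    builds: int = 0

    def get(self, name: str) -> str:
        # Demand driven, no inventory: every request triggers a fresh build.
        self.builds += 1
        return self.recipes[name]()


if __name__ == "__main__":
    recipes = {"sales": lambda: "sales report", "churn": lambda: "churn report"}
    push = PushProducer(recipes)
    push.produce_all()
    pull = PullProducer(recipes)
    print(push.get("sales"), "|", pull.get("churn"), "| pull builds:", pull.builds)
```

The push side trades flexibility for reliability (everything is ready, but only what was specified); the pull side trades reliability for flexibility (nothing is stocked, every request is built fresh).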
11. Back to the issue at hand…
§ What: the ‘production process of data’
§ Where: coordination - local versus central
§ How: system engineering - systematic vs. opportunistic
§ Which principles guide us: leading principles
12. Local vs. Central deployment
[Diagram: the information delivery process runs from the data sources (internal and external) to the recipient in four steps: 1. Get the raw, uncut data; 2. Register and standardize; 3. Enrich and cleanse data; 4. Generate information products. The data-source end is labeled ‘Generic process (Central)’ and the recipient end ‘End-user (Local)’; a ‘Data function service’ is also shown, and the four steps are repeated side by side for several deployments.]
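The four steps are the same whether they run centrally or locally; the local-versus-central question decides where the boundary between them sits. The Python sketch below is a hypothetical illustration that models that boundary as a single parameter, `central_until`: the steps before it produce a shared, central result, the rest run locally for one end-user. The dict rows and the standardization and cleansing rules are made up for the example.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Step = Callable[[List[Row]], List[Row]]


def get_raw(rows: List[Row]) -> List[Row]:
    # 1. Get the raw, uncut data: copy everything as-is, no filtering.
    return [dict(r) for r in rows]


def register_standardize(rows: List[Row]) -> List[Row]:
    # 2. Register and standardize (assumed rule): trim and upper-case the customer key.
    return [{**r, "customer": r["customer"].strip().upper()} for r in rows]


def enrich_cleanse(rows: List[Row]) -> List[Row]:
    # 3. Enrich and cleanse (assumed rule): drop rows without an amount.
    return [r for r in rows if r.get("amount")]


def generate_product(rows: List[Row]) -> List[Row]:
    # 4. Generate an information product: total amount per customer.
    totals: Dict[str, int] = {}
    for r in rows:
        totals[r["customer"]] = totals.get(r["customer"], 0) + int(r["amount"])
    return [{"customer": c, "total": str(t)} for c, t in sorted(totals.items())]


STEPS: List[Step] = [get_raw, register_standardize, enrich_cleanse, generate_product]


def deliver(rows: List[Row], central_until: int) -> Tuple[List[Row], List[Row]]:
    """Run the first `central_until` steps centrally, the remaining steps locally.

    Returns the shared (central) intermediate result and the local end product.
    """
    shared = rows
    for step in STEPS[:central_until]:
        shared = step(shared)
    product = shared
    for step in STEPS[central_until:]:
        product = step(product)
    return shared, product


if __name__ == "__main__":
    source = [
        {"customer": " acme ", "amount": "10"},
        {"customer": "ACME", "amount": "5"},
        {"customer": "beta", "amount": ""},
    ]
    shared, product = deliver(source, central_until=2)
    print(len(shared), product)
```

Moving `central_until` up means more of the process is shared and reused; moving it down leaves more of the process, and more freedom, with the end-user.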
13. System Engineering - Systematic vs. Opportunistic
From manoeuvrability (opportunistic approach) to sustainability (systematic approach):
§ Self-service development: developer = user; ad-hoc development process; self-sufficient, great degree of freedom; very broad tasks
§ Delegated development: lightweight development process; minimum of specialisation/distinction of roles; self-sufficient, limited freedom
§ IT development: development line discipline (OTAP: development, test, acceptance, production); developers at a distance from users; mutually dependent, within frameworks; heavy separation of functions
15. [Diagram: Company xxx data management, shown as four numbered areas (1 to 4) along the dimensions Data (‘What’), Function (‘How’) and ‘Where’/‘Whom’. Components: Sources, domain Source store, Enterprise Data Warehouse, Business View, and BI apps delivering reports, analysis and ad-hoc queries.]
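16. [Diagram: flows from Source to Source store and from Source store to the EDW (Data Vault), with products delivered downstream and a Business View (BV); the flows are set against the principles Adaptable, Sustainable, Compliant, Decoupled, Effective, Standardized and Centralized.]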
17. [Diagram: Company xxx data warehouse / Business Intelligence, shown as four numbered areas (1 to 4) along the dimensions Data (‘What’), Function (‘How’) and ‘Where’/‘Whom’. Components: Sources, domain Source store, Enterprise Data Warehouse, Business View and data feeds, and BI apps delivering reports, analysis and ad-hoc queries.]
18. [Diagram: ‘Why DV?’ The information delivery process sits between the administrative process and decision control (PDCA) and consists of Attain, Register, Standardize, Enrich, Generate and Distribute. Data from systems (internal and external) is pushed into staging and a DV-based data warehouse; business rules are applied after it, and information recipients pull information products such as compliance reporting, risk management, performance management, supply chain optimization, fraud detection and market basket analysis, while data products are pushed. Control/metadata spans the whole process.]
19. Metamodel-driven automation
- Models (process, rules and data) determine the metadata; the metadata determines the automation artifacts
- The aim is to be 100% declarative
- Not everything can be generated; specific, tailored metadata will remain necessary
Metadata-driven automation
- Inputs: source model(s), target model, template design, naming conventions
- Advanced inputs: normalization preferences, ontologies
Taken from Dan Linstedt’s blog post: http://danlinstedt.com/datavaultcat/code-generation-for-data-vault-not-as-easy-as-you-think/
Data Vault implementations
Template-driven automation
- In the most basic form: documentation describing a pattern
- More advanced: generating XML code for second-generation ETL tooling
- E.g. http://www.grundsatzlich-it.nl/bi-tools-templator.html
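The slide separates the inputs (source and target models, template design, naming conventions) from the generated automation artifacts. As a minimal, hypothetical Python sketch of template-driven generation: hub metadata plus naming conventions fill a single SQL template. The `hub_` prefix, the `_hk` suffix, the 32-character hash key, and the `load_dts`/`record_source` columns are assumptions for illustration, not conventions stated in the talk.

```python
from dataclasses import dataclass
from string import Template
from typing import List


@dataclass
class HubMetadata:
    # Hypothetical metadata for one hub: entity name and its business key column(s).
    entity: str
    business_keys: List[str]


# Assumed naming conventions; in practice these would be input metadata too.
HUB_PREFIX = "hub_"
KEY_SUFFIX = "_hk"

# Template design: one pattern, many generated artifacts.
HUB_TEMPLATE = Template(
    "CREATE TABLE ${table} (\n"
    "    ${hash_key} CHAR(32) NOT NULL PRIMARY KEY,\n"
    "${bk_columns}\n"
    "    load_dts TIMESTAMP NOT NULL,\n"
    "    record_source VARCHAR(100) NOT NULL\n"
    ");"
)


def generate_hub_ddl(meta: HubMetadata) -> str:
    """Metadata + naming conventions + template -> DDL (the automation artifact)."""
    entity = meta.entity.lower()
    bk_columns = "\n".join(
        f"    {col.lower()} VARCHAR(100) NOT NULL," for col in meta.business_keys
    )
    return HUB_TEMPLATE.substitute(
        table=HUB_PREFIX + entity,
        hash_key=entity + KEY_SUFFIX,
        bk_columns=bk_columns,
    )


if __name__ == "__main__":
    for meta in [HubMetadata("Customer", ["customer_number"]), HubMetadata("Product", ["sku"])]:
        print(generate_hub_ddl(meta))
```

Adding a hub then means adding a metadata entry rather than writing DDL by hand; in the spirit of ‘handcraft it first’, the template is only worth writing once the pattern is understood.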
20. My point of view on (Data Vault) automation tooling
§ Generation is an aid, not a goal in itself
Do not bend the principles to fit the tool…
Look for decoupling
§ Truly understand the mechanics - handcraft it first!
Invest in proper education and learning
Invest time in getting ready
Involve your customers from the start
§ PoC, PoC, PoC
§ Deliver, Deliver, Deliver
21. Agility Data Vault (1)
Why is it that you can build and deploy extremely small particles in Data Vault, but not in other approaches, without increasing the overhead and coordination of these particles? In other words: ‘divide and conquer to beat the size/complexity dynamic’.
22. Agility Data Vault (2)
Why is it that you can re-engineer your existing model and guarantee that the changes remain local? This is hugely beneficial in data warehouses, which by definition grow over time.
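One way to see why changes can stay local: in Data Vault, new descriptive attributes or a new source are typically added as a new satellite (and new relationships as a new link) rather than by altering existing tables. The toy Python sketch below assumes a model represented as a plain dict of table names to column lists; it adds a satellite and checks that no existing table changed. The table and column names are hypothetical.

```python
from typing import Dict, List

# Hypothetical model representation: table name -> ordered list of column names.
Model = Dict[str, List[str]]


def add_satellite(model: Model, hub: str, name: str, attributes: List[str]) -> Model:
    """Return a new model with one extra satellite; existing tables are left untouched."""
    if name in model:
        raise ValueError(f"{name} already exists")
    new_model = {table: list(cols) for table, cols in model.items()}
    # The satellite hangs off the hub's key and carries only the new attributes.
    new_model[name] = [f"{hub}_hk", "load_dts", "record_source", *attributes]
    return new_model


def changed_tables(before: Model, after: Model) -> List[str]:
    """Existing tables that were altered or dropped, i.e. where the change was not local."""
    return [t for t in before if after.get(t) != before[t]]


if __name__ == "__main__":
    v1 = {
        "hub_customer": ["customer_hk", "customer_number", "load_dts", "record_source"],
        "sat_customer_crm": ["customer_hk", "load_dts", "record_source", "name"],
    }
    v2 = add_satellite(v1, "customer", "sat_customer_web", ["email"])
    print(changed_tables(v1, v2), sorted(set(v2) - set(v1)))
```

Each such particle can be built and deployed on its own, which is what keeps coordination overhead flat as the model grows.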
23. Agility Data Vault (3)
Why is it that, as your (Data Vault-based) data warehouse grows, your costs grow ‘merely’ linearly at first, and as you approach the end state the marginal growth in cost decreases exponentially?
24. Data Vault as such is not agile; it is the development process that needs to be agile. DV merely supports the agile development process.
“Our highest priority is to satisfy the
customer through early and continuous
delivery of valuable software”
Agile Manifesto, 2001
Kent Beck, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham,
Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern,
Brian Marick, Robert C. Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland, Dave Thomas
36. Classic Data Vault Application Architecture
[Diagram: business transaction systems feed Staging and then the Data Vault through structure transformation, with Hub = business keys; generic business rules are executed downstream (business rule vault), applying structure and value transformation to produce the outgoing datasets. The architecture is rated against Adaptable, Sustainable, Compliant, Decoupled, Effectiveness, Standardized and Centralized; two ratings are marked ‘?’.]
37. Data Vault Application Architecture
§ Central EDW
§ Business rules downstream
§ Incremental, non-destructive loading
§ 100% of the data (within scope) 100% of the time
§ Auditable/Partly source driven
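Incremental, non-destructive loading means rows are only ever inserted: a new version is appended when something changed, and nothing is updated or deleted, which is what keeps 100% of the data auditable. Below is a minimal Python sketch for a satellite, assuming an in-memory list of rows and an MD5 ‘hash diff’ over the descriptive attributes to detect change. The hash-diff technique is a common choice, not something the slide prescribes, and the function names are hypothetical.

```python
import hashlib
from datetime import datetime
from typing import Dict, List


def hash_diff(attrs: Dict[str, str]) -> str:
    # Deterministic fingerprint of the descriptive attributes (sorted by name).
    payload = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(sat: List[dict], snapshot: Dict[str, Dict[str, str]],
                   load_dts: datetime, source: str) -> int:
    """Insert-only load: append a row only when a key is new or its attributes changed.

    Existing rows are never updated or deleted. Returns the number of rows inserted.
    """
    latest: Dict[str, str] = {}
    for row in sat:  # rows are appended in load order, so the last one per key wins
        latest[row["key"]] = row["hash_diff"]
    inserted = 0
    for key, attrs in snapshot.items():
        h = hash_diff(attrs)
        if latest.get(key) != h:
            sat.append({"key": key, "hash_diff": h, "load_dts": load_dts,
                        "record_source": source, **attrs})
            inserted += 1
    return inserted


if __name__ == "__main__":
    sat: List[dict] = []
    print(load_satellite(sat, {"C1": {"city": "Munich"}}, datetime(2012, 6, 18), "crm"))
    print(load_satellite(sat, {"C1": {"city": "Munich"}}, datetime(2012, 6, 19), "crm"))
```

Records deleted at the source are not handled in this sketch; they would need separate treatment (for example an end-dating or status record) rather than a physical delete.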
68. Agility Data Vault - recap (1)
Why is it that you can build and deploy extremely small particles in Data Vault, but not in other approaches, without increasing the overhead and coordination of these particles? In other words: ‘divide and conquer to beat the size/complexity dynamic’.
Why is it that you can re-engineer your existing model and guarantee that the changes remain local? This is hugely beneficial in data warehouses, which by definition grow over time.
Why is it that, as your (Data Vault-based) data warehouse grows, your costs grow ‘merely’ linearly at first, and as you approach the end state the marginal growth in cost decreases exponentially?
69. Agility Data Vault - recap (2)
Remember the push characteristics:
➡ Mass production: Data Vault
➡ Known specifications, operational definitions, standards: Data Vault
➡ Repeatable, predictable, or better still, a uniform process: Data Vault
➡ Part of the system that needs statistical control: Data Vault
➡ Inventory allowed/necessary: Data Vault
➡ Mainly supply driven: Data Vault
➡ Reliability over flexibility: Data Vault
Automation of a Data Vault ‘production process’ is just common sense
70. Bonus Slides
Forks and mutations in DV ‘evolution’
71. Type 1 - Classic Data Vault
[Diagram: business transaction systems feed Staging and then the Data Vault through structure transformation, with Hub = business keys; generic business rules are executed downstream (business rule vault), applying structure and value transformation to produce the outgoing datasets. Rated against Adaptable, Sustainable, Compliant, Decoupled, Effectiveness, Standardized and Centralized; two ratings are marked ‘?’.]
72. Type 2 - Source Data Vault
[Diagram: business transaction systems feed a staging vault through structure transformation; it persists staging in DV format (DV modelled, no integration, Hub = surrogate keys). Integration and business rule execution follow downstream in a business Data Vault that feeds the data marts. Rated against Adaptable, Sustainable, Compliant, Decoupled, Effectiveness, Standardized and Centralized; three ratings are marked ‘?’.]
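The difference between ‘Hub = business keys’ (classic Data Vault) and ‘Hub = surrogate keys, no integration’ (Source Data Vault) can be shown with two made-up source extracts. In this hypothetical Python sketch, keying the hub on each source’s own surrogate keys keeps every source row separate, while keying it on the shared business key integrates customers that appear in both sources.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical extracts from two source systems describing overlapping customers.
# Each record: (source-system surrogate key, business key).
CRM = [(101, "CUST-001"), (102, "CUST-002")]
BILLING = [(9001, "CUST-001"), (9002, "CUST-003")]

Sources = Dict[str, List[Tuple[int, str]]]


def hub_on_surrogate_keys(sources: Sources) -> Set[Tuple[str, int]]:
    # Source Data Vault style: one hub entry per (system, surrogate key); nothing integrates.
    return {(name, sk) for name, rows in sources.items() for sk, _ in rows}


def hub_on_business_keys(sources: Sources) -> Set[str]:
    # Classic style: hub entries are business keys, so the same customer collapses to one entry.
    return {bk for rows in sources.values() for _, bk in rows}


if __name__ == "__main__":
    sources = {"crm": CRM, "billing": BILLING}
    print(len(hub_on_surrogate_keys(sources)), sorted(hub_on_business_keys(sources)))
```

In the Source Data Vault variant integration is not avoided, only moved downstream into the business Data Vault.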
73. [Diagram: Source to source staging DV to business DV. The source staging DV is DV modelled but is labeled ‘Still the source’, leaving a 100% semantic gap; the business DV is labeled ‘Integration, cleansing, consolidation’ and ‘Business rule execution upstream ??’.]
74. [Diagram: multiple sources, each loaded into a source staging DV, then into a business DV that forms the data warehouse. The source staging DVs are DV modelled but labeled ‘Still the source’, leaving a 100% semantic gap; the business DV is labeled ‘Integration, cleansing, consolidation’ and ‘Business rule execution upstream ??’.]
75. Wanna know more?
§ Training and certification: www.geneseeacademy.com
§ Books: ‘Super Charge Your Data Warehouse: Invaluable Data Modeling Rules to Implement Your Data Vault’ – D. Linstedt / K. Graziano
§ LinkedIn: Data Vault Discussions (approx. 800 members)
§ Niche non-commercial conferences: www.dwhautomation.com
§ Many blogs, articles and presentations on the World Wide Web
§ The best way to learn: try it, write some code, experience it, engage
76. Thank You
Drs. Ronald D. Damhof
Blog: http://prudenza.typepad.com/ and http://www.b-eye-network.com/blogs/damhof/
LinkedIn: http://nl.linkedin.com/in/ronalddamhof
Email: ronald.damhof@prudenza.nl
Twitter: RonaldDamhof
Skype: Ronald.Damhof
Mobile: +31(0)6 269 67 184
Others: Information Quality Certified Professional (IQCP), Data Vault Certified Grand Master, Certified Scrum Master, Member of the Boulder BI Brain Trust (#BBBT)
Ronald Damhof is an independent practitioner in the field of data management and decision support. He graduated in Economics in 1995. Since then he has worked as a practitioner in the field of information management, with a focus on decision support and data management, trying hard to enhance the rigor and relevance of these fields by combining scientific research with the everyday challenges of the practitioner. Ronald is mainly hired by customers in the role of business/IT architect, auditor, coach or trainer. He blogs on B-Eye-Network.com as well as on his own blog, is a member of the prestigious BBBT, has written several articles on decision support architectures and is a researcher in the field of information management. Although Ronald likes to work with theoretically grounded research and proven practices, he is not a 'white paper' architect; 'put your money where your mouth is' is his motto. He likes to see architectures 'live' in enterprises, not just write about them. In most organizations his role extends beyond architecture. In truly agile spirit, the roles he plays depend on the context of the client: he can be a missionary (selling the value), a project manager (getting it done), a scrum master (removing impediments), a specialist (educating hardware peeps, data architects, data logistics, etc.) or a leader.