Data Vault
Consortium
Presentation by @dougneedham
Who are we?
CLEAR MEASURES offers a range of services and solutions designed to
satisfy needs shared by firms large and small; and the skills required to make
your customized goals a reality. If your goals aren’t yet defined, CLEAR
MEASURES can help you define a strategy for managing, analyzing, or
visualizing your data in ways that make your path easier to identify.
• Analytics and Intelligence
• Data Integration
• Enterprise Architecture
• Strategic & Project Management
• Cloud Infrastructure
• Database Administration
• System Administration
• Technology Services
Who are we?
All our customers have access to:
Capacity
Pay on demand, in 15-minute increments, not the half-day or full-day you pay
for a contractor.
Coverage
True 24 X 7 Coverage, with in-facility staff directed from our Global Operations
Center in Covington, Kentucky.
Cost
CLEAR MEASURES can help your team with effective costs from Rural Sourcing
and Global Sourcing locations. CLEAR MEASURES' proprietary ONguard system
allows for complete direction of a global workforce with U.S. oversight, focused on
efficiency and repeatability.
Who am I?
• The Data Guy
• First job was as a Marine Corps DBA supporting the entire Marine
Corps at the main site for Systems Software Evaluation.
• Spent the first 10 years of my career as a DBA.
• 20 years of data management.
• Most recent decade building analytical systems.
• Pentaho, Informatica, Business Objects, Cognos, Oracle, SQL
Server, MySQL.
• Cloud based Analytics with a large healthcare information
company on Cassandra.
• Trying to figure out where Data Science and Big Data fit
together with the Data Warehouse.
This is the wrong time for
Data Science
• It is also the wrong time for a Data Warehouse, Business
Intelligence Platform, Data Vault, Data Mining, Big Data, or any
other predictive, machine learning, analytics platform.
• Do these projects when things are going well. Anticipate what
could happen to prevent things from going poorly.
When is the right time?
• If you have multiple systems you need to integrate.
• As you lay the foundation for Self Service Business
Intelligence.
• To lay the foundation of Data as a service application.
• If you are combining data from many applications,
systems, or business units, or you are providing data to
many applications, systems, or business units that want
data provided to them in slightly different standard feeds.
Data Science and The Data
Warehouse
• “Data Science is the application of statistical and
mathematical rigor to business data.” Doug
• I have heard it said 80% of data science is data munging.
• Data Vault is: “100% of the data 100% of the time” – Dan
L.
• What does this mean?
• What does the data say? Where did the data come from?
What happened to the data from the time it was captured
until the time it was presented?
• Models, Statistical Models specifically, are the core of
Data Science.
• Looking forward to hearing more about DV 2.0 and how it
supports Polyglot persistence.
Data Science and The Data
Warehouse
• By the way, we have been doing this for a while.
• Some data is predictive; all data is instructive.
• Being able to create a statistical model, quickly run lots
of data through that statistical model, observe the actual
results and compare these with predicted results allows
us to refine the statistical model.
• Are Business Analysts Data Scientists? What is the main
difference between the two?
• Which one “needs” more data? Which one can actually
use more data?
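The refine loop described above (build a model, run data through it, compare predicted with actual results) can be sketched in a few lines. This is a minimal illustration, not the presenter's method: the straight-line model, the function names, and all the numbers are assumptions made up for the example.

```python
# Hypothetical history of (input, observed result) pairs; values are invented.
history = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(model, points):
    """Actual minus predicted value for each observation."""
    slope, intercept = model
    return [y - (slope * x + intercept) for x, y in points]


# 1. Create the model from the data already collected.
model = fit_line(history)

# 2. Run new data through it and compare actual with predicted results.
new_observations = [(5.0, 10.1), (6.0, 11.8)]
errors = residuals(model, new_observations)

# 3. Refine: here simply by refitting with the new observations included.
refined_model = fit_line(history + new_observations)
```

The more history the warehouse retains, the more often this loop can be repeated against data the model has not yet seen.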
Quick Trivia
• Who was one of the first Data Scientists?
• Now let’s talk about storing all of this data we collect, and
see if there is anything new with our understanding of the
structures we are all familiar with.
Data Vault
• The integration layer of an overall data warehouse strategy.
• There are other areas of data warehousing.
• Presentation
• Near-Line
• Archive
• Applications within the enterprise are the data capture
mechanisms.
• I think everyone is trying to find the best way to leverage a “Big
Data” platform into the world of the Data Warehouse.
• Data vault is the mechanism that allows a data warehouse to
evolve over time.
• Simple, straightforward, repeatable, auditable, resilient.
Modeling
• HUBs – Business Keys
• LNKs – Relationships
• SATs – Contextual data.
• There are other entities in the Data Vault
method; however, these are the primary entities.
Everything else is functionally dependent on some
combination of the above.
• Notice the colors, Hubs one color, Links another, Sats a
third. Anything else should be a separate color.
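As a rough sketch of the three primary entities, the rows of a hub, a link, and a satellite might be represented as below. Apart from the HUB_SQN idea, which the deck mentions later, every class, field, and value here is an assumption chosen for illustration rather than a prescribed Data Vault layout.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HubRow:
    """A hub holds one unique business key."""
    hub_sqn: int          # surrogate sequence number for the key
    business_key: str
    load_dts: datetime    # assumed audit column
    record_source: str    # assumed audit column


@dataclass
class LinkRow:
    """A link records a relationship between two or more hubs."""
    link_sqn: int
    hub_sqns: tuple       # one HUB_SQN per connected hub
    load_dts: datetime
    record_source: str


@dataclass
class SatRow:
    """A satellite carries the contextual data (payload) for its parent."""
    parent_sqn: int       # the hub or link this context describes
    load_dts: datetime
    record_source: str
    payload: dict


loaded = datetime(2015, 1, 1)
customer = HubRow(1, "CUST-001", loaded, "crm")
product = HubRow(2, "PROD-42", loaded, "catalog")
purchase = LinkRow(10, (customer.hub_sqn, product.hub_sqn), loaded, "orders")
customer_detail = SatRow(customer.hub_sqn, loaded, "crm", {"name": "Acme"})
```

Everything a link or satellite contains depends on the keys it points at, which is the functional dependency the bullet above describes.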
HUB
• Business Keys.
• Isolated entities that can stand alone representing a list
of unique business keys.
• The collection of business keys for an organization is the
answer to the question, “What do we do?”
• Which business key is most important?
• How many edges does it have?
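One way to explore the last two questions is to treat the model as a graph and count the edges on each hub. The slides do not define how a hub's edges are counted, so this sketch assumes an edge is any link that references the hub or any satellite attached to it; the table names are hypothetical.

```python
from collections import Counter

# Hypothetical model: each link lists the hubs it connects.
links = {
    "LNK_ORDER": ["HUB_CUSTOMER", "HUB_PRODUCT", "HUB_STORE"],
    "LNK_SHIPMENT": ["HUB_CUSTOMER", "HUB_CARRIER"],
    "LNK_RETURN": ["HUB_CUSTOMER", "HUB_PRODUCT"],
}

# Hypothetical satellites attached directly to hubs.
hub_satellites = {
    "HUB_CUSTOMER": ["SAT_CUSTOMER_NAME", "SAT_CUSTOMER_ADDRESS"],
    "HUB_PRODUCT": ["SAT_PRODUCT_DETAIL"],
}


def hub_edge_counts(links, hub_satellites):
    """Count edges per hub: referencing links plus attached satellites."""
    counts = Counter()
    for hubs in links.values():
        counts.update(hubs)
    for hub, sats in hub_satellites.items():
        counts[hub] += len(sats)
    return counts


edge_counts = hub_edge_counts(links, hub_satellites)
most_connected = edge_counts.most_common(1)[0]
```

Edge count is only one candidate signal for importance; it shows which business key the rest of the model leans on most heavily.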
LNK
• Relationships.
• Entities representing a list of unique relationships
between business keys.
• The collection of relationships for an organization is the
answer to the question, “At what time does who do
what to whom or what?”
• Links are actually very interesting in their own right. We
will be speaking further about links specifically a little
later in this session.
LNK
• How many edges does a link have? The number of
incoming edges a Link table has is the number of
HUB_SQNs the link is connecting (This includes weak
hubs).
• Outgoing Edges are the number of Satellites connected
to this Link table.
• What is the ratio of OE/IE?
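The edge counts and the OE/IE ratio defined above are simple to compute once a link's hub references and satellites are listed. A minimal sketch, with a hypothetical link and hypothetical column and table names:

```python
def link_edge_profile(hub_sqn_columns, satellites):
    """Return (incoming edges, outgoing edges, OE/IE ratio) for one link.

    Incoming edges: number of HUB_SQN columns on the link.
    Outgoing edges: number of satellites connected to the link.
    """
    incoming = len(hub_sqn_columns)
    outgoing = len(satellites)
    if incoming == 0:
        raise ValueError("a link must reference at least one hub")
    return incoming, outgoing, outgoing / incoming


# Hypothetical link connecting three hubs, with one satellite hanging off it.
order_link = link_edge_profile(
    hub_sqn_columns=["CUSTOMER_SQN", "PRODUCT_SQN", "STORE_SQN"],
    satellites=["SAT_ORDER_DETAIL"],
)
```

Here `order_link` has three incoming edges and one outgoing edge, for a ratio of one third.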
Research in progress
•
Details
•
Now What?
• Now that I have these numbers, what do I do with them?
• This is one way to confirm the accuracy of the
sequencing of your business keys in a link, in order to
separate out the driver business key from the dependent
keys.
• Are there any other links in the Data Vault that have a
similar Cosine?
Now What?
• If you have cosine similarity between links does this
mean something?
• What is going on in the business? Is it obvious the links
are related?
• More importantly, is it not obvious why two links are
similar within a margin of error?
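Cosine similarity itself is a standard calculation. The slides that define which numbers make up each link's vector did not survive in this transcript, so the sketch below assumes each link is described by its (incoming edges, outgoing edges) pair; the link names, vectors, and the 0.99 threshold are all hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Assumed representation: (incoming edges, outgoing edges) per link.
link_vectors = {
    "LNK_ORDER": (3, 1),
    "LNK_RETURN": (6, 2),
    "LNK_SHIPMENT": (2, 3),
}


def similar_links(vectors, threshold=0.99):
    """Return pairs of links whose cosine similarity meets the threshold."""
    return [
        (left, right)
        for left, right in combinations(vectors, 2)
        if cosine_similarity(vectors[left], vectors[right]) >= threshold
    ]


pairs = similar_links(link_vectors)
```

With these made-up vectors only LNK_ORDER and LNK_RETURN are flagged, because their vectors point in the same direction even though their magnitudes differ. Whether such a match means anything in the business is the question the slide raises.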
SAT
• Contextual data.
• Detail data. Most pertinent for loading into, and use in,
downstream systems.
• The “Payload” of the satellite is the data you want to
capture.
• The collection of business keys for an organization is the
answer to the question “What do we do?”
• Has one edge.
Satellite Clustering
• Using some simple k-means clustering with Euclidean
distance calculations you can identify divergent rates of
change within a satellite.
• This is one way to divaricate satellites coming from a
single source table.
• If you are interested in knowing more about this, let me
know.
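A minimal sketch of the clustering idea follows. The slides do not say which features were clustered, so this assumes a single feature per satellite column: the fraction of loads in which that column's value changed. The column names, rates, and starting centroids are hypothetical, and the k-means routine is a plain textbook version (Lloyd's algorithm), not the presenter's implementation.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm with Euclidean distance and caller-supplied centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    labels = [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        for p in points
    ]
    return labels, centroids


# Assumed feature: fraction of loads in which each column's value changed.
change_rates = {
    "customer_name": 0.01,
    "date_of_birth": 0.00,
    "email": 0.05,
    "account_balance": 0.90,
    "last_login": 0.95,
}

points = [(rate,) for rate in change_rates.values()]
start = [(min(change_rates.values()),), (max(change_rates.values()),)]
labels, centroids = kmeans(points, start)

# Group column names by cluster: each group is a candidate satellite.
groups = {}
for column, label in zip(change_rates, labels):
    groups.setdefault(label, []).append(column)
```

Each resulting group is a candidate satellite: slowly changing columns in one, rapidly changing columns in another.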
Philosophies
• From Dan: “100% of the data 100% of the time”
• From Doug: “A model is not valid, until 100% of the
model is populated from source systems.”
• Notice I did not say 100% of the data as Dan did.
• During development, the assumptions built into the
model have to be validated.
• Designing a proper data vault model does not take very
long for those versed in its abilities. Loading the model to
validate the assumptions built into the model is
paramount to success.
Philosophies
• The second portion of this philosophy is to extract data
from the Vault to an alternative system, be that star
schema, statistical research, data science, excel, etc.
Something downstream needs to be populated FROM
the vault.
• In order to know you have a valid model, data must both
go in and come out accurately according to business
rules.
• This must be done in order to say a particular phase of
the development cycle is complete.
• What does complete mean? It means this is the end of
the beginning. Welcome to the world of Data Warehouse
support, maintenance and evolution.
Aesthetics
• One of the most fascinating things about a data vault
model - to me - is that it flows quite aesthetically in
accordance with the particular business processes the
data vault is attempting to model.
• It just makes sense to a variety of users, from technical
to executive.
• The following slide is an example of this, where we were
modeling a process and something surprising came out
of the modeling exercise.
What do I mean by
Aesthetics?
• Can you do this with another data modeling technique?
Architecture
• A data architect understands applications are only the
entry point of data into the Enterprise. Data Science
makes data forever useful.
Volumetrics
•
Summary
• One of the main reasons Architects are constantly studying
designs is they are continuously looking for ways not just
to create something new, but to reduce new problems to
ones already solved. The same thing can be said for
Mathematicians, Engineers, Physicists, even managers
and executives.
• The Data Vault is a repeatable pattern for database design
when that database is to be used for integration of multiple
systems. There are many other uses for Data Vault, of
course, but this is the first principle of why the data vault
exists.
• As we learn from prior implementations, be they our
own or someone else's, let us continuously strive
not only to reduce problems to those already solved but also
to look for and discuss these repeatable patterns of Data Vault
design.
Final thoughts
• With the Data Vault, the structure itself has meaning.
• This is a feature that I believe is unique to Data Vault
modeling.
• Our email contact information:
• dneedham@clearmeasures.com
• pdokouzov@clearmeasures.com

Data Vault Consortium A Mathematical Perspective of Data Vault.


Editor's Notes

  1. Kepler was the first Data Scientist because Brahe had collected and stored many years of observations (data), yet he had no way of interpreting it accurately until Kepler studied the data and came up with his laws of planetary motion. Kepler came up with an accurate model that not only explained the observations of Brahe, but also predicted future observations.