Data Vault Overview

This is the Data Vault Modeling and Methodology introduction that I presented at a Montreal event in September 2011. It covers an introduction to, and overview of, the Data Vault components for Business Intelligence and Data Warehousing. I am Dan Linstedt, the author and inventor of Data Vault Modeling and Methodology.

If you use the images anywhere in your presentations, please credit http://LearnDataVault.com as the source (me).

Thank-you kindly,
Daniel Linstedt

Published in: Business, Technology

  1. Data Vault Model & Methodology
      © Dan Linstedt, 2011-2012 all rights reserved
  2. Agenda
      Introduction: why are you here?
      What is a Data Vault? Where does it come from?
      Star Schema, 3NF, and Data Vault pros and cons AS AN EDW solution
      When is a Data Vault a good fit?
      Benefits of Data Vault Modeling & Methodology
      <BREAK>
      When to NOT use a Data Vault
      Fundamental Paradigm Shift
      Business Keys & Business Processes
      Technical Review
      Query Performance (PIT & Bridge)
      What wasn’t covered in this presentation…
  3. A bit about me…
      Author, Inventor, Speaker, and part-time photographer…
      25+ years in the IT industry
      Worked in DoD, US Gov’t, Fortune 50, and so on…
      Find out more about the Data Vault:
      http://www.youtube.com/LearnDataVault
      http://LearnDataVault.com
      Full profile on http://www.LinkedIn.com/dlinstedt
  4. Why Are YOU Here?
      Your Expectations?
      Your Questions?
      Your Background?
      Areas of Interest?
      Biggest question: what are the top 3 pains your current EDW / BI solution is experiencing?
  5. What is it? Where did it come from?
      Defining the Data Vault Space
  6. Data Vault Time Line
      Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University
      E.F. Codd invented relational modeling
      Chris Date and Hugh Darwen maintained and refined modeling
      Early 70’s: Bill Inmon began discussing Data Warehousing
      Mid 70’s: AC Nielsen popularized the Dimension & Fact terms
      1976: Dr Peter Chen created E-R Diagramming
      Mid 80’s: Bill Inmon popularizes Data Warehousing
      Mid-Late 80’s: Dr Kimball popularizes Star Schema
      Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”
      1990: Dan Linstedt begins R&D on Data Vault Modeling
      2000: Dan Linstedt releases the first 5 articles on Data Vault Modeling
  7. Data Vault Modeling…
      Took 10 years of Research and Design, including TESTING, to become flexible, consistent, and scalable
  8. What IS a Data Vault? (Business Definition)
      Data Vault Model:
      - Detail oriented
      - Historical traceability
      - Uniquely linked set of normalized tables
      - Supports one or more functional areas of business
      Data Vault Methodology:
      - CMMI, Project Plan
      - Risk, Governance, Versioning
      - Peer Reviews, Release Cycles
      - Repeatable, Consistent, Optimized
      - Complete with Best Practices for BI/DW
      Business Keys span / cross Lines of Business (functional areas): Sales, Contracts, Planning, Delivery, Finance, Operations, Procurement
  9. The Data Vault Model
      The Data Vault model is a data modeling approach, so it fits into the family of modeling approaches: 3rd Normal Form, Data Vault, Star Schema.
      - While 3rd Normal Form is optimal for Operational Systems
      - …and Star Schema is optimal for OLAP Delivery / Data Marts
      - …the Data Vault is optimal for the Data Warehouse (EDW)
  10. Supply Chain Analogy
      Source Systems → Data Vault (EDW) → Data Marts
  11. What Does One Look Like?
      [Diagram: Customer, Product and Order Hubs, each surrounded by Satellites, joined by a Link (with its own Satellites) that records a history of the interaction]
      Elements:
      - Hub = List of Unique Business Keys
      - Link = List of Relationships, Associations
      - Satellites = Descriptive Data
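As a rough illustration of the three elements, the Python sketch below splits one hypothetical order-line record into Hub, Link and Satellite rows. The class names, field names, the `split_order_line` helper and the sample values are all assumptions made for this example; for brevity, parents are referenced by business key rather than by the sequence numbers used in the Technical Review slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HubRow:
    hub: str              # e.g. "HUB_CUSTOMER" (hypothetical name)
    business_key: str     # the unique business key, stored as received
    load_date: date
    record_source: str


@dataclass(frozen=True)
class LinkRow:
    link: str             # e.g. "LINK_ORDER_LINE" (hypothetical name)
    business_keys: tuple  # business keys of the Hubs this association connects
    load_date: date
    record_source: str


@dataclass(frozen=True)
class SatRow:
    parent: str           # the Hub or Link these details describe
    parent_key: tuple     # business key(s) of that parent
    attributes: tuple     # (name, value) pairs of descriptive data
    load_date: date
    record_source: str


def split_order_line(row: dict, load_date: date, source: str):
    """Separate business keys (Hubs), associations (Links) and details (Satellites)."""
    cust, prod, order = row["customer_id"], row["product_id"], row["order_id"]
    hubs = [
        HubRow("HUB_CUSTOMER", cust, load_date, source),
        HubRow("HUB_PRODUCT", prod, load_date, source),
        HubRow("HUB_ORDER", order, load_date, source),
    ]
    link_keys = (cust, prod, order)
    links = [LinkRow("LINK_ORDER_LINE", link_keys, load_date, source)]
    sats = [
        SatRow("HUB_CUSTOMER", (cust,), (("name", row["customer_name"]),), load_date, source),
        SatRow("HUB_PRODUCT", (prod,), (("description", row["product_desc"]),), load_date, source),
        SatRow("LINK_ORDER_LINE", link_keys, (("quantity", row["quantity"]),), load_date, source),
    ]
    return hubs, links, sats


# Hypothetical source row from a SALES system.
source_row = {"customer_id": "C-1001", "customer_name": "Acme Ltd",
              "product_id": "P-77", "product_desc": "Widget",
              "order_id": "SO-5", "quantity": 3}
hubs, links, sats = split_order_line(source_row, date(2011, 9, 1), "SALES")
```

Note how the quantity lands in a Satellite on the Link: it describes the interaction, not any single business key.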
  12. Colorized Perspective…
      [Diagram: Business Keys (HUB), Associations (LINK) and Details (Satellite) kept separate in the Data Vault, compared with 3rd NF & Star Schema]
      The Data Vault uniquely separates the Business Keys (Hubs) from the Associations (Links), and both of these from the Details that describe them and provide context (Satellites).
      (Colors Concept Originated By: Hans Hultgren)
  13. Star Schemas, 3NF, Data Vault: Pros & Cons
      Defining the Data Vault Space
      Why NOT use Star Schemas as an EDW?
      Why NOT use 3NF as an EDW?
      Why NOT use Data Vault as a Data Delivery Model?
  14. Star Schema Pros/Cons as an EDW
      PROS:
      - Good for multi-dimensional analysis
      - Subject-oriented answers
      - Excellent for aggregation points
      - Rapid development / deployment
      - Great for some historical storage
      CONS:
      - Not cross-business functional
      - Use of junk / helper tables
      - Trouble with VLDW
      - Unable to provide integrated enterprise information
      - Can’t handle ODS or exploration warehouse requirements
      - Trouble with data explosion in near-real-time environments
      - Trouble with updates to type 2 dimension primary keys
      - Trouble with late-arriving data in dimensions to support real-time arriving transactions
      - Information not granular enough to support real-time data integration
  15. 3NF Pros/Cons as an EDW
      PROS:
      - Many-to-many linkages
      - Handles lots of information
      - Tightly integrated information
      - Highly structured
      - Conducive to near-real-time loads
      - Relatively easy to extend
      CONS:
      - Time-driven PK issues
      - Parent-child complexities
      - Cascading change impacts
      - Difficult to load
      - Not conducive to BI tools
      - Not conducive to drill-down
      - Difficult to architect for an enterprise
      - Not conducive to spiral/scope-controlled implementation
      - Physical design usually doesn’t follow business processes
  16. Data Vault Pros/Cons as an EDW
      PROS:
      - Supports near-real-time and batch feeds
      - Supports functional business linking
      - Extensible / flexible
      - Provides rapid build / delivery of star schemas
      - Supports VLDB / VLDW
      - Designed for EDW
      - Supports data mining and AI
      - Provides granular detail
      - Incrementally built
      CONS:
      - Not conducive to OLAP processing
      - Requires business analysis to be firm
      - Introduces many join operations
  17. Analogy: The Porsche, the SUV and the Big Rig
      Which would you use to win a race?
      Which would you use to move a house?
      Would you adapt the truck and enter a race with Porsches and expect to win?
  18. A Quick Look at Methodology Issues
      Business Rule Processing, Lack of Agility, and Future-proofing your new solution
  19. EDW Architecture: Generation 1
      [Diagram: Enterprise BI Solution. Sales, Finance and Contracts sources load in batch into Staging + History (EDW) through Complex Business Rules + Dependencies; a second set of Complex Business Rules (#2) then builds the Star Schemas, with Conformed Dimensions, Junk Tables, Helper Tables and Factless Facts]
      - Quality routines
      - Cross-system dependencies
      - Source data filtering
      - In-process data manipulation
      - High risk of incorrect data aggregation
      - Larger system = increased impact
      - Often re-engineered at the SOURCE
      - History can be destroyed (completely re-computed)
  20. #1 Cause of BI Initiative Failure
      Anyone?
      Re-Engineering for Every Change!
      Let’s take a look at one example…
  21. Re-Engineering
      [Diagram: current sources (Sales Customer, Finance Customer Transactions) are joined in a data flow (mapping) that applies the business rules; adding a ** NEW SYSTEM ** (Customer Purchases) forces changes to that mapping: IMPACT!!]
  22. Federated Star Schema Inhibiting Agility
      [Chart: effort & cost (low to high) over time, from the start of the maintenance cycle, as Data Mart 1, 2 and 3 are added]
      Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.
      RESULT: Business builds their own Data Marts!
      The main driver for this is the maintenance cost and the re-engineering of the existing system that occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.
  23. EDW Architecture: Generation 2
      [Diagram: Enterprise BI Solution. Sales, Finance and Contracts feed Staging in batch, and SOA feeds in real time, into the EDW (Data Vault), alongside Unstructured Data; Complex Business Rules are then applied to deliver Star Schemas, Error Marts and Report Collections]
      FUNDAMENTAL GOALS:
      - Repeatable
      - Consistent
      - Fault-tolerant
      - Supports phased release
      - Scalable
      - Auditable
      The business rules are moved closer to the business, improving IT reaction time, reducing cost and minimizing impacts to the enterprise data warehouse (EDW).
  24. NO Re-Engineering
      [Diagram: each current source (Sales, Finance) lands in a Stage Copy and loads Data Vault structures (Hub Customer, Link Transaction, Hub Acct, Hub Product); the ** NEW SYSTEM ** (Customer Purchases) gets its own Stage Copy and is added alongside them]
      NO IMPACT!!! NO RE-ENGINEERING!
  25. Progressive Agility and Responsiveness of IT
      [Chart: effort & cost over time from the start of the maintenance cycle: Initial DV Build Out, Foundational Base Built, New Functional Areas Added]
      Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down and maintenance easy. It also reduces the complexity of the existing architecture.
  26. What’s Wrong With the OLD METHODOLOGY?
      Using Star Schemas as your Data Warehouse leads to…
  27. Dimensionitis
      Dimensionitis: an incurable disease whose symptom is the creation of new dimensions, because the cost and time to conform existing dimensions with new attributes rises beyond the business’s ability to pay…
      Business says: Avoid the re-engineering costs, just “copy” the dimensions and create a new one for OUR department… What can it hurt?
  28. Deformed Dimensions
      Deformity: the URGE to keep “slamming data” into an existing conformed dimension until it simply cannot sustain any further changes. The result: a deformed dimension and a HUGE re-engineering cost / nightmare.
      Business Wants a Change! Business said: Just add that to the existing Dimension, it will be easy, right?
      [Diagram: each Business Change produces a new dimension version with its own Complex Load: V1 (90 days, $125k), V2 (120 days, $200k), V3 (180 days, $275k)]
      Re-Engineering the Load Processes EACH TIME!
  29. Silo Building / IT Non-Agility
      Business says: Take the dimension you have, copy it, and change it… This should be cheap and easy, right?
      Business Change: To Modify Existing Star = 180 days, $275k
      [Diagram: SALES (the First Star), FINANCE and MARKETING each keep their own copy of the customer dimension (subsets of Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone, Customer_Tag, Customer_Score, Customer_Region, Customer_Stats, Customer_Type), with facts such as Fact_ABC, Fact_DEF, Fact_PDQ and Fact_MYFACT]
      “We built our own because IT costs too much…”
      “We built our own because IT took too long…”
      “We built our own because we needed customized dimension data…”
  30. Why is Data Vault a Good Fit?
  31. What are the top business obstacles in your data warehouse today?
  32. Are you feeling Pinned Down?
      Poor Agility
      Inconsistent Answer Sets
      Needs Accountability
      Demands Auditability
      Desires IT Transparency
  33. What are the top technology obstacles in your data warehouse today?
  34. Are your systems CRUMBLING?
      Complex Systems
      Real-Time Data Arrival
      Unimaginable Data Growth
      Master Data Alignment
      Bad Data Quality
      Late Delivery/Over Budget
  35. Existing solutions have led you down a painful path…
      [Image: Yugo, “World’s Worst Car”]
  36. Projects Cancelled & Restarted
      Re-engineering required to absorb new systems
      Complexity drives maintenance cost sky-high
      Disparate Silo Solutions provide inaccurate answers!
      Severe lack of Accountability
  37. How can you overcome these obstacles?
      There must be a better way…
      There IS a better way!
  38. It’s Called the Data Vault Model and Methodology
  39. What is it?
      It’s a simple, easy-to-use plan to build your valuable Data Warehouse!
  40. What’s the Value?
      Painless Auditability
      Understandable Standards
      Rapid Adaptability
      Simple Build-out
      Uncomplicated Design
      Effortless Scalability
      Pursue Your Goals!
  41. Why Bother With Something New?
      Old Chinese proverb: ‘Unless you change direction, you’re apt to end up where you’re headed.’
  42. What Are the Issues?
      [Diagram: THE GAP!! between Business and IT]
      Business… Changes Frequently, Needs Accountability, Demands Auditability, Has No Visibility, Wants More Control
      IT… Takes Too Long, Is Over-budget, Too Complex, Can’t Sustain Growth
      This is NOT what you want happening to your project!
  43. What Are the Foundational Keys?
      Flexibility
      Scalability
      Productivity
  44. Key: Flexibility
      Enabling rapid change on a massive scale without downstream impacts!
  45. Key: Scalability
      Providing no foreseeable barrier to increased size and scope: People, Process, & Architecture!
  46. Key: Productivity
      Enabling low-complexity systems with high-value output at a rapid pace
  47. < BREAK TIME >
  48. How does it work?
      Bringing the Data Vault to Your Project
  49. Key: Flexibility
      No Re-Engineering!
      Adding new components to the EDW has NEAR ZERO impact to:
      - Existing Loading Processes
      - Existing Data Model
      - Existing Reporting & BI Functions
      - Existing Source Systems
      - Existing Star Schemas and Data Marts
  50. Case In Point:
      The flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days: ALL systems, ALL DATA!
  51. Key: Scalability in Architecture
      Scaling is easy; it’s based on the following principles:
      - Hub and spoke design
      - MPP Shared-Nothing Architecture
      - Scale Free Networks
  52. Case In Point:
      Scalability allowed a Data Vault model to grow to 3 Petabytes in size, and it is still growing today!
  53. Key: Scalability in Team Size
      You should be able to SCALE your TEAM as well!
      With the Data Vault methodology, you can scale your team when desired, at different points in the project!
  54. Case In Point: (Dutch Tax Authority)
      Team scalability meant adding ETL developers for each new source system, then reassigning them once that system was completely loaded into the Data Vault.
  55. Key: Productivity
      Increasing Productivity requires a reduction in complexity.
      The Data Vault Model simplifies all of the following:
      - ETL Loading Routines
      - Real-Time Ingestion of Data
      - Data Modeling for the EDW
      - Enhancing and Adapting for Change to the Model
      - Monitoring, managing and optimizing processes
  56. Case in Point:
      The productivity result: 2 people in 2 weeks merged 3 systems and built a full Data Vault EDW, 5 star schemas and 3 reports.
      These individuals generated:
      - 90% of the ETL code for moving the data set
      - 100% of the Staging Data Model
      - 75% of the finished EDW data Model
      - 75% of the star schema data model
  57. The Competing Bid?
      The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (They bid a very complex system.)
      Our total cost? $30k and 2 weeks!
  58. Results?
      Changing the direction of the river takes less effort than stopping the flow of water
  59. When NOT to use the Data Vault Model & Methodology
  60. When NOT to Use the Data Vault
      You have:
      - a small set of point-solution requirements
      - a very short time-frame for delivery
      - a need to use the data one time, then throw it away
      - a single source system, single source application
      - a single business analyst in the entire company
      You do NOT have:
      - audit requirements forcing you to keep history
      - multiple data center consolidation efforts
      - near-real-time to worry about
      - massive batch data to integrate
      - external data feeds outside your control
      - requirements to do trend analysis of all your data
      - pain that forces you to re-engineer every time you ask for a change to your current data warehousing systems
  61. Fundamental Paradigm Shift
      Exploring differences in the architecture, implementation, and process design.
  62. It’s Not Just a Data Model…
      Model + Methodology = SUCCESS!
  63. Different From ANYTHING ELSE!
      The Business Rules go after the Data Warehouse!
      Data is interpreted on the way OUT!
      Hold on… We do distinguish between HARD and SOFT business rules…
      OK, now tell me WHY this is important?
  64. EDW: The Old Way of Loading
      [Diagram: Source 1, 2 and 3 load into Staging; Business Rules change the data on the way in, before it reaches the HR, Sales and Finance Marts]
      Corporate Fraud Accountability: Title XI consists of seven sections. Section 1101 recommends a name for this title, “Corporate Fraud Accountability Act of 2002”. It identifies corporate fraud and records tampering as criminal offenses and joins those offenses to specific penalties. It also revises sentencing guidelines and strengthens their penalties. This enables the SEC to temporarily freeze large or unusual payments.
      Are changes to data ON THE WAY IN to the EDW equivalent to records tampering?
  65. EDW: The New Compliant Way
      Implement a Raw Data Vault Data Warehouse
      Move the business rules “downstream”
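The slides do not define the HARD and SOFT business rules mentioned earlier. In common Data Vault usage, hard rules leave the meaning of the data alone (for example, attaching load metadata), while soft rules interpret it. Assuming that split, this Python sketch keeps a raw Satellite exactly as received and applies a soft rule only on the way OUT. The `key_accounts` rule, its threshold and the sample records are hypothetical.

```python
from datetime import datetime

# Raw Data Vault satellite: rows are kept exactly as the source delivered them.
raw_sat_sales: list = []


def load_raw(record: dict, record_source: str, load_dts: datetime) -> None:
    """Hard rules only: attach load metadata; never alter the source values."""
    raw_sat_sales.append({**record, "LOAD_DTS": load_dts, "RECORD_SRC": record_source})


def key_accounts(threshold: float) -> list:
    """Soft business rule, applied downstream: the data is interpreted on the way OUT."""
    return sorted({r["CUST_ACCT"] for r in raw_sat_sales
                   if float(r["AMOUNT"]) >= threshold})


load_raw({"CUST_ACCT": "ABC123", "AMOUNT": "12000.00"}, "SALES", datetime(2011, 9, 1))
load_raw({"CUST_ACCT": "PEF-2956", "AMOUNT": "8000.00"}, "SALES", datetime(2011, 9, 1))

# Changing the rule changes only the answer set; the stored history is untouched.
print(key_accounts(10000))  # ['ABC123']
print(key_accounts(5000))   # ['ABC123', 'PEF-2956']
```

Because the raw rows are never rewritten, an auditor can always compare what arrived with what any version of the rule reported.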
  66. Business Keys & Business Processes
  67. Business Keys & Business Processes
      [Diagram: business processes span Sales, Contracts, Planning, Procurement, Manufacturing, Delivery and Finance over time, from customer contact to revenue ($$). The key SLS123 appears in Sales and *P123MFG in Procurement; the two are tied together only through a manual process in an Excel Spreadsheet: NO VISIBILITY!]
  68. Technical Review
      Hub, Link, Satellite: Definitions
  69. HUB Data Examples
      HUB_PART_NUMBER
      SQN  PART_NUM   LOAD_DTS    RECORD_SRC
      1    MFG-25862  10-14-2000  MANUFACT
      2    MFG*25266  10-14-2000  MANUFACT
      3    *P25862    10-14-2000  PLANNING
      4    MFG_25862  10-15-2000  DELIVERY
      5    CN*25266   10-16-2000  DELIVERY
      HUB_CUST_ACCT
      SQN  CUST_ACCT  LOAD_DTS    RECORD_SRC
      1    ABC123     10-14-2000  SALES
      2    ABC-123    10-14-2000  SALES
      3    *ABC-123   10-14-2000  FINANCE
      4    123,ABCD   10-15-2000  CONTRACTS
      5    PEF-2956   10-16-2000  CONTRACTS
      Hub Structure:
      SEQUENCE
      <BUSINESS KEY>     (unique index)
      {LAST SEEN DATE}   (optional)
      <LOAD DATE>
      <RECORD SOURCE>
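The HUB_CUST_ACCT rows follow a simple rule: a business key is inserted once, on first sight, with its load date and record source, and variants such as ABC123, ABC-123 and *ABC-123 remain distinct keys. A minimal in-memory sketch in Python, assuming a dictionary in place of the unique index and a running counter in place of a database sequence. The `Hub` class is hypothetical, refreshing LAST_SEEN_DTS on a repeat sighting is an assumption about how the optional column is maintained, and the last row of the batch is invented to show a repeat.

```python
from datetime import date


class Hub:
    """In-memory Hub: SEQUENCE, BUSINESS KEY, optional LAST SEEN DATE, LOAD DATE, RECORD SOURCE."""

    def __init__(self, name: str):
        self.name = name
        self.rows: list = []
        self._by_key: dict = {}  # plays the role of the unique index on the business key

    def load(self, business_key: str, load_date: date, record_source: str) -> int:
        """Insert a business key only if it is new; return its sequence number."""
        row = self._by_key.get(business_key)
        if row is not None:
            row["LAST_SEEN_DTS"] = load_date  # assumed handling of the optional column
            return row["SQN"]
        row = {"SQN": len(self.rows) + 1, "BUSINESS_KEY": business_key,
               "LAST_SEEN_DTS": load_date, "LOAD_DTS": load_date,
               "RECORD_SRC": record_source}
        self.rows.append(row)
        self._by_key[business_key] = row
        return row["SQN"]


hub = Hub("HUB_CUST_ACCT")
batch = [("ABC123", date(2000, 10, 14), "SALES"),
         ("ABC-123", date(2000, 10, 14), "SALES"),
         ("*ABC-123", date(2000, 10, 14), "FINANCE"),
         ("123,ABCD", date(2000, 10, 15), "CONTRACTS"),
         ("PEF-2956", date(2000, 10, 16), "CONTRACTS"),
         ("ABC123", date(2000, 10, 16), "FINANCE")]  # seen again: no new row
sequences = [hub.load(*rec) for rec in batch]
```

Reloading the same batch leaves the Hub unchanged apart from LAST_SEEN_DTS, so the load can be safely re-run.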
  70. Link Structures
      Link_Product_Supplier: LPS_SQN, PRODUCT_SQN, SUPPLIER_SQN, LPS_LOAD_DTS, LPS_REC_SOURCE, LPS_ENCR_KEY
      Link_Customer_Account_Employee: LCAE_SQN, CUSTOMER_SQN, ACCOUNT_SQN, EMPLOYEE_SQN, LCAE_LOAD_DTS, LCAE_REC_SOURCE
      Link Structure:
      SEQUENCE
      <HUB KEY SQN 1>    (unique index on the combination of Hub key sequences)
      <HUB KEY SQN 2>
      <HUB KEY SQN N>
      {LAST SEEN DATE}   (optional)
      {CONFIDENCE}       (optional)
      {STRENGTH}         (optional)
      <LOAD DATE>
      <RECORD SOURCE>
      [Diagram label: Dynamic Link]
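A Link row records one association between Hub sequences, and the unique index keeps each combination from being stored twice. The Python sketch below assumes that behavior, using the Link_Customer_Account_Employee columns without their LCAE_ prefix and omitting the optional columns. The `Link` class and the Hub sequence values are hypothetical.

```python
from datetime import date


class Link:
    """In-memory Link: SEQUENCE, one Hub sequence per participating Hub, LOAD DATE, RECORD SOURCE."""

    def __init__(self, name: str, hub_columns: tuple):
        self.name = name
        self.hub_columns = hub_columns  # e.g. ("CUSTOMER_SQN", "ACCOUNT_SQN", "EMPLOYEE_SQN")
        self.rows: list = []
        self._by_hub_sqns: dict = {}    # unique index on the combination of Hub sequences

    def load(self, hub_sqns: tuple, load_date: date, record_source: str) -> int:
        """Record an association once; return the Link sequence number."""
        if len(hub_sqns) != len(self.hub_columns):
            raise ValueError("one sequence number is needed per participating Hub")
        existing = self._by_hub_sqns.get(hub_sqns)
        if existing is not None:
            return existing["SQN"]
        row = {"SQN": len(self.rows) + 1, **dict(zip(self.hub_columns, hub_sqns)),
               "LOAD_DTS": load_date, "RECORD_SRC": record_source}
        self.rows.append(row)
        self._by_hub_sqns[hub_sqns] = row
        return row["SQN"]


lcae = Link("Link_Customer_Account_Employee",
            ("CUSTOMER_SQN", "ACCOUNT_SQN", "EMPLOYEE_SQN"))
first = lcae.load((1, 10, 7), date(2000, 10, 14), "SALES")
again = lcae.load((1, 10, 7), date(2000, 10, 15), "CONTRACTS")  # same association: no new row
other = lcae.load((1, 11, 7), date(2000, 10, 15), "CONTRACTS")
```

The first source to deliver an association keeps the LOAD_DTS and RECORD_SRC; later sightings only return the existing sequence.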
  71. Satellites Split By Source System
      SAT_FINANCE_CUST: PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>, Contact Name, Contact Email, Contact Phone Number
      SAT_CONTRACTS_CUST: PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>, First Name, Last Name, Guardian Full Name, Co-Signer Full Name, Phone Number, Address, City, State/Province, Zip Code
      SAT_SALES_CUST: PARENT SEQUENCE, LOAD DATE, <LOAD-END-DATE>, <RECORD-SOURCE>, Name, Phone Number, Best time of day to reach, Do Not Call Flag
      Satellite Structure:
      PARENT SEQUENCE    (primary key)
      LOAD DATE          (primary key)
      <LOAD-END-DATE>
      <RECORD-SOURCE>
      {user defined descriptive data}
      {or temporal based timelines}
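Loading one of these Satellites typically means inserting a new row only when the descriptive data has changed, and closing the previous row by setting its LOAD-END-DATE. The Python sketch below assumes that delta rule and sets the end date equal to the new load date (some implementations subtract one time unit instead). The `Satellite` class and the attribute values are hypothetical; only the column names come from SAT_SALES_CUST.

```python
from datetime import date


class Satellite:
    """In-memory Satellite keyed by (PARENT SEQUENCE, LOAD DATE) with a LOAD-END-DATE."""

    def __init__(self, name: str, record_source: str):
        self.name = name
        self.record_source = record_source
        self.rows: list = []

    def current(self, parent_sqn: int):
        """Return the open row (no LOAD_END_DTS) for a parent, or None."""
        for row in reversed(self.rows):
            if row["PARENT_SQN"] == parent_sqn and row["LOAD_END_DTS"] is None:
                return row
        return None

    def load(self, parent_sqn: int, load_date: date, attributes: dict) -> bool:
        """Store a new row only when the descriptive data changed; return True if stored."""
        open_row = self.current(parent_sqn)
        if open_row is not None and open_row["ATTRS"] == attributes:
            return False                          # unchanged: nothing to add
        if open_row is not None:
            open_row["LOAD_END_DTS"] = load_date  # close the previous version
        self.rows.append({"PARENT_SQN": parent_sqn, "LOAD_DTS": load_date,
                          "LOAD_END_DTS": None, "RECORD_SRC": self.record_source,
                          "ATTRS": dict(attributes)})
        return True


sat = Satellite("SAT_SALES_CUST", "SALES")
v1 = {"Name": "Dan L", "Phone Number": "999-555-1212",
      "Best time of day to reach": "AM", "Do Not Call Flag": "N"}
results = [sat.load(1, date(2000, 10, 14), v1),
           sat.load(1, date(2000, 10, 15), v1),                              # no change
           sat.load(1, date(2000, 11, 1), {**v1, "Do Not Call Flag": "Y"})]  # change
```

Splitting Satellites by source, as on the slide, means each source's deltas are tracked independently and a new source only adds a new Satellite.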
  72. Why do we build Links this way?
  73. History Teaches Us…
      If we model for ONE relationship in the EDW, we BREAK the others!
      [Diagram: the Customer/Portfolio relationship changes over time: today one Portfolio has many Customers; 10 years ago one Customer had many Portfolios; 5 years from now it is many-to-many. A direct Hub Customer / Hub Portfolio relationship is crossed out]
      The EDW is designed to handle TODAY’S relationship; as soon as history is loaded, it breaks the model!
      This situation forces re-engineering of the model, load routines, and queries!
  74. History Teaches Us…
      If we model with a LINK table, we can handle ALL the requirements!
      [Diagram: Hub Customer and Hub Portfolio connected through LNK Cust-Port, which holds the one-to-many relationships of today and of 10 years ago as well as the many-to-many relationship of 5 years from now]
      This design is flexible and handles past, present, and future relationship changes with NO RE-ENGINEERING!
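To see why the Link absorbs these cardinality changes, the sketch below uses SQLite through Python's standard `sqlite3` module. It stores a Customer-to-Portfolio Link, loads an older one-customer-to-many-portfolios history and then a shared portfolio, and never touches the schema. The table names, keys and dates are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE hub_customer  (sqn INTEGER PRIMARY KEY, cust_id TEXT NOT NULL UNIQUE);
CREATE TABLE hub_portfolio (sqn INTEGER PRIMARY KEY, port_id TEXT NOT NULL UNIQUE);
-- One row per association; the link itself enforces no 1:M or M:M rule.
CREATE TABLE lnk_cust_port (
    sqn           INTEGER PRIMARY KEY,
    customer_sqn  INTEGER NOT NULL REFERENCES hub_customer(sqn),
    portfolio_sqn INTEGER NOT NULL REFERENCES hub_portfolio(sqn),
    load_dts      TEXT NOT NULL,
    UNIQUE (customer_sqn, portfolio_sqn)
);
""")
con.executemany("INSERT INTO hub_customer VALUES (?, ?)", [(1, "C1"), (2, "C2")])
con.executemany("INSERT INTO hub_portfolio VALUES (?, ?)", [(1, "P1"), (2, "P2")])

# Older history: one customer holding many portfolios.
con.executemany(
    "INSERT INTO lnk_cust_port (customer_sqn, portfolio_sqn, load_dts) VALUES (?, ?, ?)",
    [(1, 1, "2001-01-01"), (1, 2, "2001-01-01")])
# Later: a portfolio shared by many customers. Same table, no schema change.
con.execute(
    "INSERT INTO lnk_cust_port (customer_sqn, portfolio_sqn, load_dts) VALUES (2, 1, '2011-09-01')")

customers_per_portfolio = dict(con.execute("""
    SELECT p.port_id, COUNT(*) FROM lnk_cust_port l
    JOIN hub_portfolio p ON p.sqn = l.portfolio_sqn
    GROUP BY p.port_id ORDER BY p.port_id
""").fetchall())
portfolios_per_customer = dict(con.execute("""
    SELECT c.cust_id, COUNT(*) FROM lnk_cust_port l
    JOIN hub_customer c ON c.sqn = l.customer_sqn
    GROUP BY c.cust_id ORDER BY c.cust_id
""").fetchall())
```

This sketch only records that an association was loaded; recording when an association ends would need additional structure not shown here.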
  75. Applying the Data Vault to Global DW2.0
      [Diagram: a Base EDW created in Corporate, extended with Hubs, Links and Satellites for Financials in the USA, the Manufacturing EDW in China and Planning in Brazil]
  76. Extreme Data Vault Partitioning
  77. Query Performance
      Point-in-time and Bridge Tables, overcoming query issues
  78. Purpose Of PIT & Bridge
      To reduce the number of joins, and to reduce the amount of data being queried for a given range of time.
      Together, these two allow “direct table match” as well as table elimination to occur in the queries.
      These tables are not necessary for the entire model; only when:
      - Massive amounts of data are found
      - Large numbers of Satellites surround a Hub or Link
      - A large query across multiple Hubs & Links is necessary
      - Real-time data is flowing in, uninterrupted
      What are they? Snapshot tables, specifically built for query speed
  79. PIT Table Architecture
      Satellite: Point In Time
      PARENT SEQUENCE    (primary key)
      LOAD DATE          (primary key)
      {Satellite 1 Load Date}
      {Satellite 2 Load Date}
      {Satellite 3 Load Date}
      {…}
      {Satellite N Load Date}
      [Diagram: PIT Sats attached to Hub Order and Hub Customer alongside each Hub’s own Satellites (Sat 1 to Sat 4); Hub Product and Link Line Item with its Line Item Satellite]
  80. PIT Table Example
      SAT_CUST_CONTACT_CELL
      SQN  LOAD_DTS    CELL
      1    10-14-2000  999-555-1212
      1    10-15-2000  999-111-1234
      1    10-16-2000  999-252-2834
      1    10-17-2000  999.257-2837
      1    10-18-2000  999-273-5555
      SAT_CUST_CONTACT_ADDR
      SQN  LOAD_DTS    ADDR
      1    08-01-2000  26 Prospect
      1    09-29-2000  26 Prosp St.
      1    12-17-2000  28 November
      1    01-01-2001  26 Prospect St
      SAT_CUST_CONTACT_NAME
      SQN  LOAD_DTS    NAME
      1    10-14-2000  Dan L
      1    11-01-2000  Dan Linedt
      1    12-31-2000  Dan Linstedt
      PIT table (LOAD_DTS = snapshot date)
      SQN  LOAD_DTS    SAT_NAME_LDTS  SAT_CELL_LDTS  SAT_ADDR_LDTS
      1    08-01-2000  NULL           NULL           08-01-2000
      1    09-01-2000  NULL           NULL           08-01-2000
      1    10-01-2000  NULL           NULL           09-29-2000
      1    11-01-2000  11-01-2000     10-18-2000     09-29-2000
      1    12-01-2000  11-01-2000     10-18-2000     09-29-2000
      1    01-01-2001  12-31-2000     10-18-2000     01-01-2001
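The PIT rows above can be reproduced mechanically: for each snapshot date, take the latest load date of each Satellite that falls on or before that date, or NULL if the Satellite has no row yet. The Python sketch below assumes that rule, which matches every PIT row in the example, and uses the example's own dates; the `d` and `pit_row` helpers are hypothetical. A query for a given snapshot can then join each Satellite on (SQN, exact load date), which is the “direct table match” described under Purpose Of PIT & Bridge.

```python
from bisect import bisect_right
from datetime import date


def d(text: str) -> date:
    """Parse the slide's MM-DD-YYYY dates."""
    month, day, year = map(int, text.split("-"))
    return date(year, month, day)


# Load dates per Satellite for parent SQN 1, copied from the example (sorted ascending).
SAT_LOAD_DATES = {
    "SAT_NAME_LDTS": [d(x) for x in ("10-14-2000", "11-01-2000", "12-31-2000")],
    "SAT_CELL_LDTS": [d(x) for x in ("10-14-2000", "10-15-2000", "10-16-2000",
                                     "10-17-2000", "10-18-2000")],
    "SAT_ADDR_LDTS": [d(x) for x in ("08-01-2000", "09-29-2000",
                                     "12-17-2000", "01-01-2001")],
}


def pit_row(sqn: int, snapshot: date) -> dict:
    """For each Satellite, pick the latest load date on or before the snapshot (None if none)."""
    row = {"SQN": sqn, "LOAD_DTS": snapshot}
    for column, load_dates in SAT_LOAD_DATES.items():
        i = bisect_right(load_dates, snapshot)  # number of load dates <= snapshot
        row[column] = load_dates[i - 1] if i else None
    return row


snapshots = [d(x) for x in ("08-01-2000", "09-01-2000", "10-01-2000",
                            "11-01-2000", "12-01-2000", "01-01-2001")]
pit = [pit_row(1, s) for s in snapshots]
for r in pit:
    print(r["LOAD_DTS"], r["SAT_NAME_LDTS"], r["SAT_CELL_LDTS"], r["SAT_ADDR_LDTS"])
```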
  81. Bridge Table Architecture
      Satellite: Bridge
      UNIQUE SEQUENCE    (primary key)
      LOAD DATE          (primary key)
      {Hub 1 Sequence #}
      {Hub 2 Sequence #}
      {Hub 3 Sequence #}
      {Link 1 Sequence #}
      {Link 2 Sequence #}
      {…}
      {Link N Sequence #}
      {Hub 1 Business Key}
      {Hub 2 Business Key}
      {…}
      {Hub N Business Key}
      [Diagram: a Bridge spanning Hub Seller, Hub Product and Hub Parts and the Links between them, each with its Satellites]
  82. Bridge Table Data Example
      Bridge Table: Seller by Product by Part (LOAD_DTS = snapshot date)
      SQN  LOAD_DTS    SELL_SQN  SELL_ID  PROD_SQN  PROD_NUM    PART_SQN  PART_NUM
      1    08-01-2000  15        NY*1     2756      ABC-123-9K  525       JK*2*4
      2    09-01-2000  16        CO*242654          DEF-847-0L  324       MN*5-2
      3    10-01-2000  16        CO*2482374         PPA-252-2A  9938      DD*2*3
      4    11-01-2000  24        AZ*2525222         UIF-525-88  7         UF*9*0
      5    12-01-2000  99        NM*581             DAN-347-7F  16        KI*9-2
      6    01-01-2001  99        NM*581             DAN-347-7F  24        DL*0-5
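The Bridge example can be read as Seller → Product → Part associations pre-joined per snapshot date, carrying both Hub sequence numbers and business keys so that queries can skip the intermediate joins. The Python sketch below builds such rows from two Links. All Hub contents, Link rows and the `build_bridge` helper are invented for illustration, the Link sequence columns from the architecture slide are left out for brevity, and SQN is numbered only within one snapshot build.

```python
from datetime import date

# Hypothetical Hub contents: sequence number -> business key.
HUB_SELLER = {1: "SELLER-A", 2: "SELLER-B"}
HUB_PRODUCT = {10: "PROD-X", 11: "PROD-Y"}
HUB_PARTS = {100: "PART-1", 101: "PART-2", 102: "PART-3"}

# Hypothetical Links: (left Hub sequence, right Hub sequence, load date).
LINK_SELLER_PRODUCT = [(1, 10, date(2000, 8, 1)), (2, 11, date(2000, 9, 1))]
LINK_PRODUCT_PART = [(10, 100, date(2000, 8, 1)), (11, 101, date(2000, 9, 1)),
                     (11, 102, date(2000, 10, 1))]


def build_bridge(snapshot: date) -> list:
    """Pre-join Seller -> Product -> Part for the associations loaded by `snapshot`."""
    rows = []
    for sell_sqn, prod_sqn, sp_dts in LINK_SELLER_PRODUCT:
        if sp_dts > snapshot:
            continue  # association not yet loaded at this snapshot
        for link_prod_sqn, part_sqn, pp_dts in LINK_PRODUCT_PART:
            if link_prod_sqn != prod_sqn or pp_dts > snapshot:
                continue
            rows.append({"SQN": len(rows) + 1, "LOAD_DTS": snapshot,
                         "SELL_SQN": sell_sqn, "SELL_ID": HUB_SELLER[sell_sqn],
                         "PROD_SQN": prod_sqn, "PROD_NUM": HUB_PRODUCT[prod_sqn],
                         "PART_SQN": part_sqn, "PART_NUM": HUB_PARTS[part_sqn]})
    return rows


bridge = build_bridge(date(2000, 10, 1))
```

Rebuilding the Bridge for each snapshot date trades storage for query speed, which is why the slides reserve these tables for large or heavily queried models.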
  83. What WASN’T Covered
      ETL Automation
      ETL Implementation
      SQL Query Logic
      Balanced MPP design
      Data Vault Modeling on Appliances
      Deep Dive on Structures (Hubs, Links, Satellites)
      What happens when you break the rules?
      Project management, Risk management & mitigation, methodology & approach
      Automation: Automated DV modeling, Automated ETL production
      Change Management
      Temporal Data Modeling Concerns… And so on…
  84. Conclusions
  85. Who’s Using It?
  86. The Experts Say…
      “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” (Bill Inmon)
      “The Data Vault is foundationally strong and exceptionally scalable architecture.” (Stephen Brobst)
      “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” (Doug Laney)
  87. More Notables…
      “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” (Howard Dresner)
      “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” (Scott Ambler)
  88. Where To Learn More
      The Technical Modeling Book: http://LearnDataVault.com
      The Discussion Forums & events: http://LinkedIn.com (Data Vault Discussions)
      Contact me: http://DanLinstedt.com (web site), DanLinstedt@gmail.com (email)
      World wide User Group (Free): http://dvusergroup.com