Trends in
Data Warehouse Data Modeling:
Data Vault and Anchor Modeling
Thanks for Attending!
●   Roland Bouman, Leiden, the Netherlands
●   MySQL AB, Sun, Strukton, Pentaho (from 1 Nov)
●   Web and Business Intelligence Developer
●   Author:
       –   Pentaho Solutions
       –   Pentaho Kettle Solutions
●   http://rpbouman.blogspot.com/
●   Twitter: @rolandbouman
Data Warehouse (DWH)
●   Support Business Intelligence (BI)
        –   Reporting
        –   Analysis
        –   Data mining
●   General Requirements
        –   Integrate disparate data sources
        –   Maintain History
        –   Calculate derived data
        –   Data delivery to BI applications
DWH Architectures
●   Categories
       –   Traditional
       –   Hybrid
       –   Modern
●   Aspects
       –   Modelling
       –   Data logistics
DWH Architectures
●   Traditional
        –   Information Factory (Bill Inmon)
        –   Enterprise Bus (Ralph Kimball)
●   Hybrid
        –   Hub-and-Spoke
●   Modern
        –   Data Vault (Dan Linstedt)
        –   Anchor Modeling (Lars Rönnbäck)
Inmon DWH (Traditional):
Corporate Information Factory

 “A source of data that is subject oriented,
integrated, nonvolatile and time variant for
the purpose of management's decision
processes.”

Bill Inmon (Building the Data Warehouse)
● http://www.inmoncif.com/home/
Inmon DWH (Traditional):
Corporate Information Factory
●   Enterprise or Corporate DWH, DWH 2.0
●   Focus on backroom data integration
       –   Central information model
       –   Single version of the truth
●   Data delivery
       –   Disposable data marts
●   Bottom-up
Data logistics of the
  Corporate Information Factory
[Diagram: Source (OLTP DB, Files) → Staging → Extract/Transform/Load → Enterprise Data Warehouse → Extract/Transform/Load → Data Marts (OLAP DB, Cube) → BI Apps]
Data Modeling for the
        CIF Enterprise DWH
●   Normalized, typically 3NF
●   Organized in “subject areas”
       –   Series of related tables
       –   Example: Customer, Product, Transaction
       –   Common key
Data Modeling for the
        CIF Enterprise DWH
●   History
       –   PK includes a date/time part
●   Contains both detail and aggregate data
       –   Multiple levels of aggregation
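A minimal sketch of "PK includes a date/time part", using Python's built-in sqlite3 module as a stand-in for the warehouse database. The table `customer_history`, its columns and the sample rows are hypothetical. Every change adds a row keyed by (customer_id, valid_from), so a point-in-time query picks the latest row at or before the requested date.

```python
import sqlite3

# Hypothetical subject-area table: history is kept by making the
# date part of the primary key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_history (
        customer_id INTEGER NOT NULL,
        valid_from  TEXT    NOT NULL,  -- ISO-8601 date, part of the PK
        city        TEXT,
        PRIMARY KEY (customer_id, valid_from)
    )
""")
conn.executemany(
    "INSERT INTO customer_history VALUES (?, ?, ?)",
    [
        (1, "2009-01-01", "Leiden"),
        (1, "2010-06-15", "Utrecht"),  # customer 1 moved: new row, old row kept
        (2, "2009-03-01", "Delft"),
    ],
)

def city_as_of(customer_id, as_of):
    """Return the customer's city as it was on the given ISO date (or None)."""
    row = conn.execute(
        """
        SELECT city FROM customer_history
        WHERE customer_id = ? AND valid_from <= ?
        ORDER BY valid_from DESC
        LIMIT 1
        """,
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None

print(city_as_of(1, "2010-01-01"))  # Leiden
print(city_as_of(1, "2011-01-01"))  # Utrecht
```

ISO-8601 date strings sort in chronological order, which is what makes the plain `<=` and `ORDER BY` comparisons work here.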
Kimball DWH (Traditional):
  Dimensional Model and
   DWH Bus Architecture
 “The data warehouse is the conglomeration
of an organization's staging and presentation
areas, where operational data is specifically
structured for query and analysis
performance and ease of use.”

Ralph Kimball (The Data Warehouse Toolkit)
● http://www.kimballgroup.com/
Kimball DWH (Traditional):
      DWH Bus Architecture
●   Focus on data delivery
●   Integration at the data mart level
●   Top-down
Data logistics of the
    DWH Bus Architecture
[Diagram: Source (OLTP DB, Files) → Staging → Extract/Transform/Load → (Enterprise) Data Warehouse → BI Apps; the EDW is a collection of Data Marts (OLAP DB, Cube)]
Data Modeling for the
       DWH Bus Architecture
●   Dimensional Modeling
       –   Star schemas
●   Organized in:
       –   Fact tables
       –   Dimension tables
Data Modeling for the
       DWH Bus Architecture
●   Fact tables
       –   Highly normalized
       –   Additive metrics
●   Dimension tables
       –   Highly denormalized
       –   Descriptive labels
       –   Shared across fact tables
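A small star schema sketch, again with Python's sqlite3 as a stand-in. The table names echo the Sakila example later in the deck, but the columns and rows are hypothetical. The fact table holds only foreign keys and an additive metric; the dimension holds descriptive labels, so the metric can be summed along any dimension attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized dimension: descriptive labels in one wide table
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name TEXT,
        city          TEXT,
        country       TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,  -- e.g. 20100615
        year     INTEGER,
        month    INTEGER
    );
    -- Normalized fact table: foreign keys plus an additive metric
    CREATE TABLE fact_payment (
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        date_key     INTEGER REFERENCES dim_date(date_key),
        amount       REAL
    );
    INSERT INTO dim_customer VALUES
        (1, 'Mary', 'Leiden', 'Netherlands'),
        (2, 'Pat', 'Delft', 'Netherlands'),
        (3, 'Linda', 'Lisbon', 'Portugal');
    INSERT INTO dim_date VALUES (20100615, 2010, 6), (20100701, 2010, 7);
    INSERT INTO fact_payment VALUES
        (1, 20100615, 2.99), (2, 20100615, 4.99),
        (3, 20100701, 0.99), (1, 20100701, 5.99);
""")

def revenue_by_country():
    """Sum the additive metric along a dimension attribute."""
    return conn.execute("""
        SELECT c.country, ROUND(SUM(f.amount), 2)
        FROM fact_payment f
        JOIN dim_customer c ON c.customer_key = f.customer_key
        GROUP BY c.country
        ORDER BY c.country
    """).fetchall()

print(revenue_by_country())
```

Grouping by `dim_date.month` instead would only change the join and the `GROUP BY`; the fact table stays the same, which is the point of sharing dimensions.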
Data Modeling for the
       DWH Bus Architecture
●   History
       –   Slowly changing dimensions (versioning)
       –   Facts link to Date and/or Time dimensions
●   Detailed, not aggregated
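Versioning a slowly changing dimension (commonly called "type 2") can be sketched as follows, assuming sqlite3 and a hypothetical `dim_customer` with `valid_from`/`valid_to`/`is_current` columns. When a tracked attribute changes, the current row is closed and a new row with a new surrogate key is added; facts then reference the surrogate key that was current when they occurred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, one per version
        customer_id  INTEGER NOT NULL,                   -- business (natural) key
        city         TEXT,
        valid_from   TEXT NOT NULL,
        valid_to     TEXT NOT NULL DEFAULT '9999-12-31', -- open-ended current version
        is_current   INTEGER NOT NULL DEFAULT 1
    )
""")

def upsert_customer(customer_id, city, load_date):
    """Apply a source record as a type 2 change; return the current surrogate key."""
    cur = conn.execute(
        "SELECT customer_key, city FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if cur is not None and cur[1] == city:
        return cur[0]  # nothing changed: keep the current version
    if cur is not None:
        # Close the old version instead of overwriting it
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 WHERE customer_key = ?",
            (load_date, cur[0]),
        )
    new = conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from) VALUES (?, ?, ?)",
        (customer_id, city, load_date),
    )
    return new.lastrowid

k1 = upsert_customer(1, "Leiden", "2009-01-01")
k2 = upsert_customer(1, "Leiden", "2009-05-01")   # no change
k3 = upsert_customer(1, "Utrecht", "2010-06-15")  # new version
print(k1, k2, k3)  # 1 1 2
```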
Sakila Rental Star Schema
Sakila DWH Bus Architecture
[Diagram: fact tables fact_inventory, fact_rental, fact_payment; dimension tables dim_film, dim_date, dim_store, dim_staff, dim_customer]
Problems with traditional
        DWH architectures
●   General Problems
       –   Lack of flexibility and resilience to change
       –   Loading (ETL) Complexity
●   Problems with Inmon
       –   Centralization requires upfront investment
       –   Single version of whose truth, when?
●   Problems with Kimball
       –   Dimensional Model anomalies
Dimensional Modeling
            Anomalies
●   Snowflaking (dimension normalization)
        –   Monster dimensions
        –   Outriggers
        –   Ex: Customer Demographics
●   Hierarchical data
        –   Bridge table (closure table)
        –   Ex: Employee/Boss
●   Multi-valued dimensions
        –   Bridge table
        –   Ex: Account/Customer bridge table
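The bridge (closure) table for hierarchical data stores every ancestor/descendant pair with its depth, so "all reports of a boss" becomes a plain lookup instead of a recursive query. A minimal sketch, assuming sqlite3 and hypothetical employee names; the closure rows are computed in Python from employee → boss pairs.

```python
import sqlite3

# Hypothetical employee -> boss pairs (None = top of the hierarchy)
employees = {
    "ann": None,
    "bob": "ann",
    "cid": "ann",
    "dee": "bob",
    "eve": "dee",
}

def closure_rows(parent_of):
    """Yield (ancestor, descendant, depth) for every pair, including self at depth 0."""
    for emp in parent_of:
        depth, node = 0, emp
        while node is not None:
            yield (node, emp, depth)
            node, depth = parent_of[node], depth + 1

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_closure (
        ancestor   TEXT NOT NULL,
        descendant TEXT NOT NULL,
        depth      INTEGER NOT NULL,
        PRIMARY KEY (ancestor, descendant)
    )
""")
conn.executemany("INSERT INTO employee_closure VALUES (?, ?, ?)", closure_rows(employees))

def subordinates(boss):
    """All direct and indirect reports via a single non-recursive SELECT."""
    return [r[0] for r in conn.execute(
        "SELECT descendant FROM employee_closure "
        "WHERE ancestor = ? AND depth > 0 ORDER BY descendant",
        (boss,),
    )]

print(subordinates("bob"))  # ['dee', 'eve']
```

In a star schema the closure table would sit between the employee dimension and the fact table: join facts on `descendant` and filter on `ancestor` to roll up a whole subtree.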
Hybrid DWH: Hub-and-Spoke
●   Inmon back-end (hub)
●   Kimball front-end (satellites)
Modern: Data Vault
 “The Data Vault is a detail oriented,
historical tracking and uniquely linked set of
normalized tables that supports one or more
functional areas of business. It is a hybrid
approach encompassing the best of breed
between 3rd normal form (3NF) and star
schema.”

Dan Linstedt (Data Vault Overview)
● http://danlinstedt.com/
Data Vault
●   Focus on
        –   Data Integration
        –   Traceability and Auditability
        –   Resilience to change
●   Single version of the facts
        –   Rather than single version of the truth
●   All of the data, all of the time
        –   No upfront cleansing and conforming
●   Bottom-up
Data Vault Modelling
●   Hubs
●   Links
●   Satellites
Data Vault Modelling: Hubs
●   Hubs model entities
●   Contains business keys
       –   PK in absence of surrogate key
●   Metadata:
       –   Record source
       –   Load date/time
●   Optional surrogate key
       –   Used as PK if present
●   No foreign keys!
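A hub can be sketched as a table of business keys plus load metadata, with an optional surrogate key as PK and no foreign keys. This assumes sqlite3; the column names (`hub_customer_id`, `customer_bk`, `load_dts`, `record_source`) and the source labels are hypothetical naming choices, not a prescribed Data Vault standard. Loading only inserts keys that the hub has not seen yet.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hub_customer (
        hub_customer_id INTEGER PRIMARY KEY,   -- optional surrogate key
        customer_bk     TEXT NOT NULL UNIQUE,  -- business key
        load_dts        TEXT NOT NULL,         -- load date/time
        record_source   TEXT NOT NULL          -- where the key was first seen
        -- note: no foreign keys in a hub
    )
""")

def load_hub_customer(business_keys, record_source):
    """Insert business keys not yet in the hub; existing keys are left untouched."""
    load_dts = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT OR IGNORE INTO hub_customer (customer_bk, load_dts, record_source) "
        "VALUES (?, ?, ?)",
        [(bk, load_dts, record_source) for bk in business_keys],
    )

load_hub_customer(["C001", "C002"], "sakila.customer")
load_hub_customer(["C002", "C003"], "crm.contacts")  # C002 already known
print(conn.execute(
    "SELECT customer_bk, record_source FROM hub_customer ORDER BY customer_bk"
).fetchall())
# [('C001', 'sakila.customer'), ('C002', 'sakila.customer'), ('C003', 'crm.contacts')]
```

Because the record source of the first load is kept, the hub stays traceable: C002 still shows where it was first seen.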
Data Vault Modelling: Links
●   Links model relationships
        –   Intersection table (m:n relationship)
●   Foreign keys to related hubs or links
        –   Form natural key (business key) of the link
●   Metadata:
        –   Record source
        –   Load date/time
●   Optional surrogate key
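A link sketch between two hubs, assuming sqlite3 and hypothetical `hub_film`/`hub_actor` tables (film/actor is a genuine m:n relationship in Sakila). The link holds foreign keys to both hubs, and their combination is declared UNIQUE as the link's natural key, so reloading an existing pair is a no-op.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE hub_film (
        hub_film_id INTEGER PRIMARY KEY,
        film_bk TEXT NOT NULL UNIQUE,
        load_dts TEXT NOT NULL, record_source TEXT NOT NULL
    );
    CREATE TABLE hub_actor (
        hub_actor_id INTEGER PRIMARY KEY,
        actor_bk TEXT NOT NULL UNIQUE,
        load_dts TEXT NOT NULL, record_source TEXT NOT NULL
    );
    -- Link: intersection table between the two hubs (handles m:n)
    CREATE TABLE link_film_actor (
        link_film_actor_id INTEGER PRIMARY KEY,  -- optional surrogate key
        hub_film_id  INTEGER NOT NULL REFERENCES hub_film(hub_film_id),
        hub_actor_id INTEGER NOT NULL REFERENCES hub_actor(hub_actor_id),
        load_dts TEXT NOT NULL, record_source TEXT NOT NULL,
        UNIQUE (hub_film_id, hub_actor_id)       -- FKs form the link's natural key
    );
    INSERT INTO hub_film VALUES (1, 'F1', '2010-01-01', 'sakila'), (2, 'F2', '2010-01-01', 'sakila');
    INSERT INTO hub_actor VALUES (1, 'A1', '2010-01-01', 'sakila'), (2, 'A2', '2010-01-01', 'sakila');
""")

def load_link(film_bk, actor_bk, load_dts, source):
    """Relate a film and an actor by business key; an existing pair is ignored."""
    conn.execute("""
        INSERT OR IGNORE INTO link_film_actor
            (hub_film_id, hub_actor_id, load_dts, record_source)
        SELECT f.hub_film_id, a.hub_actor_id, ?, ?
        FROM hub_film f, hub_actor a
        WHERE f.film_bk = ? AND a.actor_bk = ?
    """, (load_dts, source, film_bk, actor_bk))

load_link("F1", "A1", "2010-02-01", "sakila")
load_link("F1", "A2", "2010-02-01", "sakila")  # one film, many actors
load_link("F2", "A1", "2010-02-01", "sakila")  # one actor, many films
load_link("F1", "A1", "2010-03-01", "sakila")  # already linked: ignored
print(conn.execute("SELECT COUNT(*) FROM link_film_actor").fetchone()[0])  # 3
```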
Data Vault Modelling: Satellites
 ●   Satellites model a group of attributes
 ●   Foreign key to a Hub or Link
 ●   Metadata:
         –   Record source
         –   Load date/time
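A satellite sketch, assuming sqlite3 and a hypothetical `sat_customer_address` hanging off a hub. The satellite groups descriptive attributes, references its hub, and is historized by load date/time; this loader appends a row only when the attribute values differ from the latest stored row (one common choice, not the only one).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (
        hub_customer_id INTEGER PRIMARY KEY,
        customer_bk TEXT NOT NULL UNIQUE,
        load_dts TEXT NOT NULL, record_source TEXT NOT NULL
    );
    -- Satellite: a group of attributes, historized by load date/time
    CREATE TABLE sat_customer_address (
        hub_customer_id INTEGER NOT NULL REFERENCES hub_customer(hub_customer_id),
        load_dts      TEXT NOT NULL,
        record_source TEXT NOT NULL,
        city          TEXT,
        postal_code   TEXT,
        PRIMARY KEY (hub_customer_id, load_dts)
    );
    INSERT INTO hub_customer VALUES (1, 'C001', '2010-01-01', 'sakila');
""")

def load_sat(hub_id, city, postal_code, load_dts, source):
    """Append a satellite row only if the attributes differ from the latest row."""
    latest = conn.execute("""
        SELECT city, postal_code FROM sat_customer_address
        WHERE hub_customer_id = ?
        ORDER BY load_dts DESC LIMIT 1
    """, (hub_id,)).fetchone()
    if latest == (city, postal_code):
        return False  # unchanged: nothing to store
    conn.execute(
        "INSERT INTO sat_customer_address VALUES (?, ?, ?, ?, ?)",
        (hub_id, load_dts, source, city, postal_code),
    )
    return True

results = [
    load_sat(1, "Leiden", "2311", "2010-01-01", "sakila"),
    load_sat(1, "Leiden", "2311", "2010-02-01", "sakila"),   # no change, skipped
    load_sat(1, "Utrecht", "3511", "2010-03-01", "sakila"),  # change, new row
]
print(conn.execute(
    "SELECT load_dts, city FROM sat_customer_address ORDER BY load_dts"
).fetchall())  # [('2010-01-01', 'Leiden'), ('2010-03-01', 'Utrecht')]
```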
Sakila Data Vault Example
Data Vault Tools and Examples
●   Kettle Data Vault Example
       –   Sakila Data Vault
       –   Chapter 19
       –   Kasper de Graaf
       –   http://www.dikw-academy.nl
●   Quipu
       –   Data Vault Generator
       –   Kettle templates
       –   Johannes van den Bosch
       –   http://www.datawarehousemanagement.org/
Modern: Anchor model
 “Anchor Modeling is an agile information
modeling technique that offers non-destructive
extensibility mechanisms enabling robust and
flexible management of changes. A key benefit
of Anchor Modeling is that changes in a data
warehouse environment only require extensions,
not modifications.”

Lars Rönnbäck (Agile Information Modeling
in Evolving Data Environments)
● http://www.anchormodeling.com/
Anchor Modelling
●   Focus on
       –   Resilience to change
       –   Agility
       –   Extensibility
       –   History tracking
●   Bottom-up
Anchor Modelling
●   6NF (Date, Darwen, Lorentzos)
●   A 6NF table satisfies no non-trivial join
    dependencies at all
●   Translation: a 6NF table cannot be
    decomposed any further without loss
●   Temporal Data
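What 6NF decomposition means in practice can be shown in plain Python: a hypothetical (id, name, city) relation is split into one relation per non-key attribute, and a natural join on the key reproduces the original exactly. In this simple case each part holds a key plus a single attribute and has nothing left to split; the data is made up for illustration.

```python
# A wide relation keyed on customer_id; values are hypothetical.
customers = {
    (1, "Mary", "Leiden"),
    (2, "Pat", "Delft"),
}

def decompose(rows):
    """Split (id, name, city) into one relation per non-key attribute (6NF-style)."""
    name = {(cid, n) for cid, n, _ in rows}
    city = {(cid, c) for cid, _, c in rows}
    return name, city

def rejoin(name, city):
    """Natural join on the key; reproduces the original rows exactly."""
    city_of = dict(city)
    return {(cid, n, city_of[cid]) for cid, n in name if cid in city_of}

name_rel, city_rel = decompose(customers)
print(rejoin(name_rel, city_rel) == customers)  # True
```

Anchor Modeling builds on this shape: the key becomes the anchor and each single-attribute relation becomes an attribute table.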
Anchor Modelling Constructs
●   Anchors
●   Attributes
●   Ties
●   Knots
Anchor Modelling: Anchors
●   Entities are modeled as Anchors
●   Relationships may be modeled as Anchors
       –   m:n relationships having properties
●   Only a surrogate key
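An anchor is about as small as a table gets. The sketch below assumes sqlite3, and the name `CU_Customer`/`CU_ID` only loosely imitates Anchor Modeling's mnemonic naming; treat it as hypothetical. Creating an entity means minting a new identity; every property lives elsewhere, in attribute tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Anchor: nothing but a surrogate key (the identity of the entity).
conn.execute("CREATE TABLE CU_Customer (CU_ID INTEGER PRIMARY KEY)")

def new_customer():
    """Create a new customer identity; all properties live in attribute tables."""
    return conn.execute("INSERT INTO CU_Customer DEFAULT VALUES").lastrowid

ids = [new_customer() for _ in range(3)]
print(ids)  # [1, 2, 3]
```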
Anchor Modelling: Ties
●   Ties model relationships
        –   1:n relationships
        –   m:n relationships without properties
●   Static vs Historized
        –   History tracked using date/time
●   May be Knotted
        –   Knot holds set of association types
●   Two or more “anchor roles”
        –   Relationships may be broken into several
             ties having only mandatory anchors
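A historized tie can be sketched as a table of anchor keys plus a change date, assuming sqlite3 and hypothetical names (`CU_belongsTo_ST`, `changed_at`). Here a customer belongs to one store at a time (1:n), so the key is (customer, changed_at) and the store is looked up as of a date. A static tie would simply drop the `changed_at` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CU_Customer (CU_ID INTEGER PRIMARY KEY);
    CREATE TABLE ST_Store    (ST_ID INTEGER PRIMARY KEY);
    -- Historized tie: from changed_at on, this customer belongs to this store
    CREATE TABLE CU_belongsTo_ST (
        CU_ID INTEGER NOT NULL REFERENCES CU_Customer(CU_ID),
        ST_ID INTEGER NOT NULL REFERENCES ST_Store(ST_ID),
        changed_at TEXT NOT NULL,
        PRIMARY KEY (CU_ID, changed_at)  -- 1:n: one store per customer at a time
    );
    INSERT INTO CU_Customer VALUES (1);
    INSERT INTO ST_Store VALUES (1), (2);
    INSERT INTO CU_belongsTo_ST VALUES (1, 1, '2009-01-01'), (1, 2, '2010-06-15');
""")

def store_of(cu_id, as_of):
    """Return the store the customer belonged to on the given ISO date (or None)."""
    row = conn.execute("""
        SELECT ST_ID FROM CU_belongsTo_ST
        WHERE CU_ID = ? AND changed_at <= ?
        ORDER BY changed_at DESC LIMIT 1
    """, (cu_id, as_of)).fetchone()
    return row[0] if row else None

print(store_of(1, "2010-01-01"), store_of(1, "2011-01-01"))  # 1 2
```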
Anchor Modelling: Attributes
●   Models properties of an Anchor
●   Static vs Historized
        –   History tracked using date/time
●   May or may not be knotted
        –   Knot holds set of valid attribute values
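Static and historized attributes as separate tables, sketched with sqlite3. The table and column names only loosely follow Anchor Modeling's mnemonic style and are hypothetical. A historized attribute keeps one row per change, keyed by (anchor id, changed at); a "latest" view left-joins the anchor to the most recent value of each attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CU_Customer (CU_ID INTEGER PRIMARY KEY);
    -- Static attribute: one value per customer, never versioned
    CREATE TABLE CU_NAM_Customer_Name (
        CU_ID INTEGER PRIMARY KEY REFERENCES CU_Customer(CU_ID),
        CU_NAM_Customer_Name TEXT NOT NULL
    );
    -- Historized attribute: a new row per change, keyed by (id, changed_at)
    CREATE TABLE CU_EML_Customer_Email (
        CU_ID INTEGER NOT NULL REFERENCES CU_Customer(CU_ID),
        CU_EML_Customer_Email TEXT NOT NULL,
        CU_EML_ChangedAt TEXT NOT NULL,
        PRIMARY KEY (CU_ID, CU_EML_ChangedAt)
    );
    INSERT INTO CU_Customer VALUES (1);
    INSERT INTO CU_NAM_Customer_Name VALUES (1, 'Mary');
    INSERT INTO CU_EML_Customer_Email VALUES
        (1, 'mary@old.example', '2009-01-01'),
        (1, 'mary@new.example', '2010-06-15');
""")

# Latest view: anchor left-joined to each attribute's most recent value.
LATEST = """
    SELECT c.CU_ID, n.CU_NAM_Customer_Name, e.CU_EML_Customer_Email
    FROM CU_Customer c
    LEFT JOIN CU_NAM_Customer_Name n ON n.CU_ID = c.CU_ID
    LEFT JOIN CU_EML_Customer_Email e
           ON e.CU_ID = c.CU_ID
          AND e.CU_EML_ChangedAt = (SELECT MAX(CU_EML_ChangedAt)
                                    FROM CU_EML_Customer_Email
                                    WHERE CU_ID = c.CU_ID)
"""
print(conn.execute(LATEST).fetchall())  # [(1, 'Mary', 'mary@new.example')]
```

Adding a new property later means creating one more attribute table and one more `LEFT JOIN`; no existing table is altered, which is the "extensions, not modifications" claim in practice.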
Anchor Modelling: Knots
●   Reference table
        –   Fairly small set of distinct values
●   Dictionary lookup to qualify
        –   Attributes
        –   Ties
●   “Knotted” Attributes and Ties
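A knot and a knotted attribute, sketched with sqlite3 and hypothetical names (`STA_Status`, `CU_STA_Customer_Status`). The knot is a small reference table of distinct values; the knotted attribute stores the knot key, and reading the value is a dictionary lookup through the knot. With foreign keys enabled, a value outside the knot is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE CU_Customer (CU_ID INTEGER PRIMARY KEY);
    -- Knot: small, fixed set of distinct values
    CREATE TABLE STA_Status (
        STA_ID INTEGER PRIMARY KEY,
        STA_Status TEXT NOT NULL UNIQUE
    );
    -- Knotted attribute: stores the knot key, not the value itself
    CREATE TABLE CU_STA_Customer_Status (
        CU_ID INTEGER PRIMARY KEY REFERENCES CU_Customer(CU_ID),
        STA_ID INTEGER NOT NULL REFERENCES STA_Status(STA_ID)
    );
    INSERT INTO STA_Status VALUES (1, 'active'), (2, 'inactive');
    INSERT INTO CU_Customer VALUES (1), (2);
    INSERT INTO CU_STA_Customer_Status VALUES (1, 1), (2, 2);
""")

def status_of(cu_id):
    """Dictionary lookup through the knot."""
    row = conn.execute("""
        SELECT k.STA_Status FROM CU_STA_Customer_Status a
        JOIN STA_Status k ON k.STA_ID = a.STA_ID
        WHERE a.CU_ID = ?
    """, (cu_id,)).fetchone()
    return row[0] if row else None

print(status_of(1), status_of(2))  # active inactive

conn.execute("INSERT INTO CU_Customer VALUES (3)")
try:
    conn.execute("INSERT INTO CU_STA_Customer_Status VALUES (3, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # 99 is not a value in the knot
print(rejected)  # True
```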
Anchor Model Diagram
[Diagram legend: anchor, knot, static attribute, historized attribute, static tie, historized tie]
http://www.anchormodeling.com/modeler/latest/
Acknowledgements
●   Kasper de Graaf
       –   Twitter: @kdgraaf
       –   http://www.dikw-academy.nl
●   Jos van Dongen
       –   Twitter: @josvandongen
       –   http://www.tholis.com/

 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

Roland Bouman: Modern Data Warehouse Architectures: Data Vault and Anchor Modelling

  • 1. Trends in Data Warehouse Data Modeling: Data Vault and Anchor Modeling
  • 2. Thanks for Attending! ● Roland Bouman, Leiden the Netherlands ● MySQL AB, Sun, Strukton, Pentaho (1 nov) ● Web- and Business Intelligence Developer ● author: – Pentaho Solutions – Pentaho Kettle Solutions ● Http://rpbouman.blogspot.com/ ● Twitter: @rolandbouman
  • 3. Data Warehouse (DWH) ● Support Business Intelligence (BI) – Reporting – Analysis – Data mining ● General Requirements – Integrate disparate data sources – Maintain History – Calculate Derived data – Data delivery to BI applications
  • 4. DWH Architectures ● Categories – Traditional – Hybrid – Modern ● Aspects – Modelling – Data logistics
  • 5. DWH Architectures ● Traditional – Information Factory (Bill Inmon) – Enterprise Bus (Ralph Kimball) ● Hybrid ● Modern
  • 6. DWH Architectures ● Traditional ● Hybrid – Hub-and-Spoke ● Modern
  • 7. DWH Architectures ● Traditional ● Hybrid ● Modern – Data Vault (Dan Linstedt) – Anchor Modeling (Lars Rönnbäck)
  • 8. Inmon DWH (Traditional): Corporate Information Factory “A source of data that is subject oriented, integrated, nonvolatile and time variant for the purpose of management's decision processes.” Bill Inmon (Building the Data Warehouse) ● http://www.inmoncif.com/home/
  • 9. Inmon DWH (Traditional): Corporate Information Factory ● Enterprise or Corporate DWH, DWH 2.0 ● Focus on backroom data integration – Central information model – Single version of the truth ● Data delivery – Disposable data marts ● Bottom-up
  • 10. Data logistics of the Corporate Information Factory [diagram: Source (OLTP DB, Files) → Staging → Extract, Transform, Load → Enterprise Data Warehouse → Extract, Transform, Load → Data Marts (OLAP DB, Cube) → BI Apps]
  • 11. Data Modeling for the CIF Enterprise DWH ● Normalized, typically 3NF ● Organized in “subject areas” – Series of related tables – Example: Customer, Product, Transaction – Common key
  • 12. Data Modeling for the CIF Enterprise DWH ● History – PK includes a date/time part ● Contains both detail and aggregate data – Multiple levels of aggregation
  • 13. Kimball DWH (Traditional): Dimensional Model and DWH Bus Architecture “The data warehouse is the conglomeration of an organization's staging and presentation areas, where operational data is specifically structured for query and analysis performance and ease of use.” Ralph Kimball (The Data Warehouse Toolkit) ● http://www.kimballgroup.com/
  • 14. Kimball DWH (Traditional): DWH Bus Architecture ● Focus on data delivery ● Integration at the data mart level ● Top-down
  • 15. Data logistics of the DWH Bus Architecture [diagram: Source (OLTP DB, Files) → Staging → Extract, Transform, Load → (Enterprise) Data Warehouse, where the EDW is a collection of Data Marts (OLAP DB, Cube) → BI Apps]
  • 16. Data Modeling for the DWH Bus Architecture ● Dimensional Modeling – Star schemas ● Organized in: – Fact tables – Dimension tables
  • 17. Data Modeling for the DWH Bus Architecture ● Fact tables – Highly normalized – Additive metrics ● Dimension tables – Highly denormalized – Descriptive labels – Shared across fact tables
  • 18. Data Modeling for the DWH Bus Architecture ● History – Slowly changing dimensions (versioning) – Fact links to Date and/or Time dimensions ● Detailed, not aggregated
  • 20. Sakila DWH Bus Architecture [diagram: fact tables fact_inventory, fact_rental, fact_payment; dimension tables dim_film, dim_date, dim_store, dim_staff, dim_customer]
  • 21. Problems with traditional DWH architectures ● General Problems – Lack of flexibility and resilience to change – Loading (ETL) Complexity ● Problems with Inmon – Centralization requires upfront investment – Single version of whose truth, when? ● Problems with Kimball – Dimensional Model anomalies
  • 22. Dimensional Modeling Anomalies ● Snowflaking (dimension normalization) – Monster dimensions – Outriggers – Ex: Customer Demographics ● Hierarchical data – Bridge table (closure table) – Ex: Employee/Boss ● Multi-valued dimensions – Bridge table – Ex: Account/Customer bridge table
  • 23. Hybrid DWH: Hub-and-Spoke ● Inmon back-end (hub) ● Kimball front-end (satellites)
  • 24. Modern: Data Vault “The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that supports one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema.” Dan Linstedt (Data Vault Overview) ● http://danlinstedt.com/
  • 25. Data Vault ● Focus on – Data Integration – Traceability and Auditability – Resilience to change ● Single version of the facts – Rather than single version of the truth ● All of the data, all of the time – No upfront cleansing and conforming ● Bottom-up
  • 26. Data Vault Modelling ● Hubs ● Links ● Satellites
  • 27. Data Vault Modelling: Hubs ● Hubs model entities ● Contains business keys – PK in absence of surrogate key ● Metadata: – Record source – Load date/time ● Optional surrogate key – Used as PK if present ● No foreign keys!
  • 28. Data Vault Modelling: Links ● Links model relationships – Intersection table (m:n relationship) ● Foreign keys to related hubs or links – Form natural key (business key) of the link ● Metadata: – Record source – Load date/time ● Optional surrogate key
  • 29. Data Vault Modelling: Satellites ● Satellites model a group of attributes ● Foreign key to a Hub or Link ● Metadata: – Record source – Load date/time
  • 30. Sakila Data Vault Example
  • 31. Data Vault tools and Example ● Kettle Data Vault Example – Sakila Data Vault – Chapter 19 – Kasper de Graaf – http://www.dikw-academy.nl ● Quipu – Data Vault Generator – Kettle templates – Johannes van den Bosch – http://www.datawarehousemanagement.org/
  • 32. Modern: Anchor model “Anchor Modeling is an agile information modeling technique that offers non-destructive extensibility mechanisms enabling robust and flexible management of changes. A key benefit of Anchor Modeling is that changes in a data warehouse environment only require extensions, not modifications.” Lars Rönnbäck (Agile Information Modeling in Evolving Data Environments) ● http://www.anchormodeling.com/
  • 33. Anchor Modelling ● Focus on – Resilience to change – Agility – Extensibility – History tracking ● Bottom-up
  • 34. Anchor Modelling ● 6NF (Date, Darwen, Lorentzos) ● Table features no non-trivial join dependencies at all – Translation: a 6NF table cannot be decomposed any further without losing information ● Temporal Data
  • 35. Anchor Modelling Constructs ● Anchors ● Attributes ● Ties ● Knots
  • 36. Anchor Modelling: Anchors ● Entities are modeled as Anchors ● Relationships may be modeled as Anchors – m:n relationships having properties ● Only a surrogate key
  • 37. Anchor Modelling: Ties ● Ties model relationships – 1:n relationships – m:n relationships without properties ● Static vs Historized – History tracked using date/time ● May be Knotted – Knot holds set of association types ● Two or more “anchor roles” – Relationships may be broken into several ties having only mandatory anchors
  • 38. Anchor Modelling: Attributes ● Models properties of an Anchor ● Static vs Historized – History tracked using date/time ● May or may not be Knotted – Knot holds set of valid attribute values
  • 39. Anchor Modelling: Knots ● Reference table – Fairly small set of distinct values ● Dictionary lookup to qualify – Attributes – Ties ● “Knotted” Attributes and Ties
  • 40. Anchor Model Diagram [legend: anchor, knot, static attribute, historized attribute, static tie, historized tie] ● http://www.anchormodeling.com/modeler/latest/
  • 41. Acknowledgements ● Kasper de Graaf – Twitter: @kdgraaf – http://www.dikw-academy.nl ● Jos van Dongen – Twitter: @josvandongen – http://www.tholis.com/
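Slide 12 says that the CIF enterprise DWH keeps history by including a date/time part in the primary key. Below is a minimal sketch of that idea using Python's built-in sqlite3 module. The customer_hist table, its columns, and the sample rows are hypothetical and are not taken from the slides. A change becomes a new row, and a point-in-time lookup picks the latest version on or before the requested date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical CIF subject-area table: the date/time column is part of the
# primary key, so a change is stored as a new row rather than an update.
conn.execute("""
    CREATE TABLE customer_hist (
        customer_id INTEGER NOT NULL,
        valid_from  TEXT    NOT NULL,  -- ISO-8601 date, compared as text
        name        TEXT    NOT NULL,
        city        TEXT,
        PRIMARY KEY (customer_id, valid_from)
    )
""")
conn.executemany(
    "INSERT INTO customer_hist VALUES (?, ?, ?, ?)",
    [
        (1, "2010-01-01", "Alice", "Leiden"),
        (1, "2011-06-15", "Alice", "Utrecht"),  # Alice moved: second version
        (2, "2010-03-01", "Bob", "Delft"),
    ],
)


def customer_as_of(customer_id, as_of):
    """Return (name, city) as valid on the given date, or None if unknown then."""
    return conn.execute(
        """SELECT name, city FROM customer_hist
           WHERE customer_id = ? AND valid_from <= ?
           ORDER BY valid_from DESC LIMIT 1""",
        (customer_id, as_of),
    ).fetchone()


print(customer_as_of(1, "2011-01-01"))  # ('Alice', 'Leiden')
print(customer_as_of(1, "2012-01-01"))  # ('Alice', 'Utrecht')
```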
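Slides 16 to 18 describe the star schema: detailed fact tables with additive metrics, linked to denormalized dimension tables. History is kept by versioning dimension rows and by linking each fact to a date dimension. This is a minimal sketch in Python's sqlite3. The table names follow the Sakila example on slide 20, but every column, the "9999-12-31" open-end marker, and the sample data are assumptions for illustration. Each fact row points at the customer version that was current when the fact was loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: denormalized and versioned (hypothetical columns).
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key, one per version
        customer_id  INTEGER NOT NULL,     -- natural key from the source
        name         TEXT,
        city         TEXT,
        valid_from   TEXT NOT NULL,
        valid_to     TEXT NOT NULL         -- '9999-12-31' marks the current version
    );
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,     -- e.g. 20110701
        full_date TEXT NOT NULL,
        year      INTEGER NOT NULL
    );
    -- Fact: detailed rows with dimension keys and an additive metric.
    CREATE TABLE fact_rental (
        customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
        date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
        amount       REAL    NOT NULL
    );
""")


def load_customer_version(customer_id, name, city, change_date):
    """Close the current version (if any), insert a new one, return its surrogate key."""
    conn.execute(
        "UPDATE dim_customer SET valid_to = ? "
        "WHERE customer_id = ? AND valid_to = '9999-12-31'",
        (change_date, customer_id),
    )
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_id, name, city, valid_from, valid_to) "
        "VALUES (?, ?, ?, ?, '9999-12-31')",
        (customer_id, name, city, change_date),
    )
    return cur.lastrowid


v1 = load_customer_version(1, "Alice", "Leiden", "2010-01-01")
conn.execute("INSERT INTO dim_date VALUES (20100505, '2010-05-05', 2010)")
conn.execute("INSERT INTO fact_rental VALUES (?, 20100505, 2.99)", (v1,))

v2 = load_customer_version(1, "Alice", "Utrecht", "2011-06-15")
conn.execute("INSERT INTO dim_date VALUES (20110701, '2011-07-01', 2011)")
conn.execute("INSERT INTO fact_rental VALUES (?, 20110701, 4.99)", (v2,))

# Each rental stays attached to the customer version current at the time.
rows = conn.execute("""
    SELECT d.year, c.city, SUM(f.amount)
    FROM fact_rental f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_date d     ON d.date_key = f.date_key
    GROUP BY d.year, c.city
    ORDER BY d.year
""").fetchall()
print(rows)  # [(2010, 'Leiden', 2.99), (2011, 'Utrecht', 4.99)]
```

Because the old version is closed rather than overwritten, the 2010 rental is still reported under Leiden after the customer has moved.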
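Slide 22 mentions a bridge (closure) table for hierarchical data such as Employee/Boss. Below is a minimal Python/sqlite3 sketch of that technique. The employee_closure and fact_sales tables, the boss_of mapping, and the sales figures are hypothetical. The closure table stores every (boss, employee) pair at any depth, so a subtree roll-up becomes a plain join instead of a recursive query.

```python
import sqlite3

# Hypothetical employee -> boss pairs; None marks the top of the hierarchy.
boss_of = {1: None, 2: 1, 3: 1, 4: 2, 5: 4}


def closure_rows(parent_of):
    """Yield (boss_id, employee_id, depth) for every ancestor pair, including self (depth 0)."""
    for emp in parent_of:
        node, depth = emp, 0
        while node is not None:
            yield (node, emp, depth)
            node, depth = parent_of[node], depth + 1


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_closure (
        boss_id     INTEGER NOT NULL,
        employee_id INTEGER NOT NULL,
        depth       INTEGER NOT NULL,  -- 0 = self, 1 = direct report, ...
        PRIMARY KEY (boss_id, employee_id));
    CREATE TABLE fact_sales (employee_id INTEGER NOT NULL, amount REAL NOT NULL);
""")
conn.executemany("INSERT INTO employee_closure VALUES (?, ?, ?)", closure_rows(boss_of))
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(1, 100.0), (2, 50.0), (3, 20.0), (4, 30.0), (5, 10.0)],
)

# Everyone reporting to employee 2, directly or indirectly, without recursion.
reports_of_2 = [r[0] for r in conn.execute(
    "SELECT employee_id FROM employee_closure "
    "WHERE boss_id = 2 AND depth > 0 ORDER BY employee_id")]

# Roll sales up the hierarchy by joining the fact table through the bridge
# (depth 0 rows make employee 2's own sales part of the total).
team_sales_2 = conn.execute("""
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN employee_closure c ON c.employee_id = f.employee_id
    WHERE c.boss_id = 2""").fetchone()[0]

print(reports_of_2)  # [4, 5]
print(team_sales_2)  # 90.0
```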
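Slide 27 lists the parts of a Data Vault hub: a business key, record source and load date/time metadata, an optional surrogate key, and no foreign keys. Here is a minimal Python/sqlite3 sketch under those rules. The column names (customer_sqn, customer_bk, load_dts) and the source names "sakila.customer" and "crm.contacts" are hypothetical. Loading only inserts business keys the hub has not seen yet.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hub_customer (
        customer_sqn  INTEGER PRIMARY KEY,   -- optional surrogate key
        customer_bk   TEXT NOT NULL UNIQUE,  -- business key
        load_dts      TEXT NOT NULL,         -- load date/time
        record_source TEXT NOT NULL          -- source the key was first seen in
    )
""")
# No foreign keys: a hub only records that a business key exists.


def load_hub(business_keys, record_source):
    """Insert business keys not seen before; keys already in the hub are kept as-is."""
    load_dts = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT OR IGNORE INTO hub_customer (customer_bk, load_dts, record_source) "
        "VALUES (?, ?, ?)",
        [(bk, load_dts, record_source) for bk in business_keys],
    )


load_hub(["C001", "C002"], "sakila.customer")
load_hub(["C002", "C003"], "crm.contacts")  # C002 is already known: ignored
```

After both loads the hub holds three keys. C002 keeps "sakila.customer" as the record source where it was first seen.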
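Slide 28 describes a Data Vault link as an intersection table. Its foreign keys to the related hubs together form the link's natural key. Below is a minimal Python/sqlite3 sketch. The hub_store hub, the link_customer_store link, the column names, and the fixed load time are all hypothetical. A UNIQUE constraint on the hub keys makes reloading the same relationship a no-op.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (
        customer_sqn INTEGER PRIMARY KEY, customer_bk TEXT NOT NULL UNIQUE,
        load_dts TEXT NOT NULL, record_source TEXT NOT NULL);
    CREATE TABLE hub_store (
        store_sqn INTEGER PRIMARY KEY, store_bk TEXT NOT NULL UNIQUE,
        load_dts TEXT NOT NULL, record_source TEXT NOT NULL);
    -- Link: the hub foreign keys together form its natural key.
    CREATE TABLE link_customer_store (
        link_sqn      INTEGER PRIMARY KEY,  -- optional surrogate key
        customer_sqn  INTEGER NOT NULL REFERENCES hub_customer(customer_sqn),
        store_sqn     INTEGER NOT NULL REFERENCES hub_store(store_sqn),
        load_dts      TEXT NOT NULL,
        record_source TEXT NOT NULL,
        UNIQUE (customer_sqn, store_sqn));
""")
TS, SRC = "2011-10-01T00:00:00", "sakila"  # hypothetical load time and source
conn.executemany(
    "INSERT INTO hub_customer (customer_bk, load_dts, record_source) VALUES (?, ?, ?)",
    [("C001", TS, SRC), ("C002", TS, SRC)],
)
conn.execute(
    "INSERT INTO hub_store (store_bk, load_dts, record_source) VALUES ('S1', ?, ?)",
    (TS, SRC),
)


def load_link(customer_bk, store_bk):
    """Resolve business keys to hub keys; record each relationship only once."""
    conn.execute(
        """INSERT OR IGNORE INTO link_customer_store
               (customer_sqn, store_sqn, load_dts, record_source)
           SELECT c.customer_sqn, s.store_sqn, ?, ?
           FROM hub_customer c CROSS JOIN hub_store s
           WHERE c.customer_bk = ? AND s.store_bk = ?""",
        (TS, SRC, customer_bk, store_bk),
    )


load_link("C001", "S1")
load_link("C002", "S1")
load_link("C001", "S1")  # same relationship again: ignored by the UNIQUE key

customers_of_s1 = [r[0] for r in conn.execute("""
    SELECT c.customer_bk
    FROM link_customer_store l
    JOIN hub_customer c ON c.customer_sqn = l.customer_sqn
    JOIN hub_store s    ON s.store_sqn = l.store_sqn
    WHERE s.store_bk = 'S1'
    ORDER BY c.customer_bk""")]
print(customers_of_s1)  # ['C001', 'C002']
```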
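Slide 29 describes a satellite as a group of attributes with a foreign key to a hub (or link), plus record source and load date/time. This is a minimal Python/sqlite3 sketch. The sat_customer columns, the change-detection rule, and the sample values are assumptions. A new satellite row is written only when the attribute values differ from the latest row, so the satellite holds the full change history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hub_customer (
        customer_sqn INTEGER PRIMARY KEY, customer_bk TEXT NOT NULL UNIQUE,
        load_dts TEXT NOT NULL, record_source TEXT NOT NULL);
    -- Satellite: descriptive attributes of the hub, one row per change.
    CREATE TABLE sat_customer (
        customer_sqn  INTEGER NOT NULL REFERENCES hub_customer(customer_sqn),
        load_dts      TEXT NOT NULL,
        record_source TEXT NOT NULL,
        name          TEXT,
        email         TEXT,
        PRIMARY KEY (customer_sqn, load_dts));
""")
conn.execute("INSERT INTO hub_customer VALUES (1, 'C001', '2011-01-01', 'sakila')")


def load_satellite(customer_sqn, load_dts, name, email, source="sakila"):
    """Add a satellite row only if the attributes differ from the latest one."""
    latest = conn.execute(
        "SELECT name, email FROM sat_customer WHERE customer_sqn = ? "
        "ORDER BY load_dts DESC LIMIT 1",
        (customer_sqn,),
    ).fetchone()
    if latest == (name, email):
        return False  # nothing changed: no new row
    conn.execute(
        "INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
        (customer_sqn, load_dts, source, name, email),
    )
    return True


load_satellite(1, "2011-01-01", "Alice", "alice@example.com")
load_satellite(1, "2011-02-01", "Alice", "alice@example.com")  # unchanged: skipped
load_satellite(1, "2011-03-01", "Alice", "alice@example.org")  # new version

history = conn.execute(
    "SELECT load_dts, email FROM sat_customer ORDER BY load_dts").fetchall()
print(history)  # [('2011-01-01', 'alice@example.com'), ('2011-03-01', 'alice@example.org')]
```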
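Slide 36 says an anchor holds only a surrogate key. It also notes that an m:n relationship which has its own properties may itself be modeled as an anchor. The following minimal Python/sqlite3 sketch shows a rental modeled that way. All table and column names are hypothetical and do not follow the official Anchor Modeling naming convention. The customer and film are reached through ties, and the rental's property lives in an attribute table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Anchors: nothing but a surrogate key.
    CREATE TABLE anchor_customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE anchor_film     (film_id     INTEGER PRIMARY KEY);
    -- The rental relationship has a property (amount), so it becomes an anchor too.
    CREATE TABLE anchor_rental   (rental_id   INTEGER PRIMARY KEY);
    -- Ties connect the rental anchor to the anchors it relates.
    CREATE TABLE tie_rental_customer (
        rental_id   INTEGER PRIMARY KEY REFERENCES anchor_rental(rental_id),
        customer_id INTEGER NOT NULL REFERENCES anchor_customer(customer_id));
    CREATE TABLE tie_rental_film (
        rental_id INTEGER PRIMARY KEY REFERENCES anchor_rental(rental_id),
        film_id   INTEGER NOT NULL REFERENCES anchor_film(film_id));
    -- The relationship's property is an attribute of the rental anchor.
    CREATE TABLE attr_rental_amount (
        rental_id INTEGER PRIMARY KEY REFERENCES anchor_rental(rental_id),
        amount    REAL NOT NULL);
""")


def new_anchor(table):
    """Create a new entity: the anchor row is just a generated surrogate key.
    The table name comes from this script, never from user input."""
    return conn.execute(f"INSERT INTO {table} DEFAULT VALUES").lastrowid


customer = new_anchor("anchor_customer")
film = new_anchor("anchor_film")
rental = new_anchor("anchor_rental")
conn.execute("INSERT INTO tie_rental_customer VALUES (?, ?)", (rental, customer))
conn.execute("INSERT INTO tie_rental_film VALUES (?, ?)", (rental, film))
conn.execute("INSERT INTO attr_rental_amount VALUES (?, ?)", (rental, 2.99))

row = conn.execute("""
    SELECT tc.customer_id, tf.film_id, a.amount
    FROM anchor_rental r
    JOIN tie_rental_customer tc ON tc.rental_id = r.rental_id
    JOIN tie_rental_film tf     ON tf.rental_id = r.rental_id
    JOIN attr_rental_amount a   ON a.rental_id = r.rental_id""").fetchone()
print(row)  # (1, 1, 2.99)
```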
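Slide 37 distinguishes static ties from historized ties, where history is tracked using a date/time. Below is a minimal Python/sqlite3 sketch of a historized tie between a customer anchor and a store anchor. The names are hypothetical. The sketch also assumes a customer belongs to one store at a time, which makes (customer_id, changed_at) the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE anchor_customer (customer_id INTEGER PRIMARY KEY);
    CREATE TABLE anchor_store    (store_id    INTEGER PRIMARY KEY);
    -- Historized tie: from changed_at on, the customer belongs to the store.
    -- Assumes one store per customer at a time.
    CREATE TABLE tie_customer_store (
        customer_id INTEGER NOT NULL REFERENCES anchor_customer(customer_id),
        store_id    INTEGER NOT NULL REFERENCES anchor_store(store_id),
        changed_at  TEXT    NOT NULL,
        PRIMARY KEY (customer_id, changed_at));
    INSERT INTO anchor_customer VALUES (1);
    INSERT INTO anchor_store VALUES (10), (20);
    INSERT INTO tie_customer_store VALUES (1, 10, '2010-01-01'), (1, 20, '2011-06-15');
""")


def store_as_of(customer_id, as_of):
    """Return the store the customer was tied to on the given date, or None."""
    row = conn.execute(
        """SELECT store_id FROM tie_customer_store
           WHERE customer_id = ? AND changed_at <= ?
           ORDER BY changed_at DESC LIMIT 1""",
        (customer_id, as_of),
    ).fetchone()
    return row[0] if row else None


print(store_as_of(1, "2011-01-01"))  # 10
print(store_as_of(1, "2012-01-01"))  # 20
```

A static tie would be the same table without changed_at. In the historized version the move to store 20 is recorded by adding a row, never by updating one.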
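Slide 38 describes attributes as properties of an anchor, either static or historized. Slide 34 explains the 6NF idea behind this: each attribute gets its own table. This minimal Python/sqlite3 sketch, with hypothetical names and data, shows a static name and a historized city. When the customer moves, only the city table grows. A "latest" query then joins the pieces back into one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE anchor_customer (customer_id INTEGER PRIMARY KEY);
    -- Static attribute: one value per anchor.
    CREATE TABLE attr_customer_name (
        customer_id INTEGER PRIMARY KEY REFERENCES anchor_customer(customer_id),
        name        TEXT NOT NULL);
    -- Historized attribute: a new row per change, keyed by anchor and date/time.
    CREATE TABLE attr_customer_city (
        customer_id INTEGER NOT NULL REFERENCES anchor_customer(customer_id),
        city        TEXT NOT NULL,
        changed_at  TEXT NOT NULL,
        PRIMARY KEY (customer_id, changed_at));
    INSERT INTO anchor_customer VALUES (1);
    INSERT INTO attr_customer_name VALUES (1, 'Alice');
    INSERT INTO attr_customer_city VALUES (1, 'Leiden', '2010-01-01');
    -- Alice moves: only the city table gets a row; the name is not repeated.
    INSERT INTO attr_customer_city VALUES (1, 'Utrecht', '2011-06-15');
""")

# "Latest" view: join each attribute to the anchor, taking the newest city.
latest = conn.execute("""
    SELECT a.customer_id, n.name, c.city
    FROM anchor_customer a
    LEFT JOIN attr_customer_name n ON n.customer_id = a.customer_id
    LEFT JOIN attr_customer_city c ON c.customer_id = a.customer_id
        AND c.changed_at = (SELECT MAX(x.changed_at) FROM attr_customer_city x
                            WHERE x.customer_id = a.customer_id)""").fetchall()
print(latest)  # [(1, 'Alice', 'Utrecht')]
```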
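Slide 39 describes a knot as a small reference table of distinct values that knotted attributes and ties look up. Here is a minimal Python/sqlite3 sketch using film ratings. The knot values are the Sakila film ratings, while the table names and the set_rating helper are hypothetical. The knotted attribute stores a reference to the knot, and values that are not in the knot are rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Knot: a small reference set of distinct values.
    CREATE TABLE knot_rating (
        rating_id INTEGER PRIMARY KEY,
        rating    TEXT NOT NULL UNIQUE);
    INSERT INTO knot_rating (rating) VALUES ('G'), ('PG'), ('PG-13'), ('R'), ('NC-17');
    CREATE TABLE anchor_film (film_id INTEGER PRIMARY KEY);
    -- Knotted (static) attribute: stores the knot key, not the value itself.
    CREATE TABLE attr_film_rating (
        film_id   INTEGER PRIMARY KEY REFERENCES anchor_film(film_id),
        rating_id INTEGER NOT NULL REFERENCES knot_rating(rating_id));
    INSERT INTO anchor_film VALUES (1), (2);
""")


def set_rating(film_id, rating):
    """Look the value up in the knot; raise ValueError for values not in the knot."""
    row = conn.execute(
        "SELECT rating_id FROM knot_rating WHERE rating = ?", (rating,)).fetchone()
    if row is None:
        raise ValueError(f"not a valid rating: {rating!r}")
    conn.execute("INSERT INTO attr_film_rating VALUES (?, ?)", (film_id, row[0]))


set_rating(1, "PG")
set_rating(2, "R")

ratings = conn.execute("""
    SELECT f.film_id, k.rating
    FROM attr_film_rating f
    JOIN knot_rating k ON k.rating_id = f.rating_id
    ORDER BY f.film_id""").fetchall()
print(ratings)  # [(1, 'PG'), (2, 'R')]
```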