Open Source Data Warehousing:
     MySQL and Beyond



             Alex Meadows
          Twitter: @DBA_Alex
        Percona MySQL University
               Raleigh, NC
                1/29/2013
What Is Data Warehousing?
●   Central repository
●   Oriented toward reporting and analysis
●   Integrates multiple sources
●   Core to Business Intelligence and Advanced
    Analytics
●   Helps keep source systems clean and lean
Warehouse Methodologies
●   Inmon’s 3NF/Hub and Spoke Model
●   Kimball’s Conformed Dimension Model
●   Linstedt’s Data Vault Model
●   Rönnbäck’s Anchor Model/6NF
Source: http://www.anchormodeling.com/wp-content/uploads/2011/05/Anchor-Modeling-GSE.pdf
Common DW Challenges
●   Data storage increases significantly
        ●   Time based snapshots
        ●   Storing source changes
●   Massive queries
        ●   Joining many tables, from multiple sources
        ●   Exploratory vs reporting
●   Source Issues Magnified
●   Scalability
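A rough illustration of the storage point above, comparing time-based snapshots with storing only source changes. This is a minimal sketch: the customer keys, city values, and the five-day window are made up for the example and do not come from the slides.

```python
from datetime import date, timedelta


def daily_snapshots(source_by_day):
    """Copy every source row every day (time-based snapshot)."""
    rows = []
    for day, source in source_by_day:
        for key, value in source.items():
            rows.append((day, key, value))
    return rows


def change_log(source_by_day):
    """Store a row only when a key is new or its value differs from the last stored one."""
    rows, last_seen = [], {}
    for day, source in source_by_day:
        for key, value in source.items():
            if last_seen.get(key) != value:
                rows.append((day, key, value))
                last_seen[key] = value
    return rows


# Hypothetical source: three customers over five days, one change on day 3.
start = date(2013, 1, 1)
state = {"c1": "Raleigh", "c2": "Durham", "c3": "Cary"}
days = []
for offset in range(5):
    if offset == 2:
        state = {**state, "c2": "Chapel Hill"}
    days.append((start + timedelta(days=offset), dict(state)))

snapshot_rows = daily_snapshots(days)
change_rows = change_log(days)
print(len(snapshot_rows), len(change_rows))
```

Snapshots grow with the number of days regardless of activity, while the change log grows only with the number of changes. Either way the warehouse ends up holding more than the source does.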
Inmon’s 3NF Model
●   Original data warehouse model
●   Move historical data into its own data store
●   Data transformed to 3NF
        ●   Entities and relationships
Open Source Software
●   MySQL
●   PostgreSQL
●   Greenplum (PostgreSQL derivative)
●   Any other traditional RDBMS
Cautions
●   Indexing
●   Replication
●   Partitioning
Kimball’s Conformed Dimensions
●   Normal database modeling does not meet needs of
    reporting and analysis
●   Denormalize data
●   Dimensions
       ●    How does data need to be filtered?
●   Facts
       ●    What do we want to analyze or measure?
Source: http://blog-mstechnology.blogspot.com/2010/06/bi-dimensional-model-star-schema.html
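The dimension/fact split can be sketched as a tiny star schema. This is an illustration only, using Python's built-in sqlite3 module; the table names (dim_date, dim_product, fact_sales), columns, and sample rows are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: how the data is filtered and grouped
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Fact: what is measured, keyed by the dimensions
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity INTEGER,
    amount REAL
);
INSERT INTO dim_date VALUES (20130128, 2013, 1), (20130201, 2013, 2);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO fact_sales VALUES
    (20130128, 1, 2, 20.0),
    (20130128, 2, 1, 15.0),
    (20130201, 1, 5, 50.0);
""")

# Filter and group by dimension columns, aggregate the fact measure.
star_query = """
SELECT p.category, d.month, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
WHERE d.year = 2013
GROUP BY p.category, d.month
ORDER BY p.category, d.month
"""
sales_by_category = conn.execute(star_query).fetchall()
print(sales_by_category)
```

Every reporting query has the same shape: join the fact table to the dimensions it needs, filter on dimension columns, and aggregate the measures.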
Open Source Software
●   Greenplum (PostgreSQL derivative)
●   InfiniDB (MySQL derivative)
●   Infobright (MySQL derivative)
●   Other columnar data stores
Columnar Data Stores
●   Designed for conformed dimensions
●   High Performance
       ●   Self-indexing based on usage
       ●   High compression of data
Row vs Columnar Databases




Source: http://dbbest.com/blog/column-oriented-database-technologies/
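The row versus column layout can be mimicked in plain Python. This is a toy sketch, not how any of the engines listed above is implemented; the sample orders are invented, and run-length encoding is used only as one illustrative compression scheme, since the slides do not name a specific one.

```python
from itertools import groupby

# Row layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "East", "amount": 10.0},
    {"order_id": 2, "region": "East", "amount": 12.5},
    {"order_id": 3, "region": "West", "amount": 7.5},
    {"order_id": 4, "region": "West", "amount": 20.0},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    return {name: [record[name] for record in records] for name in records[0]}


def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)

# An aggregate over one measure only has to read that column.
total_amount = sum(columns["amount"])

# Values of one column sit next to each other, so repeats compress well.
encoded_region = run_length_encode(columns["region"])
print(total_amount, encoded_region)
```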
Cautions
●   Traditional RDBMS
       ●   Not built for conformed dimensions!
       ●   Performance will become an issue
Inmon’s Hub and Spoke
●   Combines
        ●   3NF central data warehouse
        ●   Conformed dimensions
●   Becomes foundation for further variants
Linstedt’s Data Vault Model
●   Mixes 3NF and Conformed Dimensions
●   Model data per business entities and their
    relationships
●   Hubs
        ●   Store unique business entity identifiers (keys)
●   Links
        ●   Relate hubs and other links to form relationships
●   Satellites
        ●   Store unique information regarding entity or
              relationship
Source: http://danlinstedt.com/about/data-vault-basics/
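Hubs, links, and satellites can be sketched as tables. This is a minimal illustration using Python's built-in sqlite3 module; the table and column names (the _hk and _bk suffixes, load_date, record_source) and the sample rows are assumptions made for the example, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs: one row per unique business key
CREATE TABLE hub_customer (
    customer_hk INTEGER PRIMARY KEY,
    customer_bk TEXT NOT NULL UNIQUE,
    load_date TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_order (
    order_hk INTEGER PRIMARY KEY,
    order_bk TEXT NOT NULL UNIQUE,
    load_date TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: relates hubs to each other
CREATE TABLE link_customer_order (
    customer_hk INTEGER NOT NULL REFERENCES hub_customer(customer_hk),
    order_hk INTEGER NOT NULL REFERENCES hub_order(order_hk),
    load_date TEXT NOT NULL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_hk, order_hk)
);
-- Satellite: descriptive details of a hub, one row per load
CREATE TABLE sat_customer_details (
    customer_hk INTEGER NOT NULL REFERENCES hub_customer(customer_hk),
    load_date TEXT NOT NULL,
    city TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
INSERT INTO hub_customer VALUES (1, 'C-100', '2013-01-01', 'crm');
INSERT INTO hub_order VALUES (1, 'O-1', '2013-01-02', 'shop');
INSERT INTO link_customer_order VALUES (1, 1, '2013-01-02', 'shop');
INSERT INTO sat_customer_details VALUES
    (1, '2013-01-01', 'Raleigh'),
    (1, '2013-01-15', 'Durham');
""")

# Current view: latest satellite row per customer, joined to orders via the link.
current_query = """
SELECT h.customer_bk, s.city, o.order_bk
FROM hub_customer h
JOIN sat_customer_details s ON s.customer_hk = h.customer_hk
JOIN link_customer_order l ON l.customer_hk = h.customer_hk
JOIN hub_order o ON o.order_hk = l.order_hk
WHERE s.load_date = (
    SELECT MAX(s2.load_date)
    FROM sat_customer_details s2
    WHERE s2.customer_hk = h.customer_hk
)
"""
current_rows = conn.execute(current_query).fetchall()
print(current_rows)
```

The satellite keeps both city values, so history is preserved. The query has to pick the latest one explicitly.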
Cautions
●   While you get the best mix between 3NF and
    conformed dimensions, data marts are still needed
●   Issues seen with both 3NF and conformed
    dimensions can be found here
Open Source Software
●   MySQL
●   PostgreSQL
●   Greenplum
●   Other Traditional RDBMS
●   NoSQL
       ●   Hadoop
Rönnbäck’s Anchor Model/6NF
●   Focus is on the data and its relationships.
●   Anchors
        ●   Model entities and events
●   Attributes
        ●   Model properties of anchors
●   Ties
        ●   Model relationships between anchors
●   Knots
        ●   Model relationships between shared properties
Source: http://en.wikipedia.org/wiki/Anchor_Modeling
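The four constructs can be sketched as tables. This is a simplified illustration using Python's built-in sqlite3 module; it does not follow the official Anchor Modeling naming convention, it omits historization, and the customer/order entities and status values are invented for the example. The knot is shown here as a small shared set of values that an attribute references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Anchors: identities only
CREATE TABLE anchor_customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE anchor_order (order_id INTEGER PRIMARY KEY);
-- Attribute: one property of one anchor, in its own table
CREATE TABLE attr_customer_name (
    customer_id INTEGER PRIMARY KEY REFERENCES anchor_customer(customer_id),
    name TEXT NOT NULL
);
-- Knot: a small set of shared values
CREATE TABLE knot_status (status_id INTEGER PRIMARY KEY, status TEXT NOT NULL UNIQUE);
-- Knotted attribute: a property whose value points at a knot
CREATE TABLE attr_order_status (
    order_id INTEGER PRIMARY KEY REFERENCES anchor_order(order_id),
    status_id INTEGER NOT NULL REFERENCES knot_status(status_id)
);
-- Tie: a relationship between anchors
CREATE TABLE tie_customer_order (
    customer_id INTEGER NOT NULL REFERENCES anchor_customer(customer_id),
    order_id INTEGER NOT NULL REFERENCES anchor_order(order_id),
    PRIMARY KEY (customer_id, order_id)
);
INSERT INTO anchor_customer VALUES (1);
INSERT INTO anchor_order VALUES (10);
INSERT INTO attr_customer_name VALUES (1, 'Example Customer');
INSERT INTO knot_status VALUES (1, 'open'), (2, 'shipped');
INSERT INTO attr_order_status VALUES (10, 2);
INSERT INTO tie_customer_order VALUES (1, 10);
""")

# Answering "which customer has which order status" walks tie, attributes, and knot.
anchor_query = """
SELECT n.name, k.status
FROM tie_customer_order t
JOIN attr_customer_name n ON n.customer_id = t.customer_id
JOIN attr_order_status s ON s.order_id = t.order_id
JOIN knot_status k ON k.status_id = s.status_id
"""
anchor_rows = conn.execute(anchor_query).fetchall()
print(anchor_rows)
```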
Cautions
●   Number of joins will be an issue for some databases
●   Queries will become complex
        ●   Joins
        ●   Finding properties/valuable information
        ●   Every column in a traditional table becomes its own
             table
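The join concern can be made concrete: rebuilding one traditional wide row takes one join per attribute table. The helper below is a hypothetical sketch using Python's built-in sqlite3 module; reconstruct_query, the customer tables, and the attribute names are all invented for illustration.

```python
import sqlite3


def reconstruct_query(anchor, attributes):
    """Build a SELECT that rebuilds one wide row per anchor: one LEFT JOIN per attribute table."""
    select_cols = [f"a.{anchor}_id"] + [f"t_{attr}.val AS {attr}" for attr in attributes]
    joins = [
        f"LEFT JOIN {anchor}_{attr} AS t_{attr} ON t_{attr}.{anchor}_id = a.{anchor}_id"
        for attr in attributes
    ]
    return f"SELECT {', '.join(select_cols)} FROM {anchor} AS a " + " ".join(joins)


attributes = ["name", "city", "email"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO customer VALUES (1)")
for attr in attributes:
    # One table per former column
    conn.execute(f"CREATE TABLE customer_{attr} (customer_id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO customer_name VALUES (1, 'Example Customer')")
conn.execute("INSERT INTO customer_city VALUES (1, 'Raleigh')")
# No email row is inserted: an unknown property is an absent row, not a NULL column.

query = reconstruct_query("customer", attributes)
wide_rows = conn.execute(query).fetchall()
join_count = query.count("LEFT JOIN")
print(join_count, wide_rows)
```

With three attributes the query already needs three joins. A traditional table with dozens of columns would need dozens, which is where some databases start to struggle.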
?
Open Source Software
●   Anchor Modeling website
        ●   http://www.anchormodeling.com
        ●   Web based design tools
●   No databases built specifically for 6NF
