Physical Database Design for
MPP and Columnar Databases
Geoffrey Clark
Principal at Lucidata, Inc.
September 2013
Copyright, Lucidata, 2013
Conceptual, Logical, Physical
• Conceptual links to Business Strategy.
– This is now becoming more quantitative
• Logical maps to the Business Semantics.
– Con-way example
• Physical maps to your Data Stores
– These will be more varied and heterogeneous in
the future, due to specialization.
HBR Business Strategy
The New Dynamics of Competition, Michael D. Ryall, Harvard Business Review, June 2013
Michael Porter’s Five Forces
has dominated strategic
and competitive analysis
since 1979. This analysis
has largely been conceptual
in nature.
Quantitative analysis on
structured data in context is
changing the nature of
business culture, and
improving business
decisions.
This drives the demand for
data modeling and
management.
Design and Evolution
• Hierarchies
– 14th Century Europe and the Financial Revolution
– Aggregations & Allocations
• Cards, Tapes – physical analog media
• Computer Science
– Moore’s Law
• Processor Speed Improvements
• Memory Improvements
• Media Improvements – Punch Cards, Tape, Disk, Memory
• Design for Context & the Future
– Character encoding - Internationalization
– Calendars – Gregorian, Fiscal, Lunar, ... Y2K?
• Files and Fields
– Separation of Data and Metadata
– Modern versions -> XML, JSON
• Joins!
– Data Sets – Super types, Sub types
– Associations describe Networks!
Technology’s Improvement Pace
... and Demand Forecast
Separation of Church and State
• Operational uses
– Capture the data, hand-entered <- validation
– A Data Flow, such as Order to Cash cycle
– Con-way example of PRO(-gressive) numbers
• Analytical uses
– Desire for reports: reporting crashes the
Operational cycle, causing a Cash flow problem.
– Banished from OLTP, go make an ODS
The Star Schema
The purpose of business computers is to sort data. A graphical
representation of sorted data is called a ‘Star Schema’.
– Michael Silves, Principal at Datamorphosis
• The right design at the right time becomes the default doctrine for DW
– Early RDBMS (Relational Data Base Management Systems)
• Low memory, slow disks, slow CPU
• Big Demand, with questions that spanned the datasets
• Performance issues over large datasets
– Interview Business people to get questions
• Pre-process the data, based on business questions
– Separation into Dimensions and Facts/Metrics
• Link to Business Semantics
• OLAP (On-Line Analytical Processing)
• Educate Users on Aggregation and Allocation
• Conformed Dimensions across Departments to give an Enterprise-wide view of the data.
• But as technology changes, problems emerge
– Ad-hoc questions require redesign & rework
– Business hierarchies in which one concept is both a fact & a dimension, e.g. Shipment
– Fact tables become difficult to distribute for MPP ... e.g. Teradata prefers a normalized DW
• Example – transportation networks
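As a minimal sketch of the separation into Dimensions and Facts described above, the following uses Python's built-in sqlite3 module. The table and column names (dim_carrier, fact_shipment, revenue, mode) and the data are hypothetical illustrations, not taken from the deck.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed to one dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_carrier (
        carrier_id   INTEGER PRIMARY KEY,
        carrier_name TEXT,
        mode         TEXT
    );
    CREATE TABLE fact_shipment (
        shipment_id INTEGER PRIMARY KEY,
        carrier_id  INTEGER REFERENCES dim_carrier(carrier_id),
        revenue     REAL
    );
    INSERT INTO dim_carrier VALUES (1, 'Carrier A', 'Truck'), (2, 'Carrier B', 'Rail');
    INSERT INTO fact_shipment VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
""")

# Aggregate the fact (revenue) by a dimension attribute (mode).
revenue_by_mode = dict(conn.execute("""
    SELECT d.mode, SUM(f.revenue)
    FROM fact_shipment AS f
    JOIN dim_carrier AS d ON d.carrier_id = f.carrier_id
    GROUP BY d.mode
""").fetchall())
```

The query shape is fixed by the questions the schema was designed for; a question that cuts across a different grain would need a different fact table, which is the redesign problem noted above.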
Example – Multi-Modal Freight
• Shipments are agreements between a Carrier and a
Shipper to move goods between two places.
• Shipments can be split into “ProFreight” (which is
assigned a cost via activity-based costing).
• Shipments/ProFreight are composed of Freight
handling units.
• Freight can be “re-tendered” to another carrier, in
which case it is linked to the original and the new
Shipment.
• Freight moves between places on one or many “VFCs”
or Containers.
• Containers are moved between places on Trips.
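The entities listed above can be sketched as Python dataclasses. This is an illustrative outline only: all class and field names are assumptions, and ProFreight and its activity-based costing are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Trip:
    origin: str
    destination: str


@dataclass
class Container:
    container_id: str
    trips: List[Trip] = field(default_factory=list)


@dataclass
class Shipment:
    shipment_id: str
    carrier: str
    shipper: str
    origin: str
    destination: str


@dataclass
class Freight:
    freight_id: str
    original_shipment: Shipment
    retendered_shipment: Optional[Shipment] = None
    containers: List[Container] = field(default_factory=list)

    def shipments(self) -> List[Shipment]:
        """Return every shipment this handling unit is linked to."""
        linked = [self.original_shipment]
        if self.retendered_shipment is not None:
            linked.append(self.retendered_shipment)
        return linked


# A handling unit re-tendered from one carrier to another keeps both links.
first = Shipment("S1", "Carrier A", "Shipper X", "Portland", "Chicago")
second = Shipment("S2", "Carrier B", "Shipper X", "Portland", "Chicago")
unit = Freight("F1", original_shipment=first, retendered_shipment=second)
carriers = [s.carrier for s in unit.shipments()]
```

Because one Freight unit can point at two Shipments, the relationship is many-to-many, which is part of what makes a single star awkward for this domain.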
Kimball on Transportation, 3NF
Kimball on Transportation, Star
Table Level DW diagram
Dim Modeling Dogma
• “Our carefully normalized data model cannot
be translated into a star schema...”
– Dimensional modeling is necessary in order to
generate correct queries
– Any (normalized) data model can be transformed
into a dimensional model...
– ... and there exists an algorithm to do it
Dim Modeling Example
Star option considered
Bridge table
(remember, we tried this)
We tried this with
hesmith. When
selecting a main
hierarchy, it has
too much of a
downside, and
you don’t have a
weight factor …
Multi-fact option considered
Oracle’s Algorithmic approach
Basic DW diagram
Build Dimensional Model in BI
Freight moves through Networks
Information Factory & MPP
• Normalized Base
– Integrate data once
• Source -> Normalized -> Denormalized -> OK
• Source -> Denormalized? -> Un-normalized -> ?
– Detect problems and fix them once!
• Does not preclude Data Marts
• Massively Parallel Processing
– Data distribution
• Optimizations – Broadcast, Co-location, Re-distribution
• Scalability, the quest for 1:1
• Normalized data - reduced IO, better match for
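The data distribution and co-location ideas above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the three-node cluster, the modulo "hash", and the shipment and freight rows are all hypothetical, and no real MPP engine's API is shown.

```python
from collections import defaultdict

NODES = 3  # hypothetical cluster size


def node_for(key: int) -> int:
    """Assign a row to a node from its distribution key."""
    return key % NODES  # simple modulo stands in for a real hash function


def distribute(rows, key_index):
    """Spread rows across nodes by the chosen distribution key."""
    placement = defaultdict(list)
    for row in rows:
        placement[node_for(row[key_index])].append(row)
    return placement


shipments = [(1, "Carrier A"), (2, "Carrier B"), (3, "Carrier A"), (4, "Carrier C")]
freight = [(100, 1), (101, 1), (102, 2), (103, 4)]  # (freight_id, shipment_id)

# Both tables are distributed on shipment_id, so matching rows share a node.
shipments_by_node = distribute(shipments, key_index=0)
freight_by_node = distribute(freight, key_index=1)

# Co-located join: each node joins only its local rows, with no data movement.
joined = []
for node in range(NODES):
    local_shipments = {sid: carrier for sid, carrier in shipments_by_node[node]}
    for freight_id, sid in freight_by_node[node]:
        joined.append((freight_id, local_shipments[sid]))
joined.sort()
```

If the two tables were distributed on different keys, one of them would have to be broadcast or re-distributed before the join, which is the cost the optimizations listed above try to avoid.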
Bob Conway’s Rapid Methodology
Core Model with many Roles
Transaction
Tables
Reference Tables
Power of Conformed Dimensions
Example Data Model & Hierarchy
Data Flow and Usage
Cubes and In-memory BI
• Multi-Dimensional OLAP (MOLAP)
– Drag-and-Drop OLAP environment, analysts
become capable of self-service.
– Dealt with Ragged Hierarchies, common in
Financial data such as General Ledger (GL)
– Limited by memory size
– Pressure for more dimensionality floods cube size,
build times from relational sources exceed load
windows ...
• Relational OLAP (ROLAP)
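The way added dimensionality floods cube size can be illustrated with simple arithmetic. The dimension names and cardinalities below are hypothetical, and the product is only an upper bound, since real cubes are usually sparse.

```python
from math import prod

# Hypothetical dimension cardinalities for a freight cube.
dimensions = {"day": 365, "origin": 200, "destination": 200, "customer": 1000}


def potential_cells(cardinalities):
    """Upper bound on cube cells: the product of the dimension cardinalities."""
    return prod(cardinalities)


before = potential_cells(dimensions.values())

# Adding one more dimension multiplies the potential cell count by its cardinality.
after = potential_cells(list(dimensions.values()) + [50])
```

Each new dimension multiplies the bound rather than adding to it, which is why build times can outgrow a fixed load window.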
But a network this size choked it
Columnar vs Row-wise
• Physically store data by Column vs Row
– Rather like Fifth Normal Form.
– If Semantically Organized, then Rapid Response to
user’s ad-hoc aggregation requests.
– Prefers batch loading: every load writes once per
column, even when loading a single row.
• Continues to appear and operate like its normal
Row-wise cousin.
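A small Python sketch of the two layouts, assuming hypothetical shipment records. It models the storage idea only and is not how any particular columnar engine is implemented.

```python
# The same three records stored row-wise (hypothetical data).
rows = [
    {"shipment_id": 1, "carrier": "Carrier A", "weight_kg": 120.0},
    {"shipment_id": 2, "carrier": "Carrier B", "weight_kg": 80.0},
    {"shipment_id": 3, "carrier": "Carrier A", "weight_kg": 200.0},
]

# Column-wise layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row-wise: each whole record is visited to pick out one field.
total_row_wise = sum(row["weight_kg"] for row in rows)

# Column-wise: the aggregation reads only the one column it needs.
total_column_wise = sum(columns["weight_kg"])


def append_row(store, record):
    """Loading one row still writes to every column, once per column."""
    for name, value in record.items():
        store[name].append(value)
```

Calling `append_row(columns, {"shipment_id": 4, "carrier": "Carrier C", "weight_kg": 60.0})` touches all three lists, which is why batch loads amortize better than single-row inserts.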
Columnar IO example
Compression becomes
much more effective
Reading a Column is
like reading a Row
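One reason compression works better on columns is that a column holds values of a single type, often with long runs of repeats. The sketch below uses run-length encoding as an example; the deck does not say which compression scheme is meant, so the technique and the sample column are assumptions.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# A sorted column of carrier names (hypothetical data).
carrier_column = ["Carrier A"] * 4 + ["Carrier B"] * 3 + ["Carrier C"] * 2
encoded = run_length_encode(carrier_column)
```

Nine stored values collapse to three pairs here; the same values interleaved with other fields in a row-wise layout would offer no such runs.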
Design Pattern for Log Data
Data Stewards for
Master Data
Data Stewards for
Metadata
Architects
integrate data
and metadata
Architects
organize data for
analysis with
physical in mind
Architects identify levels for
analysis, and distribution
Columnar
MPP
Importance of Reference Data
Infobright’s Database Landscape 2011
Analytic Database Comparison
Actian ParAccel
IBM Netezza
HP Vertica
Greenplum
Teradata
Sybase IQ
Gartner’s Magic Quadrant
Hadoop (Cloudera & Hortonworks)
“Although it’s true that Hadoop can be valuable as an analytic silo, most
organizations will prefer to get the most business value out of Hadoop by
integrating it with—or into—their BI, DW, DI, and analytics technology
stacks.” – Philip Russom, TDWI
http://tdwi.org/webcasts/2013/04/integrating-hadoop-into-business-intelligence-and-data-warehousing.aspx
Hadoop for Analytics?
Analytics performs
best on Structured
Data, for good
reasons.
Maintain MPP strengths in
the solution through
Architecture.
Message from Hortonworks (Hadoop)
Hadoop as ETL
Data Flow Reference Architecture
Message from Neo4J NoSQL
Message from MongoDB (NoSQL)
http://www.slideshare.net/fullscreen/mongodb/schema-design-by-example/1
Message from Couchbase (NoSQL)
http://www.couchbase.com/why-nosql/nosql-database



Editor's Notes

  1. Jeff Kibler @ Infobright