Turning the Tables – The Columnar Alternative
SkySQL & MariaDB: Solutions Day
Calpont Proprietary and Confidential
InfiniDB® Scalable. Fast. Simple. Copyright © 2011 Calpont. All Rights Reserved.
Agenda
• Who are we?
• Columnar database basics
o Structural differences
• Understanding workloads
o Query Vision/Scope (OLTP vs. Analytic)
o Query Variety (Static vs. Ad-Hoc/Dynamic)
o Data Volume, Data Structure
• Putting it all together
Calpont and InfiniDB
• Calpont Corporation
o Headquartered in Frisco, TX
o Team members in California, Colorado, and Boston
• Products
o InfiniDB Community: initial release Oct 2009; latest release 2.2
o InfiniDB Enterprise: initial release Feb 2010; latest release 3.6
Introduction to Columnar databases
• Columnar Concepts
Row-Oriented vs. Column-Oriented
Row-oriented: rows are stored sequentially.
Column-oriented: each column is stored in a separate file.
Each column value for a given row is at the same offset in its file.
Row-oriented layout:
Key  Fname     Lname   State  Zip    Phone           Age  Sex
1    Bugs      Bunny   NY     11217  (718) 938-3235  34   M
2    Yosemite  Sam     CA     95389  (209) 375-6572  52   M
3    Daffy     Duck    NY     10013  (212) 227-1810  35   M
4    Elmer     Fudd    ME     04578  (207) 882-7323  43   M
5    Witch     Hazel   MA     01970  (978) 744-0991  57   F

Column-oriented layout (one file per column):
Key: 1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip: 11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age: 34, 52, 35, 43, 57
Sex: M, M, M, M, F

Index (Key → RowID):
Key  RowID
1    223346757356
2    223346757123
3    223346755340
4    223346894343
5    223346757120
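The two layouts can be sketched in a few lines of Python using the sample data above. Python lists stand in for on-disk files, and "values read" is a crude proxy for I/O; both are simplifying assumptions, since a real engine reads fixed-size blocks. The point shown is that a query over one column (here, average Age) touches a single file in the columnar layout but must pass over every full record in the row layout.

```python
from typing import Dict, List, Tuple

COLUMNS = ["Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]

# Row-oriented: each record's fields sit together, records one after another.
ROWS: List[Tuple] = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]


def to_columns(rows: List[Tuple]) -> Dict[str, list]:
    """Pivot records into one list ("file") per column.

    Position i in every list belongs to row i, so no row ID is stored.
    """
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def avg_age_row_store(rows: List[Tuple]) -> Tuple[float, int]:
    """Average Age from the row layout, plus how many values were read."""
    age_idx = COLUMNS.index("Age")
    values_read = sum(len(row) for row in rows)  # whole records pass through
    return sum(row[age_idx] for row in rows) / len(rows), values_read


def avg_age_column_store(cols: Dict[str, list]) -> Tuple[float, int]:
    """Average Age from the column layout: only the Age 'file' is read."""
    ages = cols["Age"]
    return sum(ages) / len(ages), len(ages)


cols = to_columns(ROWS)
print(avg_age_row_store(ROWS))     # (44.2, 40)
print(avg_age_column_store(cols))  # (44.2, 5)
```

The answer is identical; only the amount of data scanned differs, and the gap grows with the number of columns in the table.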
Columnar Implicit Row Identifier
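• Columnar storage uses an implicit row identifier: a row's position (offset) in each column file identifies it, so no stored RowID is needed.
• Columnar storage avoids per-record and per-field metadata.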
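A minimal sketch of the implicit row identifier, assuming fixed-width column files. The widths (State as 2 ASCII bytes, Zip as 5, Age as a 1-byte unsigned integer) are hypothetical choices for illustration, not InfiniDB's actual storage format, and variable-length values would need extra machinery not shown here. Each file holds nothing but packed values, and row i is found purely by arithmetic: offset = i × width.

```python
import struct
from typing import Dict, List, Union

# Hypothetical fixed widths in bytes (an assumption for illustration).
WIDTHS = {"State": 2, "Zip": 5, "Age": 1}


def encode_column(name: str, values: List) -> bytes:
    """Pack one column into contiguous fixed-width slots: no headers, no separators."""
    width = WIDTHS[name]
    if name == "Age":
        return struct.pack(f"{len(values)}B", *values)  # one unsigned byte per value
    encoded = [v.encode("ascii") for v in values]
    if any(len(e) != width for e in encoded):
        raise ValueError(f"every {name} value must be exactly {width} bytes")
    return b"".join(encoded)


def read_value(name: str, data: bytes, row: int) -> Union[str, int]:
    """Fetch row `row` from a column file: offset = row * width, no RowID lookup."""
    width = WIDTHS[name]
    chunk = data[row * width:(row + 1) * width]
    return chunk[0] if name == "Age" else chunk.decode("ascii")


files: Dict[str, bytes] = {
    "State": encode_column("State", ["NY", "CA", "NY", "ME", "MA"]),
    "Zip": encode_column("Zip", ["11217", "95389", "10013", "04578", "01970"]),
    "Age": encode_column("Age", [34, 52, 35, 43, 57]),
}

# Reassemble the 4th row (0-based index 3) by reading the same slot in each file.
row3 = {name: read_value(name, data, 3) for name, data in files.items()}
print(row3)  # {'State': 'ME', 'Zip': '04578', 'Age': 43}
```

Because every file is pure payload, the Zip file for five rows is exactly 25 bytes; a row store would typically add a record header and field offsets to each row.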
Single-Row Operation (Insertion)
[Diagram: the sample table in both layouts, with a sixth row added.]
Row-oriented: the new row is inserted as one record.
Column-oriented: a value is inserted into each column file.
New row: 6 Marvin Martian CA 91602 (818) 761-9964 26 M
The index (Key → RowID) gains the entry 6 → 223346757121.
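A toy count of write targets for a single-row insert, treating each Python list as one file. This is an assumption-laden model: it ignores the row store's index update shown above, as well as logging and metadata pages. It simply shows why one logical insert becomes one write per column in a columnar layout.

```python
from typing import Dict, List, Tuple


def insert_row_store(rows: List[Tuple], new_row: Tuple) -> int:
    """Append one record; the whole row lands in a single place."""
    rows.append(new_row)
    return 1  # one write target


def insert_column_store(cols: Dict[str, list], names: List[str], new_row: Tuple) -> int:
    """Append one value to every column file; returns how many files were touched."""
    for name, value in zip(names, new_row):
        cols[name].append(value)
    return len(names)


names = ["Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]
marvin = (6, "Marvin", "Martian", "CA", "91602", "(818) 761-9964", 26, "M")

rows: List[Tuple] = []
cols: Dict[str, list] = {n: [] for n in names}
print(insert_row_store(rows, marvin))            # 1
print(insert_column_store(cols, names, marvin))  # 8
```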
Single-Row Operation (Deletion)
[Diagram: the sample table in both layouts, with row 6 being removed.]
Row-oriented: the row is deleted as one record.
Column-oriented: a value is deleted from each column file.
Deleted row: 6 Marvin Martian CA 91602 (818) 761-9964 26 M
The index entry 6 → 223346757121 must also be removed.
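A minimal sketch of a single-row delete in the column layout, modelling only four of the eight columns to keep it short. Physically shifting values out of Python lists, as here, is the simplest possible model and is not meant to describe how InfiniDB handles deletes. It shows the constraint behind the slide: every column file must drop the same slot, or the implicit row identifier would start pairing values from different rows.

```python
from typing import Dict


def delete_row_column_store(cols: Dict[str, list], position: int) -> int:
    """Remove the value at `position` from every column file.

    All files must drop the same slot so that offsets stay aligned.
    Returns the number of files touched.
    """
    for values in cols.values():
        del values[position]
    return len(cols)


def is_aligned(cols: Dict[str, list]) -> bool:
    """All column files must hold the same number of values."""
    return len({len(v) for v in cols.values()}) == 1


cols = {
    "Key": [1, 2, 3, 4, 5, 6],
    "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch", "Marvin"],
    "Lname": ["Bunny", "Sam", "Duck", "Fudd", "Hazel", "Martian"],
    "Age": [34, 52, 35, 43, 57, 26],
}
touched = delete_row_column_store(cols, cols["Key"].index(6))
print(touched, is_aligned(cols), cols["Fname"][-1])  # 4 True Witch
```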
Update Operations
Row-oriented: updating a column in 100% of the rows means changing 100% of the blocks on disk.
Column-oriented: only the blocks holding the updated column are changed.
[Diagram: the sample table in both layouts.]
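A back-of-the-envelope model of the update slide, assuming fixed-width values, an 8 KiB block size, and no compression or page headers (all assumptions for illustration). Updating one column in every row dirties every table block in a row store, but only that column's blocks in a column store.

```python
import math


def blocks(n_bytes: int, block_size: int) -> int:
    """Number of fixed-size blocks needed to hold n_bytes."""
    return math.ceil(n_bytes / block_size)


def blocks_touched_full_column_update(n_rows: int, row_width: int,
                                      col_width: int, block_size: int = 8192):
    """Blocks rewritten when one column is updated in every row.

    Row store: the column is interleaved in every record, so every block changes.
    Column store: only the blocks of that one column file change.
    """
    row_store = blocks(n_rows * row_width, block_size)
    column_store = blocks(n_rows * col_width, block_size)
    return row_store, column_store


# Hypothetical sizes: 1,000,000 rows, 100-byte records, a 4-byte Age column.
print(blocks_touched_full_column_update(1_000_000, 100, 4))  # (12208, 489)
```

The ratio is simply record width over column width, so the wider the table, the bigger the advantage for ranged column updates.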
Single-Row Operations
• Columnar is not efficient for singleton (single-row) insertions.
• Columnar is not efficient for singleton deletions.
• Columnar is efficient for ranged column updates.
• Columnar is efficient for batched inserts (bulk load).
• Columnar is efficient for batched partition drops.
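Why batching changes the picture can be sketched by counting write operations. The model below is an assumption, not a measurement: a singleton insert costs one small write per column file, while a batch is buffered and each column file is written once per full 8 KiB block. Column widths are hypothetical, and compression and logging are ignored.

```python
import math
from typing import Dict


def singleton_writes(n_rows: int, n_columns: int) -> int:
    """One small write per column file for every individually inserted row."""
    return n_rows * n_columns


def batched_writes(n_rows: int, col_widths: Dict[str, int], block_size: int = 8192) -> int:
    """Buffer the whole batch, then write each column file in full blocks."""
    return sum(math.ceil(n_rows * w / block_size) for w in col_widths.values())


# Hypothetical widths in bytes for the 8 sample columns.
widths = {"Key": 4, "Fname": 16, "Lname": 16, "State": 2,
          "Zip": 5, "Phone": 14, "Age": 1, "Sex": 1}
print(singleton_writes(100_000, len(widths)))  # 800000
print(batched_writes(100_000, widths))         # 725
```

Under these assumptions a bulk load of 100,000 rows needs roughly a thousand times fewer writes than the same rows inserted one at a time.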
Add a New Column
Row-oriented: usually requires rebuilding the table.
Column-oriented: create another file.
[Diagram: the sample table in both layouts, with a new column added.]
Golf: Y, N, Y, Y, N
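A minimal sketch contrasting the two ways of adding the Golf column, again with lists standing in for files and only two of the original columns modelled. The function names are hypothetical. The row layout must rebuild every record; the column layout writes one new file and leaves the existing ones untouched.

```python
from typing import Dict, List, Tuple


def add_column_row_store(rows: List[Tuple], values: List) -> List[Tuple]:
    """Row layout: every record grows, so every record is rewritten."""
    if len(values) != len(rows):
        raise ValueError("one value per row is required")
    return [row + (value,) for row, value in zip(rows, values)]


def add_column_column_store(cols: Dict[str, list], name: str, values: List) -> List[str]:
    """Column layout: create one new 'file'; existing files are left alone.

    Returns the names of the files that were written.
    """
    n_rows = len(next(iter(cols.values())))
    if len(values) != n_rows:
        raise ValueError("one value per row is required")
    cols[name] = list(values)
    return [name]


golf = ["Y", "N", "Y", "Y", "N"]
rows = [(1, "Bugs"), (2, "Yosemite"), (3, "Daffy"), (4, "Elmer"), (5, "Witch")]
cols = {"Key": [1, 2, 3, 4, 5], "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"]}

new_rows = add_column_row_store(rows, golf)            # all 5 records rebuilt
written = add_column_column_store(cols, "Golf", golf)  # only the new file is written
```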
Add a New Column
• Columnar is very flexible about adding columns.
• No table rebuild is required with columnar.
Columnar Basic Differences
• What we know so far:
o Columnar is not suited for OLTP-style individual row insertions/deletions.
o Columnar is slower than a well-tuned index at finding individual rows.
• But wait, columnar databases actually load faster? How?
o By avoiding transactional loading in favour of batching.
Workloads
• Query Vision/Scope (OLTP vs. Analytic)
• Query Variety (Static vs. Ad-Hoc/Dynamic)
• Data Volume, Data Structure
Workload – Query Vision/Scope
[Figure: query vision/scope on a log axis from 1 to 10,000,000,000 rows, ranging from "Tree" (a few rows) to "Forest" (the whole data set).]
Workload – Query Vision/Scope
[Figure: the same query-scope axis (1 to 10,000,000,000), with OLTP workloads at the low end and analytic workloads at the high end.]
General-purpose DBMSs missed the target (dated database technology is generally not optimal).
Where are your workloads?
[Figure: the same query-scope axis, with OLTP workloads at the low end and analytic workloads at the high end.]
• Most customers do both, and we recommend two engines.
o This may require ETL or asynchronous replication (Tungsten).
• If your analytic workloads are small, you probably don’t need columnar.
• If your transactional workloads are small, then you don’t need a row-oriented DBMS.
Workload – Query Variety
[Figure: query variety on a log axis from 1 to 10,000, ranging from static business requirements to ad-hoc/dynamic business requirements.]
How many different types of analysis are done?
How many dimensions? (How many indexes?)
Workload – Query Variety
• If you can easily cover your queries with a couple of indexes and business requirements change slowly, then you may not need a columnar DBMS.
• If you need more analytics, faster analytics, and faster deployment of new analytics, then a columnar DBMS is a good fit.
[Figure: the same query-variety axis, from static to ad-hoc/dynamic business requirements.]
Data Volume
[Figure: total rows stored on a log axis from 1 to 10,000,000,000.]
At large scales: analytics-optimized DBMS (columnar) + OLTP-optimized DBMS (shards or other).
A general-purpose DBMS can be suitable at small scales.
Data Volume
[Figure: total rows stored vs. query vision/scope, each on a log axis from 1 to 10,000,000,000.]
Analytic queries + big data = columnar + MPP
• Some columnar DBMSs also offer MPP (massively parallel processing) to distribute the workload to the data nodes.
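The MPP idea can be sketched as scatter/gather: each data node aggregates only its local slice of a column, and a coordinator combines the partial results. In this sketch, threads stand in for data nodes and round-robin partitioning is an arbitrary choice; both are assumptions, and a real MPP engine ships work across a network to where the data lives.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def partition(values: List[int], n_nodes: int) -> List[List[int]]:
    """Round-robin the column's values across hypothetical data nodes."""
    return [values[i::n_nodes] for i in range(n_nodes)]


def local_aggregate(chunk: List[int]) -> Tuple[int, int]:
    """Work done on each node: a partial sum and count over its local data."""
    return sum(chunk), len(chunk)


def mpp_average(values: List[int], n_nodes: int = 3) -> float:
    """Scatter the scan to the nodes, then combine partial results centrally."""
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_aggregate, partition(values, n_nodes)))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


ages = [34, 52, 35, 43, 57, 26]
print(mpp_average(ages))
```

Each node returns a (sum, count) pair rather than its own average, because averaging per-node averages gives the wrong answer when nodes hold different numbers of rows.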
Data Structure
[Diagram: a table with a Key column and a Varchar_8000 column of long free text (lorem ipsum placeholder), shown in both layouts; both are labelled "heavy text usage".]
• For text-heavy data, columnar and row DBMS I/O will be about the same, because the large text column dominates what is read either way.
• Such data is a candidate for Sphinx Search or another search tool.
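The "about the same" claim can be checked with simple arithmetic. The sketch below assumes a 4-byte Key, uncompressed UTF-8 text, and no page overhead (all assumptions for illustration). When one wide text column makes up nearly all of each record, skipping the other columns saves almost nothing.

```python
from typing import List

KEY_WIDTH = 4  # hypothetical size of the integer Key column, in bytes


def bytes_read_row_store(keys: List[int], texts: List[str]) -> int:
    """Scanning the text in a row layout reads each whole record: key + text."""
    return len(keys) * KEY_WIDTH + sum(len(t.encode("utf-8")) for t in texts)


def bytes_read_column_store(texts: List[str]) -> int:
    """Scanning the text in a column layout reads only the text column's file."""
    return sum(len(t.encode("utf-8")) for t in texts)


# Five rows of long placeholder text, standing in for the Varchar_8000 column.
texts = ["Lorem ipsum dolor sit amet, consectetur adipisicing elit " * 20] * 5
keys = [1, 2, 3, 4, 5]
row_bytes = bytes_read_row_store(keys, texts)
col_bytes = bytes_read_column_store(texts)
# prints: column store reads 99.7% of the row-store bytes
print(f"column store reads {col_bytes / row_bytes:.1%} of the row-store bytes")
```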
Data Structure - Flexibility
• Columnar allows online schema modifications.
• There is no penalty for infrequently used columns.
• Sparse columns will compress to virtually nothing.
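One common way a sparse column can shrink is run-length encoding, sketched below. The slide does not say which compression scheme InfiniDB uses, so RLE here is an illustrative stand-in, not a description of the product. Because a column file stores one attribute contiguously, long runs of NULLs sit next to each other and collapse into a handful of (value, count) pairs.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def run_length_encode(values: List[Optional[str]]) -> List[Tuple[Optional[str], int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def run_length_decode(runs: List[Tuple[Optional[str], int]]) -> List[Optional[str]]:
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# A hypothetical sparse column: one million rows, only three non-NULL values.
sparse: List[Optional[str]] = [None] * 1_000_000
sparse[10] = "Y"
sparse[500_000] = "N"
sparse[999_999] = "Y"

runs = run_length_encode(sparse)
print(len(runs))  # 6 runs stand in for 1,000,000 values
```

In a row layout the same NULLs are scattered one per record among the other fields, so they do not form runs that compress this way.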
Putting it all together
• Designed for massive, high-performance analytics
• Designed for ad-hoc flexibility
• Not suited for OLTP, key-value, or NoSQL workloads
• Hadoop connectivity and beyond
InfiniDB Product Footprint

Editor's Notes

1. A better mental picture is a high-speed, scalable architecture with rich SQL functionality layered on top and tightly integrated.