1. Turning the Tables
– The Columnar Alternative
SkySQL & MariaDB: Solutions Day
Calpont Proprietary and Confidential
InfiniDB® Scalable. Fast. Simple. Copyright © 2011 Calpont. All Rights Reserved.
2. Agenda
• Who are we?
• Columnar database basics
o Structural differences
• Understanding workloads
o Query Vision/Scope (OLTP vs. Analytic)
o Query Variety (Static vs. Ad-Hoc/Dynamic)
o Data Volume, Data Structure
• Putting it all together
3. Calpont and InfiniDB
• Calpont Corporation
o Headquartered in Frisco, TX
o Team members in California, Colorado, and Boston
• Products
o InfiniDB Community: initial release Oct 2009; latest release 2.2
o InfiniDB Enterprise: initial release Feb 2010; latest release 3.6
5. Row-Oriented vs. Column-Oriented
Row-oriented: rows are stored sequentially
Column-oriented: each column is stored in a separate file
Within the column files, all values of a given row sit at the same offset.
Key Fname Lname State Zip Phone Age Sex
1 Bugs Bunny NY 11217 (718) 938-3235 34 M
2 Yosemite Sam CA 95389 (209) 375-6572 52 M
3 Daffy Duck NY 10013 (212) 227-1810 35 M
4 Elmer Fudd ME 04578 (207) 882-7323 43 M
5 Witch Hazel MA 01970 (978) 744-0991 57 F
Column-oriented (one file per column):
Key: 1 | 2 | 3 | 4 | 5
Fname: Bugs | Yosemite | Daffy | Elmer | Witch
Lname: Bunny | Sam | Duck | Fudd | Hazel
State: NY | CA | NY | ME | MA
Zip: 11217 | 95389 | 10013 | 04578 | 01970
Phone: (718) 938-3235 | (209) 375-6572 | (212) 227-1810 | (207) 882-7323 | (978) 744-0991
Age: 34 | 52 | 35 | 43 | 57
Sex: M | M | M | M | F
Index Key RowID
1 223346757356
2 223346757123
3 223346755340
4 223346894343
5 223346757120
6. Columnar Implicit Row Identifier
• Columnar storage uses an implicit row identifier: a row is identified by its offset in the column files, so no RowID has to be stored.
• Columnar storage avoids per-record and per-field metadata.
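The implicit row identifier can be sketched in a few lines of Python. This is a toy model, not InfiniDB's storage format: plain lists stand in for the per-column files, and the helpers `to_columns` and `row_at` are hypothetical. Pivoting the example rows into one list per column and then reading the same offset from every list gives back the original row without any stored RowID.

```python
# Toy model: plain Python lists stand in for the per-column files.
COLUMNS = ["Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]
ROWS = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (one 'file' per column)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def row_at(store, offset):
    """Rebuild a row by reading the same offset from every column.

    The offset is the row identifier; no RowID is stored anywhere.
    """
    return tuple(store[name][offset] for name in COLUMNS)


store = to_columns(ROWS)
print(store["Age"])      # [34, 52, 35, 43, 57]
print(row_at(store, 2))  # (3, 'Daffy', 'Duck', 'NY', '10013', '(212) 227-1810', 35, 'M')
```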
7. Single-Row Operation (Insertion)
(Figure: the five-row example table shown in both the row-oriented and column-oriented layouts.)
Row-oriented: the new row is inserted as one record
Column-oriented: a value is inserted into each column file
New row: 6 Marvin Martian CA 91602 (818) 761-9964 26 M
Index Key RowID
1 223346757356
2 223346757123
3 223346755340
4 223346894343
5 223346757120
6 223346757121
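A toy sketch of why a singleton insert costs more in the column layout, assuming one Python list per column file and the hypothetical helpers `insert_row_store` and `insert_column_store`. It only counts the places written and ignores pages, logging, and the index update shown above.

```python
# Toy model: one list of tuples for the row layout, one list per column file.
COLUMNS = ["Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]


def insert_row_store(table, row):
    """Row layout: the whole record is written in one place."""
    table.append(row)
    return 1  # places written


def insert_column_store(store, row):
    """Column layout: one value is appended to each column file."""
    for name, value in zip(COLUMNS, row):
        store[name].append(value)
    return len(COLUMNS)  # places written


marvin = (6, "Marvin", "Martian", "CA", "91602", "(818) 761-9964", 26, "M")
row_table = []
col_store = {name: [] for name in COLUMNS}

print(insert_row_store(row_table, marvin))     # 1
print(insert_column_store(col_store, marvin))  # 8
```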
8. Single-Row Operation (Deletion)
(Figure: the five-row example table shown in both the row-oriented and column-oriented layouts.)
Row-oriented: the row is deleted as one record
Column-oriented: a value is deleted from each column file
Deleted row: 6 Marvin Martian CA 91602 (818) 761-9964 26 M
Index Key RowID
1 223346757356
2 223346757123
3 223346755340
4 223346894343
5 223346757120
6 223346757121
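The deletion case in the same toy model: lists stand in for column files, only three columns are kept for brevity, and `delete_at` and `row_at` are hypothetical helpers. Because the row identifier is the offset, a delete has to touch every column file; the last lines show what goes wrong if only one file is edited. Real engines may mark rows as deleted instead of shifting data; this sketch simply removes the values.

```python
# Toy model trimmed to three columns; each list stands in for a column file.
COLUMNS = ["Key", "Fname", "Age"]
store = {
    "Key": [1, 2, 3, 4, 5, 6],
    "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch", "Marvin"],
    "Age": [34, 52, 35, 43, 57, 26],
}


def row_at(store, offset):
    """Reassemble a row from the same offset in every column."""
    return tuple(store[name][offset] for name in COLUMNS)


def delete_at(store, offset):
    """Remove the value at `offset` from every column so offsets stay aligned."""
    for name in COLUMNS:
        del store[name][offset]
    return len(COLUMNS)  # column files touched


touched = delete_at(store, 5)  # delete Marvin (offset 5)
print(touched)                 # 3
print(row_at(store, 4))        # (5, 'Witch', 57)

# Editing only one file breaks the alignment: drop offset 0 from "Key" alone.
partial = {name: list(values) for name, values in store.items()}
del partial["Key"][0]
print(row_at(partial, 0))      # (2, 'Bugs', 34): key and attributes now disagree
```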
9. Update Operations
Row-oriented: updating even one column in 100% of rows means changing 100% of the table's blocks on disk.
Column-oriented: update just the blocks of the updated column(s).
(Figure: the five-row example table shown in both the row-oriented and column-oriented layouts.)
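A back-of-the-envelope version of the update comparison, written as Python arithmetic. All sizes are hypothetical (100 million rows, 8 KB blocks, fixed column widths), and row overhead, page headers, and compression are ignored. Updating the `Age` column in every row touches every block in the row layout but only the blocks of the `Age` file in the column layout.

```python
# Hypothetical sizes; row overhead, page headers, and compression are ignored.
BLOCK_BYTES = 8192
ROWS = 100_000_000
WIDTHS = {"Key": 4, "Fname": 16, "Lname": 16, "State": 2,
          "Zip": 5, "Phone": 14, "Age": 1, "Sex": 1}  # bytes per value


def blocks_needed(n_rows, bytes_per_row):
    """Blocks occupied by n_rows values of the given width (integer ceiling)."""
    return (n_rows * bytes_per_row + BLOCK_BYTES - 1) // BLOCK_BYTES


# Row layout: Age values are spread through every block of the table.
row_layout_blocks = blocks_needed(ROWS, sum(WIDTHS.values()))
# Column layout: only the blocks of the Age file change.
column_layout_blocks = blocks_needed(ROWS, WIDTHS["Age"])

print(row_layout_blocks, column_layout_blocks)  # 720215 12208
```

With these assumed widths the row layout rewrites roughly 59 times as many blocks, which is simply the ratio of the full row width to the width of the one updated column.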
10. Single-Row Operations
• Columnar is not efficient for singleton (single-row) insertions.
• Columnar is not efficient for singleton deletions.
• Columnar is efficient for ranged column updates.
• Columnar is efficient for batched inserts (bulk load).
• Columnar is efficient for dropping whole partitions in a batch.
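One way to see why batching helps, as a toy Python model: the hypothetical `ColumnStore` class counts write requests sent to its column files. Inserting 1,000 rows one at a time issues one request per column per row, while a bulk load pivots the batch first and issues one request per column. Real engines buffer and log writes differently; the counts only illustrate the amortization.

```python
COLUMNS = ["Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex"]


class ColumnStore:
    """Toy column store that counts write requests issued to its column files."""

    def __init__(self):
        self.files = {name: [] for name in COLUMNS}
        self.write_requests = 0

    def insert_one(self, row):
        # Singleton insert: one write request per column file, for every row.
        for name, value in zip(COLUMNS, row):
            self.files[name].append(value)
            self.write_requests += 1

    def bulk_load(self, rows):
        # Batched insert: pivot the whole batch, then one request per column file.
        for i, name in enumerate(COLUMNS):
            self.files[name].extend(row[i] for row in rows)
            self.write_requests += 1


batch = [(k, "F", "L", "NY", "00000", "(000) 000-0000", 30, "M") for k in range(1000)]

one_by_one = ColumnStore()
for row in batch:
    one_by_one.insert_one(row)

bulk = ColumnStore()
bulk.bulk_load(batch)

print(one_by_one.write_requests, bulk.write_requests)  # 8000 8
```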
11. Add a New Column
Row-oriented: usually requires rebuilding the table
Column-oriented: create another file for the new column
(Figure: the five-row example table in both layouts, with a new Golf column (Y, N, Y, Y, N) added to each.)
12. Add a New Column
• Columnar is very flexible about adding columns.
• No table rebuild is required with columnar storage.
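A minimal sketch of adding the Golf column in both layouts, with lists and tuples as stand-ins for files; `add_column` is a hypothetical helper. In the column layout the existing lists are left untouched and one new list is created; in the row layout every record has to be rewritten to make room for the new field.

```python
def add_column(store, name, values):
    """Column layout: add a column by creating one new 'file'."""
    if name in store:
        raise ValueError(f"column {name!r} already exists")
    n_rows = len(next(iter(store.values())))
    if len(values) != n_rows:
        raise ValueError("need exactly one value per existing row")
    store[name] = list(values)


golf = ["Y", "N", "Y", "Y", "N"]

# Column layout (trimmed to two columns): existing files are left as they are.
store = {"Key": [1, 2, 3, 4, 5],
         "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"]}
key_file = store["Key"]
add_column(store, "Golf", golf)
print(store["Key"] is key_file)  # True: the Key file was not rewritten

# Row layout: every record has to be rebuilt with the extra field.
rows = [(1, "Bugs"), (2, "Yosemite"), (3, "Daffy"), (4, "Elmer"), (5, "Witch")]
rows = [row + (g,) for row, g in zip(rows, golf)]
print(rows[0])  # (1, 'Bugs', 'Y')
```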
13. Columnar Basic Differences
• What we know so far:
o Columnar is not suited for OLTP-style individual row insertions/deletions.
o Columnar is slower than a well-tuned index at finding individual rows.
• But wait: columnar databases actually load faster? How?
o By avoiding transactional (row-at-a-time) loading in favour of batching.
14. Workloads
• Query Vision/Scope (OLTP vs. Analytic)
• Query Variety (Static vs. Ad-Hoc/Dynamic)
• Data Volume, Data Structure
15. Workload – Query Vision/Scope
(Figure: Query Vision/Scope scale with ticks from 1 to 10,000,000,000, labelled Tree and Forest.)
16. Workload – Query Vision/Scope
(Figure: Query Vision/Scope scale from 1 to 10,000,000,000, with OLTP Workloads at the low end and Analytic Workloads at the high end.)
General-purpose DBMS missed the target
(dated database technology, generally not optimal)
17. Where are your workloads?
(Figure: Query Vision/Scope scale from 1 to 10,000,000,000, spanning OLTP Workloads and Analytic Workloads.)
• Most customers run both kinds of workload, and we recommend two engines
o This may require ETL or asynchronous replication (Tungsten)
• If your analytic workloads are small, you probably don’t need a columnar engine
• If your transactional workloads are small, you don’t need a row-oriented engine
18. Workload – Query Variety
(Figure: Query Variety scale from 1 to 10,000, running from Static Business Requirements to Ad-Hoc/Dynamic Business Requirements.)
How many different types of analysis are done?
How many dimensions? (How many indexes?)
19. Workload – Query Variety
(Figure: Query Variety scale from 1 to 10,000, from Static to Ad-Hoc/Dynamic Business Requirements.)
• If you can easily cover your queries with a couple of indexes and business requirements change slowly, then you may not need a columnar DBMS.
• If you need more analytics, faster analytics, and faster deployment of new analytics, then a columnar DBMS is a good fit.
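The index question on this slide can be made concrete with a little combinatorics. With k filterable dimensions there are 2^k − 1 non-empty combinations of columns an ad-hoc query might filter on. A composite index can serve more than one combination, so this is an upper bound rather than the number of indexes actually needed, but it shows how quickly index-per-query tuning grows. The dimension names come from the example table; `filter_column_sets` is a hypothetical helper.

```python
from itertools import combinations


def filter_column_sets(dimensions):
    """Every non-empty set of columns an ad-hoc query might filter on."""
    return [frozenset(combo)
            for size in range(1, len(dimensions) + 1)
            for combo in combinations(dimensions, size)]


dims = ["State", "Zip", "Age", "Sex"]  # dimensions from the example table
print(len(filter_column_sets(dims)))   # 15, i.e. 2**4 - 1

# The count doubles with every added dimension.
for k in (4, 10, 20):
    print(k, 2**k - 1)
```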
20. Data Volume
(Figure: Total Rows Stored scale from 1 to 10,000,000,000.)
Analytics-optimized DBMS (columnar) + OLTP-optimized DBMS (shards or other)
General-purpose DBMS can be suitable at small scales
21. Data Volume
(Figure: two scales, Total Rows Stored and Query Vision/Scope, each from 1 to 10,000,000,000.)
Analytic queries + big data = columnar + MPP
• Some columnar DBMSs also offer MPP (Massively Parallel Processing) to distribute the workload across the data nodes.
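The scatter/gather idea behind MPP can be sketched with the Python standard library, under loud assumptions: threads stand in for data nodes, a list stands in for a column, and `partition`, `node_partial`, and `mpp_avg` are hypothetical names, not an InfiniDB API. Each "node" computes a partial (sum, count) over its own slice, and only those small partials travel back to be combined.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, n_nodes):
    """Spread a column's values across data nodes, round-robin."""
    return [values[i::n_nodes] for i in range(n_nodes)]


def node_partial(chunk):
    """Work done locally on one node: a partial (sum, count) for AVG."""
    return sum(chunk), len(chunk)


def mpp_avg(values, n_nodes=4):
    """Scatter the work to the nodes, then gather and combine the partials."""
    chunks = partition(values, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(node_partial, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


ages = [34, 52, 35, 43, 57, 26]  # Age column from the earlier example
print(mpp_avg(ages, n_nodes=3))
```

Returning (sum, count) instead of a per-node average matters: averaging the nodes' averages gives the wrong answer whenever the nodes hold different numbers of rows.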
22. Data Structure
(Figure: a table with a Key column and a Varchar_8000 column holding long text, shown in both the row-oriented and column-oriented layouts.)
Row-oriented: heavy text usage
Column-oriented: heavy text usage
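• When the data is dominated by a large text column, columnar DBMS and row DBMS I/O will be about the same, because almost every byte read belongs to that column in either layout.
• Such data is a candidate for Sphinx Search or another full-text search tool.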
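A rough calculation of why the I/O ends up about the same, assuming a hypothetical table of one million rows with a 4-byte key and a fully used 8,000-byte text value. A query that reads the text column reads almost every byte of the table in either layout, so skipping the key column saves very little.

```python
# Hypothetical table: 1,000,000 rows, 4-byte key, fully used 8,000-byte text value.
ROWS = 1_000_000
KEY_BYTES = 4
TEXT_BYTES = 8_000

# A query that needs the text column:
row_layout_read = ROWS * (KEY_BYTES + TEXT_BYTES)  # whole rows are read
column_layout_read = ROWS * TEXT_BYTES             # only the text file is read

print(f"column/row I/O ratio: {column_layout_read / row_layout_read:.4f}")  # 0.9995
```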
23. Data Structure - Flexibility
• Columnar allows on-line schema modifications.
• No penalty for infrequently used columns: queries that don’t reference a column never read it.
• Sparse columns compress to virtually nothing.
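Why sparse columns shrink so much in a column layout: the column's values sit next to each other, so long runs of NULLs are adjacent and compress well. The sketch below uses run-length encoding built on `itertools.groupby` as one illustrative technique; it is not a claim about the compression InfiniDB actually uses, and the column contents are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, length in runs for _ in range(length)]


# Hypothetical sparse column: 1,000,000 rows, only three non-NULL values.
sparse = [None] * 1_000_000
for offset, value in ((10, "Y"), (500_000, "N"), (999_999, "Y")):
    sparse[offset] = value

runs = rle_encode(sparse)
print(len(runs))   # 6 runs stand in for 1,000,000 values
print(runs[:2])    # [(None, 10), ('Y', 1)]
```

In a row layout the same NULLs would be interleaved with the other fields of each record, so there would be no long runs to collapse.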
24. Putting it all together
• Designed for massive, high performance analytics
• Designed for ad-hoc flexibility
• Not suited for OLTP, key-value, or NoSQL workloads
• Hadoop connectivity and beyond
Speaker notes: A better mental picture is a high-speed, scalable architecture with rich SQL functionality layered on top and tightly integrated.