Architecture of an Open Source RDBMS powered by HBase and Spark
January 12, 2017
Spark Meetup Barcelona
Daniel Gómez Ferro
▪ Introduction to Splice Machine
▪ 1.x: the need for Spark
▪ 2.0: Spark introduction, challenges and wins
▪ Future
2
Agenda
▪ Splice Machine
▪ Distributed database company
▪ Open source
▪ VC-backed
▪ Offices in San Francisco and St Louis (MO)
3
Who are we?
4
What do we do?
The Open Source RDBMS Powered By Hadoop & Spark
SQL, Scale Out, Speed
▪ ANSI SQL: no retraining or rewrites for SQL-based analysts, reports, and applications
▪ ¼ the Cost: scales out on commodity hardware
▪ Transactions: ensure reliable updates across multiple rows
▪ Mixed Workloads: simultaneously support OLTP and OLAP workloads
▪ Elastic: increase scale in just a few minutes
▪ 10x Faster: leverages Spark in-memory technology
Timeline
5
Where are we?
▪ 2012: Founded
▪ Nov 2014: v1.0
▪ Oct 2015: v1.5
▪ Jul 2016: v2.0 (Open Source, Spark integration)
▪ Feb? 2017: v2.5
1.x: the need for Spark
▪ SQL Parser
▪ SQL Planner
▪ SQL Optimizer
▪ Data storage
▪ Execution Engine
7
RDBMS building blocks
▪ Built on Apache Derby (SQL parser, planner, optimizer) and Apache HBase (data storage)
▪ Distributed Sorted Map
▪ Keys and Values are byte[]
▪ Lexicographically sorted (like a dictionary)
▪ [] < [0x00, 0x00] < [0x01] < [0x01, 0x00]
▪ Key range is partitioned/sharded in ‘regions’
▪ Operations
▪ get(byte[] key) -> byte[]
▪ put(byte[] key, byte[] value)
▪ scan(byte[] start, byte[] stop) -> scanner
▪ Extensions
▪ Coprocessors
9
HBase Introduction
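The data model on this slide can be sketched in a few lines of Python. This is a toy, single-process stand-in: the class name `SortedByteMap` is hypothetical and is not an HBase API, and the scan bounds (start inclusive, stop exclusive) are an assumption of the sketch. It only illustrates byte-wise lexicographic ordering and the get/put/scan operations.

```python
import bisect
from typing import Iterator, List, Optional, Tuple


class SortedByteMap:
    """Toy in-memory stand-in for a sorted byte[] -> byte[] map (not the HBase API)."""

    def __init__(self) -> None:
        # Keys are kept sorted; Python compares bytes lexicographically, byte by byte.
        self._keys: List[bytes] = []
        self._values: List[bytes] = []

    def put(self, key: bytes, value: bytes) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value          # overwrite an existing key
        else:
            self._keys.insert(i, key)        # insert at the sorted position
            self._values.insert(i, value)

    def get(self, key: bytes) -> Optional[bytes]:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: bytes, stop: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for i in range(lo, hi):
            yield self._keys[i], self._values[i]


# The ordering from the slide: [] < [0x00, 0x00] < [0x01] < [0x01, 0x00]
assert b"" < b"\x00\x00" < b"\x01" < b"\x01\x00"
```

Because keys are sorted, a range scan is two binary searches plus a sequential read, which is why primary keys and indexes can lean on this ordering.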
10
HBase Architecture
[Diagram: HBase on top of HDFS]
▪ Sits on top of HDFS
▪ Distributed append-only Filesystem
▪ Files are immutable once closed
▪ Log-Structured Merge Tree
▪ Multi level storage
▪ Inserts are buffered in memory
▪ Flush this buffer to disk, writing indexed, sorted files
12
HBase challenges
Before the flush:
▪ In-memory buffer [a; m]
▪ File 1 [a; p]
▪ File 2 [f; z]
After the flush:
▪ File 3 [a; m]
▪ File 1 [a; p]
▪ File 2 [f; z]
▪ In-memory buffer []
15
HBase Architecture
▪ HRegion
▪ MemStore
▪ One or more HFiles
▪ HLog
▪ Writes
▪ Add it to the MemStore
▪ Write it to the HLog
▪ When the MemStore gets big enough
▪ Flush: dump the MemStore into a new HFile
▪ Reads
▪ In parallel from the MemStore and all HFiles
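The write, flush, and read paths listed above can be sketched as follows. This is a toy model under stated assumptions: `ToyRegion` and its fields are illustrative names, not HBase classes; the flush threshold counts entries rather than bytes; and the read checks sources one after another, newest first, instead of in parallel.

```python
from typing import Dict, List, Optional, Tuple


class ToyRegion:
    """Toy model of the write/flush/read path (illustrative names, not HBase classes)."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.memstore: Dict[bytes, bytes] = {}
        self.hlog: List[Tuple[bytes, bytes]] = []           # append-only log of writes
        self.hfiles: List[List[Tuple[bytes, bytes]]] = []   # immutable sorted files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, key: bytes, value: bytes) -> None:
        self.memstore[key] = value           # add it to the MemStore
        self.hlog.append((key, value))       # write it to the HLog
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                     # the MemStore got big enough

    def flush(self) -> None:
        """Dump the MemStore into a new sorted, immutable file."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key: bytes) -> Optional[bytes]:
        """Look in the MemStore first, then in files from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None
```

Since files are never modified after a flush, an update to an existing key simply shadows the older value until a compaction rewrites the files.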
▪ We reused several Derby components
▪ JDBC driver
▪ SQL Parser/Planner/Optimizer
▪ In-memory data formats
▪ Bytecode generation
▪ Developed some custom solutions
▪ TEMP table for transient data (joins, aggregates, etc.)
▪ Task framework (using HBase’s coprocessors)
▪ Connection pooling
▪ Switched Derby’s datastore for HBase
▪ Primary Keys and Indexes make use of HBase’s sorting order
▪ Removing Derby’s assumptions about running on a single machine...
16
Derby - HBase integration
▪ Great for
▪ Operational workloads
▪ Replacing non-scalable RDBMS solutions
▪ SQL support
▪ SQL 99, Indexes, Triggers, Foreign Keys, cost based optimizer...
▪ But...
▪ Struggled on analytical queries
▪ HBase’s compactions created instabilities
▪ Minimum latency was too high (due to Task Framework)
18
Splice Machine 1.x
2.0: Spark introduction,
challenges and wins
▪ Challenging but natural
▪ Matched tree of database operators with RDD transformations
21
Spark Integration
▪ Aggregate -> reduceByKey()
▪ Join -> join()
▪ Scan with Restriction -> newAPIHadoopRDD() followed by filter()
▪ Scan -> newAPIHadoopRDD()
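As a rough illustration of how an operator tree lines up with these transformations, the sketch below uses plain Python lists as stand-ins for RDDs. None of these functions are Spark APIs: `scan`, `restriction`, `join`, and `aggregate` are hypothetical helpers that only mimic what `newAPIHadoopRDD()`, `filter()`, `join()`, and `reduceByKey()` would do, and the tables are made-up sample data.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def scan(table: Iterable[Tuple]) -> List[Tuple]:
    """Stand-in for newAPIHadoopRDD(): read the rows of a table."""
    return list(table)


def restriction(rows: List[Tuple], predicate: Callable[[Tuple], bool]) -> List[Tuple]:
    """Stand-in for filter()."""
    return [r for r in rows if predicate(r)]


def join(left: List[Tuple], right: List[Tuple]) -> List[Tuple]:
    """Stand-in for join(): inner join of (key, value) pairs by key."""
    by_key = defaultdict(list)
    for k, v in right:
        by_key[k].append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in by_key.get(k, [])]


def aggregate(pairs: List[Tuple], combine: Callable) -> Dict:
    """Stand-in for reduceByKey(): fold values that share a key."""
    acc: Dict = {}
    for k, v in pairs:
        acc[k] = combine(acc[k], v) if k in acc else v
    return acc


def total_per_customer(orders: List[Tuple[int, int]], customers: List[Tuple[int, str]]) -> Dict[str, int]:
    """Aggregate(Join(Restriction(Scan(orders)), Scan(customers))) for orders above 10."""
    big_orders = restriction(scan(orders), lambda r: r[1] > 10)
    joined = join(big_orders, scan(customers))
    by_name = [(name, amount) for _, (amount, name) in joined]
    return aggregate(by_name, lambda a, b: a + b)
```

Each database operator becomes one call that consumes the output of its children, which is the same shape as a chain of RDD transformations.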
22
▪ Abstracted away the Spark API
▪ Two implementations
▪ In-memory using Guava’s FluentIterable APIs
▪ Distributed using Spark
▪ SQL operations have a single implementation
▪ In-memory use case:
▪ OLTP workloads
▪ Very low latency
▪ Bring data in, perform computation locally
▪ Anti-pattern in distributed systems, but it works
Unified API
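One way to picture this unified API is an engine-neutral interface with two implementations. The sketch below is an assumption about the general shape, not Splice Machine's actual classes: `DataSet`, `LocalDataSet`, and `PartitionedDataSet` are hypothetical names, the local version uses lazy Python generators where the slide mentions Guava's FluentIterable, and the partitioned version is only a stand-in for Spark.

```python
from abc import ABC, abstractmethod
from typing import Callable, Iterable, List


class DataSet(ABC):
    """Hypothetical engine-neutral interface; SQL operators are written against it."""

    @abstractmethod
    def filter(self, predicate: Callable) -> "DataSet": ...

    @abstractmethod
    def map(self, fn: Callable) -> "DataSet": ...

    @abstractmethod
    def collect(self) -> List: ...


class LocalDataSet(DataSet):
    """In-memory implementation built on lazy iterators."""

    def __init__(self, rows: Iterable) -> None:
        self._rows = rows

    def filter(self, predicate: Callable) -> "LocalDataSet":
        return LocalDataSet(r for r in self._rows if predicate(r))

    def map(self, fn: Callable) -> "LocalDataSet":
        return LocalDataSet(fn(r) for r in self._rows)

    def collect(self) -> List:
        return list(self._rows)


class PartitionedDataSet(DataSet):
    """Stand-in for the distributed implementation: same calls, applied per partition."""

    def __init__(self, partitions: List[List]) -> None:
        self._partitions = partitions

    def filter(self, predicate: Callable) -> "PartitionedDataSet":
        return PartitionedDataSet([[r for r in p if predicate(r)] for p in self._partitions])

    def map(self, fn: Callable) -> "PartitionedDataSet":
        return PartitionedDataSet([[fn(r) for r in p] for p in self._partitions])

    def collect(self) -> List:
        return [r for p in self._partitions for r in p]


def project_restrict(ds: DataSet) -> List[int]:
    """A single operator implementation that runs unchanged on either engine."""
    return ds.filter(lambda r: r % 2 == 0).map(lambda r: r * 10).collect()
```

The operator code never names an engine, so the planner can pick the local path for small OLTP queries and the distributed path for analytical ones.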
▪ Got rid of TEMP table
▪ Spark maintains temporary data in memory
▪ Got rid of Task Framework
▪ Spark performs the same job, less complexity
▪ Resource isolation
▪ HBase and Spark in separate processes
▪ Analytical queries have less impact on HBase stability
23
Spark Integration Benefits
▪ Many serialization boundaries: poor performance
▪ HDFS -> HBase -> Spark -> Client
▪ Task granularity
▪ Too coarse
▪ Multiple Spark contexts
▪ One per HRegionServer
▪ Derby legacy issues
▪ Custom datatypes
▪ Thread context assumptions
24
Integration Problems
▪ Remove serialization boundaries
▪ Hybrid scanners:
▪ Custom InputFormat that reads HFiles directly from HDFS into Spark
▪ Merges those values with a fast scanner on the MemStore
▪ Most data: HDFS -> Spark
▪ Small part: HBase (in-memory) -> Spark
▪ Requires some hooks in HBase
▪ Compactions remove HFiles, Spark might be reading them
▪ Flushes add HFiles
▪ Much better read performance
25
Solving: serialization boundaries
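The merge step of a hybrid scanner can be sketched as a k-way merge over sorted sources. This is a simplified illustration, not the actual scanner: `hybrid_scan` is a hypothetical name, sources are plain Python iterables, and "the newest source wins" replaces the timestamp and version handling a real scanner would need.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

KV = Tuple[bytes, bytes]


def hybrid_scan(memstore: Iterable[KV], hfiles: List[Iterable[KV]]) -> Iterator[KV]:
    """Merge sorted sources into one sorted stream; on duplicate keys the newest source wins.

    Sources are ranked by age: the MemStore is rank 0 (newest), followed by the
    HFiles in the order given (assumed newest first). Each source must already
    be sorted by key and contain each key at most once.
    """

    def tagged(source: Iterable[KV], rank: int) -> Iterator[Tuple[bytes, int, bytes]]:
        for key, value in source:
            yield key, rank, value

    streams = [tagged(memstore, 0)] + [tagged(f, i + 1) for i, f in enumerate(hfiles)]
    last_key = None
    # heapq.merge orders by (key, rank), so the newest entry for a key comes first.
    for key, _rank, value in heapq.merge(*streams):
        if key != last_key:
            yield key, value
            last_key = key
```

In the setup the slide describes, the HFile streams would be read straight from HDFS and only the small MemStore stream would come from HBase.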
▪ Increase task granularity
▪ HTableInputFormat default is:
▪ 1 region = 1 partition
▪ Each region could be 1 GB or more
▪ SpliceInputFormat subdivides regions into blocks (default 32 MB)
▪ Better parallelism
▪ Better performance
▪ This also needs hooks in HBase (coprocessors)
26
Solving: task granularity
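The effect on parallelism is simple arithmetic: a 1 GB region cut into 32 MB blocks gives 32 partitions instead of 1. The sketch below shows that subdivision with byte offsets. This is an assumption made for illustration only; `split_region` is a hypothetical helper, and a real input format would more likely split on row-key boundaries.

```python
from typing import List, Tuple

MB = 1024 * 1024


def split_region(region_bytes: int, block_bytes: int = 32 * MB) -> List[Tuple[int, int]]:
    """Return (offset, length) pairs that cover a region in chunks of at most block_bytes."""
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    return [
        (offset, min(block_bytes, region_bytes - offset))
        for offset in range(0, region_bytes, block_bytes)
    ]
```

For example, `len(split_region(1024 * MB))` is 32, so a single large region can keep 32 Spark tasks busy rather than one.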
▪ Single shared Spark context
▪ JobServer wasn’t good enough
▪ It would become a bottleneck, results would go through it
▪ Custom JobServer (called OLAPServer)
▪ Single Spark context on this server
▪ Currently colocated with the HMaster (fault tolerant for free)
▪ Makes Spark jobs stream results directly to the client
▪ Runs several partitions in parallel
▪ Starts streaming as soon as there’s data
27
Solving: multiple Spark contexts
28
JobServer vs OLAPServer, over time
JobServer:
▪ Start partition 1
▪ Next row
▪ Next row
▪ …
▪ End partition 1, send
▪ Start partition 2
▪ Next row
▪ Next row
▪ …
▪ End partition 2, send
Client:
▪ Run partition 1
▪ Get results
▪ Run partition 2
▪ Get results
During this time the client is blocked waiting for more data
OLAPServer:
▪ Start partition 1, 2, 3
▪ Get and send row
▪ Get and send row
▪ …
▪ End partition 1
▪ Start partition 4
▪ Get and send row
▪ Get and send row
▪ …
▪ End partition 2, 3
▪ Start partition 5, 6
Client:
▪ Run partition 1, 2, 3
▪ Get result
▪ Get result
▪ Run partition 4
▪ Get result
▪ Get result
▪ Run partition 5, 6
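The difference between the two timelines can be imitated with threads and a queue. This is only a sketch of the idea: `job_server_style` and `olap_server_style` are hypothetical names, partitions are plain Python iterables, and a semaphore stands in for the scheduler that limits how many partitions run at once.

```python
import queue
import threading
from typing import Iterable, Iterator, List

_DONE = object()  # sentinel marking the end of one partition


def job_server_style(partitions: List[Iterable]) -> Iterator[List]:
    """One partition at a time; rows reach the client only when the partition ends."""
    for part in partitions:
        yield list(part)  # the client is blocked while this partition is computed


def olap_server_style(partitions: List[Iterable], parallelism: int = 3) -> Iterator:
    """Run several partitions in parallel and hand each row over as soon as it exists."""
    out: queue.Queue = queue.Queue()
    slots = threading.Semaphore(parallelism)

    def run(part: Iterable) -> None:
        with slots:                  # at most `parallelism` partitions run at once
            for row in part:
                out.put(row)         # get and send row
        out.put(_DONE)

    threads = [threading.Thread(target=run, args=(p,)) for p in partitions]
    for t in threads:
        t.start()
    finished = 0
    while finished < len(threads):
        item = out.get()
        if item is _DONE:
            finished += 1
        else:
            yield item
    for t in threads:
        t.join()
```

Rows from different partitions may interleave in the streaming version, so this shape fits queries whose result order is not tied to partition order.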
30
Single shared Spark context
▪ Custom datatypes:
▪ Custom Kryo serializers for Derby objects
▪ Thread contexts
▪ Not completely solved
▪ Use TaskContext.addTaskCompletionListener() to clean up after ourselves
▪ Still finding resource leaks from time to time...
31
Solving: Derby legacy issues
▪ HBase compactions in Spark:
▪ HBase compactions can be expensive
▪ Reading and writing lots of data
▪ If they happen in the HBase JVM they can kill OLTP performance
▪ We made it possible to run them in Spark
▪ Maintaining data locality
▪ Scheduled among other jobs
▪ Fallback to HBase if the Spark scheduler doesn’t have resources
32
Other Spark goodies
▪ Integration with Spark streaming:
▪ We can ingest data directly from Spark Streaming
▪ Easy to write to Splice Machine from Kafka through it
Future work and roadmap
35
▪ Move to DataFrame APIs
▪ Catalyst optimizer
▪ Whole stage code generation (better than Derby’s codegen)
▪ Already transitioned some operations
▪ Requires good UnsafeRow support
▪ UnsafeRow
▪ Compact in memory representation
▪ Rows are serialized in a contiguous block of memory
▪ Better memory management
▪ Less GC time
Future Spark work
36
Future Spark work
Current:
▪ Row1 -> DataDes[] -> SQLDate, SQLInteger, SQLInteger (separate objects; SQLDate -> SomeOtherField)
UnsafeRow:
▪ MemoryBlock: byte[]
▪ Row1 -> 0, 100
▪ Row2 -> 100, 50
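The contrast in this diagram can be sketched by packing rows back to back into one buffer and addressing each row by position. Two assumptions are made here: the number pairs in the diagram are read as (offset, length), and the fixed two-field layout below is invented for illustration and is not Spark's actual UnsafeRow format. `pack_rows` and `read_row` are hypothetical helpers.

```python
import struct
from typing import List, Tuple

ROW_FORMAT = "<iq"  # 4-byte int (date as days) + 8-byte int (quantity), no padding


def pack_rows(rows: List[Tuple[int, int]]) -> Tuple[bytearray, List[Tuple[int, int]]]:
    """Serialize rows back to back into one buffer.

    Returns the buffer plus one (offset, length) pair per row.
    """
    block = bytearray()
    index: List[Tuple[int, int]] = []
    for days, quantity in rows:
        encoded = struct.pack(ROW_FORMAT, days, quantity)
        index.append((len(block), len(encoded)))
        block += encoded
    return block, index


def read_row(block: bytearray, offset: int, length: int) -> Tuple[int, int]:
    """Decode one row from its slice of the shared buffer."""
    return struct.unpack(ROW_FORMAT, block[offset:offset + length])
```

With this layout the garbage collector tracks one buffer rather than several small objects per row, which is where the reduced GC time would come from.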
37
▪ Columnar storage format
▪ We already have ‘Pinned’ tables:
▪ Create Parquet snapshot of table
▪ Get columnar access
▪ Good for read-only data
▪ Planning on maintaining dual representation
▪ Row-oriented for recently written values
▪ Column-oriented for historical data
▪ Merge those on the fly
Future Spark work
38
▪ Better Spark shell integration
▪ Our SparkContext resides in the OLAPServer
▪ Getting data to a Spark shell incurs a serialization boundary
▪ From Splice’s SparkContext to the shell context
▪ We want to achieve transparent conversion
▪ ResultSet -> DataFrame
Future Spark work
39
▪ Performance increases across the board
▪ TPCC, TPCH, Backup/Restore, ODBC driver…
▪ Incremental backup
▪ Native PL/SQL support (in Beta)
▪ No excuses left for not migrating those Oracle databases
▪ Client load balancing/failover
▪ Via HAProxy
▪ Statistics improvements
▪ Histograms, sketching libraries
▪ RDD caching (pinning)
2.5 Roadmap
Thank You!
We are hiring :-)
Open Source:
github.com/splicemachine
Community:
community.splicemachine.com
Slack channel:
tinyurl.com/SMslack
42
TPCH 100 Load Throughput
TPCH 100 Query Times (seconds)
Query  Phoenix       Trafodion       Splice Machine
1      395           TRAFODION-2237  99
2      PHOENIX-3322  516             44
3      PHOENIX-3322  TRAFODION-2237  126
4      PHOENIX-3322  TBD             133
5      PHOENIX-3322  TBD             192
6      74            3178            38
7      PHOENIX-3322  4442            220
8      PHOENIX-3322  TRAFODION-2239  620
9      PHOENIX-3322  941             273
10     PHOENIX-3322  TRAFODION-2241  101
11     PHOENIX-3317  463             56
12     379           TBD             85
13     PHOENIX-3318  TBD             71
14     PHOENIX-3322  TBD             50
15     PHOENIX-3319  TBD             102
16     PHOENIX-3322  TBD             33
17     PHOENIX-3322  TBD             929
18     PHOENIX-3322  TBD             SPLICE-34
19     PHOENIX-3322  TBD             57
20     PHOENIX-3320  TBD             SPLICE-410
21     PHOENIX-3321  TBD             479
22     PHOENIX-3322  TBD             219

Splice Machine: Architecture of an Open Source RDMS powered by HBase and Spark

  • 1. Architecture of an Open Source RDBMS powered by HBase and Spark January 12, 2017 Spark Meetup Barcelona Daniel Gómez Ferro
  • 2. ▪ Introduction to Splice Machine ▪ 1.x: the need for Spark ▪ 2.0: Spark introduction, challenges and wins ▪ Future 2 Agenda
  • 3. ▪ Splice Machine ▪ Distributed database company ▪ Open source ▪ VC-backed ▪ Offices in San Francisco and St Louis (MO) 3 Who are we?
  • 4. 4 What do we do? The Open Source RDBMS Powered By Hadoop & Spark ANSI SQL No retraining or rewrites for SQL-based analysts, reports, and applications ¼ the Cost Scales out on commodity hardware SQL Scale Out Speed Transactions Ensure reliable updates across multiple rows Mixed Workloads Simultaneously support OLTP and OLAP workloads Elastic Increase scale in just a few minutes 10x Faster Leverages Spark in-memory technology
  • 5. Timeline 5 Where are we? Founded 2012 v1.0 Nov 2014 v1.5 Oct 2015 v2.0 Open Source Spark integration Jul 2016 v2.5 Feb? 2017
  • 6. 1.x: the need for Spark
  • 7. ▪ SQL Parser ▪ SQL Planner ▪ SQL Optimizer ▪ Data storage ▪ Execution Engine 7 RDBMS building blocks
  • 8. ▪ SQL Parser ▪ SQL Planner ▪ SQL Optimizer ▪ Data storage ▪ Execution Engine 8 RDBMS building blocks Apache Derby Apache HBase
  • 9. ▪ Distributed Sorted Map ▪ Keys and Values are byte[] ▪ Lexicographically sorted (like a dictionary) ▪ [] < [0x00; 0x00] < [0x01] < [0x01, 0x00] ▪ Key range is partitioned/sharded in ‘regions’ ▪ Operations ▪ get(byte[] key) -> byte[] ▪ put(byte[] key, byte[] value) ▪ scan(byte[] start, byte[] stop) -> scanner ▪ Extensions ▪ Coprocessors 9 HBase Introduction
  • 11. ▪ Sits on top of HDFS ▪ Distributed append-only Filesystem ▪ Files are immutable once closed 11 HBase challenges
  • 12. ▪ Sits on top of HDFS ▪ Distributed append-only Filesystem ▪ Files are immutable once closed ▪ Log-Structured Merge Tree ▪ Multi level storage ▪ Inserts are buffered in memory ▪ Flush this buffer to disk, writing indexed, sorted files 12 HBase challenges In-memory buffer [a; m] File 1 [a; p] File 2 [f; z]
  • 13. ▪ Sits on top of HDFS ▪ Distributed append-only Filesystem ▪ Files are immutable once closed ▪ Log-Structured Merge Tree ▪ Multi level storage ▪ Inserts are buffered in memory ▪ Flush this buffer to disk, writing indexed, sorted files 13 HBase challenges File 3 [a; m] File 1 [a; p] File 2 [f; z] In-memory buffer []
  • 15. 15 HBase Architecture ▪ HRegion ▪ MemStore ▪ One or more HFiles ▪ HLog ▪ Writes ▪ Add it to the MemStore ▪ Write it to the HLog ▪ When the MemStore gets big enough ▪ Flush: dump the MemStore into a new HFile ▪ Reads ▪ In parallel from the MemStore and all HFiles
  • 16. ▪ We reused several Derby components ▪ JDBC driver ▪ SQL Parser/Planner/Optimizer ▪ In-memory data formats ▪ Bytecode generation ▪ Developed some custom solutions ▪ TEMP table for transient data (joins, aggregates, etc.) ▪ Task framework (using HBase’s coprocessors) ▪ Connection pooling ▪ Switched Derby’s datastore for HBase ▪ Primary Keys and Indexes make use of HBase’s sorting order ▪ Removing Derby’s assumptions about running on a single machine... 16 Derby - HBase integration
  • 17. ▪ Great for ▪ Operational workloads ▪ Replacing non-scalable RDBMS solutions ▪ SQL support ▪ SQL 99, Indexes, Triggers, Foreign Keys, cost based optimizer... 17 Splice Machine 1.x
  • 18. ▪ Great for ▪ Operational workloads ▪ Replacing non-scalable RDBMS solutions ▪ SQL support ▪ SQL 99, Indexes, Triggers, Foreign Keys, cost based optimizer... ▪ But... ▪ Struggled on analytical queries ▪ HBase’s compactions created instabilities ▪ Minimum latency was too high (due to Task Framework) 18 Splice Machine 1.x
  • 21. Spark Integration
    ▪ Challenging but natural
    ▪ Matched tree of database operators with RDD transformations
    Diagram (operator -> RDD call): Aggregate -> reduceByKey(); Join -> join(); Scan -> newAPIHadoopRDD(); Restriction -> filter(); Scan -> newAPIHadoopRDD()
  • 22. Unified API
    ▪ Abstracted away the Spark API
    ▪ Two implementations
      ▪ In-memory using Guava’s FluentIterable APIs
      ▪ Distributed using Spark
    ▪ SQL operations have a single implementation
    ▪ In-memory use case:
      ▪ OLTP workloads
      ▪ Very low latency
      ▪ Bring data in, perform computation locally
      ▪ Anti-pattern in distributed systems, but it works
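One way to picture "SQL operations have a single implementation" is an engine-neutral dataset interface that the operator tree is written against. The sketch below is hypothetical: `DataSet`, `LocalDataSet`, and `total_by_customer` are illustrative names, not Splice Machine classes, and only the in-memory side is shown. A distributed implementation would presumably wrap a Spark RDD and forward to calls such as `filter()`, `join()`, and `reduceByKey()`.

```python
from abc import ABC, abstractmethod


class DataSet(ABC):
    """Hypothetical engine-neutral interface over (key, value) pairs."""

    @abstractmethod
    def filter(self, predicate): ...

    @abstractmethod
    def join(self, other): ...

    @abstractmethod
    def reduce_by_key(self, func): ...

    @abstractmethod
    def collect(self): ...


class LocalDataSet(DataSet):
    """In-memory implementation backed by a plain list."""

    def __init__(self, pairs):
        self.pairs = list(pairs)

    def filter(self, predicate):
        return LocalDataSet(p for p in self.pairs if predicate(p))

    def join(self, other):
        # Inner join on key, yielding (key, (left_value, right_value)).
        index = {}
        for k, v in other.pairs:
            index.setdefault(k, []).append(v)
        return LocalDataSet(
            (k, (v, w)) for k, v in self.pairs for w in index.get(k, [])
        )

    def reduce_by_key(self, func):
        acc = {}
        for k, v in self.pairs:
            acc[k] = func(acc[k], v) if k in acc else v
        return LocalDataSet(acc.items())

    def collect(self):
        return list(self.pairs)


def total_by_customer(orders, customers):
    """Aggregate(Join(Scan, Restriction(Scan))), written once against DataSet."""
    active = customers.filter(lambda kv: kv[1] == "active")   # Restriction
    joined = orders.join(active)                              # Join
    return joined.reduce_by_key(lambda a, b: (a[0] + b[0], a[1]))  # Aggregate
```

The operator function never mentions the engine, so the same plan could run locally for a low-latency OLTP query or on a distributed implementation for an analytical one.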
  • 23. Spark Integration Benefits
    ▪ Got rid of TEMP table
      ▪ Spark maintains temporary data in memory
    ▪ Got rid of Task Framework
      ▪ Spark performs the same job, less complexity
    ▪ Resource isolation
      ▪ HBase and Spark in separate processes
      ▪ Analytical queries have less impact on HBase stability
  • 24. Integration Problems
    ▪ Many serialization boundaries: poor performance
      ▪ HDFS -> HBase -> Spark -> Client
    ▪ Task granularity
      ▪ Too coarse
    ▪ Multiple Spark contexts
      ▪ One per HRegionServer
    ▪ Derby legacy issues
      ▪ Custom datatypes
      ▪ Thread context assumptions
  • 25. Solving: serialization boundaries
    ▪ Remove serialization boundaries
    ▪ Hybrid scanners:
      ▪ Custom InputFormat that reads HFiles directly from HDFS into Spark
      ▪ Merges those values with a fast scanner on the MemStore
      ▪ Most data: HDFS -> Spark
      ▪ Small part: HBase (in-memory) -> Spark
    ▪ Requires some hooks in HBase
      ▪ Compactions remove HFiles, Spark might be reading them
      ▪ Flushes add HFiles
    ▪ Much better read performance
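The merge step of a hybrid scanner can be sketched as a two-way merge of key-sorted streams. This is a simplified illustration under stated assumptions: `hybrid_scan` is a hypothetical name, each stream has unique keys, and the MemStore copy of a key is taken to be newer than the HFile copy (the real system would resolve versions with timestamps and transactional metadata).

```python
import heapq


def hybrid_scan(hfile_rows, memstore_rows):
    """Merge two key-sorted streams of (key, value); the MemStore wins on ties."""
    # Tag rows with a priority so that, for equal keys, the MemStore row sorts first.
    tagged_mem = ((k, 0, v) for k, v in memstore_rows)
    tagged_files = ((k, 1, v) for k, v in hfile_rows)
    last_key = object()  # sentinel that equals no real key
    for key, _, value in heapq.merge(tagged_mem, tagged_files):
        if key != last_key:
            yield key, value
            last_key = key


files = [(b"a", b"old"), (b"c", b"3")]   # bulk of the data, read from HDFS
mem = [(b"a", b"new"), (b"b", b"2")]     # small recent part, read from HBase
merged = list(hybrid_scan(files, mem))
```

Both inputs are consumed lazily, so the large HFile stream never has to be materialized in memory to be combined with the small MemStore stream.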
  • 26. Solving: task granularity
    ▪ Increase task granularity
    ▪ HTableInputFormat default is:
      ▪ 1 region = 1 partition
      ▪ Each region could be 1 GB or more
    ▪ SpliceInputFormat subdivides regions into blocks (default 32 MB)
      ▪ Better parallelism
      ▪ Better performance
    ▪ This also needs hooks in HBase (coprocessors)
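The arithmetic behind the finer granularity is simple to sketch. The function below is hypothetical and splits by byte offset only to show the effect on partition count. An actual input format would presumably split on row-key boundaries, which is where the HBase hooks come in.

```python
def split_region(region_bytes, block_bytes=32 * 1024 * 1024):
    """Return (offset, length) pairs covering a region in fixed-size blocks."""
    if region_bytes < 0 or block_bytes <= 0:
        raise ValueError("region size must be >= 0 and block size must be > 0")
    return [
        (offset, min(block_bytes, region_bytes - offset))
        for offset in range(0, region_bytes, block_bytes)
    ]


one_gb = 1024 ** 3
splits = split_region(one_gb)  # a 1 GB region yields 32 partitions instead of 1
```

With one partition per region, a single task would have to scan the whole gigabyte. With 32 MB blocks the same region is spread over 32 tasks that can run in parallel.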
  • 27. Solving: multiple Spark contexts
    ▪ Single shared Spark context
    ▪ JobServer wasn’t good enough
      ▪ It would become a bottleneck, results would go through it
    ▪ Custom JobServer (called OLAPServer)
      ▪ Single Spark context on this server
      ▪ Currently colocated with the HMaster (fault tolerant for free)
      ▪ Makes Spark jobs stream results directly to the client
      ▪ Runs several partitions in parallel
      ▪ Starts streaming as soon as there’s data
  • 28. JobServer vs OLAPServer (timeline diagram)
    JobServer: Start partition 1; Next row; Next row; …; End partition 1, send; Start partition 2; Next row; Next row; …; End partition 2, send
    Client: Run partition 1; Get results; Run partition 2; Get results
    Annotation: During this time the client is blocked waiting for more data
  • 29. JobServer vs OLAPServer (timeline diagram)
    JobServer: Start partition 1; Next row; Next row; …; End partition 1, send; Start partition 2; Next row; Next row; …; End partition 2, send
    Client (JobServer): Run partition 1; Get results; Run partition 2; Get results
    OLAPServer: Start partition 1, 2, 3; Get and send row; Get and send row; …; End partition 1; Start partition 4; Get and send row; Get and send row; …; End partition 2, 3; Start partition 5, 6
    Client (OLAPServer): Run partition 1, 2, 3; Get result; Get result; Run partition 4; Get result; Get result; Run partition 5, 6
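The difference between the two timelines can be approximated with a small deterministic simulation. It is a sketch under assumed costs, not a measurement: every row takes one time unit to produce, sending is free, and the function names and the default parallelism of 3 are illustrative.

```python
def jobserver_arrivals(partitions):
    """Partitions run one at a time; rows are sent only when a partition ends."""
    arrivals, clock = [], 0
    for rows in partitions:
        clock += len(rows)                 # one time unit per row produced
        arrivals += [(clock, row) for row in rows]
    return arrivals


def olapserver_arrivals(partitions, parallelism=3):
    """Up to `parallelism` partitions run at once; each row is sent as produced."""
    arrivals = []
    slots = [0] * parallelism              # time at which each worker slot frees up
    for rows in partitions:
        slot = slots.index(min(slots))     # next partition takes the earliest free slot
        start = slots[slot]
        arrivals += [(start + i + 1, row) for i, row in enumerate(rows)]
        slots[slot] = start + len(rows)
    return sorted(arrivals)


parts = [["a1", "a2"], ["b1", "b2"], ["c1", "c2"]]
batch = jobserver_arrivals(parts)
stream = olapserver_arrivals(parts)
```

In this model the batch client sees its first row only after a whole partition has finished, while the streaming client receives rows from several partitions as soon as they exist.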
  • 31. Solving: Derby legacy issues
    ▪ Custom datatypes:
      ▪ Custom Kryo serializers for Derby objects
    ▪ Thread contexts
      ▪ Not completely solved
      ▪ Use TaskContext.addTaskCompletionListener() to clean up after ourselves
      ▪ Still finding resource leaks from time to time...
  • 32. Other Spark goodies
    ▪ HBase compactions in Spark:
      ▪ HBase compactions can be expensive
        ▪ Reading and writing lots of data
      ▪ If they happen in the HBase JVM they can kill OLTP performance
      ▪ We made it possible to run them in Spark
        ▪ Maintaining data locality
        ▪ Scheduled among other jobs
        ▪ Fallback to HBase if the Spark scheduler doesn’t have resources
  • 33. Other Spark goodies
    ▪ Integration with Spark Streaming:
      ▪ We can ingest data directly from Spark Streaming
      ▪ Easy to write to Splice Machine from Kafka through it
  • 34. Future work and roadmap
  • 35. Future Spark work
    ▪ Move to DataFrame APIs
      ▪ Catalyst optimizer
      ▪ Whole-stage code generation (better than Derby’s codegen)
      ▪ Already transitioned some operations
      ▪ Requires good UnsafeRow support
    ▪ UnsafeRow
      ▪ Compact in-memory representation
      ▪ Rows are serialized in a contiguous block of memory
      ▪ Better memory management
      ▪ Less GC time
  • 36. Future Spark work (diagram)
    Current: Row1; DataDes[]; SQLDate; SomeOtherField; SQLInteger; SQLInteger
    UnsafeRow: MemoryBlock: byte[]; Row1 0, 100; Row2 100, 50
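The idea of rows living in one contiguous block, addressed by pairs that read like (offset, length) in the diagram, can be sketched as follows. This is not Spark's actual UnsafeRow layout: `PackedRows` is a hypothetical class and the two-integer row format is an assumption chosen to keep the example short.

```python
import struct


class PackedRows:
    """Rows stored back to back in one buffer, addressed by (offset, length)."""

    def __init__(self):
        self.block = bytearray()   # single contiguous block of memory
        self.index = []            # one (offset, length) entry per row

    def append(self, date_days, quantity):
        # Assumed fixed layout: two little-endian 32-bit signed integers.
        encoded = struct.pack("<ii", date_days, quantity)
        self.index.append((len(self.block), len(encoded)))
        self.block += encoded

    def get(self, row):
        offset, _length = self.index[row]
        return struct.unpack_from("<ii", self.block, offset)


rows = PackedRows()
rows.append(17178, 42)
rows.append(17179, -1)
```

Compared with one object per field, the garbage collector here tracks a single buffer and a small index, which is the kind of saving the "Less GC time" bullet points at.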
  • 37. Future Spark work
    ▪ Columnar storage format
    ▪ We already have ‘Pinned’ tables:
      ▪ Create Parquet snapshot of table
      ▪ Get columnar access
      ▪ Good for read-only data
    ▪ Planning on maintaining dual representation
      ▪ Row-oriented for recently written values
      ▪ Column-oriented for historical data
      ▪ Merge those on the fly
  • 38. Future Spark work
    ▪ Better Spark shell integration
    ▪ Our SparkContext resides in the OLAPServer
    ▪ Getting data to a Spark shell incurs a serialization boundary
      ▪ From Splice’s SparkContext to the shell context
    ▪ We want to achieve transparent conversion
      ▪ ResultSet -> DataFrame
  • 39. 2.5 Roadmap
    ▪ Performance increases across the board
      ▪ TPCC, TPCH, Backup/Restore, ODBC driver…
    ▪ Incremental backup
    ▪ Native PL/SQL support (in Beta)
      ▪ No excuses left for not migrating those Oracle databases
    ▪ Client load balancing/failover
      ▪ Via HAProxy
    ▪ Statistics improvements
      ▪ Histograms, sketching libraries
    ▪ RDD caching (pinning)
  • 40. Thank You! We are hiring :-)
  • 43. TPCH 100 Load Throughput
  • 44. TPCH 100 Query Times (seconds)
    (column names inferred from the issue prefixes in the cells)
    Query   Phoenix        Trafodion        Splice Machine
    1       395            TRAFODION-2237   99
    2       PHOENIX-3322   516              44
    3       PHOENIX-3322   TRAFODION-2237   126
    4       PHOENIX-3322   TBD              133
    5       PHOENIX-3322   TBD              192
    6       74             3178             38
    7       PHOENIX-3322   4442             220
    8       PHOENIX-3322   TRAFODION-2239   620
    9       PHOENIX-3322   941              273
    10      PHOENIX-3322   TRAFODION-2241   101
    11      PHOENIX-3317   463              56
  • 45. TPCH 100 Query Times (seconds)
    (column names inferred from the issue prefixes in the cells)
    Query   Phoenix        Trafodion        Splice Machine
    12      379            TBD              85
    13      PHOENIX-3318   TBD              71
    14      PHOENIX-3322   TBD              50
    15      PHOENIX-3319   TBD              102
    16      PHOENIX-3322   TBD              33
    17      PHOENIX-3322   TBD              929
    18      PHOENIX-3322   TBD              SPLICE-34
    19      PHOENIX-3322   TBD              57
    20      PHOENIX-3320   TBD              SPLICE-410
    21      PHOENIX-3321   TBD              479
    22      PHOENIX-3322   TBD              219