Splice Machine Proprietary and Confidential
Open Source RDBMS
For Mixed Operational and Analytical Workloads
Monte Zweben
October 20, 2016
Who We Are
The Open Source RDBMS Powered By Hadoop & Spark
SQL · Scale Out · Speed
• ANSI SQL: No retraining or rewrites for SQL-based analysts, reports, and applications
• ¼ the Cost: Scales out on commodity hardware
• Transactions: Ensure reliable updates across multiple rows
• Mixed Workloads: Simultaneously support OLTP and OLAP workloads
• Elastic: Increase scale in just a few minutes
• 10x Faster: Leverages Spark in-memory technology
DECISIONS IN THE MOMENT
Life Sciences • Digital Marketing • Financial Services • Supply Chain Optimization
Today’s Reality: Stale Data, Backward-Looking Decisions
How old is the data in your reports?
• 1 day +
• 1 day
• 4 hours +
• 1 hour +
• Real-time
[Chart: survey responses; values shown: 24%, 50%, 7%, 9%, 9%]
* Source: Webinars on 11-3-15 and 12-10-15, 237 respondents
Legacy ETL Architectures Unable to Keep Up
[Diagram: OLTP systems (ERP, CRM, Supply Chain, HR, …) feed an ODS and, via ETL (Extract, Transform, Load), OLAP systems (Data Warehouse, Datamart) that serve Ad Hoc Analytics, Executive Business Reports and Operational Reports; Mixed Workload Apps and Stream or Batch Updates are also shown]
Pain
• Separate OLTP & OLAP systems
• Messy ETL “glue”
• Why? Different workloads, different data structures, hard to isolate workloads
• No longer adequate: can’t afford to wait days or hours to analyze data
Recent Approach: Lambda Architecture
Complex to set up and maintain
[Diagram: Speed Layer, Batch Layer and Serving Layer; the developer integrates specialized compute engines]
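For readers new to the pattern, here is a minimal sketch of the lambda architecture's three layers, using a hypothetical event-count view. The batch layer periodically recomputes a view from the full master dataset, the speed layer incrementally covers events that arrived since the last batch run, and the serving layer merges the two at query time. All names are illustrative assumptions, not a product API; in a real deployment each layer is usually a separate engine, which is the integration burden this slide describes.

```python
from collections import Counter
from typing import Iterable


def batch_layer(master_dataset: Iterable[str]) -> Counter:
    """Recompute the batch view from scratch over the full, immutable dataset."""
    return Counter(master_dataset)


class SpeedLayer:
    """Incrementally maintains a view of events newer than the last batch run."""

    def __init__(self) -> None:
        self.recent: Counter = Counter()

    def ingest(self, event: str) -> None:
        self.recent[event] += 1

    def reset(self) -> None:
        # Called once a new batch view has absorbed these events.
        self.recent.clear()


def serving_layer(batch_view: Counter, speed: SpeedLayer, key: str) -> int:
    """Answer a query by merging the precomputed and real-time views."""
    return batch_view[key] + speed.recent[key]


batch_view = batch_layer(["click", "view", "click"])
speed = SpeedLayer()
speed.ingest("click")  # arrived after the batch run
print(serving_layer(batch_view, speed, "click"))  # 3
```

Even in this toy form, the query path has to know about two views and when to reset one of them; with separate engines that coordination becomes operational work.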
New Approach: Lambda-In-A-Box Architecture
Easy to use with SQL
[Diagram: Speed Layer, Batch Layer and Serving Layer; the SQL optimizer selects pre-integrated compute engines]
Simultaneous OLTP & OLAP Workloads
Unique Dual-Engine Architecture isolates workloads
[Diagram: Traditional RDBMSs run OLTP and OLAP on one engine, causing bottlenecks and delays; Splice Machine runs OLTP on an HBase engine and OLAP on a Spark engine, providing workload isolation]
[Charts: OLTP response time vs. OLAP load. Traditional RDBMSs: as OLAP load rises, OLTP response times increase. Splice Machine: as OLAP load rises, OLTP response times remain flat]
Power Old and New Applications
Proven Building Blocks: Spark, Hadoop and Derby
Apache Derby
• ANSI SQL-99 RDBMS
• Java-based
• ODBC/JDBC compliant
Apache HBase/Hadoop
• Auto-sharding
• High availability
• Scalability to 100s of PBs
Apache Spark
• Analytical engine
• Fast, in-memory technology
• In-memory data resilient to node failure
HBase: Proven Scale-Out
• Auto-sharding
• Scales with commodity hardware
• Cost-effective from GBs to PBs
• High availability through failover and replication
• LSM-trees (log-structured merge trees)
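Auto-sharding in HBase means a table is split into regions, each covering a contiguous range of row keys, so every row key is served by exactly one region. The simplified sketch below shows that lookup and a region split. The class, server names and split handling are hypothetical simplifications: real HBase clients locate regions through the hbase:meta table, splits are triggered by region size, and the master rebalances regions across servers.

```python
import bisect
from typing import List


class RegionMap:
    """Key-range sharding: region i covers [start_keys[i], start_keys[i + 1])."""

    def __init__(self, split_keys: List[bytes], servers: List[str]) -> None:
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per region")
        # split_keys must already be sorted. The first region starts at the
        # empty key, so every possible row key is covered by some region.
        self.start_keys = [b""] + list(split_keys)
        self.servers = list(servers)

    def locate(self, row_key: bytes) -> str:
        # Rightmost region whose start key is <= row_key.
        idx = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.servers[idx]

    def split(self, split_key: bytes, new_server: str) -> None:
        """Split the region containing split_key; the upper half moves to new_server."""
        idx = bisect.bisect_right(self.start_keys, split_key)
        if self.start_keys[idx - 1] == split_key:
            raise ValueError("split key is already a region boundary")
        self.start_keys.insert(idx, split_key)
        self.servers.insert(idx, new_server)


regions = RegionMap([b"g", b"p"], ["rs1", "rs2", "rs3"])
print(regions.locate(b"apple"), regions.locate(b"kiwi"), regions.locate(b"zebra"))  # rs1 rs2 rs3
regions.split(b"t", "rs4")  # the region [p, end) grew and is split at "t"
print(regions.locate(b"zebra"))  # rs4
```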
Apache Spark
Unmatched Performance
• Fastest sort of 1PB of data
Advanced In-Memory Technology
• Spill-to-disk for large datasets
• Resilient against node failures
• Pipelining for computation parallelism
Most Active Apache Community
• Almost 1,000 contributors
Extensive Libraries
• Over 140 and growing
• Libraries for machine learning, streaming and graph processing
Splice Machine: Advanced Spark Integration
Innovative, High-Performance RDD Creation
• Fast access to HFiles in HDFS
• Merged with deltas from Memstore
• Avoids slower HBase API
Universal Execution Plan and Byte Code
• Optimizer, plan and code shared by Spark and HBase execution
[Diagram: on each physical node, a Spark Worker builds RDD 1 … RDD N directly from the HFiles (in HDFS) and Memstore of Region 1 … Region N hosted by the co-located HBase Region Server]
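The slide says RDDs are built by reading HFiles directly and merging in the Memstore's deltas. The sketch below shows that kind of read-time merge in generic log-structured merge (LSM) terms: several key-sorted runs are merged, the newest version of each row wins, and a delete marker (tombstone) hides older versions. It is an illustration under simplifying assumptions (one value per row, integer timestamps, `None` as the tombstone), not Splice Machine's or HBase's actual code.

```python
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

# A cell is (row_key, timestamp, value); a value of None is a delete marker.
Cell = Tuple[str, int, Optional[str]]


def merged_scan(*runs: Iterable[Cell]) -> Iterator[Tuple[str, str]]:
    """Merge key-sorted runs (e.g. HFiles plus the Memstore) into one view.

    Each run must be sorted by (row_key ascending, timestamp descending).
    For every row only the newest cell counts; if it is a tombstone the row
    is hidden, along with all of its older versions.
    """
    merged = heapq.merge(*runs, key=lambda cell: (cell[0], -cell[1]))
    current_row: Optional[str] = None
    for row, _ts, value in merged:
        if row == current_row:
            continue  # an older version of a row we already resolved
        current_row = row
        if value is not None:
            yield row, value


hfile_1: List[Cell] = [("a", 1, "a1"), ("c", 1, "c1")]
hfile_2: List[Cell] = [("b", 2, "b2"), ("c", 3, "c3")]
memstore: List[Cell] = [("a", 5, None), ("d", 6, "d6")]  # newest edits, incl. a delete of "a"
result = list(merged_scan(hfile_1, hfile_2, memstore))
print(result)  # [('b', 'b2'), ('c', 'c3'), ('d', 'd6')]
```

Because every run is already sorted, the merge streams in a single pass, which is what makes reading files directly attractive compared with issuing per-row API calls.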
Splice Machine Architecture
1. Standard install of HBase cluster (HBase, HDFS, ZooKeeper) with Spark
2. Distribute Splice Machine JAR to each region server
3. Automatically invoke coprocessors on each region
[Diagram: each HBase Region Server runs the Splice parser, planner, optimizer and executor, and each region runs a Splice executor (snapshot isolation, indexes) as an HBase coprocessor; a co-located Spark Worker's executor runs tasks over cached RDDs; HDFS, the Spark Master, HMaster and ZooKeeper complete the cluster]
Splice Machine: Query Execution
1. Parse SQL
• Generate Abstract Syntax Tree (AST)
• Bind AST to Transactional Dictionary
2. Optimize query plan
• Determine access plan (e.g., base table, index), join order and join algorithm using cost-based statistics (e.g., cardinality estimates)
• Unroll nested subqueries
3. Generate optimal byte code
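Step 2 chooses join order using cost-based statistics such as cardinality estimates. As a toy illustration of that idea (not Splice Machine's optimizer), the sketch below enumerates left-deep join orders and keeps the one with the smallest total intermediate-result size, estimating each join's output as the product of its input sizes times the selectivity of the applicable join predicates. Table names, row counts and selectivities are hypothetical; production optimizers use dynamic programming or heuristics rather than full enumeration, and richer cost models.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Tuple


def best_join_order(
    cardinality: Dict[str, float],
    selectivity: Dict[FrozenSet[str], float],
) -> Tuple[float, Tuple[str, ...]]:
    """Return (cost, order) for the cheapest left-deep join order.

    Cost is the sum of estimated intermediate-result sizes; a pair of tables
    with no join predicate gets selectivity 1.0 (a cross product).
    """
    best = None
    for order in permutations(cardinality):
        rows = cardinality[order[0]]
        cost = 0.0
        for i, table in enumerate(order[1:], start=1):
            factor = 1.0
            for earlier in order[:i]:
                factor *= selectivity.get(frozenset((earlier, table)), 1.0)
            rows = rows * cardinality[table] * factor
            cost += rows
        if best is None or cost < best[0]:
            best = (cost, order)
    return best


cost, order = best_join_order(
    {"orders": 1_000_000, "customers": 10_000, "nation": 25},
    {
        frozenset(("orders", "customers")): 1 / 10_000,
        frozenset(("customers", "nation")): 1 / 25,
    },
)
print(order, cost)
```

With these made-up statistics, the cheapest plans join the two small tables first and bring in the large `orders` table last, avoiding the 25-million-row cross product that `orders` joined directly to `nation` would produce.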
OLTP Execution on HBase
4a. Execute OLTP query from byte code
5a. Use block cache and bloom filters to optimize data access
6a. Return results
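Step 5a relies on bloom filters to skip data files that cannot contain a requested row. A Bloom filter answers "definitely absent" or "possibly present" from a small bit array, so a point lookup can avoid reading files whose filter rules the key out. HBase keeps such filters per HFile (configured per column family). The minimal sketch below derives its bit positions from one SHA-256 digest via double hashing; sizes and file names are arbitrary assumptions, and this is a generic illustration, not HBase's implementation.

```python
import hashlib
from typing import Dict, Iterator


class BloomFilter:
    """No false negatives; false positives become rarer as num_bits grows."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> Iterator[int]:
        # Double hashing: position_i = (h1 + i * h2) mod num_bits, from one digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# One filter per hypothetical data file; only files that might hold the key are read.
file_filters: Dict[str, BloomFilter] = {}
for name, keys in {"hfile-1": [b"row-001", b"row-002"], "hfile-2": [b"row-500"]}.items():
    bf = BloomFilter(num_bits=1024, num_hashes=3)
    for k in keys:
        bf.add(k)
    file_filters[name] = bf

candidates = [name for name, bf in file_filters.items() if bf.might_contain(b"row-500")]
print(candidates)
```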
OLAP Execution on Spark
4b. Generate Spark execution plan
5b. Submit Spark plan with byte code
6b. Fair scheduling of distributed tasks
7b. Generate RDD from HFiles and Memstore
8b. Execute query and return results
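Steps 4a and 4b imply a routing decision after optimization: the same compiled plan runs either on the HBase path or on Spark. The sketch below shows one plausible way to express that decision, as a threshold on the optimizer's estimated rows scanned. The threshold value, the data class and the example queries are hypothetical and are not documented Splice Machine settings.

```python
from dataclasses import dataclass

# Hypothetical cutoff for illustration; not a documented Splice Machine default.
SPARK_ROW_THRESHOLD = 20_000


@dataclass
class PlanEstimate:
    sql: str
    estimated_rows_scanned: int  # produced by the cost-based optimizer (step 2)


def choose_engine(plan: PlanEstimate, threshold: int = SPARK_ROW_THRESHOLD) -> str:
    """Route small, selective (OLTP-style) plans to HBase, large scans to Spark."""
    return "spark" if plan.estimated_rows_scanned > threshold else "hbase"


point_lookup = PlanEstimate("SELECT * FROM orders WHERE id = 42", 1)
full_report = PlanEstimate("SELECT region, SUM(total) FROM orders GROUP BY region", 5_000_000)
print(choose_engine(point_lookup), choose_engine(full_report))  # hbase spark
```

Keeping this decision in the optimizer rather than in application code is what lets one SQL interface serve both workloads.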
Isolated Resource Management
Isolate Spark & HBase resources through Linux cgroups
Configurable Spark Resource Management
Prioritize Spark resources among Query, Admin & Import jobs
Custom resource pools through XML
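Assuming the custom resource pools follow Apache Spark's fair-scheduler allocation file (the XML file named by `spark.scheduler.allocation.file`, used when `spark.scheduler.mode` is `FAIR`), each pool has a scheduling mode, a weight and a minimum share. The sketch below parses such a file with the Python standard library, applies Spark's documented defaults (FIFO, weight 1, minShare 0) for missing settings, and reports each pool's weight as a fraction of the total, which roughly approximates its share of resources when all pools are busy and their minimum shares are met. The pool names and numbers are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical pools in Spark's fair-scheduler allocation-file format.
ALLOCATIONS_XML = """
<allocations>
  <pool name="query">
    <schedulingMode>FAIR</schedulingMode>
    <weight>3</weight>
    <minShare>4</minShare>
  </pool>
  <pool name="import">
    <schedulingMode>FIFO</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
  <pool name="admin">
    <weight>1</weight>
  </pool>
</allocations>
"""


def load_pools(xml_text: str) -> Dict[str, dict]:
    """Read each pool, applying Spark's defaults for missing settings."""
    pools = {}
    for pool in ET.fromstring(xml_text.strip()).iter("pool"):
        pools[pool.get("name")] = {
            "mode": pool.findtext("schedulingMode", "FIFO"),
            "weight": int(pool.findtext("weight", "1")),
            "min_share": int(pool.findtext("minShare", "0")),
        }
    return pools


def weight_fractions(pools: Dict[str, dict]) -> Dict[str, float]:
    total = sum(p["weight"] for p in pools.values())
    return {name: p["weight"] / total for name, p in pools.items()}


pools = load_pools(ALLOCATIONS_XML)
print(pools["admin"])  # {'mode': 'FIFO', 'weight': 1, 'min_share': 0}
print(weight_fractions(pools))  # {'query': 0.6, 'import': 0.2, 'admin': 0.2}
```

In Spark itself, a job is assigned to a pool by setting the `spark.scheduler.pool` local property on the SparkContext before submitting it.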
Spark Query Management
• Visualization of active and completed queries
• Visualization of stages for each query, plus kill function
• Visualization of stages for query plan, plus kill function
• Detailed metrics for tasks in each stage
Working With External Data and Compute Engines
Virtual Table Interface (VTI)
• Execute federated queries against external files, libraries or databases
• External Databases: use JDBC to access data in DBs such as Oracle and DB2
• External Libraries: access over 140 Spark libraries for machine learning and streaming
• External Files: pre-defined or dynamic schema; access local FS, HDFS, AWS S3
• Sample query:
MapReduce I/O Formats
• Accept federated queries from MapReduce, Pig, and Hive
• Register Splice Machine schema in HCatalog
• Merge structured (Splice) and unstructured data in ad hoc queries
• Seamless integration with the Hadoop ecosystem
ANSI SQL-99+ Coverage
• Data types – e.g., INTEGER, REAL, CHARACTER, DATE, BOOLEAN, BIGINT
• DDL – e.g., CREATE TABLE, CREATE SCHEMA, ALTER TABLE
• Predicates – e.g., IN, BETWEEN, LIKE, EXISTS
• DML – e.g., INSERT, DELETE, UPDATE, SELECT
• Query specification – e.g., GROUP BY, HAVING
• Set operations – e.g., UNION, UNION ALL, INTERSECT, EXCEPT
• Numeric functions – e.g., ABS, MOD
• Aggregation functions – e.g., AVG, MAX, COUNT
• String functions – e.g., SUBSTRING, concatenation, UPPER, LOWER, TRIM, LENGTH
• Constraints – e.g., PRIMARY KEY, CHECK, FOREIGN KEY, UNIQUE, NOT NULL
• Conditional expressions – e.g., simple CASE, searched CASE
• Privileges – e.g., privileges for SELECT, DELETE, INSERT, EXECUTE
• Joins – e.g., INNER JOIN, LEFT OUTER JOIN
• Transactions – e.g., COMMIT, ROLLBACK, Snapshot Isolation
• Sub-queries
• Triggers
• User-defined functions (UDFs)
• Views – including grouped views
• Window functions – e.g., FIRST_VALUE, LAST_VALUE, LEAD, LAG
High Concurrency, ACID Transactions
Required to support OLTP applications
share_quantity versions (timestamp: value): T12: 4,000 · T7: 2,000 · T3: 5,000 · T1: 3,000
share_price versions (timestamp: value): T7: $15.11 · T5: $15.65 · T2: $15.74 · T0: $15.27
A transaction @T6 reads a “virtual” snapshot: the newest version of each value at or before T6.
value_held = share_quantity * share_price
@T6: value_held = 5,000 * $15.65 = $78,250 (versions T3 and T5)
@T3: value_held = 5,000 * $15.74 = $78,700 (versions T3 and T2)
• State-of-the-art, distributed snapshot isolation
• A form of Multi-Version Concurrency Control (MVCC)
• Writers do not block readers
• Fast, high concurrency
• Delivers performance for small reads/writes & batch loads
• Extends research from Google Percolator & Yahoo Labs
• Patent-pending technology
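The share example can be reproduced with a minimal multi-version store: each write adds a (timestamp, value) version instead of overwriting, and a reader with snapshot timestamp T sees, for each key, the newest version committed at or before T. This is a simplified sketch of snapshot reads only (no write-write conflict detection, no in-flight transactions, prices held as integer cents to avoid float rounding); it is not Splice Machine's implementation.

```python
import bisect
from typing import Dict, List, Tuple


class MVCCStore:
    """Keeps every committed version of each key instead of overwriting it."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Tuple[int, int]]] = {}

    def write(self, key: str, commit_ts: int, value: int) -> None:
        # Versions stay sorted by commit timestamp.
        bisect.insort(self.versions.setdefault(key, []), (commit_ts, value))

    def read(self, key: str, snapshot_ts: int) -> int:
        """Return the newest version committed at or before snapshot_ts."""
        versions = self.versions.get(key, [])
        idx = bisect.bisect_right(versions, (snapshot_ts, float("inf"))) - 1
        if idx < 0:
            raise KeyError(f"{key} has no version visible at T{snapshot_ts}")
        return versions[idx][1]


store = MVCCStore()
for ts, qty in [(1, 3_000), (3, 5_000), (7, 2_000), (12, 4_000)]:
    store.write("share_quantity", ts, qty)
for ts, cents in [(0, 1_527), (2, 1_574), (5, 1_565), (7, 1_511)]:
    store.write("share_price", ts, cents)  # prices held as integer cents


def value_held_cents(snapshot_ts: int) -> int:
    return store.read("share_quantity", snapshot_ts) * store.read("share_price", snapshot_ts)


print(value_held_cents(6), value_held_cents(3))  # 7825000 7870000
```

A reader at T6 keeps getting the same answer even after later writes commit, which is the "writers do not block readers" property; a full MVCC engine must additionally track uncommitted writes and detect write-write conflicts.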
BI and SQL tool support via ODBC/JDBC
No application rewrites needed
Open Source
Features | Community Edition | Enterprise Edition
Scale-out Architecture, ANSI SQL & Concurrent ACID Transactions | ✓ | ✓
OLAP and OLTP Resource Isolation | ✓ | ✓
Distributed In-Memory Joins, Aggregations, Scans and Groupings | ✓ | ✓
Cost-Based Statistics, Query Optimizer, Management Console | ✓ | ✓
Compaction Optimization | ✓ | ✓
Apache Kafka-enabled Streaming | ✓ | ✓
Virtual Table Interfaces | ✓ | ✓
New Releases and Maintenance Updates | ✓ | ✓
Tutorials, Forums, Videos, Documentation, Community Support | ✓ | ✓
Backup and Restore, Column Access Control | | ✓
Encryption, Kerberos, LDAP Support | | ✓
24/7 Support via Web and Phone | | ✓
Complimentary Account Management Services | | ✓
Try it at scale immediately on AWS Sandbox
• 5-click sandbox
• Cluster has full system deployed
• SSH for CLI
• URL to management consoles
• Open SQL connection on any node
• Customize template
Community
• Slack channel - #splicecommunity
• Video and code tutorials
• GitHub
Advisory Board
Advisory Board includes luminaries in databases and technology
Roger Bamford
Former Principal Architect at Oracle
Father of Oracle RAC
Mike Franklin
Chair, Dept. of Computer Science, UChicago
Director, UC Berkeley AMPLab, where Apache Spark was created
Marie-Anne Neimat
Co-Founder, TimesTen Database
Former VP, Database Eng. at Oracle
Ken Rudin
Head of Growth and Analytics for Google Search
Head of Analytics at Facebook
Abhinav Gupta
Co-Founder, Rocket Fuel
Runs 15PB HBase Cluster
WE ARE HIRING
Seasoned Team
Monte Zweben, Co-Founder & Chief Executive Officer
John Leach, Co-Founder & Chief Technology Officer
St. Louis Hadoop User Group
Krishnan Parasuraman, VP of Sales and Business Development
Eran Pilovsky, Chief Financial Officer
Gene Davis, Co-Founder & VP of Products & Operations
Eric Kalabacos, VP of Customer Solutions
Next Steps
Try Us!
splicemachine.com/get-started
GitHub • Tutorials • Sandbox
Powering Real-Time
Applications & Analytics
Enabling Decisions in the Moment
October 20, 2016

October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and Spark

Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
 

Mais de Yahoo Developer Network (20)

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
 
CICD at Oath using Screwdriver
CICD at Oath using ScrewdriverCICD at Oath using Screwdriver
CICD at Oath using Screwdriver
 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Último

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Último (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and Spark 

  • 1. Splice Machine Proprietary and Confidential. Open Source RDBMS for Mixed Operational and Analytical Workloads. Monte Zweben, October 20, 2016
  • 2. Who We Are: The Open Source RDBMS Powered by Hadoop & Spark. Themes: SQL, scale out, speed. ANSI SQL: no retraining or rewrites for SQL-based analysts, reports, and applications. ¼ the cost: scales out on commodity hardware. Transactions: ensure reliable updates across multiple rows. Mixed workloads: simultaneously support OLTP and OLAP workloads. Elastic: increase scale in just a few minutes. 10x faster: leverages Spark in-memory technology.
  • 3. Decisions in the Moment: life sciences, digital marketing, financial services, supply chain optimization.
  • 4. Today’s Reality: Stale Data, Backward-Looking Decisions. How old is the data in your reports? Options: 1 day+; 1 day; 4 hours+; 1 hour+; real-time.
  • 5. Today’s Reality: Stale Data, Backward-Looking Decisions. How old is the data in your reports? (1 day+; 1 day; 4 hours+; 1 hour+; real-time.) Responses: 24%, 50%, 7%, 9%, 9%. Source: webinars on 11-3-15 and 12-10-15, 237 respondents.
  • 6. Legacy ETL Architectures Unable to Keep Up. Diagram: OLTP systems (ERP, CRM, supply chain, HR, …) feed an ODS and, through ETL (extract, transform, load), a data warehouse and datamarts (OLAP systems) that serve operational reports, executive business reports, and ad hoc analytics; stream or batch updates; mixed-workload apps. Pain: separate OLTP & OLAP systems; messy ETL “glue”. Why? Different workloads; different data structures; hard to isolate workloads. No longer adequate: can’t afford to wait days or hours to analyze data.
  • 7. Recent Approach: Lambda Architecture. Complex to set up and maintain. Diagram: speed layer, batch layer, serving layer; the developer integrates specialized compute engines.
  • 8. New Approach: Lambda-in-a-Box Architecture. Easy to use with SQL. Diagram: speed layer, batch layer, serving layer; the SQL optimizer selects pre-integrated compute engines.
  • 9. Simultaneous OLTP & OLAP Workloads. A unique dual-engine architecture isolates workloads. Diagram: traditional RDBMSs run OLTP and OLAP on one engine (bottlenecks, delays); Splice Machine runs OLTP on the HBase engine and OLAP on the Spark engine (workload isolation).
  • 10. Simultaneous OLTP & OLAP Workloads. A unique dual-engine architecture isolates workloads. Charts of OLTP response time against OLAP load: in traditional RDBMSs, OLTP response times increase as OLAP load rises; in Splice Machine, OLTP response times remain flat as OLAP load rises.
  • 11. Power Old and New Applications
  • 12. Proven Building Blocks: Spark, Hadoop and Derby. Apache Derby: ANSI SQL-99 RDBMS; Java-based; ODBC/JDBC compliant. Apache HBase/Hadoop: auto-sharding; high availability; scalability to 100s of PBs. Apache Spark: analytical engine; fast, in-memory technology; memory resilient to node failure.
  • 13. HBase: Proven Scale-Out. Auto-sharding; scales with commodity hardware; cost-effective from GBs to PBs; high availability through failover and replication; LSM-trees.
  • 14. Apache Spark. Unmatched performance: fastest sort of 1 PB of data. Advanced in-memory technology: spill-to-disk for large datasets; resilient against node failures; pipelining for computation parallelism. Most active Apache community: almost 1,000 contributors. Extensive libraries: over 140 and growing, including libraries for machine learning, streaming and graph processing.
  • 15. Splice Machine: Advanced Spark Integration. Innovative, high-performance RDD creation: fast access to HFiles in HDFS, merged with deltas from the Memstore, avoiding the slower HBase API. Universal execution plan and byte code: the optimizer, plan and code are shared across Spark or HBase execution. Diagram: on each physical node, an HBase region server hosts regions 1…N, each with a Memstore and HFiles in HDFS; a co-located Spark worker builds RDDs 1…N from those HFiles.
  • 16. Splice Machine Architecture. 1. Standard install of an HBase cluster (HBase, HDFS, ZooKeeper) with Spark. 2. Distribute the Splice Machine JAR to each region server. 3. Automatically invoke co-processors on each region. Diagram: HMaster, ZooKeeper and the Spark master; on each node, an HBase region server runs the Splice parser, planner, optimizer and executor (snapshot isolation, indexes) as HBase co-processors on its regions, next to a Spark worker with executors, tasks, cache and RDDs, all over HDFS.
  • 17. Splice Machine: Query Execution
  • 18. Splice Machine: Query Execution. 1. Parse SQL: generate an abstract syntax tree (AST); bind the AST to the transactional dictionary.
  • 19. Splice Machine: Query Execution. 1. Parse SQL. 2. Optimize query plan: determine the access plan (e.g., base table or index), join order and join algorithm using cost-based statistics (e.g., cardinality estimates); unroll nested subqueries.
  • 20. Splice Machine: Query Execution. 1. Parse SQL. 2. Optimize query plan. 3. Generate optimal byte code.
  • 21. Splice Machine: Query Execution. 1. Parse SQL. 2. Optimize query plan. 3. Generate optimal byte code. OLTP execution on HBase: 4a. Execute the OLTP query from byte code. 5a. Use the block cache and Bloom filters to optimize data access. 6a. Return results.
  • 22. Splice Machine: Query Execution. 1. Parse SQL. 2. Optimize query plan. 3. Generate optimal byte code. OLTP execution on HBase: 4a. Execute the OLTP query from byte code. 5a. Use the block cache and Bloom filters to optimize data access. 6a. Return results. OLAP execution on Spark: 4b. Generate the Spark execution plan. 5b. Submit the Spark plan with byte code. 6b. Fair scheduling of distributed tasks. 7b. Generate RDDs from HFiles and the Memstore. 8b. Execute the query and return results.
  • 23. Isolated Resource Management. Isolate Spark & HBase resources through Linux cgroups.
  • 24. Isolated Resource Management. Isolate Spark & HBase resources through Linux cgroups.
  • 25. Configurable Spark Resource Management. Prioritize Spark resources between query, admin & import jobs; custom resource pools through XML.
  • 26. Spark Query Management. Visualization of active and completed queries.
  • 27. Spark Query Management (cont’d). Visualization of stages for each query, plus a kill function.
  • 28. Spark Query Management (cont’d). Visualization of stages for the query plan, plus a kill function.
  • 29. Spark Query Management (cont’d). Detailed metrics for tasks in each stage.
  • 30. Spark Query Management (cont’d)
  • 31. Working with External Data and Compute Engines. Virtual Table Interface (VTI): execute federated queries against external files, libraries or databases. External databases: use JDBC to access data in databases such as Oracle and DB2. External libraries: access over 140 Spark libraries for machine learning and streaming. External files: pre-defined or dynamic schema; access the local file system, HDFS, and AWS S3. MapReduce I/O formats: accept federated queries from MapReduce, Pig, and Hive; register the Splice Machine schema in HCatalog; merge structured (Splice) and unstructured data in ad hoc queries; seamless integration with the Hadoop ecosystem.
  • 32. ANSI SQL-99+ Coverage. Data types: e.g., INTEGER, REAL, CHARACTER, DATE, BOOLEAN, BIGINT. DDL: e.g., CREATE TABLE, CREATE SCHEMA, ALTER TABLE, DELETE, UPDATE TABLE. Predicates: e.g., IN, BETWEEN, LIKE, EXISTS. DML: e.g., INSERT, DELETE, UPDATE, SELECT. Query specification: e.g., GROUP BY, HAVING. SET functions: e.g., UNION, ABS, MOD, ALL, INTERSECT, EXCEPT. Aggregation functions: e.g., AVG, MAX, COUNT. String functions: e.g., SUBSTRING, concatenation, UPPER, LOWER, TRIM, LENGTH. Constraints: e.g., PRIMARY KEY, CHECK, FOREIGN KEY, UNIQUE, NOT NULL. Conditional functions: e.g., CASE, searched CASE. Privileges: e.g., privileges for SELECT, DELETE, INSERT, EXECUTE. Joins: e.g., INNER JOIN, LEFT OUTER JOIN. Transactions: e.g., COMMIT, ROLLBACK, snapshot isolation. Sub-queries. Triggers. User-defined functions (UDFs). Views, including grouped views. Window functions: e.g., FIRST_VALUE, LAST_VALUE, LEAD, LAG.
  • 33. High Concurrency, ACID Transactions: required to support OLTP applications. Diagram: each column keeps timestamped versions. share_quantity: T12 = 4,000; T7 = 2,000; T3 = 5,000; T1 = 3,000. share_price: T7 = $15.11; T5 = $15.65; T2 = $15.74; T0 = $15.27. A transaction at T6 reads a “virtual” snapshot made of the newest versions at or before T6 (share_quantity from T3, share_price from T5). With value_held = share_quantity * share_price: @T6, value_held = 5,000 * $15.65; @T3, value_held = 5,000 * $15.74. State-of-the-art, distributed snapshot isolation; a form of Multi-Version Concurrency Control (MVCC); writers do not block readers; fast, high concurrency; delivers performance for small reads/writes and batch loads; extends research from Google Percolator & Yahoo Labs; patent-pending technology.
  • 34. BI and SQL Tool Support via ODBC/JDBC. No application rewrites needed.
  • 35. Open Source. Features in both the Community Edition and the Enterprise Edition: scale-out architecture, ANSI SQL & concurrent ACID transactions; OLAP and OLTP resource isolation; distributed in-memory joins, aggregations, scans and groupings; cost-based statistics, query optimizer, management console; compaction optimization; Apache Kafka-enabled streaming; Virtual Table Interfaces; new releases and maintenance updates; tutorials, forums, videos, documentation, community support. Enterprise Edition only: backup and restore, column access control; encryption, Kerberos, LDAP support; 24/7 support via web and phone; complimentary account management services.
  • 36. Try It at Scale Immediately on the AWS Sandbox. 5-click sandbox; the cluster has the full system deployed; SSH for the CLI; URL to the management consoles; open a SQL connection on any node; customize the template.
  • 37. Community. Slack channel #splicecommunity; video and code tutorials; GitHub.
  • 38. Splice Machine Proprietary and Confidential Advisory Board 41 Advisory Board includes luminaries in databases and technology Roger Bamford Former Principal Architect at Oracle Father of Oracle RAC Mike Franklin Chair,Dept of Computer Science, UChicago Director, UC Berkeley AMPLab Founder of Apache Spark Marie-Anne Neimat Co-Founder, Times-Ten Database Former VP, Database Eng. at Oracle Ken Rudin Head of Growth and Analytics for Google Search Head of Analytics at Facebook Abhinav Gupta Co-Founder, Rocket Fuel Runs 15PB HBase Cluster
  • 39. WE ARE HIRING
  • 40. Seasoned Team
    - Monte Zweben: Co-Founder & Chief Executive Officer
    - John Leach: Co-Founder & Chief Technology Officer; St. Louis Hadoop User Group
    - Krishnan Parasuraman: VP of Sales and Business Development
    - Eran Pilovsky: Chief Financial Officer
    - Gene Davis: Co-Founder & VP of Products & Operations
    - Eric Kalabacos: VP of Customer Solutions
  • 41. Next Steps: Try Us! splicemachine.com/get-started (GitHub • Tutorials • Sandbox)
  • 42. Powering Real-Time Applications & Analytics: Enabling Decisions in the Moment (October 20, 2016)

Editor's Notes

  1. The Hadoop RDBMS is designed to scale out from a single server to thousands of machines with a high degree of fault tolerance. Rather than relying on high-end hardware, Splice Machine uses the scale-out and high availability of Hadoop, proven in production clusters of dozens of petabytes at large-scale leaders like Yahoo, Facebook, and Twitter. The Hadoop RDBMS benefits include:
     - Affordability: scale out using commodity hardware
     - Elasticity: expand or scale back easily
     - Transactional: execute real-time updates and ACID transactions
     - ANSI SQL: leverage existing SQL code, tools, and skills
     - Flexibility: support both operational and analytical workloads
     Note: SQL (Structured Query Language) is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS).
  2. Splice Machine has focused on the orange blocks, the middle of the stack, to maximize the value of our R&D investment:
     - Our parallelization engine to execute SQL
     - Secondary indexes
     - Join strategies
     - Query optimizers
     - Performance
     - High concurrency, lockless programming
     Derby's database engine is a full-featured, embedded relational database engine that supports JDBC and SQL as programming APIs and uses IBM DB2 SQL syntax. History of Apache Derby:
     - It originated at Cloudscape Inc., an Oakland, California, start-up founded in 1996.
     - In 1999, Informix Software, Inc. acquired Cloudscape, Inc.
     - In 2001, IBM acquired the database assets of Informix Software, including Cloudscape.
     - In August 2004, IBM contributed the code to the Apache Software Foundation as Derby.
  3. HBase is a "distributed, versioned, non-relational database modeled after Google's Bigtable, a distributed storage system for structured data". HBase can scale to handle very high throughput.
     - Splice Machine fully leverages HBase as a storage engine for horizontal scale-out.
     - Auto-sharding in HBase is based on regions, which are assigned to region servers; HBase balances and rebalances regions across servers.
     - Recall that regions are ranges of rowkeys in sorted order.
     - The RDBMS primary key is stored as the HBase rowkey, which gives fast single-row selects and fast range-based scans.
     - Dense secondary indexes are stored in separate tables.
     Write pipeline: writes go to the WAL (write-ahead log, not shown) first for durability, then to the memstore; memstores are eventually flushed to disk as storefiles.
     Read pipeline: blocks are read from storefiles and memstores and cached in the block cache (not shown).
     Remember that HDFS files are write-once: storefiles are written and never updated, so updates are really inserts of new versions (upserts).
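The region routing and write/read pipelines described in note 3 can be modelled as a toy in Python. This is a sketch under stated assumptions, not the HBase client API: `Region`, `Table`, `put`, `flush` and `get` are hypothetical names, the WAL is just an in-memory list, and region servers, the block cache, and cell timestamps are left out.

```python
from bisect import bisect_right


class Region:
    """Toy region (hypothetical): owns rowkeys from start_key up to the next region's start."""

    def __init__(self, start_key):
        self.start_key = start_key
        self.wal = []         # append-only write-ahead log, written first for durability
        self.memstore = {}    # in-memory buffer of recent writes
        self.storefiles = []  # immutable flushed files, oldest first

    def put(self, rowkey, value):
        self.wal.append((rowkey, value))  # 1. log the write
        self.memstore[rowkey] = value     # 2. buffer it in memory

    def flush(self):
        # 3. persist the memstore as a new sorted storefile; old files are never rewritten
        self.storefiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, rowkey):
        # The memstore holds the newest data, then storefiles from newest to oldest.
        if rowkey in self.memstore:
            return self.memstore[rowkey]
        for storefile in reversed(self.storefiles):
            if rowkey in storefile:
                return storefile[rowkey]
        return None


class Table:
    """Routes each rowkey (the RDBMS primary key) to the region whose range contains it."""

    def __init__(self, split_keys):
        self.start_keys = [""] + sorted(split_keys)  # "" sorts before every rowkey
        self.regions = [Region(key) for key in self.start_keys]

    def region_for(self, rowkey):
        # Sorted start keys make routing a binary search.
        return self.regions[bisect_right(self.start_keys, rowkey) - 1]

    def put(self, rowkey, value):
        self.region_for(rowkey).put(rowkey, value)

    def get(self, rowkey):
        return self.region_for(rowkey).get(rowkey)


table = Table(split_keys=["g", "p"])
table.put("apple", 1)
table.put("kiwi", 2)
table.region_for("kiwi").flush()
table.put("kiwi", 20)                        # an "update" is just a newer write
print(table.get("kiwi"))                     # 20
print(table.region_for("kiwi").storefiles)   # [{'kiwi': 2}]
```

The old value 2 stays untouched in the flushed storefile while the newer 20 shadows it from the memstore, mirroring the "updates are really inserts" point; real HBase later cleans up shadowed versions through compaction.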
  4. Key points:
     - A theme in distributed computing is moving the code to where the data is, because the data is big.
     - Splice Machine has its own query execution and task parallelization engine (the secret sauce); it is not based on MapReduce.
     - Predicates are pushed down to the shards and applied locally; each region executes local HBase operations.
     - Results are returned to the controlling node and "spliced together", hence the "splice" in the company name.
     - Serialization: a highly compressed storage format for table rows, with Snappy compression, reduces network traffic.
     - Join strategies: nested loop, sort-merge, merge, and broadcast.
     - Relies on HBase coprocessors: endpoints and observers.
     - Performance: speed of the read and asynchronous write pipelines, 20 ms for reads and 30 ms for writes.
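The predicate pushdown and "splice together" flow in note 4 can be sketched as a scatter-gather in Python. This is only an illustration under assumptions: a thread pool stands in for the region servers (Splice Machine itself relies on HBase coprocessors), and the row data plus `scan_region`, `distributed_select` and `distributed_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each region holds a slice of one table's rows.
REGIONS = [
    [{"id": 1, "state": "CA", "amount": 120}, {"id": 2, "state": "NY", "amount": 80}],
    [{"id": 3, "state": "CA", "amount": 45}, {"id": 4, "state": "TX", "amount": 300}],
    [{"id": 5, "state": "CA", "amount": 60}],
]


def scan_region(rows, predicate, columns):
    """Runs next to the data: apply the predicate and projection locally,
    so only matching rows and requested columns travel back."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


def distributed_select(regions, predicate, columns):
    """Scatter the scan to every region in parallel, then splice the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda rows: scan_region(rows, predicate, columns), regions))
    return [row for partial in partials for row in partial]


def distributed_sum(regions, predicate, column):
    """Aggregation variant: each region ships back one partial sum instead of its rows."""
    with ThreadPoolExecutor() as pool:
        partial_sums = list(pool.map(
            lambda rows: sum(row[column] for row in rows if predicate(row)), regions))
    return sum(partial_sums)


def in_california(row):
    return row["state"] == "CA"


print(distributed_select(REGIONS, in_california, ["id", "amount"]))
# [{'id': 1, 'amount': 120}, {'id': 3, 'amount': 45}, {'id': 5, 'amount': 60}]
print(distributed_sum(REGIONS, in_california, "amount"))  # 225
```

The aggregation variant shows why pushdown reduces network traffic: each region returns a single number rather than every matching row.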
  5. This is a big area of focus for us. Transactions are based on MVCC "snapshot isolation" and on timestamps; lockless operation is key here, and this area is patent pending.
     - Transaction C will see changes from transaction A.
     - Transaction C won't see changes from transaction B.
     Here is an explanation of what Figure 1 depicts: Transaction T1 increases the Qty for A by 10 twice, then commits at time t6. At the commit, A's Qty is 30, which is now visible to other transactions. Note, however, that T2 started at time t4, before T1's commit, so its value for A is still 10; thus when it computes C = A + 10, the result is 20. T3 starts at t7, overlapping T2, and attempts to update B just as T2 did, resulting in a write-write conflict. T3 rolls back and is reissued as T3', which succeeds using the value previously committed by T2.
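The Figure 1 walkthrough can be replayed against a toy snapshot-isolation store in Python. This is a hypothetical sketch, not Splice Machine's patented design: `Store`, `Txn` and `WriteConflict` are illustrative names, a single counter hands out both start and commit timestamps (so the numbers differ from the t-values in the figure), and the conflict rule is an assumption: a write fails immediately, instead of waiting, if another transaction holds an uncommitted write on the same key or committed one after this transaction started. The new values written to B are illustrative.

```python
import itertools


class WriteConflict(Exception):
    """Raised when two overlapping transactions write the same key."""


class Store:
    """Toy snapshot-isolation store (hypothetical sketch)."""

    def __init__(self):
        self.clock = itertools.count(1)  # single source of start and commit timestamps
        self.versions = {}               # key -> [(commit_ts, value), ...] ascending
        self.pending = {}                # key -> transaction with an uncommitted write

    def begin(self):
        return Txn(self, next(self.clock))


class Txn:
    def __init__(self, store, start_ts):
        self.store, self.start_ts, self.writes = store, start_ts, {}

    def read(self, key):
        """Readers never wait: own write first, else newest version committed by start_ts."""
        if key in self.writes:
            return self.writes[key]
        visible = [v for ts, v in self.store.versions.get(key, []) if ts <= self.start_ts]
        return visible[-1] if visible else None

    def write(self, key, value):
        """Fail fast (no blocking) if an overlapping transaction also wrote this key."""
        other = self.store.pending.get(key)
        newer = any(ts > self.start_ts for ts, _ in self.store.versions.get(key, []))
        if (other is not None and other is not self) or newer:
            self.rollback()
            raise WriteConflict(key)
        self.store.pending[key] = self
        self.writes[key] = value

    def commit(self):
        commit_ts = next(self.store.clock)
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))
            del self.store.pending[key]
        self.writes = {}
        return commit_ts

    def rollback(self):
        for key in self.writes:
            if self.store.pending.get(key) is self:
                del self.store.pending[key]
        self.writes = {}


store = Store()
seed = store.begin()
seed.write("A", 10)
seed.write("B", 0)
seed.commit()

t1 = store.begin()
t1.write("A", t1.read("A") + 10)
t1.write("A", t1.read("A") + 10)     # A = 30 inside T1
t2 = store.begin()                   # starts before T1 commits
t1.commit()

c = t2.read("A") + 10                # T2's snapshot still has A = 10, so C = 20
t2.write("C", c)
t2.write("B", 1)

t3 = store.begin()                   # overlaps T2
try:
    t3.write("B", 2)
    t3_conflicted = False
except WriteConflict:
    t3_conflicted = True             # T3 is rolled back
t2.commit()

t3_retry = store.begin()             # T3' starts after T2 commits and sees its B
t3_retry.write("B", t3_retry.read("B") + 1)
t3_retry.commit()

print(c, t3_conflicted, store.begin().read("A"), store.begin().read("B"))  # 20 True 30 2
```

Readers only filter versions by timestamp and never consult `pending`, so they are never delayed by writers; conflicting writers are detected and rolled back rather than queued.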