Securely explore your data
PERFORMANCE MODELS
FOR APACHE ACCUMULO:
THE HEAVY TAIL OF A SHARED-NOTHING ARCHITECTURE
Chris McCubbin
Director of Data Science
Sqrrl Data, Inc.
TODAY’S TALK
1.  Quick intro to performance optimization
2.  Techniques for targeted, model-driven performance
improvement of distributed applications
3.  A deep dive into improving bulk load application
performance
4.  A shallow dive into partial schemas
©2014 Sqrrl Data, Inc
SO, YOUR DISTRIBUTED
APPLICATION IS SLOW
•  Today’s distributed applications run on tens or
hundreds of library components
•  Many versions so internet advice could be
ineffective, or worse, flat out wrong
•  Hundreds of settings
•  Some, shall we say, could be better documented
•  Shared-nothing architectures are usually
“shared-little” architectures with tricky
interactions
•  Profiling is hard and time-consuming
ROUND UP THE ‘USUAL
SUSPECTS’?
•  “Common knowledge” that some things can cause
performance issues
•  Too much network usage
•  Disk Bound
•  Stragglers
•  Framework settings
•  Unbalanced distribution
•  SerDe
•  This might be a good start, but we really want to
focus on the biggest problem if we can
•  Technology, installations and use cases have high
variability: what works for one job on one cluster may
be useless on another
PERFORMANCE ANALYSIS CYCLE
[Figure: performance analysis cycle. Start: Create Model. Steps in the cycle: Simulate & Experiment, Analyze, Modify Code, Refine Model. Outputs: Better Code + Models.]
MAKING A MODEL
•  Identify points where metrics can be collected with low impact
•  Add instrumentation if needed
•  Create parallel state machine models with
components driven by these metrics
•  Estimate running times and bottlenecks from
a-priori information and/or apply measured
statistics
•  Focus testing on validation of the initial
model and the (estimated) pain points
•  Apply Amdahl’s Law
•  Rinse, repeat
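As a reminder of what applying Amdahl's Law looks like in practice, here is a minimal sketch. The function name and the 70% figure are illustrative assumptions, not numbers from the talk.

```python
def amdahl_speedup(fraction_improved: float, local_speedup: float) -> float:
    """Overall speedup when `fraction_improved` of the run time
    becomes `local_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


# Hypothetical job in which one stage accounts for 70% of the run time.
for stage_speedup in (2, 4, 1000):
    overall = amdahl_speedup(0.7, stage_speedup)
    print(f"stage {stage_speedup}x faster -> job {overall:.2f}x faster")
```

However fast the stage gets, the job as a whole cannot exceed 1 / (1 - 0.7), which is why the model is re-checked for the next bottleneck after every change.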
The Apache Accumulo™ sorted, distributed key/value store is a secure, robust,
scalable, high performance data storage and retrieval system.
•  Many applications in real-time storage and analysis of “big data”:
•  Spatio-temporal indexing in non-relational distributed databases - Fox et al
2013 IEEE International Congress on Big Data
•  Big Data Dimensional Analysis - Gadepally et al IEEE HPEC 2014
•  Leading its peers in performance and scalability:
•  Achieving 100,000,000 database inserts per second using Accumulo and
D4M - Kepner et al IEEE HPEC 2014
•  An NSA Big Graph experiment (Technical Report NSA-RD-2013-056002v1)
•  Benchmarking Apache Accumulo BigData Distributed Table Store Using Its
Continuous Test Suite - Sen et al 2013 IEEE International Congress on Big
Data
For more papers and presentations, see http://accumulo.apache.org/papers.html
SCALING UP: DIVIDE & CONQUER
•  Collections of KV pairs form Tables
•  Tables are partitioned into Tablets
•  Metadata tablets hold info about
other tablets, forming a 3-level
hierarchy
•  A Tablet is a unit of work for a
Tablet Server
[Figure: tablet hierarchy]
Well-Known Location (zookeeper)
    Root Tablet: -∞ to ∞
        Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
        Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞
Table: Adam’s Table
    Data Tablet: -∞ : thing
    Data Tablet: thing : ∞
Table: Encyclopedia
    Data Tablet: -∞ : Ocelot
    Data Tablet: Ocelot : Yak
    Data Tablet: Yak : ∞
Table: Foo
    Data Tablet: -∞ to ∞
BULK INGEST OVERVIEW
•  Accumulo supports two mechanisms to bring
data in: streaming ingest and bulk ingest.
•  Bulk Ingest
•  Goal: maximize throughput without constraining
latency.
•  Create a set of Accumulo RFiles by some means,
then register those files with Accumulo.
•  RFiles are groups of sorted key-value pairs with
some indexing information
•  MapReduce has a built-in key sorting phase: a good
fit to produce RFiles
BULK INGEST MODEL
[Figure: timeline of three sequential phases: Map, Reduce, Register]
BULK INGEST MODEL
Hypothetical Resource Usage
Phase       CPU     Disk    Network   Time
Map         100%    20%     0%        46 seconds
Reduce      40%     100%    20%       168 seconds
Register    10%     20%     40%       17 seconds
INSIGHT
•  Spare disk here, spare CPU there – can we even out resource consumption?
•  Why did reduce take 168 seconds? It should be more like 40 seconds.
•  No clear bottleneck during registration – is there a synchronization or
serialization problem?
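The reasoning on this slide can be sketched in a few lines of code. The utilization and timing figures are the hypothetical ones from the resource-usage slide. The `Phase` class, the 95% saturation threshold, and the helper names are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Phase:
    name: str
    seconds: float
    utilization: Dict[str, float]  # resource -> fraction of capacity in use


# Hypothetical figures from the resource-usage slide.
phases = [
    Phase("map", 46, {"cpu": 1.00, "disk": 0.20, "network": 0.00}),
    Phase("reduce", 168, {"cpu": 0.40, "disk": 1.00, "network": 0.20}),
    Phase("register", 17, {"cpu": 0.10, "disk": 0.20, "network": 0.40}),
]


def bottleneck(phase: Phase, threshold: float = 0.95) -> Optional[str]:
    """Return the saturated resource, or None if nothing is near capacity."""
    resource, used = max(phase.utilization.items(), key=lambda item: item[1])
    return resource if used >= threshold else None


def total_seconds(all_phases: List[Phase], overrides: Optional[Dict[str, float]] = None) -> float:
    """Sum the phase times, optionally substituting what-if durations."""
    overrides = overrides or {}
    return sum(overrides.get(p.name, p.seconds) for p in all_phases)


baseline = total_seconds(phases)                 # 46 + 168 + 17 = 231
what_if = total_seconds(phases, {"reduce": 40})  # 46 + 40 + 17 = 103

for p in phases:
    print(f"{p.name}: bottleneck = {bottleneck(p)}")
print(f"baseline {baseline}s, reduce at 40s gives {what_if}s")
```

A phase with no saturated resource, such as registration here, is the kind of result that points toward a synchronization or serialization problem rather than a hardware limit.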
LOOKING DEEPER:
REFINED BULK INGEST MODEL
[Figure: timeline with a Map thread and a Reduce thread joined by a parallel latch. Stage labels: Map Setup, Map, Sort, Spill, Merge, Serve, Shuffle, Sort, Reduce, Output.]
BULK INGEST MODEL PREDICTIONS
•  We can constrain parts of the model by physical
throughput limitations
•  Disk -> memory (100 MB/s avg 7200rpm seq. read rate)
•  Input reader
•  Memory -> Disk (100 MB/s)
•  Spill, OutputWriter
•  Disk -> Disk (50 MB/s)
•  Merge
•  Network (Gigabit = 125 MB/s)
•  Shuffle
•  And/or algorithmic limitations
•  Sort, (Our) Map, (Our) Reduce, SerDe
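A minimal sketch of how such physical limits turn into lower bounds on stage time. The rates are the ones listed on this slide, read as megabytes per second. The stage names, the decimal-megabyte convention, and the 4 GB data volume are assumptions for illustration.

```python
MB = 1_000_000  # decimal megabytes (an assumption about the slide's units)

# Sustained throughput limits from the slide, in MB/s.
RATES_MB_PER_S = {
    "input_read": 100,    # disk -> memory
    "spill": 100,         # memory -> disk
    "merge": 50,          # disk -> disk
    "shuffle": 125,       # gigabit network
    "output_write": 100,  # memory -> disk
}


def min_seconds(stage: str, num_bytes: int, parallel_devices: int = 1) -> float:
    """Lower bound on stage time if throughput is the only constraint."""
    rate = RATES_MB_PER_S[stage] * MB * parallel_devices
    return num_bytes / rate


# Hypothetical case: 4 GB flows through every stage on one disk or link.
volume = 4_000 * MB
floors = {stage: min_seconds(stage, volume) for stage in RATES_MB_PER_S}
for stage, seconds in floors.items():
    print(f"{stage}: at least {seconds:.0f} s")
```

A measured stage time far above its floor suggests an algorithmic or configuration problem; a time near the floor means the stage is bound by that device.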
PERFORMANCE GOAL MODEL
Performance goals obtained through:
•  Simulation of individual components
•  Prediction of available resources at runtime
INSTRUMENTATION
application version                              1.3.3
application sha                                  8d17baf8
yarn.nodemanager.resource.memory-mb              43008
yarn.scheduler.minimum-allocation-mb             2048
yarn.scheduler.maximum-allocation-mb             43008
yarn.app.mapreduce.am.resource.mb                2048
yarn.app.mapreduce.am.command-opts               -Xmx1536m
mapreduce.map.memory.mb                          2048
mapreduce.map.java.opts                          -Xmx1638m
mapreduce.reduce.memory.mb                       2048
mapreduce.reduce.java.opts                       -Xmx1638m
mapreduce.task.io.sort.mb                        100
mapreduce.map.sort.spill.percent                 0.8
mapreduce.task.io.sort.factor                    10
mapreduce.reduce.shuffle.parallelcopies          5
mapreduce.job.reduce.slowstart.completedmaps     1
mapreduce.map.output.compress                    FALSE
mapred.map.output.compression.codec              n/a
description                                      baseline

SYSTEM
node num                                         1
map num containers                               20
red num containers                               20
cores physical                                   12
cores logical                                    24
disk num                                         8
disk bandwidth                                   100
replication                                      1
monitoring                                       TRUE

DATA
input type                                       arcsight
input block size                                 32
input block count                                20
input total                                      672054649
output map                                       9313303723
output map:combine input records                 243419324
output map:combine records out                   209318830
output map:spill                                 7325671992
output final                                     573802787
output map:combine                               7301374577

TIME
map:setup avg                                    8
map:map avg                                      12
map:sort avg                                     12
map:spill avg                                    12
map:spill count                                  7
map:merge avg                                    46
map total                                        290
red:shuffle avg                                  6
red:merge avg                                    38
red:reduce avg                                   68
red:total avg                                    112
red:reducer count                                20
job:total                                        396

RATIOS
input explosion factor                           13.877904
compression intermediate                         1.003327786
load combiner output                             0.783972562
total ratio                                      0.786581455

CONSTANTS
avg schema entry size (bytes)                    59
effective MB/sec                                 1.618488025
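Several of the reported ratios appear to be simple quotients of the byte counters on this slide. The sketch below recomputes them under that reading; the formulas, the dictionary keys, and the assumption that sizes are in bytes and times in seconds are inferred from the numbers rather than stated in the talk.

```python
# Counters copied from the instrumentation slide.
counters = {
    "input_total": 672_054_649,
    "map_output": 9_313_303_723,
    "map_combine": 7_301_374_577,
    "map_spill": 7_325_671_992,
    "job_total_seconds": 396,
}


def derived_ratios(c: dict) -> dict:
    """Recompute slide ratios from raw counters (inferred formulas)."""
    return {
        # bytes spilled relative to bytes leaving the combiner
        "compression_intermediate": c["map_spill"] / c["map_combine"],
        # how much the combiner shrinks the map output
        "load_combiner_output": c["map_combine"] / c["map_output"],
        # bytes spilled relative to raw map output
        "total_ratio": c["map_spill"] / c["map_output"],
        # input volume per unit of wall-clock time, in MiB per second
        "effective_mb_per_sec": c["input_total"] / c["job_total_seconds"] / 2**20,
    }


ratios = derived_ratios(counters)
for name, value in ratios.items():
    print(f"{name}: {value:.6f}")
```

Keeping the raw counters next to the derived ratios makes it easy to recompute the model when a configuration value changes between runs.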
PERFORMANCE MEASUREMENT
Baseline (naive implementation)
[Figure: measured stage timings on the Map thread / Reduce thread timeline. Stage labels: Map Setup, Map, Sort, Spill, Merge, Serve, Shuffle, Sort, Reduce, Output.]
PATH TO IMPROVEMENT
1.  Profiling revealed much time spent serializing/deserializing
Accumulo’s Key class
•  Supported by recent investigations on e.g. Spark jobs:
“as much as half of the CPU time is spent deserializing and
decompressing data.”
https://www.eecs.berkeley.edu/~keo/publications/nsdi15-final147.pdf
2.  With proper configuration, MapReduce supports
comparison of MR keys in serialized form
3.  Rewriting Key’s serialization led to an order-preserving
encoding that is easy to compare in serialized form
4.  Configure MapReduce to use native code to compare Keys
5.  Tweak map input size and spill memory for as few spills as
possible
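To illustrate what an order-preserving encoding buys, here is a minimal sketch. It is not the encoding used in the talk or in Accumulo. It assumes a simplified key of (row, column, timestamp), escapes zero bytes so that variable-length fields stay comparable, and inverts the timestamp so that newer entries sort first.

```python
import struct


def encode_field(field: bytes) -> bytes:
    """Escape 0x00 as 0x00 0x01 and terminate with 0x00 0x00, so a
    shorter field always sorts before any longer field it prefixes."""
    return field.replace(b"\x00", b"\x00\x01") + b"\x00\x00"


def encode_key(row: bytes, column: bytes, timestamp: int) -> bytes:
    """Serialize a simplified key so plain byte comparison matches the
    logical order: row, then column, then newest timestamp first."""
    inverted = 0xFFFFFFFFFFFFFFFF - timestamp  # needs 0 <= timestamp < 2**64
    return encode_field(row) + encode_field(column) + struct.pack(">Q", inverted)


keys = [
    (b"row1", b"name", 5),
    (b"row1", b"name", 9),
    (b"row", b"zzz", 1),
    (b"row\x00a", b"a", 1),
    (b"row1", b"age", 7),
]

# Logical order needs the fields to be deserialized first ...
logical = sorted(keys, key=lambda k: (k[0], k[1], -k[2]))
# ... while the encoded form can be compared as raw bytes.
by_bytes = sorted(keys, key=lambda k: encode_key(*k))
print(logical == by_bytes)
```

Because the comparison is a plain byte comparison, the sort never has to construct key objects, which is the cost the profiling step identified.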
PERFORMANCE MEASUREMENT
Optimized sorting
•  Improvements:
•  Time for map-side merge went down
•  Sort performance drastically improved in both
map and reduce phases
•  300% faster
PERFORMANCE MEASUREMENT
Optimized sorting
Insights:
•  Map is slower than expected
•  Intermediate data inflation ratio (output from map) is very high, and the
mapper is now disk-bound
•  Amdahl’s law strikes again
•  Reducer Output is also already disk bound.
•  Can we trade disk time in Map for ‘free’ CPU time in Reduce?
[Figure: Map thread / Reduce thread stage timeline after the sorting optimization. Stage labels: Map Setup, Map, Sort, Spill, Merge, Serve, Shuffle, Sort, Reduce, Output.]
PATH TO IMPROVEMENT
•  Evaluation of data passed from map to reduce
revealed inefficiencies:
•  Constant timestamp cost 8 bytes per key
•  Repeated column names could be encoded/
compressed
•  Some Key/Value pairs didn’t need to be created until
reduce
•  Blocks of data output from the mapper guaranteed to
transfer ‘en masse’ to the same reducer
•  Hypothesis
•  Create ‘dehydrated’ key-value pairs of consecutive
values when possible
•  Spend CPU time in reduce to ‘rehydrate’ the key-values
prior to output
•  Fewer keys in shuffle also means the sort phase is more
efficient
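The dehydrate/rehydrate idea can be sketched as follows. This is an illustration under stated assumptions, not the implementation from the talk: the schema registry, the record layout, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical schema registry: schema id -> ordered column names.
SCHEMAS: Dict[int, List[str]] = {1: ["name", "age", "city"]}

KeyValue = Tuple[str, str, int, str]            # (row, column, timestamp, value)
Dehydrated = Tuple[str, int, int, Tuple[str, ...]]  # (row, schema id, timestamp, values)


def dehydrate(row: str, schema_id: int, timestamp: int, values: Sequence[str]) -> Dehydrated:
    """Map side: emit one record per document. Column names and the
    shared timestamp are stored once instead of once per key."""
    if len(values) != len(SCHEMAS[schema_id]):
        raise ValueError("values do not match the schema")
    return (row, schema_id, timestamp, tuple(values))


def rehydrate(record: Dehydrated) -> List[KeyValue]:
    """Reduce side: spend CPU to expand the record into full key-values."""
    row, schema_id, timestamp, values = record
    return [
        (row, column, timestamp, value)
        for column, value in zip(SCHEMAS[schema_id], values)
    ]


record = dehydrate("doc1", 1, 1000, ["Ada", "36", "London"])
expanded = rehydrate(record)
print(len(expanded), "key-values from 1 shuffled record")
```

Only one record per document crosses the shuffle, so the intermediate data shrinks and the sort sees fewer keys.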
PERFORMANCE MEASUREMENT
Optimized map code
•  Improvement:
•  Big speedup in map function
•  Twice as fast
•  Reduced intermediate inflation sped up all
steps between map and reduce
DO TRY THIS AT HOME
Hints for Accumulo Application Optimization
With these steps, we achieved a 6x speedup:
•  Perform comparisons on serialized objects
•  With Map/Reduce, calculate how many merge
steps are needed
•  Avoid premature data inflation
•  Leverage compression to shift bottlenecks
•  Always consider how fast your code should run
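As a rough sketch of the merge-step calculation, the estimate below uses the sort buffer size (100 MB), spill percent (0.8), and sort factor (10) from the instrumentation slide. The per-task map output of about 465 MB is a hypothetical figure. The formulas are a simplification: real MapReduce also stores per-record metadata in the sort buffer, so actual spill counts can be higher.

```python
import math


def estimate_spills(map_output_bytes: int, sort_mb: int, spill_percent: float) -> int:
    """Approximate number of spill files written by one map task."""
    spill_threshold = sort_mb * 1024 * 1024 * spill_percent
    return max(1, math.ceil(map_output_bytes / spill_threshold))


def merge_passes(num_files: int, sort_factor: int) -> int:
    """Approximate number of merge rounds needed to reach one file when
    at most `sort_factor` files are merged at a time."""
    passes = 0
    while num_files > 1:
        num_files = math.ceil(num_files / sort_factor)
        passes += 1
    return passes


# Hypothetical map task producing about 465 MB of output.
spills = estimate_spills(465_665_186, sort_mb=100, spill_percent=0.8)
print(spills, "spills,", merge_passes(spills, sort_factor=10), "merge pass(es)")
```

Each extra merge pass rereads and rewrites the intermediate data, so keeping the spill count at or below the sort factor avoids a full additional round of disk I/O.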
POSTSCRIPT: CARRYING
IMPROVEMENTS INTO THE
APPLICATION
•  Recall that we “dehydrated” consecutive KVs
into one KV out of map, and “rehydrated”
them in reduce
•  Specifically, document storage
•  We can do this if we know the schema of the
document in advance
•  What if we just store dehydrated documents
on disk?
POSTSCRIPT: PARTIAL SCHEMAS
•  Advantages
•  Bulk ingest just got even faster (no rehydrate step)
•  Disk footprint smaller
•  Potentially faster query response
•  Potential issues
•  Need to keep schemas around (but still want to
have flexible schemas)
•  How do you handle (lazy) updates?
•  Documents need to be rehydrated at some point…
when? And what’s the perf trade-off?
•  Perhaps we should model this?
•  To be continued…
Securely explore your data
QUESTIONS?
Chris McCubbin
Director of Data Science
Sqrrl Data, Inc.

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 

Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail of a Shared-Nothing Architecture [Performance]

  • 1. Securely explore your data PERFORMANCE MODELS FOR APACHE ACCUMULO: THE HEAVY TAIL OF A SHARED-NOTHING ARCHITECTURE Chris McCubbin Director of Data Science Sqrrl Data, Inc.
  • 2. TODAY’S TALK
      1. Quick intro to performance optimization
      2. Modeling techniques for targeted performance improvement of distributed applications
      3. A deep dive into improving bulk load application performance
      4. A shallow dive into partial schemas
      ©2014 Sqrrl Data, Inc
  • 3. SO, YOUR DISTRIBUTED APPLICATION IS SLOW
      • Today’s distributed applications run on tens or hundreds of library components
          • Many versions, so internet advice could be ineffective, or worse, flat out wrong
      • Hundreds of settings
          • Some, shall we say, could be better documented
      • Shared-nothing architectures are usually “shared-little” architectures with tricky interactions
      • Profiling is hard and time-consuming
  • 4. ROUND UP THE ‘USUAL SUSPECTS’?
      • “Common knowledge” that some things can cause performance issues
          • Too much network usage
          • Disk bound
          • Stragglers
          • Framework settings
          • Unbalanced distribution
          • SerDe
      • This might be a good start, but we really want to focus on the biggest problem if we can
      • Technology, installations and use cases have high variability: what works for one job on one cluster may be useless on another
  • 5. PERFORMANCE ANALYSIS CYCLE [Cycle diagram; labels: Start: Create Model, Simulate & Experiment, Analyze, Modify Code, Refine Model, Outputs: Better Code + Models]
  • 6. MAKING A MODEL
      • Determine points of low-impact metrics
          • Add some if needed
      • Create parallel state machine models with components driven by these metrics
      • Estimate running times and bottlenecks from a priori information and/or apply measured statistics
      • Focus testing on validation of the initial model and the (estimated) pain points
      • Apply Amdahl’s Law
      • Rinse, repeat
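The "Apply Amdahl’s Law" step can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names and the 70% / 4x figures in the demo are hypothetical and were chosen only to show how the law caps the payoff from speeding up a single stage.

```python
def amdahl_speedup(fraction_improved: float, local_speedup: float) -> float:
    """Overall speedup when `fraction_improved` of the runtime
    becomes `local_speedup` times faster (Amdahl's Law)."""
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)


def max_speedup(fraction_improved: float) -> float:
    """Upper bound on overall speedup as the local speedup grows without limit."""
    if fraction_improved >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction_improved)


if __name__ == "__main__":
    # Hypothetical numbers: a stage taking 70% of the job is made 4x faster.
    print(round(amdahl_speedup(0.70, 4.0), 3))
    print(round(max_speedup(0.70), 3))
```

Comparing the two printed values shows how much of the theoretical ceiling a given optimization captures, which helps decide whether to keep working on the same stage or move to the next one.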
  • 7. The Apache Accumulo™ sorted, distributed key/value store is a secure, robust, scalable, high performance data storage and retrieval system.
      • Many applications in real-time storage and analysis of “big data”:
          • Spatio-temporal indexing in non-relational distributed databases - Fox et al 2013 IEEE International Congress on Big Data
          • Big Data Dimensional Analysis - Gadepally et al IEEE HPEC 2014
      • Leading its peers in performance and scalability:
          • Achieving 100,000,000 database inserts per second using Accumulo and D4M - Kepner et al IEEE HPEC 2014
          • An NSA Big Graph experiment (Technical Report NSA-RD-2013-056002v1)
          • Benchmarking Apache Accumulo BigData Distributed Table Store Using Its Continuous Test Suite - Sen et al 2013 IEEE International Congress on Big Data
      For more papers and presentations, see http://accumulo.apache.org/papers.html
  • 8. SCALING UP: DIVIDE & CONQUER
      • Collections of KV pairs form Tables
      • Tables are partitioned into Tablets
      • Metadata tablets hold info about other tablets, forming a 3-level hierarchy
      • A Tablet is a unit of work for a Tablet Server
      [Diagram: Well-Known Location (zookeeper) -> Root Tablet (-∞ to ∞) -> Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”) and Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞) -> data tablets. Table: Adam’s Table has data tablets -∞ : thing and thing : ∞; Table: Encyclopedia has data tablets -∞ : Ocelot, Ocelot : Yak and Yak : ∞; Table: Foo has one data tablet, -∞ to ∞]
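To make the partitioning concrete, here is a small sketch of how a row could be mapped to a tablet by binary search over a table's split points. It assumes each tablet covers the rows after the previous split point up to and including its own end row; the helper names and the use of plain strings for rows are illustrative, not Accumulo's client API.

```python
from bisect import bisect_left
from typing import Optional, Tuple


def tablet_index(split_points: list[str], row: str) -> int:
    """Index of the tablet holding `row`, given sorted split points.

    Assumes tablet i covers (split_points[i-1], split_points[i]],
    with open-ended first and last tablets.
    """
    return bisect_left(split_points, row)


def tablet_range(split_points: list[str], index: int) -> Tuple[Optional[str], Optional[str]]:
    """(exclusive lower bound, inclusive upper bound); None means unbounded."""
    low = split_points[index - 1] if index > 0 else None
    high = split_points[index] if index < len(split_points) else None
    return low, high


if __name__ == "__main__":
    # Split points taken from the "Encyclopedia" table in the slide.
    splits = ["Ocelot", "Yak"]
    for row in ["Aardvark", "Ocelot", "Panda", "Zebra"]:
        i = tablet_index(splits, row)
        print(row, "-> tablet", i, tablet_range(splits, i))
```

The same lookup applies at each level of the hierarchy: the root tablet locates a metadata tablet, which in turn locates the data tablet.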
  • 9. BULK INGEST OVERVIEW
      • Accumulo supports two mechanisms to bring data in: streaming ingest and bulk ingest.
      • Bulk Ingest
          • Goal: maximize throughput without constraining latency.
          • Create a set of Accumulo RFiles by some means, then register those files with Accumulo.
          • RFiles are groups of sorted key-value pairs with some indexing information
          • MapReduce has a built-in key sorting phase: a good fit to produce RFiles
  • 10. BULK INGEST MODEL [Timeline diagram: Map, Reduce and Register phases along a Time axis]
  • 11. BULK INGEST MODEL: Hypothetical Resource Usage
      • Map: 100% CPU, 20% Disk, 0% Network, 46 seconds
      • Reduce: 40% CPU, 100% Disk, 20% Network, 168 seconds
      • Register: 10% CPU, 20% Disk, 40% Network, 17 seconds
  • 12. INSIGHT
      • Map: 100% CPU, 20% Disk, 0% Network, 46 seconds
      • Reduce: 40% CPU, 100% Disk, 20% Network, 168 seconds
      • Register: 10% CPU, 20% Disk, 40% Network, 17 seconds
      • Spare disk here, spare CPU there – can we even out resource consumption?
      • Why did reduce take 168 seconds? It should be more like 40 seconds.
      • No clear bottleneck during registration – is there a synchronization or serialization problem?
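The reasoning on this slide can be expressed as a small check over the per-phase utilization figures. The sketch below uses the slide's hypothetical numbers; the 90% saturation cutoff and the function name are assumptions made for illustration.

```python
from typing import Optional

# Utilization (percent) and duration (seconds) copied from the slide.
PHASES = {
    "Map": {"cpu": 100, "disk": 20, "network": 0, "seconds": 46},
    "Reduce": {"cpu": 40, "disk": 100, "network": 20, "seconds": 168},
    "Register": {"cpu": 10, "disk": 20, "network": 40, "seconds": 17},
}

SATURATION_THRESHOLD = 90  # hypothetical cutoff for calling a resource "bound"


def saturated_resource(usage: dict) -> Optional[str]:
    """Name of the busiest resource if it is saturated, else None."""
    resources = {name: pct for name, pct in usage.items() if name != "seconds"}
    busiest = max(resources, key=resources.get)
    return busiest if resources[busiest] >= SATURATION_THRESHOLD else None


if __name__ == "__main__":
    for phase, usage in PHASES.items():
        limit = saturated_resource(usage)
        note = (f"bound by {limit}" if limit
                else "no saturated resource; look for waiting or serialization")
        print(f"{phase:8s} {usage['seconds']:4d}s  {note}")
```

A phase with no saturated resource, such as Register here, is the case the slide flags as a likely synchronization or serialization problem rather than a hardware limit.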
  • 13. LOOKING DEEPER: REFINED BULK INGEST MODEL [Timeline diagram with a Map Thread lane and a Reduce Thread lane along a Time axis; stage labels: Map Setup, Map, Sort, Sort, Reduce, Output, Spill, Merge, Shuffle, Serve; legend: Parallel, Latch]
  • 14. BULK INGEST MODEL PREDICTIONS
      • We can constrain parts of the model by physical throughput limitations
          • Disk -> memory (100 MB/s avg 7200 rpm seq. read rate)
              • Input reader
          • Memory -> disk (100 MB/s)
              • Spill, OutputWriter
          • Disk -> disk (50 MB/s)
              • Merge
          • Network (Gigabit = 125 MB/s)
              • Shuffle
      • And/or algorithmic limitations
          • Sort, (Our) Map, (Our) Reduce, SerDe
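These throughput limits translate directly into lower bounds on stage time. The sketch below treats the slide's figures as megabytes per second (Gigabit Ethernet is about 125 MB/s); the stage-to-resource mapping follows the slide, while the function name, the `parallel_devices` parameter and the 1 GB example volume are hypothetical.

```python
MB = 1_000_000  # bytes per megabyte (decimal), an assumption about the slide's units

# Throughput limits from the slide, in MB/s.
THROUGHPUT_MB_S = {
    "disk_to_memory": 100,
    "memory_to_disk": 100,
    "disk_to_disk": 50,
    "network": 125,
}

# Which physical limit constrains which pipeline stage.
STAGE_RESOURCE = {
    "input_reader": "disk_to_memory",
    "spill": "memory_to_disk",
    "output_writer": "memory_to_disk",
    "merge": "disk_to_disk",
    "shuffle": "network",
}


def min_seconds(stage: str, num_bytes: int, parallel_devices: int = 1) -> float:
    """Lower bound on stage time if throughput is the only constraint."""
    rate = THROUGHPUT_MB_S[STAGE_RESOURCE[stage]] * MB * parallel_devices
    return num_bytes / rate


if __name__ == "__main__":
    one_gb = 1_000_000_000  # hypothetical data volume
    for stage in STAGE_RESOURCE:
        print(f"{stage:14s} >= {min_seconds(stage, one_gb):6.2f} s per GB")
```

A measured stage time far above its bound points to an algorithmic or configuration problem rather than a hardware limit.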
  • 15. PERFORMANCE GOAL MODEL
      Performance goals obtained through:
      • Simulation of individual components
      • Prediction of available resources at runtime
  • 16. INSTRUMENTATION
      application version: 1.3.3
      application sha: 8d17baf8
      node num: 1
      map num containers: 20
      red num containers: 20
      cores physical: 12
      cores logical: 24
      disk num: 8
      disk bandwidth: 100
      replication: 1
      monitoring: TRUE
      DATA
          input type: arcsight
          input block size: 32
          input block count: 20
          input total: 672054649
          output map: 9313303723
          output map:combine input records: 243419324
          output map:combine records out: 209318830
          output map:spill: 7325671992
          output final: 573802787
          output map:combine: 7301374577
      SYSTEM
          yarn.nodemanager.resource.memory-mb: 43008
          yarn.scheduler.minimum-allocation-mb: 2048
          yarn.scheduler.maximum-allocation-mb: 43008
          yarn.app.mapreduce.am.resource.mb: 2048
          yarn.app.mapreduce.am.command-opts: -Xmx1536m
          mapreduce.map.memory.mb: 2048
          mapreduce.map.java.opts: -Xmx1638m
          mapreduce.reduce.memory.mb: 2048
          mapreduce.reduce.java.opts: -Xmx1638m
          mapreduce.task.io.sort.mb: 100
          mapreduce.map.sort.spill.percent: 0.8
          mapreduce.task.io.sort.factor: 10
          mapreduce.reduce.shuffle.parallelcopies: 5
          mapreduce.job.reduce.slowstart.completedmaps: 1
          mapreduce.map.output.compress: FALSE
          mapred.map.output.compression.codec: n/a
      TIME
          map:setup avg: 8
          map:map avg: 12
          map:sort avg: 12
          map:spill avg: 12
          map:spill count: 7
          map:merge avg: 46
          map total: 290
          red:shuffle avg: 6
          red:merge avg: 38
          red:reduce avg: 68
          red:total avg: 112
          red:reducer count: 20
          job:total: 396
      RATIOS
          input explosion factor: 13.877904
          compression intermediate: 1.003327786
          load combiner output: 0.783972562
          total ratio: 0.786581455
      CONSTANTS
          avg schema entry size (bytes): 59
      description: baseline
      effective MB/sec: 1.618488025
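The derived figures on this slide appear to follow from the raw counters. The sketch below recomputes them under assumptions that are inferred, not stated in the talk: block size is in MiB, times are in seconds, and each ratio uses the formula in the code. These formulas match the slide's values to the printed precision, but the original spreadsheet may have defined them differently.

```python
MIB = 1024 * 1024

# Raw counters copied from the slide; dictionary keys are illustrative names.
COUNTERS = {
    "input_block_size_mib": 32,
    "input_block_count": 20,
    "input_total": 672054649,
    "output_map": 9313303723,
    "output_map_combine": 7301374577,
    "output_map_spill": 7325671992,
    "job_total_seconds": 396,
}


def derived_metrics(c: dict) -> dict:
    """Recompute the slide's ratios from raw counters (inferred formulas)."""
    nominal_input = c["input_block_size_mib"] * c["input_block_count"] * MIB
    return {
        # Map output bytes per nominal input byte.
        "input_explosion_factor": c["output_map"] / nominal_input,
        # Spilled bytes relative to combiner output bytes.
        "compression_intermediate": c["output_map_spill"] / c["output_map_combine"],
        # Combiner output bytes relative to raw map output bytes.
        "load_combiner_output": c["output_map_combine"] / c["output_map"],
        # Spilled bytes relative to raw map output bytes.
        "total_ratio": c["output_map_spill"] / c["output_map"],
        # End-to-end input throughput in MiB per second.
        "effective_mb_per_sec": c["input_total"] / MIB / c["job_total_seconds"],
    }


if __name__ == "__main__":
    for name, value in derived_metrics(COUNTERS).items():
        print(f"{name:26s} {value:.9f}")
```

Keeping these derivations in code makes it easy to recompute the same metrics for every experiment and compare runs against the baseline.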
  • 17. PERFORMANCE MEASUREMENT: Baseline (naive implementation) [Measured timeline chart with Map Thread and Reduce Thread lanes; stage labels: Map Setup, Map, Sort, Sort, Reduce, Output, Spill, Merge, Shuffle, Serve]
  • 18. PATH TO IMPROVEMENT
      1. Profiling revealed much time spent serializing/deserializing Accumulo’s Key class
          1. Supported by recent investigations on e.g. Spark jobs
              1. “as much as half of the CPU time is spent deserializing and decompressing data.” https://www.eecs.berkeley.edu/~keo/publications/nsdi15-final147.pdf
      2. With proper configuration, MapReduce supports comparison of MR keys in serialized form
      3. Rewriting Key’s serialization led to an order-preserving encoding, easy to compare in serialized form
      4. Configure MapReduce to use native code to compare Keys
      5. Tweak map input size and spill memory for as few spills as possible
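An order-preserving encoding is one where comparing the serialized bytes gives the same result as comparing the original keys, so the sort never has to deserialize. The sketch below illustrates the idea on a simplified key of (row, column, timestamp) with newer timestamps sorting first. The escaping scheme is a common textbook approach chosen for illustration; it is not Accumulo's actual Key format or the encoding used in the talk.

```python
import struct

TERMINATOR = b"\x00\x00"  # sorts before any escaped or non-zero byte
MAX_TS = (1 << 63) - 1


def _encode_field(field: bytes) -> bytes:
    """Escape zero bytes, then terminate, so that prefixes sort first."""
    return field.replace(b"\x00", b"\x00\x01") + TERMINATOR


def encode_key(row: bytes, column: bytes, timestamp: int) -> bytes:
    """Serialize a simplified key so bytewise order equals logical order."""
    if not 0 <= timestamp <= MAX_TS:
        raise ValueError("timestamp out of range")
    # Invert the timestamp so newer entries compare as smaller.
    return (_encode_field(row) + _encode_field(column)
            + struct.pack(">Q", MAX_TS - timestamp))


def logical_sort_key(row: bytes, column: bytes, timestamp: int) -> tuple:
    """Reference ordering: row, then column, then newest timestamp first."""
    return (row, column, -timestamp)


if __name__ == "__main__":
    keys = [(b"row2", b"a", 1), (b"row1", b"b", 1), (b"row1", b"b", 9), (b"row", b"z", 3)]
    by_bytes = sorted(keys, key=lambda k: encode_key(*k))
    by_logic = sorted(keys, key=lambda k: logical_sort_key(*k))
    print(by_bytes == by_logic)
```

With an encoding like this, a raw byte comparator (for example a memcmp-style native comparator) can stand in for the object-level comparison during sort and merge.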
  • 19. PERFORMANCE MEASUREMENT: Optimized sorting
      • Improvements:
          • Time for map-side merge went down
          • Sort performance drastically improved in both map and reduce phases
          • 300% faster
  • 20. PERFORMANCE MEASUREMENT: Optimized sorting
      Insights:
      • Map is slower than expected
          • Intermediate data inflation ratio (output from map) is very high, and the mapper is now disk-bound
          • Amdahl’s law strikes again
      • Reducer output is also already disk-bound.
      • Can we trade disk time in Map for ‘free’ CPU time in Reduce?
      [Measured timeline chart with Map Thread and Reduce Thread lanes; stage labels: Map Setup, Map, Sort, Sort, Reduce, Output, Spill, Merge, Shuffle, Serve]
  • 21. PATH TO IMPROVEMENT
      • Evaluation of data passed from map to reduce revealed inefficiencies:
          • Constant timestamp cost 8 bytes per key
          • Repeated column names could be encoded/compressed
          • Some Key/Value pairs didn’t need to be created until reduce
          • Blocks of data output from the mapper guaranteed to transfer ‘en masse’ to the same reducer
      • Hypothesis
          • Create ‘dehydrated’ key-value pairs of consecutive values when possible
          • Spend CPU time in reduce to ‘rehydrate’ the key-values prior to output
          • Fewer keys in shuffle also means the sort phase is more efficient
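The dehydrate/rehydrate hypothesis can be sketched as follows, assuming the column names of a document are known in advance as a fixed, ordered schema. All names here (`SCHEMA`, `KeyValue`, `Dehydrated`, the example fields) are hypothetical; the talk does not show its actual record layout.

```python
from typing import NamedTuple

# Hypothetical schema: the ordered column names shared by every document.
SCHEMA = ("author", "title", "body")


class KeyValue(NamedTuple):
    row: str
    column: str
    timestamp: int
    value: str


class Dehydrated(NamedTuple):
    row: str
    timestamp: int
    values: tuple  # one value per SCHEMA column, in order


def dehydrate(row: str, timestamp: int, document: dict) -> Dehydrated:
    """Map side: emit one compact record instead of one KV per column."""
    return Dehydrated(row, timestamp, tuple(document[c] for c in SCHEMA))


def rehydrate(record: Dehydrated) -> list[KeyValue]:
    """Reduce side: expand the compact record back into full key-values."""
    return [KeyValue(record.row, column, record.timestamp, value)
            for column, value in zip(SCHEMA, record.values)]


if __name__ == "__main__":
    doc = {"author": "chris", "title": "heavy tail", "body": "..."}
    packed = dehydrate("doc-001", 1430000000, doc)
    for kv in rehydrate(packed):
        print(kv)
```

The row, timestamp and column names are carried once per document rather than once per field, which is where the reduction in intermediate data comes from; the cost is the extra CPU spent in `rehydrate` on the reduce side.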
  • 22. PERFORMANCE MEASUREMENT: Optimized map code
      • Improvement:
          • Big speedup in map function
          • Twice as fast
          • Reduced intermediate inflation sped up all steps between map and reduce
  • 23. DO TRY THIS AT HOME: Hints for Accumulo Application Optimization
      With these steps, we achieved a 6X speedup:
      • Perform comparisons on serialized objects
      • With Map/Reduce, calculate how many merge steps are needed
      • Avoid premature data inflation
      • Leverage compression to shift bottlenecks
      • Always consider how fast your code should run
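For the hint about merge steps, a rough estimate is the number of passes needed to reduce the spill files to one when at most `merge_factor` files are merged at a time. The sketch below is a simplified model: Hadoop's real merge planning differs in detail (for example in how it sizes the first pass), so treat the result as an approximation.

```python
def merge_passes(segments: int, merge_factor: int) -> int:
    """Estimated number of merge passes to combine `segments` sorted
    files into one, merging at most `merge_factor` files per merge."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    if merge_factor < 2:
        raise ValueError("merge_factor must be at least 2")
    passes = 0
    while segments > 1:
        segments = -(-segments // merge_factor)  # ceiling division
        passes += 1
    return passes


if __name__ == "__main__":
    # Spill count and mapreduce.task.io.sort.factor from the instrumentation slide.
    print(merge_passes(7, 10))
    # Hypothetical case: more spills than the merge factor allows in one pass.
    print(merge_passes(25, 10))
```

Each extra pass rereads and rewrites the intermediate data on disk, so keeping the spill count at or below the merge factor avoids a full additional round of disk-to-disk traffic.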
  • 24. POSTSCRIPT: CARRYING IMPROVEMENTS INTO THE APPLICATION
      • Recall that we “dehydrated” consecutive KVs into one KV out of map, and “rehydrated” them in reduce
          • Specifically, document storage
      • We can do this if we know the schema of the document in advance
      • What if we just store dehydrated documents on disk?
  • 25. POSTSCRIPT: PARTIAL SCHEMAS
      • Advantages
          • Bulk ingest just got even faster (no rehydrate step)
          • Disk footprint smaller
          • Potentially faster query response
      • Potential issues
          • Need to keep schemas around (but still want to have flexible schemas)
          • How do you handle (lazy) updates?
          • Documents need to be rehydrated at some point… when? And what’s the perf trade-off?
      • Perhaps we should model this?
      • To be continued…
  • 26. Securely explore your data QUESTIONS? Chris McCubbin Director of Data Science Sqrrl Data, Inc.