Securely explore your data 
PERFORMANCE MODELS 
FOR APACHE ACCUMULO: 
THE HEAVY TAIL OF A SHARED-NOTHING 
ARCHITECTURE 
Chris McCubbin 
Director of Data Science 
Sqrrl Data, Inc.
I’M NOT ADAM FUCHS 
• But perhaps I’m still an interesting guy 
• MS in CS from UMBC in Network Security and 
Quantum Computing 
• 8 years at JHU/APL working on UxV Swarms 
• 4 years at JHU/APL and TexelTek creating Big 
Data Applications for the NSA 
• Co-founder and Director of Data Science at Sqrrl 
©2014 Sqrrl Data, Inc 2
SO, YOUR DISTRIBUTED 
APPLICATION IS SLOW 
• Today’s distributed applications run on tens or 
hundreds of library components 
• Many versions so internet advice could be ineffective, or 
worse, flat out wrong 
• Hundreds of settings 
• Some, shall we say, could be better documented 
• Shared-nothing architectures are usually “shared-little” 
architectures with tricky interactions 
• Profiling is hard and time-consuming 
• What do we do? 
TODAY’S TALK 
1. Quick intro to performance optimization 
2. Tricks and techniques for modeling distributed applications and 
targeting performance improvements 
3. A deep dive into improving bulk load application 
performance 
The Apache Accumulo™ sorted, distributed key/value store is a secure, robust, 
scalable, high performance data storage and retrieval system. 
• Many applications in real-time storage and analysis of “big data”: 
• Spatio-temporal indexing in non-relational distributed databases - Fox et al 
2013 IEEE International Congress on Big Data 
• Big Data Dimensional Analysis - Gadepally et al IEEE HPEC 2014 
• Leading its peers in performance and scalability: 
• Achieving 100,000,000 database inserts per second using Accumulo and 
D4M - Kepner et al IEEE HPEC 2014 
• An NSA Big Graph experiment (Technical Report NSA-RD-2013-056002v1) 
• Benchmarking Apache Accumulo BigData Distributed Table Store Using Its 
Continuous Test Suite - Sen et al 2013 IEEE International Congress on Big 
Data 
For more papers and presentations, see http://accumulo.apache.org/papers.html 
SCALING UP: DIVIDE & CONQUER 
• Collections of KV pairs form Tables 
• Tables are partitioned into Tablets 
• Metadata tablets hold info about 
other tablets, forming a 3-level 
hierarchy 
• A Tablet is a unit of work for a 
Tablet Server 
[Figure: tablet hierarchy. A well-known location (ZooKeeper) points to the Root Tablet (-∞ to ∞). The Root Tablet indexes Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”) and Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞). The metadata tablets index the data tablets of three tables: Adam’s Table (-∞ : thing, thing : ∞), Encyclopedia (-∞ : Ocelot, Ocelot : Yak, Yak : ∞), and Foo (-∞ to ∞).]
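A minimal sketch of how a row key maps to a tablet, using the split points from the example tables. The `SPLITS` dictionary and the function name are illustrative assumptions, not Accumulo APIs. The sketch assumes each tablet covers the range (previous split, this split], and the same kind of sorted lookup would apply at the root and metadata levels.

```python
from bisect import bisect_left

# Split points per table, taken from the example figure (illustrative data).
# Tablet i covers (splits[i-1], splits[i]]; the last tablet runs to +infinity.
SPLITS = {
    "Adam's Table": ["thing"],
    "Encyclopedia": ["Ocelot", "Yak"],
    "Foo": [],
}

def find_tablet(table, row):
    """Return (index, low, high) for the tablet of `table` that holds `row`.

    `low` is exclusive and `high` is inclusive; None stands for -inf or +inf.
    """
    splits = SPLITS[table]
    # bisect_left sends a row equal to a split point to the tablet ending there.
    i = bisect_left(splits, row)
    low = splits[i - 1] if i > 0 else None
    high = splits[i] if i < len(splits) else None
    return i, low, high

if __name__ == "__main__":
    for table, row in [("Encyclopedia", "Panda"), ("Foo", "anything")]:
        print(table, row, find_tablet(table, row))
```

Because the split points are sorted, each lookup is a binary search, which is what keeps the hierarchy shallow even for very large tables.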
PERFORMANCE ANALYSIS CYCLE 
[Figure: performance analysis cycle with the stages Create Model (start), Simulate & Experiment, Analyze, Refine Model, and Modify Code. Outputs: better code and models.]
MAKING A MODEL 
• Identify points where metrics can be collected with low impact 
• Add some if needed 
• Create parallel state machine models with 
components driven by these metrics 
• Estimate running times and bottlenecks from 
a priori information and/or measured 
statistics 
• Focus testing on validating the initial 
model and the (estimated) pain points 
• Apply Amdahl’s Law 
• Rinse, repeat 
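Amdahl's Law bounds how much the whole job can gain from speeding up one stage. The sketch below is the standard formula; the 70% share and 3x factor in the usage lines are hypothetical numbers chosen only for illustration.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime is made `factor` times faster.

    fraction: share of total runtime spent in the improved stage (0.0 to 1.0)
    factor:   speedup of that stage alone (use float('inf') for the upper bound)
    """
    return 1.0 / ((1.0 - fraction) + fraction / factor)

if __name__ == "__main__":
    # Hypothetical: a stage taking 70% of the runtime is made 3x faster.
    print(amdahl_speedup(0.7, 3))
    # Even an infinitely fast stage cannot beat 1 / (1 - fraction).
    print(amdahl_speedup(0.7, float("inf")))
```

The second call is the useful one when ranking pain points: it shows the ceiling on what any amount of work on that stage could buy.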
BULK INGEST OVERVIEW 
• Accumulo supports two mechanisms to bring 
data in: streaming ingest and bulk ingest. 
• Bulk Ingest 
• Goal: maximize throughput without constraining 
latency. 
• Create a set of Accumulo RFiles, then register those 
files with Accumulo. 
• RFiles are groups of sorted key-value pairs with 
some indexing information 
• MapReduce has a built-in key sorting phase: a good 
fit to produce RFiles 
BULK INGEST MODEL 
[Figure: timeline of the three bulk ingest phases in sequence: Map, Reduce, Register.]
BULK INGEST MODEL 
Hypothetical Resource Usage 
• Map: 100% CPU, 20% Disk, 0% Network, 46 seconds 
• Reduce: 40% CPU, 100% Disk, 20% Network, 168 seconds 
• Register: 10% CPU, 20% Disk, 40% Network, 17 seconds 
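One way to read the hypothetical usage figures is to compute each phase's share of the total time and its most heavily used resource. The sketch below does that with the three phases' numbers from this slide; the dictionary layout and function name are assumptions made for illustration.

```python
# Hypothetical resource usage per phase, as given on the slide.
PHASES = {
    "Map":      {"cpu": 100, "disk": 20,  "network": 0,  "seconds": 46},
    "Reduce":   {"cpu": 40,  "disk": 100, "network": 20, "seconds": 168},
    "Register": {"cpu": 10,  "disk": 20,  "network": 40, "seconds": 17},
}

def summarize(phases):
    """Return (total_seconds, rows); each row is
    (phase, share_of_total_time, busiest_resource, utilization_percent)."""
    total = sum(p["seconds"] for p in phases.values())
    rows = []
    for name, p in phases.items():
        busiest = max(("cpu", "disk", "network"), key=lambda r: p[r])
        rows.append((name, p["seconds"] / total, busiest, p[busiest]))
    return total, rows

if __name__ == "__main__":
    total, rows = summarize(PHASES)
    print(f"total: {total} s")
    for name, share, resource, pct in rows:
        print(f"{name:9s} {share:6.1%} of time, busiest: {resource} at {pct}%")
```

A phase whose busiest resource sits well below 100% has no obvious physical bottleneck, which points at synchronization or serialization instead.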
INSIGHT 
• Spare disk here, spare CPU there – can we even out resource consumption? 
• Why did reduce take 168 seconds? It should be more like 40 seconds. 
• No clear bottleneck during registration – is there a synchronization or 
serialization problem? 
LOOKING DEEPER: 
REFINED BULK INGEST MODEL 
[Figure: refined timeline with two rows. Map thread stages: Setup, Map, Sort, Spill, Merge, Serve. Reduce thread stages: Shuffle, Sort, Reduce, Output. Legend: Parallel, Latch.]
BULK INGEST MODEL PREDICTIONS 
• We can constrain parts of the model by physical 
throughput limitations 
• Disk -> Memory (100 MB/s, average 7200 rpm sequential read rate) 
• Input reader 
• Memory -> Disk (100 MB/s) 
• Spill, OutputWriter 
• Disk -> Disk (50 MB/s) 
• Merge 
• Network (Gigabit = 125 MB/s) 
• Shuffle 
• And/or algorithmic limitations 
• Sort, (Our) Map, (Our) Reduce, SerDe 
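These throughput limits turn directly into lower bounds on stage time. The sketch below reads the slide's rates as megabytes per second (a gigabit link carries at most 125 MB/s); the stage names, the 1 GB data size, and the `parallel_devices` parameter are illustrative assumptions.

```python
MB = 1_000_000  # bytes per megabyte (decimal), an assumption of this sketch

# Physical throughput limits in MB/s, per stage.
THROUGHPUT_MB_S = {
    "input_read": 100,   # disk -> memory
    "spill": 100,        # memory -> disk
    "output": 100,       # memory -> disk
    "merge": 50,         # disk -> disk (read and write share the device)
    "shuffle": 125,      # gigabit network
}

def min_seconds(stage, num_bytes, parallel_devices=1):
    """Lower bound on the time for `stage` to move `num_bytes`.

    parallel_devices: number of disks or links working in parallel.
    """
    return num_bytes / (THROUGHPUT_MB_S[stage] * MB * parallel_devices)

if __name__ == "__main__":
    one_gb = 1_000_000_000  # hypothetical data volume
    for stage in THROUGHPUT_MB_S:
        print(f"{stage:10s} >= {min_seconds(stage, one_gb):5.1f} s per GB")
```

A measured stage time far above its bound suggests an algorithmic limit (sort, map, reduce, or SerDe) rather than a hardware limit.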
PERFORMANCE GOAL MODEL 
Performance goals obtained through: 
• Simulation of individual components 
• Prediction of available resources at runtime 
INSTRUMENTATION 
CONFIGURATION 
  application version                              1.3.3 
  application sha                                  8d17baf8 
  yarn.nodemanager.resource.memory-mb              43008 
  yarn.scheduler.minimum-allocation-mb             2048 
  yarn.scheduler.maximum-allocation-mb             43008 
  yarn.app.mapreduce.am.resource.mb                2048 
  yarn.app.mapreduce.am.command-opts               -Xmx1536m 
  mapreduce.map.memory.mb                          2048 
  mapreduce.map.java.opts                          -Xmx1638m 
  mapreduce.reduce.memory.mb                       2048 
  mapreduce.reduce.java.opts                       -Xmx1638m 
  mapreduce.task.io.sort.mb                        100 
  mapreduce.map.sort.spill.percent                 0.8 
  mapreduce.task.io.sort.factor                    10 
  mapreduce.reduce.shuffle.parallelcopies          5 
  mapreduce.job.reduce.slowstart.completedmaps     1 
  mapreduce.map.output.compress                    FALSE 
  mapred.map.output.compression.codec              n/a 
  description                                      baseline 
SYSTEM 
  node num                                         1 
  map num containers                               20 
  red num containers                               20 
  cores physical                                   12 
  cores logical                                    24 
  disk num                                         8 
  disk bandwidth                                   100 
  replication                                      1 
  monitoring                                       TRUE 
TIME 
  map:setup avg                                    8 
  map:map avg                                      12 
  map:sort avg                                     12 
  map:spill avg                                    12 
  map:spill count                                  7 
  map:merge avg                                    46 
  map total                                        290 
  red:shuffle avg                                  6 
  red:merge avg                                    38 
  red:reduce avg                                   68 
  red:total avg                                    112 
  red:reducer count                                20 
  job:total                                        396 
DATA 
  input type                                       arcsight 
  input block size                                 32 
  input block count                                20 
  input total                                      672054649 
  output map                                       9313303723 
  output map:combine input records                 243419324 
  output map:combine records out                   209318830 
  output map:spill                                 7325671992 
  output final                                     573802787 
  output map:combine                               7301374577 
RATIOS 
  input explosion factor                           13.877904 
  compression intermediate                         1.003327786 
  load combiner output                             0.783972562 
  total ratio                                      0.786581455 
CONSTANTS 
  avg schema entry size (bytes)                    59 
  effective MB/sec                                 1.618488025 
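Several of the derived figures on this slide can be reproduced from its raw counters, which is a cheap sanity check on any instrumentation. The pairing of counters to ratios below is inferred from the numbers, not stated in the slides, and it assumes job:total is in seconds and MB means 2^20 bytes. The input explosion factor is left out because output map / input total comes out near 13.86 rather than the 13.877904 shown, so it may be defined differently.

```python
# Raw counters copied from the instrumentation slide (bytes and seconds).
INPUT_TOTAL = 672_054_649
MAP_OUTPUT = 9_313_303_723
COMBINE_OUTPUT = 7_301_374_577
SPILL_OUTPUT = 7_325_671_992
JOB_SECONDS = 396

def derived_metrics():
    """Recompute the slide's ratios from raw counters (inferred definitions)."""
    return {
        # Share of map output that survives the combiner.
        "load combiner output": COMBINE_OUTPUT / MAP_OUTPUT,
        # Spilled bytes relative to combiner output (about 1.0: no compression).
        "compression intermediate": SPILL_OUTPUT / COMBINE_OUTPUT,
        # Spilled bytes relative to raw map output.
        "total ratio": SPILL_OUTPUT / MAP_OUTPUT,
        # End-to-end ingest rate in MiB of input per second of job time.
        "effective MB/sec": INPUT_TOTAL / JOB_SECONDS / 2**20,
    }

if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:26s} {value:.6f}")
```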
PERFORMANCE MEASUREMENT 
Baseline (naive implementation) 
[Figure: measured timeline for the baseline run. Map thread stages: Setup, Map, Sort, Spill, Merge, Serve. Reduce thread stages: Shuffle, Sort, Reduce, Output.]
PATH TO IMPROVEMENT 
1. Profiling revealed much time spent serializing/deserializing Key 
2. With proper configuration, MapReduce supports 
comparison of keys in serialized form 
3. Rewriting Key’s serialization led to an order-preserving 
encoding that is easy to compare in serialized form 
4. Configure MapReduce to use native code to compare 
Keys 
5. Tweak map input size and spill memory for as few spills 
as possible 
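The idea behind an order-preserving encoding can be sketched in a few lines: if the serialized bytes sort the same way as the logical key, the sort never has to deserialize. The layout below (UTF-8 row, NUL terminator, sign-flipped big-endian timestamp) is a hypothetical simplification for illustration, not Accumulo's actual Key format. In Hadoop, comparing serialized keys is the role of a `RawComparator`.

```python
import struct

def encode_key(row: str, timestamp: int) -> bytes:
    """Encode (row, timestamp) so that plain byte comparison matches tuple order.

    Hypothetical layout: UTF-8 row bytes, a 0x00 terminator, then the signed
    64-bit timestamp stored big-endian with its sign bit flipped.
    """
    row_bytes = row.encode("utf-8")
    if b"\x00" in row_bytes:
        raise ValueError("row must not contain NUL in this simplified layout")
    # Adding 2**63 maps the signed range onto 0..2**64-1 while keeping order.
    ts_bytes = struct.pack(">Q", timestamp + 2**63)
    return row_bytes + b"\x00" + ts_bytes

if __name__ == "__main__":
    keys = [("b", 0), ("a", 5), ("ab", -1), ("a", -7)]
    by_bytes = sorted(keys, key=lambda k: encode_key(*k))
    print(by_bytes == sorted(keys))  # byte order agrees with logical order
```

The terminator is what makes a row sort before any longer row that starts with it; a real format would need an escaping scheme to allow NUL bytes inside rows.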
PERFORMANCE MEASUREMENT 
Optimized sorting 
• Improvements: 
• Time for map-side merge went down 
• Sort performance drastically improved in both 
map and reduce phases 
• 300% faster 
PERFORMANCE MEASUREMENT 
Optimized sorting 
[Figure: measured timeline after optimized sorting. Map thread stages: Setup, Map, Sort, Spill, Merge, Serve. Reduce thread stages: Shuffle, Sort, Reduce, Output.]
Insights: 
• Map is slower than expected 
• Output is disk bound; maybe we can move more processing to Reduce 
• “Reverse Amdahl’s law” 
• Intermediate data inflation ratio (output/input for map) is very high 
PATH TO IMPROVEMENT 
1. Profiling revealed much time spent copying data 
2. Evaluation of data passed from map to reduce 
revealed inefficiencies: 
• Constant timestamp cost 8 bytes per key 
• Repeated column names could be encoded/compressed 
• Some Key/Value pairs didn’t need to be created 
until reduce 
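A rough size comparison shows why these two inefficiencies matter. The sketch below counts intermediate bytes for a naive layout (full column name plus an 8-byte timestamp per key) against a compact one (2-byte dictionary ids for column names, timestamp sent once). The record contents, the 2-byte id width, and both function names are assumptions for illustration; length prefixes and values are ignored.

```python
def naive_size(records):
    """Bytes when every key carries its full column name and an 8-byte timestamp."""
    return sum(len(row.encode()) + len(col.encode()) + 8 for row, col in records)

def compact_size(records, id_bytes=2):
    """Bytes when column names become dictionary ids and the constant
    timestamp is transmitted once instead of once per key."""
    columns = {col for _, col in records}
    dictionary = sum(len(c.encode()) for c in columns)  # each name stored once
    body = sum(len(row.encode()) + id_bytes for row, _ in records)
    return dictionary + 8 + body

if __name__ == "__main__":
    # Hypothetical (row, column name) pairs.
    records = [
        ("r1", "sourceAddress"),
        ("r2", "sourceAddress"),
        ("r3", "destinationAddress"),
    ]
    print(naive_size(records), compact_size(records))
```

The saving grows with the number of records, since the dictionary and the single timestamp are fixed costs while the per-key overhead is not.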
PERFORMANCE MEASUREMENT 
Optimized map code 
• Improvement: 
• Big speedup in map function 
• Twice as fast 
• Reduced intermediate inflation sped up all 
steps between map and reduce 
DO TRY THIS AT HOME 
Hints for Accumulo Application Optimization 
With these steps, we achieved 6X speedup: 
• Perform comparisons on serialized objects 
• With Map/Reduce, calculate how many merge 
steps are needed 
• Avoid premature data inflation 
• Leverage compression to shift bottlenecks 
• Always consider how fast your code should run 
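The merge-step calculation can be sketched from three MapReduce settings: `mapreduce.task.io.sort.mb`, `mapreduce.map.sort.spill.percent`, and `mapreduce.task.io.sort.factor`. The model below is a simplification and an assumption of this sketch: it ignores the sort buffer's per-record accounting overhead and treats every merge pass as combining up to `factor` files, which is not exactly how Hadoop schedules merges.

```python
import math

def estimate_spills(map_output_bytes, io_sort_mb, spill_percent):
    """Approximate spill files written by one map task (simplified model)."""
    spill_threshold = io_sort_mb * 2**20 * spill_percent
    return max(1, math.ceil(map_output_bytes / spill_threshold))

def merge_passes(num_spills, sort_factor):
    """Passes needed to merge `num_spills` files, `sort_factor` at a time."""
    passes = 0
    remaining = num_spills
    while remaining > 1:
        remaining = -(-remaining // sort_factor)  # ceiling division
        passes += 1
    return passes

if __name__ == "__main__":
    # Baseline figures from the instrumentation slide: 9313303723 bytes of
    # map output over 20 map containers, io.sort.mb=100, spill percent=0.8.
    per_map = 9_313_303_723 // 20
    spills = estimate_spills(per_map, 100, 0.8)
    print(spills, merge_passes(spills, 10))
```

For the baseline run this estimate lands one below the spill count of 7 reported on the instrumentation slide, a gap plausibly explained by the accounting overhead the sketch leaves out. Either way a sort factor of 10 merges them in one pass; the cost jumps when the spill count exceeds the factor.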
SOME CURRENT ACCUMULO 
PERFORMANCE PROJECTS 
• Optimize metadata operations 
• Batch to improve throughput (ACCUMULO-2175, 
ACCUMULO-2889) 
• Remove from critical path where possible 
• Optimize write-ahead log performance 
• Maximize throughput 
• Reduce flushes 
• Parallelize WALs (ACCUMULO-1083) 
• Avoid downtime by pre-allocating 
Securely explore your data 
SQRRL IS HIRING! 
QUESTIONS? 
Chris McCubbin 
Director of Data Science 
Sqrrl Data, Inc.

Mais conteúdo relacionado

Mais procurados

Splunking configfiles 20211208_daniel_wilson
Splunking configfiles 20211208_daniel_wilsonSplunking configfiles 20211208_daniel_wilson
Splunking configfiles 20211208_daniel_wilsonBecky Burwell
 
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...Big Data Spain
 
IoT Architectural Overview - 3 use case studies from InfluxData
IoT Architectural Overview - 3 use case studies from InfluxData IoT Architectural Overview - 3 use case studies from InfluxData
IoT Architectural Overview - 3 use case studies from InfluxData InfluxData
 
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...Spark Summit
 
巨量資料入門 The evolution of data architecture
巨量資料入門 The evolution of data architecture巨量資料入門 The evolution of data architecture
巨量資料入門 The evolution of data architectureWei-Chiu Chuang
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...Jürgen Ambrosi
 
Big data today and tomorrow
Big data today and tomorrowBig data today and tomorrow
Big data today and tomorrowmagda3695
 
JupyterCon 2020 - Supercharging SQL Users with Jupyter Notebooks
JupyterCon 2020 - Supercharging SQL Users with Jupyter NotebooksJupyterCon 2020 - Supercharging SQL Users with Jupyter Notebooks
JupyterCon 2020 - Supercharging SQL Users with Jupyter NotebooksMichelle Ufford
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeDataWorks Summit
 
Yahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Yahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersYahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Yahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersBrett Sheppard
 
AWS Earth and Space 2018 - Element 84 Processing and Streaming GOES-16 Data...
AWS Earth and Space 2018 -   Element 84 Processing and Streaming GOES-16 Data...AWS Earth and Space 2018 -   Element 84 Processing and Streaming GOES-16 Data...
AWS Earth and Space 2018 - Element 84 Processing and Streaming GOES-16 Data...Dan Pilone
 
Data Tools and the Data Scientist Shortage
Data Tools and the Data Scientist ShortageData Tools and the Data Scientist Shortage
Data Tools and the Data Scientist ShortageWes McKinney
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...DataWorks Summit
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDataWorks Summit
 
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...Databricks
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDataWorks Summit
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeSpark Summit
 
Elephants in the cloud or how to become cloud ready
Elephants in the cloud or how to become cloud readyElephants in the cloud or how to become cloud ready
Elephants in the cloud or how to become cloud readyKrzysztof Adamski
 

Mais procurados (19)

Splunking configfiles 20211208_daniel_wilson
Splunking configfiles 20211208_daniel_wilsonSplunking configfiles 20211208_daniel_wilson
Splunking configfiles 20211208_daniel_wilson
 
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
Large Infrastructure Monitoring At CERN by Matthias Braeger at Big Data Spain...
 
IoT Architectural Overview - 3 use case studies from InfluxData
IoT Architectural Overview - 3 use case studies from InfluxData IoT Architectural Overview - 3 use case studies from InfluxData
IoT Architectural Overview - 3 use case studies from InfluxData
 
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
Scaling Through Simplicity—How a 300 million User Chat App Reduced Data Engin...
 
巨量資料入門 The evolution of data architecture
巨量資料入門 The evolution of data architecture巨量資料入門 The evolution of data architecture
巨量資料入門 The evolution of data architecture
 
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
6° Sessione - Ambiti applicativi nella ricerca di tecnologie statistiche avan...
 
Big data today and tomorrow
Big data today and tomorrowBig data today and tomorrow
Big data today and tomorrow
 
JupyterCon 2020 - Supercharging SQL Users with Jupyter Notebooks
JupyterCon 2020 - Supercharging SQL Users with Jupyter NotebooksJupyterCon 2020 - Supercharging SQL Users with Jupyter Notebooks
JupyterCon 2020 - Supercharging SQL Users with Jupyter Notebooks
 
Analyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-time
 
Yahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Yahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop ClustersYahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
Yahoo Enabling Exploratory Analytics of Data in Shared-service Hadoop Clusters
 
AWS Earth and Space 2018 - Element 84 Processing and Streaming GOES-16 Data...
AWS Earth and Space 2018 -   Element 84 Processing and Streaming GOES-16 Data...AWS Earth and Space 2018 -   Element 84 Processing and Streaming GOES-16 Data...
AWS Earth and Space 2018 - Element 84 Processing and Streaming GOES-16 Data...
 
Data Tools and the Data Scientist Shortage
Data Tools and the Data Scientist ShortageData Tools and the Data Scientist Shortage
Data Tools and the Data Scientist Shortage
 
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
MaaS (Model as a Service): Modern Streaming Data Science with Apache Metron (...
 
Detecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking DataDetecting Hacks: Anomaly Detection on Networking Data
Detecting Hacks: Anomaly Detection on Networking Data
 
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
Risk Management Framework Using Intel FPGA, Apache Spark, and Persistent RDDs...
 
Design Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data AnalyticsDesign Patterns For Real Time Streaming Data Analytics
Design Patterns For Real Time Streaming Data Analytics
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
 
Elephants in the cloud or how to become cloud ready
Elephants in the cloud or how to become cloud readyElephants in the cloud or how to become cloud ready
Elephants in the cloud or how to become cloud ready
 
Druid @ branch
Druid @ branch Druid @ branch
Druid @ branch
 

Destaque

Cloudera Federal Forum 2014: EzBake, the DoDIIS App Engine
Cloudera Federal Forum 2014: EzBake, the DoDIIS App EngineCloudera Federal Forum 2014: EzBake, the DoDIIS App Engine
Cloudera Federal Forum 2014: EzBake, the DoDIIS App EngineCloudera, Inc.
 
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the CloudData Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the CloudCloudera, Inc.
 
Building Web Scale Apps with Docker and Mesos by Alex Rukletsov (Mesosphere)
Building Web Scale Apps with Docker and Mesos by Alex Rukletsov (Mesosphere)Building Web Scale Apps with Docker and Mesos by Alex Rukletsov (Mesosphere)
Building Web Scale Apps with Docker and Mesos by Alex Rukletsov (Mesosphere)Docker, Inc.
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...Cloudera, Inc.
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester WebinarCloudera, Inc.
 

Destaque (6)

Cloudera Federal Forum 2014: EzBake, the DoDIIS App Engine
Cloudera Federal Forum 2014: EzBake, the DoDIIS App EngineCloudera Federal Forum 2014: EzBake, the DoDIIS App Engine
Cloudera Federal Forum 2014: EzBake, the DoDIIS App Engine
 
Iframe src
Iframe srcIframe src
Iframe src
 
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the CloudData Engineering: Elastic, Low-Cost Data Processing in the Cloud
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
 
Building Web Scale Apps with Docker and Mesos by Alex Rukletsov (Mesosphere)
Building Web Scale Apps with Docker and Mesos by Alex Rukletsov (Mesosphere)Building Web Scale Apps with Docker and Mesos by Alex Rukletsov (Mesosphere)
Building Web Scale Apps with Docker and Mesos by Alex Rukletsov (Mesosphere)
 
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ...
 
Kudu Forrester Webinar
Kudu Forrester WebinarKudu Forrester Webinar
Kudu Forrester Webinar
 

Semelhante a Performance Models for Apache Accumulo

Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...Accumulo Summit
 
HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010Cloudera, Inc.
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkCloudera, Inc.
 
DOWNSAMPLING DATA
DOWNSAMPLING DATADOWNSAMPLING DATA
DOWNSAMPLING DATAInfluxData
 
Geospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataGeospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataAlexMiowski
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesMurtadha Alsabbagh
 
Data Warehouse Offload
Data Warehouse OffloadData Warehouse Offload
Data Warehouse OffloadJohn Berns
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and HadoopMichael Zhang
 
Presentation mongo db munich
Presentation mongo db munichPresentation mongo db munich
Presentation mongo db munichMongoDB
 
Real World Performance - Data Warehouses
Real World Performance - Data WarehousesReal World Performance - Data Warehouses
Real World Performance - Data WarehousesConnor McDonald
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everythingLew Tucker
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Databricks
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...t_ivanov
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Etu Solution
 
Demonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsDemonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsIgor Sfiligoi
 
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Caserta
 

Semelhante a Performance Models for Apache Accumulo (20)

Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...
 
HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010HP - Jerome Rolia - Hadoop World 2010
HP - Jerome Rolia - Hadoop World 2010
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache Spark
 
DOWNSAMPLING DATA
DOWNSAMPLING DATADOWNSAMPLING DATA
DOWNSAMPLING DATA
 
Geospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataGeospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning Data
 
OOW13 Exadata and ODI with Parallel
OOW13 Exadata and ODI with ParallelOOW13 Exadata and ODI with Parallel
OOW13 Exadata and ODI with Parallel
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
 
Data Warehouse Offload
Data Warehouse OffloadData Warehouse Offload
Data Warehouse Offload
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and Hadoop
 
Presentation mongo db munich
Presentation mongo db munichPresentation mongo db munich
Presentation mongo db munich
 
Real World Performance - Data Warehouses
Real World Performance - Data WarehousesReal World Performance - Data Warehouses
Real World Performance - Data Warehouses
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析Track A-2 基於 Spark 的數據分析
Track A-2 基於 Spark 的數據分析
 
Spark etl
Spark etlSpark etl
Spark etl
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
Demonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the CloudsDemonstrating 100 Gbps in and out of the Clouds
Demonstrating 100 Gbps in and out of the Clouds
 
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
 

Mais de Sqrrl

Transitioning Government Technology
Transitioning Government TechnologyTransitioning Government Technology
Transitioning Government TechnologySqrrl
 
Leveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsLeveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsSqrrl
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkSqrrl
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedSqrrl
 
Building a Next-Generation Security Operations Center (SOC)




Performance Models for Apache Accumulo

  • 1. Securely explore your data PERFORMANCE MODELS FOR APACHE ACCUMULO: THE HEAVY TAIL OF A SHARED-NOTHING ARCHITECTURE Chris McCubbin Director of Data Science Sqrrl Data, Inc.
  • 2. I’M NOT ADAM FUCHS • But perhaps I’m still an interesting guy • MS in CS from UMBC in Network Security and Quantum Computing • 8 years at JHU/APL working on UxV Swarms • 4 years at JHU/APL and TexelTek creating Big Data Applications for the NSA • Co-founder and Director of Data Science at Sqrrl ©2014 Sqrrl Data, Inc 2
  • 3. SO, YOUR DISTRIBUTED APPLICATION IS SLOW • Today’s distributed applications run on tens or hundreds of library components • Many versions so internet advice could be ineffective, or worse, flat out wrong • Hundreds of settings • Some, shall we say, could be better documented • Shared-nothing architectures are usually “shared-little” architectures with tricky interactions • Profiling is hard and time-consuming • What do we do?
  • 4. TODAY’S TALK 1. Quick intro to performance optimization 2. Tricks and techniques for targeted distributed application modeling performance improvement 3. A deep dive into improving bulk load application performance
  • 5. The Apache Accumulo™ sorted, distributed key/value store is a secure, robust, scalable, high performance data storage and retrieval system. • Many applications in real-time storage and analysis of “big data”: • Spatio-temporal indexing in non-relational distributed databases - Fox et al 2013 IEEE International Congress on Big Data • Big Data Dimensional Analysis - Gadepally et al IEEE HPEC 2014 • Leading its peers in performance and scalability: • Achieving 100,000,000 database inserts per second using Accumulo and D4M - Kepner et al IEEE HPEC 2014 • An NSA Big Graph experiment (Technical Report NSA-RD-2013-056002v1) • Benchmarking Apache Accumulo BigData Distributed Table Store Using Its Continuous Test Suite - Sen et al 2013 IEEE International Congress on Big Data • For more papers and presentations, see http://accumulo.apache.org/papers.html
  • 6. SCALING UP: DIVIDE & CONQUER • Collections of KV pairs form Tables • Tables are partitioned into Tablets • Metadata tablets hold info about other tablets, forming a 3-level hierarchy • A Tablet is a unit of work for a Tablet Server • Figure: Well-Known Location (zookeeper); Root Tablet (-∞ to ∞); Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”); Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞); Tables: Adam’s Table, Encyclopedia, Foo; Data Tablets: -∞ : thing, thing : ∞, -∞ : Ocelot, Ocelot : Yak, Yak : ∞, -∞ to ∞
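The partitioning on this slide can be sketched as a lookup over sorted split points. This is only an illustration, not Accumulo's client code: the function names are hypothetical, and it assumes each tablet covers the rows after the previous split point up to and including its own end row.

```python
from bisect import bisect_left

def tablet_index(split_points, row):
    """Index of the tablet that holds `row`, given sorted split points.

    Assumption: a tablet's end row is inclusive, so a row equal to a
    split point belongs to the tablet that ends at that split point.
    """
    return bisect_left(split_points, row)

def tablet_range(split_points, index):
    """(previous end row, end row) of a tablet; None stands for -inf / +inf."""
    low = split_points[index - 1] if index > 0 else None
    high = split_points[index] if index < len(split_points) else None
    return low, high

# Split points taken from the "Encyclopedia" tablets in the figure.
encyclopedia_splits = ["Ocelot", "Yak"]
```

With these split points, `tablet_index(encyclopedia_splits, "Panda")` selects the middle tablet, whose range is `("Ocelot", "Yak")`. The metadata tablets apply the same idea one level up, mapping key ranges to the tablets that hold them.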
  • 7. PERFORMANCE ANALYSIS CYCLE • Start: Create Model • Cycle steps (figure): Simulate & Experiment, Analyze, Refine Model, Modify Code • Outputs: Better Code + Models
  • 8. MAKING A MODEL • Determine points of low-impact metrics • Add some if needed • Create parallel state machine models with components driven by these metrics • Estimate running times and bottlenecks from a priori information and/or apply measured statistics • Focus testing on validation of the initial model and the (estimated) pain points • Apply Amdahl’s Law • Rinse, repeat
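As a reminder of what "Apply Amdahl's Law" buys you when ranking pain points, here is a minimal sketch. The function name and the example fractions are illustrative and do not come from the talk.

```python
def amdahl_speedup(fraction_improved, local_speedup):
    """Overall speedup when `fraction_improved` of the run time gets
    `local_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction_improved <= 1.0:
        raise ValueError("fraction_improved must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_improved) + fraction_improved / local_speedup)

def amdahl_limit(fraction_improved):
    """Upper bound on overall speedup, however fast the improved part gets."""
    return float("inf") if fraction_improved == 1.0 else 1.0 / (1.0 - fraction_improved)
```

For example, doubling the speed of a component that accounts for half of the run time gives `amdahl_speedup(0.5, 2.0)`, about 1.33 overall, and no amount of tuning that component alone can exceed `amdahl_limit(0.5)`, which is 2. This is why the model is used to find the largest components first.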
  • 9. BULK INGEST OVERVIEW • Accumulo supports two mechanisms to bring data in: streaming ingest and bulk ingest. • Bulk Ingest • Goal: maximize throughput without constraining latency. • Create a set of Accumulo RFiles, then register those files with Accumulo. • RFiles are groups of sorted key-value pairs with some indexing information • MapReduce has a built-in key sorting phase: a good fit to produce RFiles
  • 10. BULK INGEST MODEL • Figure: Map, Reduce, and Register phases laid out along a time axis
  • 11. BULK INGEST MODEL: Hypothetical Resource Usage over Time • Map: 100% CPU, 20% Disk, 0% Network, 46 seconds • Reduce: 40% CPU, 100% Disk, 20% Network, 168 seconds • Register: 10% CPU, 20% Disk, 40% Network, 17 seconds
  • 12. INSIGHT • Spare disk here, spare CPU there: can we even out resource consumption? • Why did reduce take 168 seconds? It should be more like 40 seconds. • No clear bottleneck during registration: is there a synchronization or serialization problem? • Figure: resource usage over time • Map: 100% CPU, 20% Disk, 0% Network, 46 seconds • Reduce: 40% CPU, 100% Disk, 20% Network, 168 seconds • Register: 10% CPU, 20% Disk, 40% Network, 17 seconds
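The questions on this slide can be asked of the numbers mechanically. The sketch below takes the three sets of figures from the slide and assumes they belong to Map, Reduce, and Register in that order. The 90% saturation threshold and all function names are hypothetical choices for illustration.

```python
# Utilisation in percent and duration in seconds, as shown on the slide.
PHASES = {
    "map":      {"cpu": 100, "disk": 20,  "network": 0,  "seconds": 46},
    "reduce":   {"cpu": 40,  "disk": 100, "network": 20, "seconds": 168},
    "register": {"cpu": 10,  "disk": 20,  "network": 40, "seconds": 17},
}
RESOURCES = ("cpu", "disk", "network")

def bottleneck(phase, saturation=90):
    """Busiest resource if it is at or above `saturation` percent, else None."""
    name = max(RESOURCES, key=lambda resource: phase[resource])
    return name if phase[name] >= saturation else None

def average_utilisation(phases, resource):
    """Time-weighted average utilisation of one resource over the whole job."""
    total = sum(p["seconds"] for p in phases.values())
    return sum(p[resource] * p["seconds"] for p in phases.values()) / total

def projected_total(phases, phase_name, new_seconds):
    """Job time if one phase ran in `new_seconds` and the others were unchanged."""
    return sum(new_seconds if name == phase_name else p["seconds"]
               for name, p in phases.items())
```

Under these assumptions `bottleneck` reports CPU for map, disk for reduce, and nothing for register, which is the "no clear bottleneck" observation. `projected_total(PHASES, "reduce", 40)` gives 103 seconds against the original 231, so hitting the 40-second target for reduce alone would make the job a little over twice as fast.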
  • 13. LOOKING DEEPER: REFINED BULK INGEST MODEL • Figure: timeline of a Map Thread and a Reduce Thread; stage labels: Map Setup, Map, Sort, Spill, Merge, Serve, Shuffle, Sort, Reduce, Output; legend: Parallel, Latch
  • 14. BULK INGEST MODEL PREDICTIONS • We can constrain parts of the model by physical throughput limitations • Disk -> Memory (100 MB/s, average 7200 rpm sequential read rate): Input reader • Memory -> Disk (100 MB/s): Spill, OutputWriter • Disk -> Disk (50 MB/s): Merge • Network (Gigabit = 125 MB/s): Shuffle • And/or algorithmic limitations: Sort, (Our) Map, (Our) Reduce, SerDe
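These limits turn directly into lower bounds on stage times: no stage can finish faster than its data volume divided by the throughput of the device it is bound to. The sketch below reads the slide's rates as megabytes per second (an assumption, since Gigabit Ethernet moves about 125 megabytes per second). The stage names and the 1000 MB volumes are hypothetical.

```python
# Physical throughput ceilings per stage, in MB/s, taken from the slide.
RATE_MB_PER_S = {
    "input_read": 100.0,    # disk -> memory
    "spill": 100.0,         # memory -> disk
    "output_write": 100.0,  # memory -> disk
    "merge": 50.0,          # disk -> disk (read and write share the disk)
    "shuffle": 125.0,       # gigabit network
}

def min_seconds(stage, megabytes):
    """Fastest possible time for a stage that moves `megabytes` of data."""
    return megabytes / RATE_MB_PER_S[stage]

def lower_bounds(volumes_mb):
    """Per-stage lower bounds for a dict of stage -> data volume in MB."""
    return {stage: min_seconds(stage, mb) for stage, mb in volumes_mb.items()}

# Hypothetical job that moves 1000 MB through every stage.
example = lower_bounds({stage: 1000.0 for stage in RATE_MB_PER_S})
```

In this example the merge stage has the largest bound (20 seconds) and the shuffle the smallest (8 seconds). A measured stage time far above its bound points at an algorithmic or configuration problem rather than at the hardware.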
  • 15. PERFORMANCE GOAL MODEL Performance goals obtained through: • Simulation of individual components • Prediction of available resources at runtime
  • 16. INSTRUMENTATION • application version 1.3.3; application sha 8d17baf8; description baseline • SYSTEM: node num 1; map num containers 20; red num containers 20; cores physical 12; cores logical 24; disk num 8; disk bandwidth 100; replication 1; monitoring TRUE • DATA: input type arcsight; input block size 32; input block count 20; input total 672054649; output map 9313303723; output map:combine input records 243419324; output map:combine records out 209318830; output map:spill 7325671992; output final 573802787; output map:combine 7301374577 • YARN and MapReduce settings: yarn.nodemanager.resource.memory-mb 43008; yarn.scheduler.minimum-allocation-mb 2048; yarn.scheduler.maximum-allocation-mb 43008; yarn.app.mapreduce.am.resource.mb 2048; yarn.app.mapreduce.am.command-opts -Xmx1536m; mapreduce.map.memory.mb 2048; mapreduce.map.java.opts -Xmx1638m; mapreduce.reduce.memory.mb 2048; mapreduce.reduce.java.opts -Xmx1638m; mapreduce.task.io.sort.mb 100; mapreduce.map.sort.spill.percent 0.8; mapreduce.task.io.sort.factor 10; mapreduce.reduce.shuffle.parallelcopies 5; mapreduce.job.reduce.slowstart.completedmaps 1; mapreduce.map.output.compress FALSE; mapred.map.output.compression.codec n/a • TIME: map:setup avg 8; map:map avg 12; map:sort avg 12; map:spill avg 12; map:spill count 7; map:merge avg 46; map total 290; red:shuffle avg 6; red:merge avg 38; red:reduce avg 68; red:total avg 112; red:reducer count 20; job:total 396 • RATIOS: input explosion factor 13.877904; compression intermediate 1.003327786; load combiner output 0.783972562; total ratio 0.786581455 • CONSTANTS: avg schema entry size (bytes) 59 • effective MB/sec 1.618488025
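Several of the derived values on this slide can be recomputed from the raw counters, which is a useful sanity check on any instrumentation sheet. The formulas below are inferred from the numbers and are not stated in the talk, the variable names are hypothetical, and job:total is assumed to be in seconds. The slide's input explosion factor is left out because its formula is not given.

```python
# Raw counters copied from the slide (bytes, and the job:total time value).
INPUT_TOTAL = 672054649
OUTPUT_MAP = 9313303723
OUTPUT_MAP_COMBINE = 7301374577
OUTPUT_MAP_SPILL = 7325671992
JOB_TOTAL_SECONDS = 396

BYTES_PER_MB = 1024 * 1024  # assumption: "MB" on the slide means 2**20 bytes

# Inferred formulas for the derived values.
compression_intermediate = OUTPUT_MAP_SPILL / OUTPUT_MAP_COMBINE
load_combiner_output = OUTPUT_MAP_COMBINE / OUTPUT_MAP
total_ratio = OUTPUT_MAP_SPILL / OUTPUT_MAP
effective_mb_per_sec = INPUT_TOTAL / BYTES_PER_MB / JOB_TOTAL_SECONDS

for name, value in [
    ("compression intermediate", compression_intermediate),
    ("load combiner output", load_combiner_output),
    ("total ratio", total_ratio),
    ("effective MB/sec", effective_mb_per_sec),
]:
    print(f"{name}: {value:.6f}")
```

Under these assumptions the four values agree with the slide to at least six decimal places. The effective rate of roughly 1.6 MB/sec for the baseline is far below the disk and network rates assumed in the model, which is what motivates the optimization work that follows.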
  • 17. PERFORMANCE MEASUREMENT: Baseline (naive implementation) • Figure: measured timeline of the Map Thread and Reduce Thread; stage labels: Map Setup, Map, Sort, Spill, Merge, Serve, Shuffle, Sort, Reduce, Output
  • 18. PATH TO IMPROVEMENT 1. Profiling revealed much time spent serializing/deserializing Key 2. With proper configuration, MapReduce supports comparison of keys in serialized form 3. Rewriting Key’s serialization led to an order-preserving encoding, easy to compare in serialized form 4. Configure MapReduce to use native code to compare Keys 5. Tweak map input size and spill memory for as few spills as possible
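The idea behind steps 2 and 3 is that a key can be serialized so that plain byte-by-byte comparison gives the same order as comparing the deserialized fields. The sketch below shows one such encoding for a simplified (row, qualifier, timestamp) key. It is not Accumulo's Key format, the function names are hypothetical, and timestamps are assumed to be non-negative and sorted ascending to keep the example short.

```python
import struct

def encode_text(text):
    """Encode a string so byte order matches string order.

    Zero bytes are escaped as 00 01 and the field ends with 00 00, so a
    shorter string always sorts before any longer string it prefixes.
    """
    raw = text.encode("utf-8")
    return raw.replace(b"\x00", b"\x00\x01") + b"\x00\x00"

def encode_key(row, qualifier, timestamp):
    """Serialize a key; the timestamp is an unsigned 64-bit big-endian integer."""
    return encode_text(row) + encode_text(qualifier) + struct.pack(">Q", timestamp)

keys = [("b", "x", 5), ("a", "y", 9), ("a", "x", 300),
        ("a", "x", 7), ("ab", "", 1), ("a\x00", "x", 2)]

sorted_by_fields = sorted(keys)                                 # deserialized comparison
sorted_by_bytes = sorted(keys, key=lambda k: encode_key(*k))    # raw byte comparison
```

Both sorts produce the same order, so a sorter that only sees the serialized bytes never has to deserialize a key. In Hadoop, a comparator registered through the `RawComparator` interface is the hook that lets the sort phase work on serialized bytes in this way.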
  • 19. PERFORMANCE MEASUREMENT Optimized sorting • Improvements: • Time for map-side merge went down • Sort performance drastically improved in both map and reduce phases • 300% faster
  • 20. PERFORMANCE MEASUREMENT: Optimized sorting • Figure: measured timeline of the Map Thread and Reduce Thread; stage labels: Map Setup, Map, Sort, Spill, Merge, Serve, Shuffle, Sort, Reduce, Output • Insights: • Map is slower than expected • Output is disk bound; maybe we can move more processing to Reduce • “Reverse Amdahl’s law” • Intermediate data inflation ratio (output/input for map) is very high
  • 21. PATH TO IMPROVEMENT 1. Profiling revealed much time spent copying data 2. Evaluation of data passed from map to reduce revealed inefficiencies: • Constant timestamp cost 8 bytes per key • Repeated column names could be encoded/compressed • Some Key/Value pairs didn’t need to be created until reduce
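To make the first two inefficiencies concrete, the sketch below compares a naive intermediate record with a compact one that drops the constant timestamp and replaces the column name with a one-byte id. The record layout, column names, and function names are all hypothetical. The talk does not describe the actual encoding used.

```python
import struct

COLUMNS = ["source_address", "destination_address", "event_name"]  # hypothetical schema
COLUMN_ID = {name: index for index, name in enumerate(COLUMNS)}

def pack_fields(fields):
    """Length-prefix each field with a 2-byte big-endian length."""
    return b"".join(struct.pack(">H", len(f)) + f for f in fields)

def naive_record(row, column, timestamp, value):
    """Map output carrying the full column name and an 8-byte timestamp."""
    return pack_fields([row.encode(), column.encode(),
                        struct.pack(">q", timestamp), value.encode()])

def compact_record(row, column, value):
    """Map output with a 1-byte column id and no timestamp."""
    return pack_fields([row.encode(), bytes([COLUMN_ID[column]]), value.encode()])

def expand_record(blob, timestamp):
    """Reduce side: restore the column name and attach the job-wide timestamp."""
    fields, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">H", blob, offset)
        offset += 2
        fields.append(blob[offset:offset + length])
        offset += length
    row, column_id, value = fields
    return row.decode(), COLUMNS[column_id[0]], timestamp, value.decode()

naive = naive_record("evt-0001", "destination_address", 1400000000000, "10.0.0.7")
compact = compact_record("evt-0001", "destination_address", "10.0.0.7")
```

For this record the naive form is 51 bytes and the compact form 23 bytes, and `expand_record(compact, 1400000000000)` recovers the original fields. Every byte removed here is a byte that is never spilled, merged, or shuffled, so the saving applies to each stage between map and reduce.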
  • 22. PERFORMANCE MEASUREMENT Optimized map code • Improvement: • Big speedup in map function • Twice as fast • Reduced intermediate inflation sped up all steps between map and reduce
  • 23. DO TRY THIS AT HOME Hints for Accumulo Application Optimization With these steps, we achieved 6X speedup: • Perform comparisons on serialized objects • With Map/Reduce, calculate how many merge steps are needed • Avoid premature data inflation • Leverage compression to shift bottlenecks • Always consider how fast your code should run
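The merge-step hint can be estimated with a few lines of arithmetic. The sketch below is a simplified model: it assumes a spill happens whenever the sort buffer reaches its spill threshold and that each merge pass combines up to `factor` files into one. Hadoop's real merge scheduling and buffer accounting differ in detail, and the function names and the 500 MB example are hypothetical. The defaults mirror the settings on the instrumentation slide (mapreduce.task.io.sort.mb 100, mapreduce.map.sort.spill.percent 0.8, mapreduce.task.io.sort.factor 10).

```python
import math

def spill_count(map_output_mb, sort_mb=100, spill_percent=0.8):
    """Approximate number of spill files written by one map task."""
    return max(1, math.ceil(map_output_mb / (sort_mb * spill_percent)))

def merge_passes(files, factor=10):
    """Passes needed to reduce `files` sorted files to one, merging
    at most `factor` files at a time."""
    passes = 0
    while files > 1:
        files = -(-files // factor)  # integer ceiling division
        passes += 1
    return passes

# Hypothetical map task that emits 500 MB of intermediate data.
spills = spill_count(500)
passes = merge_passes(spills)
```

With the slide's settings, 500 MB of map output gives 7 spills and a single merge pass. A task that produced 25 spills would need two passes, rereading and rewriting its intermediate data an extra time, which is why reducing spills or raising the merge factor can pay off.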
  • 24. SOME CURRENT ACCUMULO PERFORMANCE PROJECTS • Optimize metadata operations • Batch to improve throughput (ACCUMULO-2175, ACCUMULO-2889) • Remove from critical path where possible • Optimize write-ahead log performance • Maximize throughput • Reduce flushes • Parallelize WALs (ACCUMULO-1083) • Avoid downtime by pre-allocating
  • 25. Securely explore your data SQRRL IS HIRING! QUESTIONS? Chris McCubbin Director of Data Science Sqrrl Data, Inc.