Using Apache Spark for processing
trillions of records each day at
Datadog
Vadim Semenov
Data Engineer @ Datadog
vadim@datadoghq.com
Initial setup
AWS EMR
100-200x r3.8xlarge: 32 cores, 244 GiB RAM, 640 GB SSD, 10 GbE
3,200-6,400 cores
23.5-47 TiB memory (only 240.23 GiB per node is actually available, because of Xen)
spot instances
Spark 1.6 in yarn-cluster mode
Scala + RDD API
Some initial settings
yarn.nodemanager.resource.memory-mb     240g
yarn.scheduler.maximum-allocation-mb    240g
spark.driver.memory                     8g
spark.yarn.driver.memoryOverhead        3g
spark.executor.memory                   201g
spark.yarn.executor.memoryOverhead      28g
spark.driver.cores                      4
spark.executor.cores                    32
spark.executor.extraJavaOptions         -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:OnOutOfMemoryError='kill -9 %p' -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -XX:ErrorFile=/tmp/hs_err_pid%p.log
Trillion
How big is a trillion?
2^40 = 1,099,511,627,776
2^31 = 2,147,483,648 = Int.MaxValue
a trillion Integers = 4.3 TiB
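The arithmetic above can be sanity-checked in a few lines of Scala (this snippet is mine, not from the slides; it counts only raw Int payload, with no JVM object overhead):

```scala
object TrillionSize {
  def main(args: Array[String]): Unit = {
    val pow40 = 1L << 40                        // 2^40 = 1,099,511,627,776, roughly a trillion
    val bytes = pow40 * 4L                      // 4 bytes per Int
    val tib   = bytes.toDouble / (1L << 40)     // convert bytes to TiB
    println(f"2^40 Ints = $tib%.1f TiB raw")    // ~4 TiB of raw payload, before any overhead
    println(s"Int.MaxValue = ${Int.MaxValue}")  // 2,147,483,648 - 1: why array sizes cap out
  }
}
```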
OOMs
- java.lang.OutOfMemoryError: Java heap space (increase the heap size)
- java.lang.OutOfMemoryError: GC overhead limit exceeded (too much garbage)
- java.lang.OutOfMemoryError: Direct buffer memory (NIO)
- YARN: Container is running beyond physical memory limits. Killing container. (increase memory overhead)
- There is insufficient memory for the Java Runtime Environment to continue (add more memory, reduce memory consumption)
The driver must survive
spark.driver.memory                 8g   → 83g
spark.yarn.driver.memoryOverhead    3g   → 32g
spark.driver.cores                  4    → 15
spark.executor.memory               201g → 166g
spark.yarn.executor.memoryOverhead  28g  → 64g
spark.executor.cores                32   → 30
Measure memory usage
https://github.com/etsy/statsd-jvm-profiler
spark.files = /tmp/statsd-jvm-profiler.jar
spark.executor.extraJavaOptions +=
  -javaagent:statsd-jvm-profiler.jar=server=localhost,port=8125,profilers=MemoryProfiler
Measure GC
Off-heap OOMs
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
…
at parquet.hadoop.codec.…
Off-heap memory
- Direct Allocated Buffers (NIO): Parquet, MessagePack, …
- Java Native Interface (JNI): dynamically-linked native libraries like libhadoop.so, GZIP, ZLIB, LZ4
- sun.misc.Unsafe: org.apache.hadoop.io.nativeio, org.apache.spark.unsafe
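Direct buffers are why heap settings alone don't cap a JVM's footprint. A minimal sketch (mine, not from the talk) allocating NIO memory that lives outside the Java heap:

```scala
import java.nio.ByteBuffer

object DirectBufferDemo {
  def main(args: Array[String]): Unit = {
    // 1 GiB allocated outside the Java heap: invisible to -Xmx,
    // but fully counted by YARN's physical-memory check.
    val buf = ByteBuffer.allocateDirect(1 << 30)
    println(s"direct=${buf.isDirect}, capacity=${buf.capacity()} bytes")
    // The direct pool is capped by -XX:MaxDirectMemorySize (defaults to the
    // heap size); exhausting it raises
    // java.lang.OutOfMemoryError: Direct buffer memory.
  }
}
```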
Process memory
$ cat /proc/<spark driver/executor pid>/status
VmPeak: 190317312 kB
VmSize: 190268160 kB
VmHWM: 187586408 kB
VmRSS: 187586408 kB
VmData: 190044492 kB
Process memory
Solution: let the java-agent get the memory
usage of its process right from the procfs
https://github.com/DataDog/spark-jvm-profiler
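In the same spirit, the VmRSS/VmHWM numbers above can be read for the current JVM straight from procfs. A sketch of that idea (Linux only; this is my illustration, not the agent's actual code):

```scala
import scala.io.Source

object ProcStatus {
  // Parse selected "Key: <value> kB" lines from /proc/self/status.
  def memoryKb(fields: Set[String]): Map[String, Long] = {
    val src = Source.fromFile("/proc/self/status")
    try {
      src.getLines()
        .map(_.split(":\\s+", 2))
        .collect { case Array(k, v) if fields(k) =>
          k -> v.replace("kB", "").trim.toLong
        }
        .toMap
    } finally src.close()
  }

  def main(args: Array[String]): Unit =
    // VmRSS: resident set right now; VmHWM: resident high-water mark.
    memoryKb(Set("VmRSS", "VmHWM", "VmSize")).foreach {
      case (k, kb) => println(s"$k: $kb kB")
    }
}
```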
Measure each executor
Lessons
- Give more resources than you think you
would need, and then reduce
- Measure memory usage of each executor
- Keep an eye on your GC metrics
Measure slow parts
val timer = MaxAndTotalTimeAccumulator
rdd.map(key => {
  val startTime = System.nanoTime()
  ...
  val endTime = System.nanoTime()
  val millisecondsPassed = ((endTime - startTime) / 1000000).toInt
  timer.add(millisecondsPassed)
})
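The talk doesn't show how MaxAndTotalTimeAccumulator is implemented; one plausible sketch using Spark 2's AccumulatorV2 API (the Spark 1.6 original would differ) looks like this:

```scala
import org.apache.spark.util.AccumulatorV2

// Tracks both the max and the total of recorded durations, mergeable across tasks.
class MaxAndTotalTimeAccumulator extends AccumulatorV2[Int, (Int, Long)] {
  private var maxMs   = 0
  private var totalMs = 0L

  def isZero: Boolean = maxMs == 0 && totalMs == 0L
  def copy(): MaxAndTotalTimeAccumulator = {
    val acc = new MaxAndTotalTimeAccumulator
    acc.maxMs = maxMs; acc.totalMs = totalMs
    acc
  }
  def reset(): Unit = { maxMs = 0; totalMs = 0L }
  def add(ms: Int): Unit = { maxMs = math.max(maxMs, ms); totalMs += ms }
  def merge(other: AccumulatorV2[Int, (Int, Long)]): Unit = other match {
    case o: MaxAndTotalTimeAccumulator =>
      maxMs = math.max(maxMs, o.maxMs); totalMs += o.totalMs
  }
  def value: (Int, Long) = (maxMs, totalMs)
}
// Register it before use: sc.register(timer, "map-time")
```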
Watch skewed parts
.groupByKey().flatMap({ case (key, iter) =>
  val size = iter.size
  maxAccumulator.add(key, size)
  if (size >= 100000000) {
    log.info(s"Key $key has $size values")
    None
  } else {
Report accumulators per partition
sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
    Option(taskEnd.taskMetrics)
      .foreach(taskMetrics => … )
})
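Filling in the elided body, a fuller listener might look like the sketch below (my illustration of the pattern; the talk's actual metrics sink isn't shown, and println stands in for it):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Emit per-task metrics so skew shows up per partition,
// not just in stage-level aggregates.
sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
    Option(taskEnd.taskMetrics).foreach { m =>
      println(
        s"task=${taskEnd.taskInfo.taskId} " +
        s"runTime=${m.executorRunTime}ms gc=${m.jvmGCTime}ms " +
        s"shuffleRead=${m.shuffleReadMetrics.totalBytesRead}B")
    }
})
```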
Collect executor metrics
Lessons
- Measure the slowest parts of your job
- Count records in the most skewed parts
- Keep track of how much CPU time your job actually consumes
- Have alerting on these metrics, so you know when your job gets slower
Spot instances
Spot instances mitigation
- Break the job into smaller survivable pieces
- Use `rdd.checkpoint` instead of `rdd.persist` to save data to HDFS
- Helps dynamic allocation: since executors don't hold any data, they can leave the job and join other jobs
- Losing multiple executors won't result in recomputing partitions
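The checkpoint-over-persist point above can be sketched as follows (my example; `rawRdd`, `expensiveTransform`, and the HDFS path are hypothetical):

```scala
// Checkpointing writes the RDD to HDFS and truncates its lineage, so losing
// executors (e.g. reclaimed spot instances) doesn't force recomputation.
sc.setCheckpointDir("hdfs:///tmp/checkpoints")

val enriched = rawRdd.map(expensiveTransform)
enriched.checkpoint()   // materialized on the next action
enriched.count()        // forces the checkpoint write

// Later stages read `enriched` back from HDFS instead of replaying the
// lineage, and executors hold no cached blocks that would pin them to the job.
```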
ExternalShuffleService
[Diagram sequence: executors Ex1-Ex3 plus the driver; shuffle blocks 1-4 live on Ex1's node and are served by the external shuffle service, until that node is lost]
ERROR o.a.s.n.shuffle.RetryingBlockFetcher:
Exception while beginning fetch of 13 outstanding blocks
java.io.IOException: Failed to connect to
ip-10-12-32-67.us-west-2.compute.internal/10.12.32.67:7337
ExternalShuffleService
[Diagram sequence: after the node holding the shuffle service is lost, blocks 1 and 2 have to be recomputed by the remaining executors, with repeated RetryingBlockFetcher fetch failures along the way]
ExternalShuffleService
SPARK-19753 Remove all shuffle files on a host in case of slave lost or fetch failure
SPARK-20832 Standalone master should explicitly inform drivers of worker deaths and invalidate external shuffle service outputs
Other FetchFailures
SPARK-20178 Improve Scheduler fetch failures
Keep an eye on failed tasks
Lessons
- Keep all logs
- Spark isn't super-resilient even when one
node dies
- Monitor the number of failed
tasks/stages/lost nodes
Late arriving partitions
rddA.cogroup(rddB, rddC).map({ case (k, (iterA, iterB, iterC)) =>
  // We should always have a one-to-one join, but who knows …
  if (iterA.toSet.size > 1)
    throw new RuntimeException(s"Key $k received more than 1 A record")
  if (iterB.toSet.size > 1)
    throw new RuntimeException(…)
  if (iterC.toSet.size > 1) …
Late arriving partitions
.map({ case (key, values: Iterator[(Long, Int)]) =>
  values.toList.sortBy(_._1)
  // (1L, 10), (1L, 1), (2L, 1)
  // (1L, 1), (1L, 10), (2L, 1)
})
SPARK-19263 DAGScheduler should avoid sending conflicting task set

Late arriving partitions
.map({ case (key, values: Iterator[(Long, Int)]) =>
  values.toList.sorted
  // (1L, 1), (1L, 10), (2L, 1)
})
Lessons
- Trust, but add extra checks and log everything
- Add extra idempotency even if it should already be there
- Fail the job if some unexpected situation is encountered, but also think ahead of time about whether such situations can be overcome
- Have retries at the pipeline-scheduler level
Migration to Spark 2
SPARK-13850 TimSort Comparison method violates its general contract
SPARK-14560 Cooperative Memory Management for Spillables
SPARK-14363 Executor OOM due to a memory leak in Sorter
SPARK-22033 BufferHolder, other size checks should account for the specific VM array size limitations
Lessons
- Check the bug tracker periodically
- Subscribe to mailing lists
- Participate in discussing issues
In conclusion
- Log everything (driver/executors, NodeManagers, GC)
- Measure everything (heap/off-heap, GC, executor CPU, failed tasks/stages, slow parts, skewed parts)
- Trust but be ready
- Smaller survivable pieces
Thanks!
Want to work with us on Spark, Kafka, ES, and
more? Come to our booth!
jobs.datadoghq.com
twitter.com/@databuryat
_@databuryat.com
vadim@datadoghq.com

Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online  ☂️
CALL ON ➥8923113531 🔝Call Girls Kakori Lucknow best sexual service Online ☂️
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Using apache spark for processing trillions of records each day at Datadog

  • 15. OOMs
- java.lang.OutOfMemoryError: Java heap space (Increase heap size)
- java.lang.OutOfMemoryError: GC Overhead limit exceeded (Too much garbage)
- java.lang.OutOfMemoryError: Direct buffer memory (NIO)
- YARN Container is running beyond physical memory limits. Killing container. (Increase memory overhead)
- There is insufficient memory for the Java Runtime Environment to continue (Add more memory, reduce memory consumption)
  • 16. The driver must survive
spark.driver.memory 8g → 83g
spark.yarn.driver.memoryOverhead 3g → 32g
spark.driver.cores 4 → 15
spark.executor.memory 201g → 166g
spark.yarn.executor.memoryOverhead 28g → 64g
spark.executor.cores 32 → 30
  • 17. IMAGE: TYNE & WEAR ARCHIVES & MUSEUMS
  • 18. Measure memory usage
https://github.com/etsy/statsd-jvm-profiler
spark.files = /tmp/statsd-jvm-profiler.jar
spark.executor.extraJavaOptions += -javaagent:statsd-jvm-profiler.jar=server=localhost,port=8125,profilers=MemoryProfiler
  • 22. OOMs
- java.lang.OutOfMemoryError: Java heap space (Increase heap size)
- java.lang.OutOfMemoryError: GC Overhead limit exceeded (Too much garbage)
- java.lang.OutOfMemoryError: Direct buffer memory (NIO)
- YARN Container is running beyond physical memory limits. Killing container. (Increase memory overhead)
- There is insufficient memory for the Java Runtime Environment to continue (Add more memory, reduce memory consumption)
  • 23-24. Off-heap OOMs
java.lang.OutOfMemoryError: Direct buffer memory
  at java.nio.Bits.reserveMemory(Bits.java:658)
  at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
  …
  at parquet.hadoop.codec.…
  • 25. Off-heap memory
- Direct Allocated Buffers (NIO): Parquet, MessagePack, …
- Java Native Interface (JNI): dynamically-linked native libraries like libhadoop.so, GZIP, ZLIB, LZ4
- sun.misc.Unsafe: org.apache.hadoop.io.nativeio, org.apache.spark.unsafe
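The NIO category above is easy to demonstrate in plain Scala (an illustrative sketch, not from the talk): `ByteBuffer.allocateDirect` allocates outside the heap, which is why `-Xmx` doesn't bound this memory while `-XX:MaxDirectMemorySize` does, and why a heap dump won't show it.

```scala
import java.nio.ByteBuffer

// Allocate 1 MiB outside the JVM heap. This memory counts against
// -XX:MaxDirectMemorySize (which defaults to roughly the max heap size),
// not -Xmx, so the regular heap-space OOM never fires for it.
val direct = ByteBuffer.allocateDirect(1024 * 1024)
assert(direct.isDirect)                // lives off-heap
assert(direct.capacity == 1024 * 1024)

// A regular heap buffer, for contrast: this one is on the Java heap
// and is what -Xmx and heap dumps account for.
val onHeap = ByteBuffer.allocate(1024)
assert(!onHeap.isDirect)
```

Parquet, MessagePack, and compression codecs allocate buffers like the first one, which is exactly the memory that `java.lang.OutOfMemoryError: Direct buffer memory` complains about.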
  • 26-27. Process memory
$ cat /proc/<spark driver/executor pid>/status
VmPeak: 190317312 kB
VmSize: 190268160 kB
VmHWM: 187586408 kB
VmRSS: 187586408 kB
VmData: 190044492 kB
  • 28. Process memory
Solution: let the java-agent get the memory usage of its process right from the procfs
https://github.com/DataDog/spark-jvm-profiler
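A minimal sketch of that idea: read the resident set size straight from procfs instead of trusting JVM-internal counters. Linux-only; field layout per proc(5). `parseVmRssKb` and `currentRssKb` are illustrative helper names, not the API of the DataDog/spark-jvm-profiler project.

```scala
import scala.io.Source

// Extract the "VmRSS:  187586408 kB" field from a /proc/<pid>/status dump.
def parseVmRssKb(status: String): Option[Long] =
  status.split("\n")
    .find(_.startsWith("VmRSS:"))
    .map(_.split("\\s+")(1).toLong)   // second field is the value in kB

// On a live Linux process: this reports the same number YARN's
// physical-memory check sees, heap and off-heap included.
def currentRssKb(pid: String = "self"): Option[Long] =
  parseVmRssKb(Source.fromFile(s"/proc/$pid/status").mkString)
```

Emitting this number from inside each executor is what lets you see off-heap growth that heap profilers miss.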
  • 31. OOMs
- java.lang.OutOfMemoryError: Java heap space (Increase heap size)
- java.lang.OutOfMemoryError: GC Overhead limit exceeded (Too much garbage)
- java.lang.OutOfMemoryError: Direct buffer memory (NIO)
- There is insufficient memory for the Java Runtime Environment to continue (Add more memory, reduce memory consumption)
- YARN Container is running beyond physical memory limits. Killing container. (Increase memory overhead)
  • 32-34. Lessons
- Give more resources than you think you would need, and then reduce
- Measure memory usage of each executor
- Keep an eye on your GC metrics
  • 35. Measure slow parts
val timer = MaxAndTotalTimeAccumulator
rdd.map(key => {
  val startTime = System.nanoTime()
  ...
  val endTime = System.nanoTime()
  val millisecondsPassed = ((endTime - startTime) / 1000000).toInt
  timer.add(millisecondsPassed)
})
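`MaxAndTotalTimeAccumulator` on the slide is a custom accumulator, not a Spark built-in. A sketch of the state it would need to track might look like this (names are illustrative; in Spark 2.x you would wrap this in an `AccumulatorV2` and register it with `sc.register` so Spark merges per-task copies on the driver):

```scala
// Illustrative core of a max-and-total timer.
class MaxAndTotalTime extends Serializable {
  var maxMs: Long = 0L     // slowest single record seen
  var totalMs: Long = 0L   // total time across all records

  // Called per record inside a task.
  def add(ms: Long): Unit = {
    maxMs = math.max(maxMs, ms)
    totalMs += ms
  }

  // Called when per-task copies are combined on the driver.
  def merge(other: MaxAndTotalTime): Unit = {
    maxMs = math.max(maxMs, other.maxMs)
    totalMs += other.totalMs
  }
}
```

Reporting `maxMs` alongside `totalMs` is what surfaces one pathological record hiding inside an otherwise healthy average.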
  • 36. Watch skewed parts
.groupByKey().flatMap({ case (key, iter) =>
  val size = iter.size
  maxAccumulator.add(key, size)
  if (size >= 100000000) {
    log.info(s"Key $key has $size values")
    None
  } else {
  • 37. Report accumulators per partition
sc.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit =
    Option(taskEnd.taskMetrics).foreach(taskMetrics => … )
})
  • 39-42. Lessons
- Measure the slowest parts of your job
- Count records in the most skewed parts
- Keep track of how much CPU time your job actually consumes
- Have alerting on these metrics, so you know when your job gets slower
  • 44-47. Spot instances mitigation
- Break the job into smaller survivable pieces
- Use `rdd.checkpoint` instead of `rdd.persist` to save data to HDFS
- Helps dynamic allocation: since executors don't hold any data, they can leave the job and join other jobs
- Losing multiple executors won't result in recomputing partitions
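A non-runnable usage fragment of the checkpoint approach (assumes an existing `SparkContext` named `sc` and an HDFS checkpoint path; `input` and `expensiveTransform` are hypothetical placeholders):

```scala
// Reliable checkpointing writes partitions to HDFS and truncates the
// RDD's lineage, unlike persist(), which keeps blocks on executors.
sc.setCheckpointDir("hdfs:///tmp/checkpoints")

val stage1 = input.map(expensiveTransform)
stage1.checkpoint()  // marks the RDD; data is written on the first action
stage1.count()       // materialize so the checkpoint actually happens here

// Downstream stages re-read from HDFS instead of recomputing stage1, so
// losing spot-instance executors doesn't trigger a cascade of recomputes.
val stage2 = stage1.reduceByKey(_ + _)
```

The trade-off is an extra write and read of the full dataset, which the slides argue is worth it on a spot-instance fleet.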
  • 48-55. ExternalShuffleService
(diagram: executors Ex1-Ex3 serve shuffle blocks 1-4; when the node running Ex1 is lost, fetches from its external shuffle service fail)
ERROR o.a.s.n.shuffle.RetryingBlockFetcher: Exception while beginning fetch of 13 outstanding blocks
java.io.IOException: Failed to connect to ip-10-12-32-67.us-west-2.compute.internal/10.12.32.67:7337
  • 57. ExternalShuffleService
SPARK-19753 Remove all shuffle files on a host in case of slave lost or fetch failure
SPARK-20832 Standalone master should explicitly inform drivers of worker deaths and invalidate external shuffle service outputs
  • 58. Other FetchFailures SPARK-20178 Improve Scheduler fetch failures
  • 59. Keep an eye on failed tasks
  • 61-62. Lessons
- Keep all logs
- Spark isn't super-resilient even when one node dies
- Monitor the number of failed tasks/stages/lost nodes
  • 63-65. Late arriving partitions
rddA.cogroup(rddB, rddC).map({ case (k, (iterA, iterB, iterC)) =>
  // We should always have a one-to-one join, but who knows …
  if (iterA.toSet.size > 1)
    throw new RuntimeException(s"Key $k received more than 1 A record")
  if (iterB.toSet.size > 1) throw new RuntimeException(…)
  if (iterC.toSet.size > 1) …
  • 66. Late arriving partitions
.map({ case (key, values: Iterator[(Long, Int)]) =>
  values.toList.sortBy(_._1)
  // (1L, 10), (1L, 1), (2L, 1)
  // (1L, 1), (1L, 10), (2L, 1)
})
SPARK-19263 DAGScheduler should avoid sending conflicting task set
  • 67. Late arriving partitions
.map({ case (key, values: Iterator[(Long, Int)]) =>
  values.toList.sorted
  // (1L, 1), (1L, 10), (2L, 1)
})
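Why `.sorted` fixes this, shown in plain Scala (illustrative, no Spark needed): `sortBy(_._1)` is a stable sort, so ties keep whatever order the shuffle delivered them in, while `.sorted` compares the whole `(Long, Int)` tuple and yields one canonical order regardless of arrival order.

```scala
// The same multiset of values, delivered in two different orders
// by two different runs of the shuffle:
val runA = List((1L, 10), (1L, 1), (2L, 1))
val runB = List((1L, 1), (1L, 10), (2L, 1))

// sortBy(_._1) is stable: the tie between (1L, 10) and (1L, 1) keeps
// the delivery order, so the two runs disagree.
assert(runA.sortBy(_._1) != runB.sortBy(_._1))

// .sorted uses the full tuple ordering, so both runs agree on the
// canonical result (1L, 1), (1L, 10), (2L, 1).
assert(runA.sorted == runB.sorted)
assert(runA.sorted == List((1L, 1), (1L, 10), (2L, 1)))
```

The output of the whole pipeline becomes deterministic across retries, which is what the idempotency lessons below depend on.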
  • 68-71. Lessons
- Trust, but put extra checks in place and log everything
- Add extra idempotency even if it should already be there
- Fail the job if some unexpected situation is encountered, but also think ahead of time about whether such situations can be overcome
- Have retries at the pipeline-scheduler level
  • 72. Migration to Spark 2
SPARK-13850 TimSort Comparison method violates its general contract
SPARK-14560 Cooperative Memory Management for Spillables
SPARK-14363 Executor OOM due to a memory leak in Sorter
SPARK-22033 BufferHolder, other size checks should account for the specific VM array size limitations
  • 73-75. Lessons
- Check the bug tracker periodically
- Subscribe to the mailing lists
- Participate in discussing issues
  • 76-79. In conclusion
- Log everything (driver/executors, NodeManagers, GC)
- Measure everything (heap/off-heap, GC, executor CPU, failed tasks/stages, slow parts, skewed parts)
- Trust but be ready
- Smaller survivable pieces
  • 80. Thanks!
Want to work with us on Spark, Kafka, ES, and more? Come to our booth!
jobs.datadoghq.com
twitter.com/@databuryat
_@databuryat.com
vadim@datadoghq.com