Stage Level Scheduling
Improving Big Data and AI Integration
Thomas Graves
Software Engineer at NVIDIA
Spark PMC
Agenda
§ Resource Scheduling
§ Stage Level Scheduling
§ Use Case Example
§ Demo
Resource Scheduling On Spark
Resource Scheduling
• Driver
• Cores
• Memory
• Accelerators (GPU/FPGA/etc)
• Executors
• Cores
• Memory (overhead, pyspark, heap, offheap)
• Accelerators (GPU/FPGA/etc)
• Tasks (requirements)
• CPUs
• Accelerators (GPU/FPGA/etc)
Resource Scheduling
• Tasks Per Executor
• Executor Resources / Task Requirements
• Configs
spark.driver.cores=1
spark.executor.cores=4
spark.task.cpus=1
spark.driver.memory=4g
spark.executor.memory=4g
spark.executor.memoryOverhead=2g
spark.driver.resource.gpu.amount=1
spark.driver.resource.gpu.discoveryScript=./getGpuResources.sh
spark.executor.resource.gpu.amount=1
spark.executor.resource.gpu.discoveryScript=./getGpuResources.sh
spark.task.resource.gpu.amount=0.25
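The "Executor Resources / Task Requirements" rule can be worked through with the configs listed above. The following is a minimal sketch in plain Python, not Spark code: the function name `tasks_per_executor` and the dictionary layout are assumptions made for illustration, and it assumes the resource that runs out first sets the limit.

```python
import math


def tasks_per_executor(executor_resources, task_requirements):
    """Return how many tasks fit on one executor at the same time.

    For each resource a task requires, divide the executor's amount by the
    per-task amount; the smallest quotient is the limit.
    """
    limits = []
    for name, per_task in task_requirements.items():
        available = executor_resources.get(name, 0)
        # A small epsilon guards against float rounding (e.g. 1 / 0.25).
        limits.append(math.floor(available / per_task + 1e-9))
    return min(limits)


# Values taken from the configs above. Executor "cores" and task "cpus"
# describe the same resource, so both are stored under the key "cores".
#   spark.executor.cores=4, spark.task.cpus=1
#   spark.executor.resource.gpu.amount=1, spark.task.resource.gpu.amount=0.25
executor = {"cores": 4, "gpu": 1}
task = {"cores": 1, "gpu": 0.25}

slots = tasks_per_executor(executor, task)
print(slots)  # 4: four CPU slots and four quarter-GPU slots
```

With these values both resources allow four tasks, so neither is wasted. If the two quotients differ, the smaller one wins and part of the other resource sits idle.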
Stage Level Scheduling
Overview
[Diagram: a Spark ETL stage and a Spark ML stage running on separate nodes; node labels: CPU, CPU, GPU]
Stage Level Scheduling
• Stage level resource scheduling (SPARK-27495)
• Specify resource requirements per RDD operation
• Spark dynamically allocates containers to meet resource requirements
• Spark schedules tasks on appropriate containers
• Benefits
• Hardware utilization and cost
• Ease of programming
• Applications are no longer required to split ETL and Deep Learning into
separate applications
• Pipeline simplification
Use Cases
• Beneficial any time the user wants to change container resources between
stages in a single Spark application
• ETL to Deep Learning
• Skewed data
• Data size large in certain stages
• Jobs that use caching can switch to higher-memory containers during those
stages
Resources Supported
• Executor Resources
• Cores
• Heap Memory
• OffHeap Memory
• Pyspark Memory
• Memory Overhead
• Additional Resources (GPUs, etc)
• Task Resources
• CPUs
• Additional Resources (GPUs, etc)
Requirements
• Spark 3.1.1
• Dynamic Allocation with External Shuffle Service or Shuffle tracking
enabled
• YARN and Kubernetes
• RDD API only
• Scala, Java, Python
Implementation Details
• New container acquired with new ResourceProfile
• Does NOT try to fit into existing container with different ResourceProfile
(Future Enhancement)
• Unused containers idle timeout
• Default to one ResourceProfile per stage
• Config to allow multiple ResourceProfiles per stage
• Multiple profiles will be merged with simple max of each resource
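The "simple max of each resource" merge can be pictured as follows. This is a sketch in plain Python under stated assumptions: the function `merge_profiles`, the resource names, and the amounts are made up for illustration, and Spark's own merge logic may differ in detail.

```python
def merge_profiles(profile_a, profile_b):
    """Merge two resource maps by taking the larger amount of each resource."""
    merged = {}
    for name in sorted(set(profile_a) | set(profile_b)):
        # A resource missing from one profile counts as 0 for that profile.
        merged[name] = max(profile_a.get(name, 0), profile_b.get(name, 0))
    return merged


# Hypothetical executor requirements of two RDDs that end up in one stage.
etl = {"cores": 4, "memory_mb": 4096}
ml = {"cores": 2, "memory_mb": 6144, "gpu": 1}

merged = merge_profiles(etl, ml)
print(merged)  # {'cores': 4, 'gpu': 1, 'memory_mb': 6144}
```

The merged profile is at least as large as either input, so a container that satisfies it can run tasks written against either original profile.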
YARN Implementation Details
• External Shuffle Service and Dynamic Allocation
• YARN Container Priority – ResourceProfile Id becomes container priority
• In YARN, lower numbers are higher priority
• May come into effect in job-server-type scenarios
• GPU and FPGA predefined, other resources require additional
configurations
• Custom resources set via spark.yarn.executor.resource.* only apply to the
default profile; they do not propagate because there is no way to override them
• Discovery script must be accessible – sent with job submission
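To illustrate the priority rule above, the sketch below orders pending container requests by ResourceProfile id, lowest number first. It is plain Python, not YARN or Spark code: `ContainerRequest` and `allocation_order` are hypothetical names, the request counts are invented, and treating id 0 as the default profile is an assumption. How a real YARN scheduler honors priorities depends on its configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerRequest:
    profile_id: int  # ResourceProfile id, reused as the container priority
    executors: int   # number of containers requested for this profile


def allocation_order(requests):
    """Order requests so that lower priority numbers come first."""
    return sorted(requests, key=lambda r: r.profile_id)


pending = [
    ContainerRequest(profile_id=2, executors=4),
    ContainerRequest(profile_id=0, executors=8),  # assumed default profile
    ContainerRequest(profile_id=1, executors=2),
]

order = [r.profile_id for r in allocation_order(pending)]
print(order)  # [0, 1, 2]
```

Because ids grow as profiles are created, earlier profiles outrank later ones. In a long-running application that creates many profiles, this ordering may become noticeable.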
Kubernetes Implementation Details
• Requires shuffle tracking enabled
(spark.dynamicAllocation.shuffleTracking.enabled)
• Executors may not idle-timeout if they hold shuffle data on the node
• Results in more cluster resources being used
• spark.dynamicAllocation.shuffleTracking.timeout
• Pod Template Behavior
• Resources in the Pod Template are only used in the default profile
• Specify all resources needed in the ResourceProfile
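The Kubernetes settings named above could be passed to spark-submit roughly as sketched here. The helper `to_conf_flags` is a hypothetical convenience function, and the timeout value is an illustrative assumption rather than a recommendation.

```python
def to_conf_flags(settings):
    """Render a dict of Spark settings as spark-submit --conf arguments."""
    flags = []
    for key, value in settings.items():
        flags.extend(["--conf", f"{key}={value}"])
    return flags


k8s_settings = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    # Illustrative value only: limits how long executors that hold
    # shuffle data are kept before they can be released.
    "spark.dynamicAllocation.shuffleTracking.timeout": "600s",
}

flags = to_conf_flags(k8s_settings)
print(" ".join(flags))
```

Setting a shuffle tracking timeout is one way to limit the extra cluster usage mentioned above, at the cost of possibly recomputing shuffle data that was dropped.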
UI Screen Shots
--executor-cores 2 --conf spark.executor.resource.gpu.amount=1 --conf spark.task.resource.gpu.amount=0.5
API
> import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}
> val rpb = new ResourceProfileBuilder()
> val ereq = new ExecutorResourceRequests()
> val treq = new TaskResourceRequests()
> ereq.cores(4).memory("6g").memoryOverhead("2g").resource("gpu", 2, "./getGpus")
> treq.cpus(4).resource("gpu", 2)
> rpb.require(ereq)
> rpb.require(treq)
> val rp = rpb.build()
// use the ResourceProfile with the RDD
> val mlRdd = df.rdd.withResources(rp)
> mlRdd.mapPartitions { x =>
// feed data into ML and get result
}.collect()
API
> rpb
Profile executor resources: ArrayBuffer(memoryOverhead=name: memoryOverhead, amount:
2048, script: , vendor: , cores=name: cores, amount: 4, script: , vendor: , memory=name:
memory, amount: 6144, script: , vendor: , gpu=name: gpu, amount: 2, script: ./getGpus,
vendor: ), task resources: ArrayBuffer(cpus=name: cpus, amount: 4.0, gpu=name: gpu,
amount: 2.0)
> mlRdd.getResourceProfile
: org.apache.spark.resource.ResourceProfile = Profile: id = 1, executor resources:
memoryOverhead -> name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name:
cores, amount: 4, script: , vendor: ,memory -> name: memory, amount: 6144, script: ,
vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus, vendor: , task resources: cpus
-> name: cpus, amount: 4.0,gpu -> name: gpu, amount: 2.0
API - Mutable vs Immutable
> ereq.cores(2).memory("6g").memoryOverhead("2g").resource("gpu", 2, "./getGpus")
> treq.cpus(1).resource("gpu", 1)
> rpb.require(ereq).require(treq)
> val rp = rpb.build()
> rp
: org.apache.spark.resource.ResourceProfile = Profile: id = 2, executor resources: memoryOverhead ->
name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name: cores, amount: 2, script: , vendor:
,memory -> name: memory, amount: 6144, script: , vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus,
vendor: , task resources: cpus -> name: cpus, amount: 1.0,gpu -> name: gpu, amount: 1.0
> treq.cpus(2).resource("gpu", 2)
> rpb.require(treq)
> val rpNew = rpb.build()
> rpNew
: org.apache.spark.resource.ResourceProfile = Profile: id = 3, executor resources: memoryOverhead ->
name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name: cores, amount: 2, script: , vendor:
,memory -> name: memory, amount: 6144, script: , vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus,
vendor: , task resources: cpus -> name: cpus, amount: 2.0,gpu -> name: gpu, amount: 2.0
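The session above shows that the request objects and the builder are mutable, while a built ResourceProfile is a fixed snapshot: rp keeps its task cpus of 1.0 after treq is changed, and rpNew receives a new id. The same pattern can be modeled in a few lines of plain Python. The classes `Profile` and `ProfileBuilder` are hypothetical stand-ins, not the Spark API, and the ids are illustrative.

```python
import itertools
from dataclasses import dataclass
from types import MappingProxyType

_next_id = itertools.count(1)  # illustrative id counter


@dataclass(frozen=True)
class Profile:
    """Immutable snapshot of the requirements at build time."""
    id: int
    task_resources: MappingProxyType


class ProfileBuilder:
    """Mutable builder: later require() calls overwrite earlier amounts."""

    def __init__(self):
        self._task = {}

    def require(self, task_resources):
        self._task.update(task_resources)
        return self

    def build(self):
        # Copy the dict so later builder changes cannot leak into the profile.
        return Profile(next(_next_id), MappingProxyType(dict(self._task)))


builder = ProfileBuilder()
rp = builder.require({"cpus": 1, "gpu": 1}).build()
rp_new = builder.require({"cpus": 2, "gpu": 2}).build()

print(rp.task_resources["cpus"], rp_new.task_resources["cpus"])  # 1 2
print(rp.id != rp_new.id)  # True
```

Copying at build time is what makes it safe to keep reusing one builder: each RDD holds a profile that cannot change underneath it.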
Use Case Example
End to End Pipeline
ETL Using Rapids Accelerator For Spark
Rapids Accelerator For Spark
• Runs Spark on GPUs to accelerate processing
• Combines the power of the RAPIDS cuDF library and the scale of the Spark distributed
computing framework
• Spark SQL and DataFrames
• Requires Spark 3.0+
• No user code changes
• If an operation is not supported, it runs on the CPU as normal
• Built-in accelerated shuffle based on UCX that can be configured to
leverage GPU-to-GPU communication and RDMA capabilities
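The per-operation CPU fallback described above can be sketched as a simple lookup. This is plain Python, not the plugin's code: the support table `GPU_SUPPORTED` and the function `plan` are invented for illustration, and the name `gpu_enabled` is borrowed from the pseudocode on a later slide of this deck.

```python
# Hypothetical table of (operation, data type) pairs with a GPU version.
GPU_SUPPORTED = {("sum", "int"), ("sum", "double"), ("max", "double")}


def gpu_enabled(operation, data_type):
    """Return True if this operation on this data type can run on the GPU."""
    return (operation, data_type) in GPU_SUPPORTED


def plan(operations):
    """Tag each (operation, data_type) pair with where it would run."""
    return [
        (op, dtype, "GPU" if gpu_enabled(op, dtype) else "CPU")
        for op, dtype in operations
    ]


query = [("sum", "int"), ("max", "double"), ("regexp", "string")]
for step in plan(query):
    print(step)
```

Deciding per operation rather than per query is what allows "no user code changes": unsupported pieces run as standard Spark operations while the rest is accelerated.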
ETL Technology Stack
[Diagram: layered technology stack]
• Python side: Dask cuDF; cuDF, Pandas; Python; Cython
• Spark side: Spark DataFrames, Scala, PySpark; Java; JNI bindings
• Rapids Accelerator For Apache Spark (Plugin)
• Common layers: cuDF C++; CUDA Libraries; CUDA
[Diagram: Distributed Scale-Out Spark Applications]
• Apache Spark Core: Spark SQL API, DataFrame API, Spark Shuffle
• RAPIDS Accelerator for Spark
• Spark SQL and DataFrame operations: if gpu_enabled(operation, data_type),
call out to RAPIDS; else execute the standard Spark operation
• Spark Shuffle: custom implementation of Spark Shuffle, optimized to use
RDMA and GPU-to-GPU direct communication
• JNI bindings: mapping from Java/Scala to C++
• RAPIDS C++ Libraries, UCX Libraries
• CUDA
Spark SQL & Dataframe Compilation Flow
• Query
SELECT product_id, ds,
max(price) - min(price) AS range
FROM bar GROUP BY product_id, ds
• DataFrame
bar.groupBy(
col("product_id"),
col("ds"))
.agg(
(max(col("price")) -
min(col("price"))).alias("range"))
• Query or DataFrame -> Logical Plan -> Physical Plan -> RDD[InternalRow]
• With the RAPIDS SQL Plugin: Physical Plan -> GPU Physical Plan -> RDD[ColumnarBatch]
NDS Query 38 Results
Entire query is GPU accelerated
• CPU Cluster: Driver: 1 x m5dn.large; Workers: 8 x m5dn.2xlarge (64-core 256GB)
On-demand cluster cost (US West): $4.488/hr
• GPU Cluster: Driver: 1 x m5dn.large; Workers: 8 x g4dn.2xlarge (64-core 256GB 8xT4 GPU)
On-demand cluster cost (US West): $6.152/hr
• Query Time (secs): CPU 163.0, GPU 53.2
• Total Costs: CPU $0.20, GPU $0.09
3X Speed-up 55% Cost Saving
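The speed-up and cost-saving figures can be checked from the query times and hourly cluster prices reported above. The sketch below assumes the cost of a query is the hourly rate pro-rated over its run time; that assumption is not stated in the deck, but it matches the reported totals.

```python
def query_cost(hourly_rate_usd, seconds):
    """Cost of holding the cluster for the duration of the query."""
    return hourly_rate_usd * seconds / 3600.0


cpu_time, gpu_time = 163.0, 53.2    # query time in seconds
cpu_rate, gpu_rate = 4.488, 6.152   # on-demand cluster cost in USD per hour

cpu_cost = query_cost(cpu_rate, cpu_time)
gpu_cost = query_cost(gpu_rate, gpu_time)
speedup = cpu_time / gpu_time
saving = 1.0 - gpu_cost / cpu_cost

print(f"speed-up: {speedup:.2f}x")                              # 3.06x
print(f"CPU cost: ${cpu_cost:.2f}, GPU cost: ${gpu_cost:.2f}")  # $0.20, $0.09
print(f"cost saving: {saving:.0%}")                             # 55%
```

The GPU cluster costs more per hour, but the query finishes about three times faster, so the cost per query is lower.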
Deep Learning
Horovod Introduction
• Distributed Deep learning training framework
• TensorFlow, Keras, PyTorch, Apache MXNet
• High Performance features
• NCCL, GPUDirect, RDMA, tensor fusion
• Easy to use
• Just 5 lines of Python
• Open Source
• Linux Foundation AI Foundation
• Easy to install
• pip install horovod
horovod.ai
Demo
End to End Horovod Demo
Future Enhancements
• Collect feedback from users
• Allow setting certain configs – like dynamic allocation
• Fitting new ResourceProfiles into existing containers
• Better cleanup of ResourceProfiles
• Catalyst internally
Other Performance Enhancements
Other Enhancements
• Pluggable Caching
• Allows developers to try different caching solutions
• Custom GPU implementation
Questions
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

More Related Content

What's hot

Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache SparkDatabricks
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Chris Fregly
 
Portable UDFs: Write Once, Run Anywhere
Portable UDFs: Write Once, Run AnywherePortable UDFs: Write Once, Run Anywhere
Portable UDFs: Write Once, Run AnywhereDatabricks
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsDatabricks
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsDatabricks
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLScyllaDB
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframeJaemun Jung
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDatabricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Spark overview
Spark overviewSpark overview
Spark overviewLisa Hua
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...Databricks
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideWhizlabs
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationshadooparchbook
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySparkRussell Jurney
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQLDatabricks
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsCloudera, Inc.
 

What's hot (20)

Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
Advanced Apache Spark Meetup Project Tungsten Nov 12 2015
 
Portable UDFs: Write Once, Run Anywhere
Portable UDFs: Write Once, Run AnywherePortable UDFs: Write Once, Run Anywhere
Portable UDFs: Write Once, Run Anywhere
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
Apache spark
Apache sparkApache spark
Apache spark
 
Performance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark MetricsPerformance Troubleshooting Using Apache Spark Metrics
Performance Troubleshooting Using Apache Spark Metrics
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Understanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIsUnderstanding Query Plans and Spark UIs
Understanding Query Plans and Spark UIs
 
High-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQLHigh-speed Database Throughput Using Apache Arrow Flight SQL
High-speed Database Throughput Using Apache Arrow Flight SQL
 
PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
 
Deep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.xDeep Dive into GPU Support in Apache Spark 3.x
Deep Dive into GPU Support in Apache Spark 3.x
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Spark overview
Spark overviewSpark overview
Spark overview
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Learn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive GuideLearn Apache Spark: A Comprehensive Guide
Learn Apache Spark: A Comprehensive Guide
 
Top 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applicationsTop 5 mistakes when writing Spark applications
Top 5 mistakes when writing Spark applications
 
Introduction to PySpark
Introduction to PySparkIntroduction to PySpark
Introduction to PySpark
 
Physical Plans in Spark SQL
Physical Plans in Spark SQLPhysical Plans in Spark SQL
Physical Plans in Spark SQL
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 

Similar to Stage Level Scheduling Improving Big Data and AI Integration

Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPDatabricks
 
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Databricks
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit
 
SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfChester Chen
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideIBM
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkJen Aman
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudDatabricks
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...Chester Chen
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...Holden Karau
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkVMware Tanzu
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkRahul Jain
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAlluxio, Inc.
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Databricks
 
Spark on Yarn
Spark on YarnSpark on Yarn
Spark on YarnQubole
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in productionParis Data Engineers !
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
End-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkEnd-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkDatabricks
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Djamel Zouaoui
 

Similar to Stage Level Scheduling Improving Big Data and AI Integration (20)

Build Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDPBuild Large-Scale Data Analytics and AI Pipeline Using RayDP
Build Large-Scale Data Analytics and AI Pipeline Using RayDP
 
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
 
Spark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca CanaliSpark Summit EU talk by Luca Canali
Spark Summit EU talk by Luca Canali
 
SFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdfSFBigAnalytics_SparkRapid_20220622.pdf
SFBigAnalytics_SparkRapid_20220622.pdf
 
Spark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting GuideSpark 2.x Troubleshooting Guide
Spark 2.x Troubleshooting Guide
 
Deploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using SparkDeploying Accelerators At Datacenter Scale Using Spark
Deploying Accelerators At Datacenter Scale Using Spark
 
Apache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the CloudApache Spark on K8S Best Practice and Performance in the Cloud
Apache Spark on K8S Best Practice and Performance in the Cloud
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...A really really fast introduction to PySpark - lightning fast cluster computi...
A really really fast introduction to PySpark - lightning fast cluster computi...
 
Mixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache SparkMixing Analytic Workloads with Greenplum and Apache Spark
Mixing Analytic Workloads with Greenplum and Apache Spark
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Spark on YARN
Spark on YARNSpark on YARN
Spark on YARN
 
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and AlluxioAdvancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
Advancing GPU Analytics with RAPIDS Accelerator for Spark and Alluxio
 
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...
 
Spark on Yarn
Spark on YarnSpark on Yarn
Spark on Yarn
 
10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production10 things i wish i'd known before using spark in production
10 things i wish i'd known before using spark in production
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
End-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache SparkEnd-to-End Deep Learning with Horovod on Apache Spark
End-to-End Deep Learning with Horovod on Apache Spark
 
Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming Paris Data Geek - Spark Streaming
Paris Data Geek - Spark Streaming
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2



Stage Level Scheduling Improving Big Data and AI Integration

  • 1. Stage Level Scheduling Improving Big Data and AI integration Thomas Graves Software Engineer at NVIDIA Spark PMC
  • 2. Agenda § Resource Scheduling § Stage Level Scheduling § Use Case Example § Demo
  • 4. Resource Scheduling • Driver • Cores • Memory • Accelerators (GPU/FPGA/etc) • Executors • Cores • Memory (overhead, pyspark, heap, offheap) • Accelerators (GPU/FPGA/etc) • Tasks (requirements) • CPUs • Accelerators (GPU/FPGA/etc)
  • 5. Resource Scheduling • Tasks Per Executor • Executor Resources / Task Requirements • Configs spark.driver.cores=1 spark.executor.cores=4 spark.task.cpus=1 spark.driver.memory=4g spark.executor.memory=4g spark.executor.memoryOverhead=2g spark.driver.resource.gpu.amount=1 spark.driver.resource.gpu.discoveryScript=./getGpuResources.sh spark.executor.resource.gpu.amount=1 spark.executor.resource.gpu.discoveryScript=./getGpuResources.sh spark.task.resource.gpu.amount=0.25
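The "Executor Resources / Task Requirements" rule on slide 5 can be worked through with the configs shown there. The following is a minimal sketch in plain Python, not Spark's own scheduler code; the function name and the dictionary keys are assumptions made for illustration (`cores` stands in for both `spark.executor.cores` and `spark.task.cpus`).

```python
import math


def tasks_per_executor(executor_resources, task_requirements):
    """Return how many tasks can run concurrently on one executor.

    Each resource allows floor(executor amount / per-task amount) tasks;
    the most constrained resource sets the overall limit.
    """
    slots = []
    for name, per_task in task_requirements.items():
        available = executor_resources.get(name, 0)
        # The small epsilon guards against floating-point error with
        # fractional amounts such as 0.25.
        slots.append(math.floor(available / per_task + 1e-9))
    return min(slots)


# Values from the slide: spark.executor.cores=4, spark.task.cpus=1,
# spark.executor.resource.gpu.amount=1, spark.task.resource.gpu.amount=0.25
executor = {"cores": 4, "gpu": 1}
task = {"cores": 1, "gpu": 0.25}
print(tasks_per_executor(executor, task))  # 4
```

With these numbers both the CPU and the GPU allow four concurrent tasks, so four tasks share the single GPU on each executor.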
  • 7. Overview Spark ETL Stage Spark ML Stage NODE NODE GPU CPU CPU
  • 8. Stage Level Scheduling • Stage level resource scheduling (SPARK-27495) • Specify resource requirements per RDD operation • Spark dynamically allocates containers to meet resource requirements • Spark schedules tasks on appropriate containers • Benefits • Hardware utilization and cost • Ease of programming • Applications are no longer required to split ETL and Deep Learning into separate applications • Pipeline simplification
  • 9. Use Cases • Beneficial any time the user wants to change container resources between stages in a single Spark application • ETL to Deep Learning • Skewed data • Data size large in certain stages • Jobs that use caching, switch to higher memory containers during those stages
  • 10. Resources Supported • Executor Resources • Cores • Heap Memory • OffHeap Memory • Pyspark Memory • Memory Overhead • Additional Resources (GPUs, etc) • Task Resources • CPUs • Additional Resources (GPUs, etc)
  • 11. Requirements • Spark 3.1.1 • Dynamic Allocation with External Shuffle Service or Shuffle tracking enabled • YARN and Kubernetes • RDD API only • Scala, Java, Python
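The dynamic allocation requirement on slide 11 can be expressed as a small check over a set of Spark settings. This is a sketch only: the helper name is hypothetical, and it covers just the dynamic allocation and shuffle conditions, not the Spark version, cluster manager, or RDD API requirements.

```python
def meets_stage_level_requirements(conf):
    """Check a dict of Spark settings against the dynamic allocation requirement.

    Dynamic allocation must be on, together with either the external
    shuffle service or shuffle tracking.
    """
    def enabled(key):
        return str(conf.get(key, "false")).lower() == "true"

    dynamic_allocation = enabled("spark.dynamicAllocation.enabled")
    shuffle_ok = (
        enabled("spark.shuffle.service.enabled")
        or enabled("spark.dynamicAllocation.shuffleTracking.enabled")
    )
    return dynamic_allocation and shuffle_ok


# Example: a Kubernetes-style setup that relies on shuffle tracking.
k8s_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
}
print(meets_stage_level_requirements(k8s_conf))  # True
```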
  • 12. Implementation Details • New container acquired with new ResourceProfile • Does NOT try to fit into existing container with different ResourceProfile (Future Enhancement) • Unused containers idle timeout • Default to one ResourceProfile per stage • Config to allow multiple ResourceProfiles per stage • Multiple profiles will be merged with simple max of each resource
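The "simple max of each resource" merge mentioned on slide 12 can be illustrated as follows. This is a plain-Python sketch of the idea, not Spark's implementation; the function name and the resource keys (`cores`, `memory_mb`, `gpu`) are assumptions.

```python
def merge_profiles(first, second):
    """Merge two resource maps by taking the larger amount of each resource."""
    merged = dict(first)
    for name, amount in second.items():
        merged[name] = max(merged.get(name, 0), amount)
    return merged


# Two hypothetical profiles that end up in the same stage.
etl_profile = {"cores": 4, "memory_mb": 6144, "gpu": 2}
cache_profile = {"cores": 2, "memory_mb": 8192}
print(merge_profiles(etl_profile, cache_profile))
# {'cores': 4, 'memory_mb': 8192, 'gpu': 2}
```

A resource that appears in only one profile is carried over unchanged, so the merged profile satisfies both sets of requirements.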
  • 13. YARN Implementation Details • External Shuffle Service and Dynamic Allocation • YARN Container Priority – ResourceProfile Id becomes container priority • YARN lower numbers are higher priority • Job Server type scenario where this may come into effect • GPU and FPGA predefined, other resources require additional configurations • Custom resources via spark.yarn.executor.resource.* only apply in default profile – do not propagate because there is no way to override • Discovery script must be accessible – sent with job submission
  • 14. Kubernetes Implementation Details • Requires shuffle tracking enabled (spark.dynamicAllocation.shuffleTracking.enabled) • Executors may not idle timeout if they have shuffle data on the node • Results in more cluster resources used • spark.dynamicAllocation.shuffleTracking.timeout • Pod Template Behavior • Resources in Pod Template only used in default profile • Specify all resources needed in the ResourceProfile
  • 15. UI Screen Shots --executor-cores 2 --conf spark.executor.resource.gpu.amount=1 --conf spark.task.resource.gpu.amount=0.5
  • 17. API
      > import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}
      > val rpb = new ResourceProfileBuilder()
      > val ereq = new ExecutorResourceRequests()
      > val treq = new TaskResourceRequests()
      > ereq.cores(4).memory("6g").memoryOverhead("2g").resource("gpu", 2, "./getGpus")
      > treq.cpus(4).resource("gpu", 2)
      > rpb.require(ereq)
      > rpb.require(treq)
      > val rp = rpb.build()
      // use the ResourceProfile with the RDD
      > val mlRdd = df.rdd.withResources(rp)
      > mlRdd.mapPartitions { x =>
          // feed data into ML and get result
        }.collect()
  • 20. API > rpb Profile executor resources: ArrayBuffer(memoryOverhead=name: memoryOverhead, amount: 2048, script: , vendor: , cores=name: cores, amount: 4, script: , vendor: , memory=name: memory, amount: 6144, script: , vendor: , gpu=name: gpu, amount: 2, script: ./getGpus, vendor: ), task resources: ArrayBuffer(cpus=name: cpus, amount: 4.0, gpu=name: gpu, amount: 2.0) > mlRdd.getResourceProfile : org.apache.spark.resource.ResourceProfile = Profile: id = 1, executor resources: memoryOverhead -> name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name: cores, amount: 4, script: , vendor: ,memory -> name: memory, amount: 6144, script: , vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus, vendor: , task resources: cpus -> name: cpus, amount: 4.0,gpu -> name: gpu, amount: 2.0
  • 21. API - Mutable vs Immutable
      > ereq.cores(2).memory("6g").memoryOverhead("2g").resource("gpu", 2, "./getGpus")
      > treq.cpus(1).resource("gpu", 1)
      > rpb.require(ereq).require(treq)
      > val rp = rpb.build()
      > rp : org.apache.spark.resource.ResourceProfile = Profile: id = 2, executor resources: memoryOverhead -> name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name: cores, amount: 2, script: , vendor: ,memory -> name: memory, amount: 6144, script: , vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus, vendor: , task resources: cpus -> name: cpus, amount: 1.0,gpu -> name: gpu, amount: 1.0
      > treq.cpus(2).resource("gpu", 2)
      > rpb.require(treq)
      > val rpNew = rpb.build()
      > rpNew : org.apache.spark.resource.ResourceProfile = Profile: id = 3, executor resources: memoryOverhead -> name: memoryOverhead, amount: 2048, script: , vendor: ,cores -> name: cores, amount: 2, script: , vendor: ,memory -> name: memory, amount: 6144, script: , vendor: ,gpu -> name: gpu, amount: 2, script: ./getGpus, vendor: , task resources: cpus -> name: cpus, amount: 2.0,gpu -> name: gpu, amount: 2.0
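The mutable-versus-immutable distinction on slide 21 (the builder and request objects can keep changing, while each built ResourceProfile is a fixed snapshot with its own id) can be modeled in a few lines. This is a toy model in plain Python, not the Spark classes; `Profile`, `ProfileBuilder`, and the id counter are hypothetical names, and ids here start at 1 rather than matching the slide.

```python
from dataclasses import dataclass
from itertools import count
from types import MappingProxyType

_next_id = count(1)  # hypothetical id source; each build gets a fresh id


@dataclass(frozen=True)
class Profile:
    """Immutable snapshot of task resources, fixed at build time."""
    id: int
    task_resources: MappingProxyType


class ProfileBuilder:
    """Mutable builder: later require() calls overwrite earlier amounts."""

    def __init__(self):
        self._task = {}

    def require(self, task_requests):
        self._task.update(task_requests)
        return self

    def build(self):
        # Copy the current state so later builder changes cannot leak in.
        return Profile(next(_next_id), MappingProxyType(dict(self._task)))


builder = ProfileBuilder()
rp = builder.require({"cpus": 1, "gpu": 1}).build()
rp_new = builder.require({"cpus": 2, "gpu": 2}).build()
print(rp.task_resources["cpus"], rp_new.task_resources["cpus"])  # 1 2
```

Changing the builder after the first build leaves `rp` untouched, which mirrors the slide: `rp` keeps `cpus = 1` while `rpNew` reports `cpus = 2` under a new id.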
  • 22. Use Case Example End to End Pipeline
  • 23. ETL Using Rapids Accelerator For Spark
  • 24. Rapids Accelerator For Spark • Run Spark on a GPU to accelerate processing • Combines the power of the RAPIDS cuDF library and the scale of the Spark distributed computing framework • Spark SQL and DataFrames • Requires Spark 3.0+ • No user code changes • If operation not supported, run on CPU like normal • Built-in accelerated shuffle based on UCX that can be configured to leverage GPU-to-GPU communication and RDMA capabilities
  • 25. ETL Technology Stack Dask cuDF cuDF, Pandas Python Cython cuDF C++ CUDA Libraries CUDA Java JNI bindings Spark dataframes, Scala, PySpark
  • 26. Rapids Accelerator For Apache Spark (Plugin) DISTRIBUTED SCALE-OUT SPARK APPLICATIONS APACHE SPARK CORE RAPIDS Accelerator for Spark Spark SQL API DataFrame API Spark Shuffle if gpu_enabled(operation, data_type) call-out to RAPIDS else execute standard Spark operation ● Custom Implementation of Spark Shuffle ● Optimized to use RDMA and GPU- to-GPU direct communication JNI bindings Mapping From Java/Scala to C++ RAPIDS C++ Libraries UCX Libraries CUDA JNI bindings Mapping From Java/Scala to C++
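The "if gpu_enabled(operation, data_type) ... else execute standard Spark operation" rule on slide 26 amounts to a per-operation dispatch with a CPU fallback. The sketch below only illustrates that control flow; the support table and the function names other than `gpu_enabled` are hypothetical and do not reflect what the plugin actually supports.

```python
# Hypothetical support table: (operation, data type) pairs assumed to have
# a GPU implementation. The real plugin decides this internally.
GPU_SUPPORTED = {("sum", "int"), ("max", "double"), ("min", "double")}


def gpu_enabled(operation, data_type):
    """Return True when the operation/type pair has a GPU implementation."""
    return (operation, data_type) in GPU_SUPPORTED


def plan_operation(operation, data_type):
    """Pick the execution target, falling back to the CPU when unsupported."""
    if gpu_enabled(operation, data_type):
        return "gpu"  # call out to RAPIDS
    return "cpu"      # execute the standard Spark operation


for op, dtype in [("max", "double"), ("regexp", "string")]:
    print(op, dtype, "->", plan_operation(op, dtype))
```

Because the decision is made per operation, a query can mix GPU and CPU steps without any change to user code.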
  • 27. Spark SQL & Dataframe Compilation Flow • DataFrame: bar.groupBy(col("product_id"), col("ds")).agg((max(col("price")) - min(col("price"))).alias("range")) • Query: SELECT product_id, ds, max(price) - min(price) AS range FROM bar GROUP BY product_id, ds • Logical Plan • Physical Plan • RAPIDS SQL Plugin • GPU Physical Plan • RDD[InternalRow] • RDD[ColumnarBatch]
  • 28. NDS Query 38 Results • Entire query is GPU accelerated • CPU Cluster: Driver: 1 x m5dn.large; Workers: 8 x m5dn.2xlarge (64-core 256GB); On-demand cluster cost (US West): $4.488/hr • GPU Cluster: Driver: 1 x m5dn.large; Workers: 8 x g4dn.2xlarge (64-core 256GB 8xT4 GPU); On-demand cluster cost (US West): $6.152/hr • Query Time (secs): CPU 163.0, GPU 53.2 (3X Speed-up) • Total Costs: CPU $0.20, GPU $0.09 (55% Cost Saving)
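The speed-up and cost figures on slide 28 follow from the hourly cluster rates and the query times given there. A short worked calculation, assuming cost is simply the hourly rate prorated by query duration (the function name is illustrative):

```python
def query_cost(hourly_rate, seconds):
    """Cost of running a cluster for the duration of one query."""
    return hourly_rate * seconds / 3600.0


cpu_cost = query_cost(4.488, 163.0)  # CPU cluster: $4.488/hr for 163.0 s
gpu_cost = query_cost(6.152, 53.2)   # GPU cluster: $6.152/hr for 53.2 s
speedup = 163.0 / 53.2
saving = 1 - gpu_cost / cpu_cost

print(f"{cpu_cost:.2f} {gpu_cost:.2f} {speedup:.1f}x {saving:.0%}")
# 0.20 0.09 3.1x 55%
```

The GPU cluster costs more per hour, but the query finishes about three times faster, so the per-query cost drops by roughly 55%.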
  • 30. Horovod Introduction • Distributed Deep learning training framework • TensorFlow, Keras, PyTorch, Apache MXNet • High Performance features • NCCL, GPUDirect, RDMA, tensor fusion • Easy to use • Just 5 lines of Python • Open Source • Linux Foundation AI Foundation • Easy to install • pip install horovod • horovod.ai
  • 31. Demo
  • 32. End to End Horovod Demo
  • 34. Future Enhancements • Collect feedback from users • Allow setting certain configs – like dynamic allocation • Fitting new ResourceProfiles into existing containers • Better cleanup of ResourceProfiles • Catalyst internally
  • 36. Other Enhancements • Pluggable Caching • Allows developers to try different caching solutions • Custom GPU implementation
  • 38. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.