SlideShare uma empresa Scribd logo
1 de 43
Baixar para ler offline
On Improving Broadcast Joins
in Spark SQL
Jianneng Li
Software Engineer, Workday
This presentation may contain forward-looking statements for which there are risks, uncertainties, and
assumptions. If the risks materialize or assumptions prove incorrect, Workday’s business results and
directions could differ materially from results implied by the forward-looking statements. Forward-looking
statements include any statements regarding strategies or plans for future operations; any statements
concerning new features, enhancements or upgrades to our existing applications or plans for future
applications; and any statements of belief. Further information on risks that could affect Workday’s results is
included in our filings with the Securities and Exchange Commission which are available on the Workday
investor relations webpage: www.workday.com/company/investor_relations.php
Workday assumes no obligation for and does not intend to update any forward-looking statements. Any
unreleased services, features, functionality or enhancements referenced in any Workday document, roadmap,
blog, our website, press release or public statement that are not currently available are subject to change at
Workday’s discretion and may not be delivered as planned or at all.
Customers who purchase Workday, Inc. services should make their purchase decisions upon services,
features, and functions that are currently available.
Safe Harbor Statement
Agenda
▪ Apache Spark in Workday
Prism Analytics
▪ Broadcast Joins in Spark
▪ Improving Broadcast Joins
▪ Production Case Study
Spark in Workday Prism Analytics
Example Spark physical plan of our pipeline shown in Spark UI
▪ Customers use our self-
service product to build data
transformation pipelines, which
are compiled to DataFrames
and executed by Spark
▪ Finance and HR use cases
▪ This talk focuses on our HR use
cases - more on complex plans
than big data
Spark in Prism Analytics
For more details, see session from SAIS 2019 - Lessons Learned
using Apache Spark for Self-Service Data Prep in SaaS World
Broadcast Joins in Spark
Node 1
A 1
B 2
C 3 DD 4
Node 2
D 4
E 5
F 6 AA 1
Node 1 Node 2
A 1
B 2
C 3
AA 1
DD 4
D 4
E 5
F 6
AA 1
DD 4
Broadcast
Join
#UnifiedAnalytics #SparkAISummit
Broadcast Join Review
Broadcast Join Shuffle Join
Avoids shuffling the bigger side Shuffles both sides
Naturally handles data skew Can suffer from data skew
Cheap for selective joins Can produce unnecessary intermediate results
Broadcasted data needs to fit in memory Data can be spilled and read from disk
Cannot be used for certain outer joins Can be used for all joins
Broadcast Join vs. Shuffle Join
Where applicable, broadcast join should be faster than shuffle join
▪ Spark's broadcasting mechanism is inefficient
▪ Broadcasted data goes through the driver
▪ Too much broadcasted data can run the driver out of memory
Broadcasting in Spark
Driver
Executor 1
Executor 2
(1) Executors sends broadcasted data to driver
(2) Driver sends broadcasted data to executors
▪ Uses broadcasting mechanism to collect data to driver
▪ Planned per-join using size estimation and config
spark.sql.autoBroadcastJoinThreshold
Broadcast Joins in Spark
▪ BroadcastHashJoin (BHJ)
▪ Driver builds in-memory hashtable to distribute to executors
▪ BroadcastNestedLoopJoin (BNLJ)
▪ Distributes data as array to executors
▪ Useful for non-equi joins
▪ Disabled in Prism for stability reasons
Improving Broadcast Joins
Goal: More broadcast joins
▪ Q: Is broadcast join faster as long as broadcasted data fits in memory?
▪ A: It depends
▪ Experiment: increase broadcast threshold, and see what breaks
▪ Spoiler: many things go wrong before driver runs out of memory
Experiment: Single Join
Experiment setup
▪ TPC-H Dataset, 10GB
▪ Query: 60M table (lineitem) joining 15M table (orders) on key
▪ Driver: 1 core, 12 GB memory
▪ Executor: 1 instance, 18 cores, 102 GB memory
Single join results
SELECT lineitem.*
FROM lineitem, orders
WHERE l_orderkey =
o_orderkey
▪ Driver collects 15M rows
▪ Driver builds hashtable
▪ Driver sends hashtable to executor
▪ Executor deserializes hashtable
Why is BHJ slower?
Can we reduce BHJ overhead?
▪ Yes - executor side broadcast
Executor Side Broadcast
▪ Based on prototype from SPARK-17556
▪ Data is broadcasted between executors directly
Driver
Executor 1
Executor 2
Executors sends
broadcasted data to
each other
Driver keeps track of executor’s data blocks
Executor BHJ vs. Driver BHJ
Pros Cons
Driver has less memory pressure Each executor builds its own hashtable
Less data shuffled across network More difficult to know size of broadcast
Pros of executor BHJ outweigh cons
New results
SELECT lineitem.*
FROM lineitem, orders
WHERE l_orderkey =
o_orderkey
Why is BHJ still slower?
▪ Let's compare the cost models of the joins
SMJ Cost
▪ Assume n cores, tables A and B, where A > B
1. Read A/n, Sort, Write A/n
2. Read B/n, Sort, Write B/n
3. Read A/n, Read B/n, Join
▪ Considering only I/O costs: 3 A/n + 3 B/n
BHJ Cost
▪ Assume n cores, tables A and B, where A > B
1. Read B/n, Build hashtable, Write B
2. Read A/n, Read B, Join
▪ Considering only I/O costs: A/n + B/n + 2B
▪ SMJ: 3 A/n + 3 B/n
▪ BHJ: A/n + B/n + 2B
▪ SMJ - BHJ: (2 A/n + 2 B/n) - (2B)
Comparing SMJ and BHJ costs
▪ Analysis
▪ More cores, better performance from SMJ
▪ Larger A, better performance from BHJ
SMJ vs. BHJ: (A + B)/n vs. B
Varying cores - SMJ better with more cores
SELECT lineitem.*
FROM lineitem, orders
WHERE l_orderkey =
o_orderkey
Varying size of A - BHJ better with larger difference
SELECT lineitem.*
FROM lineitem, orders
WHERE l_orderkey =
o_orderkey
Increasing size of B - driver BHJ fails, executor BHJ best
SELECT lineitem.*
FROM lineitem, orders
WHERE l_orderkey =
o_orderkey
Other broadcast join improvements
▪ Increase Xms and MetaspaceSize to reduce GC
▪ Fetch all broadcast variables concurrently
▪ Other memory improvements in planning and whole-stage codegen
▪ Planning to contribute code changes back to open source
Production Case Study
▪ 98% of our joins are inner
joins or left outer joins
Join types in HR customer pipelines
Broadcast estimates in HR customer pipelines
▪ If we can increase broadcast
threshold from default 10 MB to
100 MB, then 80% of our joins
can be broadcasted
▪ 30 tables
▪ 29 tables 10K rows
▪ 1 table 3M rows
▪ ~160 joins
▪ Using 18 executor cores
HR use case pipeline
▪ Can broadcast joins make the
pipeline run faster?
Varying broadcast thresholds (0 MB, 10MB, 1GB)
What if we increase the 3M table?
▪ Will it bring similar performance improvements as single join?
30M rows for the big table
Why are more broadcast joins slower?
▪ Self joins and left outer joins
▪ In the highest threshold, the biggest table gets broadcasted
▪ Introduces broadcast overhead
▪ Reduces join parallelism
▪ Takes up storage memory
Closing Thoughts
▪ Executor side broadcast is better than driver side broadcast
▪ When evaluating whether broadcast is better, consider:
▪ Number of cores available
▪ Relative size difference between bigger and smaller tables
▪ Relative size of broadcast tables and available memory
▪ Presence of self joins and outer joins
Broadcast joins are better… with caveats
Future improvements in broadcast joins
▪ Adaptive Query Execution in Spark 3.0
▪ Building hashtables in BHJ with multiple cores
▪ Smaller footprint for BHJ hashtables
▪ Skew handling in sort merge join using broadcast
Thank you
Questions?
Feedback
Your feedback is important to us.
Don’t forget to rate and
review the sessions.
On Improving Broadcast Joins in Apache Spark SQL

Mais conteúdo relacionado

Mais procurados

Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introductioncolorant
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationDatabricks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...Databricks
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroDatabricks
 
In-Memory Evolution in Apache Spark
In-Memory Evolution in Apache SparkIn-Memory Evolution in Apache Spark
In-Memory Evolution in Apache SparkKazuaki Ishizaki
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failingSandy Ryza
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsdatamantra
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDatabricks
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkKazuaki Ishizaki
 
Spark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookSpark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookDatabricks
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan BlueDatabricks
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsDatabricks
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaDatabricks
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceDatabricks
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLDatabricks
 
Parallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected WaysParallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected WaysDatabricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guideRyan Blue
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache SparkDatabricks
 
Spark Performance Tuning .pdf
Spark Performance Tuning .pdfSpark Performance Tuning .pdf
Spark Performance Tuning .pdfAmit Raj
 

Mais procurados (20)

Spark shuffle introduction
Spark shuffle introductionSpark shuffle introduction
Spark shuffle introduction
 
Apache Spark Core – Practical Optimization
Apache Spark Core – Practical OptimizationApache Spark Core – Practical Optimization
Apache Spark Core – Practical Optimization
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
 
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/AvroThe Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
The Rise of ZStandard: Apache Spark/Parquet/ORC/Avro
 
In-Memory Evolution in Apache Spark
In-Memory Evolution in Apache SparkIn-Memory Evolution in Apache Spark
In-Memory Evolution in Apache Spark
 
Why your Spark job is failing
Why your Spark job is failingWhy your Spark job is failing
Why your Spark job is failing
 
Optimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloadsOptimizing S3 Write-heavy Spark workloads
Optimizing S3 Write-heavy Spark workloads
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
 
Spark SQL Join Improvement at Facebook
Spark SQL Join Improvement at FacebookSpark SQL Join Improvement at Facebook
Spark SQL Join Improvement at Facebook
 
Spark and S3 with Ryan Blue
Spark and S3 with Ryan BlueSpark and S3 with Ryan Blue
Spark and S3 with Ryan Blue
 
Optimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL JoinsOptimizing Apache Spark SQL Joins
Optimizing Apache Spark SQL Joins
 
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital KediaTuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
Tuning Apache Spark for Large-Scale Workloads Gaoxiang Liu and Sital Kedia
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle Service
 
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQLBuilding a SIMD Supported Vectorized Native Engine for Spark SQL
Building a SIMD Supported Vectorized Native Engine for Spark SQL
 
Parallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected WaysParallelizing with Apache Spark in Unexpected Ways
Parallelizing with Apache Spark in Unexpected Ways
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Memory Management in Apache Spark
Memory Management in Apache SparkMemory Management in Apache Spark
Memory Management in Apache Spark
 
Spark Performance Tuning .pdf
Spark Performance Tuning .pdfSpark Performance Tuning .pdf
Spark Performance Tuning .pdf
 

Semelhante a On Improving Broadcast Joins in Apache Spark SQL

Informix HA Best Practices
Informix HA Best Practices Informix HA Best Practices
Informix HA Best Practices Scott Lashley
 
Always on high availability best practices for informix
Always on high availability best practices for informixAlways on high availability best practices for informix
Always on high availability best practices for informixIBM_Info_Management
 
Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14
Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14
Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14p6academy
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simpleDori Waldman
 
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...MapR Technologies
 
How to Suceed in Hadoop
How to Suceed in HadoopHow to Suceed in Hadoop
How to Suceed in HadoopPrecisely
 
Concept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsConcept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsSeeling Cheung
 
Tuning Flink Clusters for stability and efficiency
Tuning Flink Clusters for stability and efficiencyTuning Flink Clusters for stability and efficiency
Tuning Flink Clusters for stability and efficiencyDivye Kapoor
 
Tips for Installing Cognos Analytics: Configuring and Installing the Server
Tips for Installing Cognos Analytics: Configuring and Installing the ServerTips for Installing Cognos Analytics: Configuring and Installing the Server
Tips for Installing Cognos Analytics: Configuring and Installing the ServerSenturus
 
SharePoint Performance Monitoring with Sean P. McDonough
SharePoint Performance Monitoring with Sean P. McDonoughSharePoint Performance Monitoring with Sean P. McDonough
SharePoint Performance Monitoring with Sean P. McDonoughGabrijela Orsag
 
Enabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedEnabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedShubham Tagra
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and HadoopMichael Zhang
 
Implementing Oracle Hyperion Profitability and Cost Management in a Professio...
Implementing Oracle Hyperion Profitability and Cost Management in a Professio...Implementing Oracle Hyperion Profitability and Cost Management in a Professio...
Implementing Oracle Hyperion Profitability and Cost Management in a Professio...InnovusPartners
 
Enabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedEnabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedShubham Tagra
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScyllaDB
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceDatabricks
 
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai YuAn Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai YuDatabricks
 
Storage Sizing for SAP
Storage Sizing for SAPStorage Sizing for SAP
Storage Sizing for SAPCenk Ersoy
 
OOW16 - Getting Optimal Performance from Oracle E-Business Suite [CON6711]
OOW16 - Getting Optimal Performance from Oracle E-Business Suite [CON6711]OOW16 - Getting Optimal Performance from Oracle E-Business Suite [CON6711]
OOW16 - Getting Optimal Performance from Oracle E-Business Suite [CON6711]vasuballa
 
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Taro L. Saito
 

Semelhante a On Improving Broadcast Joins in Apache Spark SQL (20)

Informix HA Best Practices
Informix HA Best Practices Informix HA Best Practices
Informix HA Best Practices
 
Always on high availability best practices for informix
Always on high availability best practices for informixAlways on high availability best practices for informix
Always on high availability best practices for informix
 
Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14
Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14
Case study migration from cm13 to cm14 - Oracle Primavera P6 Collaborate 14
 
Big data should be simple
Big data should be simpleBig data should be simple
Big data should be simple
 
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
How to Succeed in Hadoop: comScore’s Deceptively Simple Secrets to Deploying ...
 
How to Suceed in Hadoop
How to Suceed in HadoopHow to Suceed in Hadoop
How to Suceed in Hadoop
 
Concept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with TelematicsConcept to production Nationwide Insurance BigInsights Journey with Telematics
Concept to production Nationwide Insurance BigInsights Journey with Telematics
 
Tuning Flink Clusters for stability and efficiency
Tuning Flink Clusters for stability and efficiencyTuning Flink Clusters for stability and efficiency
Tuning Flink Clusters for stability and efficiency
 
Tips for Installing Cognos Analytics: Configuring and Installing the Server
Tips for Installing Cognos Analytics: Configuring and Installing the ServerTips for Installing Cognos Analytics: Configuring and Installing the Server
Tips for Installing Cognos Analytics: Configuring and Installing the Server
 
SharePoint Performance Monitoring with Sean P. McDonough
SharePoint Performance Monitoring with Sean P. McDonoughSharePoint Performance Monitoring with Sean P. McDonough
SharePoint Performance Monitoring with Sean P. McDonough
 
Enabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speedEnabling Presto to handle massive scale at lightning speed
Enabling Presto to handle massive scale at lightning speed
 
The Data Center and Hadoop
The Data Center and HadoopThe Data Center and Hadoop
The Data Center and Hadoop
 
Implementing Oracle Hyperion Profitability and Cost Management in a Professio...
Implementing Oracle Hyperion Profitability and Cost Management in a Professio...Implementing Oracle Hyperion Profitability and Cost Management in a Professio...
Implementing Oracle Hyperion Profitability and Cost Management in a Professio...
 
Enabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedEnabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speed
 
Scaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/DayScaling Apache Pulsar to 10 Petabytes/Day
Scaling Apache Pulsar to 10 Petabytes/Day
 
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a ServiceZeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
Zeus: Uber’s Highly Scalable and Distributed Shuffle as a Service
 
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai YuAn Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
An Adaptive Execution Engine for Apache Spark with Carson Wang and Yucai Yu
 
Storage Sizing for SAP
Storage Sizing for SAPStorage Sizing for SAP
Storage Sizing for SAP
 
OOW16 - Getting Optimal Performance from Oracle E-Business Suite [CON6711]
OOW16 - Getting Optimal Performance from Oracle E-Business Suite [CON6711]OOW16 - Getting Optimal Performance from Oracle E-Business Suite [CON6711]
OOW16 - Getting Optimal Performance from Oracle E-Business Suite [CON6711]
 
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
 

Mais de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Mais de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Último

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 

Último (20)

Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 

On Improving Broadcast Joins in Apache Spark SQL

  • 1. On Improving Broadcast Joins in Spark SQL Jianneng Li Software Engineer, Workday
  • 2. This presentation may contain forward-looking statements for which there are risks, uncertainties, and assumptions. If the risks materialize or assumptions prove incorrect, Workday’s business results and directions could differ materially from results implied by the forward-looking statements. Forward-looking statements include any statements regarding strategies or plans for future operations; any statements concerning new features, enhancements or upgrades to our existing applications or plans for future applications; and any statements of belief. Further information on risks that could affect Workday’s results is included in our filings with the Securities and Exchange Commission which are available on the Workday investor relations webpage: www.workday.com/company/investor_relations.php Workday assumes no obligation for and does not intend to update any forward-looking statements. Any unreleased services, features, functionality or enhancements referenced in any Workday document, roadmap, blog, our website, press release or public statement that are not currently available are subject to change at Workday’s discretion and may not be delivered as planned or at all. Customers who purchase Workday, Inc. services should make their purchase decisions upon services, features, and functions that are currently available. Safe Harbor Statement
  • 3. Agenda ▪ Apache Spark in Workday Prism Analytics ▪ Broadcast Joins in Spark ▪ Improving Broadcast Joins ▪ Production Case Study
  • 4. Spark in Workday Prism Analytics
  • 5. Example Spark physical plan of our pipeline shown in Spark UI ▪ Customers use our self- service product to build data transformation pipelines, which are compiled to DataFrames and executed by Spark ▪ Finance and HR use cases ▪ This talk focuses on our HR use cases - more on complex plans than big data Spark in Prism Analytics For more details, see session from SAIS 2019 - Lessons Learned using Apache Spark for Self-Service Data Prep in SaaS World
  • 7. Node 1 A 1 B 2 C 3 DD 4 Node 2 D 4 E 5 F 6 AA 1 Node 1 Node 2 A 1 B 2 C 3 AA 1 DD 4 D 4 E 5 F 6 AA 1 DD 4 Broadcast Join #UnifiedAnalytics #SparkAISummit Broadcast Join Review
  • 8. Broadcast Join Shuffle Join Avoids shuffling the bigger side Shuffles both sides Naturally handles data skew Can suffer from data skew Cheap for selective joins Can produce unnecessary intermediate results Broadcasted data needs to fit in memory Data can be spilled and read from disk Cannot be used for certain outer joins Can be used for all joins Broadcast Join vs. Shuffle Join Where applicable, broadcast join should be faster than shuffle join
  • 9. ▪ Spark's broadcasting mechanism is inefficient ▪ Broadcasted data goes through the driver ▪ Too much broadcasted data can run the driver out of memory Broadcasting in Spark Driver Executor 1 Executor 2 (1) Executors sends broadcasted data to driver (2) Driver sends broadcasted data to executors
  • 10. ▪ Uses broadcasting mechanism to collect data to driver ▪ Planned per-join using size estimation and config spark.sql.autoBroadcastJoinThreshold Broadcast Joins in Spark ▪ BroadcastHashJoin (BHJ) ▪ Driver builds in-memory hashtable to distribute to executors ▪ BroadcastNestedLoopJoin (BNLJ) ▪ Distributes data as array to executors ▪ Useful for non-equi joins ▪ Disabled in Prism for stability reasons
  • 12. Goal: More broadcast joins ▪ Q: Is broadcast join faster as long as broadcasted data fits in memory? ▪ A: It depends ▪ Experiment: increase broadcast threshold, and see what breaks ▪ Spoiler: many things go wrong before driver runs out of memory
  • 14. Experiment setup ▪ TPC-H Dataset, 10GB ▪ Query: 60M table (lineitem) joining 15M table (orders) on key ▪ Driver: 1 core, 12 GB memory ▪ Executor: 1 instance, 18 cores, 102 GB memory
  • 15. Single join results SELECT lineitem.* FROM lineitem, orders WHERE l_orderkey = o_orderkey
  • 16. ▪ Driver collects 15M rows ▪ Driver builds hashtable ▪ Driver sends hashtable to executor ▪ Executor deserializes hashtable Why is BHJ slower?
  • 17. Can we reduce BHJ overhead? ▪ Yes - executor side broadcast
  • 18. Executor Side Broadcast ▪ Based on prototype from SPARK-17556 ▪ Data is broadcasted between executors directly Driver Executor 1 Executor 2 Executors sends broadcasted data to each other Driver keeps track of executor’s data blocks
  • 19. Executor BHJ vs. Driver BHJ Pros Cons Driver has less memory pressure Each executor builds its own hashtable Less data shuffled across network More difficult to know size of broadcast Pros of executor BHJ outweigh cons
  • 20. New results SELECT lineitem.* FROM lineitem, orders WHERE l_orderkey = o_orderkey
  • 21. Why is BHJ still slower? ▪ Let's compare the cost models of the joins
  • 22. SMJ Cost ▪ Assume n cores, tables A and B, where A > B 1. Read A/n, Sort, Write A/n 2. Read B/n, Sort, Write B/n 3. Read A/n, Read B/n, Join ▪ Considering only I/O costs: 3 A/n + 3 B/n
  • 23. BHJ Cost ▪ Assume n cores, tables A and B, where A > B 1. Read B/n, Build hashtable, Write B 2. Read A/n, Read B, Join ▪ Considering only I/O costs: A/n + B/n + 2B
  • 24. ▪ SMJ: 3 A/n + 3 B/n ▪ BHJ: A/n + B/n + 2B ▪ SMJ - BHJ: (2 A/n + 2 B/n) - (2B) Comparing SMJ and BHJ costs ▪ Analysis ▪ More cores, better performance from SMJ ▪ Larger A, better performance from BHJ SMJ vs. BHJ: (A + B)/n vs. B
  • 25. Varying cores - SMJ better with more cores SELECT lineitem.* FROM lineitem, orders WHERE l_orderkey = o_orderkey
  • 26. Varying size of A - BHJ better with larger difference SELECT lineitem.* FROM lineitem, orders WHERE l_orderkey = o_orderkey
  • 27. Increasing size of B - driver BHJ fails, executor BHJ best SELECT lineitem.* FROM lineitem, orders WHERE l_orderkey = o_orderkey
  • 28. Other broadcast join improvements ▪ Increase Xms and MetaspaceSize to reduce GC ▪ Fetch all broadcast variables concurrently ▪ Other memory improvements in planning and whole-stage codegen ▪ Planning to contribute code changes back to open source
  • 30. ▪ 98% of our joins are inner joins or left outer joins Join types in HR customer pipelines
  • 31. Broadcast estimates in HR customer pipelines ▪ If we can increase broadcast threshold from default 10 MB to 100 MB, then 80% of our joins can be broadcasted
  • 32. ▪ 30 tables ▪ 29 tables 10K rows ▪ 1 table 3M rows ▪ ~160 joins ▪ Using 18 executor cores HR use case pipeline ▪ Can broadcast joins make the pipeline run faster?
  • 33. Varying broadcast thresholds (0 MB, 10MB, 1GB)
  • 34. What if we increase the 3M table? ▪ Will it bring similar performance improvements as single join?
  • 35. 30M rows for the big table
  • 36. Why are more broadcast joins slower? ▪ Self joins and left outer joins ▪ In the highest threshold, the biggest table gets broadcasted ▪ Introduces broadcast overhead ▪ Reduces join parallelism ▪ Takes up storage memory
  • 38. ▪ Executor side broadcast is better than driver side broadcast ▪ When evaluating whether broadcast is better, consider: ▪ Number of cores available ▪ Relative size difference between bigger and smaller tables ▪ Relative size of broadcast tables and available memory ▪ Presence of self joins and outer joins Broadcast joins are better… with caveats
  • 39. Future improvements in broadcast joins ▪ Adaptive Query Execution in Spark 3.0 ▪ Building hashtables in BHJ with multiple cores ▪ Smaller footprint for BHJ hashtables ▪ Skew handling in sort merge join using broadcast
  • 42. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.