Apache Arrow-Based Unified Data Sharing and Transferring Format Among CPU and Accelerators


CPU technologies have scaled well in past years through more complex architecture designs, wider execution pipelines, more cores per processor, and higher frequencies. However, accelerators offer more computational power and higher throughput at lower cost in their dedicated domains, which has led to growing use of them in Spark. But when we integrate accelerators into Spark, a common outcome is that micro-benchmarks promise a huge performance gain while the boost we actually get is small. One reason is the cost of data transfer between the JVM and the accelerator. The other is that the accelerator lacks information about how it is used in Spark. In this research, we investigate using an Apache Arrow-based dataframe as the unified way to share and transfer data between the CPU and accelerators, and we make the hardware and software stack dataframe-aware when we design it. In this way we integrate the Spark and accelerator designs seamlessly and get close to the promised performance.



Binwei Yang, Intel
Carson Wang, Intel
Apache Arrow* Based
Unified Data Exchange
#UnifiedAnalytics #SparkAISummit

Me
• 13 years of experience in performance analysis
• Software -> CPU simulator -> Spark
• Joined the Intel Spark team in Aug. 2018
• A “layman” when it comes to Apache Spark

Pursuit of Performance Is Endless
• Intel® 2nd Gen Xeon® Scalable Processors
• Intel® Optane™ DC persistent memory
• Intel® FPGA
• Software optimization

Without Offload
[Diagram: on the CPU, Internal Row -> Tungsten Engine -> Internal Row]

FPGA Offload
[Diagram, spanning CPU and FPGA: Internal Row -> FPGA Batch -> FPGA DMA RX -> FPGA Engine -> FPGA DMA TX -> FPGA Batch -> Internal Row]
• Spark already has an off-heap unsafe-row

Offloading Performance
[Timeline: To-FPGA, Offload, and From-FPGA stages plotted against the CPU-only time; marked √]

Offloading Performance
[Timeline: two rounds of To-FPGA, Offload, and From-FPGA stages plotted against the CPU-only time; marked √]
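One way to read the timelines above is as a break-even check: offloading helps only if the whole round trip (To-FPGA, Offload, From-FPGA) finishes sooner than the CPU would on its own. The sketch below is an illustration under that assumption, not something taken from the slides; the function name and all timings are hypothetical.

```python
def offload_pays_off(to_fpga_s, offload_s, from_fpga_s, cpu_s):
    """Return True when the full offload round trip beats the CPU-only time."""
    round_trip_s = to_fpga_s + offload_s + from_fpga_s
    return round_trip_s < cpu_s


# Hypothetical timings in seconds, chosen only to show both outcomes.
cheap_transfer = offload_pays_off(0.2, 0.3, 0.2, cpu_s=1.0)
costly_transfer = offload_pays_off(0.6, 0.3, 0.6, cpu_s=1.0)
print(cheap_transfer, costly_transfer)
```

The accelerator kernel is equally fast in both calls; only the transfer cost differs, which is the effect the following slides set out to reduce.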

Overhead of Offload
[Diagram: the FPGA offload path with its two overheads labeled: Convert (Internal Row <-> FPGA Batch) and Data Move (FPGA DMA RX/TX)]
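To make the "Convert" overhead concrete, here is a minimal sketch of turning row-oriented records into contiguous per-column buffers before a transfer. It uses only Python's standard `array` module; the function name, column names, and type codes are illustrative assumptions, and the actual FPGA batch layout is not described in the slides.

```python
from array import array


def rows_to_columns(rows, type_codes):
    """Transpose fixed-width rows into one contiguous array per column."""
    columns = [array(code) for code in type_codes]
    for row in rows:
        # Every value of every row is touched once: this is the conversion cost.
        for column, value in zip(columns, row):
            column.append(value)
    return columns


# Hypothetical two-column table: a 64-bit integer id and a 64-bit float price.
rows = [(1, 1.5), (2, 2.5), (3, 3.5)]
ids, prices = rows_to_columns(rows, type_codes=("q", "d"))

# A contiguous buffer like this is what a DMA-style bulk copy would move.
dma_payload = prices.tobytes()
print(list(ids), list(prices), len(dma_payload))
```

If the data were already columnar, the loop would disappear and only the bulk copy would remain.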

Optimize – Unified Format
[Diagram, spanning CPU and FPGA: Unified Format -> FPGA DMA RX -> FPGA Engine -> FPGA DMA TX -> Unified Format; FPGA Batch boxes also appear]
• With a unified format, the FPGA side is easy to debug
• The FPGA library can be shared with all other projects

Optimize – Double Buffer
[Diagram: Unified Format on the CPU feeds two DMA receive buffers, FPGA DMA RX1 and FPGA DMA RX2, which alternate in supplying the FPGA Engine; FPGA Batch boxes also appear]

Optimize – Double Buffer
[Timeline: columns Col1, Col2, Col3, Col… paired with engines Eng 1, Eng 2, Eng 3, Eng …]
• A columnar data format is friendly to most accelerators
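The benefit of double buffering can be estimated with a small timing model: with two buffers, the transfer of the next batch overlaps the engine's work on the current one. This is a sketch under simplifying assumptions (one DMA channel, one engine, made-up per-batch times), not a measurement from the talk.

```python
def serial_time(transfer_s, compute_s):
    """Total time when every transfer must finish before its compute starts."""
    return sum(transfer_s) + sum(compute_s)


def double_buffered_time(transfer_s, compute_s):
    """Finish time when batch i+1 is transferred while batch i is computed."""
    transfer_done = 0.0
    compute_done = 0.0
    compute_finish = []  # finish time of each batch's compute step
    for i, (t, c) in enumerate(zip(transfer_s, compute_s)):
        # Buffer i % 2 becomes reusable once batch i - 2 has been computed.
        buffer_free = compute_finish[i - 2] if i >= 2 else 0.0
        transfer_done = max(transfer_done, buffer_free) + t
        compute_done = max(transfer_done, compute_done) + c
        compute_finish.append(compute_done)
    return compute_done


# Hypothetical workload: four batches, 1 s to transfer and 1 s to compute each.
transfers = [1.0, 1.0, 1.0, 1.0]
computes = [1.0, 1.0, 1.0, 1.0]
serial = serial_time(transfers, computes)
overlapped = double_buffered_time(transfers, computes)
print(serial, overlapped)
```

With these numbers the serial schedule takes 8 s and the overlapped one 5 s; the gain shrinks as transfer and compute times become more unequal.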

Do We Fully Utilize CPU?
df.agg(F.sum('a_float')).show()
perf stat -e fp_arith_inst_retired.128b_packed_single -A -a sleep 1
CPU0 0 fp_arith_inst_retired.128b_packed_single
CPU1 0 fp_arith_inst_retired.128b_packed_single
CPU2 0 fp_arith_inst_retired.128b_packed_single
…

Add AVX Support
• We need
– A columnar data format
– A native LLVM SQL engine
• Make use of other highly optimized libraries

Recap
• A standard columnar data format
– Easy to debug
– Shared by all projects
• Implement a series of Tungsten backends

Apache Arrow* Is the Answer
• Apache Arrow* is the best choice
• A standard data frame format
– For Native Tungsten backend
– For all accelerators offloading Spark SQL engine
*Other names and brands may be claimed as the property of others.
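For readers unfamiliar with what a standard columnar format looks like in memory, the sketch below mimics the general shape of an Arrow-style primitive column: one contiguous value buffer plus a validity bitmap in which bit i (least-significant bit first) is set when slot i is non-null. It is a simplified illustration in plain Python, not the Arrow library's API, and it omits the alignment and padding that the real format specifies.

```python
from array import array


def build_column(values):
    """Pack floats (or None for null) into a value buffer and a validity bitmap."""
    data = array("d", (0.0 if v is None else v for v in values))
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant-bit numbering
    return data, bitmap


def column_sum(data, bitmap):
    """Sum only the slots whose validity bit is set."""
    return sum(x for i, x in enumerate(data) if (bitmap[i // 8] >> (i % 8)) & 1)


data, bitmap = build_column([1.0, None, 2.5, 4.0])
total = column_sum(data, bitmap)
print(bin(bitmap[0]), total)
```

Because both buffers are flat byte ranges with a known layout, the same column can be handed to a JVM, native, or accelerator backend without a per-row conversion.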

Plug and Play Backend
[Diagram: a Data Frame physical plan of operators op1 to op4 plus a Python UDF; the Tungsten backend slot can be filled by JVM, LLVM AVX, ACC1, ACC2, or Intel Python, with Off-Heap memory, Python, and >>> data-flow arrows]

Make Use of Intel Optane DC Persistent Memory
[Diagram: a Data Frame physical plan of operators op1 to op4 on Tungsten backends (JVM, LLVM AVX, ACC, ACC), with Off-Heap memory and >>> data-flow arrows]

Make Use of Intel Optane DC Persistent Memory
[Diagram: a Data Frame physical plan of operators op2 to op4 on Tungsten backends (LLVM AVX, ACC, ACC), with Off-Heap memory, >>> data-flow arrows, and Shuffle Input]

JSON, CSV, Unzip Offload
[Diagram: a Data Frame physical plan of operators op2 to op4 on Tungsten backends (LLVM AVX, ACC1, ACC2), with Off-Heap memory and >>> data-flow arrows; Unzip, JSON, and CSV handling is placed on ACC1]

Filter, Project Pushdown
[Diagram: a Data Frame physical plan of operators op2 to op4 on Tungsten backends (LLVM AVX, ACC1, ACC2), with Off-Heap memory and >>> data-flow arrows; Filter and Project are placed on ACC1]
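The general idea behind filter and project pushdown can be sketched on a tiny in-memory columnar table: the scan step is told which columns are needed and which predicate rows must pass, so downstream operators never see the rest. The `scan` function, the table, and its column names below are hypothetical and stand in for whatever the accelerator actually does.

```python
def scan(table, needed_columns=None, predicate=None):
    """Return only the requested columns for rows that pass the predicate."""
    names = list(table) if needed_columns is None else list(needed_columns)
    n_rows = len(next(iter(table.values())))
    if predicate is None:
        keep = range(n_rows)
    else:
        column_name, test = predicate
        # Filter pushdown: decide which rows survive by reading one column only.
        keep = [i for i, v in enumerate(table[column_name]) if test(v)]
    # Project pushdown: materialize just the columns that later operators use.
    return {name: [table[name][i] for i in keep] for name in names}


# Hypothetical table stored column by column.
table = {
    "id": [1, 2, 3, 4],
    "price": [5.0, 20.0, 7.5, 30.0],
    "note": ["a", "b", "c", "d"],
}
result = scan(table, needed_columns=["id"], predicate=("price", lambda p: p > 10.0))
print(result)
```

Here only the `price` column is read to evaluate the filter and only `id` is emitted, so the `note` column is never touched.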

Connect Other ML/AI Frameworks
• The proposal in JIRA 24579
• No extra data format conversion

Call to Action
• Share your comments on JIRA 27396, created by Robert
• Follow our work at https://github.com/Intel-bigdata
• Let’s bring Spark’s performance to a higher level



