Data Analytics Technology, Intel Corporation
QATCodec: Past, Present and Future
Copyright © 2017, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Legal Notices
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of
merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of
performance, course of dealing, or usage in trade.
This document contains information on products, services and/or processes in development. All information provided
here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule,
specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from
published specifications. Current characterized errata are available on request.
Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm.
Intel, the Intel logo, Intel® are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others
Copyright © 2018 Intel Corporation.
For Performance Claims and Optimization Notice
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests,
such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change
to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating
your contemplated purchases, including the performance of that product when combined with other products.
Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not
unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not
guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-
dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel
microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information
regarding the specific instruction sets covered by this notice. Notice Revision #20110804.
Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits
referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others
For more information go to http://www.intel.com/performance.
About Us
Intel SSG OTC DAT
● Active open source contributions: About 30
committers from our team with significant
contributions to a lot of big data key projects
including Spark, Hadoop, HBase, Hive...
● A global engineering team (China, US, India) -
Major team is based in Shanghai
● Mission: To fully optimize analytics solutions
on IA and make Intel the best and easiest
platform for big data and analytics customers
[Figure: focus areas across Edge and Data Center: Analytics Apps, Analytics Pipeline, Data Platform, and Infrastructure, plus Machine/Deep Learning and Performance and Security.]
Agenda
• Introduction to Big Data
• QAT Acceleration Opportunity in Big Data
• High Level Architecture for QATCodec in Big Data
• Performance Evaluation
• Future Work for QATCodec
What is Big Data
Image source: https://viblo.asia/p/why-we-need-modern-big-data-integration-platform-gAm5yx1Dldb
Big Data Solution Stack at Intel
• Customers/Partners: Solution for Customers/Patterns
• Applications
- Batch: SQL, Spark SQL, Hive, …etc
- Streaming: Kafka, Spark Streaming, Flink, Storm, …etc
- Machine Learning/Deep Learning: Spark ML/MLlib, BigDL
• Framework: Hadoop MapReduce, Spark, Storm/Flink, TensorFlow/Caffe
• Compute
- Resource Manager: Bare Metal, Yarn, Kubernetes, Mesos
- Cloud: VM, Container, …etc
• Storage
- Hadoop HDFS
- Object Store: Ceph, GlusterFS
- Database: MySQL, HBase, Cassandra …etc
• Intel Architecture
- Intel® Optane Persistent Memory, SSD, ..etc
- Intel® Xeon®: Purley, Grantley, …etc
- Intel® Technologies: QAT, RSD, RDMA, AVX512, MKL, …etc
- Intel® FPGA: Arria, Stratix, …etc
Why QAT Is Important to Big Data
Source [IDC and Seagate]: https://www.seagate.com/files/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf
• Compression Needs
• Reduce data volume and save
storage space.
• Speed up the disk I/O operations
and data transfer across network,
optimize workload performance.
• Trade-off
• Computation overhead for high
compression ratio codecs.
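The trade-off above can be observed with any general-purpose codec. The following is a minimal sketch that is not part of the original deck: it uses Python's standard `zlib` module (a software DEFLATE implementation, not QAT or Snappy) on synthetic data, and the `measure` helper and the sample records are illustrative assumptions. Higher levels typically spend more time for a better ratio.

```python
import time
import zlib


def measure(data: bytes, level: int) -> dict:
    """Compress data at the given zlib level and report ratio and elapsed time."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: the codec must be lossless.
    assert zlib.decompress(compressed) == data
    return {
        "level": level,
        "ratio": len(data) / len(compressed),
        "seconds": elapsed,
    }


# Synthetic, repetitive records standing in for real shuffle or HDFS data.
sample = b"".join(
    b"user=%d,item=%d,qty=%d\n" % (i % 1000, i % 97, i % 7)
    for i in range(50_000)
)

results = [measure(sample, level) for level in (1, 6, 9)]
for r in results:
    print(f"level={r['level']} ratio={r['ratio']:.2f} time={r['seconds'] * 1000:.1f} ms")
```

Exact ratios and timings depend on the data and the machine, so the numbers printed here say nothing about QAT itself; the point is only that ratio and compute cost move together in software codecs.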
Data Compression Pipeline in Big Data Framework
[Figure: MapReduce data flow. The input (an HDFS file) is divided into input splits 0 to 2, each read by a Map task (input decompression). Each Map's output is intermediate data, divided into partitions 0 to 2 and shuffled, possibly over multiple iterations (shuffle compression on write, shuffle decompression on read). The reduce tasks write outputs 0 to 2 to an HDFS file (output compression).]
Data Compression Pipeline in Big Data - I/O Characteristics
• HDFS Storage
- Generally sequential read/write
- Generally one-time read/write
- Input read (data decompression), output write (data compression)
• Shuffle Operations
- Random read/write
- Multiple reads/writes
- Shuffle write (data compression), shuffle read (data decompression)
QAT Acceleration Opportunity
[Figure: Compression/decompression in a Sort workload. Data Gen: write to HDFS (QAT compress). Map: read file (QAT decompress), sort, shuffle write (QAT compress). Reduce: shuffle read (QAT decompress), merge, write to HDFS (QAT compress).]
About QATCodec
The QAT Codec project provides a compression and decompression library for Apache Hadoop, Apache Hive, and Apache Spark using Intel® QuickAssist Technology.
Its advantages:
● Higher performance with a better compression ratio (compared with Snappy)
● Open
○ Broad support in major big data components (Spark, Hadoop, Hive, Parquet)
○ It is an open source project, available at https://github.com/intel-hadoop/IntelQATCodec
○ Apply for the Intel QAT Open Lab: https://www.sdnlab.com/intel_network_platform
[Figure: software stack, bottom to top: Intel® QuickAssist Hardware; Intel® QuickAssist Linux Kernel Driver; QATzip (User Space API); QAT Codec (Hadoop & Spark codec implementation for QAT); on top, the Sequence File, Input/Output, and Intermediate Data paths. The layers are labeled Java and Native.]
Benchmark Configuration (slide 13)
Hardware
• Number of worker nodes: 3
• CPU: Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz
• Number of vcores per node: limited to 64
• Memory: 384 GB
• Disk: 1 × NVMe SSD, 2 TB
• QAT: 3 × Lewisburg
• Network: 10GbE
Software
• QAT version: 1.1.0 Build 0010 (Early Access with CnV)
• CDH: 5.14.0
Performance Evaluation – Spark Sort
Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system. Software and
workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those
factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to
www.intel.com/benchmarks. Tests performed by Intel. Configurations: see slide 13.
Performance Evaluation – MapReduce
Performance Evaluation – Hive on MapReduce
QAT vs. Snappy – Compression Ratio (1 TB data scale)
[Figure: chart of Q/S*100% (QAT relative to Snappy); y-axis from 0% to 120%.]
Impact Analysis on Queries - Compression Ratio
• Impacts
- Map Join, Parallelism & ORC records per stripe
• Map Join conversion
- hive.auto.convert.join.noconditionaltask.size
- Joined table total size
- GC overhead issue
• Parallelism
- Input split size: mapreduce.input.fileinputformat.split.maxsize
- File block size: smallest data unit for a map task (e.g. orc.stripe.size)
- Determines the number of map tasks
[Figure: with the split size set to 64M, a 256M ORC file results in four map tasks (64M, 64M, 64M, 64M), and a 180M ORC file (records number: 100K) results in three map tasks (64M, 64M, 52M). Panels are labeled QAT and Snappy.]
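The split arithmetic behind the map task counts on this slide can be sketched as follows. This is a simplified illustration, not the deck's or Hadoop's actual code: the `split_sizes` helper is hypothetical, and it ignores ORC stripe boundaries and any tolerance Hadoop applies to the last split. Presumably the smaller file is the QAT-compressed one, since the deck reports a better compression ratio for QAT.

```python
def split_sizes(file_size_mb: int, split_size_mb: int) -> list:
    """Return the sizes of the splits a file is cut into (the last may be smaller)."""
    full, remainder = divmod(file_size_mb, split_size_mb)
    return [split_size_mb] * full + ([remainder] if remainder else [])


SPLIT_SIZE_MB = 64  # split size used on the slide

larger_file = split_sizes(256, SPLIT_SIZE_MB)
smaller_file = split_sizes(180, SPLIT_SIZE_MB)

print(larger_file, len(larger_file))    # [64, 64, 64, 64] 4
print(smaller_file, len(smaller_file))  # [64, 64, 52] 3
```

With one map task per split, a better compression ratio on the same records means fewer map tasks, which is how compression ratio affects parallelism.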
Query Analysis
• Map join conversion
• Map join GC overhead issue
• System metrics
Q22 (Map Join Conversion)
hive.auto.convert.join.noconditionaltask.size=100000000; (95MB)
Table size summary:
• date_dim: 1.5M (QAT), 2.3M (Snappy)
• inventory: 5.6G (QAT), 7.8G (Snappy)
• item: 77.2M (QAT), 99.2M (Snappy)
• warehouse: 3.4K (QAT), 3.8K (Snappy)
[Figure: execution time summary, y-axis from 0 to 500, with bars for Snappy (no map join), QAT (no map join), and QAT (map join).]
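The effect of the threshold on the Q22 table sizes above can be checked with a short calculation. This is a sketch under stated assumptions, not Hive's implementation: it assumes Hive compares the combined size of all joined tables except the largest against `hive.auto.convert.join.noconditionaltask.size`, and that K/M/G on the slide are binary units. The helper names are hypothetical.

```python
THRESHOLD_BYTES = 100_000_000  # hive.auto.convert.join.noconditionaltask.size

# Assumption: K/M/G on the slide are binary units.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Table sizes from the Q22 slide: (QAT, Snappy).
TABLE_SIZES = {
    "date_dim": ("1.5M", "2.3M"),
    "inventory": ("5.6G", "7.8G"),
    "item": ("77.2M", "99.2M"),
    "warehouse": ("3.4K", "3.8K"),
}


def to_bytes(size: str) -> float:
    """Convert a size such as '77.2M' to bytes."""
    return float(size[:-1]) * UNITS[size[-1]]


def small_tables_total(column: int) -> float:
    """Sum every table except the largest one for the given codec column."""
    sizes = sorted(to_bytes(pair[column]) for pair in TABLE_SIZES.values())
    return sum(sizes[:-1])


def converts_to_map_join(column: int) -> bool:
    """True if the small tables fit under the threshold (simplified rule)."""
    return small_tables_total(column) < THRESHOLD_BYTES


qat_total = small_tables_total(0)
snappy_total = small_tables_total(1)
print(f"QAT:    {qat_total / 1024 ** 2:.1f}M -> map join: {converts_to_map_join(0)}")
print(f"Snappy: {snappy_total / 1024 ** 2:.1f}M -> map join: {converts_to_map_join(1)}")
```

Under these assumptions the QAT small tables total about 78.7M, below the threshold, while the Snappy ones total about 101.5M, above it. That is consistent with the chart listing a map join variant only for QAT.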
Q12 (GC Issue Caused by Map Join)
• Snappy: three common joins [148s]
- web_clickstreams [96.4GB] with item [99.2MB]
- store_sales [49.5GB] with item [99.2MB]
• QAT: three map joins [1252s]
- web_clickstreams [80.8GB] with item [77.2MB]
- store_sales [43.9GB] with item [77.2MB]
Heap size of Map is 3GB.
Full GC becomes very frequent.
Q12 (GC Issue Caused by Map Join) – cont'd
Performance comparison between QAT and Snappy (lower is better; chart y-axis from 0 to 3):
• Snappy: 1
• QAT-3GB map: 2.65
• QAT-boost map heap size: 2.02 (boosting heap size of map tasks)
• QAT-common join: 0.92 (disabling map join)
Micro-view Query Comparison - IO Wait
[Figure: CPU utilization and IO request charts for Snappy and QAT.]
• Fewer IO requests
• IO wait helps
Micro-view Query Comparison – No IO Wait
[Figure: CPU utilization and IO request charts for Snappy and QAT.]
• Fewer IO requests
Achievements
Deal win for QuickAssist Technology: We already have the first customer for Skylake with QAT and QATCodec, and more customers are interested in it.
Cloudera certification of Intel® QuickAssist Technology Codec & QATzip: validation using MapReduce jobs with TeraGen and TeraSort workloads
https://www.cloudera.com/partners/partners-listing.html?q=intel
The Journey of QATCodec (Past to Now)
• Jan 2015: QATCodec project established
- Began with Hadoop QAT support
• Jan 2017: QATCodec project with support for more big data components
- With support for Hive and Spark
• Aug 2017: QAT for Purley launch
- Report out: our final optimization brings benefits for Hive, Spark, and Hadoop
• Nov 2017: Refresh performance with the new QATzip
• May 2018: Complete first customer commitments
- CDH 5.14.0 version support (rebase & package)
- Integration test/performance test with latest QAT 4.1.0 and QATZIP 0.2.3
• July 2018: Intel® QAT Codec open source
- 6 weeks to achieve Intel open source certification
• Cloudera certifications
Future Work
Continuing engineering collaborations
● More feature support: hugepage optimization
● Support for more big data components: Apache CarbonData (Huawei)
● Continuing support for upcoming CDH versions: CDH 6.0
More customers
● Seek more deal wins like our first customer case to bring hardware acceleration to more big data users.

Mais conteúdo relacionado

Mais procurados

LF_DPDK17_DPDK's best kept secret – Micro-benchmark performance tests
LF_DPDK17_DPDK's best kept secret – Micro-benchmark performance testsLF_DPDK17_DPDK's best kept secret – Micro-benchmark performance tests
LF_DPDK17_DPDK's best kept secret – Micro-benchmark performance tests
LF_DPDK
 
LF_DPDK17_Enabling hardware acceleration in DPDK data plane applications
LF_DPDK17_Enabling hardware acceleration in DPDK data plane applicationsLF_DPDK17_Enabling hardware acceleration in DPDK data plane applications
LF_DPDK17_Enabling hardware acceleration in DPDK data plane applications
LF_DPDK
 
LF_DPDK17_Making networking apps scream on Windows with DPDK
LF_DPDK17_Making networking apps scream on Windows with DPDKLF_DPDK17_Making networking apps scream on Windows with DPDK
LF_DPDK17_Making networking apps scream on Windows with DPDK
LF_DPDK
 
Disrupting the Data Center: Unleashing the Digital Services Economy
Disrupting the Data Center: Unleashing the Digital Services EconomyDisrupting the Data Center: Unleashing the Digital Services Economy
Disrupting the Data Center: Unleashing the Digital Services Economy
Intel IT Center
 

Mais procurados (20)

Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase – Big D...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase – Big D...Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase – Big D...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase – Big D...
 
Real-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPAReal-Time Game Optimization with Intel® GPA
Real-Time Game Optimization with Intel® GPA
 
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Tec...
	 Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Tec...	 Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Tec...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Tec...
 
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Telec...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Telec...Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Telec...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Telec...
 
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
 
IT@Intel: Creating Smart Spaces with All-in-Ones
IT@Intel:  Creating Smart Spaces with All-in-OnesIT@Intel:  Creating Smart Spaces with All-in-Ones
IT@Intel: Creating Smart Spaces with All-in-Ones
 
LF_DPDK17_DPDK's best kept secret – Micro-benchmark performance tests
LF_DPDK17_DPDK's best kept secret – Micro-benchmark performance testsLF_DPDK17_DPDK's best kept secret – Micro-benchmark performance tests
LF_DPDK17_DPDK's best kept secret – Micro-benchmark performance tests
 
Intel® Xeon® processor E7-8800/4800 v3 Application Showcase
Intel® Xeon® processor E7-8800/4800 v3 Application ShowcaseIntel® Xeon® processor E7-8800/4800 v3 Application Showcase
Intel® Xeon® processor E7-8800/4800 v3 Application Showcase
 
LF_DPDK17_Enabling hardware acceleration in DPDK data plane applications
LF_DPDK17_Enabling hardware acceleration in DPDK data plane applicationsLF_DPDK17_Enabling hardware acceleration in DPDK data plane applications
LF_DPDK17_Enabling hardware acceleration in DPDK data plane applications
 
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Core ...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Core ...Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Core ...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Core ...
 
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Fin...
	 Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Fin...	 Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Fin...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Fin...
 
Intel® Xeon® Processor E5-2600 v4 Product Family EAMG
Intel® Xeon® Processor E5-2600 v4 Product Family EAMGIntel® Xeon® Processor E5-2600 v4 Product Family EAMG
Intel® Xeon® Processor E5-2600 v4 Product Family EAMG
 
LF_DPDK17_Making networking apps scream on Windows with DPDK
LF_DPDK17_Making networking apps scream on Windows with DPDKLF_DPDK17_Making networking apps scream on Windows with DPDK
LF_DPDK17_Making networking apps scream on Windows with DPDK
 
Disrupting the Data Center: Unleashing the Digital Services Economy
Disrupting the Data Center: Unleashing the Digital Services EconomyDisrupting the Data Center: Unleashing the Digital Services Economy
Disrupting the Data Center: Unleashing the Digital Services Economy
 
Using Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC WorkloadsUsing Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC Workloads
 
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Data ...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Data ...Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Data ...
Intel® Xeon® Processor E5-2600 v3 Product Family Application Showcase - Data ...
 
LF_DPDK_Accelerate storage service via SPDK
LF_DPDK_Accelerate storage service via SPDK LF_DPDK_Accelerate storage service via SPDK
LF_DPDK_Accelerate storage service via SPDK
 
3 additional dpdk_theory(1)
3 additional dpdk_theory(1)3 additional dpdk_theory(1)
3 additional dpdk_theory(1)
 
In The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for IntelIn The Trenches Optimizing UE4 for Intel
In The Trenches Optimizing UE4 for Intel
 
E5 Intel Xeon Processor E5 Family Making the Business Case
E5 Intel Xeon Processor E5 Family Making the Business Case E5 Intel Xeon Processor E5 Family Making the Business Case
E5 Intel Xeon Processor E5 Family Making the Business Case
 

Semelhante a QATCodec: past, present and future

Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Alluxio, Inc.
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Databricks
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Databricks
 
“Acceleration of Deep Learning Using OpenVINO: 3D Seismic Case Study,” a Pres...
“Acceleration of Deep Learning Using OpenVINO: 3D Seismic Case Study,” a Pres...“Acceleration of Deep Learning Using OpenVINO: 3D Seismic Case Study,” a Pres...
“Acceleration of Deep Learning Using OpenVINO: 3D Seismic Case Study,” a Pres...
Edge AI and Vision Alliance
 

Semelhante a QATCodec: past, present and future (20)

What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?What are latest new features that DPDK brings into 2018?
What are latest new features that DPDK brings into 2018?
 
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
Intel: How to Use Alluxio to Accelerate BigData Analytics on the Cloud and Ne...
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
 
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function FrameworkNFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function Framework
 
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
HPC DAY 2017 | Accelerating tomorrow's HPC and AI workflows with Intel Archit...
 
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference ChipSpring Hill (NNP-I 1000): Intel's Data Center Inference Chip
Spring Hill (NNP-I 1000): Intel's Data Center Inference Chip
 
Новые технологии Intel в центрах обработки данных
Новые технологии Intel в центрах обработки данныхНовые технологии Intel в центрах обработки данных
Новые технологии Intel в центрах обработки данных
 
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Intel
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura IntelTDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Intel
TDC2018SP | Trilha IA - Inteligencia Artificial na Arquitetura Intel
 
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive... Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 
Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
 
Intel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/LabIntel NFVi Enabling Kit Demo/Lab
Intel NFVi Enabling Kit Demo/Lab
 
Software Development Tools for Intel® IoT Platforms
Software Development Tools for Intel® IoT PlatformsSoftware Development Tools for Intel® IoT Platforms
Software Development Tools for Intel® IoT Platforms
 
Re-architecting the Datacenter to Deliver Better Experiences (Intel)
Re-architecting the Datacenter to Deliver Better Experiences (Intel)Re-architecting the Datacenter to Deliver Better Experiences (Intel)
Re-architecting the Datacenter to Deliver Better Experiences (Intel)
 
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
 
Performance out of the box developers
Performance   out of the box developersPerformance   out of the box developers
Performance out of the box developers
 
“Acceleration of Deep Learning Using OpenVINO: 3D Seismic Case Study,” a Pres...
“Acceleration of Deep Learning Using OpenVINO: 3D Seismic Case Study,” a Pres...“Acceleration of Deep Learning Using OpenVINO: 3D Seismic Case Study,” a Pres...
“Acceleration of Deep Learning Using OpenVINO: 3D Seismic Case Study,” a Pres...
 
Introduction to container networking in K8s - SDN/NFV London meetup
Introduction to container networking in K8s - SDN/NFV  London meetupIntroduction to container networking in K8s - SDN/NFV  London meetup
Introduction to container networking in K8s - SDN/NFV London meetup
 
Intel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overviewIntel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overview
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

QATCodec: past, present and future

  • 1. Data Analytics Technology, Intel Corporation QATCodec: Past, Present and Future
  • 2. Legal Notices: No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by visiting www.intel.com/design/literature.htm. Intel, the Intel logo, Intel® are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. Copyright © 2018 Intel Corporation.
  • 3. For Performance Claims and Optimization Notice: Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Optimization Notice: Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice Revision #20110804. Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. For more information go to http://www.intel.com/performance.
  • 4. About Us: Intel SSG OTC DAT
      ● Active open source contributions: about 30 committers on our team, with significant contributions to many key big data projects including Spark, Hadoop, HBase, and Hive.
      ● A global engineering team (China, US, India); most of the team is based in Shanghai.
      ● Mission: to fully optimize analytics solutions on IA and make Intel the best and easiest platform for big data and analytics customers.
      [Diagram labels: Edge, Data Center, Analytics Apps, Analytics Pipeline, Data Platform, Infrastructure, Machine/Deep Learning, Performance and Security]
  • 5. Agenda
      ● Introduction to Big Data
      ● QAT Acceleration Opportunity in Big Data
      ● High Level Architecture for QATCodec in Big Data
      ● Performance Evaluation
      ● Future Work for QATCodec
  • 6. What Is Big Data [Image source: https://viblo.asia/p/why-we-need-modern-big-data-integration-platform-gAm5yx1Dldb]
  • 7. Big Data Solution Stack at Intel
      ● Customers/Partners: Solution for Customers/Patterns
      ● Applications
          ○ Batch: SQL, Spark SQL, Hive, etc.
          ○ Streaming: Kafka, Spark Streaming, Flink, Storm, etc.
          ○ Machine Learning/Deep Learning: Spark ML/MLlib, BigDL
          ○ Database: MySQL, HBase, Cassandra, etc.
      ● Compute Framework: Hadoop MapReduce, Spark, Storm/Flink, TensorFlow/Caffe
      ● Resource Manager: Yarn, Kubernetes, Mesos
      ● Bare Metal; Cloud: VM, Container, etc.
      ● Storage: Hadoop HDFS; Object Store: Ceph, GlusterFS
      ● Intel Architecture
          ○ Intel® Optane Persistent Memory, SSD, etc.
          ○ Intel® Xeon®: Purley, Grantley, etc.
          ○ Intel® Technologies: QAT, RSD, RDMA, AVX512, MKL, etc.
          ○ Intel® FPGA: Arria, Stratix, etc.
  • 8. Why QAT Is Important to Big Data [Source (IDC and Seagate): https://www.seagate.com/files/www-content/our-story/trends/files/Seagate-WP-DataAge2025-March-2017.pdf]
      ● Compression needs
          ○ Reduce data volume and save storage space.
          ○ Speed up disk I/O operations and data transfer across the network, improving workload performance.
      ● Trade-off
          ○ Codecs with a high compression ratio carry computation overhead.
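The trade-off on slide 8 can be observed with a small software-only experiment. The sketch below is an illustration, not part of the original deck: it uses Python's standard zlib (DEFLATE) as a stand-in codec and a made-up sample dataset, and it does not touch QAT hardware. The helper names `make_sample` and `measure` are hypothetical.

```python
import time
import zlib


def make_sample(rows: int = 50_000) -> bytes:
    """Build repetitive CSV-like text (hypothetical stand-in for tabular records)."""
    return b"".join(
        f"row-{i % 1000},value-{(i * 7) % 13}\n".encode() for i in range(rows)
    )


def measure(data: bytes, level: int) -> tuple[float, float]:
    """Return (compressed size / original size, seconds spent compressing)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: the codec must be lossless.
    assert zlib.decompress(compressed) == data
    return len(compressed) / len(data), elapsed


if __name__ == "__main__":
    data = make_sample()
    # Higher levels usually give smaller output at the cost of more CPU time.
    for level in (1, 6, 9):
        ratio, seconds = measure(data, level)
        print(f"level {level}: {ratio:.1%} of original size, {seconds * 1000:.1f} ms")
```

On typical data the higher levels shrink the output further while taking longer on the CPU, which is the overhead that a hardware offload aims to remove.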
  • 9. Data Compression Pipeline in Big Data Framework
      [Diagram: Input (an HDFS file; input splits 0, 1, 2) → Map → Intermediate data (each map's output, partitions 0 to 2) → Shuffle (multiple iterations) → Reduce → Output (an HDFS file; outputs 0, 1, 2)]
      ● Compression points: input decompression, shuffle compression, shuffle decompression, output compression
  • 10. Data Compression Pipeline in Big Data: I/O Characteristics
      ● HDFS storage: input read (data decompression), output write (data compression)
          ○ Generally sequential read/write
          ○ Generally read or written once
      ● Shuffle operations: shuffle write (data compression), shuffle read (data decompression)
          ○ Random read/write
          ○ Read or written multiple times
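The compression points on slides 9 and 10 correspond to standard Hadoop MapReduce configuration properties. The following sketch groups those properties by pipeline stage. The property names are standard Hadoop ones; the function names are hypothetical, and the codec class is a parameter because the QAT codec's class name is not given in the slides (take it from the QATCodec project documentation). The usage line passes Hadoop's built-in Snappy codec purely as an example.

```python
def compression_properties(codec_class: str) -> dict[str, dict[str, str]]:
    """Group Hadoop MapReduce compression properties by pipeline stage."""
    return {
        # Input decompression: Hadoop picks a registered codec by file extension.
        "input": {"io.compression.codecs": codec_class},
        # Shuffle compression/decompression: intermediate map output.
        "shuffle": {
            "mapreduce.map.output.compress": "true",
            "mapreduce.map.output.compress.codec": codec_class,
        },
        # Output compression: final job output written to HDFS.
        "output": {
            "mapreduce.output.fileoutputformat.compress": "true",
            "mapreduce.output.fileoutputformat.compress.codec": codec_class,
        },
    }


def to_cli_args(props: dict[str, dict[str, str]]) -> list[str]:
    """Flatten the grouped properties into Hadoop-style -D key=value arguments."""
    return [
        f"-D{key}={value}"
        for stage in props.values()
        for key, value in stage.items()
    ]


if __name__ == "__main__":
    # Example codec only; substitute the QAT codec class from the project docs.
    props = compression_properties("org.apache.hadoop.io.compress.SnappyCodec")
    print(" ".join(to_cli_args(props)))
```

Keeping the shuffle and output settings separate reflects slide 10: the two stages have different I/O patterns, so a deployment may reasonably choose a different codec for each.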
  • 11. QAT Acceleration Opportunity: Compression/Decompression in Sort Workload
      [Diagram: Data Gen (Write to HDFS) → Map (Read File, Sort, Shuffle Write) → Reduce (Shuffle Read, Merge, Write to HDFS). The diagram marks three QAT compress points and two QAT decompress points along these stages.]
  • 12. About QATCodec: The QAT Codec project provides a compression and decompression library for Apache Hadoop, Apache Hive, and Apache Spark using Intel® QuickAssist Technology. Its advantages:
      ● Higher performance with a better compression ratio (against Snappy)
      ● Open
          ○ Strong support in major big data components (Spark, Hadoop, Hive, Parquet)
          ○ It is an open source project, available at https://github.com/intel-hadoop/IntelQATCodec
          ○ Apply for the Intel QAT Open Lab: https://www.sdnlab.com/intel_network_platform
      [Diagram, bottom to top: Intel® QuickAssist Hardware; Intel® QuickAssist Linux Kernel Driver; QATzip (User Space API); QAT Codec (Hadoop & Spark codec implementation for QAT), split into Native and Java layers; consumers: Sequence File, Input/Output, Intermediate Data]
  • 13. Benchmark Configuration
      ● Hardware
          ○ Number of worker nodes: 3
          ○ CPU: Intel(R) Xeon(R) Platinum 8170 CPU @ 2.10GHz
          ○ Number of vcores per node: limited to 64
          ○ Memory: 384 GB
          ○ Disk: 1 × NVMe SSD, 2 TB
          ○ QAT: 3 × Lewisburg
          ○ Network: 10GbE
      ● Software
          ○ QAT version: 1.1.0 Build 0010 (Early Access with CnV)
          ○ CDH: 5.14.0
  • 14. Performance Evaluation: Spark Sort. Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks. Tests performed by Intel. Configuration: see slide 13.
  • 15. Performance Evaluation: MapReduce. Tests performed by Intel. Configuration: see slide 13.
  • 16. Performance Evaluation: Hive on MapReduce. Tests performed by Intel. Configuration: see slide 13.
  • 17. QAT vs. Snappy: Compression Ratio (1 TB data scale) [Chart: y-axis Q/S × 100%, ranging from 0% to 120%]
  • 18. Impact Analysis on Queries: Compression Ratio
      ● Impacts: map join, parallelism, and ORC records per stripe
      ● Map join conversion
          ○ hive.auto.convert.join.noconditionaltask.size
          ○ Joined table total size
          ○ GC overhead issue
      ● Parallelism
          ○ Input split size: mapreduce.input.fileinputformat.split.maxsize
          ○ File block size: smallest data unit for a map task (e.g. orc.stripe.size)
          ○ Together these determine the number of map tasks
      [Diagram, labeled QAT and Snappy: with the split size set to 64M and 100K records, a 256M ORC file yields four map tasks (64M, 64M, 64M, 64M) and a 180M ORC file yields three (64M, 64M, 52M).]
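The parallelism example on slide 18 is plain division of file size by split size. The sketch below reproduces the two cases from the diagram. It is a simplified model under stated assumptions: it ignores ORC stripe boundaries and Hadoop's tolerance for a slightly oversized last split, and `split_sizes` is a hypothetical helper name.

```python
def split_sizes(file_size_mb: int, split_size_mb: int = 64) -> list[int]:
    """Cut a file into fixed-size splits; each split feeds one map task."""
    full_splits, remainder = divmod(file_size_mb, split_size_mb)
    sizes = [split_size_mb] * full_splits
    if remainder:
        sizes.append(remainder)  # the last, smaller split
    return sizes


if __name__ == "__main__":
    for orc_file_mb in (256, 180):
        sizes = split_sizes(orc_file_mb)
        print(f"{orc_file_mb}M ORC file -> {len(sizes)} map tasks: {sizes}")
```

A better compression ratio shrinks the file, so the same records are processed by fewer map tasks; that is why the codec choice changes query parallelism.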
  • 19. Query Analysis: map join conversion; map join GC overhead issue; system metrics
  • 20. Q22 (Map Join Conversion)
      ● Setting: hive.auto.convert.join.noconditionaltask.size=100000000; (95MB)
      ● Table size summary (table: QAT / Snappy)
          ○ date_dim: 1.5M / 2.3M
          ○ inventory: 5.6G / 7.8G
          ○ item: 77.2M / 99.2M
          ○ warehouse: 3.4K / 3.8K
      [Chart: Execution time summary, axis 0 to 500; bars: Snappy without map join, QAT without map join, QAT with map join]
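The table sizes on slide 20 explain why Q22 converts to a map join under QAT but not under Snappy. The sketch below works through the arithmetic under stated assumptions: K/M/G are binary units (consistent with the slide equating 100000000 bytes to 95MB), and Hive converts the join when the combined size of all tables except the largest is below the threshold. The names are hypothetical, and Hive's real decision relies on table statistics rather than this simplified rule.

```python
UNIT_BYTES = {"K": 1024, "M": 1024**2, "G": 1024**3}

# Sizes as listed on slide 20.
TABLE_SIZES = {
    "date_dim": {"QAT": "1.5M", "Snappy": "2.3M"},
    "inventory": {"QAT": "5.6G", "Snappy": "7.8G"},
    "item": {"QAT": "77.2M", "Snappy": "99.2M"},
    "warehouse": {"QAT": "3.4K", "Snappy": "3.8K"},
}

# hive.auto.convert.join.noconditionaltask.size from the slide.
THRESHOLD_BYTES = 100_000_000


def to_bytes(size: str) -> float:
    """Convert a slide-style size such as '77.2M' to bytes (binary units assumed)."""
    return float(size[:-1]) * UNIT_BYTES[size[-1]]


def small_tables_total(codec: str) -> float:
    """Sum the sizes of every table except the largest one."""
    sizes = sorted(to_bytes(table[codec]) for table in TABLE_SIZES.values())
    return sum(sizes[:-1])


def converts_to_map_join(codec: str) -> bool:
    """Simplified rule: the small side must fit under the threshold."""
    return small_tables_total(codec) < THRESHOLD_BYTES


def size_ratio_percent(table: str) -> float:
    """QAT size as a percentage of Snappy size, read here as Q/S x 100%."""
    sizes = TABLE_SIZES[table]
    return to_bytes(sizes["QAT"]) / to_bytes(sizes["Snappy"]) * 100


if __name__ == "__main__":
    for codec in ("QAT", "Snappy"):
        total_mb = small_tables_total(codec) / UNIT_BYTES["M"]
        print(f"{codec}: small tables total {total_mb:.1f}M, "
              f"map join: {converts_to_map_join(codec)}")
    print(f"item table Q/S: {size_ratio_percent('item'):.1f}%")
```

Under these assumptions the item table alone exceeds the threshold with Snappy, while the QAT-compressed small tables fit beneath it, which matches the chart showing a map join variant only for QAT.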
  • 21. Query Analysis: map join conversion; map join GC overhead issue; system metrics
  • 22. Q12 (GC Issue Caused by Map Join)
      ● Snappy: three common joins [148s]
          ○ web_clickstreams [96.4GB] with item [99.2MB]
          ○ store_sales [49.5GB] with item [99.2MB]
      ● QAT: three map joins [1252s]
          ○ web_clickstreams [80.8GB] with item [77.2MB]
          ○ store_sales [43.9GB] with item [77.2MB]
      ● The heap size of each map task is 3GB, and full GC becomes very frequent.
  • 23. Q12 (GC Issue Caused by Map Join), continued
      ● Performance comparison between QAT and Snappy (lower is better)
          ○ Snappy: 1
          ○ QAT, 3GB map heap: 2.65
          ○ QAT, boosted map heap size: 2.02
          ○ QAT, common join: 0.92
      ● Remedies tried: boosting the heap size of map tasks; disabling map join
  • 24. Query Analysis: map join conversion; map join GC overhead issue; system metrics
  • 25. Micro-view Query Comparison: IO Wait [Charts: CPU utilization and IO requests, Snappy vs. QAT]
      ● Fewer IO requests
      ● IO wait helps
  • 26. Micro-view Query Comparison: No IO Wait [Charts: CPU utilization and IO requests, Snappy vs. QAT]
      ● Fewer IO requests
  • 27. Achievements
      ● Deal win for QuickAssist Technology: we already have the first customer for Skylake with QAT and QATCodec, and more customers are interested in it.
      ● Cloudera certification of Intel® QuickAssist Technology Codec & QATZIP: validation using MapReduce jobs with TeraGen and TeraSort workloads. https://www.cloudera.com/partners/partners-listing.html?q=intel
  • 28. The Journey of QATCodec (Past to Now)
      ● Jan 2015: QATCodec project established
          ○ Began with Hadoop QAT support
      ● Jan 2017: QATCodec project with support for more big data components
          ○ Added support for Hive and Spark
      ● Aug 2017: QAT for Purley launch
          ○ Report out: our final optimization brings benefits for Hive, Spark, and Hadoop
      ● Nov 2017: Refresh performance with new QATzip
      ● May 2018: Complete first customer commitments
          ○ CDH 5.14.0 version support (rebase & package)
          ○ Integration test/performance test with latest QAT 4.1.0 and QATZIP 0.2.3
      ● July 2018: Intel® QAT Codec open source
          ○ 6 weeks to achieve Intel open source certification
      ● Cloudera Certifications
  • 29. Future Work
      ● Continuing engineering collaborations
          ○ More feature support: hugepage optimization
          ○ Support for more big data components: Apache CarbonData (Huawei)
          ○ Continued support for upcoming CDH versions: CDH 6.0
      ● More customers
          ○ Seek more deal wins like our first customer case, to bring hardware acceleration to more big data users.