SlideShare uma empresa Scribd logo
1 de 32
Baixar para ler offline
Stanley Seibert
Director of Community Innovation
Continuum Analytics
Python Scalability Story
In Production Environments
Sergey Maidanov
Software Engineering Manager for
Intel® Distribution for Python*
Motivation: Why Python
Pythonis #1programming
language in hiring demand
followed by Java and C++.
And the demand is growing
What Problems We Solve: Scalable Performance
Make Python usable beyond prototyping environment by
scaling out to HPC and Big Data environments
What Problems We Solve: Out-Of-The-Box Usability
“Any articles I found on your site
that related to actually using the
MKL for compiling something were
overly technical. I couldn't figure
out what the heck some of the
things were doing or talking
about.“ – Intel® Parallel Studio 2015 Beta Survey Response
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/280832
https://software.intel.com/en-us/articles/building-numpyscipy-with-intel-mkl-and-intel-fortran-on-windows
https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl
INTEL®DISTRIBUTIONFORPYTHON*2017
Advancing Python performance closer to native speeds
•  Prebuilt, optimized for numerical computing, data analytics, HPC
•  Drop in replacement for existing Python. No code changes required
Easy, out-of-the-box
access to high
performance Python
•  Accelerated NumPy/SciPy/Scikit-Learn with Intel® MKL
•  Data analytics with pyDAAL, enhanced thread scheduling with TBB,
Jupyter* Notebook interface, Numba, Cython
•  Scale easily with optimized MPI4Py and Jupyter notebooks
Performance with multiple
optimization techniques
•  Distribution and individual optimized packages available through
conda and Anaconda Cloud: anaconda.org/intel
•  Optimizations upstreamed back to main Python trunk
Faster access to latest
optimizations for Intel
architecture
Intel® Xeon® Processors Intel® Xeon Phi™ Product Family
Configuration Info: apt/atlas: installed with apt-get, Ubuntu 16.10, python 3.5.2, numpy 1.11.0, scipy 0.17.0; pip/openblas: installed with pip, Ubuntu 16.10, python 3.5.2, numpy 1.11.1, scipy 0.18.0; Intel Python: Intel Distribution for Python
2017;. Hardware: Xeon: Intel Xeon CPU E5-2698 v3 @ 2.30 GHz (2 sockets, 16 cores each, HT=off), 64 GB of RAM, 8 DIMMS of 8GB@2133MHz; Xeon Phi: Intel Intel® Xeon Phi™ CPU 7210 1.30 GHz, 96 GB of RAM, 6 DIMMS of 16GB@1200MHz
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components,
software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the
performance of that product when combined with other products. * Other brands and names are the property of their respective owners. Benchmark Source: Intel Corporation
Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction
sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for
use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific
instruction sets covered by this notice. Notice revision #20110804 .
Why Yet Another Python Distribution?
Mature AVX2 instructions based product New AVX512 instructions based product
Scaling To HPC/Big Data Production Environment
•  Hardware and software efficiency crucial in production (Perf/Watt, etc.)
•  Efficiency = Parallelism
•  Instruction Level Parallelism with effective memory access patterns
•  SIMD
•  Multi-threading
•  Multi-node
* Roofline Performance Model https://crd.lbl.gov/departments/computer-science/PAR/research/roofline/
Roofline Performance Model*
Arithmetic Intensity
SpMVBLAS1
Stencils
FFT
BLAS3 Particle
Methods
Low High
Gflop/s
Peak Gflop/s
Efficiency = Parallelism in Python
•  CPython as interpreter inhibits parallelism but…
•  … Overall Python tools evolved far toward unlocking parallelism
Native extensions
numpy*, scipy*, scikit-
learn* accelerated
with Intel® MKL, Intel®
DAAL, Intel® IPP
Composable multi-
threading with
Intel® TBB and
Dask*
Multi-node
parallelism with
mpi4py*
accelerated with
Intel® MPI
Language
extensions for
vectorization &
multi-threading
(Cython*, Numba*)
Integration with Big
Data platforms and
Machine Learning
frameworks (pySpark*,
Theano*, TensorFlow*,
etc.)
Mixed language
profiling with Intel®
VTune™ Amplifier
Numpy* & Scipy* optimizations with Intel® MKL
Linear Algebra
•  BLAS
•  LAPACK
•  ScaLAPACK
•  Sparse BLAS
•  Sparse Solvers
•  Iterative
•  PARDISO* SMP & Cluster
Fast Fourier
Transforms
•  1D and multidimensional FFT
Vector Math
•  Trigonometric
•  Hyperbolic
•  Exponential
•  Log
•  Power
•  Root
Vector RNGs
•  Multiple BRNG
•  Support methods for 

independent streams

creation
•  Support all key probability
distributions
Summary Statistics
•  Kurtosis
•  Variation coefficient
•  Order statistics
•  Min/max
•  Variance-covariance
And More
•  Splines
•  Interpolation
•  Trust Region
•  Fast Poisson Solver
Functional domain in this color accelerate respective NumPy, SciPy, etc. domain
Up to
100x
faster!
Up to
10x
faster!
Up to
10x
faster!
Up to
60x
faster!
Configuration Info: apt/atlas: installed with apt-get, Ubuntu 16.10, python 3.5.2, numpy 1.11.0, scipy 0.17.0; pip/openblas: installed with pip, Ubuntu 16.10, python 3.5.2, numpy
1.11.1, scipy 0.18.0; Intel Python: Intel Distribution for Python 2017;. Hardware: Xeon: Intel Xeon CPU E5-2698 v3 @ 2.30 GHz (2 sockets, 16 cores each, HT=off), 64 GB of RAM, 8
DIMMS of 8GB@2133MHz; Xeon Phi: Intel Intel® Xeon Phi™ CPU 7210 1.30 GHz, 96 GB of RAM, 6 DIMMS of 16GB@1200MHz
Scikit-Learn* optimizations with Intel® MKL
0x
1x
2x
3x
4x
5x
6x
7x
8x
9x
Approximate
neighbors
Fast K-means GLM GLM net LASSO Lasso path Least angle
regression,
OpenMP
Non-negative
matrix
factorization
Regression by
SGD
Sampling
without
replacement
SVD
Speedups of Scikit-Learn Benchmarks
Intel® Distribution for Python* 2017 Update 1 vs. system Python & NumPy/Scikit-Learn
System info: 32x Intel® Xeon® CPU E5-2698 v3 @ 2.30GHz, disabled HT, 64GB RAM; Intel® Distribution for Python* 2017 Gold; Intel® MKL 2017.0.0; Ubuntu 14.04.4 LTS; Numpy 1.11.1; scikit-learn 0.17.1. See Optimization Notice.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that
product when combined with other products. * Other brands and names are the property of their respective owners. Benchmark Source: Intel Corporation
Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other
optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel
microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by
this notice. Notice revision #20110804 .
Effect of Intel MKL
optimizations for
NumPy* and SciPy*
1 1.11
54.13
0x
10x
20x
30x
40x
50x
60x
System Sklearn Intel SKlearn Intel PyDAAL
Speedup
Potential Speedup of Scikit-learn*
due to PyDAAL
PCA, 1M Samples, 200 Features
Effect of DAAL
optimizations for
Scikit-Learn*
Intel® Distribution for Python* ships Intel®
Data Analytics Acceleration Library with
Python interfaces, a.k.a. pyDAAL
Distributed parallelism
Intel® MPI library accelerates Intel® Distribution
for Python* (Mpi4py*, Ipyparallel*)
Intel Distribution for Python* also supports
▪  PySpark* - Python interfaces for Spark*, a fast and general
engine for large-scale data processing.
▪  Dask* - a flexible parallel computing library for analytic
computing.
Mpi4py* performance vs. native Intel® MPI
1.7x 2.2x 3.0x 5.3x
0x
1x
2x
3x
4x
5x
6x
2 nodes 4 nodes 8 nodes 16 nodes
PyDAAL Implicit ALS with
Mpi4Py*
Configuration Info: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz, 2x18 cores, HT is ON, RAM 128GB; Versions: Oracle Linux Server 6.6, Intel®
DAAL 2017 Gold, Intel® MPI 5.1.3; Interconnect: 1 GB Ethernet
Software and workloads used in performance tests may have been optimized for performance only on Intel
microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems,
components, software, operations and functions. Any change to any of those factors may cause the results to vary. You
should consult other information and performance tests to assist you in fully evaluating your contemplated purchases,
including the performance of that product when combined with other products. * Other brands and names are the
property of their respective owners. Benchmark Source: Intel Corporation
Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction
sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on
microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use
with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the
specific instruction sets covered by this notice. Notice revision #20110804 .
Composable multi-threading with Intel® TBB
•  Amhdal’s law suggests extracting parallelism at all levels
•  Software components are built from smaller ones
•  If each component is threaded there can be too much!
•  Intel TBB dynamically balances thread loads and effectively manages oversubscription
MKL
TBB
DAAL
pyDAAL
NumPy
SciPy TBB
Joblib
Dask
Application
PythonpackagesNative
libs
12
Application
Component 1
Component N
Subcomponent 1
Subcomponent 2
Subcomponent K
Subcomponent 1
Subcomponent M
Subcomponent
1
Subcomponent
1
Subcomponent
1
Subcomponent
1
Subcomponent
1
Subcomponent
1
Subcomponent
1
Subcomponent
1
>python –m TBB myapp.py
Composable Parallelism: QR Performance
Numpy
1.00x
Numpy
0.22x
Numpy
0.47xDask
0.61x
Dask
0.89x
Dask
1.46x
0.0x
0.2x
0.4x
0.6x
0.8x
1.0x
1.2x
1.4x
Default MKL Serial MKL Intel® TBB
Speedup relative to Default Numpy*
Intel® MKL,
OpenMP* threading
Intel® MKL,
Serial
Intel® MKL,
Intel® TBB threading
Over-
subscription
App-level
parallelism
only
TBB-
composable
nested
parallelism
System info: 32x Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz, disabled HT, 64GB RAM; Intel(R) MKL 2017.0 Beta Update 1 Intel(R) 64 architecture,
Intel(R) AVX2; Intel(R)TBB 4.4.4; Ubuntu 14.04.4 LTS; Dask 0.10.0; Numpy 1.11.0. Software and workloads used in performance tests may have been
optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer
systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other
information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when
combined with other products. * Other brands and names are the property of their respective owners. Benchmark Source: Intel Corporation
Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to
Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the
availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in
this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel
microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered
by this notice. Notice revision #20110804 .
15
Feature cProfile Line_profiler Intel® VTune™ Amplifier
Profiling technology Event Instrumentation Sampling, hardware events
Analysis granularity Function-level Line-level Line-level, call stack, time windows,
hardware events
Intrusiveness Medium (1.3-5x) High (4-10x) Low (1.05-1.3x)
Mixed language programs Python Python Python, Cython, C++, Fortran
Right tool for high performance application profiling at all levels
•  Function-level and line-level hotspot analysis, down to disassembly
•  Call stack analysis
•  Low overhead
•  Mixed-language, multi-threaded application analysis
•  Advanced hardware event analysis for native codes (Cython, C++, Fortran) for cache misses,
branch misprediction, etc.
Profiling Python* code with Intel® VTune™ Amplifier
Stanley Seibert
Director of Community Innovation
Continuum Analytics
November 2016
Scaling Python with JIT Compilation
17
Creating a Compiler For Python
Many valid approaches, but we think these are the most important for data science:
▪  Cannot replace the standard interpreter
–  Must be able to continue to use pandas, SciPy, scikit-learn, etc
▪  Minimize boilerplate
–  Traditional compiled Python extensions require a lot of infrastructure. Try to stay simple
and get out of the way.
▪  Be flexible about execution model
–  Not all hardware is a general purpose CPU
▪  Integrate well with Python’s adaptable ecosystem
–  Must be able to continue to use pandas, SciPy, scikit-learn, etc
18
Numba: A JIT Compiler for Python Functions
▪  An open-source, function-at-a-time compiler library for Python
▪  Compiler toolbox for different targets and execution models:
–  single-threaded CPU, multi-threaded CPU, GPU
–  regular functions, “universal functions” (array functions), GPU kernels
▪  Speedup: 2x (compared to basic NumPy code) to 200x (compared to pure
Python)
▪  Combine ease of writing Python with speeds approaching FORTRAN
▪  Empowers data scientists who make tools for themselves and other data
scientists
19
How does Numba work?
Python Function
(bytecode)
Bytecode
Analysis
Functions
Arguments
Numba IR
Machine
Code
Execut
e!
Type
Inference
LLVM/NVVM JIT LLVM IR
Lowering
Rewrite IR
Cache
@jit
def do_math(a, b):
…
>>> do_math(x, y)
20
Supported Platforms and Hardware
OS HW SW
Windows (7 and later) 32 and 64-bit x86 CPUs Python 2 and 3
OS X (10.9 and later) CUDA & HSA Capable GPUs NumPy 1.7 through 1.11
Linux (RHEL 5 and later)
Experimental support for
ARM, Xeon Phi, AMD Fiji
GPUs
21
Basic Example
22
Basic Example
Array Allocation
Looping over ndarray x as an iterator
Using numpy math functions
Returning a slice of the array
2.7x speedup!
Numba decorator
(nopython=True not required)
23
Releasing the GIL
0.
0.9
1.8
2.6
3.5
1 2 4SpeedupRatio
Number of Threads
Option to release the GIL
Using Python
concurrent.futures
24
Universal Functions (Ufuncs)
Ufuncs are a core concept in NumPy for array-oriented computing.
▪  A function with scalar inputs is broadcast across the elements of the input
arrays:
–  np.add([1,2,3], 3) == [4, 5, 6]
–  np.add([1,2,3], [10, 20, 30]) == [11, 22, 33]
▪  Parallelism is present, by construction. Numba will generate loops and can
automatically multi-thread if requested.
▪  Before Numba, creating fast ufuncs required writing C. No longer!
25
Universal Functions (Ufuncs)
Different decorator!
1.8x speedup!
26
Multi-threaded Ufuncs
Specify type signature
Select parallel target
Automatically uses all CPU cores!
27
Distributed Computing
Example: Dask
Dask Client
(Haswell)
Dask Scheduler
Dask Worker
(Skylake)
Dask Worker
(Skylake)
Dask Worker
(Knight’s Landing)
@jit
def f(x):
…
-  Serialize with pickle module
-  Works with Dask and Spark (and others)
-  Automatic recompilation for each target
f(x)
f(x)
f(x)
28
Other Numba Features
▪  Detects CPU model during code generation and instructs LLVM to optimize for that
architecture.
▪  Automatic dispatch to multiple type-specialized implementations of the same
function
▪  Uses LLVM autovectorization optimization passes for SIMD code generation
▪  Supports calls directly to C with CFFI and ctypes
▪  Optional caching of compiled functions to disk
▪  Ahead of time compilation to shared libraries
▪  Extension API allowing 3rd parties to extend the compiler with new data types and
functions.
29
Conclusion
▪  Numba - Create new high performance functions on-the-fly with pure Python
▪  Understands NumPy arrays and many NumPy operations
▪  Supplies several compilation modes and options for multi-threading
▪  Use with your favorite distributed computing framework
▪  For more information: http://numba.pydata.org
▪  Comes with Anaconda: https://www.continuum.io/downloads
Call To Action
•  Start with either Intel’s or Continuum’s distribution
•  Both have Intel performance goodness baked in!
•  You cannot go wrong either way!
•  Give Numba* a try and see performance increase
•  Try Python* performance profiling with Intel® VTune™ Amplifier!
•  Intel Distribution for Python is free!
https://software.intel.com/en-us/intel-distribution-for-python
–  Commercial support included for Intel® Parallel Studio XE customers!
–  Easy to install with Anaconda* https://anaconda.org/intel/
Intel is working with community leaders like Continuum Analytics to bring the
BEST performance on IA to Python developers
Thank you for your time
Stan Seibert
stan.seibert@continuum.io
www.intel.com/hpcdevcon
Sergey Maidanov
sergey.Maidanov@intel.com
www.intel.com/hpcdevcon
Intel® Distribution for Python*
Powered by Anaconda*
Legal Disclaimer & Optimization Notice
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO
ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND
INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR
WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT,
COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors.
Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software,
operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information
and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product
when combined with other products.
For more complete information about compiler optimizations, see our Optimization Notice at https://software.intel.com/en-us/
articles/optimization-notice#opt-en.
Copyright © 2016, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/
or other countries. *Other names and brands may be claimed as the property of others.

Mais conteúdo relacionado

Mais procurados

OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsHPCC Systems
 
5 p9 pnor and open bmc overview - final
5 p9 pnor and open bmc overview - final5 p9 pnor and open bmc overview - final
5 p9 pnor and open bmc overview - finalYutaka Kawai
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnectinside-BigData.com
 
BPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveBPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveNetronome
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_finalYutaka Kawai
 
Xilinx vs Intel (Altera) FPGA performance comparison
Xilinx vs Intel (Altera) FPGA performance comparison Xilinx vs Intel (Altera) FPGA performance comparison
Xilinx vs Intel (Altera) FPGA performance comparison Roy Messinger
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
The Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft ProcessorThe Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft ProcessorDeepak Tomar
 
Scalability for All: Unreal Engine* 4 with Intel
Scalability for All: Unreal Engine* 4 with Intel Scalability for All: Unreal Engine* 4 with Intel
Scalability for All: Unreal Engine* 4 with Intel Intel® Software
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLinside-BigData.com
 
Introduction to architecture exploration
Introduction to architecture explorationIntroduction to architecture exploration
Introduction to architecture explorationDeepak Shankar
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator Ganesan Narayanasamy
 
Parallelizing Conqueror's Blade
Parallelizing Conqueror's BladeParallelizing Conqueror's Blade
Parallelizing Conqueror's BladeIntel® Software
 
IPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishIPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishBruno Cornec
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingMichelle Holley
 
World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...
World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...
World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...Intel® Software
 

Mais procurados (20)

OpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC SystemsOpenPOWER Acceleration of HPCC Systems
OpenPOWER Acceleration of HPCC Systems
 
5 p9 pnor and open bmc overview - final
5 p9 pnor and open bmc overview - final5 p9 pnor and open bmc overview - final
5 p9 pnor and open bmc overview - final
 
BXI: Bull eXascale Interconnect
BXI: Bull eXascale InterconnectBXI: Bull eXascale Interconnect
BXI: Bull eXascale Interconnect
 
BPF Hardware Offload Deep Dive
BPF Hardware Offload Deep DiveBPF Hardware Offload Deep Dive
BPF Hardware Offload Deep Dive
 
6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final6 open capi_meetup_in_japan_final
6 open capi_meetup_in_japan_final
 
Xilinx vs Intel (Altera) FPGA performance comparison
Xilinx vs Intel (Altera) FPGA performance comparison Xilinx vs Intel (Altera) FPGA performance comparison
Xilinx vs Intel (Altera) FPGA performance comparison
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 
The Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft ProcessorThe Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft Processor
 
ARM HPC Ecosystem
ARM HPC EcosystemARM HPC Ecosystem
ARM HPC Ecosystem
 
Scalability for All: Unreal Engine* 4 with Intel
Scalability for All: Unreal Engine* 4 with Intel Scalability for All: Unreal Engine* 4 with Intel
Scalability for All: Unreal Engine* 4 with Intel
 
DOME 64-bit μDataCenter
DOME 64-bit μDataCenterDOME 64-bit μDataCenter
DOME 64-bit μDataCenter
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
Introduction to architecture exploration
Introduction to architecture explorationIntroduction to architecture exploration
Introduction to architecture exploration
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator
 
Parallelizing Conqueror's Blade
Parallelizing Conqueror's BladeParallelizing Conqueror's Blade
Parallelizing Conqueror's Blade
 
Intel dpdk Tutorial
Intel dpdk TutorialIntel dpdk Tutorial
Intel dpdk Tutorial
 
IPMI is dead, Long live Redfish
IPMI is dead, Long live RedfishIPMI is dead, Long live Redfish
IPMI is dead, Long live Redfish
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet Processing
 
World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...
World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...
World of Tanks* 1.0+: Enriching Gamers Experience with Multicore Optimized Ph...
 

Semelhante a Python* Scalability in Production Environments

Scaling python to_hpc_big_data-maidanov
Scaling python to_hpc_big_data-maidanovScaling python to_hpc_big_data-maidanov
Scaling python to_hpc_big_data-maidanovDenis Nagorny
 
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneUnleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneDatabricks
 
Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Intel® Software
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...tdc-globalcode
 
Accelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the EdgeAccelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the EdgeIntel® Software
 
Accelerating AI Adoption with Partners
Accelerating AI Adoption with PartnersAccelerating AI Adoption with Partners
Accelerating AI Adoption with PartnersSri Ambati
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciIntel® Software
 
Pedal to the Metal: Accelerating Spark with Silicon Innovation
Pedal to the Metal: Accelerating Spark with Silicon InnovationPedal to the Metal: Accelerating Spark with Silicon Innovation
Pedal to the Metal: Accelerating Spark with Silicon InnovationJen Aman
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...Databricks
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaIntel® Software
 
QATCodec: past, present and future
QATCodec: past, present and futureQATCodec: past, present and future
QATCodec: past, present and futureboxu42
 
Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018AWS User Group Bengaluru
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryDatabricks
 
Develop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster ReadyDevelop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster ReadyIntel IT Center
 
Intel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overviewIntel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overviewDESMOND YUEN
 
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning AcceleratorDeep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning Acceleratorinside-BigData.com
 
Ceph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and CephCeph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and CephDanielle Womboldt
 
Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph Ceph Community
 
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive... Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...Databricks
 
Intel Powered AI Applications for Telco
Intel Powered AI Applications for TelcoIntel Powered AI Applications for Telco
Intel Powered AI Applications for TelcoMichelle Holley
 

Semelhante a Python* Scalability in Production Environments (20)

Scaling python to_hpc_big_data-maidanov
Scaling python to_hpc_big_data-maidanovScaling python to_hpc_big_data-maidanov
Scaling python to_hpc_big_data-maidanov
 
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael GreeneUnleashing Data Intelligence with Intel and Apache Spark with Michael Greene
Unleashing Data Intelligence with Intel and Apache Spark with Michael Greene
 
Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture Accelerate Machine Learning Software on Intel Architecture
Accelerate Machine Learning Software on Intel Architecture
 
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
TDC2019 Intel Software Day - Tecnicas de Programacao Paralela em Machine Lear...
 
Accelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the EdgeAccelerating AI from the Cloud to the Edge
Accelerating AI from the Cloud to the Edge
 
Accelerating AI Adoption with Partners
Accelerating AI Adoption with PartnersAccelerating AI Adoption with Partners
Accelerating AI Adoption with Partners
 
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSciStreamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
Streamline End-to-End AI Pipelines with Intel, Databricks, and OmniSci
 
Pedal to the Metal: Accelerating Spark with Silicon Innovation
Pedal to the Metal: Accelerating Spark with Silicon InnovationPedal to the Metal: Accelerating Spark with Silicon Innovation
Pedal to the Metal: Accelerating Spark with Silicon Innovation
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Python Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and AnacondaPython Data Science and Machine Learning at Scale with Intel and Anaconda
Python Data Science and Machine Learning at Scale with Intel and Anaconda
 
QATCodec: past, present and future
QATCodec: past, present and futureQATCodec: past, present and future
QATCodec: past, present and future
 
Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018Ready access to high performance Python with Intel Distribution for Python 2018
Ready access to high performance Python with Intel Distribution for Python 2018
 
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent MemoryAccelerate Your Apache Spark with Intel Optane DC Persistent Memory
Accelerate Your Apache Spark with Intel Optane DC Persistent Memory
 
Develop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster ReadyDevelop, Deploy, and Innovate with Intel® Cluster Ready
Develop, Deploy, and Innovate with Intel® Cluster Ready
 
Intel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overviewIntel xeon-scalable-processors-overview
Intel xeon-scalable-processors-overview
 
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning AcceleratorDeep Learning Training at Scale: Spring Crest Deep Learning Accelerator
Deep Learning Training at Scale: Spring Crest Deep Learning Accelerator
 
Ceph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and CephCeph Day Beijing - Storage Modernization with Intel and Ceph
Ceph Day Beijing - Storage Modernization with Intel and Ceph
 
Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph Ceph Day Beijing - Storage Modernization with Intel & Ceph
Ceph Day Beijing - Storage Modernization with Intel & Ceph
 
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive... Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
Optimizing Apache Spark Throughput Using Intel Optane and Intel Memory Drive...
 
Intel Powered AI Applications for Telco
Intel Powered AI Applications for TelcoIntel Powered AI Applications for Telco
Intel Powered AI Applications for Telco
 

Mais de Intel® Software

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology Intel® Software
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.Intel® Software
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Intel® Software
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Intel® Software
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Intel® Software
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchIntel® Software
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel® Software
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019Intel® Software
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019Intel® Software
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Intel® Software
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Intel® Software
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Intel® Software
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...Intel® Software
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesIntel® Software
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision SlidesIntel® Software
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Intel® Software
 
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Software
 
ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...
ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...
ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...Intel® Software
 

Mais de Intel® Software (20)

AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology AI for All: Biology is eating the world & AI is eating Biology
AI for All: Biology is eating the world & AI is eating Biology
 
AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.AI for good: Scaling AI in science, healthcare, and more.
AI for good: Scaling AI in science, healthcare, and more.
 
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
Software AI Accelerators: The Next Frontier | Software for AI Optimization Su...
 
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
Advanced Techniques to Accelerate Model Tuning | Software for AI Optimization...
 
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency| S...
 
AWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI ResearchAWS & Intel Webinar Series - Accelerating AI Research
AWS & Intel Webinar Series - Accelerating AI Research
 
Intel Developer Program
Intel Developer ProgramIntel Developer Program
Intel Developer Program
 
Intel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview SlidesIntel AIDC Houston Summit - Overview Slides
Intel AIDC Houston Summit - Overview Slides
 
AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019AIDC NY: BODO AI Presentation - 09.19.2019
AIDC NY: BODO AI Presentation - 09.19.2019
 
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
AIDC NY: Applications of Intel AI by QuEST Global - 09.19.2019
 
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
Advanced Single Instruction Multiple Data (SIMD) Programming with Intel® Impl...
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
 
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
RenderMan*: The Role of Open Shading Language (OSL) with Intel® Advanced Vect...
 
AIDC India - AI on IA
AIDC India  - AI on IAAIDC India  - AI on IA
AIDC India - AI on IA
 
AIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino SlidesAIDC India - Intel Movidius / Open Vino Slides
AIDC India - Intel Movidius / Open Vino Slides
 
AIDC India - AI Vision Slides
AIDC India - AI Vision SlidesAIDC India - AI Vision Slides
AIDC India - AI Vision Slides
 
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
Enhance and Accelerate Your AI and Machine Learning Solution | SIGGRAPH 2019 ...
 
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
Intel® Open Image Denoise: Optimized CPU Denoising | SIGGRAPH 2019 Technical ...
 
ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...
ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...
ANYFACE*: Create Film Industry-Quality Facial Rendering & Animation Using Mai...
 

Último

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsFact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsZilliz
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMsFact vs. Fiction: Autodetecting Hallucinations in LLMs
Fact vs. Fiction: Autodetecting Hallucinations in LLMs
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 

Python* Scalability in Production Environments

  • 1.
  • 2. Stanley Seibert Director of Community Innovation Continuum Analytics Python Scalability Story In Production Environments Sergey Maidanov Software Engineering Manager for Intel® Distribution for Python*
  • 3. Motivation: Why Python Pythonis #1programming language in hiring demand followed by Java and C++. And the demand is growing
  • 4. What Problems We Solve: Scalable Performance Make Python usable beyond prototyping environment by scaling out to HPC and Big Data environments
  • 5. What Problems We Solve: Out-Of-The-Box Usability “Any articles I found on your site that related to actually using the MKL for compiling something were overly technical. I couldn't figure out what the heck some of the things were doing or talking about.“ – Intel® Parallel Studio 2015 Beta Survey Response https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/280832 https://software.intel.com/en-us/articles/building-numpyscipy-with-intel-mkl-and-intel-fortran-on-windows https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl
  • 6. INTEL®DISTRIBUTIONFORPYTHON*2017 Advancing Python performance closer to native speeds •  Prebuilt, optimized for numerical computing, data analytics, HPC •  Drop in replacement for existing Python. No code changes required Easy, out-of-the-box access to high performance Python •  Accelerated NumPy/SciPy/Scikit-Learn with Intel® MKL •  Data analytics with pyDAAL, enhanced thread scheduling with TBB, Jupyter* Notebook interface, Numba, Cython •  Scale easily with optimized MPI4Py and Jupyter notebooks Performance with multiple optimization techniques •  Distribution and individual optimized packages available through conda and Anaconda Cloud: anaconda.org/intel •  Optimizations upstreamed back to main Python trunk Faster access to latest optimizations for Intel architecture
  • 7. Intel® Xeon® Processors Intel® Xeon Phi™ Product Family Configuration Info: apt/atlas: installed with apt-get, Ubuntu 16.10, python 3.5.2, numpy 1.11.0, scipy 0.17.0; pip/openblas: installed with pip, Ubuntu 16.10, python 3.5.2, numpy 1.11.1, scipy 0.18.0; Intel Python: Intel Distribution for Python 2017;. Hardware: Xeon: Intel Xeon CPU E5-2698 v3 @ 2.30 GHz (2 sockets, 16 cores each, HT=off), 64 GB of RAM, 8 DIMMS of 8GB@2133MHz; Xeon Phi: Intel Intel® Xeon Phi™ CPU 7210 1.30 GHz, 96 GB of RAM, 6 DIMMS of 16GB@1200MHz Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. * Other brands and names are the property of their respective owners. Benchmark Source: Intel Corporation Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 . Why Yet Another Python Distribution? Mature AVX2 instructions based product New AVX512 instructions based product
  • 8. Scaling To HPC/Big Data Production Environment •  Hardware and software efficiency crucial in production (Perf/Watt, etc.) •  Efficiency = Parallelism •  Instruction Level Parallelism with effective memory access patterns •  SIMD •  Multi-threading •  Multi-node * Roofline Performance Model https://crd.lbl.gov/departments/computer-science/PAR/research/roofline/ Roofline Performance Model* Arithmetic Intensity SpMVBLAS1 Stencils FFT BLAS3 Particle Methods Low High Gflop/s Peak Gflop/s
  • 9. Efficiency = Parallelism in Python •  CPython as interpreter inhibits parallelism but… •  … Overall Python tools evolved far toward unlocking parallelism Native extensions numpy*, scipy*, scikit- learn* accelerated with Intel® MKL, Intel® DAAL, Intel® IPP Composable multi- threading with Intel® TBB and Dask* Multi-node parallelism with mpi4py* accelerated with Intel® MPI Language extensions for vectorization & multi-threading (Cython*, Numba*) Integration with Big Data platforms and Machine Learning frameworks (pySpark*, Theano*, TensorFlow*, etc.) Mixed language profiling with Intel® VTune™ Amplifier
  • 10. Numpy* & Scipy* optimizations with Intel® MKL Linear Algebra •  BLAS •  LAPACK •  ScaLAPACK •  Sparse BLAS •  Sparse Solvers •  Iterative •  PARDISO* SMP & Cluster Fast Fourier Transforms •  1D and multidimensional FFT Vector Math •  Trigonometric •  Hyperbolic •  Exponential •  Log •  Power •  Root Vector RNGs •  Multiple BRNG •  Support methods for 
 independent streams
 creation •  Support all key probability distributions Summary Statistics •  Kurtosis •  Variation coefficient •  Order statistics •  Min/max •  Variance-covariance And More •  Splines •  Interpolation •  Trust Region •  Fast Poisson Solver Functional domain in this color accelerate respective NumPy, SciPy, etc. domain Up to 100x faster! Up to 10x faster! Up to 10x faster! Up to 60x faster! Configuration Info: apt/atlas: installed with apt-get, Ubuntu 16.10, python 3.5.2, numpy 1.11.0, scipy 0.17.0; pip/openblas: installed with pip, Ubuntu 16.10, python 3.5.2, numpy 1.11.1, scipy 0.18.0; Intel Python: Intel Distribution for Python 2017;. Hardware: Xeon: Intel Xeon CPU E5-2698 v3 @ 2.30 GHz (2 sockets, 16 cores each, HT=off), 64 GB of RAM, 8 DIMMS of 8GB@2133MHz; Xeon Phi: Intel Intel® Xeon Phi™ CPU 7210 1.30 GHz, 96 GB of RAM, 6 DIMMS of 16GB@1200MHz
  • 11. Scikit-Learn* optimizations with Intel® MKL 0x 1x 2x 3x 4x 5x 6x 7x 8x 9x Approximate neighbors Fast K-means GLM GLM net LASSO Lasso path Least angle regression, OpenMP Non-negative matrix factorization Regression by SGD Sampling without replacement SVD Speedups of Scikit-Learn Benchmarks Intel® Distribution for Python* 2017 Update 1 vs. system Python & NumPy/Scikit-Learn System info: 32x Intel® Xeon® CPU E5-2698 v3 @ 2.30GHz, disabled HT, 64GB RAM; Intel® Distribution for Python* 2017 Gold; Intel® MKL 2017.0.0; Ubuntu 14.04.4 LTS; Numpy 1.11.1; scikit-learn 0.17.1. See Optimization Notice. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. * Other brands and names are the property of their respective owners. Benchmark Source: Intel Corporation Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 . Effect of Intel MKL optimizations for NumPy* and SciPy* 1 1.11 54.13 0x 10x 20x 30x 40x 50x 60x System Sklearn Intel SKlearn Intel PyDAAL Speedup Potential Speedup of Scikit-learn* due to PyDAAL PCA, 1M Samples, 200 Features Effect of DAAL optimizations for Scikit-Learn* Intel® Distribution for Python* ships Intel® Data Analytics Acceleration Library with Python interfaces, a.k.a. pyDAAL
  • 12. Distributed parallelism Intel® MPI library accelerates Intel® Distribution for Python* (Mpi4py*, Ipyparallel*) Intel Distribution for Python* also supports ▪  PySpark* - Python interfaces for Spark*, a fast and general engine for large-scale data processing. ▪  Dask* - a flexible parallel computing library for analytic computing. Mpi4py* performance vs. native Intel® MPI 1.7x 2.2x 3.0x 5.3x 0x 1x 2x 3x 4x 5x 6x 2 nodes 4 nodes 8 nodes 16 nodes PyDAAL Implicit ALS with Mpi4Py* Configuration Info: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz, 2x18 cores, HT is ON, RAM 128GB; Versions: Oracle Linux Server 6.6, Intel® DAAL 2017 Gold, Intel® MPI 5.1.3; Interconnect: 1 GB Ethernet Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. * Other brands and names are the property of their respective owners. Benchmark Source: Intel Corporation Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 .
  • 13. Composable multi-threading with Intel® TBB •  Amhdal’s law suggests extracting parallelism at all levels •  Software components are built from smaller ones •  If each component is threaded there can be too much! •  Intel TBB dynamically balances thread loads and effectively manages oversubscription MKL TBB DAAL pyDAAL NumPy SciPy TBB Joblib Dask Application PythonpackagesNative libs 12 Application Component 1 Component N Subcomponent 1 Subcomponent 2 Subcomponent K Subcomponent 1 Subcomponent M Subcomponent 1 Subcomponent 1 Subcomponent 1 Subcomponent 1 Subcomponent 1 Subcomponent 1 Subcomponent 1 Subcomponent 1 >python –m TBB myapp.py
  • 14. Composable Parallelism: QR Performance Numpy 1.00x Numpy 0.22x Numpy 0.47xDask 0.61x Dask 0.89x Dask 1.46x 0.0x 0.2x 0.4x 0.6x 0.8x 1.0x 1.2x 1.4x Default MKL Serial MKL Intel® TBB Speedup relative to Default Numpy* Intel® MKL, OpenMP* threading Intel® MKL, Serial Intel® MKL, Intel® TBB threading Over- subscription App-level parallelism only TBB- composable nested parallelism System info: 32x Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz, disabled HT, 64GB RAM; Intel(R) MKL 2017.0 Beta Update 1 Intel(R) 64 architecture, Intel(R) AVX2; Intel(R)TBB 4.4.4; Ubuntu 14.04.4 LTS; Dask 0.10.0; Numpy 1.11.0. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. * Other brands and names are the property of their respective owners. Benchmark Source: Intel Corporation Optimization Notice: Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804 .
  • 15. 15 Feature cProfile Line_profiler Intel® VTune™ Amplifier Profiling technology Event Instrumentation Sampling, hardware events Analysis granularity Function-level Line-level Line-level, call stack, time windows, hardware events Intrusiveness Medium (1.3-5x) High (4-10x) Low (1.05-1.3x) Mixed language programs Python Python Python, Cython, C++, Fortran Right tool for high performance application profiling at all levels •  Function-level and line-level hotspot analysis, down to disassembly •  Call stack analysis •  Low overhead •  Mixed-language, multi-threaded application analysis •  Advanced hardware event analysis for native codes (Cython, C++, Fortran) for cache misses, branch misprediction, etc. Profiling Python* code with Intel® VTune™ Amplifier
  • 16. Stanley Seibert Director of Community Innovation Continuum Analytics November 2016 Scaling Python with JIT Compilation
  • 17. 17 Creating a Compiler For Python Many valid approaches, but we think these are the most important for data science: ▪  Cannot replace the standard interpreter –  Must be able to continue to use pandas, SciPy, scikit-learn, etc ▪  Minimize boilerplate –  Traditional compiled Python extensions require a lot of infrastructure. Try to stay simple and get out of the way. ▪  Be flexible about execution model –  Not all hardware is a general purpose CPU ▪  Integrate well with Python’s adaptable ecosystem –  Must be able to continue to use pandas, SciPy, scikit-learn, etc
  • 18. 18 Numba: A JIT Compiler for Python Functions ▪  An open-source, function-at-a-time compiler library for Python ▪  Compiler toolbox for different targets and execution models: –  single-threaded CPU, multi-threaded CPU, GPU –  regular functions, “universal functions” (array functions), GPU kernels ▪  Speedup: 2x (compared to basic NumPy code) to 200x (compared to pure Python) ▪  Combine ease of writing Python with speeds approaching FORTRAN ▪  Empowers data scientists who make tools for themselves and other data scientists
  • 19. 19 How does Numba work? Python Function (bytecode) Bytecode Analysis Functions Arguments Numba IR Machine Code Execut e! Type Inference LLVM/NVVM JIT LLVM IR Lowering Rewrite IR Cache @jit def do_math(a, b): … >>> do_math(x, y)
  • 20. 20 Supported Platforms and Hardware OS HW SW Windows (7 and later) 32 and 64-bit x86 CPUs Python 2 and 3 OS X (10.9 and later) CUDA & HSA Capable GPUs NumPy 1.7 through 1.11 Linux (RHEL 5 and later) Experimental support for ARM, Xeon Phi, AMD Fiji GPUs
  • 22. 22 Basic Example Array Allocation Looping over ndarray x as an iterator Using numpy math functions Returning a slice of the array 2.7x speedup! Numba decorator (nopython=True not required)
  • 23. 23 Releasing the GIL 0. 0.9 1.8 2.6 3.5 1 2 4SpeedupRatio Number of Threads Option to release the GIL Using Python concurrent.futures
  • 24. 24 Universal Functions (Ufuncs) Ufuncs are a core concept in NumPy for array-oriented computing. ▪  A function with scalar inputs is broadcast across the elements of the input arrays: –  np.add([1,2,3], 3) == [4, 5, 6] –  np.add([1,2,3], [10, 20, 30]) == [11, 22, 33] ▪  Parallelism is present, by construction. Numba will generate loops and can automatically multi-thread if requested. ▪  Before Numba, creating fast ufuncs required writing C. No longer!
  • 25. 25 Universal Functions (Ufuncs) Different decorator! 1.8x speedup!
  • 26. 26 Multi-threaded Ufuncs Specify type signature Select parallel target Automatically uses all CPU cores!
  • 27. 27 Distributed Computing Example: Dask Dask Client (Haswell) Dask Scheduler Dask Worker (Skylake) Dask Worker (Skylake) Dask Worker (Knight’s Landing) @jit def f(x): … -  Serialize with pickle module -  Works with Dask and Spark (and others) -  Automatic recompilation for each target f(x) f(x) f(x)
  • 28. 28 Other Numba Features ▪  Detects CPU model during code generation and instructs LLVM to optimize for that architecture. ▪  Automatic dispatch to multiple type-specialized implementations of the same function ▪  Uses LLVM autovectorization optimization passes for SIMD code generation ▪  Supports calls directly to C with CFFI and ctypes ▪  Optional caching of compiled functions to disk ▪  Ahead of time compilation to shared libraries ▪  Extension API allowing 3rd parties to extend the compiler with new data types and functions.
  • 29. 29 Conclusion ▪  Numba - Create new high performance functions on-the-fly with pure Python ▪  Understands NumPy arrays and many NumPy operations ▪  Supplies several compilation modes and options for multi-threading ▪  Use with your favorite distributed computing framework ▪  For more information: http://numba.pydata.org ▪  Comes with Anaconda: https://www.continuum.io/downloads
  • 30. Call To Action •  Start with either Intel’s or Continuum’s distribution •  Both have Intel performance goodness baked in! •  You cannot go wrong either way! •  Give Numba* a try and see performance increase •  Try Python* performance profiling with Intel® VTune™ Amplifier! •  Intel Distribution for Python is free! https://software.intel.com/en-us/intel-distribution-for-python –  Commercial support included for Intel® Parallel Studio XE customers! –  Easy to install with Anaconda* https://anaconda.org/intel/ Intel is working with community leaders like Continuum Analytics to bring the BEST performance on IA to Python developers
  • 31. Thank you for your time Stan Seibert stan.seibert@continuum.io www.intel.com/hpcdevcon Sergey Maidanov sergey.Maidanov@intel.com www.intel.com/hpcdevcon Intel® Distribution for Python* Powered by Anaconda*
  • 32. Legal Disclaimer & Optimization Notice INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information about compiler optimizations, see our Optimization Notice at https://software.intel.com/en-us/ articles/optimization-notice#opt-en. Copyright © 2016, Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/ or other countries. *Other names and brands may be claimed as the property of others.