It's The Memory, Stupid!
or: How I Learned to Stop Worrying about CPU Speed and Love Memory Access

Francesc Alted
Software Architect

Big Data Spain 2012, Madrid (Spain)
November 16, 2012
About Continuum Analytics
• Develop new ways in which data is stored, computed, and visualized.
• Provide open technologies for Data Integration on a massive scale.
• Provide software tools, training, and integration/consulting services to corporate, government, and educational clients worldwide.
Overview

• The Era of ‘Big Data’
• A few words about Python and NumPy
• The Starving CPU problem
• Choosing optimal containers for Big Data
The Dawn of ‘Big Data’

“A wind of streaming data, social data and unstructured data is knocking at the door, and we're starting to let it in. It's a scary place at the moment.”

-- Unidentified bank IT executive, as quoted by “The American Banker”
Challenges

• We have to deal with as much data as possible using limited resources
• So, we must use our computational resources optimally to get the most out of Big Data
Interactivity and Big Data

• Interactivity is crucial for handling data
• Interactivity and performance are crucial for handling Big Data
Python and ‘Big Data’
• Python is an interpreted language and hence offers interactivity
• Myth: “Python is slow, so why on earth would you use it for Big Data?”
• Answer: Python has access to an incredibly powerful range of libraries that boost its performance far beyond your expectations
• ...and during this talk I will prove it!
NumPy: A ‘De Facto’ Standard Container
Operating with NumPy
• array[2]; array[1,1:5, :]; array[[3,6,10]]
• (array1**3 / array2) - sin(array3)
• numpy.dot(array1, array2): access to
  optimized BLAS (*GEMM) functions
• and much more...
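
To make these bullets concrete, here is a minimal, illustrative sketch; the array names, shapes, and values are made up for the example and are not from the talk:

# Illustrative sketch of the operations above (shapes and values are arbitrary).
import numpy as np

array = np.arange(12 * 6 * 8, dtype=np.float64).reshape(12, 6, 8)
array1 = np.random.rand(500, 500)
array2 = np.random.rand(500, 500)
array3 = np.random.rand(500, 500)

# Indexing, slicing and fancy indexing
a = array[2]               # one sub-array
b = array[1, 1:5, :]       # a slice along the second axis
c = array[[3, 6, 10]]      # fancy indexing: sub-arrays 3, 6 and 10

# Element-wise arithmetic with ufuncs
d = (array1**3 / array2) - np.sin(array3)

# Matrix multiplication dispatched to optimized BLAS (*GEMM)
e = np.dot(array1, array2)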
Nothing Is Perfect

• NumPy is just great for many use cases
• However, it also has its own deficiencies:
•   Follows the Python evaluation order in complex expressions like (a * b) + c, materializing full-size temporaries (see the sketch below)

•   Does not make use of multiple cores (except within BLAS computations)
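
As a rough illustration of the first point (my sketch, not the talk's code): for an expression such as (a * b) + c, NumPy evaluates one operation at a time and writes each intermediate result to a full-size temporary array in memory.

# Sketch: what NumPy does internally for (a * b) + c with large arrays.
import numpy as np

N = 10 * 1000 * 1000
a = np.random.rand(N)
b = np.random.rand(N)
c = np.random.rand(N)

result = (a * b) + c
# ...is roughly equivalent to:
tmp = a * b          # full-size temporary: N*8 bytes written to and read back from memory
result = tmp + c     # a second complete pass over memory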
Numexpr: Dealing with Complex Expressions
• It comes with a specialized virtual machine for evaluating expressions
• It accelerates computations mainly by using memory more efficiently
• It supports extremely easy-to-use multithreading (active by default)
Exercise (I)
Evaluate the following polynomial:

    0.25x³ + 0.75x² + 1.5x - 2

over the range [-1, 1] with a step size of 2×10⁻⁷, using both NumPy and numexpr.

Note: use a single processor for numexpr:
numexpr.set_num_threads(1)
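
A minimal sketch of how the exercise might be coded; this is my reconstruction, not the original benchmark script, and exact timings depend on the machine:

# Reconstruction of the exercise setup; timings will differ from the slides.
import numpy as np
import numexpr as ne
from time import time

ne.set_num_threads(1)                 # single processor for numexpr, as requested

x = np.arange(-1, 1, 2e-7)            # ~10 million points in [-1, 1)

t0 = time()
y_np = 0.25*x**3 + 0.75*x**2 + 1.5*x - 2
print("NumPy:   %.3f s" % (time() - t0))

t0 = time()
y_ne = ne.evaluate("0.25*x**3 + 0.75*x**2 + 1.5*x - 2")
print("numexpr: %.3f s" % (time() - t0))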
Exercise (II)
Rewrite the polynomial in this (Horner) form:

    ((0.25x + 0.75)x + 1.5)x - 2

and redo the computations.

What happens?
Time to evaluate the polynomial (1 thread):

Expression                          NumPy (s)   Numexpr (s)
((.25*x + .75)*x - 1.5)*x - 2         0.301       0.11
x                                     0.052       0.045
sin(x)**2 + cos(x)**2                 0.715       0.559

[Bar chart: “Time to evaluate polynomial (1 thread)” / “NumPy vs Numexpr (1 thread)”, comparing NumPy and Numexpr on .25*x**3 + .75*x**2 - 1.5*x - 2 and ((.25*x + .75)*x - 1.5)*x - 2.]
Power Expansion
Numexpr expands the expression:

    0.25x³ + 0.75x² + 1.5x - 2

to:

    0.25*x*x*x + 0.75*x*x + 1.5*x - 2

so there is no need to call the transcendental pow()
Pending Question

• Why does numexpr continue to be 3x faster than NumPy, even when both execute exactly the *same* number of operations?
The Starving CPU Problem

“Across the industry, today’s chips are largely able to execute code faster than we can feed them with instructions and data.”

-- Richard Sites, after his article “It’s The Memory, Stupid!”, Microprocessor Report, 10(10), 1996
Memory Access Time vs CPU Cycle Time

Book in 2009
The Status of CPU Starvation in 2012
• Memory latency is much worse than processor cycle times (between 250x and 500x).
• Memory bandwidth is improving at a better rate than memory latency, but it is still slower than processors (between 30x and 100x).
CPU Caches to the Rescue

• CPU cache latency and throughput are much better than main memory’s
• However, the faster they run, the smaller they must be
CPU Cache Evolution

[Figure: three memory hierarchies, ordered from high capacity (top) to high speed (bottom).
 (a) Up to the end of the 80’s: mechanical disk → main memory → central processing unit (CPU).
 (b) 90’s and 2000’s: mechanical disk → main memory → level 2 cache → level 1 cache → CPU.
 (c) 2010’s: mechanical disk → solid state disk → main memory → level 3 cache → level 2 cache → level 1 cache → CPU.]

Figure 1. Evolution of the hierarchical memory model. (a) The primordial (and simplest) model; (b) the most common current implementation, which includes additional cache levels; and (c) a sensible guess at what’s coming over the next decade: three levels of cache in the CPU and solid state disks lying between main memory and classical mechanical disks.
When Are CPU Caches Effective?
Mainly in a couple of scenarios:
• Temporal locality: when the dataset is reused
• Spatial locality: when the dataset is accessed sequentially
The Blocking Technique
When accessing disk or memory, get a contiguous block that fits in CPU cache, operate upon it, and reuse it as much as possible.

Use this extensively to leverage spatial and temporal localities
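
A minimal sketch of the idea in Python; the block length and the function name are my own illustrative choices, not the author's implementation:

# Sketch of blocking: process a big array in cache-sized pieces so each block
# is brought from memory once and reused while it is still hot in cache.
import numpy as np

def blocked_poly(x, out, block_len=64 * 1024):      # block length is an assumption
    for start in range(0, len(x), block_len):
        xb = x[start:start + block_len]             # contiguous block that fits in cache
        # all the operations below reuse xb while it stays in cache
        out[start:start + block_len] = ((0.25*xb + 0.75)*xb + 1.5)*xb - 2
    return out

x = np.linspace(-1, 1, 10 * 1000 * 1000)
y = blocked_poly(x, np.empty_like(x))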
Time To Answer Pending Questions

Expression                          NumPy (s)   Numexpr (s)
.25*x**3 + .75*x**2 - 1.5*x - 2       1.613       0.138
((.25*x + .75)*x - 1.5)*x - 2         0.301       0.11
x                                     0.052       0.045
sin(x)**2 + cos(x)**2                 0.715       0.559

[Bar chart: “Time to evaluate polynomial (1 thread)”, NumPy vs Numexpr for both forms of the polynomial.]
Beyond numexpr: Numba
Numexpr Limitations
• Numexpr only implements element-wise operations, i.e. ‘a*b’ is evaluated as:

  for i in range(N):
      c[i] = a[i] * b[i]

• In particular, it cannot deal with things like:

  for i in range(N):
      c[i] = a[i-1] + a[i] * b[i]
Numba: Overcoming numexpr Limitations
• Numba is a JIT compiler that can translate a subset of the Python language into machine code
• It uses the LLVM infrastructure behind the scenes
• It can achieve similar or better performance than numexpr, but with more flexibility
How Numba Works

Python Function → LLVM-PY → LLVM 3.1 → Machine Code

Backends reachable through LLVM: ISPC, OpenCL, OpenMP, CUDA, CLANG (targeting Intel, AMD, Nvidia, Apple)
Numba Example: Computing the Polynomial
import numpy as np
import numba as nb

N = 10*1000*1000

x = np.linspace(-1, 1, N)
y = np.empty(N, dtype=np.float64)

@nb.jit(arg_types=[nb.f8[:], nb.f8[:]])
def poly(x, y):
    for i in range(N):
        # y[i] = 0.25*x[i]**3 + 0.75*x[i]**2 + 1.5*x[i] - 2
        y[i] = ((0.25*x[i] + 0.75)*x[i] + 1.5)*x[i] - 2

poly(x, y)   # run through Numba!
Times for Computing the Polynomial (in seconds)

Poly version      (I)      (II)
NumPy            1.086    0.505
numexpr          0.108    0.096
Numba            0.055    0.054
Pure C, OpenMP   0.215    0.054

• Compilation time for Numba: 0.019 sec
• Run on Mac OS X, Core2 Duo @ 2.13 GHz
Numba: LLVM for Python

Python code can reach C speed without having to program in C itself (and without losing interactivity!)

Numba in SC 2012

Numba in SC 2012: Awesome Python!
Optimal Containers for Big Data

“If a datastore requires all data to fit in memory, it isn't big data.”

-- Alex Gaynor (on Twitter)
The Need for a Good Data Container
• Too often we focus only on computing as fast as possible
• But we have seen how important data access is
• Hence, having an optimal data structure is critical for getting good performance when processing very large datasets
Appending Data in Large NumPy Objects

[Diagram: to enlarge an array, a new memory area is allocated for the final array object and the original data plus the new data are copied into it.]

• Normally an in-place realloc() call will not succeed
• Both memory areas have to exist simultaneously (see the sketch below)
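
A small sketch of what that means in practice (my example, not from the slides):

# Sketch: enlarging a NumPy array always means a new allocation plus a full copy.
import numpy as np

a = np.arange(10 * 1000 * 1000)      # array to be enlarged
new_data = np.arange(1000)           # new data to append

b = np.concatenate([a, new_data])    # allocates a brand-new buffer and copies `a` into it
# During this call both `a` and `b` are alive, so roughly twice the memory of
# `a` is needed at the same time.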
Contiguous vs Chunked

NumPy container: a single buffer in contiguous memory.

Blaze container: a series of chunks (chunk 1, chunk 2, ..., chunk N) in discontiguous memory.
Appending Data in Blaze

[Diagram: the existing chunks of the array (chunk 1, chunk 2, ...) are reused as-is in the final array object; the new data is compressed into a new chunk and appended.]

Only a small amount of data has to be compressed
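
A similar chunked, compressed container is available today in the bcolz package (a successor of the carray library by the same author); the sketch below is my own illustration with assumed parameters, not the talk's demo:

# Sketch: appending to a chunked, compressed container only touches the last chunk(s).
import numpy as np
import bcolz                                        # assumed installed; not used in the talk itself

a = bcolz.carray(np.arange(10 * 1000 * 1000),
                 cparams=bcolz.cparams(clevel=5))   # chunked + compressed container

new_data = np.arange(1000)
a.append(new_data)        # only the new data is compressed into a trailing chunk;
                          # existing chunks stay where they are, no big copy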
Blosc: (De)compressing Faster than memcpy()

[Figure: Blosc compression/decompression throughput compared with a plain memcpy().]

Transmission + decompression faster than direct transfer?
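
A minimal sketch of calling Blosc from Python through the python-blosc bindings; the compression level and shuffle filter are illustrative choices:

# Sketch: compress a NumPy buffer with Blosc and get it back intact.
import numpy as np
import blosc

a = np.linspace(0, 100, 10 * 1000 * 1000)

packed = blosc.compress(a.tobytes(), typesize=a.itemsize,
                        clevel=5, shuffle=blosc.SHUFFLE)
restored = np.frombuffer(blosc.decompress(packed), dtype=a.dtype)

print(len(packed), "compressed bytes vs", a.nbytes, "raw bytes")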
Example of How Blosc Accelerates Genomics I/O: SeqPack (backed by Blosc)

TABLE 1: Test Data Sets

#   Source         Identifier   Sequencer             Read Count    Read Length   ID Lengths   FASTQ Size
1   1000 Genomes   ERR000018    Illumina GA             9,280,498   36 bp         40–50         1,105 MB
2   1000 Genomes   SRR493233    Illumina HiSeq 2000    43,225,060   100 bp        51–61        10,916 MB
3   1000 Genomes   SRR497004    AB SOLiD 4            122,924,963   51 bp         78–91        22,990 MB

[Fig. 1: In-memory throughputs for several compression schemes applied to increasing block sizes (where each sequence is 256 bytes long).]

Source: Howison, M. (in press). High-throughput compression of FASTQ data with SeqDB. IEEE Transactions on Computational Biology and Bioinformatics.
How Blaze Does Out-Of-Core Computations

[Diagram of the out-of-core computation pipeline. Virtual machine: Python, numexpr, Numba.]
Last Message for Today

Big data is tricky to manage: look for the optimal containers for your data.

Spending some time choosing an appropriate data container can be a big time saver in the long run.
Summary
• Python is a perfect language for Big Data
• Nowadays you need to be aware of the memory system to get good performance
• Choosing appropriate data containers is of the utmost importance when dealing with Big Data
“Success with Big Data will go to those developers who are able to look beyond the standard and who understand the underlying hardware resources and the variety of available algorithms.”

-- Oscar de Bustos, HPC Line of Business Manager at BULL
Thank you! (¡Gracias!)
