It's The Memory,
Stupid!

Understanding the Memory Hierarchy for Better Data
Management
Francesc Alted

Software Architect

CodeJam 2014, Jülich Supercomputing Centre	

January 26, 2014
About Continuum Analytics

• Develop new ways to store, compute,
and visualize data.


• Provide open technologies for data
integration on a massive scale.	


• Provide software tools, training, and

integration/consulting services to
corporate, government, and educational
clients worldwide.
Main Software Projects
• Anaconda: Easy deployment of Python
packages	


• Bokeh: Interactive visualization and analysis
for the web and in Python	


• Blaze: Out-of-core and distributed data
computation & store (NG NumPy)	


• Wakari: Sharing Python environments in
the web (Python for teams)
Overview
• The Era of ‘Big Data’	

• A small exercise on query performance	

• The Starving CPU problem	

• Efficient computing: numexpr and Numba	

• Optimal Containers for Big Data
“There were 5 exabytes of information
created between the dawn of
civilization through 2003, but that much
information is now created	

every 2 days, and the pace is increasing.”	


-- Eric Schmidt, Google CEO,
Techonomy Conference, August 4, 2010

The Dawn of ‘Big Data’
Interactivity	

and Big Data
• Interactivity is crucial for handling data	

• Interactivity and performance are crucial
for handling Big Data
Python and ‘Big Data’
• Python is an interpreted language and
hence, it offers interactivity	


• Myth: “Python is slow, so why on earth are
you going to use it for Big Data?”


• Answer: Python has access to an incredibly

powerful range of libraries that boost its
performance far beyond your expectations	


• ...and during this talk I will prove it…
NumPy: A ‘De Facto’
Standard Container
Operating with
NumPy
• array[2]; array[1,1:5, :]; array[[3,6,10]]	

• (array1**3 / array2) - sin(array3)	

• numpy.dot(array1, array2): access to
optimized BLAS (*GEMM) functions	


• and much more...
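As a minimal sketch of the operations listed above (the array shapes and values are hypothetical, chosen only so that each expression is valid):

```python
import numpy as np

# Hypothetical small arrays, only to make the bullet points concrete.
array = np.arange(3 * 6 * 4, dtype=np.float64).reshape(3, 6, 4)

block = array[2]                 # integer indexing: a (6, 4) view
window = array[1, 1:5, :]        # slicing: rows 1..4 of block 1, shape (4, 4)

flat = np.arange(12, dtype=np.float64)
picked = flat[[3, 6, 10]]        # fancy indexing: copies elements 3, 6 and 10

# Element-wise expression, evaluated without an explicit Python loop.
array1 = np.linspace(1.0, 2.0, 5)
array2 = np.linspace(2.0, 3.0, 5)
array3 = np.linspace(0.0, 1.0, 5)
result = (array1**3 / array2) - np.sin(array3)

# Matrix product; NumPy hands this to BLAS when it is linked against one.
m1 = np.ones((3, 4))
m2 = np.ones((4, 2))
product = np.dot(m1, m2)
```

Slicing returns views on the same memory, while fancy indexing copies, which matters once arrays no longer fit in cache.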
Interlude:

Querying Big Tables
Exercise
• We are starting to see datasets with a huge
number of rows, and more importantly,
columns	


• Let’s do an experiment querying such a

large table by using several classic methods
(and a new one :)
The Table & The Query
• Suppose that our table is something like

one million rows and 100 columns (yes, we
are talking about big data)	


• The query is like:






(f2>.9) and ((f8>.3) and (f8<.4))

• Selectivity is about 1%
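A minimal NumPy sketch of this query follows. It assumes the columns hold independent uniform random values in [0, 1), which the slides do not state, and uses far fewer rows than the talk so that it runs quickly.

```python
import numpy as np

# Hypothetical stand-in for the table in the slides: uniform random values,
# 20,000 rows instead of one million.
rng = np.random.default_rng(0)
n_rows, n_cols = 20_000, 100
table = rng.random((n_rows, n_cols))

f2 = table[:, 2]
f8 = table[:, 8]

# Python's `and` does not work element-wise on arrays; `&` does.
mask = (f2 > .9) & ((f8 > .3) & (f8 < .4))
selected = table[mask]

# With independent uniform columns, P(f2 > .9) = 0.1 and
# P(.3 < f8 < .4) = 0.1, so the expected selectivity is 0.1 * 0.1 = 1%.
selectivity = mask.mean()
```

Under that assumption, the 1% selectivity quoted above is simply the product of two 10% conditions.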
Times for querying (seconds)

Database (in-memory)   Create    Query
NumPy                    23.4     1.97
sqlite3                 454      28.9
sqlite3 (index)         140       2.87
BLZ                      22.0     0.081

Run on Mac OSX, Core2 Duo @ 2.13 GHz
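For reference, the sqlite3 variant of such an experiment could look roughly like the sketch below. The table name, the column names f0..f99, the row count and the choice of indexed column are all assumptions; the slides do not show the benchmark code.

```python
import random
import sqlite3

# Hypothetical miniature of the benchmark table: 100 float columns named
# f0..f99 (names assumed from the query in the slides) and only 2,000 rows.
n_rows, n_cols = 2_000, 100
random.seed(0)

columns = ", ".join(f"f{i} REAL" for i in range(n_cols))
placeholders = ", ".join("?" for _ in range(n_cols))

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE t ({columns})")
conn.executemany(
    f"INSERT INTO t VALUES ({placeholders})",
    ([random.random() for _ in range(n_cols)] for _ in range(n_rows)),
)

# Index on one of the queried columns; which columns the talk indexed
# is not stated, so this choice is an assumption.
conn.execute("CREATE INDEX idx_f8 ON t (f8)")

rows = conn.execute(
    "SELECT * FROM t WHERE f2 > 0.9 AND f8 > 0.3 AND f8 < 0.4"
).fetchall()
conn.close()
```

Each matching row comes back as a tuple of all 100 values, so the engine has to touch whole rows even though the predicate reads only two columns.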
Open Questions
• If all the query engines store their data in
memory and are written in C, why do they show
such a large difference in performance?


• Why is SQLite3 performance so poor, even
when using indexing?


• Why can BLZ (essentially a data container on
steroids) be up to 25x faster than
NumPy for querying?
“Across the industry, today’s chips are largely
able to execute code faster than we can feed
them with instructions and data.”	


– Richard Sites, after his article

“It’s The Memory, Stupid!”, 

Microprocessor Report, 10(10),1996

The Starving CPU
Problem
Memory Access Time
vs CPU Cycle Time
Book in
2009
The Status of CPU
Starvation in 2014
• Memory latency is much worse than the processor
cycle time (between 250x and 1000x slower).


• Memory bandwidth is improving at a better
rate than memory latency, but it is also
slower than processors (between 30x and
100x).
CPU Caches to the
Rescue

• CPU cache latency and throughput
are much better than memory	


• However: the faster they run the

smaller they must be (because of heat
dissipation problems)
CPU Cache Evolution
[Figure with three panels, each ordered from largest capacity to highest speed.
(a) Up to end 80's: mechanical disk, main memory, CPU.
(b) 90's and 2000's: mechanical disk, main memory, level 2 cache, level 1 cache, CPU.
(c) 2010's: mechanical disk, solid state disk, main memory, level 3 cache, level 2 cache, level 1 cache, CPU.]

Figure 1. Evolution of the hierarchical memory model. (a) The primordial (and simplest) model; (b) the most common current
implementation, which includes additional cache levels; and (c) a sensible guess at what’s coming over the next decade:
three levels of cache in the CPU and solid state disks lying between main memory and classical mechanical disks.
numexpr
• numexpr is a computational engine for

NumPy that makes a sensible use of the
memory hierarchy for better performance 	


• Other libraries normally use numexpr
when it is time to accelerate large
computations	


• PyTables, pandas and BLZ can all leverage
numexpr
Computing with NumPy

Temporaries go to memory
Computing with Numexpr

Temporaries stay in cache
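The idea behind the two slides above can be sketched in plain NumPy by evaluating an expression one block at a time. This only illustrates the principle: it is not how numexpr is implemented, and the block size is a hypothetical value.

```python
import numpy as np

def blocked_eval(a, b, c, block=4096):
    """Evaluate (a**3 / b) - sin(c) one block at a time.

    Illustrative only: numexpr does this in compiled code; `block` is a
    hypothetical size chosen so the temporaries fit in the CPU cache.
    """
    out = np.empty_like(a)
    for start in range(0, a.shape[0], block):
        stop = start + block
        # Temporaries created here are at most `block` elements long, so
        # they can be consumed while still resident in cache.
        out[start:stop] = (a[start:stop]**3 / b[start:stop]) - np.sin(c[start:stop])
    return out

n = 100_000
a = np.linspace(1.0, 2.0, n)
b = np.linspace(2.0, 3.0, n)
c = np.linspace(0.0, 1.0, n)

whole = (a**3 / b) - np.sin(c)    # plain NumPy: full-size temporaries
blocked = blocked_eval(a, b, c)   # same result, small temporaries
```

With numexpr installed, the equivalent call would be `numexpr.evaluate("(a**3 / b) - sin(c)")`, which applies the same blocking idea in compiled code.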
First Hint for Open
Questions
• BLZ is using numexpr behind the scenes so
as to do the computations needed for the
query	


• It creates fewer in-memory temporaries than
NumPy, so CPUs can do more work
Second Interlude

Beyond numexpr:
Numba
Numexpr Limitations
• Numexpr only implements element-wise
operations, i.e. ‘a*b’ is evaluated as:



    for i in range(N):
        c[i] = a[i] * b[i]


• In particular, it cannot deal with things like:

    for i in range(N):
        c[i] = a[i-1] + a[i] * b[i]
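The second loop reads a neighbouring element, which an element-wise expression cannot index. In plain NumPy it can still be written with a shifted copy, at the cost of an extra full-size temporary. The sketch below assumes the wrap-around at i = 0 (where a[i-1] is the last element) is intended; the input values are hypothetical.

```python
import numpy as np

N = 8
a = np.arange(1.0, N + 1)
b = np.full(N, 2.0)

# Reference: the loop from the slide. For i == 0, a[i-1] is a[-1],
# which in Python means the last element.
c_loop = np.empty(N)
for i in range(N):
    c_loop[i] = a[i-1] + a[i] * b[i]

# Vectorized form: np.roll(a, 1)[i] == a[i-1], including the wrap-around.
c_vec = np.roll(a, 1) + a * b
```

A loop in which c[i] depends on c[i-1] has no such rewrite, which is where a JIT compiler becomes the more natural tool.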
Numba: Overcoming
numexpr Limitations
• Numba is a JIT compiler that can translate a subset

of the Python language into machine code


• It uses LLVM infrastructure behind the
scenes	


• Can achieve similar or better performance
than numexpr, but with more flexibility
How Numba Works
[Diagram: a Python function goes through LLVM-PY and LLVM 3.1 to produce machine code.
Other labels in the diagram: ISPC, OpenCL, OpenMP, CUDA, CLANG; Intel, AMD, Nvidia, Apple.]
Numba Example:

Computing a Polynomial
import numpy as np
import numba as nb

N = 10*1000*1000

x = np.linspace(-1, 1, N)
y = np.empty(N, dtype=np.float64)

@nb.njit  # the slide used @nb.autojit, which later Numba releases removed
def poly(x, y):
    for i in range(N):
        # Expanded form:
        # y[i] = 0.25*x[i]**3 + 0.75*x[i]**2 + 1.5*x[i] - 2
        # Factored (Horner) form, with fewer multiplications:
        y[i] = ((0.25*x[i] + 0.75)*x[i] + 1.5)*x[i] - 2

poly(x, y)  # run through Numba!
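The two forms of the polynomial in the example above are algebraically identical: ((0.25x + 0.75)x + 1.5)x - 2 expands to 0.25x^3 + 0.75x^2 + 1.5x - 2. A small NumPy check, using a shorter array than the slides, might look like this:

```python
import numpy as np

x = np.linspace(-1, 1, 1000)

# Version written with explicit powers (the commented-out line in the slide).
y_expanded = 0.25*x**3 + 0.75*x**2 + 1.5*x - 2

# Horner's rule: the same polynomial with three multiplications and no powers.
y_horner = ((0.25*x + 0.75)*x + 1.5)*x - 2
```

Presumably these are the two "Poly version" columns in the timings, although the slides do not say so explicitly.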
Times for Computing (sec.)

Poly version        (I)      (II)
Numpy             1.086     0.505
numexpr           0.108     0.096
Numba             0.055     0.054
Pure C, OpenMP    0.215     0.054

Run on Mac OSX, Core2 Duo @ 2.13 GHz
Numba: LLVM for Python
Python code can reach C
speed without having to
program in C itself
(and without losing interactivity!)
Numba is Powering Blaze
• Blaze is Continuum Analytics' on-going

effort to provide efficient computation for
out-of-core and distributed data.	


• Blaze is using a series of technologies for
this: DyND, Numba, BLZ	


• Numba is at the computational core of
Blaze
FlyPy Is Born

(Numba in 2014)

• Complete reimplementation of

Numba (i.e. different computing
engine)	


• More powerful type system than
Numba	


• High Performance Computing for
High Level Language (Python)
FlyPy Highlights

• SIMD support	

• Fused generators (restricted use)	

• Garbage collection 	

• Efficient user-defined abstraction	

• Shared-nothing architecture with

optional safe sharing
If a datastore requires all data to fit in
memory, it isn't big data

-- Alex Gaynor (on Twitter)

Optimal Containers for
High Performance
Computing
The Need for a Good
Data Container
• Too many times we are too focused on
computing as fast as possible	


• But we have seen how important data
access is	


• BLZ is a data container for in-memory or
on-disk data that can be compressed
The Need for
Compression
• Compression allows storing more data
using the same storage capacity


• Sure, it uses more CPU time to
compress/decompress data


• But does that actually mean using more
wall-clock time?
Blosc: (de)compressing
faster than memcpy()

Transmission + decompression faster than direct transfer?
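The reasoning is that if data shrinks by some ratio, fewer bytes have to cross the memory bus, and a fast enough decompressor can make up for the extra CPU work. The sketch below measures only the ratio and the round trip. It uses zlib from the standard library as a stand-in, since Blosc may not be installed, so it says nothing about Blosc's actual speed.

```python
import zlib
import numpy as np

# zlib (standard library) is used here only as a stand-in for Blosc;
# Blosc itself is not exercised by this sketch.
data = np.arange(1_000_000, dtype=np.int64)
raw = data.tobytes()

compressed = zlib.compress(raw, 1)   # level 1: favor speed over ratio
restored = np.frombuffer(zlib.decompress(compressed), dtype=np.int64)

# How many times fewer bytes need to be moved for this (very regular) input.
ratio = len(raw) / len(compressed)
```

Real datasets compress less predictably than this regular sequence, so the ratio here is only indicative.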
Example of How Blosc Accelerates Genomics I/O:
SeqPack (backed by Blosc)

Test Data Sets

#  Source        Identifier   Sequencer             Read Count  Read Length  ID Lengths  FASTQ Size
1  1000 Genomes  ERR000018    Illumina GA            9,280,498  36 bp        40–50       1,105 MB
2  1000 Genomes  SRR493233 1  Illumina HiSeq 2000   43,225,060  100 bp       51–61       10,916 MB
3  1000 Genomes  SRR497004 1  AB SOLiD 4           122,924,963  51 bp        78–91       22,990 MB

Fig. 1. In-memory throughputs for several compression schemes applied to increasing block sizes (where each
sequence is 256 bytes long).

Source: Howison, M. (in press). High-throughput compression of FASTQ data
with SeqDB. IEEE Transactions on Computational Biology and Bioinformatics.
Second Hint for Open
Questions	

• BLZ uses Blosc to handle
compressed data transparently


• That means not only better storage usage,
but frequently better access time to the
data
The btable object in BLZ
Chunks

New row to append

• Columns are contiguous in memory	

• Chunks follow column order	

• Very efficient for querying (especially with a

large number of columns)
And a Third Hint…
• Unlike NumPy or SQLite3, BLZ

stores columns contiguously, in memory or
on disk


• Again, BLZ is making use of the principles
of temporal and spatial locality to optimize
the access to memory


• Blaze can leverage BLZ for storage and
improved I/O capabilities
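The effect of column-wise storage on a query can be sketched with NumPy alone. The dict-of-arrays below is a hypothetical model of a columnar container, not BLZ's actual btable implementation, and the column names f0..f99 are assumed.

```python
import numpy as np

n_rows, n_cols = 10_000, 100
rng = np.random.default_rng(0)

# Row-wise layout (NumPy's default C order): one row's 100 values are adjacent.
row_major = rng.random((n_rows, n_cols))

# Column-wise layout: each column is its own contiguous array. This is a
# hypothetical dict-of-arrays model, not BLZ's actual btable implementation.
columns = {f"f{i}": np.ascontiguousarray(row_major[:, i]) for i in range(n_cols)}

# Distance in bytes between consecutive values of column f2 in each layout.
stride_row_major = row_major[:, 2].strides[0]   # 100 * 8 = 800 bytes
stride_columnar = columns["f2"].strides[0]      # 8 bytes

# The query only needs f2 and f8, i.e. 2 of the 100 columns.
mask = (columns["f2"] > .9) & (columns["f8"] > .3) & (columns["f8"] < .4)
```

In the row-wise layout every value of f2 is 800 bytes away from the next, so scanning one column drags the other 99 through the cache; in the columnar layout consecutive values are adjacent.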
Summary
• Python is a perfect match for Big Data	

• Nowadays you should be aware of the
memory system to get good
performance


• Choosing appropriate data containers is of
the utmost importance when dealing with
Big Data
References
• Blaze: http://blaze.pydata.org	

• BLZ: http://blz.pydata.org	

• Blosc: http://www.blosc.org	

• Numba: http://numba.pydata.org 	

• FlyPy: https://github.com/ContinuumIO/flypy
“Across the industry, today’s chips are largely able to execute code
faster than we can feed them with instructions and data. There are no
longer performance bottlenecks in the floating-point multiplier or in
having only a single integer unit. The real design action is in memory
subsystems— caches, buses, bandwidth, and latency.”	


“Over the coming decade, memory subsystem design will be the only
important design issue for microprocessors.”	


– Richard Sites, after his article “It’s The Memory, Stupid!”,
Microprocessor Report, 10(10),1996
Thank you!

Mais conteúdo relacionado

Mais procurados

data deduplication
data deduplicationdata deduplication
data deduplicationssuser1eca7d
 
Modern software design in Big data era
Modern software design in Big data eraModern software design in Big data era
Modern software design in Big data eraBill GU
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)Romain Jacotin
 
Backup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryBackup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryMongoDB
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術Ryousei Takano
 
Joblib Toward efficient computing : from laptop to cloud
Joblib Toward efficient computing : from laptop to cloudJoblib Toward efficient computing : from laptop to cloud
Joblib Toward efficient computing : from laptop to cloudPyDataParis
 
Apache tajo configuration
Apache tajo configurationApache tajo configuration
Apache tajo configurationJihoon Son
 
Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit
Slides for In-Datacenter Performance Analysis of a Tensor Processing UnitSlides for In-Datacenter Performance Analysis of a Tensor Processing Unit
Slides for In-Datacenter Performance Analysis of a Tensor Processing UnitCarlo C. del Mundo
 
Financial Portfolio Management with Java on Steroids - JAX Finance 2016
Financial Portfolio Management with Java on Steroids - JAX Finance 2016Financial Portfolio Management with Java on Steroids - JAX Finance 2016
Financial Portfolio Management with Java on Steroids - JAX Finance 2016aixigo AG
 
Webinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyWebinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyMongoDB
 
openTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed worldopenTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed worldOliver Hankeln
 

Mais procurados (20)

Data replication
Data replicationData replication
Data replication
 
data deduplication
data deduplicationdata deduplication
data deduplication
 
computer-memory
computer-memorycomputer-memory
computer-memory
 
Modern software design in Big data era
Modern software design in Big data eraModern software design in Big data era
Modern software design in Big data era
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
 
Backup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryBackup, Restore, and Disaster Recovery
Backup, Restore, and Disaster Recovery
 
クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術クラウド時代の半導体メモリー技術
クラウド時代の半導体メモリー技術
 
Kfs presentation
Kfs presentationKfs presentation
Kfs presentation
 
Joblib Toward efficient computing : from laptop to cloud
Joblib Toward efficient computing : from laptop to cloudJoblib Toward efficient computing : from laptop to cloud
Joblib Toward efficient computing : from laptop to cloud
 
Joblib PyDataParis2016
Joblib PyDataParis2016Joblib PyDataParis2016
Joblib PyDataParis2016
 
Apache tajo configuration
Apache tajo configurationApache tajo configuration
Apache tajo configuration
 
Mongodb backup
Mongodb backupMongodb backup
Mongodb backup
 
NSCC Training Introductory Class
NSCC Training  Introductory ClassNSCC Training  Introductory Class
NSCC Training Introductory Class
 
NSCC Training - Introductory Class
NSCC Training - Introductory ClassNSCC Training - Introductory Class
NSCC Training - Introductory Class
 
NSCC Training Introductory Class
NSCC Training Introductory Class NSCC Training Introductory Class
NSCC Training Introductory Class
 
Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit
Slides for In-Datacenter Performance Analysis of a Tensor Processing UnitSlides for In-Datacenter Performance Analysis of a Tensor Processing Unit
Slides for In-Datacenter Performance Analysis of a Tensor Processing Unit
 
Financial Portfolio Management with Java on Steroids - JAX Finance 2016
Financial Portfolio Management with Java on Steroids - JAX Finance 2016Financial Portfolio Management with Java on Steroids - JAX Finance 2016
Financial Portfolio Management with Java on Steroids - JAX Finance 2016
 
Webinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data SafetyWebinar: Understanding Storage for Performance and Data Safety
Webinar: Understanding Storage for Performance and Data Safety
 
openTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed worldopenTSDB - Metrics for a distributed world
openTSDB - Metrics for a distributed world
 
GFS & HDFS Introduction
GFS & HDFS IntroductionGFS & HDFS Introduction
GFS & HDFS Introduction
 

Semelhante a It's the memory, stupid! CodeJam 2014

2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetupGanesan Narayanasamy
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentSpeedment, Inc.
 
High Performance With Java
High Performance With JavaHigh Performance With Java
High Performance With Javamalduarte
 
Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"Fwdays
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable PythonTravis Oliphant
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationJonathan Katz
 
ANN-Lecture2-Python Startup.pptx
ANN-Lecture2-Python Startup.pptxANN-Lecture2-Python Startup.pptx
ANN-Lecture2-Python Startup.pptxShahzadAhmadJoiya3
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA Taiwan
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Cloudera, Inc.
 
Distributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedDistributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedWee Hyong Tok
 
Stories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresStories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresSpark Summit
 

Semelhante a It's the memory, stupid! CodeJam 2014 (20)

Large Data Analyze With PyTables
Large Data Analyze With PyTablesLarge Data Analyze With PyTables
Large Data Analyze With PyTables
 
PyTables
PyTablesPyTables
PyTables
 
Py tables
Py tablesPy tables
Py tables
 
PyTables
PyTablesPyTables
PyTables
 
2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup2018 03 25 system ml ai and openpower meetup
2018 03 25 system ml ai and openpower meetup
 
NYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ SpeedmentNYJavaSIG - Big Data Microservices w/ Speedment
NYJavaSIG - Big Data Microservices w/ Speedment
 
High Performance With Java
High Performance With JavaHigh Performance With Java
High Performance With Java
 
Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"Travis Oliphant "Python for Speed, Scale, and Science"
Travis Oliphant "Python for Speed, Scale, and Science"
 
PyCon Estonia 2019
PyCon Estonia 2019PyCon Estonia 2019
PyCon Estonia 2019
 
Fast and Scalable Python
Fast and Scalable PythonFast and Scalable Python
Fast and Scalable Python
 
Building a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management ApplicationBuilding a Complex, Real-Time Data Management Application
Building a Complex, Real-Time Data Management Application
 
PyData Boston 2013
PyData Boston 2013PyData Boston 2013
PyData Boston 2013
 
ANN-Lecture2-Python Startup.pptx
ANN-Lecture2-Python Startup.pptxANN-Lecture2-Python Startup.pptx
ANN-Lecture2-Python Startup.pptx
 
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflowNVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
NVIDIA 深度學習教育機構 (DLI): Image segmentation with tensorflow
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
 
Distributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedDistributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learned
 
Stories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi TorresStories About Spark, HPC and Barcelona by Jordi Torres
Stories About Spark, HPC and Barcelona by Jordi Torres
 
Nbvtalkatjntuvizianagaram
NbvtalkatjntuvizianagaramNbvtalkatjntuvizianagaram
Nbvtalkatjntuvizianagaram
 

Último

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Último (20)

Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 

It's the memory, stupid! CodeJam 2014

  • 1. It's The Memory, Stupid! Understanding the Memory Hierarchy for Better Data Management. Francesc Alted, Software Architect. CodeJam 2014, Jülich Supercomputing Centre, January 26, 2014
  • 2. About Continuum Analytics • Develop new ways on how data is stored, computed, and visualized. • Provide open technologies for data integration on a massive scale. • Provide software tools, training, and integration/consulting services to corporate, government, and educational clients worldwide.
  • 3. Main Software Projects • Anaconda: Easy deployment of Python packages • Bokeh: Interactive visualization and analysis for the web and in Python • Blaze: Out-of-core and distributed data computation & store (NG NumPy) • Wakari: Sharing Python environments in the web (Python for teams)
  • 4. Overview • The Era of ‘Big Data’ • A small exercise on query performance • The Starving CPU problem • Efficient computing: numexpr and Numba • Optimal Containers for Big Data
  • 5. The Dawn of ‘Big Data’ • “There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.” (Eric Schmidt, Google CEO, Techonomy Conference, August 4, 2010)
  • 6. Interactivity and Big Data • Interactivity is crucial for handling data • Interactivity and performance are crucial for handling Big Data
  • 7. Python and ‘Big Data’ • Python is an interpreted language and hence offers interactivity • Myth: “Python is slow, so why on earth are you going to use it for Big Data?” • Answer: Python has access to an incredibly powerful range of libraries that boost its performance far beyond your expectations • ...and during this talk I will prove it…
  • 8. NumPy: A De Facto Standard Container
  • 9.
  • 10. Operating with NumPy • array[2]; array[1,1:5, :]; array[[3,6,10]] • (array1**3 / array2) - sin(array3) • numpy.dot(array1, array2): access to optimized BLAS (*GEMM) functions • and much more...
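The three kinds of operation listed on this slide can be tried out with a short sketch. The array shapes and values below are assumptions chosen only to make the slide's expressions runnable; they are not taken from the talk.

```python
import numpy as np

# Hypothetical 3-D array, large enough for every index used on the slide.
array = np.arange(12 * 6 * 4).reshape(12, 6, 4)

first = array[2]               # basic indexing: a (6, 4) view
block = array[1, 1:5, :]       # slicing: a (4, 4) view, no data copied
picked = array[[3, 6, 10]]     # fancy indexing: a (3, 6, 4) copy

# Element-wise expression; each operator runs over the whole array.
array1 = np.linspace(1.0, 2.0, 5)
array2 = np.full(5, 2.0)
array3 = np.zeros(5)
expr = (array1**3 / array2) - np.sin(array3)

# Matrix product; for floating-point inputs NumPy hands this to BLAS.
m1 = np.eye(3)
m2 = np.ones((3, 2))
prod = np.dot(m1, m2)
```

Slices return views onto the same memory, while indexing with a list of integers copies the selected elements, which matters once arrays no longer fit in cache.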
  • 12. Exercise • We are starting to see datasets with a huge number of rows, and more importantly, columns • Let’s do an experiment querying such a large table by using several classic methods (and a new one :)
  • 13. The Table & The Query • Suppose that our table has about one million rows and 100 columns (yes, we are talking about big data) • The query is like: (f2>.9) and ((f8>.3) and (f8<.4)) • Selectivity is about 1%
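As a rough illustration of the NumPy variant of this query, the sketch below builds a smaller table and applies the same condition. The uniform random values, the seed, the reduced row count and the field names `f0`..`f99` are assumptions; the talk does not show how its table was generated.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed so runs are repeatable
nrows, ncols = 100_000, 100        # fewer rows than the slide's one million, to keep the demo quick

# One float64 field per column, named f0..f99 to match the slide's f2 and f8.
dtype = np.dtype([(f"f{i}", np.float64) for i in range(ncols)])
table = np.empty(nrows, dtype=dtype)
for name in dtype.names:
    table[name] = rng.random(nrows)   # uniform values in [0, 1)

# Python's `and` does not work element-wise on arrays, so use `&` with parentheses.
mask = (table["f2"] > 0.9) & ((table["f8"] > 0.3) & (table["f8"] < 0.4))
result = table[mask]
selectivity = mask.mean()
print(f"{len(result)} rows selected, selectivity = {selectivity:.2%}")
```

With independent uniform columns the condition keeps roughly 0.1 × 0.1 of the rows, which agrees with the selectivity of about 1% quoted on the slide.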
  • 14. Times for querying (seconds), run on Mac OSX, Core2 Duo @ 2.13 GHz
      Database (in-memory)   (create)   (query)
      NumPy                     23.4      1.97
      sqlite3                  454       28.9
      sqlite3 (index)          140        2.87
      BLZ                       22.0      0.081
  • 15. Open Questions • If all the query engines store their data in memory and are written in C, why do they differ so much in performance? • Why is SQLite3 performance so poor, even when using indexing? • Why can BLZ (essentially a data container on steroids) be up to 25x faster than NumPy for querying?
  • 16. The Starving CPU Problem • “Across the industry, today’s chips are largely able to execute code faster than we can feed them with instructions and data.” (Richard Sites, from his article “It’s The Memory, Stupid!”, Microprocessor Report, 10(10), 1996)
  • 17. Memory Access Time vs CPU Cycle Time
  • 19. The Status of CPU Starvation in 2014 • Memory latency lags far behind processor speed (memory is between 250x and 1000x slower). • Memory bandwidth is improving at a better rate than memory latency, but it also lags behind processors (between 30x and 100x slower).
  • 20. CPU Caches to the Rescue • CPU cache latency and throughput are much better than memory • However: the faster they run the smaller they must be (because of heat dissipation problems)
  • 21. CPU Cache Evolution • [Figure: three panels labelled “Up to end 80’s”, “90’s and 2000’s” and “2010’s”, with axes “Speed” and “Capacity”.] • Figure 1. Evolution of the hierarchical memory model. (a) The primordial (and simplest) model; (b) the most common current implementation, which includes additional cache levels; and (c) a sensible guess at what’s coming over the next decade: three levels of cache in the CPU and solid state disks lying between main memory and classical mechanical disks.
  • 22. numexpr • numexpr is a computational engine for NumPy that makes sensible use of the memory hierarchy for better performance • Other libraries normally use numexpr when it is time to accelerate large computations • PyTables, pandas and BLZ can all leverage numexpr
  • 25. First Hint for Open Questions • BLZ uses numexpr behind the scenes to do the computations needed for the query • It creates fewer in-memory temporaries than NumPy, so the CPU spends less time waiting for memory and can do more useful work
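One way to picture why fewer temporaries help: NumPy evaluates a compound expression one operator at a time and allocates a full-size intermediate array for each step, whereas a blocked evaluator works on cache-sized pieces. The sketch below imitates that blocking idea in plain NumPy. It is not numexpr's implementation, and the helper name `blocked_eval` and the block size are hypothetical.

```python
import numpy as np

def blocked_eval(a, b, c, block=4096):
    """Compute a**3 / b - sin(c) block by block (hypothetical helper)."""
    out = np.empty_like(a)
    for start in range(0, a.size, block):
        s = slice(start, start + block)
        # Temporaries created here are at most `block` elements long,
        # so they can stay in the CPU cache instead of main memory.
        out[s] = a[s]**3 / b[s] - np.sin(c[s])
    return out

n = 1_000_000
a = np.linspace(1.0, 2.0, n)
b = np.linspace(2.0, 3.0, n)
c = np.linspace(0.0, 1.0, n)

full = a**3 / b - np.sin(c)       # plain NumPy: full-size temporaries
blocked = blocked_eval(a, b, c)   # same values, small temporaries
```

With numexpr installed, the corresponding call would be `numexpr.evaluate("a**3 / b - sin(c)")`, which performs this kind of blocking in compiled code.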
  • 27. Numexpr Limitations • Numexpr only implements element-wise operations, i.e. ‘a*b’ is evaluated as:
      for i in range(N):
          c[i] = a[i] * b[i]
    • In particular, it cannot deal with things like:
      for i in range(N):
          c[i] = a[i-1] + a[i] * b[i]
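The difference between the two loops can be made concrete with small arrays. This is a minimal sketch in plain Python and NumPy; the array contents are arbitrary, and `np.roll` is used here only to check the second loop, not as something the talk proposes.

```python
import numpy as np

N = 8
a = np.arange(N, dtype=np.float64)
b = np.full(N, 2.0)

# Element-wise: output i depends only on the inputs at index i.
c_elem = np.empty(N)
for i in range(N):
    c_elem[i] = a[i] * b[i]

# Neighbour access: output i also reads a[i-1].
# For i == 0, Python's negative indexing wraps around to the last element.
c_shift = np.empty(N)
for i in range(N):
    c_shift[i] = a[i - 1] + a[i] * b[i]

# The same shifted read, written with np.roll for comparison.
c_shift_np = np.roll(a, 1) + a * b
```

The second loop mixes values from different positions of the same array, which is the pattern the slide says falls outside numexpr's element-wise model.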
  • 28. Numba: Overcoming numexpr Limitations • Numba is a JIT that can translate a subset of the Python language into machine code • It uses LLVM infrastructure behind the scenes • Can achieve similar or better performance than numexpr, but with more flexibility
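  • 29. How Numba Works • [Diagram: a Python function is translated through LLVM-PY and LLVM 3.1 into machine code; the diagram also lists ISPC, OpenCL, OpenMP, CUDA and CLANG, and the vendors Intel, AMD, Nvidia and Apple.]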
  • 30. Numba Example: Computing a Polynomial
      import numpy as np
      import numba as nb

      N = 10*1000*1000

      x = np.linspace(-1, 1, N)
      y = np.empty(N, dtype=np.float64)

      # The original slide used @nb.autojit, which is no longer available
      # in current Numba releases; @nb.njit is used here instead.
      @nb.njit
      def poly(x, y):
          for i in range(N):
              # y[i] = 0.25*x[i]**3 + 0.75*x[i]**2 + 1.5*x[i] - 2
              y[i] = ((0.25*x[i] + 0.75)*x[i] + 1.5)*x[i] - 2

      poly(x, y)  # run through Numba!
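The loop body on this slide carries two forms of the same cubic: the commented-out line with explicit powers and the active line in nested (Horner) form. The sketch below checks in plain NumPy that they agree; the sample size is an arbitrary assumption, and reading them as versions (I) and (II) of the timing table is a guess based on the slide order.

```python
import numpy as np

x = np.linspace(-1, 1, 1001)

# Explicit powers, as in the commented-out line.
naive = 0.25 * x**3 + 0.75 * x**2 + 1.5 * x - 2

# Horner's rule, as in the active line: three multiplications, no powers.
horner = ((0.25 * x + 0.75) * x + 1.5) * x - 2

max_diff = np.max(np.abs(naive - horner))
print(f"largest difference between the two forms: {max_diff:.2e}")
```

Horner's rule replaces each power with a multiply and an add, so every element needs less arithmetic and, in NumPy, fewer full-size temporary arrays.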
  • 31. Times for Computing (sec.), run on Mac OSX, Core2 Duo @ 2.13 GHz
      Poly version       (I)      (II)
      Numpy             1.086    0.505
      numexpr           0.108    0.096
      Numba             0.055    0.054
      Pure C, OpenMP    0.215    0.054
  • 32. Numba: LLVM for Python Python code can reach C speed without having to program in C itself (and without losing interactivity!)
  • 33. Numba is Powering Blaze • Blaze is Continuum Analytics' on-going effort to provide efficient computation for out-of-core and distributed data. • Blaze is using a series of technologies for this: DyND, Numba, BLZ • Numba is at the computational core of Blaze
  • 34. FlyPy Is Born (Numba in 2014) • Complete reimplementation of Numba (i.e. different computing engine) • More powerful type system than Numba • High Performance Computing for a High-Level Language (Python)
  • 35. FlyPy Highlights • SIMD support • Fused generators (restricted use) • Garbage collection • Efficient user-defined abstraction • Shared nothing architecture with optional safe sharing
  • 36. Optimal Containers for High Performance Computing • “If a datastore requires all data to fit in memory, it isn't big data” (Alex Gaynor, on Twitter)
  • 37. The Need for a Good Data Container • Too often we focus only on computing as fast as possible • But we have seen how important data access is • BLZ is a data container for in-memory or on-disk data that can be compressed
  • 38. The Need for Compression • Compression lets us store more data in the same storage capacity • Sure, it uses more CPU time to compress/decompress data • But does that actually mean using more wall clock time?
  • 39. Blosc: (de)compressing faster than memcpy() • Transmission + decompression faster than direct transfer?
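The trade-off behind this question can be explored with a round trip through a general-purpose compressor. The sketch below uses `zlib` from the Python standard library purely as a stand-in, since it needs no extra installation; it is not Blosc, it makes no claim about beating memcpy(), and the test data is an assumption.

```python
import zlib
import numpy as np

# Regular integer data compresses well; real datasets will vary.
data = np.arange(1_000_000, dtype=np.int64)
raw = data.tobytes()

packed = zlib.compress(raw, 1)    # level 1: favour speed over ratio
restored = np.frombuffer(zlib.decompress(packed), dtype=np.int64)

ratio = len(raw) / len(packed)
print(f"{len(raw)} -> {len(packed)} bytes (ratio {ratio:.1f}x)")
```

If fewer bytes have to travel from memory or disk to the CPU, the time saved on transfer can offset the time spent decompressing; whether it does depends on the compressor's speed, which is the point Blosc is designed around.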
  • 40. Example of How Blosc Accelerates Genomics I/O: SeqPack (backed by Blosc) • Test Data Sets:
      #  Source        Identifier    Sequencer             Read Count    Read Length  ID Lengths  FASTQ Size
      1  1000 Genomes  ERR000018     Illumina GA             9,280,498   36 bp        40–50        1,105 MB
      2  1000 Genomes  SRR493233 1   Illumina HiSeq 2000    43,225,060   100 bp       51–61       10,916 MB
      3  1000 Genomes  SRR497004 1   AB SOLiD 4            122,924,963   51 bp        78–91       22,990 MB
    • Fig. 1. In-memory throughputs for several compression schemes applied to increasing block sizes (where each sequence is 256 bytes long).
    • Source: Howison, M. (in press). High-throughput compression of FASTQ data with SeqDB. IEEE Transactions on Computational Biology and Bioinformatics.
  • 41. Second Hint for Open Questions • BLZ uses Blosc to deal with compressed data transparently • That means not only better storage usage, but frequently better access time to the data
  • 42. The btable object in BLZ • [Diagram labels: Chunks; New row to append] • Columns are contiguous in memory • Chunks follow column order • Very efficient for querying (especially with a large number of columns)
  • 43. And a Third Hint… • BLZ, unlike NumPy or SQLite3, stores columns contiguously in memory or on disk • Again, BLZ makes use of the principles of temporal and spatial locality to optimize access to memory • Blaze can leverage BLZ for storage and improved I/O capabilities
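The effect of column-contiguous storage shows up directly in array strides. The sketch below compares a row-wise NumPy structured array with a column-wise layout; the plain dictionary of arrays is only a stand-in for a columnar container and is not BLZ's actual data structure.

```python
import numpy as np

nrows, ncols = 1000, 100

# Row-wise layout: a structured array keeps all 100 fields of a row together.
row_dtype = np.dtype([(f"f{i}", np.float64) for i in range(ncols)])
row_table = np.zeros(nrows, dtype=row_dtype)

# Column-wise layout: one contiguous array per column
# (a dict stands in for the container here).
col_table = {f"f{i}": np.zeros(nrows) for i in range(ncols)}

# Bytes between two consecutive values of column f2 in each layout.
row_stride = row_table["f2"].strides[0]
col_stride = col_table["f2"].strides[0]
print(row_stride, col_stride)   # prints: 800 8
```

Scanning `f2` in the row-wise table touches one 8-byte value out of every 800 bytes, so most of each cache line fetched is wasted; in the column-wise layout every byte fetched belongs to the column being queried.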
  • 44. Summary • Python is a perfect match for Big Data • Nowadays you need to be aware of the memory system to get good performance • Choosing appropriate data containers is of the utmost importance when dealing with Big Data
  • 45. References • Blaze: http://blaze.pydata.org • BLZ: http://blz.pydata.org • Blosc: http://www.blosc.org • Numba: http://numba.pydata.org • FlyPy: https://github.com/ContinuumIO/flypy
  • 46. “Across the industry, today’s chips are largely able to execute code faster than we can feed them with instructions and data. There are no longer performance bottlenecks in the floating-point multiplier or in having only a single integer unit. The real design action is in memory subsystems: caches, buses, bandwidth, and latency.” • “Over the coming decade, memory subsystem design will be the only important design issue for microprocessors.” (Richard Sites, from his article “It’s The Memory, Stupid!”, Microprocessor Report, 10(10), 1996)