SlideShare uma empresa Scribd logo
1 de 42
Baixar para ler offline
High Performance & High Throughput
Computing
Giuseppe Fiameni – g.Fiameni@cineca.it
http://hackaday.com/2016/07/13/nirvana-like-youve-never-heard-them-before/
Agenda
• Computational Problem
• System architecture
• Serial vs Parallel
• Levels of parallelism
• Models for parallel programming
• HPC and HTC examples
EUDAT Summer School 2017 - High Performance/Throughput Computing 2
What does it make a problem be
relevant from the computational
point of view?
Computational Dimension:
number of operations needed to solve the problem,
in general is a function of the size of the involved
data structures (n, n2 , n3 , n log n, etc.)
flop - Floating point operations
indicates an arithmetic floating point operation.
flop/s - Floating points operations per second
is a unit to measure the speed of a computer.
Computational problems today: 1015 – 1022 flop
One year has about 3 x 107 seconds!
Most powerful computers today have reach a sustained
performance is of the order of Tflop/s - Pflop/s (1012 - 1015
flop/s).
Size of computational applications
EUDAT Summer School 2017 - High Performance/Throughput Computing 4
Example: Weather Prediction
Forecasts on a global scale:
 3D Grid to represent the Earth
 Earth's circumference:  40000 km
 radius:  6370 km
 Earth's surface:  4πr2  5•108 km2
 6 variables:
 temperature
 pressure
 humidity
 wind speed in the 3 Cartesian directions
 cells of 1 km on each side
 100 slices to see how the variables evolve on the
different levels of the atmosphere
 a 30 seconds time step is required for the
simulation with such resolution
 Each cell requires about 1000 operations per
time step (Navier-Stokes turbulence and various
phenomena)
In physics, the Navier–Stokes
equations /nævˈjeɪ stoʊks/,
named after Claude-Louis
Navier and George Gabriel
Stokes, describe the motion
of viscous fluid substances.
EUDAT Summer School 2017 - High Performance/Throughput Computing 5
Grid: 5 x 108 x 100 = 5 x 1010 cells
 each cell is represented with 8 Byte
 Memory space:
 (6 var) x (8 Byte) x (5 x 1010 cells)  2 x 1012 Byte = 2TB
A 24 hours forecast needs:
• 24 x 60 x 2  3 x 103 time-step
• (5 x 1010 cells) x (103 oper.) x (3 x 103 time-steps) = 1.5 x 1017
operations !
A computer with a power of 1Tflop/s will take 1.5 x 105 sec.
Example: Weather Prediction (cont.)
24 hours forecast will need 2 days to run ... but we shall obtain a very
accurate forecast!
EUDAT Summer School 2017 - High Performance/Throughput Computing 6
John Von Neumann
EUDAT Summer School 2017 - High Performance/Throughput Computing 7
https://en.wikipedia.org/wiki/Von_Neumann_architecture
What if you want increase
performance?
• Increase the clock rate
– number of clock cycles per second (peacemaker)
• Increase the instruction rate
– number of basic instructions that can be executed
per clock cycle
– Pipeling, vectorization
• Reduce access time to data
– Having frequently used data on the processor
caches significantly improve computational
performance
EUDAT Summer School 2017 - High Performance/Throughput Computing 8
The clock cycle  is defined as the
time between two adjacent pulses
of oscillator that sets the time of
the processor.
The number of these pulses per
second is known as clock speed or
clock frequency, generally
measured in GHz (gigahertz, or
billions of pulses per second).
The clock cycle controls the
synchronization of operations in a
computer: All the operations
inside the processor last a multiple
of .
Processor  (ns) freq (MHz)
CDC 6600 100 10
Cyber 76 27.5 36,3
IBM ES 9000 9 111
Cray Y-MP C90 4.1 244
Intel i860 20 50
PC Pentium < 0.5 > 2 GHz
Power PC 1.17 850
IBM Power 5 0.52 1.9 GHz
IBM Power 6 0.21 4.7 GHz
Increasing the clock frequency:
The speed of light sets an upper limit to the speed
with which electronic components can operate .
Propagation velocity of a signal in a vacuum:
300.000 Km/s = 30 cm/ns
Heat dissipation problems inside the processor.
Also Quantum tunneling expected to become
important.
Speed of Processors: Clock Cycle and Frequency
EUDAT Summer School 2017 - High Performance/Throughput Computing 9
Memory access time: the time required
by the processor to access data or to write
data from / to memory
The hierarchy exists because :
 fast memory is expensive and small
 slow memory is cheap and big
Latency
 how long do I have to wait for the
data?
 (cannot do anything while waiting)
Throughput
 how many bytes/second. but not
important if waiting. Total time = latency + (amount of data
/ throughput)
Time to run code = clock cycles running code + clock cycles waiting for
memory
Memory hierarchies
EUDAT Summer School 2017 - High Performance/Throughput Computing 10
Loop unrolling
for (i=0; i<N; i++)
s += a[i]*b[i]
EUDAT Summer School 2017 - High Performance/Throughput Computing 11
for (i=0; i<N; i += 2)
s += a[i]*b[i] + a[i+1]*b[i+1]
Why Have Cache?
EUDAT Summer School 2017 - High Performance/Throughput Computing 12
CPU 351 GB/sec
10.66 GB/sec (3%)
253 GB/sec (72%)
Parallelism
 Serial Process:
 A process in which its sub-processes happen
sequentially in time.
 Speed depends only on the rate at which each
sub-process will occur (e.g. processing unit clock
speed).
 Parallel Process:
 Process in which multiple sub-processes can be
active simultaneously.
 Speed depends on execution rate of each sub-
process AND how many sub-processes can be
made to occur simultaneously.
EUDAT Summer School 2017 - High Performance/Throughput Computing 13
Parallel for Capacity
• Example: Searching database of web-sites for some
text (i.e. Google)
• Searching sequentially through large volumes of text
too time-consuming
– Multiple servers hold different pages
– Each server can report the result of each
individual search
– The more servers you add the quicker the search
is
EUDAT Summer School 2017 - High Performance/Throughput Computing 14
Parallel for Capability
• Example: Large scale weather simulation
• Detailed description for atmosphere too large to run
on today's desktop
– Multiple servers are needed to hold all grid data in
memory
– Servers need to quickly communicate to
synchronise work over entire grid
– Communication between servers can become a
bottleneck. Latency really matters!
EUDAT Summer School 2017 - High Performance/Throughput Computing 15
Capability and capacity
• Parallel for Capability => High Performance Computing
– deals with one large problem
– problems that are hard to parallelise, not
embarrassing parallel
– single processor performance is important
• Parallel for Capacity => High Throughput Computing
– It is appropriate when the problem can be
decomposed into many (very many) smaller problems
that are essentially independent
– problems that are easy to parallelise, parameter
swapping
– can be adapted to very large numbers of processors
EUDAT Summer School 2017 - High Performance/Throughput Computing 16
Systems main architecture
• Single processor, single core
EUDAT Summer School 2017 - High Performance/Throughput Computing 17
• Dual processor, single core
• Single processor, dual core
Multi core architecture
• Modern architecture can go up to 34 cores
per processor and to 2/4 processors per
node
EUDAT Summer School 2017 - High Performance/Throughput Computing 18
Flynn Taxonomy
• A computer architecture is categorized by the
multiplicity of hardware used to manipulate streams of
instructions (sequence of instructions executed by the
computer) and streams of data (sequence of data used
to execute a stream of instructions).
EUDAT Summer School 2017 - High Performance/Throughput Computing 20
CU Control Unit
PU Processing Unit
MM Memory Module
DS Data stream
IS Instruction Stream
DS2
DS1
MM2
MM1
MMn
.
.
.
CU
IS
PU2
PU1
PUn
DSn
.
.
.
SIMD Systems
Synchronous parallelism
 SIMD systems presents a
single control unit
 A single instruction
operates simultaneously
on multiple data.
 Array processor and
vector systems fall in this
class
EUDAT Summer School 2017 - High Performance/Throughput Computing 21
22
.
.
.
.
.
.
.
.
.
IS2
CU2
IS2
PU2 MM2
DS2
PU1 MM1
DS1
PUn MMn
DSn
CU1
CUn
ISn
IS1
IS1
ISn
CU Control Unit
PU Processing Unit
MM Memory Module
DS Data stream
IS Instruction Stream
MIMD Systems
Asynchronous parallelism
 Multiple processors
execute different
instructions operating on
different data.
 Represents the
multiprocessor version
of the SIMD class.
 Wide class ranging from
multi-core systems to
large MPP systems.
EUDAT Summer School 2017 - High Performance/Throughput Computing 22
2
3
Moore's Law
• Empirical law which states that the complexity of
devices (number of transistors per square inch in
microprocessors) doubles every 18 months..Gordon
Moore, INTEL co-founder, 1965
• It is estimated that Moore's Law still applies in the near
future but applied to the number of cores per processor
EUDAT Summer School 2017 - High Performance/Throughput Computing 23
The limits of parallelism - Amdahl’s Law
If we have N processors:
taking s as the fraction of the
time spent in the sequential part
of the program (s + p = 1)
les.robertson@cern.ch
s – time spent in a serial
processor on serial parts
of the code
p – time spent in a serial
processor on parts that
could be executed in
parallel
Amdahl, G.M., Validity of single-processor approach to achieving large scale computing capability
Proc. AFIPS Conf., Reston, VA, 1967, pp. 483-485
Speedup =
𝑠+𝑝
𝑠+𝑝/𝑁
Speedup =
1
𝑠+(1−𝑠) /𝑁
→ 1/𝑠
EUDAT Summer School 2017 - High Performance/Throughput Computing
Amdahl’s Law - maximum speedup
max speedup
0
20
40
60
80
100
120
1% 2% 3% 4% 5% 6% 7% 8% 9% 10%
sequential part as percentage of total time
speedup
EUDAT Summer School 2017 - High Performance/Throughput Computing les.robertson@cern.ch
The importance of the network
• Latency – the round-trip time (RTT) to
communicate between two processors
Communications overhead:
• For fine grained parallel programs the
problem is latency, not bandwidth
t
… …
… …
… …
sequential
communications
overhead
parallelisable
Speedup =
𝑠+𝑝
𝑠+𝑐+ 𝑝/𝑁
c = latency + data_transfer_time
EUDAT Summer School 2017 - High Performance/Throughput Computing les.robertson@cern.ch
Aspects of parallelism
 It has been recognized for a long time that
constant performance improvements cannot be
obtained just by increasing factors such as
processor clock speed – parallelism is needed.
 Parallelism can be present at many levels:
 Functional parallelism within the CPU
 Pipelining and vectorization
 Multi-processor and multi-core
 Accelerators
 Parallel I/O
EUDAT Summer School 2017 - High Performance/Throughput Computing 28
Pipelining
 It is a technique where more
instructions, belonging to a stream of
sequential execution, overlap their
execution
 This technique improves the
performance of the single processor
 The concept of pipelining is similar to
that of assembly line in a factory where
in a flow line (pipe) of assembly stations
the elements are assembled in a
continuous flow.
 All the assembly stations must operate
at the same processing speed, otherwise
the station slower becomes the
bottleneck of the entire pipe.
EUDAT Summer School 2017 - High Performance/Throughput Computing 29
30
The stages are
1 Fetch
2 D1 (main decode)
3 D2 (secondary decode, also called translate)
4 EX (execute)
5 WB (write back to registers and memory)
http://www.gamedev.net/page/resources/_/technical/general-programming/a-
journey-through-the-cpu-pipeline-r3115
Instructions Pipelining
EUDAT Summer School 2017 - High Performance/Throughput Computing 30
CPU Vector units
 Vectorization
performed by
dedicated hardware on
chip
 Compiler generates
vector instructions,
when it can, from
programmer’s code
 Important optimization
which can lead to 4x,
8x speedups according
“size” of vector unit
(e.g. 256 bit)
EUDAT Summer School 2017 - High Performance/Throughput Computing 32
CPU vs GPU
• A CPU is designed to handle complex tasks while GPUs only do
one thing well - handle billions of repetitive low level tasks -
originally the rendering of triangles in 3D graphics
• Originally, this was called GPGPU (General Purpose GPU
programming), and it required mapping scientific code to the
matrix operations for manipulating triangles.
• This was insanely difficult to do and took a lot of dedication.
However, with the advent of CUDA and OpenCL, high-level
languages targeting the GPU, GPU programming is rapidly
becoming mainstream in the scientific community.
EUDAT Summer School 2017 - High Performance/Throughput Computing 33
HPC vs HTC
High Performance
• Granularity largely defined
by the algorithm, limitations
in the hardware
• Load balancing difficult
• Reliability is all important
– if one part fails the
calculation stops (maybe
even aborts!)
– check-pointing essential
– hard to dynamically
reconfigure for smaller
number
of processors
High Throughput
• Granularity can be selected to
fit the environment
• Load balancing easy
• Mixing workloads is easy
• Sustained throughput is the
key
goal
– the order in which the
individual tasks execute is
(usually) not important
– if some equipment goes down
the work can be re-run later
– easy to re-schedule
dynamically the workload to
different configurations
EUDAT Summer School 2017 - High Performance/Throughput Computing 34
Granularity
• fine-grained parallelism – the independent bits are
small, need to exchange information, synchronize
often
– large number of more expensive processors
expensively interconnected
• coarse-grained – the problem can be decomposed
into large chunks that can be processed
independently
– large number of inexpensive processors,
inexpensively interconnected
EUDAT Summer School 2017 - High Performance/Throughput Computing
(1) Map Only
(4) Point to Point or
Map-Communication
(3) Iterative Map Reduce
or Map-Collective
(2) Classic
MapReduce
Input
map
reduce
Input
map
reduce
Iterations
Input
Output
map
Local
Graph
BLAST Analysis
Local Machine
Learning
Pleasingly Parallel
High Energy
Physics (HEP)
Histograms
Distributed search
Recommender
Engines
Expectation
maximization
Clustering e.g. K-
means
Linear Algebra,
PageRank
Classic MPI
PDE Solvers and
Particle Dynamics
Graph Problems
MapReduce and Iterative Extensions (Spark, Twister) MPI, Giraph
Integrated Systems such as Hadoop + Harp with
Compute and Communication model separated
Courtesy of Prof. Geoffrey Charles Fox – Indiana University
EUDAT Summer School 2017 - High Performance/Throughput Computing 36
A TYPICAL HPC ALGORITHM
37
Matrix multiplication
• Basic matrix multiplication on a 2-D grid
• Matrix multiplication is an important application in
HPC and appears in many areas (linear algebra)
• C = A * B where A, B, and C are matrices (two-
dimensional arrays)
• A restricted case is when B has only one column,
matrix-vector product, which appears in
representation of linear equations and partial
differential equations
EUDAT Summer School 2017 - High Performance/Throughput Computing 38
C = A x B
EUDAT Summer School 2017 - High Performance/Throughput Computing 39
A TYPICAL HTC ALGORITHM
40
Monte Carlo Methods
• Monte Carlo simulation is a broad term for methods that use
random numbers and statistical sampling to
solve problems, rather than exact modeling.
• Due to the sampling, the result will have some uncertainty,
but the statistical ‘law of large numbers’ will ensure that the
uncertainty goes down as the number of samples grows.
• The statistical law that underlies this is as follows:
– if N independent observations are made of a quantity with
standard deviation σ, then the standard deviation of the
mean is σ=pN. This means that more observations will lead
to more accuracy
• What makes Monte Carlo methods interesting is that the gain
in accuracy is not related to dimensionality of the original
problem.
EUDAT Summer School 2017 - High Performance/Throughput Computing 41
Agia Pelagia gulf area
EUDAT Summer School 2017 - High Performance/Throughput Computing 42
0 1
1
Hyper-Threading Technology
• Hyper-Threading Technology brings the simultaneous multi-
threading approach to the Intel architecture.
• Hyper-Threading Technology makes a single physical processor
appear as two or more logical processors
• Hyper-Threading Technology provides thread-level-parallelism (TLP)
on each processor resulting in increased utilization of processor and
execution resources
• Each logical processor maintain one copy of the architecture state
• While HT improves some performance benchmarks by up to
30%, its benefits are strongly application dependent. For HPC
codes, the consequences of HT also depend on inter-node
communication topology and load balance.
EUDAT Summer School 2017 - High Performance/Throughput Computing 43
EUDAT Summer School 2017 - High Performance/Throughput Computing 44
Reference
EUDAT Summer School 2017 - High Performance/Throughput Computing 45

Mais conteúdo relacionado

Mais procurados

Chap 1 introduction to cloud computing
Chap 1 introduction to cloud computingChap 1 introduction to cloud computing
Chap 1 introduction to cloud computingRaj Sarode
 
VTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computingVTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computingSachin Gowda
 
Cloud Application architecture styles
Cloud Application architecture styles Cloud Application architecture styles
Cloud Application architecture styles Nilay Shrivastava
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed SystemSunita Sahu
 
Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systemssumitjain2013
 
blackboard architecture
blackboard architectureblackboard architecture
blackboard architectureNguyễn Ngân
 
Market oriented Cloud Computing
Market oriented Cloud ComputingMarket oriented Cloud Computing
Market oriented Cloud ComputingJithin Parakka
 
Architectural styles and patterns
Architectural styles and patternsArchitectural styles and patterns
Architectural styles and patternsHimanshu
 
Levels of Virtualization.docx
Levels of Virtualization.docxLevels of Virtualization.docx
Levels of Virtualization.docxkumari36
 
Vision of cloud computing
Vision of cloud computingVision of cloud computing
Vision of cloud computinggaurav jain
 
clock synchronization in Distributed System
clock synchronization in Distributed System clock synchronization in Distributed System
clock synchronization in Distributed System Harshita Ved
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computinghuda2018
 
Evolution of Cloud Computing
Evolution of Cloud ComputingEvolution of Cloud Computing
Evolution of Cloud ComputingNephoScale
 
Design Goals of Distributed System
Design Goals of Distributed SystemDesign Goals of Distributed System
Design Goals of Distributed SystemAshish KC
 
01 - Introduction to Distributed Systems
01 - Introduction to Distributed Systems01 - Introduction to Distributed Systems
01 - Introduction to Distributed SystemsDilum Bandara
 

Mais procurados (20)

Chap 1 introduction to cloud computing
Chap 1 introduction to cloud computingChap 1 introduction to cloud computing
Chap 1 introduction to cloud computing
 
Unit 4
Unit 4Unit 4
Unit 4
 
VTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computingVTU 6th Sem Elective CSE - Module 3 cloud computing
VTU 6th Sem Elective CSE - Module 3 cloud computing
 
Cloud Application architecture styles
Cloud Application architecture styles Cloud Application architecture styles
Cloud Application architecture styles
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed System
 
Fault tolerance in distributed systems
Fault tolerance in distributed systemsFault tolerance in distributed systems
Fault tolerance in distributed systems
 
blackboard architecture
blackboard architectureblackboard architecture
blackboard architecture
 
Market oriented Cloud Computing
Market oriented Cloud ComputingMarket oriented Cloud Computing
Market oriented Cloud Computing
 
Cloud Computing Architecture
Cloud Computing ArchitectureCloud Computing Architecture
Cloud Computing Architecture
 
Architectural styles and patterns
Architectural styles and patternsArchitectural styles and patterns
Architectural styles and patterns
 
Levels of Virtualization.docx
Levels of Virtualization.docxLevels of Virtualization.docx
Levels of Virtualization.docx
 
Vision of cloud computing
Vision of cloud computingVision of cloud computing
Vision of cloud computing
 
clock synchronization in Distributed System
clock synchronization in Distributed System clock synchronization in Distributed System
clock synchronization in Distributed System
 
Data-Intensive Technologies for Cloud Computing
Data-Intensive Technologies for CloudComputingData-Intensive Technologies for CloudComputing
Data-Intensive Technologies for Cloud Computing
 
Evolution of Cloud Computing
Evolution of Cloud ComputingEvolution of Cloud Computing
Evolution of Cloud Computing
 
Lec1,2
Lec1,2Lec1,2
Lec1,2
 
Unit v
Unit vUnit v
Unit v
 
Design Goals of Distributed System
Design Goals of Distributed SystemDesign Goals of Distributed System
Design Goals of Distributed System
 
01 - Introduction to Distributed Systems
01 - Introduction to Distributed Systems01 - Introduction to Distributed Systems
01 - Introduction to Distributed Systems
 
Distributed System ppt
Distributed System pptDistributed System ppt
Distributed System ppt
 

Semelhante a High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe Fiameni, CINECA)

Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
Embedded Intro India05
Embedded Intro India05Embedded Intro India05
Embedded Intro India05Rajesh Gupta
 
Lecture 3
Lecture 3Lecture 3
Lecture 3Mr SMAK
 
Parallel Computing - Lec 6
Parallel Computing - Lec 6Parallel Computing - Lec 6
Parallel Computing - Lec 6Shah Zaib
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloadsinside-BigData.com
 
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale FrontierMultiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontierinside-BigData.com
 
Introduction to parallel_computing
Introduction to parallel_computingIntroduction to parallel_computing
Introduction to parallel_computingMehul Patel
 
Computer architecture short note (version 8)
Computer architecture short note (version 8)Computer architecture short note (version 8)
Computer architecture short note (version 8)Nimmi Weeraddana
 
Tutotial 2 answer
Tutotial 2 answerTutotial 2 answer
Tutotial 2 answerUdaya Kumar
 
Forecasting database performance
Forecasting database performanceForecasting database performance
Forecasting database performanceShenglin Du
 
intro, definitions, basic laws+.pptx
intro, definitions, basic laws+.pptxintro, definitions, basic laws+.pptx
intro, definitions, basic laws+.pptxssuser413a98
 
Unit i-introduction
Unit i-introductionUnit i-introduction
Unit i-introductionakruthi k
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesAmazon Web Services
 
Fundamentals.pptx
Fundamentals.pptxFundamentals.pptx
Fundamentals.pptxdhivyak49
 
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisParallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisShah Zaib
 

Semelhante a High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe Fiameni, CINECA) (20)

Nbvtalkatjntuvizianagaram
NbvtalkatjntuvizianagaramNbvtalkatjntuvizianagaram
Nbvtalkatjntuvizianagaram
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Lecture1
Lecture1Lecture1
Lecture1
 
Embedded Intro India05
Embedded Intro India05Embedded Intro India05
Embedded Intro India05
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
Parallel Computing - Lec 6
Parallel Computing - Lec 6Parallel Computing - Lec 6
Parallel Computing - Lec 6
 
Distributed Computing
Distributed ComputingDistributed Computing
Distributed Computing
 
04 performance
04 performance04 performance
04 performance
 
Large-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC WorkloadsLarge-Scale Optimization Strategies for Typical HPC Workloads
Large-Scale Optimization Strategies for Typical HPC Workloads
 
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale FrontierMultiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
Multiscale Dataflow Computing: Competitive Advantage at the Exascale Frontier
 
Introduction to parallel_computing
Introduction to parallel_computingIntroduction to parallel_computing
Introduction to parallel_computing
 
Computer architecture short note (version 8)
Computer architecture short note (version 8)Computer architecture short note (version 8)
Computer architecture short note (version 8)
 
Tutotial 2 answer
Tutotial 2 answerTutotial 2 answer
Tutotial 2 answer
 
Forecasting database performance
Forecasting database performanceForecasting database performance
Forecasting database performance
 
intro, definitions, basic laws+.pptx
intro, definitions, basic laws+.pptxintro, definitions, basic laws+.pptx
intro, definitions, basic laws+.pptx
 
Unit i-introduction
Unit i-introductionUnit i-introduction
Unit i-introduction
 
Deep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instancesDeep Dive on Amazon EC2 instances
Deep Dive on Amazon EC2 instances
 
Fundamentals.pptx
Fundamentals.pptxFundamentals.pptx
Fundamentals.pptx
 
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisParallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
 

Mais de EUDAT

EUDAT_Brochure_Generica_Jan_UPDATED(5).pdf
EUDAT_Brochure_Generica_Jan_UPDATED(5).pdfEUDAT_Brochure_Generica_Jan_UPDATED(5).pdf
EUDAT_Brochure_Generica_Jan_UPDATED(5).pdfEUDAT
 
EUDAT Booklet Mar22 (2).pdf
EUDAT Booklet Mar22 (2).pdfEUDAT Booklet Mar22 (2).pdf
EUDAT Booklet Mar22 (2).pdfEUDAT
 
EUDAT_Brochure_Generica_Jan_UPDATED (1).pdf
EUDAT_Brochure_Generica_Jan_UPDATED (1).pdfEUDAT_Brochure_Generica_Jan_UPDATED (1).pdf
EUDAT_Brochure_Generica_Jan_UPDATED (1).pdfEUDAT
 
EUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdfEUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdfEUDAT
 
EUDAT Brochure - B2DROP.pdf
EUDAT Brochure - B2DROP.pdfEUDAT Brochure - B2DROP.pdf
EUDAT Brochure - B2DROP.pdfEUDAT
 
EUDAT Brochure - B2SHARE.pdf
EUDAT Brochure - B2SHARE.pdfEUDAT Brochure - B2SHARE.pdf
EUDAT Brochure - B2SHARE.pdfEUDAT
 
EUDAT Brochure - B2SAFE.pdf
EUDAT Brochure - B2SAFE.pdfEUDAT Brochure - B2SAFE.pdf
EUDAT Brochure - B2SAFE.pdfEUDAT
 
EUDAT Brochure - B2FIND(1).pdf
EUDAT Brochure - B2FIND(1).pdfEUDAT Brochure - B2FIND(1).pdf
EUDAT Brochure - B2FIND(1).pdfEUDAT
 
EUDAT Brochure - B2ACCESS.pdf
EUDAT Brochure - B2ACCESS.pdfEUDAT Brochure - B2ACCESS.pdf
EUDAT Brochure - B2ACCESS.pdfEUDAT
 
Rob Carrillo - Writing effective service documentation for EUDAT services
Rob Carrillo - Writing effective service documentation for EUDAT servicesRob Carrillo - Writing effective service documentation for EUDAT services
Rob Carrillo - Writing effective service documentation for EUDAT servicesEUDAT
 
Ariyo - EUDAT CDI B2 services documentation
Ariyo - EUDAT CDI B2 services documentationAriyo - EUDAT CDI B2 services documentation
Ariyo - EUDAT CDI B2 services documentationEUDAT
 
Introduction to eudat and its services
Introduction to eudat and its servicesIntroduction to eudat and its services
Introduction to eudat and its servicesEUDAT
 
Using B2NOTE: The U.Porto Pilot
Using B2NOTE: The U.Porto PilotUsing B2NOTE: The U.Porto Pilot
Using B2NOTE: The U.Porto PilotEUDAT
 
OpenAIRE Advance - Kick off last week
OpenAIRE Advance - Kick off last weekOpenAIRE Advance - Kick off last week
OpenAIRE Advance - Kick off last weekEUDAT
 
European Open Science Cloud - Skills workshop
European Open Science Cloud - Skills workshopEuropean Open Science Cloud - Skills workshop
European Open Science Cloud - Skills workshopEUDAT
 
Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...EUDAT
 
FAIRness of training materials
FAIRness of training materialsFAIRness of training materials
FAIRness of training materialsEUDAT
 
Training by EOSC-hub - Integrating and Managing services for the European Ope...
Training by EOSC-hub - Integrating and Managing services for the European Ope...Training by EOSC-hub - Integrating and Managing services for the European Ope...
Training by EOSC-hub - Integrating and Managing services for the European Ope...EUDAT
 
Draft Governance Framework for the EOSC
Draft Governance Framework for the EOSCDraft Governance Framework for the EOSC
Draft Governance Framework for the EOSCEUDAT
 
Building Interoperable AAI for Researchers
Building Interoperable AAI for ResearchersBuilding Interoperable AAI for Researchers
Building Interoperable AAI for ResearchersEUDAT
 

Mais de EUDAT (20)

EUDAT_Brochure_Generica_Jan_UPDATED(5).pdf
EUDAT_Brochure_Generica_Jan_UPDATED(5).pdfEUDAT_Brochure_Generica_Jan_UPDATED(5).pdf
EUDAT_Brochure_Generica_Jan_UPDATED(5).pdf
 
EUDAT Booklet Mar22 (2).pdf
EUDAT Booklet Mar22 (2).pdfEUDAT Booklet Mar22 (2).pdf
EUDAT Booklet Mar22 (2).pdf
 
EUDAT_Brochure_Generica_Jan_UPDATED (1).pdf
EUDAT_Brochure_Generica_Jan_UPDATED (1).pdfEUDAT_Brochure_Generica_Jan_UPDATED (1).pdf
EUDAT_Brochure_Generica_Jan_UPDATED (1).pdf
 
EUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdfEUDAT Brochure - B2HANDLE.pdf
EUDAT Brochure - B2HANDLE.pdf
 
EUDAT Brochure - B2DROP.pdf
EUDAT Brochure - B2DROP.pdfEUDAT Brochure - B2DROP.pdf
EUDAT Brochure - B2DROP.pdf
 
EUDAT Brochure - B2SHARE.pdf
EUDAT Brochure - B2SHARE.pdfEUDAT Brochure - B2SHARE.pdf
EUDAT Brochure - B2SHARE.pdf
 
EUDAT Brochure - B2SAFE.pdf
EUDAT Brochure - B2SAFE.pdfEUDAT Brochure - B2SAFE.pdf
EUDAT Brochure - B2SAFE.pdf
 
EUDAT Brochure - B2FIND(1).pdf
EUDAT Brochure - B2FIND(1).pdfEUDAT Brochure - B2FIND(1).pdf
EUDAT Brochure - B2FIND(1).pdf
 
EUDAT Brochure - B2ACCESS.pdf
EUDAT Brochure - B2ACCESS.pdfEUDAT Brochure - B2ACCESS.pdf
EUDAT Brochure - B2ACCESS.pdf
 
Rob Carrillo - Writing effective service documentation for EUDAT services
Rob Carrillo - Writing effective service documentation for EUDAT servicesRob Carrillo - Writing effective service documentation for EUDAT services
Rob Carrillo - Writing effective service documentation for EUDAT services
 
Ariyo - EUDAT CDI B2 services documentation
Ariyo - EUDAT CDI B2 services documentationAriyo - EUDAT CDI B2 services documentation
Ariyo - EUDAT CDI B2 services documentation
 
Introduction to eudat and its services
Introduction to eudat and its servicesIntroduction to eudat and its services
Introduction to eudat and its services
 
Using B2NOTE: The U.Porto Pilot
Using B2NOTE: The U.Porto PilotUsing B2NOTE: The U.Porto Pilot
Using B2NOTE: The U.Porto Pilot
 
OpenAIRE Advance - Kick off last week
OpenAIRE Advance - Kick off last weekOpenAIRE Advance - Kick off last week
OpenAIRE Advance - Kick off last week
 
European Open Science Cloud - Skills workshop
European Open Science Cloud - Skills workshopEuropean Open Science Cloud - Skills workshop
European Open Science Cloud - Skills workshop
 
Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...Linking service capabilities to data stweardship competences for professional...
Linking service capabilities to data stweardship competences for professional...
 
FAIRness of training materials
FAIRness of training materialsFAIRness of training materials
FAIRness of training materials
 
Training by EOSC-hub - Integrating and Managing services for the European Ope...
Training by EOSC-hub - Integrating and Managing services for the European Ope...Training by EOSC-hub - Integrating and Managing services for the European Ope...
Training by EOSC-hub - Integrating and Managing services for the European Ope...
 
Draft Governance Framework for the EOSC
Draft Governance Framework for the EOSCDraft Governance Framework for the EOSC
Draft Governance Framework for the EOSC
 
Building Interoperable AAI for Researchers
Building Interoperable AAI for ResearchersBuilding Interoperable AAI for Researchers
Building Interoperable AAI for Researchers
 

Último

Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 

Último (20)

Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 

High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe Fiameni, CINECA)

  • 1. High Performance & High Throughput Computing Giuseppe Fiameni – g.Fiameni@cineca.it http://hackaday.com/2016/07/13/nirvana-like-youve-never-heard-them-before/
  • 2. Agenda • Computational Problem • System architecture • Serial vs Parallel • Levels of parallelism • Models for parallel programming • HPC and HTC examples EUDAT Summer School 2017 - High Performance/Throughput Computing 2
  • 3. What does it make a problem be relevant from the computational point of view?
  • 4. Computational Dimension: number of operations needed to solve the problem, in general is a function of the size of the involved data structures (n, n2 , n3 , n log n, etc.) flop - Floating point operations indicates an arithmetic floating point operation. flop/s - Floating points operations per second is a unit to measure the speed of a computer. Computational problems today: 1015 – 1022 flop One year has about 3 x 107 seconds! Most powerful computers today have reach a sustained performance is of the order of Tflop/s - Pflop/s (1012 - 1015 flop/s). Size of computational applications EUDAT Summer School 2017 - High Performance/Throughput Computing 4
  • 5. Example: Weather Prediction Forecasts on a global scale:  3D Grid to represent the Earth  Earth's circumference:  40000 km  radius:  6370 km  Earth's surface:  4πr2  5•108 km2  6 variables:  temperature  pressure  humidity  wind speed in the 3 Cartesian directions  cells of 1 km on each side  100 slices to see how the variables evolve on the different levels of the atmosphere  a 30 seconds time step is required for the simulation with such resolution  Each cell requires about 1000 operations per time step (Navier-Stokes turbulence and various phenomena) In physics, the Navier–Stokes equations /nævˈjeɪ stoʊks/, named after Claude-Louis Navier and George Gabriel Stokes, describe the motion of viscous fluid substances. EUDAT Summer School 2017 - High Performance/Throughput Computing 5
  • 6. Grid: 5 x 108 x 100 = 5 x 1010 cells  each cell is represented with 8 Byte  Memory space:  (6 var) x (8 Byte) x (5 x 1010 cells)  2 x 1012 Byte = 2TB A 24 hours forecast needs: • 24 x 60 x 2  3 x 103 time-step • (5 x 1010 cells) x (103 oper.) x (3 x 103 time-steps) = 1.5 x 1017 operations ! A computer with a power of 1Tflop/s will take 1.5 x 105 sec. Example: Weather Prediction (cont.) 24 hours forecast will need 2 days to run ... but we shall obtain a very accurate forecast! EUDAT Summer School 2017 - High Performance/Throughput Computing 6
  • 7. John Von Neumann EUDAT Summer School 2017 - High Performance/Throughput Computing 7 https://en.wikipedia.org/wiki/Von_Neumann_architecture
  • 8. What if you want increase performance? • Increase the clock rate – number of clock cycles per second (peacemaker) • Increase the instruction rate – number of basic instructions that can be executed per clock cycle – Pipeling, vectorization • Reduce access time to data – Having frequently used data on the processor caches significantly improve computational performance EUDAT Summer School 2017 - High Performance/Throughput Computing 8
  • 9. The clock cycle  is defined as the time between two adjacent pulses of oscillator that sets the time of the processor. The number of these pulses per second is known as clock speed or clock frequency, generally measured in GHz (gigahertz, or billions of pulses per second). The clock cycle controls the synchronization of operations in a computer: All the operations inside the processor last a multiple of . Processor  (ns) freq (MHz) CDC 6600 100 10 Cyber 76 27.5 36,3 IBM ES 9000 9 111 Cray Y-MP C90 4.1 244 Intel i860 20 50 PC Pentium < 0.5 > 2 GHz Power PC 1.17 850 IBM Power 5 0.52 1.9 GHz IBM Power 6 0.21 4.7 GHz Increasing the clock frequency: The speed of light sets an upper limit to the speed with which electronic components can operate . Propagation velocity of a signal in a vacuum: 300.000 Km/s = 30 cm/ns Heat dissipation problems inside the processor. Also Quantum tunneling expected to become important. Speed of Processors: Clock Cycle and Frequency EUDAT Summer School 2017 - High Performance/Throughput Computing 9
  • 10. Memory access time: the time required by the processor to access data or to write data from / to memory The hierarchy exists because :  fast memory is expensive and small  slow memory is cheap and big Latency  how long do I have to wait for the data?  (cannot do anything while waiting) Throughput  how many bytes/second. but not important if waiting. Total time = latency + (amount of data / throughput) Time to run code = clock cycles running code + clock cycles waiting for memory Memory hierarchies EUDAT Summer School 2017 - High Performance/Throughput Computing 10
  • 11. Loop unrolling for (i=0; i<N; i++) s += a[i]*b[i] EUDAT Summer School 2017 - High Performance/Throughput Computing 11 for (i=0; i<N; i += 2) s += a[i]*b[i] + a[i+1]*b[i+1]
  • 12. Why Have Cache? EUDAT Summer School 2017 - High Performance/Throughput Computing 12 CPU 351 GB/sec 10.66 GB/sec (3%) 253 GB/sec (72%)
  • 13. Parallelism  Serial Process:  A process in which its sub-processes happen sequentially in time.  Speed depends only on the rate at which each sub-process will occur (e.g. processing unit clock speed).  Parallel Process:  Process in which multiple sub-processes can be active simultaneously.  Speed depends on execution rate of each sub- process AND how many sub-processes can be made to occur simultaneously. EUDAT Summer School 2017 - High Performance/Throughput Computing 13
  • 14. Parallel for Capacity • Example: Searching database of web-sites for some text (i.e. Google) • Searching sequentially through large volumes of text too time-consuming – Multiple servers hold different pages – Each server can report the result of each individual search – The more servers you add the quicker the search is EUDAT Summer School 2017 - High Performance/Throughput Computing 14
  • 15. Parallel for Capability • Example: Large scale weather simulation • Detailed description for atmosphere too large to run on today's desktop – Multiple servers are needed to hold all grid data in memory – Servers need to quickly communicate to synchronise work over entire grid – Communication between servers can become a bottleneck. Latency really matters! EUDAT Summer School 2017 - High Performance/Throughput Computing 15
  • 16. Capability and capacity • Parallel for Capability => High Performance Computing – deals with one large problem – problems that are hard to parallelise, not embarrassing parallel – single processor performance is important • Parallel for Capacity => High Throughput Computing – It is appropriate when the problem can be decomposed into many (very many) smaller problems that are essentially independent – problems that are easy to parallelise, parameter swapping – can be adapted to very large numbers of processors EUDAT Summer School 2017 - High Performance/Throughput Computing 16
  • 17. Systems main architecture • Single processor, single core EUDAT Summer School 2017 - High Performance/Throughput Computing 17 • Dual processor, single core • Single processor, dual core
  • 18. Multi core architecture • Modern architecture can go up to 34 cores per processor and to 2/4 processors per node EUDAT Summer School 2017 - High Performance/Throughput Computing 18
  • 19. Flynn Taxonomy • A computer architecture is categorized by the multiplicity of hardware used to manipulate streams of instructions (sequence of instructions executed by the computer) and streams of data (sequence of data used to execute a stream of instructions). EUDAT Summer School 2017 - High Performance/Throughput Computing 20
  • 20. CU Control Unit PU Processing Unit MM Memory Module DS Data stream IS Instruction Stream DS2 DS1 MM2 MM1 MMn . . . CU IS PU2 PU1 PUn DSn . . . SIMD Systems Synchronous parallelism  SIMD systems presents a single control unit  A single instruction operates simultaneously on multiple data.  Array processor and vector systems fall in this class EUDAT Summer School 2017 - High Performance/Throughput Computing 21
  • 21. 22 . . . . . . . . . IS2 CU2 IS2 PU2 MM2 DS2 PU1 MM1 DS1 PUn MMn DSn CU1 CUn ISn IS1 IS1 ISn CU Control Unit PU Processing Unit MM Memory Module DS Data stream IS Instruction Stream MIMD Systems Asynchronous parallelism  Multiple processors execute different instructions operating on different data.  Represents the multiprocessor version of the SIMD class.  Wide class ranging from multi-core systems to large MPP systems. EUDAT Summer School 2017 - High Performance/Throughput Computing 22
  • 22. 2 3 Moore's Law • Empirical law which states that the complexity of devices (number of transistors per square inch in microprocessors) doubles every 18 months..Gordon Moore, INTEL co-founder, 1965 • It is estimated that Moore's Law still applies in the near future but applied to the number of cores per processor EUDAT Summer School 2017 - High Performance/Throughput Computing 23
  • 23. The limits of parallelism - Amdahl’s Law If we have N processors: taking s as the fraction of the time spent in the sequential part of the program (s + p = 1) les.robertson@cern.ch s – time spent in a serial processor on serial parts of the code p – time spent in a serial processor on parts that could be executed in parallel Amdahl, G.M., Validity of single-processor approach to achieving large scale computing capability Proc. AFIPS Conf., Reston, VA, 1967, pp. 483-485 Speedup = 𝑠+𝑝 𝑠+𝑝/𝑁 Speedup = 1 𝑠+(1−𝑠) /𝑁 → 1/𝑠 EUDAT Summer School 2017 - High Performance/Throughput Computing
  • 24. Amdahl’s Law - maximum speedup max speedup 0 20 40 60 80 100 120 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% sequential part as percentage of total time speedup EUDAT Summer School 2017 - High Performance/Throughput Computing les.robertson@cern.ch
  • 25. The importance of the network • Latency – the round-trip time (RTT) to communicate between two processors Communications overhead: • For fine grained parallel programs the problem is latency, not bandwidth t … … … … … … sequential communications overhead parallelisable Speedup = 𝑠+𝑝 𝑠+𝑐+ 𝑝/𝑁 c = latency + data_transfer_time EUDAT Summer School 2017 - High Performance/Throughput Computing les.robertson@cern.ch
  • 26. Aspects of parallelism  It has been recognized for a long time that constant performance improvements cannot be obtained just by increasing factors such as processor clock speed – parallelism is needed.  Parallelism can be present at many levels:  Functional parallelism within the CPU  Pipelining and vectorization  Multi-processor and multi-core  Accelerators  Parallel I/O EUDAT Summer School 2017 - High Performance/Throughput Computing 28
  • 27. Pipelining  It is a technique where more instructions, belonging to a stream of sequential execution, overlap their execution  This technique improves the performance of the single processor  The concept of pipelining is similar to that of assembly line in a factory where in a flow line (pipe) of assembly stations the elements are assembled in a continuous flow.  All the assembly stations must operate at the same processing speed, otherwise the station slower becomes the bottleneck of the entire pipe. EUDAT Summer School 2017 - High Performance/Throughput Computing 29
  • 28. 30 The stages are 1 Fetch 2 D1 (main decode) 3 D2 (secondary decode, also called translate) 4 EX (execute) 5 WB (write back to registers and memory) http://www.gamedev.net/page/resources/_/technical/general-programming/a- journey-through-the-cpu-pipeline-r3115 Instructions Pipelining EUDAT Summer School 2017 - High Performance/Throughput Computing 30
  • 29. CPU Vector units  Vectorization performed by dedicated hardware on chip  Compiler generates vector instructions, when it can, from programmer’s code  Important optimization which can lead to 4x, 8x speedups according “size” of vector unit (e.g. 256 bit) EUDAT Summer School 2017 - High Performance/Throughput Computing 32
  • 30. CPU vs GPU • A CPU is designed to handle complex tasks while GPUs only do one thing well - handle billions of repetitive low level tasks - originally the rendering of triangles in 3D graphics • Originally, this was called GPGPU (General Purpose GPU programming), and it required mapping scientific code to the matrix operations for manipulating triangles. • This was insanely difficult to do and took a lot of dedication. However, with the advent of CUDA and OpenCL, high-level languages targeting the GPU, GPU programming is rapidly becoming mainstream in the scientific community. EUDAT Summer School 2017 - High Performance/Throughput Computing 33
  • 31. HPC vs HTC High Performance • Granularity largely defined by the algorithm, limitations in the hardware • Load balancing difficult • Reliability is all important – if one part fails the calculation stops (maybe even aborts!) – check-pointing essential – hard to dynamically reconfigure for smaller number of processors High Throughput • Granularity can be selected to fit the environment • Load balancing easy • Mixing workloads is easy • Sustained throughput is the key goal – the order in which the individual tasks execute is (usually) not important – if some equipment goes down the work can be re-run later – easy to re-schedule dynamically the workload to different configurations EUDAT Summer School 2017 - High Performance/Throughput Computing 34
  • 32. Granularity • fine-grained parallelism – the independent bits are small, need to exchange information, synchronize often – large number of more expensive processors expensively interconnected • coarse-grained – the problem can be decomposed into large chunks that can be processed independently – large number of inexpensive processors, inexpensively interconnected EUDAT Summer School 2017 - High Performance/Throughput Computing
  • 33. (1) Map Only (4) Point to Point or Map-Communication (3) Iterative Map Reduce or Map-Collective (2) Classic MapReduce Input map reduce Input map reduce Iterations Input Output map Local Graph BLAST Analysis Local Machine Learning Pleasingly Parallel High Energy Physics (HEP) Histograms Distributed search Recommender Engines Expectation maximization Clustering e.g. K- means Linear Algebra, PageRank Classic MPI PDE Solvers and Particle Dynamics Graph Problems MapReduce and Iterative Extensions (Spark, Twister) MPI, Giraph Integrated Systems such as Hadoop + Harp with Compute and Communication model separated Courtesy of Prof. Geoffrey Charles Fox – Indiana University EUDAT Summer School 2017 - High Performance/Throughput Computing 36
  • 34. A TYPICAL HPC ALGORITHM 37
  • 35. Matrix multiplication • Basic matrix multiplication on a 2-D grid • Matrix multiplication is an important application in HPC and appears in many areas (linear algebra) • C = A * B where A, B, and C are matrices (two- dimensional arrays) • A restricted case is when B has only one column, matrix-vector product, which appears in representation of linear equations and partial differential equations EUDAT Summer School 2017 - High Performance/Throughput Computing 38
  • 36. C = A x B EUDAT Summer School 2017 - High Performance/Throughput Computing 39
  • 37. A TYPICAL HTC ALGORITHM 40
  • 38. Monte Carlo Methods • Monte Carlo simulation is a broad term for methods that use random numbers and statistical sampling to solve problems, rather than exact modeling. • Due to the sampling, the result will have some uncertainty, but the statistical ‘law of large numbers’ will ensure that the uncertainty goes down as the number of samples grows. • The statistical law that underlies this is as follows: – if N independent observations are made of a quantity with standard deviation σ, then the standard deviation of the mean is σ=pN. This means that more observations will lead to more accuracy • What makes Monte Carlo methods interesting is that the gain in accuracy is not related to dimensionality of the original problem. EUDAT Summer School 2017 - High Performance/Throughput Computing 41
  • 39. Agia Pelagia gulf area EUDAT Summer School 2017 - High Performance/Throughput Computing 42 0 1 1
  • 40. Hyper-Threading Technology • Hyper-Threading Technology brings the simultaneous multi- threading approach to the Intel architecture. • Hyper-Threading Technology makes a single physical processor appear as two or more logical processors • Hyper-Threading Technology provides thread-level-parallelism (TLP) on each processor resulting in increased utilization of processor and execution resources • Each logical processor maintain one copy of the architecture state • While HT improves some performance benchmarks by up to 30%, its benefits are strongly application dependent. For HPC codes, the consequences of HT also depend on inter-node communication topology and load balance. EUDAT Summer School 2017 - High Performance/Throughput Computing 43
  • 41. EUDAT Summer School 2017 - High Performance/Throughput Computing 44
  • 42. Reference EUDAT Summer School 2017 - High Performance/Throughput Computing 45