SlideShare uma empresa Scribd logo
1 de 36
Baixar para ler offline
Parallel Computing
Mohamed Zahran (aka Z)
mzahran@cs.nyu.edu
http://www.mzahran.com
CSCI-UA.0480-003
Performance Analysis
Defining Performance
• Which airplane has the best performance?
0 100 200 300 400 500
Douglas
DC-8-50
BAC/Sud
Concorde
Boeing 747
Boeing 777
Passenger Capacity
0 2000 4000 6000 8000 10000
Douglas DC-
8-50
BAC/Sud
Concorde
Boeing 747
Boeing 777
Cruising Range (miles)
0 500 1000 1500
Douglas
DC-8-50
BAC/Sud
Concorde
Boeing 747
Boeing 777
Cruising Speed (mph)
0 100000 200000 300000 400000
Douglas DC-
8-50
BAC/Sud
Concorde
Boeing 747
Boeing 777
Passengers x mph
• For some program running on machine X,
PerformanceX = 1 / Execution timeX
• "X is n times faster than Y"
PerformanceX / PerformanceY = n
Standard Definition of Performance
 Example: time taken to run a program
 10s on A, 15s on B
 Execution TimeB / Execution TimeA
= 15s / 10s = 1.5
 So A is 1.5 times faster than B
Speedup
• Number of cores = p
• Serial run-time = Tserial
• Parallel run-time = Tparallel
Copyright © 2010, Elsevier Inc.
All rights Reserved
Tserial
Tparallel
S =
Example
Processor 1
time
100
time
1 2 3 4
25 25 25 25 time
1 2 3 4
35 35 35 35
,0.4
25
100
pS
balancingloadperfect
,85.2
35
100
pS
Perfect parallelization!
Does it ever occur?
synch cost
Example (cont.)
time
1 2 3 4
30 20 40 10
time
1 2 3 4
50 50 50 50
imbalanceload
,5.2
40
100
pS
costsyncand
imbalanceload
,0.2
50
100
pS
closest to
real life
parallel programs
Processor 1
time
100
A Glimpse at the Top 500 (Nov 2019 List)
Rmax: maximum achieved
Rpeak: theoretical maximum
Sources of Parallel Overheads
• Overhead of creating threads/processes
• Synchronization
• Load imbalance
• Communication
• Extra computation
• Memory access (for both sequential and
parallel!)
Efficiency of a parallel program
Copyright © 2010, Elsevier Inc.
All rights Reserved
E =
Tserial
Tparallel
Speedup
P
=
p
P = can be number of threads, processes, or cores
Be Careful about T
• Both Tserial and Tpar are wall-clock times, and as
such they are not objective. They can be influenced
by :
– The skill of the programmer who wrote the
implementations
– The choice of compiler (e.g. GNU C++ versus Intel C++)
– The compiler switches (e.g. turning optimization
on/off)
– The operating system
– The type of filesystem holding the input data (e.g.
EXT4, NTFS, etc.)
– The time of day... (different workloads, network
traffic, etc.)
Speedup
Copyright © 2010, Elsevier Inc.
All rights Reserved
Program: Matrix Vector Multiplication
Graph shows how speedup is affected by the number of processes and problem size.
Efficiency
Copyright © 2010, Elsevier Inc.
All rights Reserved
Program: Matrix Vector Multiplication
Scalability
 Scalability is the ability of a (software or
hardware) system to handle a growing amount
of work efficiently.
• If we keep the efficiency fixed by increasing
the number of processes/threads and without
increasing problem size, the problem is
strongly scalable.
• If we keep the efficiency fixed by increasing
the problem size at the same rate as we
increase the number of processes/threads,
the problem is weakly scalable.
Copyright © 2010, Elsevier Inc.
All rights Reserved
Let’s take a closer look at timing.
Taking Timings
• What is time?
• Start to finish?
• A program segment of interest?
• CPU time?
• Wall clock time?
Copyright © 2010, Elsevier Inc.
All rights Reserved
• Elapsed Time (aka wall-clock time)
– counts everything (disk and memory accesses,
I/O , etc.)
– a useful number, but often not good for
comparison purposes
• CPU time
– doesn't count I/O or time spent running other
programs
– can be broken up into system time, and user time
• Our focus: user CPU time
– time spent executing the lines of code that are
"in" our program
Execution Time
Execution Time (Elapsed Time)
I/O Time Disk and Memory timeCPU Time
User CPU Time System CPU Time
Taking Timings
In Linux:
time prog
Returns
real Xs
user Ys
sys Zs
Inside your C program:
clock_t clock(void)returns the number of clock ticks elapsed
since the program started
#include <time.h>
#include <stdio.h>
int main() {
clock_t start, end, total;
int i;
start = clock();
for(i=0; i< 10000000; i++) { }
end = clock();
total= (double)(end – start) / CLOCKS_PER_SEC;
printf("Total time taken by CPU: %fn", total);
}
Let’s Look at Two Simple Metrics
• Response time (aka Execution Time)
– The time between the start and completion
of a task
• Throughput
– Total amount of work done in a given time
What is the relationship between execution time and throughput?
Timing for sequential programs
IPC
MIPS
CPI
Execution
Time
# insts
seconds
program

cycles
program

seconds
cycle
ET
IC * CPI
CT
ET = IC X CPI X CT
ET = Execution Time
CPI = Cycles Per Instruction
IC = Instruction Count
Execution time for sequential program:
Example
A program runs in 10 seconds on computer A, which has a 4 GHz.
clock. We are trying to help a computer designer build a new
machine B, that will run this program in 6 seconds. The designer
can use new (or perhaps more expensive) technology to
substantially increase the clock rate, but has informed us that this
increase will affect the rest of the CPU design, causing machine B
to require 1.2 times as many clock cycles as machine A for the same
program. What clock rate should we tell the designer to target?“
CPI Example
• Suppose we have two implementations of the same instruction
set architecture (ISA).
For some program,
Machine A has a clock cycle time of 250 ps and a CPI of 2.0
Machine B has a clock cycle time of 500 ps and a CPI of 1.2
What machine is faster for this program, and by how much?
[ 10-3 = milli, 10-6 = micro, 10-9 = nano, 10-12 = pico, 10-15 = femto ]
#Instructions Example
• A compiler designer is trying to decide between two code sequences
for a particular machine. Based on the hardware implementation,
there are three different classes of instructions: Class A, Class B,
and Class C, and they require one, two, and three cycles
(respectively).
The first code sequence has 5 instructions:
2 of A, 1 of B, and 2 of C
The second sequence has 6 instructions:
4 of A, 1 of B, and 1 of C.
Which sequence will be faster? How much?
What is the CPI for each sequence?
MIPS Example
• Two different compilers are being tested for a 4 GHz. machine
with three different classes of instructions: Class A, Class B,
and Class C, which require one, two, and three cycles
(respectively). Both compilers are used to produce code for a
large piece of software.
The first compiler's code uses 5 million Class A instructions, 1
million Class B instructions, and 1 million Class C instructions.
The second compiler's code uses 10 million Class A instructions,
1 million Class B instructions, and 1 million Class C instructions.
• Which sequence will be faster according to MIPS?
• Which sequence will be faster according to execution time?
Pitfalls in timing in Parallel
Machines
For Multithreaded Programs
• You need to decide: Shall we use execution time or
throughput? or both?
• IPC (Instructions per Cycle) is not accurate here
– small timing variations may lead to different execution
time
– Order at which threads enter critical section may vary
– Different interrupt timing may lead to different
scheduling decisions
The total number of instructions executed may
be different across different runs!
For Multithreaded Programs
The total number of instructions executed may be different
across different runs!
This effect increases with
the number of cores
System-level code account for a significant fraction of the
total execution time
Your Program Does Not Run in A Vacuum
• OS at least is there.
• Multi-programming and/or mulithreading
setting is very common in multicore
settings
• Independent programs affect each
other performance (why?)
How to check the performance of
a parallel machine?
• Performance best determined by running a real
application
– Use programs typical of expected workload
– Or, typical of expected class of applications
– e.g., compilers/editors, scientific applications, graphics, etc.
• Small benchmarks
– nice for architects and designers
– easy to standardize
• Parallel Benchmarks: PARSEC, Rodinia, SPLASH-2
• SPEC (System Performance Evaluation Cooperative)
– companies have agreed on a set of real program and inputs
– valuable indicator of performance (and compiler technology)
Benchmarks
Role of Benchmarks
• help designer explore architectural
designs
• identify bottlenecks
• compare different systems
• conduct performance prediction
Example: PARSEC
• Princeton Application Repository for Shared-Memory
Computers
• Benchmark Suite for Chip-Multiprocessors
• Freely available at: http://parsec.cs.princeton.edu/
• Objectives:
– Multithreaded Applications: Future programs must run on
multiprocessors
– Emerging Workloads: Increasing CPU performance enables
new applications
– Diverse: Multiprocessors are being used for more and more
tasks
– State-of-Art Techniques: Algorithms and programming
techniques evolve rapidly
Example: PARSEC
Program Application Domain Parallelization
Blackscholes Financial Analysis Data-parallel
Bodytrack Computer Vision Data-parallel
Canneal Engineering Unstructured
Dedup Enterprise Storage Pipeline
Facesim Animation Data-parallel
Ferret Similarity Search Pipeline
Fluidanimate Animation Data-parallel
Freqmine Data Mining Data-parallel
Streamcluster Data Mining Data-parallel
Swaptions Financial Analysis Data-parallel
Vips Media Processing Data-parallel
X264 Media Processing Pipeline
Example: Rodinia
• A Benchmark Suite for Heterogeneous
Computing: multicore CPU and GPU
• University of Virginia
Conclusions
• Performance evaluation is very important to assess
programming quality as well as the underlying
architecture and how they interact.
• The following capture some aspects of the system but
do not represent overall performance: MIPS,
#instructions, #cycles, frequency
• Execution time is what matters: system time, CPU time,
I/O and memory time
– To know whether your execution time is good, you need to compare it with
sequential code, another parallel code, etc.
• Scalability and efficiency measure the quality of your code.
Powered by TCPDF (www.tcpdf.org)Powered by TCPDF (www.tcpdf.org)

Mais conteúdo relacionado

Mais procurados

Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...Filip Krikava
 
MLPerf an industry standard benchmark suite for machine learning performance
MLPerf an industry standard benchmark suite for machine learning performanceMLPerf an industry standard benchmark suite for machine learning performance
MLPerf an industry standard benchmark suite for machine learning performancejemin lee
 
ACTRESS: Domain-Specific Modeling of Self-Adaptive Software Architectures
ACTRESS: Domain-Specific Modeling of Self-Adaptive Software ArchitecturesACTRESS: Domain-Specific Modeling of Self-Adaptive Software Architectures
ACTRESS: Domain-Specific Modeling of Self-Adaptive Software ArchitecturesFilip Krikava
 
Pipeline Mechanism
Pipeline MechanismPipeline Mechanism
Pipeline MechanismAshik Iqbal
 
Track A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBMTrack A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBMchiportal
 
Advanced pipelining
Advanced pipeliningAdvanced pipelining
Advanced pipelininga_spac
 
Pipeline & Nonpipeline Processor
Pipeline & Nonpipeline ProcessorPipeline & Nonpipeline Processor
Pipeline & Nonpipeline ProcessorSmit Shah
 
Compiler Optimization-Space Exploration
Compiler Optimization-Space ExplorationCompiler Optimization-Space Exploration
Compiler Optimization-Space Explorationtmusabbir
 
Next Generation MPICH: What to Expect - Lightweight Communication and More
Next Generation MPICH: What to Expect - Lightweight Communication and MoreNext Generation MPICH: What to Expect - Lightweight Communication and More
Next Generation MPICH: What to Expect - Lightweight Communication and MoreIntel® Software
 
Lec0 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech ECE -- Introdu...
Lec0 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech ECE -- Introdu...Lec0 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech ECE -- Introdu...
Lec0 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech ECE -- Introdu...Hsien-Hsin Sean Lee, Ph.D.
 
AICamp - Dr Ramine Tinati - Making Computer Vision Real
AICamp - Dr Ramine Tinati - Making Computer Vision RealAICamp - Dr Ramine Tinati - Making Computer Vision Real
AICamp - Dr Ramine Tinati - Making Computer Vision RealRamine Tinati
 
LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)Linaro
 
Efficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approachEfficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approachjemin lee
 
Inside Dolphin Test Infrastructure
Inside Dolphin Test InfrastructureInside Dolphin Test Infrastructure
Inside Dolphin Test InfrastructureTao Jiang
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System AchitectureYashiUpadhyay3
 

Mais procurados (20)

Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...
Integrating Adaptation Mechanisms Using Control Theory Centric Architecture M...
 
MLPerf an industry standard benchmark suite for machine learning performance
MLPerf an industry standard benchmark suite for machine learning performanceMLPerf an industry standard benchmark suite for machine learning performance
MLPerf an industry standard benchmark suite for machine learning performance
 
ACTRESS: Domain-Specific Modeling of Self-Adaptive Software Architectures
ACTRESS: Domain-Specific Modeling of Self-Adaptive Software ArchitecturesACTRESS: Domain-Specific Modeling of Self-Adaptive Software Architectures
ACTRESS: Domain-Specific Modeling of Self-Adaptive Software Architectures
 
Pipelining in computer architecture
Pipelining in computer architecturePipelining in computer architecture
Pipelining in computer architecture
 
Pipeline Mechanism
Pipeline MechanismPipeline Mechanism
Pipeline Mechanism
 
Track A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBMTrack A-Compilation guiding and adjusting - IBM
Track A-Compilation guiding and adjusting - IBM
 
Pipeline Computing by S. M. Risalat Hasan Chowdhury
Pipeline Computing by S. M. Risalat Hasan ChowdhuryPipeline Computing by S. M. Risalat Hasan Chowdhury
Pipeline Computing by S. M. Risalat Hasan Chowdhury
 
Advanced pipelining
Advanced pipeliningAdvanced pipelining
Advanced pipelining
 
Pipeline & Nonpipeline Processor
Pipeline & Nonpipeline ProcessorPipeline & Nonpipeline Processor
Pipeline & Nonpipeline Processor
 
Compiler Optimization-Space Exploration
Compiler Optimization-Space ExplorationCompiler Optimization-Space Exploration
Compiler Optimization-Space Exploration
 
Coa.ppt2
Coa.ppt2Coa.ppt2
Coa.ppt2
 
Pipelining slides
Pipelining slides Pipelining slides
Pipelining slides
 
Next Generation MPICH: What to Expect - Lightweight Communication and More
Next Generation MPICH: What to Expect - Lightweight Communication and MoreNext Generation MPICH: What to Expect - Lightweight Communication and More
Next Generation MPICH: What to Expect - Lightweight Communication and More
 
Lec0 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech ECE -- Introdu...
Lec0 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech ECE -- Introdu...Lec0 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech ECE -- Introdu...
Lec0 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech ECE -- Introdu...
 
AICamp - Dr Ramine Tinati - Making Computer Vision Real
AICamp - Dr Ramine Tinati - Making Computer Vision RealAICamp - Dr Ramine Tinati - Making Computer Vision Real
AICamp - Dr Ramine Tinati - Making Computer Vision Real
 
LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)LAS16-TR04: Using tracing to tune and optimize EAS (English)
LAS16-TR04: Using tracing to tune and optimize EAS (English)
 
Efficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approachEfficient execution of quantized deep learning models a compiler approach
Efficient execution of quantized deep learning models a compiler approach
 
Pipeline hazard
Pipeline hazardPipeline hazard
Pipeline hazard
 
Inside Dolphin Test Infrastructure
Inside Dolphin Test InfrastructureInside Dolphin Test Infrastructure
Inside Dolphin Test Infrastructure
 
Pipelining in Computer System Achitecture
Pipelining in Computer System AchitecturePipelining in Computer System Achitecture
Pipelining in Computer System Achitecture
 

Semelhante a Parallel Computing - Lec 6

L-2 (Computer Performance).ppt
L-2 (Computer Performance).pptL-2 (Computer Performance).ppt
L-2 (Computer Performance).pptImranKhan997082
 
Computer architecture short note (version 8)
Computer architecture short note (version 8)Computer architecture short note (version 8)
Computer architecture short note (version 8)Nimmi Weeraddana
 
Cse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solutionCse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solutionShobha Kumar
 
Multicore_Architecture Book.pdf
Multicore_Architecture Book.pdfMulticore_Architecture Book.pdf
Multicore_Architecture Book.pdfSwatantraPrakash5
 
L07_performance and cost in advanced hardware- computer architecture.pptx
L07_performance and cost in advanced hardware- computer architecture.pptxL07_performance and cost in advanced hardware- computer architecture.pptx
L07_performance and cost in advanced hardware- computer architecture.pptxIsaac383415
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesMurtadha Alsabbagh
 
Lecture 3
Lecture 3Lecture 3
Lecture 3Mr SMAK
 
Evaluation of morden computer & system attributes in ACA
Evaluation of morden computer &  system attributes in ACAEvaluation of morden computer &  system attributes in ACA
Evaluation of morden computer & system attributes in ACAPankaj Kumar Jain
 
Performance of processor.ppt
Performance of processor.pptPerformance of processor.ppt
Performance of processor.pptnivedita murugan
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...EUDAT
 
Cpu performance matrix
Cpu performance matrixCpu performance matrix
Cpu performance matrixRehman baig
 
Daniel dauwe ece 561 Trial 3
Daniel dauwe   ece 561 Trial 3Daniel dauwe   ece 561 Trial 3
Daniel dauwe ece 561 Trial 3cinedan
 
Daniel dauwe ece 561 Benchmarking Results
Daniel dauwe   ece 561 Benchmarking ResultsDaniel dauwe   ece 561 Benchmarking Results
Daniel dauwe ece 561 Benchmarking Resultscinedan
 
Daniel dauwe ece 561 Benchmarking Results Trial 2
Daniel dauwe   ece 561 Benchmarking Results Trial 2Daniel dauwe   ece 561 Benchmarking Results Trial 2
Daniel dauwe ece 561 Benchmarking Results Trial 2cinedan
 
Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 EstimationLawrence Bernstein
 

Semelhante a Parallel Computing - Lec 6 (20)

L-2 (Computer Performance).ppt
L-2 (Computer Performance).pptL-2 (Computer Performance).ppt
L-2 (Computer Performance).ppt
 
Computer architecture short note (version 8)
Computer architecture short note (version 8)Computer architecture short note (version 8)
Computer architecture short note (version 8)
 
Cse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solutionCse viii-advanced-computer-architectures-06cs81-solution
Cse viii-advanced-computer-architectures-06cs81-solution
 
Multicore_Architecture Book.pdf
Multicore_Architecture Book.pdfMulticore_Architecture Book.pdf
Multicore_Architecture Book.pdf
 
L07_performance and cost in advanced hardware- computer architecture.pptx
L07_performance and cost in advanced hardware- computer architecture.pptxL07_performance and cost in advanced hardware- computer architecture.pptx
L07_performance and cost in advanced hardware- computer architecture.pptx
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
 
Lecture 3
Lecture 3Lecture 3
Lecture 3
 
Evaluation of morden computer & system attributes in ACA
Evaluation of morden computer &  system attributes in ACAEvaluation of morden computer &  system attributes in ACA
Evaluation of morden computer & system attributes in ACA
 
Performance of processor.ppt
Performance of processor.pptPerformance of processor.ppt
Performance of processor.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
 
Cpu performance matrix
Cpu performance matrixCpu performance matrix
Cpu performance matrix
 
Ch1
Ch1Ch1
Ch1
 
Ch1
Ch1Ch1
Ch1
 
Daniel dauwe ece 561 Trial 3
Daniel dauwe   ece 561 Trial 3Daniel dauwe   ece 561 Trial 3
Daniel dauwe ece 561 Trial 3
 
Daniel dauwe ece 561 Benchmarking Results
Daniel dauwe   ece 561 Benchmarking ResultsDaniel dauwe   ece 561 Benchmarking Results
Daniel dauwe ece 561 Benchmarking Results
 
Daniel dauwe ece 561 Benchmarking Results Trial 2
Daniel dauwe   ece 561 Benchmarking Results Trial 2Daniel dauwe   ece 561 Benchmarking Results Trial 2
Daniel dauwe ece 561 Benchmarking Results Trial 2
 
Autosar Basics hand book_v1
Autosar Basics  hand book_v1Autosar Basics  hand book_v1
Autosar Basics hand book_v1
 
Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 Estimation
 
ERTS_Unit 1_PPT.pdf
ERTS_Unit 1_PPT.pdfERTS_Unit 1_PPT.pdf
ERTS_Unit 1_PPT.pdf
 

Mais de Shah Zaib

Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisParallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisShah Zaib
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5Shah Zaib
 
Parallel Computing - Lec 4
Parallel Computing - Lec 4Parallel Computing - Lec 4
Parallel Computing - Lec 4Shah Zaib
 
Parallel Computing - Lec 3
Parallel Computing - Lec 3Parallel Computing - Lec 3
Parallel Computing - Lec 3Shah Zaib
 
Parallel Computing - Lec 2
Parallel Computing - Lec 2Parallel Computing - Lec 2
Parallel Computing - Lec 2Shah Zaib
 
Mpi collective communication operations
Mpi collective communication operationsMpi collective communication operations
Mpi collective communication operationsShah Zaib
 

Mais de Shah Zaib (10)

Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance AnalysisParallel Programming for Multi- Core and Cluster Systems - Performance Analysis
Parallel Programming for Multi- Core and Cluster Systems - Performance Analysis
 
MPI - 4
MPI - 4MPI - 4
MPI - 4
 
MPI - 3
MPI - 3MPI - 3
MPI - 3
 
MPI - 2
MPI - 2MPI - 2
MPI - 2
 
MPI - 1
MPI - 1MPI - 1
MPI - 1
 
Parallel Computing - Lec 5
Parallel Computing - Lec 5Parallel Computing - Lec 5
Parallel Computing - Lec 5
 
Parallel Computing - Lec 4
Parallel Computing - Lec 4Parallel Computing - Lec 4
Parallel Computing - Lec 4
 
Parallel Computing - Lec 3
Parallel Computing - Lec 3Parallel Computing - Lec 3
Parallel Computing - Lec 3
 
Parallel Computing - Lec 2
Parallel Computing - Lec 2Parallel Computing - Lec 2
Parallel Computing - Lec 2
 
Mpi collective communication operations
Mpi collective communication operationsMpi collective communication operations
Mpi collective communication operations
 

Último

(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)kojalkojal131
 
Call Girls in Vashi Escorts Services - 7738631006
Call Girls in Vashi Escorts Services - 7738631006Call Girls in Vashi Escorts Services - 7738631006
Call Girls in Vashi Escorts Services - 7738631006Pooja Nehwal
 
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...Naicy mandal
 
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
presentation about microsoft power point
presentation about microsoft power pointpresentation about microsoft power point
presentation about microsoft power pointchhavia330
 
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
 
High Profile Call Girls In Andheri 7738631006 Call girls in mumbai Mumbai ...
High Profile Call Girls In Andheri 7738631006 Call girls in mumbai  Mumbai ...High Profile Call Girls In Andheri 7738631006 Call girls in mumbai  Mumbai ...
High Profile Call Girls In Andheri 7738631006 Call girls in mumbai Mumbai ...Pooja Nehwal
 
Book Sex Workers Available Pune Call Girls Yerwada 6297143586 Call Hot India...
Book Sex Workers Available Pune Call Girls Yerwada  6297143586 Call Hot India...Book Sex Workers Available Pune Call Girls Yerwada  6297143586 Call Hot India...
Book Sex Workers Available Pune Call Girls Yerwada 6297143586 Call Hot India...Call Girls in Nagpur High Profile
 
Shikrapur Call Girls Most Awaited Fun 6297143586 High Profiles young Beautie...
Shikrapur Call Girls Most Awaited Fun  6297143586 High Profiles young Beautie...Shikrapur Call Girls Most Awaited Fun  6297143586 High Profiles young Beautie...
Shikrapur Call Girls Most Awaited Fun 6297143586 High Profiles young Beautie...tanu pandey
 
Top Rated Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated  Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Top Rated  Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Call Girls in Nagpur High Profile
 
Top Rated Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...
Top Rated  Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...Top Rated  Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...
Top Rated Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...Call Girls in Nagpur High Profile
 
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...anilsa9823
 
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311  Call Girls in Thane , Independent Escort Service ThanePallawi 9167673311  Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service ThanePooja Nehwal
 
VIP Call Girls Kavuri Hills ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With ...
VIP Call Girls Kavuri Hills ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With ...VIP Call Girls Kavuri Hills ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With ...
VIP Call Girls Kavuri Hills ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With ...Suhani Kapoor
 
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
(PARI) Alandi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(PARI) Alandi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(PARI) Alandi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(PARI) Alandi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service Nashik
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service NashikLow Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service Nashik
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查awo24iot
 
9004554577, Get Adorable Call Girls service. Book call girls & escort service...
9004554577, Get Adorable Call Girls service. Book call girls & escort service...9004554577, Get Adorable Call Girls service. Book call girls & escort service...
9004554577, Get Adorable Call Girls service. Book call girls & escort service...Pooja Nehwal
 
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...Pooja Nehwal
 

Último (20)

(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
(=Towel) Dubai Call Girls O525547819 Call Girls In Dubai (Fav0r)
 
Call Girls in Vashi Escorts Services - 7738631006
Call Girls in Vashi Escorts Services - 7738631006Call Girls in Vashi Escorts Services - 7738631006
Call Girls in Vashi Escorts Services - 7738631006
 
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...
Makarba ( Call Girls ) Ahmedabad ✔ 6297143586 ✔ Hot Model With Sexy Bhabi Rea...
 
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(ANIKA) Wanwadi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
presentation about microsoft power point
presentation about microsoft power pointpresentation about microsoft power point
presentation about microsoft power point
 
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Chikhali Call Me 7737669865 Budget Friendly No Advance Booking
 
High Profile Call Girls In Andheri 7738631006 Call girls in mumbai Mumbai ...
High Profile Call Girls In Andheri 7738631006 Call girls in mumbai  Mumbai ...High Profile Call Girls In Andheri 7738631006 Call girls in mumbai  Mumbai ...
High Profile Call Girls In Andheri 7738631006 Call girls in mumbai Mumbai ...
 
Book Sex Workers Available Pune Call Girls Yerwada 6297143586 Call Hot India...
Book Sex Workers Available Pune Call Girls Yerwada  6297143586 Call Hot India...Book Sex Workers Available Pune Call Girls Yerwada  6297143586 Call Hot India...
Book Sex Workers Available Pune Call Girls Yerwada 6297143586 Call Hot India...
 
Shikrapur Call Girls Most Awaited Fun 6297143586 High Profiles young Beautie...
Shikrapur Call Girls Most Awaited Fun  6297143586 High Profiles young Beautie...Shikrapur Call Girls Most Awaited Fun  6297143586 High Profiles young Beautie...
Shikrapur Call Girls Most Awaited Fun 6297143586 High Profiles young Beautie...
 
Top Rated Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated  Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Top Rated  Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls Katraj ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
 
Top Rated Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...
Top Rated  Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...Top Rated  Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...
Top Rated Pune Call Girls Shirwal ⟟ 6297143586 ⟟ Call Me For Genuine Sex Ser...
 
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...
Lucknow 💋 Call Girls Adil Nagar | ₹,9500 Pay Cash 8923113531 Free Home Delive...
 
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311  Call Girls in Thane , Independent Escort Service ThanePallawi 9167673311  Call Girls in Thane , Independent Escort Service Thane
Pallawi 9167673311 Call Girls in Thane , Independent Escort Service Thane
 
VIP Call Girls Kavuri Hills ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With ...
VIP Call Girls Kavuri Hills ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With ...VIP Call Girls Kavuri Hills ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With ...
VIP Call Girls Kavuri Hills ( Hyderabad ) Phone 8250192130 | ₹5k To 25k With ...
 
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
(ZARA) Call Girls Jejuri ( 7001035870 ) HI-Fi Pune Escorts Service
 
(PARI) Alandi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(PARI) Alandi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(PARI) Alandi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(PARI) Alandi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service Nashik
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service NashikLow Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service Nashik
Low Rate Call Girls Nashik Vedika 7001305949 Independent Escort Service Nashik
 
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查
如何办理(Adelaide毕业证)阿德莱德大学毕业证成绩单Adelaide学历认证真实可查
 
9004554577, Get Adorable Call Girls service. Book call girls & escort service...
9004554577, Get Adorable Call Girls service. Book call girls & escort service...9004554577, Get Adorable Call Girls service. Book call girls & escort service...
9004554577, Get Adorable Call Girls service. Book call girls & escort service...
 
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
Kalyan callg Girls, { 07738631006 } || Call Girl In Kalyan Women Seeking Men ...
 

Parallel Computing - Lec 6

  • 1. Parallel Computing Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com CSCI-UA.0480-003 Performance Analysis
  • 2. Defining Performance • Which airplane has the best performance? 0 100 200 300 400 500 Douglas DC-8-50 BAC/Sud Concorde Boeing 747 Boeing 777 Passenger Capacity 0 2000 4000 6000 8000 10000 Douglas DC- 8-50 BAC/Sud Concorde Boeing 747 Boeing 777 Cruising Range (miles) 0 500 1000 1500 Douglas DC-8-50 BAC/Sud Concorde Boeing 747 Boeing 777 Cruising Speed (mph) 0 100000 200000 300000 400000 Douglas DC- 8-50 BAC/Sud Concorde Boeing 747 Boeing 777 Passengers x mph
  • 3. • For some program running on machine X, PerformanceX = 1 / Execution timeX • "X is n times faster than Y" PerformanceX / PerformanceY = n Standard Definition of Performance  Example: time taken to run a program  10s on A, 15s on B  Execution TimeB / Execution TimeA = 15s / 10s = 1.5  So A is 1.5 times faster than B
  • 4. Speedup • Number of cores = p • Serial run-time = Tserial • Parallel run-time = Tparallel Copyright © 2010, Elsevier Inc. All rights Reserved Tserial Tparallel S =
  • 5. Example Processor 1 time 100 time 1 2 3 4 25 25 25 25 time 1 2 3 4 35 35 35 35 ,0.4 25 100 pS balancingloadperfect ,85.2 35 100 pS Perfect parallelization! Does it ever occur? synch cost
  • 6. Example (cont.) time 1 2 3 4 30 20 40 10 time 1 2 3 4 50 50 50 50 imbalanceload ,5.2 40 100 pS costsyncand imbalanceload ,0.2 50 100 pS closest to real life parallel programs Processor 1 time 100
  • 7. A Glimpse at the Top 500 (Nov 2019 List) Rmax: maximum achieved Rpeak: theoretical maximum
  • 8. Sources of Parallel Overheads • Overhead of creating threads/processes • Synchronization • Load imbalance • Communication • Extra computation • Memory access (for both sequential and parallel!)
  • 9. Efficiency of a parallel program Copyright © 2010, Elsevier Inc. All rights Reserved E = Tserial Tparallel Speedup P = p P = can be number of threads, processes, or cores
  • 10. Be Careful about T • Both Tserial and Tpar are wall-clock times, and as such they are not objective. They can be influenced by : – The skill of the programmer who wrote the implementations – The choice of compiler (e.g. GNU C++ versus Intel C++) – The compiler switches (e.g. turning optimization on/off) – The operating system – The type of filesystem holding the input data (e.g. EXT4, NTFS, etc.) – The time of day... (different workloads, network traffic, etc.)
  • 11. Speedup Copyright © 2010, Elsevier Inc. All rights Reserved Program: Matrix Vector Multiplication Graph shows how speedup is affected by the number of processes and problem size.
  • 12. Efficiency Copyright © 2010, Elsevier Inc. All rights Reserved Program: Matrix Vector Multiplication
  • 13. Scalability  Scalability is the ability of a (software or hardware) system to handle a growing amount of work efficiently. • If we keep the efficiency fixed by increasing the number of processes/threads and without increasing problem size, the problem is strongly scalable. • If we keep the efficiency fixed by increasing the problem size at the same rate as we increase the number of processes/threads, the problem is weakly scalable. Copyright © 2010, Elsevier Inc. All rights Reserved
  • 14. Let’s take a closer look at timing.
  • 15. Taking Timings • What is time? • Start to finish? • A program segment of interest? • CPU time? • Wall clock time? Copyright © 2010, Elsevier Inc. All rights Reserved
  • 16. • Elapsed Time (aka wall-clock time) – counts everything (disk and memory accesses, I/O , etc.) – a useful number, but often not good for comparison purposes • CPU time – doesn't count I/O or time spent running other programs – can be broken up into system time, and user time • Our focus: user CPU time – time spent executing the lines of code that are "in" our program Execution Time
  • 17. Execution Time (Elapsed Time) I/O Time Disk and Memory timeCPU Time User CPU Time System CPU Time
  • 18. Taking Timings In Linux: time prog Returns real Xs user Ys sys Zs Inside your C program: clock_t clock(void)returns the number of clock ticks elapsed since the program started #include <time.h> #include <stdio.h> int main() { clock_t start, end, total; int i; start = clock(); for(i=0; i< 10000000; i++) { } end = clock(); total= (double)(end – start) / CLOCKS_PER_SEC; printf("Total time taken by CPU: %fn", total); }
  • 19. Let’s Look at Two Simple Metrics • Response time (aka Execution Time) – The time between the start and completion of a task • Throughput – Total amount of work done in a given time What is the relationship between execution time and throughput?
  • 20. Timing for sequential programs IPC MIPS CPI Execution Time # insts
  • 21. seconds program  cycles program  seconds cycle ET IC * CPI CT ET = IC X CPI X CT ET = Execution Time CPI = Cycles Per Instruction IC = Instruction Count Execution time for sequential program:
  • 22. Example A program runs in 10 seconds on computer A, which has a 4 GHz. clock. We are trying to help a computer designer build a new machine B, that will run this program in 6 seconds. The designer can use new (or perhaps more expensive) technology to substantially increase the clock rate, but has informed us that this increase will affect the rest of the CPU design, causing machine B to require 1.2 times as many clock cycles as machine A for the same program. What clock rate should we tell the designer to target?“
  • 23. CPI Example • Suppose we have two implementations of the same instruction set architecture (ISA). For some program, Machine A has a clock cycle time of 250 ps and a CPI of 2.0 Machine B has a clock cycle time of 500 ps and a CPI of 1.2 What machine is faster for this program, and by how much? [ 10-3 = milli, 10-6 = micro, 10-9 = nano, 10-12 = pico, 10-15 = femto ]
  • 24. #Instructions Example • A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively). The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. Which sequence will be faster? How much? What is the CPI for each sequence?
  • 25. MIPS Example • Two different compilers are being tested for a 4 GHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software. The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions. The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions. • Which sequence will be faster according to MIPS? • Which sequence will be faster according to execution time?
  • 26. Pitfalls in timing in Parallel Machines
  • 27. For Multithreaded Programs • You need to decide: Shall we use execution time or throughput? or both? • IPC (Instructions per Cycle) is not accurate here – small timing variations may lead to different execution time – Order at which threads enter critical section may vary – Different interrupt timing may lead to different scheduling decisions The total number of instructions executed may be different across different runs!
  • 28. For Multithreaded Programs The total number of instructions executed may be different across different runs! This effect increases with the number of cores System-level code account for a significant fraction of the total execution time
  • 29. Your Program Does Not Run in A Vacuum • OS at least is there. • Multi-programming and/or mulithreading setting is very common in multicore settings • Independent programs affect each other performance (why?)
  • 30. How to check the performance of a parallel machine?
  • 31. • Performance best determined by running a real application – Use programs typical of expected workload – Or, typical of expected class of applications – e.g., compilers/editors, scientific applications, graphics, etc. • Small benchmarks – nice for architects and designers – easy to standardize • Parallel Benchmarks: PARSEC, Rodinia, SPLASH-2 • SPEC (System Performance Evaluation Cooperative) – companies have agreed on a set of real program and inputs – valuable indicator of performance (and compiler technology) Benchmarks
  • 32. Role of Benchmarks • help designer explore architectural designs • identify bottlenecks • compare different systems • conduct performance prediction
  • 33. Example: PARSEC • Princeton Application Repository for Shared-Memory Computers • Benchmark Suite for Chip-Multiprocessors • Freely available at: http://parsec.cs.princeton.edu/ • Objectives: – Multithreaded Applications: Future programs must run on multiprocessors – Emerging Workloads: Increasing CPU performance enables new applications – Diverse: Multiprocessors are being used for more and more tasks – State-of-Art Techniques: Algorithms and programming techniques evolve rapidly
  • 34. Example: PARSEC Program Application Domain Parallelization Blackscholes Financial Analysis Data-parallel Bodytrack Computer Vision Data-parallel Canneal Engineering Unstructured Dedup Enterprise Storage Pipeline Facesim Animation Data-parallel Ferret Similarity Search Pipeline Fluidanimate Animation Data-parallel Freqmine Data Mining Data-parallel Streamcluster Data Mining Data-parallel Swaptions Financial Analysis Data-parallel Vips Media Processing Data-parallel X264 Media Processing Pipeline
  • 35. Example: Rodinia • A Benchmark Suite for Heterogeneous Computing: multicore CPU and GPU • University of Virginia
  • 36. Conclusions • Performance evaluation is very important to assess programming quality as well as the underlying architecture and how they interact. • The following capture some aspects of the system but do not represent overall performance: MIPS, #instructions, #cycles, frequency • Execution time is what matters: system time, CPU time, I/O and memory time – To know whether your execution time is good, you need to compare it with sequential code, another parallel code, etc. • Scalability and efficiency measure the quality of your code. Powered by TCPDF (www.tcpdf.org)Powered by TCPDF (www.tcpdf.org)