1a.1
Parallel Computing and
Parallel Computers
ITCS 4/5145 Parallel Programming UNC-Charlotte, B. Wilkinson, 2012. Jan 9, 2012
1a.2
Parallel Computing
• Using more than one computer, or a computer with
more than one processor, to solve a problem.
Motives
• Usually faster computation.
• Very simple idea
– n computers operating simultaneously can achieve
the result faster
– it will not be n times faster for various reasons
• Other motives include: fault tolerance, larger amount
of memory available, ...
1a.3
Demand for Computational Speed
• Continual demand for greater computational
speed from a computer system than is
currently possible
• Areas requiring great computational speed
include:
– Numerical modeling
– Simulation of scientific and engineering problems.
• Computations need to be completed within a
“reasonable” time period.
1a.4
“Grand Challenge” Problems
Ones that cannot be solved in a reasonable
amount of time with today’s computers.
Obviously, an execution time of 10 years is
always unreasonable.
Grand Challenge Problem Examples
• Modeling large DNA structures
• Global weather forecasting
• Modeling motion of astronomical bodies.
1a.5
Weather Forecasting
• Atmosphere modeled by dividing it into 3-dimensional cells.
• Calculations for each cell repeated many times to model passage of time.
• Quantities computed for each cell: temperature, pressure, humidity, etc.
1a.6
Global Weather Forecasting Example
• Suppose whole global atmosphere divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles (10 cells high) - about $5 \times 10^{8}$ cells.
• Suppose each calculation requires 200 floating point operations. In one time step, $10^{11}$ floating point operations necessary.
• To forecast weather over 7 days using 1-minute intervals (about $10^{4}$ time steps), a computer operating at 1 Gflops ($10^{9}$ floating point operations/s) takes $10^{6}$ seconds or over 10 days.
• To perform calculation in 5 minutes requires computer operating at 3.4 Tflops ($3.4 \times 10^{12}$ floating point operations/s)
• Needs to be about 3,400 times faster.
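The arithmetic in this example can be checked with a few lines of Python. This is only a sketch built from the figures stated on the slide; the variable and function names are illustrative.

```python
# Back-of-envelope check of the global weather forecasting estimate.
# Every input below is one of the slide's stated assumptions.
CELLS = 5e8              # 1-mile cubes over the globe, 10 cells high
FLOPS_PER_CELL = 200     # floating point operations per cell per time step
FORECAST_DAYS = 7
STEP_MINUTES = 1

flops_per_step = CELLS * FLOPS_PER_CELL                 # 1e11
time_steps = FORECAST_DAYS * 24 * 60 // STEP_MINUTES    # 10080, about 1e4
total_flops = flops_per_step * time_steps               # about 1e15


def runtime_seconds(machine_flops):
    """Time to finish the whole forecast on a machine of the given speed."""
    return total_flops / machine_flops


def required_flops(deadline_seconds):
    """Machine speed needed to finish the forecast within the deadline."""
    return total_flops / deadline_seconds


days_at_1_gflops = runtime_seconds(1e9) / 86400   # seconds -> days
needed = required_flops(5 * 60)                    # 5-minute deadline
factor = needed / 1e9                              # relative to 1 Gflops

print(f"1 Gflops machine: {days_at_1_gflops:.1f} days")
print(f"5-minute forecast needs {needed:.2e} flops ({factor:.0f}x faster)")
```

The required rate works out to roughly 3.4 Tflops, which is a few thousand times the 1 Gflops machine.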
1a.7
Modeling Motion of Astronomical Bodies
Each body attracted to each other body by
gravitational forces. Movement of each body
predicted by calculating total force on each body.
1a.8
Modeling Motion of Astronomical Bodies
• With N bodies, N - 1 forces to calculate for each body, or approx. $N^2$ calculations, i.e. $O(N^2)$ *
• After determining new positions of bodies, calculations repeated, i.e. $N^2 \times T$ calculations where T is the number of time steps.
* There is an $O(N \log_2 N)$ algorithm, which we will cover in the course
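The direct method described above can be sketched as a double loop over bodies. This is a minimal illustration, not the course's implementation: it assumes point masses in three dimensions, Newton's law of gravitation in SI units, and no two bodies at the same position. The function name `total_forces` is hypothetical.

```python
import math

G = 6.674e-11  # gravitational constant in SI units


def total_forces(masses, positions):
    """Net gravitational force vector on each body, direct O(N^2) method.

    masses: list of N masses; positions: list of N [x, y, z] coordinates.
    Assumes point masses and that no two bodies share a position.
    """
    n = len(masses)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):          # N - 1 force evaluations per body
            if i == j:
                continue
            # Vector from body i to body j and its length
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            magnitude = G * masses[i] * masses[j] / (r * r)
            for k in range(3):
                forces[i][k] += magnitude * d[k] / r
    return forces


# Two unit masses one metre apart attract each other along the x axis.
forces = total_forces([1.0, 1.0], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(forces)
```

Each time step repeats this computation after the positions are updated, which is where the $N^2 \times T$ total comes from.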
1a.9
• A galaxy might have, say, $10^{11}$ stars.
• Even if each calculation done in 1 µs (extremely optimistic figure), it takes:
– $10^{9}$ years for one iteration using $N^2$ algorithm
or
– Almost a year for one iteration using the $N \log_2 N$ algorithm assuming the calculations take the same time (which may not be true).
1a.10
Astrophysical N-body simulation by Scott Linssen (undergraduate
UNC-Charlotte student).
1a.11
Parallel programming
Programming parallel computers
– Has been around for more than 50 years.
1a.12
Gill writes in 1958:
“... There is therefore nothing new in the idea of parallel
programming, but its application to computers. The
author cannot believe that there will be any insuperable
difficulty in extending it to computers. It is not to be
expected that the necessary programming techniques will
be worked out overnight. Much experimenting remains to
be done. After all, the techniques that are commonly
used in programming today were only won at the cost of
considerable toil several years ago. In fact the advent of
parallel programming may do something to revive the
pioneering spirit in programming which seems at the
present to be degenerating into a rather dull and routine
occupation ...”
Gill, S. (1958), “Parallel Programming,” The Computer Journal, vol. 1, April, pp. 2-10.
1a.13
Potential for parallel computers/parallel programming
1a.14
Speedup Factor
$$S(p) = \frac{\text{Execution time using one processor (best sequential algorithm)}}{\text{Execution time using a multiprocessor with } p \text{ processors}} = \frac{t_s}{t_p}$$
where $t_s$ is execution time on a single processor and $t_p$ is execution time on a multiprocessor.
S(p) gives increase in speed by using multiprocessor.
Typically use best sequential algorithm with single processor system. Underlying algorithm for parallel implementation might be (and is usually) different.
1a.15
Speedup factor can also be cast in terms of computational steps:
$$S(p) = \frac{\text{Number of computational steps using one processor}}{\text{Number of parallel computational steps with } p \text{ processors}}$$
Can also extend time complexity to parallel computations.
1a.16
Maximum Speedup
Maximum speedup usually p with p processors
(linear speedup).
Possible to get superlinear speedup (greater
than p) but usually a specific reason such as:
• Extra memory in multiprocessor system
• Nondeterministic algorithm
1a.17
Maximum Speedup
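Amdahl’s law
[Figure: (a) One processor: serial section $f t_s$ followed by parallelizable sections $(1 - f) t_s$, total time $t_s$. (b) Multiple processors (p processors): serial section $f t_s$ followed by parallel part $(1 - f) t_s / p$, total time $t_p$.]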
1a.18
Speedup factor is given by:
$$S(p) = \frac{t_s}{f t_s + (1 - f) t_s / p} = \frac{p}{1 + (p - 1) f}$$
This equation is known as Amdahl’s law
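Amdahl's law is easy to tabulate. The sketch below evaluates the closed form for a few serial fractions and processor counts; the function name `amdahl_speedup` and the chosen values of p and f are illustrative.

```python
def amdahl_speedup(p, f):
    """Speedup predicted by Amdahl's law.

    p: number of processors; f: fraction of the single-processor
    computation that is serial (cannot be parallelized), 0 <= f <= 1.
    """
    return p / (1 + (p - 1) * f)


# Tabulate speedup for several serial fractions and processor counts.
processor_counts = (4, 8, 12, 16, 20)
for f in (0.0, 0.05, 0.10, 0.20):
    row = [round(amdahl_speedup(p, f), 2) for p in processor_counts]
    print(f"f = {f:4.0%}: {row}")
```

Only f = 0 gives linear speedup; any non-zero serial fraction makes the curve flatten as p grows.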
1a.19
Speedup against number of processors
[Figure: speedup S(p) plotted against number of processors, p (both axes from 4 to 20), with curves for f = 0%, 5%, 10% and 20%.]
1a.20
Even with infinite number of processors, maximum
speedup limited to 1/f.
Example
With only 5% of computation being serial, maximum
speedup is 20, irrespective of number of processors.
This is a very discouraging result.
Amdahl used this argument to support the design of
ultra-high speed single processor systems in the
1960s.
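The 1/f bound can be read off Amdahl's law directly. A short derivation, dividing numerator and denominator by p:

$$S(p) = \frac{p}{1 + (p - 1) f} = \frac{1}{\frac{1}{p} + \left(1 - \frac{1}{p}\right) f} \;\longrightarrow\; \frac{1}{f} \quad \text{as } p \to \infty$$

With f = 0.05 this gives 1/0.05 = 20, the figure in the example.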
Gustafson’s law
Later, Gustafson (1988) described how the
conclusion of Amdahl’s law might be overcome by
considering the effect of increasing the problem
size.
He argued that when a problem is ported onto a
multiprocessor system, larger problem sizes can
be considered, that is, the same problem but with a
larger number of data values.
1a.21
Gustafson’s law
Starting point for Gustafson’s law is the computation
on the multiprocessor rather than on the single
computer.
In Gustafson’s analysis, parallel execution time kept
constant, which we assume to be some acceptable
time for waiting for the solution.
1a.22
Gustafson’s law
Parallel computation composed of fraction computed sequentially, say $f'$, and fraction that contains parallel parts, $1 - f'$.
Gustafson’s so-called scaled speedup factor given by:
$$S'(p) = \frac{f' t_p + (1 - f') p \, t_p}{t_p} = p + (1 - p) f'$$
$f'$ is fraction of computation on multiprocessor that cannot be parallelized.
$f'$ is different to f previously, which is fraction of computation on a single computer that cannot be parallelized.
Conclusion drawn from Gustafson’s law is almost linear increase in speedup with increasing number of processors, but the fractional part $f'$ needs to remain small.
1a.23
Gustafson’s law
For example if $f'$ is 5%, the scaled speedup computes to 19.05 with 20 processors whereas with Amdahl’s law with f = 5% the speedup computes to 10.26.
Gustafson quotes results obtained in practice of very
high speedup close to linear on a 1024-processor
hypercube.
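The two figures in this example can be reproduced side by side. This is a small sketch of the two formulas from the slides; the function names are illustrative.

```python
def gustafson_scaled_speedup(p, f_prime):
    """Scaled speedup: f_prime is the serial fraction measured on the
    multiprocessor, with parallel execution time held constant."""
    return p + (1 - p) * f_prime


def amdahl_speedup(p, f):
    """Amdahl's law: f is the serial fraction measured on a single
    processor, with the problem size held constant."""
    return p / (1 + (p - 1) * f)


p = 20
print(f"Gustafson, f' = 5%: {gustafson_scaled_speedup(p, 0.05):.2f}")
print(f"Amdahl,    f  = 5%: {amdahl_speedup(p, 0.05):.2f}")
```

The two fractions are not the same quantity, so the comparison shows how the choice of starting point changes the conclusion rather than which law is right.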
1a.24
1a.25
Superlinear Speedup example
- Searching
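(a) Searching each sub-space sequentially
[Figure: timeline of the sequential search. Total search time $t_s$ is split into p sub-space searches of $t_s/p$ each; the solution is found at time $x \times t_s/p + \Delta t$ after the start, where x is indeterminate.]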
1a.26
(b) Searching each sub-space in parallel
[Figure: all sub-spaces searched simultaneously; solution found at time $\Delta t$.]
1a.27
Speed-up then given by
$$S(p) = \frac{x \times \frac{t_s}{p} + \Delta t}{\Delta t}$$
1a.28
Worst case for sequential search when solution found in last sub-space search. Then parallel version offers greatest benefit, i.e.
$$S(p) = \frac{\frac{p - 1}{p} \times t_s + \Delta t}{\Delta t} \to \infty$$
as $\Delta t$ tends to zero
1a.29
Least advantage for parallel version when solution found in first sub-space search of the sequential search, i.e.
$$S(p) = \frac{\Delta t}{\Delta t} = 1$$
Actual speed-up depends upon which subspace holds solution but could be extremely large.
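The best and worst cases above can be explored numerically. This sketch evaluates the search speed-up formula from the slides; the function name `search_speedup` and the sample timings are hypothetical.

```python
def search_speedup(x, p, t_s, dt):
    """Speed-up of the parallel search over the sequential search.

    x:   number of sub-spaces the sequential search exhausts before it
         reaches the one holding the solution (0 <= x <= p - 1)
    p:   number of sub-spaces, one per processor
    t_s: time to search the whole space sequentially
    dt:  time to find the solution within its own sub-space
    """
    sequential_time = x * t_s / p + dt
    parallel_time = dt   # the processor holding the solution finishes first
    return sequential_time / parallel_time


p, t_s, dt = 4, 100.0, 1.0
for x in range(p):
    print(f"solution in sub-space {x + 1}: S(p) = {search_speedup(x, p, t_s, dt):.1f}")
```

With these sample values the speed-up exceeds p whenever the solution is not in the first sub-space, which is the superlinear effect the example illustrates.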
1a.30
• Next question to answer is how does
one construct a computer system with
multiple processors to achieve the
speed-up?

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Parallel programming

  • 1. 1a.1 Parallel Computing and Parallel Computers ITCS 4/5145 Parallel Programming UNC-Charlotte, B. Wilkinson, 2012. Jan 9, 2012
  • 2. 1a.2 Parallel Computing • Using more than one computer, or a computer with more than one processor, to solve a problem. Motives • Usually faster computation. • Very simple idea – n computers operating simultaneously can achieve the result faster – it will not be n times faster for various reasons • Other motives include: fault tolerance, larger amount of memory available, ...
  • 3. 1a.3 Demand for Computational Speed • Continual demand for greater computational speed from a computer system than is currently possible • Areas requiring great computational speed include: – Numerical modeling – Simulation of scientific and engineering problems. • Computations need to be completed within a “reasonable” time period.
  • 4. 1a.4 “Grand Challenge” Problems Ones that cannot be solved in a reasonable amount of time with today’s computers. Obviously, an execution time of 10 years is always unreasonable. Grand Challenge Problem Examples • Modeling large DNA structures • Global weather forecasting • Modeling motion of astronomical bodies.
  • 5. 1a.5 Weather Forecasting • Atmosphere modeled by dividing it into 3-dimensional cells. • Calculations of each cell repeated many times to model passage of time. Temperature, pressure, humidity, etc.
  • 6. 1a.6 Global Weather Forecasting Example • Suppose whole global atmosphere divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles (10 cells high) - about $5 \times 10^8$ cells. • Suppose each calculation requires 200 floating point operations. In one time step, $10^{11}$ floating point operations necessary. • To forecast weather over 7 days using 1-minute intervals, a computer operating at 1 Gflops ($10^9$ floating point operations/s) takes $10^6$ seconds or over 10 days. • To perform calculation in 5 minutes requires computer operating at 3.4 Tflops ($3.4 \times 10^{12}$ floating point operations/sec) • Needs to be about 3,400 times faster.
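The arithmetic in the weather forecasting example can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constants are the slide's own figures, while the variable and function names are invented here for illustration.

```python
# Back-of-the-envelope check of the global weather forecasting example.
# The constants are the slide's figures; all names are illustrative only.

CELLS = 5e8              # about 5 x 10^8 cells of one cubic mile each
FLOPS_PER_CELL = 200     # floating point operations per cell per time step
FORECAST_DAYS = 7
STEP_MINUTES = 1

flops_per_step = CELLS * FLOPS_PER_CELL                  # 1e11 operations
time_steps = FORECAST_DAYS * 24 * 60 // STEP_MINUTES     # 10080 steps
total_flops = flops_per_step * time_steps                # about 1e15 operations


def runtime_seconds(machine_flops):
    """Time to finish the whole forecast on a machine of the given speed."""
    return total_flops / machine_flops


def required_flops(deadline_seconds):
    """Machine speed needed to finish the forecast within the deadline."""
    return total_flops / deadline_seconds


if __name__ == "__main__":
    days = runtime_seconds(1e9) / 86400
    needed = required_flops(5 * 60)
    print(f"At 1 Gflops: {days:.1f} days")
    print(f"For a 5-minute forecast: {needed / 1e12:.2f} Tflops")
    print(f"Ratio to 1 Gflops: {needed / 1e9:.0f}")
```

With these inputs the total work is about $10^{15}$ operations, so a 5-minute deadline needs roughly 3.4 Tflops, a few thousand times the 1 Gflops machine.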
  • 7. 1a.7 Modeling Motion of Astronomical Bodies Each body attracted to each other body by gravitational forces. Movement of each body predicted by calculating total force on each body.
  • 8. 1a.8 Modeling Motion of Astronomical Bodies • With $N$ bodies, $N - 1$ forces to calculate for each body, or approx. $N^2$ calculations, i.e. $O(N^2)$ * • After determining new positions of bodies, calculations repeated, i.e. $N^2 \times T$ calculations where $T$ is the number of time steps. * There is an $O(N \log_2 N)$ algorithm, which we will cover in the course
  • 9. 1a.9 • A galaxy might have, say, $10^{11}$ stars. • Even if each calculation done in 1 µs (extremely optimistic figure), it takes: • $10^9$ years for one iteration using $N^2$ algorithm or • Almost a year for one iteration using the $N \log_2 N$ algorithm assuming the calculations take the same time (which may not be true).
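The $N - 1$ forces per body can be made concrete with a direct-summation sketch. The code below is not taken from the course material. It assumes point masses in two dimensions and Newtonian gravity, and the function name `total_forces` is hypothetical. It counts the pairwise force evaluations for one time step so the $N(N - 1)$ growth is visible.

```python
import math

G = 6.674e-11  # gravitational constant in m^3 kg^-1 s^-2


def total_forces(bodies):
    """Direct O(N^2) summation of gravitational forces for one time step.

    bodies: list of (mass, x, y) tuples (2-D point masses, an assumption
    made here to keep the sketch short).
    Returns (forces, evaluations): forces[i] is the (fx, fy) acting on
    body i, and evaluations is the number of pairwise force calculations.
    """
    forces = []
    evaluations = 0
    for i, (mi, xi, yi) in enumerate(bodies):
        fx = fy = 0.0
        for j, (mj, xj, yj) in enumerate(bodies):
            if i == j:
                continue  # a body exerts no force on itself
            dx, dy = xj - xi, yj - yi
            r = math.hypot(dx, dy)
            f = G * mi * mj / (r * r)   # magnitude of the attraction
            fx += f * dx / r            # resolve along x
            fy += f * dy / r            # resolve along y
            evaluations += 1
        forces.append((fx, fy))
    return forces, evaluations


if __name__ == "__main__":
    sample = [(5.0e24, 0.0, 0.0), (7.0e22, 4.0e8, 0.0), (1.0e3, 0.0, 7.0e6)]
    forces, count = total_forces(sample)
    print(count)  # N * (N - 1) evaluations for N bodies
```

Each pair is evaluated twice here, once for each body. Exploiting the symmetry halves the work but leaves the $O(N^2)$ growth unchanged.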
  • 10. 1a.10 Astrophysical N-body simulation by Scott Linssen (undergraduate UNC-Charlotte student).
  • 11. 1a.11 Parallel programming Programming parallel computers – Has been around for more than 50 years.
  • 12. 1a.12 Gill writes in 1958: “... There is therefore nothing new in the idea of parallel programming, but its application to computers. The author cannot believe that there will be any insuperable difficulty in extending it to computers. It is not to be expected that the necessary programming techniques will be worked out overnight. Much experimenting remains to be done. After all, the techniques that are commonly used in programming today were only won at the cost of considerable toil several years ago. In fact the advent of parallel programming may do something to revive the pioneering spirit in programming which seems at the present to be degenerating into a rather dull and routine occupation ...” Gill, S. (1958), “Parallel Programming,” The Computer Journal, vol. 1, April, pp. 2-10.
  • 14. 1a.14 Speedup Factor $S(p) = \dfrac{\text{Execution time using one processor (best sequential algorithm)}}{\text{Execution time using a multiprocessor with } p \text{ processors}} = \dfrac{t_s}{t_p}$ where $t_s$ is execution time on a single processor and $t_p$ is execution time on a multiprocessor. $S(p)$ gives increase in speed by using multiprocessor. Typically use best sequential algorithm with single processor system. Underlying algorithm for parallel implementation might be (and is usually) different.
  • 15. 1a.15 Speedup factor can also be cast in terms of computational steps: $S(p) = \dfrac{\text{Number of computational steps using one processor}}{\text{Number of parallel computational steps with } p \text{ processors}}$ Can also extend time complexity to parallel computations.
  • 16. 1a.16 Maximum Speedup Maximum speedup usually p with p processors (linear speedup). Possible to get superlinear speedup (greater than p) but usually a specific reason such as: • Extra memory in multiprocessor system • Nondeterministic algorithm
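  • 17. 1a.17 Maximum Speedup Amdahl’s law [Figure: (a) One processor: serial section taking $f t_s$ followed by parallelizable sections taking $(1 - f) t_s$, total time $t_s$. (b) Multiple processors ($p$ processors): serial section taking $f t_s$ followed by parallel sections taking $(1 - f) t_s / p$, total time $t_p$.]
  • 18. 1a.18 Speedup factor is given by: $S(p) = \dfrac{t_s}{f t_s + (1 - f) t_s / p} = \dfrac{p}{1 + (p - 1) f}$ This equation is known as Amdahl’s law
  • 19. 1a.19 Speedup against number of processors [Figure: speedup $S(p)$ plotted against number of processors $p$, both axes from 4 to 20, with curves for f = 0%, f = 5%, f = 10% and f = 20%.]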
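The step from the timing expression to the closed form of Amdahl's law is short. Written out with the slide's symbols $t_s$, $f$ and $p$, dividing numerator and denominator by $t_s$ and then multiplying both by $p$ gives:

```latex
\begin{align*}
S(p) &= \frac{t_s}{f\,t_s + (1 - f)\,t_s / p}
      = \frac{1}{f + (1 - f)/p} \\
     &= \frac{p}{p f + 1 - f}
      = \frac{p}{1 + (p - 1) f},
\qquad
\lim_{p \to \infty} S(p) = \frac{1}{f}.
\end{align*}
```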
  • 20. 1a.20 Even with infinite number of processors, maximum speedup limited to 1/f. Example With only 5% of computation being serial, maximum speedup is 20, irrespective of number of processors. This is a very discouraging result. Amdahl used this argument to support the design of ultra-high speed single processor systems in the 1960s.
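The $1/f$ ceiling is easy to see numerically. The sketch below evaluates $S(p) = p / (1 + (p - 1) f)$ for a growing number of processors. The function names are chosen here for illustration and do not come from the slides.

```python
def amdahl_speedup(p, f):
    """Amdahl's law: speedup with p processors when a fraction f is serial."""
    return p / (1 + (p - 1) * f)


def amdahl_limit(f):
    """Upper bound on speedup as p grows without limit (requires f > 0)."""
    return 1 / f


if __name__ == "__main__":
    f = 0.05  # 5% of the computation is serial
    for p in (4, 16, 64, 1024, 10**6):
        print(f"p = {p:>7}: S(p) = {amdahl_speedup(p, f):.2f}")
    print(f"limit 1/f = {amdahl_limit(f):.0f}")
```

However large $p$ becomes, the printed speedups approach 20 but never exceed it.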
  • 21. 1a.21 Gustafson’s law Later, Gustafson (1988) described how the conclusion of Amdahl’s law might be overcome by considering the effect of increasing the problem size. He argued that when a problem is ported onto a multiprocessor system, larger problem sizes can be considered, that is, the same problem but with a larger number of data values.
  • 22. 1a.22 Gustafson’s law Starting point for Gustafson’s law is the computation on the multiprocessor rather than on the single computer. In Gustafson’s analysis, parallel execution time kept constant, which we assume to be some acceptable time for waiting for the solution.
  • 23. 1a.23 Gustafson’s law Parallel computation composed of fraction computed sequentially, say $f'$, and fraction that contains parallel parts, $1 - f'$. Gustafson’s so-called scaled speedup fraction given by: $S'(p) = \dfrac{f' t_p + (1 - f') p \, t_p}{t_p} = p + (1 - p) f'$ $f'$ is fraction of computation on multiprocessor that cannot be parallelized. $f'$ is different to $f$ previously, which is fraction of computation on a single computer that cannot be parallelized. Conclusion drawn from Gustafson’s law is almost linear increase in speedup with increasing number of processors, but the fractional part $f'$ needs to remain small.
  • 24. 1a.24 Gustafson’s law For example if $f'$ is 5%, the scaled speedup computes to 19.05 with 20 processors whereas with Amdahl’s law with $f$ = 5% the speedup computes to 10.26. Gustafson quotes results obtained in practice of very high speedup close to linear on a 1024-processor hypercube.
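The two figures quoted in this example can be reproduced directly from the formulas. This is a small sketch with illustrative function names. Note that the two serial fractions are measured on different systems (multiprocessor versus single computer), so the comparison shows the two models side by side, not the same workload.

```python
def gustafson_scaled_speedup(p, f_prime):
    """Gustafson's scaled speedup: S'(p) = p + (1 - p) * f'."""
    return p + (1 - p) * f_prime


def amdahl_speedup(p, f):
    """Amdahl's law: S(p) = p / (1 + (p - 1) * f)."""
    return p / (1 + (p - 1) * f)


if __name__ == "__main__":
    p = 20
    print(f"Gustafson, f' = 5%: {gustafson_scaled_speedup(p, 0.05):.2f}")
    print(f"Amdahl,    f  = 5%: {amdahl_speedup(p, 0.05):.2f}")
```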
  • 25. 1a.25 Superlinear Speedup example - Searching [Figure (a): Searching each sub-space sequentially. The search space is divided into $p$ sub-spaces, each taking $t_s/p$ to search. After $x$ complete sub-space searches (time $x \, t_s/p$ from the start), the solution is found a further $\Delta t$ into the next sub-space; $x$ is indeterminate.]
  • 26. 1a.26 [Figure (b): Searching each sub-space in parallel. Solution found after time $\Delta t$.]
  • 27. 1a.27 Speed-up then given by $S(p) = \dfrac{x \times \dfrac{t_s}{p} + \Delta t}{\Delta t}$
  • 28. 1a.28 Worst case for sequential search when solution found in last sub-space search. Then parallel version offers greatest benefit, i.e. $S(p) = \dfrac{\dfrac{p - 1}{p} \times t_s + \Delta t}{\Delta t} \to \infty$ as $\Delta t$ tends to zero
  • 29. 1a.29 Least advantage for parallel version when solution found in first sub-space search of the sequential search, i.e. $S(p) = \dfrac{\Delta t}{\Delta t} = 1$ Actual speed-up depends upon which subspace holds solution but could be extremely large.
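The best and worst cases of the search example can be tabulated from the speed-up formula $S(p) = (x \, t_s/p + \Delta t) / \Delta t$. The sketch below uses made-up timings ($t_s$ = 100 time units, $p$ = 4) only to show the shape of the result. The function name is illustrative, and `x` is the number of sub-spaces the sequential search finishes before reaching the one that holds the solution.

```python
def search_speedup(x, t_s, p, delta_t):
    """Speed-up of parallel over sequential sub-space search.

    x        number of sub-spaces searched in full before the solution's one
    t_s      time for a sequential search of the whole space
    p        number of sub-spaces (and processors)
    delta_t  time into its sub-space at which the solution is found
    """
    sequential_time = x * t_s / p + delta_t
    parallel_time = delta_t
    return sequential_time / parallel_time


if __name__ == "__main__":
    t_s, p = 100.0, 4  # hypothetical timings for illustration
    for delta_t in (10.0, 1.0, 0.1):
        best = search_speedup(0, t_s, p, delta_t)        # first sub-space
        worst = search_speedup(p - 1, t_s, p, delta_t)   # last sub-space
        print(f"delta_t = {delta_t:>4}: S = {best:.0f} (x = 0), "
              f"S = {worst:.0f} (x = p - 1)")
```

With only four processors, the speed-up in the worst sequential case already exceeds $p$ by a wide margin and keeps growing as $\Delta t$ shrinks, which is the superlinear effect the slides describe.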
  • 30. 1a.30 • Next question to answer is how does one construct a computer system with multiple processors to achieve the speed-up?