Parallel Computing
   Lecture # 3




                     1
Course Material
Text Books:
- Computer Architecture & Parallel Processing
  Kai Hwang, Faye A. Briggs
- Advanced Computer Architecture
  Kai Hwang.
Reference Book:
- Scalable Computer Architecture

                                                2
What is Parallel Processing?
It is an efficient form of information processing
which emphasizes the exploitation of concurrent
events in the computing process.

Efficiency is measured as:-
  Efficiency = Time / Speed + Accuracy

            * Always first classify definitions, then give properties.   3
Types of Concurrent Events
There are 3 types of concurrent events:-
1. Parallel Event or Synchronous Event :-
   (Type of concurrency is parallelism)
   It may occur in multiple resources during
   the same time interval.
  Example
             Array/Vector Processors
             (a single CU drives multiple ALU-based PEs: CU -> PE, PE, PE)

                                                         4
2. Simultaneous Event or Asynchronous Event :-
  (Type of concurrency is simultaneity)

  It may occur in multiple resources at the
  same time instant.
  Example
               Multiprocessing System

3. Pipelined Event or Overlapped Event :-
  It may occur in overlapped time spans.
  Example
                Pipelined Processor
                                           5
System Attributes versus
  Performance Factors
The ideal performance of a computer system
requires a perfect match between machine
capability and program behavior.
Machine capability can be enhanced with better
hardware technology; however, program behavior
is difficult to predict due to its dependence on the
application and run-time conditions.
Below are the five fundamental factors for
projecting the performance of a computer.
                                               6
1. Clock Rate :- The CPU is driven by a clock
   of constant cycle time (τ).
                 τ = 1 / f   (ns)

2. CPI :- (Cycles per instruction)
    As different instructions require different
numbers of cycles to execute, CPI is taken as an
average value over a given instruction set and a
given program mix.

                                            7
3. Execution Time :- Let Ic be Instruction
Count or total number of instructions in the
program. So
        Execution Time = ?
            T = Ic × CPI ×   τ
Now,
CPI = Instruction Cycle = Processor Cycles +
                          Memory Cycles
   ∴ Instruction cycle = p + m × k
where
      m = number of memory references
                                         8
   p = number of processor cycles
   k = latency factor (how much slower the
       memory is relative to the CPU)

Now let C be Total number of cycles required
to execute a program.
So,                C=?
              C = Ic × CPI
And the time to execute a program will be

               T=C×τ
                                             9
4. MIPS Rate :-
         MIPS rate = Ic / (T × 10^6)

5. Throughput Rate :- Number of programs
executed per unit time.
                W = ?
                W = 1 / T
                    OR
                W = (MIPS × 10^6) / Ic

                                          10
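
The relations above can be checked with a short Python sketch (a minimal
illustration only; the function name and the sample values are assumptions,
not taken from the lecture):

# Minimal sketch of the performance factors T, MIPS and W.
def performance(Ic, CPI, f_hz):
    tau = 1.0 / f_hz                 # clock cycle time in seconds
    T = Ic * CPI * tau               # execution time: T = Ic x CPI x tau
    mips = Ic / (T * 1e6)            # MIPS rate = Ic / (T x 10^6)
    W = 1.0 / T                      # throughput: programs executed per second
    assert abs(W - mips * 1e6 / Ic) < 1e-6   # W = (MIPS x 10^6) / Ic
    return T, mips, W

# Example with assumed values: 100,000 instructions, CPI = 1.55, 40 MHz clock.
T, mips, W = performance(Ic=100_000, CPI=1.55, f_hz=40e6)
print(T, mips, W)    # ~0.003875 s, ~25.8 MIPS, ~258 programs/s
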
Numerical:- A benchmark program is
 executed on a 40MHz processor. The
 benchmark program has the following
 statistics.
Instruction Type Instruction Count Clock Cycle Count
 Arithmetic           45000               1
 Branch               32000               2
 Load/Store           15000               2
 Floating Point        8000               2

 Calculate the average CPI, MIPS rate & execution
 time for the above benchmark program.
                                              11
Average CPI = C / Ic
C  = total # of cycles to execute the whole program
Ic = total # of instructions
CPI = (45000×1 + 32000×2 + 15000×2 + 8000×2) /
      (45000 + 32000 + 15000 + 8000)
    = 155000 / 100000
CPI = 1.55
         Execution Time = C / f
                                        12
T = 155000 / (40 × 10^6)
T = 0.155 / 40 s
T = 3.875 ms

           MIPS rate = Ic / (T × 10^6)
                     = 100000 / (3.875 × 10^-3 × 10^6)
           MIPS rate = 25.8

                                         13
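
A short Python check of this numerical (a sketch; the dictionary simply
mirrors the instruction table two slides back):

# Benchmark on a 40 MHz processor (counts and cycle counts from the table).
counts = {"Arithmetic": (45000, 1), "Branch": (32000, 2),
          "Load/Store": (15000, 2), "Floating Point": (8000, 2)}
f_hz = 40e6

Ic  = sum(n for n, _ in counts.values())            # 100000 instructions
C   = sum(n * cyc for n, cyc in counts.values())    # 155000 cycles
CPI = C / Ic                                        # 1.55
T   = C / f_hz                                      # 3.875e-3 s = 3.875 ms
mips = Ic / (T * 1e6)                               # 25.8 MIPS
print(CPI, T * 1e3, mips)
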
System Attributes versus Performance Factors

                                  |        Performance Factors
System Attribute                  |  Ic  |  p  |  m  |  k  |  τ
----------------------------------+------+-----+-----+-----+-----
Instruction-set Architecture      |      |     |     |     |
Compiler Technology               |      |     |     |     |
CPU Implementation & Technology   |      |     |     |     |
Memory Hierarchy                  |      |     |     |     |
 (p, m and k together determine the CPI)
                                          14
Practice Problems :-
1. Do problem number 1.4 from the book
   Advanced Computer Architecture by Kai
   Hwang.
2. A benchmark program containing 234,000
   instructions is executed on a processor
   having a cycle time of 0.15 ns. The statistics
   of the program are given below.
     Each memory reference requires 3 CPU
   cycles to complete. Calculate the MIPS rate &
   throughput for the program.
                                            15
Instruction    Instruction   Processor   Memory
   Type            Mix        Cycles     Cycles
Arithmetic        58 %           2          2
Branch            33 %           3          1
Load/Store         9 %           3          2



                                          16
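
A sketch of how this practice problem can be set up (illustrative only; the
interpretation that the last column counts memory references per instruction,
each costing k = 3 CPU cycles as stated in the problem, is an assumption):

# Practice problem 2 setup: CPI_i = p + m x k, averaged over the mix.
Ic  = 234_000
tau = 0.15e-9                      # cycle time in seconds
k   = 3                            # CPU cycles per memory reference (given)

mix = {                            # fraction, processor cycles p, memory refs m
    "Arithmetic": (0.58, 2, 2),
    "Branch":     (0.33, 3, 1),
    "Load/Store": (0.09, 3, 2),
}

CPI = sum(frac * (p + m * k) for frac, p, m in mix.values())
T   = Ic * CPI * tau               # execution time
mips = Ic / (T * 1e6)
W    = 1.0 / T                     # throughput: programs per second
print(CPI, T, mips, W)
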
Programmatic Levels of
       Parallel Processing
Parallel processing can be pursued at 4
programmatic levels:-
1. Job / Program Level
2. Task / Procedure Level
3. Interinstruction Level
4. Intrainstruction Level
                                             17
1. Job / Program Level :-
                                It requires the
development of parallel processable
algorithms. The implementation of parallel
algorithms depends on the efficient allocation
of limited hardware and software resources to
multiple programs being used to solve a large
computational problem.
Example: weather forecasting, medical
consulting, oil exploration, etc.

                                           18
2. Task / Procedure Level :-
                               It is conducted
among procedures/tasks within the same
program. This involves the decomposition of
the program into multiple tasks
( for simultaneous execution ).
3. Interinstruction Level :-
                               Interinstruction
level is to exploit concurrency among
multiple instructions so that they can be
executed simultaneously. Data dependency
analysis is often performed to reveal
                                           19
parallelism among instructions. Vectorization
may be desired for scalar operations within DO
loops.

4. Intrainstruction Level :-
                                Intrainstruction
level exploits faster and concurrent
operations within each instruction, e.g. use of
carry-lookahead and carry-save adders
instead of ripple-carry adders.


                                            20
Key Points :-
1. Hardware role increases from high to low
   levels whereas software role increases from
   low to high levels.
2. The highest (job) level is conducted largely
   algorithmically, while the lowest level is
   implemented directly by hardware means.
3. The trade-off between hardware and
   software approaches to solve a problem is
   always a very controversial issue.
                                          21
4. As hardware cost declines and software
  cost increases, more and more hardware
  methods are replacing the conventional
  software approaches.
Conclusion :-
                        Parallel Processing is a
combined field of studies which requires a
broad knowledge of and experience with all
aspects of algorithms, languages, hardware,
software,   performance       evaluation    and
computing alternatives.
                                           22
Parallel Processing in
      Uniprocessor Systems
A number of parallel processing mechanisms
have been developed in uniprocessor
computers. We identify them in six categories
which are described below.
1. Multiplicity of Functional Units :-
 Different ALU functions can be distributed to
 multiple & specialized functional units which
 can operate in parallel.
                                          23
The CDC-6600 has 10 functional units built into
its CPU.
The IBM 360/91 has two parallel execution units:
a fixed-point unit and a floating-point unit, the
latter split into separate add/sub and mul/div
functional units.




                                              24
2. Parallelism & Pipelining within the CPU :-
 Use of carry-lookahead & carry-save adders
 instead of ripple-carry adders.




 Cascade two 4-bit parallel adders to create an 8-bit parallel adder.
                                                                        25
Ripple-carry Adder :-
At each stage the sum bit is not valid until
after the carry bits in all the preceding stages
are valid.
The time required for a valid addition is
directly proportional to the number of bits.
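
To make this structure concrete, here is a minimal bit-level sketch in Python
(illustrative only; the function names full_adder and ripple_carry_add are
chosen here, not taken from the lecture). Each stage is a full adder whose
carry-out feeds the next stage, exactly the chain the cascaded 4-bit adders form:

# Minimal ripple-carry adder sketch: each full adder consumes the previous carry.
def full_adder(a, b, cin):
    s = a ^ b ^ cin                          # sum bit
    cout = (a & b) | (a & cin) | (b & cin)   # carry-out
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); returns sum bits and final carry."""
    out, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples stage by stage
        out.append(s)
    return out, carry

# 8-bit example: 200 + 100 = 300 (LSB-first bit lists).
to_bits = lambda x, n=8: [(x >> i) & 1 for i in range(n)]
s, c = ripple_carry_add(to_bits(200), to_bits(100))
assert sum(b << i for i, b in enumerate(s)) + (c << 8) == 300
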
Problem :- The time required to generate each
carryout bit in the 8-bit parallel adder is 24ns.
Once all inputs to an adder are valid, there is a
delay of 32ns until the output sum bit is valid.
What is the maximum number of additions per
                                            26
second that the adder can perform?
          1 addition = 7 × 24 + 32
                     = 200 ns
 Additions / sec = 1 / (200 ns)
                 = 5 × 10^-3 × 10^9
                 = 5 × 10^6
                 = 5 million additions / sec



                                               27
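
A small sketch of the timing argument (assumed model from the problem: the
most significant sum bit waits for 7 carry propagations of 24 ns each, then
a 32 ns sum delay; the function name is hypothetical):

# Ripple-carry timing model for an n-bit adder.
def ripple_add_time(n_bits, carry_delay_ns, sum_delay_ns):
    # The last sum bit is valid only after (n-1) carries have rippled through.
    return (n_bits - 1) * carry_delay_ns + sum_delay_ns

t = ripple_add_time(8, carry_delay_ns=24, sum_delay_ns=32)   # 200 ns
adds_per_sec = 1.0 / (t * 1e-9)                              # 5 x 10^6
print(t, adds_per_sec)    # 200 ns -> 5 million additions per second

The same model can be rearranged to answer the practice problem on the next
slide (solve for the allowed carry delay given a target additions/sec).
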
Practice Problem
Assuming a 32 ns delay in producing a valid
sum bit in the 8-bit parallel adder, what
maximum delay in generating a carry-out bit is
allowed if the adder must be capable of
performing 10^7 additions per second?




                                         28
Carry-Lookahead Adder :-




  A 4-bit parallel adder incorporating carry look-ahead. Each full adder
  is of the type shown in fig.
                                                                           29
Essence & Idea :-
To determine & generate the carry input bits
for all stages after examining the input bits
simultaneously.
           C1 = A0B0 + A0C0 + B0C0
              = A0B0 + ( A0 + B0 ) C0
           C2 = A1B1 + ( A1 + B1 ) C1
           ...
           Cn = An-1Bn-1 + ( An-1 + Bn-1 ) Cn-1
 (the first term in each equation is the carry generate term,
  the second is the carry propagate term)
                                                      30
If Ai and Bi are both 1, then Ci+1 = 1. This means
that the input data itself generates a carry; this
is called carry generate.
                 G0 = A0B0
                 G1 = A1B1
                 ...
                 Gn-1 = An-1Bn-1
Ci+1 can also be 1 if Ci = 1 and either Ai or Bi is 1;
the inputs are then used to propagate the
incoming carry. This is called carry propagate,
represented by P0 for stage 0.                31
              P0 = A0 + B0
              P1 = A1 + B1
              ...
              Pn-1 = An-1 + Bn-1
Now writing the carry equations in terms of
carry generate and carry propagate:
         C1 = G0 + P0C0
         C2 = G1 + P1C1
            = G1 + P1 ( G0 + P0C0 )
         C2 = G1 + P1G0 + P1P0C0
                                       32
         C3 = G2 + P2C2
            = G2 + P2 ( G1 + P1G0 + P1P0C0 )
         C3 = G2 + P2G1 + P2P1G0 + P2P1P0C0
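
These equations translate directly into a small Python sketch of a 4-bit
carry-lookahead adder (illustrative only; the name cla_add_4bit is chosen
here, not taken from the lecture). All carries are computed from the G and P
terms and C0 alone, rather than rippling:

# 4-bit carry-lookahead adder sketch built from the G/P equations above.
def cla_add_4bit(a, b, c0=0):
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [A[i] & B[i] for i in range(4)]        # carry generate: Gi = Ai.Bi
    P = [A[i] | B[i] for i in range(4)]        # carry propagate: Pi = Ai + Bi
    C = [c0, 0, 0, 0, 0]
    C[1] = G[0] | (P[0] & C[0])
    C[2] = G[1] | (P[1] & G[0]) | (P[1] & P[0] & C[0])
    C[3] = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & C[0])
    C[4] = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0]) \
         | (P[3] & P[2] & P[1] & P[0] & C[0])
    S = [A[i] ^ B[i] ^ C[i] for i in range(4)]  # sum bits use the precomputed carries
    return sum(s << i for i, s in enumerate(S)) | (C[4] << 4)

# Exhaustive check against ordinary integer addition.
assert all(cla_add_4bit(a, b) == a + b for a in range(16) for b in range(16))
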
Problem :- In each full adder of a 4-bit carry
look-ahead adder, there is a propagation delay
of 4ns before the carry propagate & carry
generate outputs are valid. The delay in each
external logic gate is 3ns. Once all inputs to an
adder are valid, there is a delay of 6ns before
the output sum bit is valid. What is the
maximum no. of additions/sec that the adder  33
can perform?

1 addition = 4 ns + 3 ns + 3 ns + 6 ns = 16 ns
 (4 ns for G/P, 3 ns for the AND gates, which
  operate in parallel, 3 ns for the OR gate in
  series, and 6 ns for the sum bit)

 Additions / sec = 1 / (16 ns)
                 = 62.5 × 10^-3 × 10^9
                 = 62.5 × 10^6
                 = 62.5 million additions / sec

                                            34
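
As a quick check, a sketch comparing the two timing models (the delays are the
assumed models from the two problems above; note they use different widths,
8-bit ripple versus 4-bit lookahead, so this illustrates the models rather
than a like-for-like comparison):

# Throughput under the two delay models assumed in the problems above.
ripple_time_ns = 7 * 24 + 32                 # 200 ns per addition (8-bit ripple)
cla_time_ns = 4 + 3 + 3 + 6                  # 16 ns per addition (4-bit lookahead)

for name, t in [("ripple-carry", ripple_time_ns), ("carry-lookahead", cla_time_ns)]:
    print(f"{name}: {t} ns -> {1e9 / t / 1e6:.1f} million additions/sec")
# ripple-carry: 200 ns -> 5.0 million additions/sec
# carry-lookahead: 16 ns -> 62.5 million additions/sec
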
3. Overlapping CPU & I/O Operations :-
 DMA is conducted on a cycle-stealing basis.
• CDC-6600 has 10 I/O processors for I/O
  multiprocessing.
• Simultaneous I/O operations & CPU
  computations can be achieved using
  separate I/O controllers, channels.
4. Use of Hierarchical Memory System :-
 A hierarchical memory system can be used to
close up the speed gap between the CPU &
                                           35
main memory, because the CPU is about 1000
times faster than a main-memory access.
5. Balancing of Subsystem Bandwidth :-
 Consider the relation
                  tp < tm < td
 (processor cycle < memory cycle < device access time)
Bandwidth of a System :-
Bandwidth of a system is defined as the
number of operations performed per unit time.
Bandwidth of a memory :-
The memory bandwidth is the number of words
                                        36
accessed per unit time. It is represented by Bm. If ‘W’ is the
total number of words accessed per memory cycle tm, then
                Bm = W / tm        (words / sec)
In case of an interleaved memory of M modules, memory
access conflicts may cause delayed access to some of the
processor's requests. The utilized memory bandwidth will be:
                 Bm^u = Bm / √M    (words / sec)
Processor Bandwidth :-
Bp :- maximum CPU computation rate.
Bp^u :- utilized processor bandwidth, or the no. of
                                            37
output results per second.
             Bp^u = Rw / Tp   (word results / sec)
Rw :- no. of word results.
Tp :- total CPU time to generate the Rw results.
Bd :- bandwidth of devices (which is assumed
      to be as provided by the vendor).
The following relationship has been
observed between the bandwidths of the major
subsystems in a high-performance
uniprocessor:
          Bm ≥ Bm^u ≥ Bp ≥ Bp^u ≥ Bd        38
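
A tiny sketch of these bandwidth definitions (the numeric values are
assumptions chosen only to illustrate the formulas, not from the lecture):

import math

# Bandwidth definitions from the slides, with assumed illustrative values.
W, tm, M = 4, 100e-9, 16          # words per memory cycle, memory cycle time, modules
Bm   = W / tm                     # memory bandwidth, words/sec
Bm_u = Bm / math.sqrt(M)          # utilized memory bandwidth with access conflicts

Rw, Tp = 1_000_000, 0.2           # word results and the CPU time to produce them
Bp_u = Rw / Tp                    # utilized processor bandwidth, results/sec

print(Bm, Bm_u, Bp_u)   # a balanced system keeps Bm >= Bm^u >= Bp >= Bp^u >= Bd
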
Due to these unbalanced speeds, we need to
match the processing power of the three
subsystems.
Two major approaches are described below :-
1. Bandwidth balancing b/w CPU & memory :-
 Using a fast cache having access time tc = tp.
2. Bandwidth balancing b/w memory & I/O :-
 Intelligent disk controllers can be used to filter
 out irrelevant data off the tracks. Buffering can
 be performed by I/O channels.                39
6a. Multiprogramming :-

Some computer programs are CPU-bound &
some are I/O-bound. Whenever a process P1 is
tied up with I/O operations, the system
scheduler can switch the CPU to process P2.
This allows simultaneous execution of several
programs in the system. This interleaving of
CPU & I/O operations among several programs
is called multiprogramming, and it reduces the
total execution time.                       40
6b. Time Sharing :-
In multiprogramming, sometimes a high-
priority program may occupy the CPU for too
long to allow others to share. This problem can
be overcome by using a time-sharing operating
system. The concept extends multiprogramming
by assigning fixed or variable time slices to
multiple programs. In other words, equal
opportunities are given to all programs
competing for use of the CPU.
    Time sharing is particularly effective when
applied to a computer system connected to
many interactive terminals.                 41
