Unit I: Introduction

1. Fundamentals of Computer Design
Unit 1, Chapter 1
Reference: Computer Architecture: A Quantitative Approach, John L. Hennessy and David A. Patterson, 4th edition
2. Outline
 Introduction
 Computing Classes
 Task of Computer Designer
 Measuring, Reporting, Summarizing Performance
 Quantitative Principles of Design
3. Introduction
ENIAC, 1946 (one of the first general-purpose electronic computers)
 Occupied a 50 x 30 foot room, weighed 30 tonnes, contained 18,000 electronic valves, consumed 25 kW of electrical power; capable of performing 100K calculations per second
CHANGE!
PlayStation Portable (PSP)
 Approx. 170 mm (L) x 74 mm (W) x 23 mm (D); weight approx. 260 g (including battery)
 CPU: PSP CPU (clock frequency 1-333 MHz)
 Main memory: 32 MB; embedded DRAM: 4 MB
 Profile: PSP Game, UMD Audio, UMD Video
4. A Short History of Computing
 Continuous growth in performance due to advances in technology and innovations in computer design
 First 25 years (1945-1970)
   25% yearly growth in performance
   Both forces contributed to performance improvement
   Mainframes and minicomputers dominated the industry
 Late 70s, emergence of the microprocessor
   35% yearly growth in performance thanks to integrated circuit technology
   Changes in the computer marketplace: elimination of assembly language programming, emergence of Unix; easier to develop new architectures
 Mid 80s, emergence of RISCs (Reduced Instruction Set Computers)
   52% yearly growth in performance
   Performance improvements through instruction-level parallelism (pipelining, multiple instruction issue) and caches
 Since '02, end of 16 years of renaissance
   20% yearly growth in performance
   Limited by 3 hurdles: maximum power dissipation, diminishing instruction-level parallelism, and the so-called "memory wall"
   Switch from ILP to TLP and DLP (thread- and data-level parallelism)
5. Effect of this Dramatic Growth
 Significant enhancement of the capability available to the computer user
 Example: a $500 PC today has more performance, more main memory, and more disk storage than a $1 million computer in 1985
 Microprocessor-based computers dominate
   Workstations and PCs have emerged as major products
   Minicomputers: replaced by servers
   Mainframes: replaced by multiprocessors
   Supercomputers: replaced by large arrays of microprocessors
6. Changing Face of Computing
 In the 1960s mainframes roamed the planet
   Very expensive; operators oversaw operations
   Applications: business data processing, large-scale scientific computing
 In the 1970s, minicomputers emerged
   Less expensive, time sharing
 In the 1990s, the Internet and WWW, handheld devices (PDAs), and high-performance consumer electronics for video games and set-top boxes emerged
 These dramatic changes have led to 3 different computing markets
   Desktop computing, servers, embedded computers
7. Computing Classes: A Summary
 Price of the system: Desktop $500-$5K; Server $5K-$5M; Embedded $10-$100K (including network routers at the high end)
 Price of the processor: Desktop $50-$500; Server $200-$10K; Embedded $0.01-$100
 Sold per year (estimates for 2000): Desktop 150M; Server 4M; Embedded 300M (only 32-bit and 64-bit)
 Critical system design issues: Desktop: price and performance, graphics, compute performance; Server: throughput, availability, scalability; Embedded: price, power consumption, application-specific performance
8. Desktop Computers
 Largest market in dollar terms
 Spans low-end (<$500) to high-end ($5K) systems
 Optimize price-performance
   Performance measured in the number of calculations and graphics operations
   Price is what matters to customers
 Arena where the newest, highest-performance and cost-reduced microprocessors appear
 Reasonably well characterized in terms of applications and benchmarking
9. Servers
 Provide more reliable file and computing services (e.g., Web servers)
 Key requirements
   Availability: effectively provide service (Yahoo!, Google, eBay)
   Reliability: never fails
   Scalability: server systems grow over time, so the ability to scale up the computing capacity is crucial
   Performance: transactions per minute
 Related category: clusters / supercomputers
10. Embedded Computers
 Fastest-growing portion of the market
 Computers as parts of other devices where their presence is not obviously visible
   E.g., home appliances, printers, smart cards, cell phones, palmtops, gaming consoles, network routers
 Wide range of processing power and cost
   $0.1 (8-bit, 16-bit processors), $10 (32-bit, capable of executing 50M instructions per second), $100-$200 (high-end video gaming consoles and network switches)
 Requirements
   Real-time performance (e.g., the time to process a video frame is limited)
   Minimize memory requirements and power
 SoCs (systems-on-a-chip) combine processor cores with application-specific circuitry, DSP processors, ...
11. Task of Computer Designer
 "Determine what attributes are important for a new machine; then design a machine to maximize performance while staying within cost, power, and availability constraints."
 Aspects of this task
   Instruction set design
   Functional organization
   Logic design and implementation (IC design, packaging, power, cooling, ...)
12. What is Computer Architecture?
 Instruction Set Architecture
   the computer visible to the assembly language programmer or compiler writer (registers, data types, instruction set, instruction formats, addressing modes)
 Organization
   high-level aspects of the computer's design, such as the memory system, the bus structure, and the internal CPU (datapath + control) design
 Hardware
   detailed logic design, interconnection and packaging technology, external connections
Computer Architecture covers all three aspects of computer design
13. Instruction Set Architecture, Organization, and Hardware
14. Instruction Set Architecture: Critical Interface
 The instruction set is the interface between software and hardware
 Properties of a good abstraction
   Lasts through many generations (portability)
   Used in many different ways (generality)
   Provides convenient functionality to higher levels
   Permits an efficient implementation at lower levels
15. Measuring, Reporting, Summarizing Performance
16. Cost-Performance
 Purchasing perspective: from a collection of machines, choose the one which has
   best performance?
   least cost?
   best performance/cost?
 Computer designer perspective: faced with design options, select the one which has
   best performance improvement?
   least cost?
   best performance/cost?
 Both require: a basis for comparison and a metric for evaluation
17. Two "Notions" of Performance
 Which computer has better performance?
   User: the one which runs a program in less time
   Computer centre manager: the one which completes more jobs in a given time
 Users are interested in reducing response time or execution time
   the time between the start and the completion of an event
 Managers are interested in increasing throughput or bandwidth
   the total amount of work done in a given time
18. An Example
 Plane A: DC to Paris in 6.5 hours; top speed 610 mph; 470 passengers; throughput 72 p/h (= 470/6.5)
 Plane B: DC to Paris in 3 hours; top speed 1350 mph; 132 passengers; throughput 44 p/h (= 132/3)
 Which has higher performance?
   Time to deliver 1 passenger? B is 6.5/3 = 2.2 times faster
   Number of passengers delivered per hour? A is 72/44 = 1.6 times faster
19. Definition of Performance
 We are primarily concerned with response time
 Performance:
   Performance(X) = 1 / Execution_time(X)
 "X is n times faster than Y":
   n = Execution_time(Y) / Execution_time(X) = Performance(X) / Performance(Y)
 Since "faster" means both increased performance and decreased execution time, to reduce confusion we will use "improve performance" or "improve execution time"
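The two definitions above are easy to encode; the following Python sketch (helper names are illustrative, not from the slides) makes the reciprocal relationship explicit:

```python
def performance(execution_time):
    # Performance is defined as the reciprocal of execution time.
    return 1.0 / execution_time

def times_faster(exec_time_x, exec_time_y):
    # "X is n times faster than Y":
    # n = Execution_time(Y) / Execution_time(X) = Performance(X) / Performance(Y)
    return exec_time_y / exec_time_x
```

For the plane example on the previous slide, times_faster(3, 6.5) gives 6.5/3, about 2.2, matching "B is 2.2 times faster" on response time.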
20. Execution Time and Its Components
 Execution time: response time, elapsed time
   the latency to complete a task, including disk accesses, memory accesses, input/output activities, operating system overhead, ...
 CPU time
   the time the CPU is computing, excluding I/O or running other programs under multiprogramming
   often further divided into user and system CPU times
 User CPU time: the CPU time spent in the program
 System CPU time: the CPU time spent in the operating system
21. CPU Time for a Program
 Instruction count (IC) = number of instructions executed
 Clock cycles per instruction (CPI)
 CPU time = CPU clock cycles for a program x Clock cycle time
 CPU time = CPU clock cycles for a program / Clock rate
 CPI = CPU clock cycles for a program / IC
 CPI is one way to compare two machines with the same instruction set, since the instruction count would be the same
22. CPU Execution Time (cont'd)
 CPU time = IC x CPI x Clock cycle time
 Substituting the units of measurement:
   (Instructions / Program) x (Clock cycles / Instruction) x (Seconds / Clock cycle) = Seconds / Program
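The CPU performance equation can be checked numerically; this minimal Python sketch (the function name is illustrative) multiplies the three components, using clock cycle time = 1 / clock rate:

```python
def cpu_time_seconds(instruction_count, cpi, clock_rate_hz):
    # CPU time = IC x CPI x Clock cycle time = IC x CPI / Clock rate
    return instruction_count * cpi / clock_rate_hz
```

For example, a program executing 1 billion instructions at an average CPI of 2.0 on a 1 GHz clock takes 2 seconds.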
23. CPU Execution Time (cont'd)
 Basic technologies involved in changing each characteristic are interdependent; hence it is difficult to change one parameter without affecting others:
   Program affects IC
   Compiler affects IC
   ISA affects IC and CPI
   Organisation affects CPI and clock rate
   Technology affects clock rate
24. How to Calculate the 3 Components?
 Clock cycle time
   in the specification of the computer (clock rate in advertisements)
 Instruction count
   count instructions in the loop of a small program
   use a simulator to count instructions
   hardware counter in a special register (Pentium II)
 CPI
   calculate: Execution time / Clock cycle time / Instruction count
   hardware counter in a special register (Pentium II)
25. Metrics
 For desktops and workstations, in most cases it is execution time
   also known as response time
   the reciprocal of performance
 For servers it is normally throughput
   how many jobs can get done in unit time
   almost always a response-time limit (per job) is also imposed
 For embedded processors it is also execution time
   in many situations a hard or soft real-time deadline is imposed
   hard deadlines cannot be missed; soft deadlines can be missed in limited cases
   goal: maximize performance within the power budget
26. SPEC Benchmarks
 Want to compare two processors by measuring their performance
 Need some standardized set of programs
   These are called benchmark programs
 Each market sector has a different focus and needs a different set of benchmarks
27. Choosing Programs to Evaluate Performance
 Ideally, run typical programs with typical input before purchase, or before even building the machine
   an engineer uses a compiler and a spreadsheet
   an author uses a word processor, a drawing program, compression software
 Workload: the mixture of programs and OS commands that users run on a machine
28. Benchmarks
 Different types of benchmarks
   Real programs (e.g., MS Word, Excel, Photoshop, ...)
   Kernels: small pieces from real programs (Linpack, ...)
   Toy benchmarks: short, easy to type and run (Quicksort, Puzzle, ...)
   Synthetic benchmarks: code that matches the frequency of key instructions and operations in real programs (Whetstone, Dhrystone)
 Need industry standards so that different processors can be fairly compared
 Companies exist that create these benchmarks: "typical" code used to evaluate systems
29. Desktop Benchmarks
 SPEC CPU is the standard
   Provided by the Standard Performance Evaluation Corporation (SPEC), http://www.spec.org/
   Currently in the 5th generation: SPEC89, SPEC92, SPEC95, SPEC2000, SPEC2006
 Has two types of programs, integer and floating-point, to stress different CPU units
   SPEC2000 has 12 integer and 14 floating-point programs
 If you are introducing a new processor for a desktop or uniprocessor workstation/server, you are supposed to report the performance of all 26 programs
 For graphics performance two benchmarks are available: SPECviewperf and SPECapc
30. Server Benchmarks
 SPECrate
   Not the most reliable measure: run a copy of the same SPEC application on each processor and report the average number of jobs finished per unit time
 SPECSFS
   Tests the performance of NFS using a series of file server requests; measures CPU, disk, and network throughput; also puts a response-time limit on each request
 SPECWeb
   Web server benchmark; simulates multiple clients requesting both static and dynamic pages from a server; clients can also upload data to the server
31. Server Benchmarks (cont'd)
 Transaction processing (TP)
   Probably the most widely used benchmark in the server community
   Measures the database access and update throughput of a server
   Simple examples: airline reservation, bank ATM
   Complex systems: complicated query processing, e.g., online book shops (Amazon)
 Provided by the Transaction Processing Council (TPC)
   TPC-A was the first one; now we have TPC-C (complex queries), TPC-H (unrelated queries), TPC-R (DBMS is optimized based on past query patterns), TPC-W (business-oriented transactional web server)
 The measured metric is transactions per second, along with a response-time limit for individual transactions
32. Embedded Sector
 Quite difficult to come up with a good benchmark
   Wide range of applications
   Requirements vary widely from one system to another
 Reality: in many cases an embedded system designer designs his/her own benchmarks, which are either the target applications or kernels extracted from them
 Today the best-known benchmark set is offered by EEMBC (Embedded Microprocessor Benchmark Consortium)
   EEMBC has five types of applications: automotive/industrial (16 kernels), consumer (5 multimedia kernels), networking (3 kernels), office automation (4 graphics and text-processing kernels), telecommunications (6 filtering and DSP kernels)
33. Comparing and Summarising Performance
 An example:
   P1: Computer A 1 sec, Computer B 10 sec, Computer C 20 sec
   P2: Computer A 1000 sec, Computer B 100 sec, Computer C 20 sec
   Total: Computer A 1001 sec, Computer B 110 sec, Computer C 40 sec
 What can we learn from these statements?
   A is 20 times faster than C for program P1
   C is 50 times faster than A for program P2
   B is 2 times faster than C for program P1
   C is 5 times faster than B for program P2
 We know nothing about the relative performance of computers A, B, C!
 One approach to summarise relative performance: use the total execution times of the programs
34. Comparing and Summarising Performance (cont'd)
 A benchmark program is compiled and run on the computer under test; note the running time
 The same program is compiled and run on a reference computer
 Compute the SPEC ratio as:
   SPEC ratio = running time on the reference computer / running time on the computer under test
 A SPEC ratio of 50 means the computer under test is 50 times faster than the reference computer
35. Comparing and Summarising Performance (cont'd)
 The above process is repeated for all programs in the SPEC suite
 The overall SPEC ratio for the computer is the geometric mean of the per-program ratios:
   SPEC = (SPECRatio_1 x SPECRatio_2 x ... x SPECRatio_n)^(1/n)
 where n is the number of programs in the suite and SPECRatio_i is the ratio for program i
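The two steps above, per-program ratio and geometric mean, can be sketched in a few lines of Python (function names are illustrative):

```python
import math

def spec_ratio(ref_time, test_time):
    # Ratio of the reference computer's running time to that of the
    # computer under test; larger means the test machine is faster.
    return ref_time / test_time

def overall_spec(ratios):
    # Geometric mean: the n-th root of the product of the n SPEC ratios.
    return math.prod(ratios) ** (1.0 / len(ratios))
```

For instance, ratios of 1.0 and 4.0 give an overall SPEC of 2.0, whereas an arithmetic mean would give 2.5; the geometric mean is what makes the choice of reference machine drop out when comparing two computers.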
36. Comparing and Summarising Performance (cont'd)
 If the SPECRatio of computer A on a benchmark was 1.25 times higher than that of computer B, this means the performance of computer A is 1.25 times higher than that of computer B
 Note: when SPEC ratios are summarised with the geometric mean, the choice of the reference computer is irrelevant
37. Quantitative Principles of Design
 Where to spend time making improvements?
 Make the common case fast
   The most important principle of computer design: spend your time on improvements where those improvements will do the most good
 Example
   Instruction A represents 5% of execution
   Instruction B represents 20% of execution
   Even if you can drive the time for A to 0, the CPU will only be 5% faster
 Key questions
   What is the frequent case?
   How much can performance be improved by making that case faster?
38. 1. Focus on the Common Case: Amdahl's Law
 Suppose that we make an enhancement to a machine that will improve its performance; speedup is the ratio:
   Speedup = Execution time for the entire task without the enhancement / Execution time for the entire task using the enhancement
   Speedup = Performance for the entire task using the enhancement / Performance for the entire task without the enhancement
 Amdahl's Law states that the performance improvement that can be gained from a particular enhancement is limited by the amount of time that enhancement can be used
39. Computing Speedup
 Fraction_enhanced = fraction of the execution time in the original machine that can be converted to take advantage of the enhancement (e.g., 10/30)
 Speedup_enhanced = how much faster the enhanced code will run (e.g., 10/2 = 5)
 Execution_time_new = execution time of the unenhanced part + new execution time of the enhanced part:
   ExTime_new = ExTime_unenhanced + ExTime_enhanced / Speedup_enhanced (e.g., 20 + 10/5 = 22)
40. Computing Speedup (cont'd)
 So the times are:
   ExTime_unenhanced = ExTime_old x (1 - Fraction_enhanced)
   ExTime_enhanced = ExTime_old x Fraction_enhanced
 Factor out ExTime_old and divide by Speedup_enhanced:
   ExTime_new = ExTime_old x [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
 The overall speedup is the ratio of ExTime_old to ExTime_new:
   Speedup_overall = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
41. Example 1
 An enhancement runs 10 times faster and affects 40% of the execution time; compute the overall speedup:
   Fraction_enhanced = 0.40
   Speedup_enhanced = 10
   Speedup_overall = 1 / ((1 - 0.40) + 0.40/10) = 1 / 0.64 = 1.56
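Amdahl's Law is a one-liner in code; this Python sketch (the function name is illustrative) reproduces Example 1:

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    # Overall speedup = 1 / ((1 - F) + F / S), where F is the fraction
    # of time affected and S is the speedup of the enhanced part.
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)
```

Calling amdahl_speedup(0.40, 10) yields 1.5625, about 1.56 as on the slide. Note the limit: even with an infinite speedup of the enhanced part, the overall speedup is capped at 1 / (1 - F).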
42. Example 2
 A common transformation required in graphics processors is square root. Implementations of floating-point (FP) square root vary significantly in performance, especially among processors designed for graphics. Suppose FP square root (FPSQR) is responsible for 20% of the execution time of a critical graphics benchmark. One proposal is to enhance the FPSQR hardware and speed up this operation by a factor of 10.
 The other alternative is just to try to make all FP instructions in the graphics processor run faster by a factor of 1.6; FP instructions are responsible for half of the execution time for the application. The design team believes that they can make all FP instructions run 1.6 times faster with the same effort as required for the fast square root. Compare these two design alternatives.
43. Example 2 (cont'd)
 Using Speedup_overall = 1 / [(1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]:
   Speedup_FPSQR = 1 / ((1 - 0.2) + 0.2/10) = 1 / 0.82 = 1.22
   Speedup_FP = 1 / ((1 - 0.5) + 0.5/1.6) = 1 / 0.8125 = 1.23
 Hence improving the performance of the FP operations overall is slightly better because of their higher frequency.
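The two alternatives can be checked with plain arithmetic, directly applying Amdahl's formula to the fractions and speedups stated in the example:

```python
# Alternative 1: speed up FPSQR (20% of execution time) by a factor of 10.
speedup_fpsqr = 1.0 / ((1.0 - 0.2) + 0.2 / 10)

# Alternative 2: speed up all FP instructions (50% of time) by a factor of 1.6.
speedup_fp_all = 1.0 / ((1.0 - 0.5) + 0.5 / 1.6)

# speedup_fpsqr is about 1.22 and speedup_fp_all about 1.23,
# so the FP-wide improvement wins slightly.
```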
44. Another Method of Calculating CPU Time
 The CPU clock cycles taken by a program are given by:
   CPU clock cycles = sum over i of (IC_i x CPI_i)
 where:
   IC_i represents the number of times instruction i is executed in the program
   CPI_i represents the average number of clocks per instruction for instruction i
45. Another Method of Calculating CPU Time (cont'd)
 The CPU time taken by a program is given by:
   CPU time = [sum over i of (IC_i x CPI_i)] x Clock cycle time
 The overall CPI is expressed by:
   CPI = [sum over i of (IC_i x CPI_i)] / Instruction count = sum over i of ((IC_i / Instruction count) x CPI_i)
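The summation form translates directly into code; this Python sketch (names are illustrative) computes total cycles and the weighted-average CPI from per-class instruction counts:

```python
def cpu_clock_cycles(ic_list, cpi_list):
    # Sum of IC_i x CPI_i over all instruction classes.
    return sum(ic * cpi for ic, cpi in zip(ic_list, cpi_list))

def overall_cpi(ic_list, cpi_list):
    # Weighted average CPI: total cycles divided by total instruction count.
    return cpu_clock_cycles(ic_list, cpi_list) / sum(ic_list)
```

For example, a mix of 25% FP operations at CPI 4.0 and 75% other instructions at CPI 1.33 gives an overall CPI of about 2.0.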
46. Example 3
 Suppose we have made the following measurements:
   Frequency of FP operations = 25%
   Average CPI of FP operations = 4.0
   Average CPI of other instructions = 1.33
   Frequency of FPSQR = 2%
   CPI of FPSQR = 20
 Assume that the two design alternatives are to decrease the CPI of FPSQR to 2 or to decrease the average CPI of all FP operations to 2.5. Compare these two design alternatives using the processor performance equation.
47. Example 3 (cont'd)
 Find the original CPI with neither enhancement:
   CPI_original = 0.25 x 4.0 + 0.75 x 1.33 = 2.0 (approximately)
 Compute the CPI for the enhanced FPSQR by subtracting the cycles saved from the original CPI:
   CPI_newFPSQR = CPI_original - 2% x (CPI_oldFPSQR - CPI_newFPSQRonly) = 2.0 - 0.02 x (20 - 2) = 1.64
48. Example 3 (cont'd)
 Compute the CPI for the enhancement of all FP instructions the same way:
   CPI_newFP = CPI_original - 25% x (CPI_oldFP - CPI_newFPonly) = 2 - 0.25 x (4 - 2.5) = 2 - 0.375 = 1.625
 Performance is marginally better with the FP enhancement
   Speedup with new FP = 2 / 1.625 = 1.23
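Example 3 can be verified numerically; the sketch below uses the measurements from the example (the slides round CPI_original to 2.0, so the exact values differ in the third decimal place):

```python
freq_fp, freq_fpsqr = 0.25, 0.02
cpi_fp, cpi_other = 4.0, 1.33

# Original CPI, weighted by instruction frequencies (~2.0).
cpi_original = freq_fp * cpi_fp + (1 - freq_fp) * cpi_other

# Alternative 1: FPSQR CPI drops from 20 to 2 (~1.64).
cpi_new_fpsqr = cpi_original - freq_fpsqr * (20 - 2)

# Alternative 2: all FP CPI drops from 4.0 to 2.5 (~1.625).
cpi_new_fp = cpi_original - freq_fp * (cpi_fp - 2.5)

# With IC and clock rate unchanged, speedup is the ratio of CPIs (~1.23).
speedup_new_fp = cpi_original / cpi_new_fp
```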
49. 2. Take Advantage of Parallelism
 One of the most important methods for improving performance
 In a server:
   multiple processors and multiple disks can be used, which improves throughput
   scalability, i.e., being able to expand memory and the number of processors and disks, is a valuable asset for servers
50. 2. Take Advantage of Parallelism (cont'd)
 In an individual processor:
   parallelism among instructions: pipelining
   set-associative caches use multiple banks of memory
51. 3. Principle of Locality
 Programs tend to reuse data and instructions they have used recently
 A program spends 90% of its execution time in only 10% of its code
 Two different types of locality
   Temporal locality: recently accessed items are likely to be accessed in the near future
   Spatial locality: items whose addresses are near one another tend to be referenced close together in time
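Spatial locality shows up even in ordinary array code. The hypothetical Python sketch below contrasts row-order and column-order traversal of a row-major matrix; in Python the two only differ in access pattern, but on real hardware the row-order version is the one whose consecutive addresses fall in the same cache lines:

```python
def sum_row_order(matrix):
    # Visits elements at consecutive addresses: good spatial locality.
    total = 0
    for row in matrix:
        for value in row:
            total += value
    return total

def sum_column_order(matrix):
    # Strides across rows between accesses: poor spatial locality
    # when the matrix is stored row-major.
    total = 0
    for col in range(len(matrix[0])):
        for row in matrix:
            total += row[col]
    return total
```

Both functions compute the same sum; only the order of memory accesses differs.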
52. Principle of Locality (cont'd)
 We can predict which instructions and data a program will use in the near future based on its accesses in the recent past
