Computer Organization
and Architecture

Computer
Evolution and
Performance

Yanmin Zhu
yzhu@cs.sjtu.edu.cn

                      Yanmin Zhu @ SJTU   1
The First Computer?
• Abacus (suanpan)
• Invented in ancient
  China
• 2nd century BC




ENIAC - background
• Electronic Numerical
  Integrator And Computer
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for
  weapons
• Started 1943
• Finished 1946
     —Too late for war effort
• Used until 1955
John Mauchly (1907-1980) & J. Presper Eckert (1919-1995)
ENIAC - details
 • Decimal (not binary)
 • 20 accumulators of 10
    digits
 • Programmed manually
    by switches
 • 18,000 vacuum tubes
 • 30 tons
 • 15,000 square feet
 • 140 kW power
    consumption
 • 5,000 additions per
    second
von Neumann/Turing
• Stored Program concept
• Main memory storing programs
  and data
• ALU operating on binary data
• Control unit interpreting
  instructions from memory and
  executing
• Input and output equipment
  operated by control unit
• Institute for Advanced Study,
  Princeton
     —IAS computer
• Completed 1952

Father of computers: John von Neumann, 1903-1957
Alan Turing, 1912-1954
[Figure: IAS computer]
Structure of von Neumann machine




IAS - details
• 1000 x 40 bit words
     —Binary number
     —2 x 20 bit instructions
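A small sketch of how one 40-bit word can hold two 20-bit instructions. The slide gives only the word and instruction widths; the function names `pack_word` and `split_word` and the labels "left" and "right" for the two halves are illustrative assumptions.

```python
WORD_BITS = 40    # width of one IAS memory word (from the slide)
INSTR_BITS = 20   # width of one instruction (from the slide)
INSTR_MASK = (1 << INSTR_BITS) - 1


def pack_word(left, right):
    """Pack two 20-bit instructions into one 40-bit word."""
    if not (0 <= left <= INSTR_MASK and 0 <= right <= INSTR_MASK):
        raise ValueError("each instruction must fit in 20 bits")
    return (left << INSTR_BITS) | right


def split_word(word):
    """Split a 40-bit word into its left and right 20-bit halves."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 40 bits")
    return (word >> INSTR_BITS) & INSTR_MASK, word & INSTR_MASK


word = pack_word(0x12345, 0xABCDE)
left, right = split_word(word)
```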




Structure of IAS – detail




• Memory Buffer Register
• Memory Address Register
• Instruction Register
• Instruction Buffer Register
• Program Counter
• Accumulator
• Multiplier Quotient
Commercial Computers
• 1947 - Eckert-Mauchly Computer
  Corporation
• UNIVAC I (Universal Automatic Computer)
• US Bureau of Census
     —1950 calculations
• Became part of Sperry-Rand Corporation
• Late 1950s - UNIVAC II
     —Faster
     —More memory



IBM
• Punched-card processing
  equipment
• 1953 - the 701
     —IBM’s first stored program
      computer
     —Scientific calculations
• 1955 - the 702
     —Business applications
• Led to 700/7000 series



[Figure: IBM 701 at Douglas Aircraft]
Transistors – the 2nd generation
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid State device
• Made from Silicon (Sand)
• Invented 1947 at Bell
  Labs
• William Shockley et al.


Transistor Based Computers
• Second generation
  machines
• NCR & RCA
  produced small
  transistor machines
• IBM
     —Produced IBM 7000
• DEC - 1957
     —Produced PDP-1

[Figures: IBM 7000; PDP-1]
Microelectronics – 3rd Generation
• Literally - "small
  electronics"
• A computer is made up
  of gates, memory cells
  and interconnections
• These can be
  manufactured on a
  semiconductor
• e.g. silicon wafer



Manufacturing ICs




   • Yield: proportion of working dies per wafer

AMD Opteron X2 Wafer




 • X2: 300mm wafer, 117 chips, 90nm
   technology
Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
     —Up to 100 devices on a chip
• Medium scale integration - to 1971
     —100-3,000 devices on a chip
• Large scale integration - 1971-1977
     —3,000 - 100,000 devices on a chip
• Very large scale integration - 1978 -1991
     —100,000 - 100,000,000 devices on a chip
• Ultra large scale integration – 1991 -
     —Over 100,000,000 devices on a chip
Generations of Computer




Moore’s Law
• Increased density of
  components on chip
• Gordon Moore, co-founder of Intel:
  number of transistors on a chip
  will double every year
• Since 1970’s development
  has slowed a little
     —Number of transistors
      doubles every 18 months
• Cost of a chip has
  remained almost
  unchanged
     —That means?
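To get a feel for what a fixed doubling period implies, the sketch below computes the growth factor over a number of years. It assumes idealized exponential growth; the function name `growth_factor` and the sample inputs are illustrative and not from the slides.

```python
def growth_factor(years, doubling_months=18.0):
    """Factor by which transistor count grows over `years`,
    assuming it doubles every `doubling_months` months."""
    return 2.0 ** (years * 12.0 / doubling_months)


# One doubling period (18 months = 1.5 years) doubles the count.
one_period = growth_factor(1.5)
# Moore's original statement: doubling every year (12 months).
ten_years_yearly = growth_factor(10, doubling_months=12.0)
```

With an 18-month period, 15 years correspond to ten doublings, i.e. a factor of 1024.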
Growth in CPU Transistor Count
Consequences of Moore’s Law
• Higher packing density means shorter
  electrical paths, giving higher
  performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increases
  reliability




IBM 360 series
• 1964
• Replaced (& not
  compatible with) 7000
  series
• First planned "family"
  of computers
     — Similar or identical
       instruction sets
     — Similar or identical O/S
     — Increasing speed
     — Increasing number of
       I/O ports (i.e. more
       terminals)
     — Increased memory size
     — Increased cost
• Multiplexed switch
  structure

This series offered a standard interface and a wide choice of
standard peripheral devices. The standard interface made it easy
to change the peripherals when the customer's needs changed.
Example members of System/360




DEC PDP-8
• 1964
• First minicomputer (after
  miniskirt!)
     —Did not need air conditioned room
     —Small enough to sit on a lab bench
• $16,000
     —$100k+ for IBM 360


• Embedded applications & OEM
• BUS STRUCTURE


Spectrum of Computers
[Figure: Supercomputer, Mainframe, Minicomputer, Server, Desktop, Laptop]
Central Control vs Bus
[Figure: Multiplexed switch structure]
[Figure: Bus structure]
Intel (later generations)
• 1971 - 4004
     —First microprocessor (All CPU
      components on a single chip)
     —4 bit
• Followed in 1972 by 8008
     —8 bit
     —Both designed for specific
      applications
• 1974 - 8080
     —Intel’s first general purpose
      microprocessor


Intel processors in 1970s and 1980s




Intel processors in 1990s and beyond




Performance Gap between
Processor & Memory
• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor speed
• What is the frequency of today’s memory?
Solutions
• Change DRAM interface
     —Cache
• Reduce frequency of memory access
     —More complex cache and cache on chip
• Increase number of bits retrieved at one
  time
     —Make DRAM "wider" rather than "deeper"
• Increase interconnection bandwidth
     —High speed buses
     —Hierarchy of buses


I/O Devices
• Peripherals with intensive I/O demands
     —Large data throughput demands




Problem and Solutions
• Problem moving data between processor
  and peripherals

• Solutions:
     —Caching
     —Buffering
     —Higher-speed interconnection buses
     —More elaborate bus structures
     —Multiple-processor configurations




Key is Balance
•   Processor components
•   Main memory
•   I/O devices
•   Interconnection structures


            Balance the throughput and
         processing demands of processor,
                 memory, IO, etc.



Speeding Microprocessor Up

How to make faster processors?
• Clock rate
• Logic density
• Pipelining
• On board L1 & L2
  cache
• Branch prediction
• Data flow analysis
• Speculative
  execution
Increase hardware speed of
processor
• Fundamentally due to shrinking
  logic gate size
     —More gates, packed more tightly,
      and increasing clock rate
     —Propagation time for signals
      reduced




Problem with Clock Speed
• Power
     —Power density increases with density of
      logic and clock speed
     —Dissipating heat
     —Some fundamental physical limits are being
      reached




Power Trends




   In CMOS IC technology

        $\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$

        Power ×30, voltage 5V → 1V, frequency ×1000
Reducing Power
   • Suppose a new CPU has
       —85% of capacitive load of old CPU
       —15% voltage and 15% frequency reduction

    $\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$
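The same ratio can be checked numerically. This is a minimal sketch assuming dynamic power scales as capacitive load × voltage² × frequency, as on the previous slide; the function name `dynamic_power_ratio` is illustrative.

```python
def dynamic_power_ratio(cap_scale, volt_scale, freq_scale):
    """P_new / P_old when capacitive load, voltage and frequency
    are each multiplied by the given scale factors."""
    return cap_scale * volt_scale ** 2 * freq_scale


# 85% of the old capacitive load, 15% lower voltage and frequency.
ratio = dynamic_power_ratio(0.85, 0.85, 0.85)
print(f"P_new / P_old = {ratio:.2f}")
```

The old values cancel out of the ratio, so only the three scale factors are needed.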

      The power wall
          We can’t reduce voltage further
          We can’t remove more heat
      How else can we improve performance?
Problems with Logic Density
• RC delay
  —Speed at which electrons flow limited
   by resistance and capacitance of metal
   wires connecting them
  —Delay increases as RC product
   increases
  —Wire interconnects thinner, increasing
   resistance
  —Wires closer together, increasing
   capacitance

Solution?




• Increase clock rate → Power Wall
• Increase density → RC Delay
• More emphasis on organizational and architectural approaches
Techniques developed




[Figure: Intel Microprocessor History]
Increased Cache Capacity
• Typically two or three levels
  of cache between processor
  and main memory
• Chip density increased
     —More cache memory on chip
         – Faster cache access
• Pentium chip devoted about
  10% of chip area to cache
• Pentium 4 devotes about
  50%


More Complex Execution Logic
• Enable parallel execution of
  instructions
     —Pipelining
     —Superscalar
• Pipeline works like assembly line
     —Different stages of execution of
      different instructions at same time
      along pipeline
     —Example in your daily life?
• Superscalar allows multiple
  pipelines within single processor
     —Instructions that do not depend on one
      another can be executed in parallel
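To put a rough number on the assembly-line idea, the sketch below counts cycles with and without pipelining. It rests on idealized assumptions that are not in the slides: every stage takes exactly one cycle and there are no stalls or hazards, so n instructions on a k-stage pipeline need k + (n - 1) cycles. The function names are illustrative.

```python
def unpipelined_cycles(n_instructions, n_stages):
    """Each instruction passes through all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Ideal pipeline: the first instruction takes n_stages cycles,
    then one instruction completes every cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


n, k = 100, 5  # hypothetical instruction count and pipeline depth
speedup = unpipelined_cycles(n, k) / pipelined_cycles(n, k)
```

For long instruction streams the ideal speedup approaches the number of stages; real pipelines fall short of this because of dependences and branches.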
Diminishing Returns
• Internal organization of processors
  complex
     —Can get a great deal of parallelism
     —Further significant increases likely to be
      relatively modest


• Benefits from cache are reaching limit




Uniprocessor Performance




  Constrained by power, instruction-level parallelism, memory
  latency
New Approach – Multiple Cores
• Multiple processors on single chip
     — Large shared cache
• Within a processor, increase in performance
  proportional to square root of increase in
  complexity
• If software can use multiple processors, doubling
  number of processors almost doubles
  performance
• So, use two simpler processors on the chip
  rather than one more complex processor
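The argument can be sketched with two toy functions. The square-root rule and the "almost doubles" claim come from the slide; the function names and the 0.95 parallel efficiency are hypothetical values chosen only for illustration.

```python
import math


def single_core_speedup(complexity_factor):
    """Performance gain from making one core `complexity_factor`
    times more complex (square-root rule from the slide)."""
    return math.sqrt(complexity_factor)


def multicore_speedup(n_cores, parallel_efficiency=1.0):
    """Performance gain from n simple cores, assuming the software
    can use them with the given efficiency (an assumed parameter)."""
    return n_cores * parallel_efficiency


one_complex_core = single_core_speedup(2.0)   # about 1.41
two_simple_cores = multicore_speedup(2, 0.95)  # about 1.9
```

Under these assumptions, spending twice the complexity budget on two simple cores beats one core that is twice as complex, provided the software is parallel.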




Multicore Challenges
• Requires explicitly parallel programming
     —Compare with instruction level
      parallelism
         – Hardware executes multiple instructions at
           once
         – Hidden from the programmer
     —Hard to do
         – Programming for performance
         – Load balancing
         – Optimizing communication and
           synchronization

CISC vs. RISC
   • CISC: Complex Instruction Set Computer
   • RISC: Reduced Instruction Set Computer




   • Intel x86: CISC ("kill two each time")
   • ARM: RISC ("kill one each time")
x86 Evolution (1)
• 8080
   — first general purpose microprocessor
   — 8 bit data path
   — Used in first personal computer – Altair
• 8086 – 5MHz – 29,000 transistors
   — much more powerful
   — 16 bit
   — instruction cache that prefetches a few instructions
   — 8088 (8 bit external bus) used in first IBM PC
• 80286
   — 16 Mbyte memory addressable
   — up from 1 Mbyte
• 80386
   — 32 bit
   — Support for multitasking
• 80486
   — sophisticated powerful cache and instruction pipelining
   — built in maths co-processor
x86 Evolution (2)
• Pentium
     — Superscalar
     — Multiple instructions executed in parallel
• Pentium Pro
     — Increased superscalar organization
     — Aggressive register renaming
     — branch prediction
     — data flow analysis
     — speculative execution
• Pentium II
     — MMX technology
     — graphics, video & audio processing
• Pentium III
     — Additional floating point instructions for 3D
       graphics

x86 Evolution (3)
• Pentium 4
     — Note Arabic rather than Roman
       numerals
     — Further floating point and
       multimedia enhancements
• Core
     — First x86 with dual core
• Core 2
     — 64 bit architecture
• Core 2 Quad – 3GHz – 820 million
  transistors
     — Four processors on chip



x86 Evolution (4)
• x86 architecture dominant outside
  embedded systems
• Organization and technology changed
  dramatically
• Instruction set architecture evolved with
  backwards compatibility
• ~1 instruction per month added
• 500 instructions available




Embedded Systems & ARM
• ARM evolved from RISC
  design
• Used mainly in embedded
  systems
     —Used within product
     —Not general purpose computer
     —Dedicated function
     —E.g. Anti-lock brakes in car
      (ABS)




Embedded System - Definition
A combination of computer hardware and software, and
perhaps additional mechanical or other parts, designed
to perform a dedicated function. In many cases,
embedded systems are part of a larger system or
product, as in the case of an antilock braking system in
a car




Embedded Systems Requirements
• Different sizes
     —Different constraints, optimization, reuse
• Different requirements
     —Safety, reliability, real-time, flexibility,
      legislation
     —Lifespan
     —Environmental conditions
     —Static v dynamic loads
     —Slow to fast speeds
     —Computation v I/O intensive
     —Discrete event v continuous dynamics

Possible Organization of an Embedded System




ARM Evolution
• Designed by ARM Inc., Cambridge,
  England
• Licensed to manufacturers
• High speed, small die, low power
  consumption
• PDAs, hand held games, phones
     —E.g. iPod, iPhone, HTC
• Acorn produced ARM1 & ARM2 in 1985
  and ARM3 in 1989
• Acorn, VLSI and Apple Computer founded
  ARM Ltd.
ARM Systems Categories
• Embedded real time
• Application platform
     —Linux, Palm OS, Symbian OS, Windows mobile
• Secure applications




Computer Assessment
• Performance

•   Cost
•   Size
•   Security
•   Reliability
•   Power consumption
•   Cool look



Defining Performance
   • Which airplane has the best performance?

   [Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers x mph]
Response Time and Throughput
• Response time
     —How long it takes to do a task
• Throughput
     —Total work done per unit time
         – e.g., tasks/transactions/… per hour


• How are response time and throughput
  affected by
     —Replacing the processor with a faster version?
     —Adding more processors?


Relative Performance
– response time
   • Define Performance = 1/Execution Time
   • "X is n times faster than Y"

          $\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$

     Example: time taken to run a program
         10s on A, 15s on B
         Execution Time_B / Execution Time_A
          = 15s / 10s = 1.5
         So A is 1.5 times faster than B
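The example can be reproduced in a few lines. This is a sketch using the slide's definition Performance = 1/Execution Time; the function names are illustrative.

```python
def performance(execution_time_s):
    """Performance as defined on the slide: 1 / execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s, time_y_s):
    """n such that 'X is n times faster than Y'."""
    return time_y_s / time_x_s


# Program takes 10 s on A and 15 s on B.
n = times_faster(10.0, 15.0)
print(f"A is {n} times faster than B")
```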

Measuring Execution Time
• Elapsed time
     —Total response time, including all aspects
         – Processing, I/O, OS overhead, idle time
     —Determines system performance


• CPU time
     —Time spent processing a given job
         – Discounts I/O time, other jobs’ shares
     —Comprises user CPU time and system CPU
      time
     —Different programs are affected differently by
      CPU and system performance


CPU Clocking
   • Operation of digital hardware governed by
     a constant-rate clock
   [Figure: clock signal (cycles) with one clock period marked; data transfer and computation, then update state]

            Clock period: duration of a clock cycle
                   e.g., 250ps = 0.25ns = $250 \times 10^{-12}$ s
            Clock frequency (rate): cycles per second
                   e.g., 4.0GHz = 4000MHz = $4.0 \times 10^{9}$ Hz
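Clock period and clock rate are reciprocals of each other, which is why the 250 ps period and the 4.0 GHz rate describe the same clock. A minimal sketch with illustrative function names:

```python
def clock_rate_hz(period_s):
    """Clock frequency in Hz for a given clock period in seconds."""
    return 1.0 / period_s


def clock_period_s(rate_hz):
    """Clock period in seconds for a given clock frequency in Hz."""
    return 1.0 / rate_hz


rate = clock_rate_hz(250e-12)   # 250 ps period
period = clock_period_s(4.0e9)  # 4.0 GHz clock
```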
System Clock




   System clock cannot be arbitrarily fast
   • Signals in CPU take time to settle down to 1 or 0
   • Signals may change at different speeds
   • Operations need to be synchronised
CPU Time

     $\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$


   • Performance improved by
       —Reducing number of clock cycles
       —Increasing clock rate
       —Hardware designer must often trade off clock
        rate against cycle count


CPU Time Example
   • Computer A: 2GHz clock, 10s CPU time
   • Designing Computer B
       — Aim for 6s CPU time
       — Can do faster clock, but causes 1.2 × clock cycles
   • How fast must Computer B clock be?
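       $\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\text{s}}$
       $\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\text{s} \times 2\text{GHz} = 20 \times 10^9$
       $\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6\text{s}} = \frac{24 \times 10^9}{6\text{s}} = 4\text{GHz}$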
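The same calculation as a short script, using CPU Time = Clock Cycles / Clock Rate. Variable and function names are illustrative.

```python
def clock_cycles(cpu_time_s, clock_rate_hz):
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate_hz(cycles, target_cpu_time_s):
    """Clock rate needed to run `cycles` cycles in the target time."""
    return cycles / target_cpu_time_s


cycles_a = clock_cycles(10.0, 2.0e9)            # Computer A: 10 s at 2 GHz
cycles_b = 1.2 * cycles_a                       # B needs 1.2x the cycles
rate_b = required_clock_rate_hz(cycles_b, 6.0)  # B must finish in 6 s
print(f"Clock rate B = {rate_b / 1e9:.1f} GHz")
```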
Instruction Count and CPI
 CPI= consumer price index??
 $\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$
 $\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$
 • Instruction Count for a program
      —Determined by program, ISA and compiler
 • Average cycles per instruction
      —Determined by CPU hardware
      —Different instructions have different CPI
          – Average CPI affected by instruction mix
CPI Example
   •   Computer A: Cycle Time = 250ps, CPI = 2.0
   •   Computer B: Cycle Time = 500ps, CPI = 1.2
   •   Same ISA
   •   Which is faster, and by how much?

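       $\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\text{ps} = I \times 500\text{ps}$   (A is faster…)
       $\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\text{ps} = I \times 600\text{ps}$
       $\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\text{ps}}{I \times 500\text{ps}} = 1.2$   (…by this much)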
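Because both machines run the same ISA, the instruction count I is identical and cancels, so it is enough to compare time per instruction. A sketch with illustrative names:

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction = CPI x clock cycle time."""
    return cpi * cycle_time_ps


time_a = time_per_instruction_ps(2.0, 250.0)  # Computer A
time_b = time_per_instruction_ps(1.2, 500.0)  # Computer B
ratio = time_b / time_a                       # how much faster A is
```

Note that B has the lower CPI but still loses, because its cycle time is twice as long.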

CPI in More Detail
   • If different instruction classes take
     different numbers of cycles
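           $\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$

   • Weighted average CPI

           $\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left( \text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}} \right)$

           where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$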

CPI Example
   • Alternative compiled code sequences using
     instructions in classes A, B, C
            Class                 A          B          C
            CPI for class         1          2          3
            IC in sequence 1      2          1          2
            IC in sequence 2      4          1          1

    Sequence 1: IC = 5
         Clock Cycles = 2×1 + 1×2 + 2×3 = 10
         Avg. CPI = 10/5 = 2.0
    Sequence 2: IC = 6
         Clock Cycles = 4×1 + 1×2 + 1×3 = 9
         Avg. CPI = 9/6 = 1.5
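The two sequences can be evaluated with the weighted-CPI formula from the previous slide. The dictionaries hold the table's values; the function names are illustrative.

```python
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}  # CPI for each instruction class


def total_cycles(instruction_counts):
    """Sum of CPI_i x Instruction Count_i over all classes."""
    return sum(CPI_BY_CLASS[cls] * count
               for cls, count in instruction_counts.items())


def average_cpi(instruction_counts):
    """Weighted average CPI = clock cycles / instruction count."""
    return total_cycles(instruction_counts) / sum(instruction_counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, total_cycles(seq), average_cpi(seq))
```

Sequence 2 executes more instructions but needs fewer cycles, so instruction count alone is not a reliable performance measure.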

CPU Time Summary
   The BIG Picture

   $\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$

   • Performance depends on
       —Algorithm: affects IC, possibly CPI
       —Programming language: affects IC, CPI
       —Compiler: affects IC, CPI
       —Instruction set architecture: affects IC, CPI,
        Tc (clock cycle time)

Instruction Execution Rate
   Instructions executed per second!


• MIPS: Millions of instructions per second
• MFLOPS: Millions of floating point
  instructions per second


• Heavily dependent on
     —instruction set
     —compiler design
     —processor implementation
     —cache & memory hierarchy
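A MIPS rate can be computed from instruction count and execution time, or equivalently from clock rate and CPI, since CPU Time = Instruction Count × CPI / Clock Rate. The sketch below uses hypothetical numbers (a 4 GHz clock and CPI of 2.0); the function names are illustrative.

```python
def mips_rate(instruction_count, execution_time_s):
    """Millions of instructions executed per second."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_clock(clock_rate_hz, cpi):
    """Equivalent form: clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical machine: 4 GHz clock, average CPI of 2.0.
rate = mips_from_clock(4.0e9, 2.0)
# The same machine running 8e9 instructions takes 8e9 * 2.0 / 4e9 = 4 s.
same_rate = mips_rate(8.0e9, 4.0)
```

Because CPI depends on the instruction mix, the same processor yields different MIPS figures for different programs.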
Benchmarks
• Benchmark: Programs designed to test
  performance
     — Written in high level language: Portable
     — Represents style of task: Systems, numerical,
       commercial
     — Easily measured
     — Widely distributed
• E.g. Standard Performance Evaluation Corporation
  (SPEC)
     — CPU2006 for computation bound
         – 17 floating point programs in C, C++, Fortran
         – 12 integer programs in C, C++
         – 3 million lines of code
     — Speed and rate metrics
         – Single task and throughput


SPEC Speed Metric -Single task
• Base runtime defined for each benchmark using
  reference machine
• Results are reported as ratio of reference time to
  system run time
     — Tref_i: execution time for benchmark i on reference
       machine
     — Tsut_i: execution time of benchmark i on test system




• Overall performance calculated by averaging
  ratios for all 12 integer benchmarks
    — Use geometric mean: Appropriate for normalized
      numbers such as ratios
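The speed metric can be sketched as follows: the ratio for benchmark i is r_i = Tref_i / Tsut_i, and the overall figure is the geometric mean of the ratios. The timings in the example are made up for illustration and are not SPEC results; the function names are illustrative.

```python
import math


def spec_ratios(ref_times_s, sut_times_s):
    """Per-benchmark ratio of reference time to test-system time."""
    return [ref / sut for ref, sut in zip(ref_times_s, sut_times_s)]


def geometric_mean(values):
    """n-th root of the product, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical timings (seconds) for two benchmarks.
ratios = spec_ratios([100.0, 400.0], [50.0, 100.0])  # [2.0, 4.0]
overall = geometric_mean(ratios)                     # sqrt(2 * 4)
```

Unlike the arithmetic mean, the geometric mean gives the same ranking of systems regardless of which machine is used as the reference.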



Amdahl’s Law
• Gene Amdahl [AMDA67]
• Potential speed up of program
  using multiple processors

• Concluded that:
     —Code needs to be parallelizable
     —Speedup is bounded, giving
      diminishing returns for more
      processors

Amdahl’s Law Formula
• For program running on single processor
    — Fraction f of code infinitely parallelizable with no
      scheduling overhead
    — Fraction (1-f) of code inherently serial
    — T is total execution time for program on single
      processor
    — N is number of processors that fully exploit
      parallel portions of code
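With these definitions, Amdahl's law is usually written as Speedup = T / ((1 - f)T + fT/N) = 1 / ((1 - f) + f/N). A short sketch of that standard form; the function names and sample values are illustrative.

```python
def amdahl_speedup(f, n_processors):
    """Speedup when a fraction f of the code is perfectly
    parallelizable across n_processors and (1 - f) stays serial."""
    return 1.0 / ((1.0 - f) + f / n_processors)


def amdahl_limit(f):
    """Upper bound on speedup as the processor count grows without limit."""
    return 1.0 / (1.0 - f)


for n in (2, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
print("limit", amdahl_limit(0.9))
```

Even with 90% of the code parallelizable, the serial 10% caps the speedup at about 10 no matter how many processors are added.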




Amdahl’s Law Formula
   [Plot: f/(1-f) versus f, for f from 0 to 1]
   • If f is small, parallel processors have little effect
   • As N → ∞, speedup is bounded by 1/(1 – f)
        —Diminishing returns for using more processors
Homework #1
• Page 45-47

• 2.4, 2.9, 2.10, 2.13, 2.14

• Due: class time next Friday (18 March)

• Writing on paper/exercise book (student
  no and name).

• Policy: No late submission is allowed


Ul rc_cap4_capa de red - encaminamiento ruteoc09271
 
Ul rc_cap3_el nivel de red en internet
 Ul rc_cap3_el nivel de red en internet Ul rc_cap3_el nivel de red en internet
Ul rc_cap3_el nivel de red en internetc09271
 
Ul rc_cap2_la capa de red
 Ul rc_cap2_la capa de red Ul rc_cap2_la capa de red
Ul rc_cap2_la capa de redc09271
 
X 4 prospeccion
X 4 prospeccionX 4 prospeccion
X 4 prospeccionc09271
 
Carrier ethernetessentials
Carrier ethernetessentialsCarrier ethernetessentials
Carrier ethernetessentialsc09271
 
Metro ethernet-services
Metro ethernet-servicesMetro ethernet-services
Metro ethernet-servicesc09271
 
Metroethernet redes-y-servicios
Metroethernet redes-y-serviciosMetroethernet redes-y-servicios
Metroethernet redes-y-serviciosc09271
 
Utp pdi_2014-2_sap3 transformaciones básicas a nivel espacial i
 Utp pdi_2014-2_sap3 transformaciones básicas a nivel espacial i Utp pdi_2014-2_sap3 transformaciones básicas a nivel espacial i
Utp pdi_2014-2_sap3 transformaciones básicas a nivel espacial ic09271
 
Ia 2014 2 balotario de la pc1
Ia 2014 2 balotario de la pc1Ia 2014 2 balotario de la pc1
Ia 2014 2 balotario de la pc1c09271
 
9275315981 reduce
9275315981 reduce9275315981 reduce
9275315981 reducec09271
 
Utp sirn_s3_red perceptron
 Utp sirn_s3_red perceptron Utp sirn_s3_red perceptron
Utp sirn_s3_red perceptronc09271
 
Utp 2014-2_pdi_sap2 iluminacion y modos de color
 Utp 2014-2_pdi_sap2 iluminacion y modos de color Utp 2014-2_pdi_sap2 iluminacion y modos de color
Utp 2014-2_pdi_sap2 iluminacion y modos de colorc09271
 
Utp 2014-2_ia_s2_intro a las rna
 Utp 2014-2_ia_s2_intro a las rna  Utp 2014-2_ia_s2_intro a las rna
Utp 2014-2_ia_s2_intro a las rna c09271
 
Utp sirn_s2_rna 2014-2
 Utp sirn_s2_rna 2014-2 Utp sirn_s2_rna 2014-2
Utp sirn_s2_rna 2014-2c09271
 
Utp sirn_s2_rna 2014-2
 Utp sirn_s2_rna 2014-2 Utp sirn_s2_rna 2014-2
Utp sirn_s2_rna 2014-2c09271
 

More from c09271 (20)

0121 2494-pys-50-11 (1)
0121 2494-pys-50-11 (1)0121 2494-pys-50-11 (1)
0121 2494-pys-50-11 (1)
 
S01.s1 material
S01.s1   materialS01.s1   material
S01.s1 material
 
jcbenitezp
jcbenitezpjcbenitezp
jcbenitezp
 
Pdi paterno m_lab1
Pdi paterno m_lab1Pdi paterno m_lab1
Pdi paterno m_lab1
 
Ul rc_cap4_capa de red - encaminamiento ruteo
 Ul rc_cap4_capa de red - encaminamiento ruteo Ul rc_cap4_capa de red - encaminamiento ruteo
Ul rc_cap4_capa de red - encaminamiento ruteo
 
Ul rc_cap3_el nivel de red en internet
 Ul rc_cap3_el nivel de red en internet Ul rc_cap3_el nivel de red en internet
Ul rc_cap3_el nivel de red en internet
 
Ul rc_cap2_la capa de red
 Ul rc_cap2_la capa de red Ul rc_cap2_la capa de red
Ul rc_cap2_la capa de red
 
X 4 prospeccion
X 4 prospeccionX 4 prospeccion
X 4 prospeccion
 
Carrier ethernetessentials
Carrier ethernetessentialsCarrier ethernetessentials
Carrier ethernetessentials
 
64 66
64 6664 66
64 66
 
Metro ethernet-services
Metro ethernet-servicesMetro ethernet-services
Metro ethernet-services
 
Metroethernet redes-y-servicios
Metroethernet redes-y-serviciosMetroethernet redes-y-servicios
Metroethernet redes-y-servicios
 
Utp pdi_2014-2_sap3 transformaciones básicas a nivel espacial i
 Utp pdi_2014-2_sap3 transformaciones básicas a nivel espacial i Utp pdi_2014-2_sap3 transformaciones básicas a nivel espacial i
Utp pdi_2014-2_sap3 transformaciones básicas a nivel espacial i
 
Ia 2014 2 balotario de la pc1
Ia 2014 2 balotario de la pc1Ia 2014 2 balotario de la pc1
Ia 2014 2 balotario de la pc1
 
9275315981 reduce
9275315981 reduce9275315981 reduce
9275315981 reduce
 
Utp sirn_s3_red perceptron
 Utp sirn_s3_red perceptron Utp sirn_s3_red perceptron
Utp sirn_s3_red perceptron
 
Utp 2014-2_pdi_sap2 iluminacion y modos de color
 Utp 2014-2_pdi_sap2 iluminacion y modos de color Utp 2014-2_pdi_sap2 iluminacion y modos de color
Utp 2014-2_pdi_sap2 iluminacion y modos de color
 
Utp 2014-2_ia_s2_intro a las rna
 Utp 2014-2_ia_s2_intro a las rna  Utp 2014-2_ia_s2_intro a las rna
Utp 2014-2_ia_s2_intro a las rna
 
Utp sirn_s2_rna 2014-2
 Utp sirn_s2_rna 2014-2 Utp sirn_s2_rna 2014-2
Utp sirn_s2_rna 2014-2
 
Utp sirn_s2_rna 2014-2
 Utp sirn_s2_rna 2014-2 Utp sirn_s2_rna 2014-2
Utp sirn_s2_rna 2014-2
 

Recently uploaded

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 

Recently uploaded (20)

TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

02 evolution zhu

  • 1. Computer Organization and Architecture Computer Evolution and Performance Yanmin Zhu yzhu@cs.sjtu.edu.cn Yanmin Zhu @ SJTU 1
  • 2. The First Computer? • Abacus 算盘 • Invented by ancient Chinese • The 2nd century BC Yanmin Zhu @ SJTU 2
  • 3. ENIAC - background • Electronic Numerical Integrator and Computer • Eckert and Mauchly • University of Pennsylvania • Trajectory tables for weapons • Started 1943 • Finished 1946 —Too late for war effort • Used until 1955 • Pictured: John Mauchly (1907-1980) & J. Presper Eckert (1919-1995)
  • 4. ENIAC - details • Decimal (not binary) • 20 accumulators of 10 digits • Programmed manually by switches • 18,000 vacuum tubes • 30 tons • 15,000 square feet • 140 kW power consumption • 5,000 additions per second Yanmin Zhu @ SJTU 4
  • 5. von Neumann/Turing • Stored Program concept • Main memory storing programs and data • ALU operating on binary data • Control unit interpreting instructions from memory and executing • Input and output equipment operated by control unit • Princeton Institute for Advanced Study —IAS computer • Completed 1952 • Pictured: von Neumann (1903-1957), "father of computers"; Alan Turing (1912-1954); the IAS computer
  • 6. Structure of von Neumann machine Yanmin Zhu @ SJTU 6
  • 7. IAS - details • 1000 x 40 bit words —Binary number —2 x 20 bit instructions Yanmin Zhu @ SJTU 7
  • 8. Structure of IAS – detail • Memory Buffer Register • Memory Address Register • Instruction Register • Instruction Buffer Register • Program Counter • Accumulator • Multiplier Quotient
  • 9. Commercial Computers • 1947 - Eckert-Mauchly Computer Corporation • UNIVAC I (Universal Automatic Computer) • US Bureau of Census —1950 calculations • Became part of Sperry-Rand Corporation • Late 1950s - UNIVAC II —Faster —More memory Yanmin Zhu @ SJTU 9
  • 10. IBM • Punched-card processing equipment • 1953 - the 701 —IBM’s first stored program computer —Scientific calculations • 1955 - the 702 —Business applications • Led to 700/7000 series • Pictured: IBM 701 at Douglas Aircraft
  • 11. Transistors – the 2nd generation • Replaced vacuum tubes • Smaller • Cheaper • Less heat dissipation • Solid State device • Made from Silicon (Sand) • Invented 1947 at Bell Labs • William Shockley et al. Yanmin Zhu @ SJTU 11
  • 12. Transistor Based Computers • Second generation machines • NCR & RCA produced small transistor machines • IBM —Produced IBM 7000 • DEC - 1957 —Produced PDP-1 • Pictured: IBM 7000, PDP-1
  • 13. Microelectronics – 3rd Generation • Literally - "small electronics" • A computer is made up of gates, memory cells and interconnections • These can be manufactured on a semiconductor • e.g. silicon wafer
  • 14. Manufacturing ICs • Yield: proportion of working dies per wafer Yanmin Zhu @ SJTU 14
  • 15. AMD Opteron X2 Wafer • X2: 300mm wafer, 117 chips, 90nm technology Yanmin Zhu @ SJTU 15
  • 16. Generations of Computer • Vacuum tube - 1946-1957 • Transistor - 1958-1964 • Small scale integration - 1965 on —Up to 100 devices on a chip • Medium scale integration - to 1971 —100-3,000 devices on a chip • Large scale integration - 1971-1977 —3,000 - 100,000 devices on a chip • Very large scale integration - 1978 -1991 —100,000 - 100,000,000 devices on a chip • Ultra large scale integration – 1991 - —Over 100,000,000 devices on a chip Yanmin Zhu @ SJTU 16
  • 18. Moore’s Law • Increased density of components on chip • Gordon Moore (co-founder of Intel): number of transistors on a chip will double every year • Since 1970’s development has slowed a little — Number of transistors doubles every 18 months • Cost of a chip has remained almost unchanged — That means?
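To get a feel for the two doubling periods mentioned on the Moore's Law slide, here is a minimal sketch. The function name `transistor_growth` and its parameters are illustrative assumptions, not part of the slides; it only evaluates the compound-doubling arithmetic.

```python
def transistor_growth(years: float, doubling_period_years: float = 1.5) -> float:
    """Growth factor after `years` if the count doubles every `doubling_period_years`.

    Hypothetical helper: 1.5 years corresponds to the "every 18 months" figure,
    1.0 to the original "every year" prediction.
    """
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    for years in (3, 6, 15):
        yearly = transistor_growth(years, doubling_period_years=1.0)
        slower = transistor_growth(years, doubling_period_years=1.5)
        print(f"{years:>2} years: x{yearly:,.0f} (yearly) vs x{slower:,.0f} (18 months)")
```

Over 15 years the two assumptions differ by a factor of 32, which is why the choice of doubling period matters when extrapolating.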
  • 19. Growth in CPU Transistor Count
  • 20. Consequences of Moore’s Law • Higher packing density means shorter electrical paths, giving higher performance • Smaller size gives increased flexibility • Reduced power and cooling requirements • Fewer interconnections increases reliability Yanmin Zhu @ SJTU 20
  • 21. IBM 360 series • 1964 • Replaced (& not compatible with) 7000 series • First planned "family" of computers — Similar or identical instruction sets — Similar or identical O/S — Increasing speed — Increasing number of I/O ports (i.e. more terminals) — Increased memory size — Increased cost • Multiplexed switch structure • This series offered a standard interface and a wide choice of standard peripheral devices. The standard interface made it easy to change the peripherals when the customer's needs changed.
  • 22. Example members of System/360 Yanmin Zhu @ SJTU 22
  • 23. DEC PDP-8 • 1964 • First minicomputer (after miniskirt!) —Did not need air conditioned room —Small enough to sit on a lab bench • $16,000 —$100k+ for IBM 360 • Embedded applications & OEM • BUS STRUCTURE Yanmin Zhu @ SJTU 23
  • 24. Spectrum of Computers Supercomputer Mainframe Minicomputer Desktop Server Laptop Yanmin Zhu @ SJTU 24
  • 25. Central Control vs Bus Multiplexed switch structure BUS STRUCTURE Yanmin Zhu @ SJTU 25
  • 26. Intel Later generations • 1971 - 4004 —First microprocessor (All CPU components on a single chip) —4 bit • Followed in 1972 by 8008 —8 bit —Both designed for specific applications • 1974 - 8080 —Intel’s first general purpose microprocessor Yanmin Zhu @ SJTU 26
  • 27. Intel processors in 1970s and 1980s Yanmin Zhu @ SJTU 27
  • 28. Intel processors in 1990s and beyond Yanmin Zhu @ SJTU 28
  • 29. Performance Gap between Processor & Memory • Processor speed increased • Memory capacity increased • Memory speed lags behind processor speed What is the frequency of today’s memory? Yanmin Zhu @ SJTU 29
  • 30. Solutions • Change DRAM interface —Cache • Reduce frequency of memory access —More complex cache and cache on chip • Increase number of bits retrieved at one time —Make DRAM "wider" rather than "deeper" • Increase interconnection bandwidth —High speed buses —Hierarchy of buses
  • 31. I/O Devices • Peripherals with intensive I/O demands —Large data throughput demands Yanmin Zhu @ SJTU 31
  • 32. Problem and Solutions • Problem moving data between processor and peripherals • Solutions: —Caching —Buffering —Higher-speed interconnection buses —More elaborate bus structures —Multiple-processor configurations Yanmin Zhu @ SJTU 32
  • 33. Key is Balance • Processor components • Main memory • I/O devices • Interconnection structures Balance the throughput and processing demands of processor, memory, IO, etc. Yanmin Zhu @ SJTU 33
  • 34. Speeding Microprocessor Up • How to make faster processors? • Clock rate • Logic density • Pipelining • On board L1 & L2 cache • Branch prediction • Data flow analysis • Speculative execution
  • 35. Increase hardware speed of processor • Fundamentally due to shrinking logic gate size —More gates, packed more tightly, and increasing clock rate —Propagation time for signals reduced Yanmin Zhu @ SJTU 35
  • 36. Problem with Clock Speed • Power —Power density increases with density of logic and clock speed —Dissipating heat —Some fundamental physical limits are being reached Yanmin Zhu @ SJTU 36
  • 37. Power Trends • In CMOS IC technology: $\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$ • Figure annotations: power ×30, voltage 5V → 1V, frequency ×1000
  • 38. Reducing Power • Suppose a new CPU has —85% of capacitive load of old CPU —15% voltage and 15% frequency reduction • $\frac{P_{new}}{P_{old}} = \frac{C_{old} \times 0.85 \times (V_{old} \times 0.85)^2 \times F_{old} \times 0.85}{C_{old} \times V_{old}^2 \times F_{old}} = 0.85^4 = 0.52$ • The power wall • We can’t reduce voltage further • We can’t remove more heat • How else can we improve performance?
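The power-reduction arithmetic above can be checked with a short sketch. It assumes the CMOS relation from the preceding slide (power proportional to capacitive load × voltage² × frequency); the function name `power_ratio` and its parameters are hypothetical, chosen only for illustration.

```python
def power_ratio(load_scale: float, voltage_scale: float, frequency_scale: float) -> float:
    """Ratio P_new / P_old when each factor of C * V^2 * F is scaled.

    Assumes dynamic power is proportional to C * V^2 * F, so the old values
    cancel and only the scale factors remain.
    """
    return load_scale * voltage_scale ** 2 * frequency_scale


if __name__ == "__main__":
    # 85% of the capacitive load, 15% lower voltage, 15% lower frequency.
    ratio = power_ratio(0.85, 0.85, 0.85)
    print(f"P_new / P_old = {ratio:.2f}")
```

Voltage enters squared, which is why lowering it was historically the most effective lever and why losing that lever is what the slide calls the power wall.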
  • 39. Problems with Logic Density • RC delay —Speed at which electrons flow limited by resistance and capacitance of metal wires connecting them —Delay increases as RC product increases —Wire interconnects thinner, increasing resistance —Wires closer together, increasing capacitance Yanmin Zhu @ SJTU 39
  • 40. Solution? • Increase clock rate → power wall • Increase density → RC delay • More emphasis on organizational and architectural approaches
  • 41. Techniques developed Yanmin Zhu @ SJTU Intel Microprocessor History 41
  • 42. Increased Cache Capacity • Typically two or three levels of cache between processor and main memory • Chip density increased —More cache memory on chip – Faster cache access • Pentium chip devoted about 10% of chip area to cache • Pentium 4 devotes about 50% Yanmin Zhu @ SJTU 42
  • 43. More Complex Execution Logic • Enable parallel execution of instructions —Pipelining —Superscalar • Pipeline works like assembly line —Different stages of execution of different instructions at same time along pipeline • Example in your daily life? • Superscalar allows multiple pipelines within single processor —Instructions that do not depend on one another can be executed in parallel
  • 44. Diminishing Returns • Internal organization of processors complex —Can get a great deal of parallelism —Further significant increases likely to be relatively modest • Benefits from cache are reaching limit Yanmin Zhu @ SJTU 44
  • 45. Uniprocessor Performance Constrained by power, instruction-level parallelism, memory latency
  • 46. New Approach – Multiple Cores • Multiple processors on single chip — Large shared cache • Within a processor, increase in performance proportional to square root of increase in complexity • If software can use multiple processors, doubling number of processors almost doubles performance • So, use two simpler processors on the chip rather than one more complex processor Yanmin Zhu @ SJTU 46
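The trade-off on the multicore slide can be made concrete with a rough sketch. It takes the slide's rule of thumb (single-processor performance grows with the square root of complexity) at face value; `parallel_efficiency` is a hypothetical knob added here to model "almost doubles", and both function names are illustrative.

```python
import math


def single_core_speedup(complexity_factor: float) -> float:
    """Speedup from making one processor `complexity_factor` times more complex,
    under the square-root rule of thumb quoted on the slide."""
    return math.sqrt(complexity_factor)


def multicore_speedup(cores: int, parallel_efficiency: float = 1.0) -> float:
    """Idealized speedup from `cores` simple processors.

    parallel_efficiency < 1.0 is an assumed fudge factor for software that
    cannot keep every core fully busy.
    """
    return cores * parallel_efficiency


if __name__ == "__main__":
    # Spend a doubled transistor budget two different ways.
    print(f"one core, 2x complexity: x{single_core_speedup(2):.2f}")
    print(f"two simple cores at 95%: x{multicore_speedup(2, 0.95):.2f}")
```

Under these assumptions the two simpler cores win whenever the software can actually use them, which is the condition the next slide examines.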
  • 47. Multicore Challenges • Requires explicitly parallel programming —Compare with instruction level parallelism – Hardware executes multiple instructions at once –Hidden from the programmer —Hard to do – Programming for performance – Load balancing – Optimizing communication and synchronization Yanmin Zhu @ SJTU 47
  • 48. CISC vs. RISC • CISC: Complex Instruction Set Computer, e.g. Intel x86 • RISC: Reduced Instruction Set Computer, e.g. ARM • Cartoon captions: CISC "kill two each time", RISC "kill one each time"
  • 49. x86 Evolution (1) • 8080 — first general purpose microprocessor — 8 bit data path — Used in first personal computer – Altair • 8086 – 5MHz – 29,000 transistors — much more powerful — 16 bit — instruction cache, prefetch few instructions — 8088 (8 bit external bus) used in first IBM PC • 80286 — 16 Mbyte memory addressable — up from 1 Mbyte • 80386 — 32 bit — Support for multitasking • 80486 — sophisticated powerful cache and instruction pipelining — built in maths co-processor
  • 50. x86 Evolution (2) • Pentium — Superscalar — Multiple instructions executed in parallel • Pentium Pro — Increased superscalar organization — Aggressive register renaming — branch prediction — data flow analysis — speculative execution • Pentium II — MMX technology — graphics, video & audio processing • Pentium III — Additional floating point instructions for 3D graphics Yanmin Zhu @ SJTU 50
  • 51. x86 Evolution (3) • Pentium 4 — Note Arabic rather than Roman numerals — Further floating point and multimedia enhancements • Core — First x86 with dual core • Core 2 — 64 bit architecture • Core 2 Quad – 3GHz – 820 million transistors — Four processors on chip Yanmin Zhu @ SJTU 51
  • 52. x86 Evolution (4) • x86 architecture dominant outside embedded systems • Organization and technology changed dramatically • Instruction set architecture evolved with backwards compatibility • ~1 instruction per month added • 500 instructions available Yanmin Zhu @ SJTU 52
  • 53. Embedded Systems & ARM • ARM evolved from RISC design • Used mainly in embedded systems —Used within product —Not general purpose computer —Dedicated function —E.g. Anti-lock brakes in car (ABS) Yanmin Zhu @ SJTU 53
  • 54. Embedded System - Definition A combination of computer hardware and software, and perhaps additional mechanical or other parts, designed to perform a dedicated function. In many cases, embedded systems are part of a larger system or product, as in the case of an antilock braking system in a car Yanmin Zhu @ SJTU 54
  • 55. Embedded Systems Requirements • Different sizes —Different constraints, optimization, reuse • Different requirements —Safety, reliability, real-time, flexibility, legislation —Lifespan —Environmental conditions —Static v dynamic loads —Slow to fast speeds —Computation v I/O intensive —Discrete event v continuous dynamics
  • 56. Possible Organization of an Embedded System Yanmin Zhu @ SJTU 56
  • 57. ARM Evolution • Designed by ARM Inc., Cambridge, England • Licensed to manufacturers • High speed, small die, low power consumption • PDAs, hand held games, phones —E.g. iPod, iPhone, HTC • Acorn produced ARM1 & ARM2 in 1985 and ARM3 in 1989 • Acorn, VLSI and Apple Computer founded ARM Ltd. Yanmin Zhu @ SJTU 57
  • 58. ARM Systems Categories • Embedded real time • Application platform —Linux, Palm OS, Symbian OS, Windows mobile • Secure applications Yanmin Zhu @ SJTU 58
  • 59. Computer Assessment • Performance • Cost • Size • Security • Reliability • Power consumption • Cool look Yanmin Zhu @ SJTU 59
  • 60. Defining Performance • Which airplane has the best performance? • [Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers x mph]
  • 61. Response Time and Throughput • Response time —How long it takes to do a task • Throughput —Total work done per unit time – e.g., tasks/transactions/… per hour • How are response time and throughput affected by —Replacing the processor with a faster version? —Adding more processors? Yanmin Zhu @ SJTU 61
  • 62. Relative Performance – response time • Define Performance = 1/Execution Time • "X is n times faster than Y": $\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n$ • Example: time taken to run a program • 10s on A, 15s on B • $\text{Execution Time}_B / \text{Execution Time}_A = 15\text{s} / 10\text{s} = 1.5$ • So A is 1.5 times faster than B
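The definition above translates directly into a one-line function. This is a minimal sketch; the name `times_faster` is an assumption made for illustration.

```python
def times_faster(time_x_s: float, time_y_s: float) -> float:
    """How many times faster X is than Y.

    With Performance = 1 / Execution Time, the ratio
    Performance_X / Performance_Y equals time_y / time_x.
    """
    return time_y_s / time_x_s


if __name__ == "__main__":
    # Slide example: the program takes 10 s on A and 15 s on B.
    print(f"A is {times_faster(10.0, 15.0)} times faster than B")
```

Note the inversion: the slower machine's time goes in the numerator, which is easy to get backwards when working from performance rather than time.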
  • 63. Measuring Execution Time • Elapsed time —Total response time, including all aspects – Processing, I/O, OS overhead, idle time —Determines system performance • CPU time —Time spent processing a given job – Discounts I/O time, other jobs’ shares —Comprises user CPU time and system CPU time —Different programs are affected differently by CPU and system performance Yanmin Zhu @ SJTU 63
  • 64. CPU Clocking • Operation of digital hardware governed by a constant-rate clock • [Figure: clock waveform showing the clock period, with data transfer and computation followed by a state update in each cycle] • Clock period: duration of a clock cycle • e.g., $250\,\text{ps} = 0.25\,\text{ns} = 250 \times 10^{-12}\,\text{s}$ • Clock frequency (rate): cycles per second • e.g., $4.0\,\text{GHz} = 4000\,\text{MHz} = 4.0 \times 10^{9}\,\text{Hz}$
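Clock period and clock rate are reciprocals, so the two examples on the slide describe the same clock. A small sketch, with hypothetical helper names, makes the conversion explicit:

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock frequency in Hz for a given clock period in seconds."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period in seconds for a given clock frequency in Hz."""
    return 1.0 / rate_hz


if __name__ == "__main__":
    period = 250e-12  # 250 ps, as on the slide
    print(f"{period * 1e12:.0f} ps period -> {clock_rate_hz(period) / 1e9:.1f} GHz")
    print(f"4.0 GHz -> {clock_period_s(4.0e9) * 1e12:.0f} ps period")
```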
  • 65. System Clock System clock cannot be arbitrarily fast • Signals in CPU take time to settle down to 1 or 0 • Signals may change at different speeds • Operations need to be synchronised Yanmin Zhu @ SJTU 65
  • 66. CPU Time • $\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$ • Performance improved by —Reducing number of clock cycles —Increasing clock rate —Hardware designer must often trade off clock rate against cycle count
  • 67. CPU Time Example • Computer A: 2GHz clock, 10s CPU time • Designing Computer B —Aim for 6s CPU time —Can do faster clock, but causes 1.2 × clock cycles • How fast must Computer B clock be? —$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$ —$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$ —$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$
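The same arithmetic can be reproduced in Python as a quick sanity check. This is a sketch under stated assumptions: the helper names `clock_cycles` and `required_clock_rate` are hypothetical, and only the numbers come from the slide.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock cycles = CPU time x clock rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_cpu_time_s: float) -> float:
    """Clock rate = clock cycles / CPU time."""
    return cycles / target_cpu_time_s


# Computer A: 2 GHz clock, 10 s CPU time.
cycles_a = clock_cycles(cpu_time_s=10.0, clock_rate_hz=2e9)

# Computer B needs 1.2x as many cycles and must finish in 6 s.
cycles_b = 1.2 * cycles_a
rate_b = required_clock_rate(cycles_b, target_cpu_time_s=6.0)

print(f"Clock cycles on A: {cycles_a:.3g}")
print(f"Required clock rate for B: {rate_b / 1e9:.1f} GHz")
```

The result is formatted rather than compared exactly because 1.2 is not exactly representable in binary floating point.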
  • 68. Instruction Count and CPI • CPI = consumer price index?? No: here CPI is cycles per instruction • $\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$ • $\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$ • Instruction Count for a program —Determined by program, ISA and compiler • Average cycles per instruction —Determined by CPU hardware —Different instructions have different CPI – Average CPI affected by instruction mix
  • 69. CPI Example • Computer A: Cycle Time = 250ps, CPI = 2.0 • Computer B: Cycle Time = 500ps, CPI = 1.2 • Same ISA • Which is faster, and by how much? —$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$ (A is faster…) —$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$ —$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$ (…by this much)
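The comparison can be sketched in Python using CPU Time = Instruction Count × CPI × Cycle Time. The function name `cpu_time` and the instruction count of one million are assumptions made for illustration; because both machines share the same ISA, the instruction count cancels in the ratio.

```python
def cpu_time(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical instruction count; any positive value gives the same ratio.
I = 1_000_000

time_a = cpu_time(I, cpi=2.0, cycle_time_s=250e-12)  # I x 500 ps
time_b = cpu_time(I, cpi=1.2, cycle_time_s=500e-12)  # I x 600 ps

ratio = time_b / time_a
print(f"CPU time A: {time_a * 1e6:.1f} us, CPU time B: {time_b * 1e6:.1f} us")
print(f"A is {ratio:.2f} times faster than B")
```

Although B has the lower CPI, its longer cycle time more than offsets that advantage, which is why neither factor alone decides the comparison.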
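  • 70. CPI in More Detail • If different instruction classes take different numbers of cycles: $\text{Clock Cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times \text{Instruction Count}_i)$ • Weighted average CPI: $\text{CPI} = \frac{\text{Clock Cycles}}{\text{Instruction Count}} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \frac{\text{Instruction Count}_i}{\text{Instruction Count}}\right)$, where $\text{Instruction Count}_i / \text{Instruction Count}$ is the relative frequency of class $i$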
  • 71. CPI Example • Alternative compiled code sequences using instructions in classes A, B, C
      Class              A   B   C
      CPI for class      1   2   3
      IC in sequence 1   2   1   2
      IC in sequence 2   4   1   1
    • Sequence 1: IC = 5 —Clock Cycles = 2×1 + 1×2 + 2×3 = 10 —Avg. CPI = 10/5 = 2.0
    • Sequence 2: IC = 6 —Clock Cycles = 4×1 + 1×2 + 1×3 = 9 —Avg. CPI = 9/6 = 1.5
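The two code sequences can be evaluated with the weighted-sum form, Clock Cycles = Σ (CPI_i × IC_i). The following is a minimal sketch; the dictionary layout and the function names `total_cycles` and `average_cpi` are illustrative assumptions, while the numbers are those from the slide's table.

```python
# CPI for each instruction class, from the slide's table.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(counts: dict) -> int:
    """Sum of CPI_i x instruction count_i over all classes."""
    return sum(CPI_BY_CLASS[cls] * n for cls, n in counts.items())


def average_cpi(counts: dict) -> float:
    """Weighted average CPI = clock cycles / instruction count."""
    return total_cycles(counts) / sum(counts.values())


sequence_1 = {"A": 2, "B": 1, "C": 2}
sequence_2 = {"A": 4, "B": 1, "C": 1}

for name, seq in (("Sequence 1", sequence_1), ("Sequence 2", sequence_2)):
    print(name, "IC =", sum(seq.values()),
          "cycles =", total_cycles(seq),
          "avg CPI =", average_cpi(seq))
```

Sequence 2 executes more instructions yet needs fewer cycles, so instruction count alone is not a reliable indicator of run time.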
  • 72. CPU Time Summary • The BIG Picture: $\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$ • Performance depends on (IC = instruction count, Tc = clock cycle time) —Algorithm: affects IC, possibly CPI —Programming language: affects IC, CPI —Compiler: affects IC, CPI —Instruction set architecture: affects IC, CPI, Tc
  • 73. Instruction Execution Rate • Instructions executed per second! —MIPS: Millions of instructions per second —MFLOPS: Millions of floating-point operations per second • Heavily dependent on —Instruction set —Compiler design —Processor implementation —Cache & memory hierarchy
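Since MIPS is instruction count divided by (execution time × 10^6), and execution time is instruction count × CPI / clock rate, the rate reduces to clock rate / (CPI × 10^6). The sketch below illustrates this; the function name `mips` is hypothetical, and the two machine configurations (4 GHz with CPI 2.0, 2 GHz with CPI 1.2) are example values chosen for illustration.

```python
def mips(clock_rate_hz: float, cpi: float) -> float:
    """MIPS = clock rate / (CPI x 10^6).

    Derived from MIPS = IC / (execution time x 10^6)
    with execution time = IC x CPI / clock rate.
    """
    return clock_rate_hz / (cpi * 1e6)


# Hypothetical machines used only to illustrate the formula.
machine_1 = mips(clock_rate_hz=4e9, cpi=2.0)
machine_2 = mips(clock_rate_hz=2e9, cpi=1.2)

print(f"Machine 1: {machine_1:.0f} MIPS")
print(f"Machine 2: {machine_2:.0f} MIPS")
```

Because CPI depends on the instruction mix, the same processor can report different MIPS figures on different programs, which is one reason the rate depends so heavily on the factors listed on the slide.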
  • 74. Benchmarks • Benchmark: Programs designed to test performance — Written in high level language: Portable — Represents style of task: Systems, numerical, commercial — Easily measured — Widely distributed • E.g. Standard Performance Evaluation Corporation (SPEC) — CPU2006 for computation bound – 17 floating point programs in C, C++, Fortran – 12 integer programs in C, C++ – 3 million lines of code — Speed and rate metrics – Single task and throughput
  • 75. SPEC Speed Metric - Single task • Base runtime defined for each benchmark using reference machine • Results are reported as ratio of reference time to system run time, $r_i = T_{\text{ref}_i} / T_{\text{sut}_i}$ — $T_{\text{ref}_i}$: execution time for benchmark $i$ on reference machine — $T_{\text{sut}_i}$: execution time of benchmark $i$ on test system • Overall performance calculated by averaging ratios for all 12 integer benchmarks — Use geometric mean: appropriate for normalized numbers such as ratios
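The ratio-then-geometric-mean calculation can be sketched as follows. The timing values are made up for illustration and are not SPEC results; the function names `spec_ratio` and `geometric_mean` are likewise assumptions, and only three benchmarks are used instead of the twelve in the integer suite.

```python
import math


def spec_ratio(t_ref: float, t_sut: float) -> float:
    """Ratio of reference-machine time to test-system time."""
    return t_ref / t_sut


def geometric_mean(ratios: list) -> float:
    """n-th root of the product of n ratios."""
    return math.prod(ratios) ** (1.0 / len(ratios))


# Hypothetical run times in seconds (not real SPEC data).
t_ref = [1000.0, 2000.0, 4000.0]  # reference machine
t_sut = [500.0, 500.0, 500.0]     # system under test

ratios = [spec_ratio(r, s) for r, s in zip(t_ref, t_sut)]
overall = geometric_mean(ratios)

print("Ratios:", ratios)
print(f"Geometric mean: {overall:.2f}")
```

One property that makes the geometric mean suitable here is that the ranking of two systems does not change if a different reference machine is chosen, which is not true of the arithmetic mean of ratios.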
  • 76. Amdahl’s Law • Gene Amdahl [AMDA67] • Potential speedup of a program using multiple processors • Concluded that: —Code needs to be parallelizable —Speedup is bounded, giving diminishing returns for more processors
  • 77. Amdahl’s Law Formula • For program running on single processor — Fraction f of code infinitely parallelizable with no scheduling overhead — Fraction (1-f) of code inherently serial — T is total execution time for program on single processor — N is number of processors that fully exploit parallel portions of code • $\text{Speedup} = \frac{\text{time on a single processor}}{\text{time on } N \text{ processors}} = \frac{T(1-f) + Tf}{T(1-f) + \frac{Tf}{N}} = \frac{1}{(1-f) + \frac{f}{N}}$
  • 78. Amdahl’s Law Formula [Figure: curve plotted against f from 0 to 1, vertical axis from 0 to 100, labeled f/(1-f)] • f small: parallel processors have little effect • N → ∞: speedup bound by 1/(1 – f) —Diminishing returns for using more processors
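Both conclusions can be observed numerically with the standard statement of Amdahl's law, Speedup = 1 / ((1 − f) + f/N). The sketch below assumes that form; the function name `amdahl_speedup` and the sample values of f and N are chosen only for illustration.

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on n processors when fraction f of the code is parallelizable.

    Uses the standard form 1 / ((1 - f) + f / n).
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


# Illustrative values: a mostly serial and a mostly parallel program.
for f in (0.1, 0.9):
    limit = 1.0 / (1.0 - f)
    for n in (2, 10, 100, 1000):
        print(f"f={f}, N={n}: speedup = {amdahl_speedup(f, n):.2f}")
    print(f"f={f}: upper bound as N grows = {limit:.2f}")
```

With f = 0.1 even 1000 processors give a speedup of only about 1.11, while with f = 0.9 the speedup approaches but never reaches 10, so each additional processor contributes less than the one before.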
  • 79. Homework #1 • Pages 45-47 • 2.4, 2.9, 2.10, 2.13, 2.14 • Due: class time next Friday (18 March) • Write on paper or in an exercise book (with student number and name) • Policy: No late submission is allowed