SlideShare uma empresa Scribd logo
1 de 29
William Stallings
Computer Organization
and Architecture
8th Edition


Chapter 18
Multicore Computers
Hardware Performance Issues
• Microprocessors have seen an exponential
  increase in performance
  —Improved organization
  —Increased clock frequency
• Increase in Parallelism
  —Pipelining
  —Superscalar
  —Simultaneous multithreading (SMT)
• Diminishing returns
  —More complexity requires more logic
  —Increasing chip area for coordinating and
   signal transfer logic
     – Harder to design, make and debug
Alternative Chip
Organizations
Intel Hardware
Trends
Increased Complexity
• Power requirements grow exponentially with chip
  density and clock frequency
   — Can use more chip area for cache
      – Smaller
      – Order of magnitude lower power requirements
• By 2015
   — 100 billion transistors on 300mm2 die
      – Cache of 100MB
      – 1 billion transistors for logic
• Pollack’s rule:
   — Performance is roughly proportional to square root of
     increase in complexity
      – Double complexity gives 40% more performance
• Multicore has potential for near-linear
  improvement
• Unlikely that one core can use all cache
  effectively
Power and Memory Considerations
Chip Utilization of Transistors
Software Performance Issues
• Performance benefits dependent on
  effective exploitation of parallel resources
• Even small amounts of serial code impact
  performance
  —10% inherently serial on 8 processor system
   gives only 4.7 times performance
• Communication, distribution of work and
  cache coherence overheads
• Some applications effectively exploit
  multicore processors
Effective Applications for Multicore Processors
• Database
• Servers handling independent transactions
• Multi-threaded native applications
   — Lotus Domino, Siebel CRM
• Multi-process applications
   — Oracle, SAP, PeopleSoft
• Java applications
   — Java VM is multi-thread with scheduling and memory
     management
   — Sun’s Java Application Server, BEA’s Weblogic, IBM
     Websphere, Tomcat
• Multi-instance applications
   — One application running multiple times
• E.g. Value Game Software
Multicore Organization
•   Number of core processors on chip
•   Number of levels of cache on chip
•   Amount of shared cache
•   Next slide examples of each organization:
•   (a) ARM11 MPCore
•   (b) AMD Opteron
•   (c) Intel Core Duo
•   (d) Intel Core i7
Multicore Organization Alternatives
Advantages of shared L2 Cache
• Constructive interference reduces overall miss
  rate
• Data shared by multiple cores not replicated at
  cache level
• With proper frame replacement algorithms mean
  amount of shared cache dedicated to each core is
  dynamic
  — Threads with less locality can have more cache
• Easy inter-process communication through
  shared memory
• Cache coherency confined to L1
• Dedicated L2 cache gives each core more rapid
  access
  — Good for threads with strong locality
• Shared L3 cache may also improve performance
Individual Core Architecture
• Intel Core Duo uses superscalar cores
• Intel Core i7 uses simultaneous multi-
  threading (SMT)
  —Scales up number of threads supported
     – 4 SMT cores, each supporting 4 threads appears as
       16 core
Intel x86 Multicore Organization -
Core Duo (1)
• 2006
• Two x86 superscalar, shared L2 cache
• Dedicated L1 cache per core
   —32KB instruction and 32KB data
• Thermal control unit per core
   —Manages chip heat dissipation
   —Maximize performance within constraints
   —Improved ergonomics
• Advanced Programmable Interrupt
  Controlled (APIC)
   —Inter-process interrupts between cores
   —Routes interrupts to appropriate core
   —Includes timer so OS can interrupt core
Intel x86 Multicore Organization -
Core Duo (2)
• Power Management Logic
   —Monitors thermal conditions and CPU activity
   —Adjusts voltage and power consumption
   —Can switch individual logic subsystems
• 2MB shared L2 cache
   —Dynamic allocation
   —MESI support for L1 caches
   —Extended to support multiple Core Duo in SMP
     – L2 data shared between local cores or external
• Bus interface
Intel x86 Multicore Organization -
Core i7
•   November 2008
•   Four x86 SMT processors
•   Dedicated L2, shared L3 cache
•   Speculative pre-fetch for caches
•   On chip DDR3 memory controller
    — Three 8 byte channels (192 bits) giving 32GB/s
    — No front side bus
• QuickPath Interconnection
    — Cache coherent point-to-point link
    — High speed communications between processor chips
    — 6.4G transfers per second, 16 bits per transfer
    — Dedicated bi-directional pairs
    — Total bandwidth 25.6GB/s
ARM11 MPCore
• Up to 4 processors each with own L1 instruction and data
  cache
• Distributed interrupt controller
• Timer per CPU
• Watchdog
   — Warning alerts for software failures
   — Counts down from predetermined values
   — Issues warning at zero
• CPU interface
   — Interrupt acknowledgement, masking and completion
     acknowledgement
• CPU
   — Single ARM11 called MP11
• Vector floating-point unit
   — FP co-processor
• L1 cache
• Snoop control unit
   — L1 cache coherency
ARM11
MPCore
Block
Diagram
ARM11 MPCore Interrupt Handling
• Distributed Interrupt Controller (DIC) collates
  from many sources
• Masking
• Prioritization
• Distribution to target MP11 CPUs
• Status tracking
• Software interrupt generation
• Number of interrupts independent of MP11 CPU
  design
• Memory mapped
• Accessed by CPUs via private interface through
  SCU
• Can route interrupts to single or multiple CPUs
• Provides inter-process communication
  — Thread on one CPU can cause activity by thread on
    another CPU
DIC Routing
•   Direct to specific CPU
•   To defined group of CPUs
•   To all CPUs
•   OS can generate interrupt to:
    —All but self
    —Self
    —Other specific CPU
• Typically combined with shared memory
  for inter-process communication
• 16 interrupt ids available for inter-process
  communication
Interrupt States
• Inactive
  —Non-asserted
  —Completed by that CPU but pending or active
   in others
• Pending
  —Asserted
  —Processing not started on that CPU
• Active
  —Started on that CPU but not complete
  —Can be pre-empted by higher priority interrupt
Interrupt Sources
• Inter-process Interrupts (IPI)
   — Private to CPU
   — ID0-ID15
   — Software triggered
   — Priority depends on target CPU not source
• Private timer and/or watchdog interrupt
   — ID29 and ID30
• Legacy FIQ line
   — Legacy FIQ pin, per CPU, bypasses interrupt distributor
   — Directly drives interrupts to CPU
• Hardware
   — Triggered by programmable events on associated
     interrupt lines
   — Up to 224 lines
   — Start at ID32
ARM11 MPCore Interrupt Distributor
Cache Coherency
• Snoop Control Unit (SCU) resolves most shared
  data bottleneck issues
• L1 cache coherency based on MESI
• Direct data Intervention
  — Copying clean entries between L1 caches without
    accessing external memory
  — Reduces read after write from L1 to L2
  — Can resolve local L1 miss from rmote L1 rather than L2
• Duplicated tag RAMs
  — Cache tags implemented as separate block of RAM
  — Same length as number of lines in cache
  — Duplicates used by SCU to check data availability before
    sending coherency commands
  — Only send to CPUs that must update coherent data
    cache
• Migratory lines
  — Allows moving dirty data between CPUs without writing
    to L2 and reading back from external memory
Recommended Reading
• Stallings chapter 18
• ARM web site
Intel Core i& Block Diagram
Intel Core Duo Block Diagram
Performance Effect of Multiple Cores
Recommended Reading
• Multicore Association web site
• ARM web site

Mais conteúdo relacionado

Mais procurados

Advanced Processor Power Point Presentation
Advanced Processor  Power Point  PresentationAdvanced Processor  Power Point  Presentation
Advanced Processor Power Point PresentationPrashantYadav931011
 
Interleaved memory
Interleaved memoryInterleaved memory
Interleaved memoryashishgy
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel ProgrammingUday Sharma
 
ADVANCED COMPUTER ARCHITECTURE AND PARALLEL PROCESSING
ADVANCED COMPUTER ARCHITECTUREAND PARALLEL PROCESSINGADVANCED COMPUTER ARCHITECTUREAND PARALLEL PROCESSING
ADVANCED COMPUTER ARCHITECTURE AND PARALLEL PROCESSING Zena Abo-Altaheen
 
CPU Scheduling algorithms
CPU Scheduling algorithmsCPU Scheduling algorithms
CPU Scheduling algorithmsShanu Kumar
 
Operating System Lecture Notes
Operating System Lecture NotesOperating System Lecture Notes
Operating System Lecture NotesFellowBuddy.com
 
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...NECST Lab @ Politecnico di Milano
 
A New Golden Age for Computer Architecture
A New Golden Age for Computer ArchitectureA New Golden Age for Computer Architecture
A New Golden Age for Computer ArchitectureYanbin Kong
 

Mais procurados (20)

Advanced Processor Power Point Presentation
Advanced Processor  Power Point  PresentationAdvanced Processor  Power Point  Presentation
Advanced Processor Power Point Presentation
 
My ppt hpc u4
My ppt hpc u4My ppt hpc u4
My ppt hpc u4
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
Scope of parallelism
Scope of parallelismScope of parallelism
Scope of parallelism
 
Os Threads
Os ThreadsOs Threads
Os Threads
 
RTOS - Real Time Operating Systems
RTOS - Real Time Operating SystemsRTOS - Real Time Operating Systems
RTOS - Real Time Operating Systems
 
Interleaved memory
Interleaved memoryInterleaved memory
Interleaved memory
 
Parallel Programming
Parallel ProgrammingParallel Programming
Parallel Programming
 
Parallel processing
Parallel processingParallel processing
Parallel processing
 
ADVANCED COMPUTER ARCHITECTURE AND PARALLEL PROCESSING
ADVANCED COMPUTER ARCHITECTUREAND PARALLEL PROCESSINGADVANCED COMPUTER ARCHITECTUREAND PARALLEL PROCESSING
ADVANCED COMPUTER ARCHITECTURE AND PARALLEL PROCESSING
 
Nehalem (microarchitecture)
Nehalem (microarchitecture)Nehalem (microarchitecture)
Nehalem (microarchitecture)
 
Cache memory
Cache memoryCache memory
Cache memory
 
Cache Memory- JMD.pptx
Cache Memory- JMD.pptxCache Memory- JMD.pptx
Cache Memory- JMD.pptx
 
Storage Management
Storage ManagementStorage Management
Storage Management
 
Rtos Concepts
Rtos ConceptsRtos Concepts
Rtos Concepts
 
CPU Scheduling algorithms
CPU Scheduling algorithmsCPU Scheduling algorithms
CPU Scheduling algorithms
 
Ch4 memory management
Ch4 memory managementCh4 memory management
Ch4 memory management
 
Operating System Lecture Notes
Operating System Lecture NotesOperating System Lecture Notes
Operating System Lecture Notes
 
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...
DEEP-mon: Dynamic and Energy Efficient Power monitoring for container-based i...
 
A New Golden Age for Computer Architecture
A New Golden Age for Computer ArchitectureA New Golden Age for Computer Architecture
A New Golden Age for Computer Architecture
 

Semelhante a chap 18 multicore computers

Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptxMuhammad54342
 
BIL406-Chapter-2-Classifications of Parallel Systems.ppt
BIL406-Chapter-2-Classifications of Parallel Systems.pptBIL406-Chapter-2-Classifications of Parallel Systems.ppt
BIL406-Chapter-2-Classifications of Parallel Systems.pptKadri20
 
Mces MOD 1.pptx
Mces MOD 1.pptxMces MOD 1.pptx
Mces MOD 1.pptxRadhaC10
 
Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technologyAmirali Sharifian
 
Introduction to embedded System.pptx
Introduction to embedded System.pptxIntroduction to embedded System.pptx
Introduction to embedded System.pptxPratik Gohel
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architecturesYoung Alista
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Fazli Amin
 
18 parallel processing
18 parallel processing18 parallel processing
18 parallel processingdilip kumar
 
5_Embedded Systems مختصر.pdf
5_Embedded Systems  مختصر.pdf5_Embedded Systems  مختصر.pdf
5_Embedded Systems مختصر.pdfaliamjd
 
Multiprocessor_YChen.ppt
Multiprocessor_YChen.pptMultiprocessor_YChen.ppt
Multiprocessor_YChen.pptAberaZeleke1
 
Computer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and MicrocontrollerComputer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and MicrocontrollerAmrutaMehata
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD) Ali Raza
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD) Ali Raza
 
Embedded systems 101 final
Embedded systems 101 finalEmbedded systems 101 final
Embedded systems 101 finalKhalid Elmeadawy
 

Semelhante a chap 18 multicore computers (20)

Multiprocessor.pptx
 Multiprocessor.pptx Multiprocessor.pptx
Multiprocessor.pptx
 
CPU Caches
CPU CachesCPU Caches
CPU Caches
 
BIL406-Chapter-2-Classifications of Parallel Systems.ppt
BIL406-Chapter-2-Classifications of Parallel Systems.pptBIL406-Chapter-2-Classifications of Parallel Systems.ppt
BIL406-Chapter-2-Classifications of Parallel Systems.ppt
 
Mces MOD 1.pptx
Mces MOD 1.pptxMces MOD 1.pptx
Mces MOD 1.pptx
 
Intel® hyper threading technology
Intel® hyper threading technologyIntel® hyper threading technology
Intel® hyper threading technology
 
Introduction to embedded System.pptx
Introduction to embedded System.pptxIntroduction to embedded System.pptx
Introduction to embedded System.pptx
 
Motivation for multithreaded architectures
Motivation for multithreaded architecturesMotivation for multithreaded architectures
Motivation for multithreaded architectures
 
parallel-processing.ppt
parallel-processing.pptparallel-processing.ppt
parallel-processing.ppt
 
Lecture 1 (distributed systems)
Lecture 1 (distributed systems)Lecture 1 (distributed systems)
Lecture 1 (distributed systems)
 
18 parallel processing
18 parallel processing18 parallel processing
18 parallel processing
 
5_Embedded Systems مختصر.pdf
5_Embedded Systems  مختصر.pdf5_Embedded Systems  مختصر.pdf
5_Embedded Systems مختصر.pdf
 
Architecture of high end processors
Architecture of high end processorsArchitecture of high end processors
Architecture of high end processors
 
Multiprocessor_YChen.ppt
Multiprocessor_YChen.pptMultiprocessor_YChen.ppt
Multiprocessor_YChen.ppt
 
Computer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and MicrocontrollerComputer Organization: Introduction to Microprocessor and Microcontroller
Computer Organization: Introduction to Microprocessor and Microcontroller
 
Dsp ajal
Dsp  ajalDsp  ajal
Dsp ajal
 
Cpu
CpuCpu
Cpu
 
OS-Part-01.pdf
OS-Part-01.pdfOS-Part-01.pdf
OS-Part-01.pdf
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
 
Parallel Processors (SIMD)
Parallel Processors (SIMD) Parallel Processors (SIMD)
Parallel Processors (SIMD)
 
Embedded systems 101 final
Embedded systems 101 finalEmbedded systems 101 final
Embedded systems 101 final
 

Mais de Sher Shah Merkhel

Mais de Sher Shah Merkhel (13)

13 risc
13 risc13 risc
13 risc
 
12 processor structure and function
12 processor structure and function12 processor structure and function
12 processor structure and function
 
11 instruction sets addressing modes
11  instruction sets addressing modes 11  instruction sets addressing modes
11 instruction sets addressing modes
 
10 instruction sets characteristics
10 instruction sets characteristics10 instruction sets characteristics
10 instruction sets characteristics
 
09 arithmetic 2
09 arithmetic 209 arithmetic 2
09 arithmetic 2
 
09 arithmetic
09 arithmetic09 arithmetic
09 arithmetic
 
08 operating system support
08 operating system support08 operating system support
08 operating system support
 
07 input output
07 input output07 input output
07 input output
 
06 external memory
06 external memory06 external memory
06 external memory
 
04 cache memory
04 cache memory04 cache memory
04 cache memory
 
03 top level view of computer function and interconnection
03 top level view of computer function and interconnection03 top level view of computer function and interconnection
03 top level view of computer function and interconnection
 
02 computer evolution and performance
02 computer evolution and performance02 computer evolution and performance
02 computer evolution and performance
 
01 introduction
01 introduction01 introduction
01 introduction
 

Último

Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfchloefrazer622
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024Janet Corral
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 

Último (20)

Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Disha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdfDisha NEET Physics Guide for classes 11 and 12.pdf
Disha NEET Physics Guide for classes 11 and 12.pdf
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 

chap 18 multicore computers

  • 1. William Stallings Computer Organization and Architecture 8th Edition Chapter 18 Multicore Computers
  • 2. Hardware Performance Issues • Microprocessors have seen an exponential increase in performance —Improved organization —Increased clock frequency • Increase in Parallelism —Pipelining —Superscalar —Simultaneous multithreading (SMT) • Diminishing returns —More complexity requires more logic —Increasing chip area for coordinating and signal transfer logic – Harder to design, make and debug
  • 5. Increased Complexity • Power requirements grow exponentially with chip density and clock frequency — Can use more chip area for cache – Smaller – Order of magnitude lower power requirements • By 2015 — 100 billion transistors on 300mm2 die – Cache of 100MB – 1 billion transistors for logic • Pollack’s rule: — Performance is roughly proportional to square root of increase in complexity – Double complexity gives 40% more performance • Multicore has potential for near-linear improvement • Unlikely that one core can use all cache effectively
  • 6. Power and Memory Considerations
  • 7. Chip Utilization of Transistors
  • 8. Software Performance Issues • Performance benefits dependent on effective exploitation of parallel resources • Even small amounts of serial code impact performance —10% inherently serial on 8 processor system gives only 4.7 times performance • Communication, distribution of work and cache coherence overheads • Some applications effectively exploit multicore processors
  • 9. Effective Applications for Multicore Processors • Database • Servers handling independent transactions • Multi-threaded native applications — Lotus Domino, Siebel CRM • Multi-process applications — Oracle, SAP, PeopleSoft • Java applications — Java VM is multi-thread with scheduling and memory management — Sun’s Java Application Server, BEA’s Weblogic, IBM Websphere, Tomcat • Multi-instance applications — One application running multiple times • E.g. Value Game Software
  • 10. Multicore Organization • Number of core processors on chip • Number of levels of cache on chip • Amount of shared cache • Next slide examples of each organization: • (a) ARM11 MPCore • (b) AMD Opteron • (c) Intel Core Duo • (d) Intel Core i7
  • 12. Advantages of shared L2 Cache • Constructive interference reduces overall miss rate • Data shared by multiple cores not replicated at cache level • With proper frame replacement algorithms mean amount of shared cache dedicated to each core is dynamic — Threads with less locality can have more cache • Easy inter-process communication through shared memory • Cache coherency confined to L1 • Dedicated L2 cache gives each core more rapid access — Good for threads with strong locality • Shared L3 cache may also improve performance
  • 13. Individual Core Architecture • Intel Core Duo uses superscalar cores • Intel Core i7 uses simultaneous multi- threading (SMT) —Scales up number of threads supported – 4 SMT cores, each supporting 4 threads appears as 16 core
  • 14. Intel x86 Multicore Organization - Core Duo (1) • 2006 • Two x86 superscalar, shared L2 cache • Dedicated L1 cache per core —32KB instruction and 32KB data • Thermal control unit per core —Manages chip heat dissipation —Maximize performance within constraints —Improved ergonomics • Advanced Programmable Interrupt Controlled (APIC) —Inter-process interrupts between cores —Routes interrupts to appropriate core —Includes timer so OS can interrupt core
  • 15. Intel x86 Multicore Organization - Core Duo (2) • Power Management Logic —Monitors thermal conditions and CPU activity —Adjusts voltage and power consumption —Can switch individual logic subsystems • 2MB shared L2 cache —Dynamic allocation —MESI support for L1 caches —Extended to support multiple Core Duo in SMP – L2 data shared between local cores or external • Bus interface
  • 16. Intel x86 Multicore Organization - Core i7 • November 2008 • Four x86 SMT processors • Dedicated L2, shared L3 cache • Speculative pre-fetch for caches • On chip DDR3 memory controller — Three 8 byte channels (192 bits) giving 32GB/s — No front side bus • QuickPath Interconnection — Cache coherent point-to-point link — High speed communications between processor chips — 6.4G transfers per second, 16 bits per transfer — Dedicated bi-directional pairs — Total bandwidth 25.6GB/s
  • 17. ARM11 MPCore • Up to 4 processors each with own L1 instruction and data cache • Distributed interrupt controller • Timer per CPU • Watchdog — Warning alerts for software failures — Counts down from predetermined values — Issues warning at zero • CPU interface — Interrupt acknowledgement, masking and completion acknowledgement • CPU — Single ARM11 called MP11 • Vector floating-point unit — FP co-processor • L1 cache • Snoop control unit — L1 cache coherency
  • 19. ARM11 MPCore Interrupt Handling • Distributed Interrupt Controller (DIC) collates from many sources • Masking • Prioritization • Distribution to target MP11 CPUs • Status tracking • Software interrupt generation • Number of interrupts independent of MP11 CPU design • Memory mapped • Accessed by CPUs via private interface through SCU • Can route interrupts to single or multiple CPUs • Provides inter-process communication — Thread on one CPU can cause activity by thread on another CPU
  • 20. DIC Routing • Direct to specific CPU • To defined group of CPUs • To all CPUs • OS can generate interrupt to: —All but self —Self —Other specific CPU • Typically combined with shared memory for inter-process communication • 16 interrupt ids available for inter-process communication
  • 21. Interrupt States • Inactive —Non-asserted —Completed by that CPU but pending or active in others • Pending —Asserted —Processing not started on that CPU • Active —Started on that CPU but not complete —Can be pre-empted by higher priority interrupt
  • 22. Interrupt Sources • Inter-process Interrupts (IPI) — Private to CPU — ID0-ID15 — Software triggered — Priority depends on target CPU not source • Private timer and/or watchdog interrupt — ID29 and ID30 • Legacy FIQ line — Legacy FIQ pin, per CPU, bypasses interrupt distributor — Directly drives interrupts to CPU • Hardware — Triggered by programmable events on associated interrupt lines — Up to 224 lines — Start at ID32
  • 23. ARM11 MPCore Interrupt Distributor
  • 24. Cache Coherency • Snoop Control Unit (SCU) resolves most shared data bottleneck issues • L1 cache coherency based on MESI • Direct data Intervention — Copying clean entries between L1 caches without accessing external memory — Reduces read after write from L1 to L2 — Can resolve local L1 miss from rmote L1 rather than L2 • Duplicated tag RAMs — Cache tags implemented as separate block of RAM — Same length as number of lines in cache — Duplicates used by SCU to check data availability before sending coherency commands — Only send to CPUs that must update coherent data cache • Migratory lines — Allows moving dirty data between CPUs without writing to L2 and reading back from external memory
  • 25. Recommended Reading • Stallings chapter 18 • ARM web site
  • 26. Intel Core i& Block Diagram
  • 27. Intel Core Duo Block Diagram
  • 28. Performance Effect of Multiple Cores
  • 29. Recommended Reading • Multicore Association web site • ARM web site