Multi-core architectures Jernej Barbic 15-213, Spring 2007 May 3, 2007
Single-core computer
Single-core CPU chip the single core
Multi-core architectures Core 1 Core 2 Core 3 Core 4 Multi-core CPU chip
Multi-core CPU chip core 1 core 2 core 3 core 4
The cores run in parallel core 1 core 2 core 3 core 4 thread 1 thread 2 thread 3 thread 4
Within each core, threads are time-sliced (just like on a uniprocessor) core 1 core 2 core 3 core 4 several  threads several  threads several  threads several  threads
Interaction with the Operating System
Why multi-core?
Instruction-level parallelism
Thread-level parallelism (TLP)
General context: Multiprocessors (Lemieux cluster, Pittsburgh Supercomputing Center)
Multiprocessor memory types
Multi-core processor is a special kind of multiprocessor: all processors are on the same chip
What applications benefit from multi-core? Each can run on its own core
More examples
A technique complementary to multi-core: Simultaneous multithreading [Diagram, source: Intel. Processor blocks: BTB and I-TLB, Decoder, Trace Cache, Rename/Alloc, Uop queues, Schedulers, Integer, Floating Point, L1 D-Cache, D-TLB, uCode ROM, BTB, L2 Cache and Control, Bus]
Simultaneous multithreading (SMT)
Without SMT, only a single thread can run at any given time [Diagram: one core; Thread 1: floating point]
Without SMT, only a single thread can run at any given time [Diagram: one core; Thread 2: integer operation]
SMT processor: both threads can run concurrently [Diagram: one core; Thread 1: floating point, Thread 2: integer operation]
But: Can’t simultaneously use the same functional unit [Diagram: one core; Thread 1 and Thread 2 on the same functional unit, marked IMPOSSIBLE] This scenario is impossible with SMT on a single core (assuming a single integer unit)
SMT not a “true” parallel processor
Multi-core: threads can run on separate cores [Diagram: two cores; Thread 1, Thread 2]
Multi-core: threads can run on separate cores [Diagram: two cores; Thread 3, Thread 4]
Combining Multi-core and SMT
SMT Dual-core: all four threads can run concurrently [Diagram: two cores; Thread 1, Thread 3, Thread 2, Thread 4]
Comparison: multi-core vs SMT
Comparison: multi-core vs SMT
The memory hierarchy
“Fish” machines [Diagram: memory, L2 cache, two L1 caches, CORE 0 and CORE 1, hyper-threads]
Designs with private L2 caches [First diagram: memory, two L2 caches, two L1 caches, CORE 0 and CORE 1] Both L1 and L2 are private. Examples: AMD Opteron, AMD Athlon, Intel Pentium D [Second diagram: memory, two L3 caches, two L2 caches, two L1 caches, CORE 0 and CORE 1] A design with L3 caches. Example: Intel Itanium 2
Private vs shared caches?
Private vs shared caches
The cache coherence problem
The cache coherence problem: Suppose variable x initially contains 15213 [Diagram: multi-core chip, Core 1 to Core 4, each with one or more levels of cache; main memory x=15213]
The cache coherence problem: Core 1 reads x [Core 1 cache x=15213; main memory x=15213]
The cache coherence problem: Core 2 reads x [Core 1 cache x=15213; Core 2 cache x=15213; main memory x=15213]
The cache coherence problem: Core 1 writes to x, setting it to 21660 [Core 1 cache x=21660; Core 2 cache x=15213; main memory x=21660, assuming write-through caches]
The cache coherence problem: Core 2 attempts to read x… gets a stale copy [Core 1 cache x=21660; Core 2 cache x=15213; main memory x=21660]
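The five steps above can be replayed in a few lines of Python. This is only a toy sketch: the dictionaries `memory` and `caches` and the helpers `read` and `write` are hypothetical names chosen for illustration, and the model assumes private write-through caches with no coherence mechanism at all.

```python
# Toy model: each core has a private write-through cache and NO coherence protocol.
memory = {"x": 15213}
caches = {core: {} for core in (1, 2, 3, 4)}

def read(core, addr):
    """Return the cached value, loading it from main memory on a miss."""
    if addr not in caches[core]:
        caches[core][addr] = memory[addr]
    return caches[core][addr]

def write(core, addr, value):
    """Write-through: update the writer's own cache and main memory only."""
    caches[core][addr] = value
    memory[addr] = value

read(1, "x")              # Core 1 reads x
read(2, "x")              # Core 2 reads x
write(1, "x", 21660)      # Core 1 writes to x
stale = read(2, "x")      # Core 2 hits its own cache and never sees the write
print(stale, memory["x"])  # 15213 21660
```

Nothing in `write` tells Core 2 that its copy is out of date, which is exactly the gap a coherence protocol has to close.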
Solutions for cache coherence
Inter-core bus [Diagram: multi-core chip, Core 1 to Core 4, each with one or more levels of cache, connected by an inter-core bus; main memory]
Invalidation protocol with snooping
The cache coherence problem Revisited: Cores 1 and 2 have both read x [Core 1 cache x=15213; Core 2 cache x=15213; main memory x=15213]
The cache coherence problem: Core 1 writes to x, setting it to 21660 [Core 1 cache x=21660, sends invalidation request on the inter-core bus; Core 2 cache x=15213 INVALIDATED; main memory x=21660, assuming write-through caches]
The cache coherence problem: After invalidation [Core 1 cache x=21660; Core 2 cache empty; main memory x=21660]
The cache coherence problem: Core 2 reads x. Cache misses, and loads the new copy. [Core 1 cache x=21660; Core 2 cache x=21660; main memory x=21660]
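The invalidation walkthrough can be sketched as a small simulation. The classes `Bus` and `Cache` below are hypothetical and written only for illustration; the sketch assumes write-through caches and a single shared bus that every cache snoops, as in the slides, and ignores timing and cache-line granularity.

```python
class Bus:
    """Hypothetical inter-core bus that every attached cache snoops."""
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def broadcast_invalidate(self, sender, addr):
        # Every cache except the writer drops its copy, if it has one.
        for cache in self.caches:
            if cache is not sender:
                cache.lines.pop(addr, None)


class Cache:
    def __init__(self, bus):
        self.lines = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:               # miss: fetch from main memory
            self.lines[addr] = self.bus.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(self, addr)
        self.lines[addr] = value
        self.bus.memory[addr] = value            # write-through


bus = Bus({"x": 15213})
core1, core2 = Cache(bus), Cache(bus)
core1.read("x")
core2.read("x")                      # both cores now cache x=15213
core1.write("x", 21660)              # Core 1 writes; Core 2's copy is invalidated
invalidated = "x" not in core2.lines
fresh = core2.read("x")              # miss, reloads the new value from memory
print(invalidated, fresh)            # True 21660
```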
Alternative to invalidate protocol: update protocol. Core 1 writes x=21660 [Core 1 cache x=21660, broadcasts updated value on the inter-core bus; Core 2 cache x=21660 UPDATED; main memory x=21660, assuming write-through caches]
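For comparison, here is a minimal sketch of the update variant. The function `write_update` and the dictionaries are hypothetical names, and the starting state assumes Cores 1 and 2 have already read x, as on the earlier slides.

```python
memory = {"x": 15213}
# State after Cores 1 and 2 have both read x; Cores 3 and 4 hold no copy.
caches = {1: {"x": 15213}, 2: {"x": 15213}, 3: {}, 4: {}}

def write_update(core, addr, value):
    """Write-through store that broadcasts the new value instead of invalidating."""
    memory[addr] = value
    caches[core][addr] = value
    updated = []
    for other, lines in caches.items():
        if other != core and addr in lines:   # only existing copies are refreshed
            lines[addr] = value
            updated.append(other)
    return updated

updated_cores = write_update(1, "x", 21660)
print(updated_cores, caches[2]["x"])   # [2] 21660
```

Core 2's next read hits in its own cache with the new value, at the price of sending the data itself over the bus on the write.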
Which do you think is better? Invalidation or update?
Invalidation vs update
Invalidation protocols
Programming for multi-core
Thread safety very important
However: Need to use synchronization even if only time-slicing on a uniprocessor
Need to use synchronization even if only time-slicing on a uniprocessor [Two outcomes shown: one gives counter=2, the other gives counter=1]
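The slide's code did not survive extraction, so the following is a hedged reconstruction of the idea rather than the original example. It assumes two threads that each increment a shared `counter` starting at 0, and models an increment as a separate load and store. The function `run` and the schedule lists are hypothetical; the schedule is replayed deterministically, so no real threads are involved.

```python
def run(schedule):
    """Replay load/store steps of two threads in the given order; return counter."""
    counter = 0
    registers = {}                       # each thread's private register
    for thread, step in schedule:
        if step == "load":
            registers[thread] = counter
        elif step == "store":
            counter = registers[thread] + 1
    return counter

# Thread A finishes its increment before thread B starts.
serial = [("A", "load"), ("A", "store"), ("B", "load"), ("B", "store")]
# A is preempted between its load and its store (a time-slice boundary).
interleaved = [("A", "load"), ("B", "load"), ("B", "store"), ("A", "store")]

print(run(serial))       # 2
print(run(interleaved))  # 1
```

The second schedule needs only one core: a context switch between the load and the store is enough to lose an update, which is why the increment has to be protected (for example with a mutex) even on a uniprocessor.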
Assigning threads to the cores
Affinity masks are bit vectors [Example mask: 1 0 1 1 for core 3, core 2, core 1, core 0]
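As a small illustration of the bit-vector view, the sketch below converts between a mask and a list of core numbers. The helper names `cores_in_mask` and `mask_for` are hypothetical, and the convention that bit i stands for core i is taken from the labels on the slide.

```python
def cores_in_mask(mask):
    """Return the core numbers whose bit is set; bit i stands for core i."""
    return [core for core in range(mask.bit_length()) if (mask >> core) & 1]

def mask_for(cores):
    """Build an affinity mask from an iterable of core numbers."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask

mask = 0b1011                      # bits for core 3, 2, 1, 0 as on the slide
print(cores_in_mask(mask))         # [0, 1, 3]
print(bin(mask_for([0, 1, 3])))    # 0b1011
```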
Affinity masks when multi-core and SMT combined [Example mask: 1 1 0 0 1 0 1 1 for core 3 thread 1, core 3 thread 0, core 2 thread 1, core 2 thread 0, core 1 thread 1, core 1 thread 0, core 0 thread 1, core 0 thread 0]
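With SMT each core contributes one bit per hardware thread. The sketch below decodes such a mask under the assumption, read off the slide's labels, that there are two hardware threads per core and that bit number = core * 2 + thread. Real systems may number logical CPUs differently, so treat `decode` and `THREADS_PER_CORE` as illustrative names only.

```python
THREADS_PER_CORE = 2   # assumption: two hyper-threads per core, as labelled on the slide

def decode(mask):
    """Map each set bit to (core, hardware thread), assuming bit = core * 2 + thread."""
    return [divmod(bit, THREADS_PER_CORE)
            for bit in range(mask.bit_length()) if (mask >> bit) & 1]

mask = 0b11001011      # core 3: both threads, core 1: thread 1, core 0: both threads
print(decode(mask))    # [(0, 0), (0, 1), (1, 1), (3, 0), (3, 1)]
```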
Default Affinities
Process migration is costly
Hard affinities
When to set your own affinities (Source: Sensable.com)
Kernel scheduler API
Kernel scheduler API
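The bullets of these two slides were lost, so the following is not a reconstruction of them. As a hedged illustration of setting a hard affinity, the sketch assumes a Linux system, where Python's `os` module exposes `sched_getaffinity` and `sched_setaffinity`; on platforms without these calls it simply returns `None`. The wrapper name `pin_to_one_cpu` is hypothetical.

```python
import os

def pin_to_one_cpu():
    """Pin the calling process to one of its allowed CPUs, then restore the old mask."""
    if not hasattr(os, "sched_setaffinity"):
        return None                          # platform without these calls
    try:
        original = os.sched_getaffinity(0)   # 0 means the calling process
        target = min(original)
        os.sched_setaffinity(0, {target})    # hard affinity: only this CPU
        pinned = os.sched_getaffinity(0)
        os.sched_setaffinity(0, original)    # restore the previous mask
    except OSError:
        return None                          # the environment refused the request
    return original, pinned

result = pin_to_one_cpu()
print(result)
```

Python represents the mask as a set of CPU numbers rather than a raw bit vector, but the meaning is the same: one entry per logical CPU the scheduler may use.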
Windows Task Manager core 2 core 1
Legal licensing issues
Conclusion

A Journey Into the Emotions of Software Developers
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Multi-core architectures

  • 1. Multi-core architectures. Jernej Barbic, 15-213, Spring 2007. May 3, 2007
  • 3. Single-core CPU chip. [Diagram label: the single core]
  • 4. Multi-core architectures
  • 5. Multi-core CPU chip
  • 6. The cores run in parallel. [Diagram: cores 1 to 4, each running one of threads 1 to 4]
  • 7. Within each core, threads are time-sliced (just like on a uniprocessor). [Diagram: cores 1 to 4, each with several threads]
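  • 8. Interaction with the Operating System
  • 9. Why multi-core?
  • 10. Instruction-level parallelism
  • 11. Thread-level parallelism (TLP)
  • 12. General context: Multiprocessors
  • 13. Multiprocessor memory types
  • 14. A multi-core processor is a special kind of multiprocessor: all processors are on the same chip
  • 15. What applications benefit from multi-core?
  • 16. More examples
  • 17. A technique complementary to multi-core: Simultaneous multithreading
  • 18. Simultaneous multithreading (SMT)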
  • 19. Without SMT, only a single thread can run at any given time. [Diagram: one core's pipeline blocks (BTB and I-TLB, Decoder, Trace Cache, Rename/Alloc, Uop queues, Schedulers, Integer, Floating Point, L1 D-Cache, D-TLB, uCode ROM, BTB, L2 Cache and Control, Bus); Thread 1: floating point]
  • 20. Without SMT, only a single thread can run at any given time. [Same pipeline diagram; Thread 2: integer operation]
  • 21. SMT processor: both threads can run concurrently. [Same pipeline diagram; Thread 1: floating point, Thread 2: integer operation]
  • 22. But: can’t simultaneously use the same functional unit. This scenario is impossible with SMT on a single core (assuming a single integer unit). [Same pipeline diagram; Thread 1 and Thread 2, marked IMPOSSIBLE]
  • 23.
  • 24. Multi-core: threads can run on separate cores. [Diagram: two copies of the pipeline, one per core; Thread 1 and Thread 2]
  • 25. Multi-core: threads can run on separate cores. [Diagram: two copies of the pipeline, one per core; Thread 3 and Thread 4]
  • 26.
  • 27. SMT dual-core: all four threads can run concurrently. [Diagram: two copies of the pipeline, one per core; Threads 1, 2, 3 and 4]
  • 28.
  • 29.
  • 30.
  • 31.
  • 32. Designs with private L2 caches. Both L1 and L2 are private. Examples: AMD Opteron, AMD Athlon, Intel Pentium D. [Diagram: CORE 0 and CORE 1, each with its own L1 cache and L2 cache, connected to memory] A design with L3 caches. Example: Intel Itanium 2. [Diagram: the same layout with two L3 caches added]
  • 33.
  • 34.
  • 35.
  • 36. The cache coherence problem. Suppose variable x initially contains 15213. [Diagram: multi-core chip with Cores 1 to 4, each with one or more levels of cache; main memory holds x=15213]
  • 37. The cache coherence problem. Core 1 reads x. [State: Core 1 cache x=15213; main memory x=15213]
  • 38. The cache coherence problem. Core 2 reads x. [State: Core 1 cache x=15213; Core 2 cache x=15213; main memory x=15213]
  • 39. The cache coherence problem. Core 1 writes to x, setting it to 21660 (assuming write-through caches). [State: Core 1 cache x=21660; Core 2 cache x=15213; main memory x=21660]
  • 40. The cache coherence problem. Core 2 attempts to read x… gets a stale copy. [State: Core 1 cache x=21660; Core 2 cache x=15213; main memory x=21660]
  • 41.
  • 42. Inter-core bus. [Diagram: multi-core chip with Cores 1 to 4, each with one or more levels of cache, connected by an inter-core bus; main memory]
  • 43.
  • 44. The cache coherence problem. Revisited: Cores 1 and 2 have both read x. [State: Core 1 cache x=15213; Core 2 cache x=15213; main memory x=15213]
  • 45. The cache coherence problem. Core 1 writes to x, setting it to 21660 (assuming write-through caches), and sends an invalidation request over the inter-core bus. [State: Core 1 cache x=21660; Core 2 cache x=15213, INVALIDATED; main memory x=21660]
  • 46. The cache coherence problem. After invalidation: [State: Core 1 cache x=21660; Core 2 cache no longer holds x; main memory x=21660]
  • 47. The cache coherence problem. Core 2 reads x. Cache misses, and loads the new copy. [State: Core 1 cache x=21660; Core 2 cache x=21660; main memory x=21660]
  • 48. Alternative to invalidate protocol: update protocol. Core 1 writes x=21660 (assuming write-through caches) and broadcasts the updated value over the inter-core bus. [State: Core 1 cache x=21660; Core 2 cache x=21660, UPDATED; main memory x=21660]
  • 49. Which do you think is better? Invalidation or update?
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.
  • 60.
  • 61.
  • 62.
  • 63.
  • 64.
  • 65. Windows Task Manager. [Screenshot labels: core 2, core 1]
  • 66.
  • 67.
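
The cache coherence walk-through in slides 36 to 48 can be replayed with a small simulation. The sketch below is illustrative only and is not taken from the slides: the names `Core`, `Chip` and `run_scenario` are hypothetical, each private cache is modeled as a plain dictionary, the inter-core bus is modeled as a loop over the other cores, and caches are assumed to be write-through as on the slides. Cores are indexed from 0 in the code, so index 0 is "Core 1" on the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """One core with a private cache, modeled as a dict: address -> value."""
    cache: dict = field(default_factory=dict)


class Chip:
    """Toy multi-core chip with write-through private caches.

    protocol is one of:
      "none"       - no coherence traffic at all (slides 36 to 40)
      "invalidate" - a write removes other cores' copies (slides 44 to 47)
      "update"     - a write refreshes other cores' copies (slide 48)
    """

    def __init__(self, num_cores, memory, protocol="none"):
        if protocol not in ("none", "invalidate", "update"):
            raise ValueError(f"unknown protocol: {protocol}")
        self.cores = [Core() for _ in range(num_cores)]
        self.memory = dict(memory)
        self.protocol = protocol

    def read(self, core_id, addr):
        cache = self.cores[core_id].cache
        if addr not in cache:
            # Cache miss: load the value from main memory.
            cache[addr] = self.memory[addr]
        return cache[addr]

    def write(self, core_id, addr, value):
        self.cores[core_id].cache[addr] = value
        # Write-through: main memory is updated on every write.
        self.memory[addr] = value
        # Stand-in for the inter-core bus: visit every other core's cache.
        for other_id, other in enumerate(self.cores):
            if other_id == core_id or addr not in other.cache:
                continue
            if self.protocol == "invalidate":
                del other.cache[addr]        # drop the stale copy
            elif self.protocol == "update":
                other.cache[addr] = value    # overwrite the copy in place


def run_scenario(protocol):
    """Replay the slide scenario; return what Core 2 sees on its last read."""
    chip = Chip(num_cores=4, memory={"x": 15213}, protocol=protocol)
    chip.read(0, "x")           # Core 1 reads x
    chip.read(1, "x")           # Core 2 reads x
    chip.write(0, "x", 21660)   # Core 1 writes x
    return chip.read(1, "x")    # Core 2 reads x again


if __name__ == "__main__":
    for protocol in ("none", "invalidate", "update"):
        print(protocol, run_scenario(protocol))
```

With `"none"` the final read returns the stale 15213, while both `"invalidate"` and `"update"` return 21660. The two protocols differ in what travels over the bus on a write: invalidation sends only a request and defers the data transfer to the next miss, whereas update ships the new value to every cache holding a copy.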