Mpmc
Akshay Nagpurkar
Microprocessors and Microcontrollers
Third Year BE Computers
Pawar Virendra D.
Mo. No.: 9423582261
MPMC © Pawar Virendra D.
Syllabus
EC4813: Microprocessors and Microcontrollers

Prerequisites: Understanding of microprocessors, peripheral chips, analogue sensors, conversion, and interfacing techniques.

Aim: This course covers the design of hardware and software code using a modern microcontroller. It emphasizes assembly language programming of the microcontroller, including device drivers, exception and interrupt handling, and interfacing with higher-level languages.

Objectives:
1. To exhibit knowledge of the architecture of microcontrollers and apply program control structures to microcontrollers.
2. To develop the ability to use assembly language to program a microcontroller and demonstrate the capability to program the microcontroller to communicate with external circuitry using parallel ports.
3. To demonstrate the capability to program the microcontroller to communicate with external circuitry using serial ports and timer ports.

Unit 1: Introduction to Pentium Microprocessor (7 Hrs)
Pentium microprocessor: history, features and architecture, pin description, functional description, real mode, RISC, superscalar, pipelining, instruction pairing, branch prediction, instruction and data caches, FPU.

Unit 2: Bus Cycles and Memory Organization (7 Hrs)
Initialization and configuration, bus operations, RST, memory/I/O organization, data transfer mechanism, 8/16/32-bit data bus, programmer's model, register set, instruction set, data types, instructions.

Unit 3: Protected Mode (6 Hrs)
Introduction to segmentation, support registers, Rel Int Desc, memory management through segmentation, logical-to-linear translation, protection by segmentation, privilege level protection, related instructions, inter-privilege-level transfer of control, paging (support registers, descriptors, linear-to-physical address translation, TLB, page-level protection), virtual memory.

Unit 4: Multitasking, Interrupts, Exceptions and I/O (6 Hrs)
Multitasking support registers, related descriptors, task switching, I/O permission bit map, virtual mode (address generation, privilege level, instructions and registers, entering/leaving V86 mode), interrupt structure in real, protected and V86 modes, I/O handling, comparison of the three modes.

Unit 5: 8051 Microcontroller (7 Hrs)
Family architecture, data/program memory, register set, register banks, SFRs, external data memory and program memory, interrupt structure, timer programming, serial port programming, miscellaneous features, minimum system.

Unit 6: PIC Microcontroller (7 Hrs)
Overview, features, pin-out, capture/compare/pulse-width modulation mode, block diagram, programming model, reset/clocking, memory organization (program/data), flash EPROM, addressing modes and instruction set, programming, I/O, interrupts, timers, ADC.

Outcomes: Upon completion of the course, the student should be able to:
1. Describe and use the functional blocks utilized in a basic microcontroller-based system.
2. Describe the programmer's model of the CPU's instruction set and various addressing modes.
3. Proficiently use the instruction set and its functional groups when programming.
4. Integrate structured programming techniques and subroutines into microcontroller-based hardware topologies.
5. Develop I/O port, ADC hardware, and software interfacing techniques.
6. Describe the use of sensors, interfacing, and signal conditioning when utilizing the microcontroller in control and monitor applications.

Text Books:
1. Antonakos J., "The Pentium Microprocessor", Pearson Education, 2004, 2nd Edition.
2. Deshmukh A., "Microcontrollers - Theory and Applications", Tata McGraw-Hill, 2004.

Reference Books:
1. Mazidi M., Gillispie J., "The 8051 Microcontroller and Embedded Systems", Pearson Education, 2002, ISBN 81-7808-574-7.
2. Intel Pentium Data Sheets.
3. Ayala K., "The 8051 Microcontroller", Penram International, 1996, ISBN 81-900828-4-1.
4. Intel 8-bit Microcontroller manual.
5. Microchip manual for PIC 16CXX and 16FXX.
INTRODUCTION

16-bit Processors and Segmentation (1978)
The IA-32 architecture family was preceded by 16-bit processors, the 8086 and 8088. The 8086 has 16-bit registers and a 16-bit external data bus, with 20-bit addressing giving a 1-MByte address space. The 8088 is similar to the 8086 except that it has an 8-bit external data bus. The 8086/8088 introduced segmentation to the IA-32 architecture. With segmentation, a 16-bit segment register contains a pointer to a memory segment of up to 64 KBytes. Using four segment registers at a time, 8086/8088 processors are able to address up to 256 KBytes without switching between segments. The 20-bit addresses that can be formed using a segment register and an additional 16-bit pointer provide a total address range of 1 MByte.

The Intel® 286 Processor (1982)
The Intel 286 processor introduced protected mode operation into the IA-32 architecture. Protected mode uses the segment register content as selectors or pointers into descriptor tables. Descriptors provide 24-bit base addresses with a physical memory size of up to 16 MBytes, support for virtual memory management on a segment swapping basis, and a number of protection mechanisms. These mechanisms include:
• Segment limit checking
• Read-only and execute-only segment options
• Four privilege levels

The Intel386™ Processor (1985)
The Intel386 processor was the first 32-bit processor in the IA-32 architecture family. It introduced 32-bit registers for use both to hold operands and for addressing. The lower half of each 32-bit Intel386 register retains the properties of the 16-bit registers of earlier generations, permitting backward compatibility. The processor also provides a virtual-8086 mode that allows for even greater efficiency when executing programs created for 8086/8088 processors.
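The 8086/8088 address arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the function and constant names are hypothetical, and the rule that the segment value is shifted left by 4 bits before the offset is added is the well-known 8086 real-mode rule rather than something stated in these notes.

```python
SEGMENT_SIZE = 64 * 1024   # each segment spans up to 64 KBytes
ADDRESS_MASK = 0xFFFFF     # 20-bit physical address, 1-MByte space


def physical_address(segment: int, offset: int) -> int:
    """Form a 20-bit address from a 16-bit segment and a 16-bit offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Assumed rule: segment * 16 + offset, wrapped to 20 bits as on the 8086.
    return ((segment << 4) + offset) & ADDRESS_MASK


# Memory reachable without reloading any of the four segment registers.
REACHABLE_WITH_FOUR_SEGMENTS = 4 * SEGMENT_SIZE  # 256 KBytes
```

For example, `physical_address(0x1234, 0x0010)` gives `0x12350`, and the four-segment total matches the 256 KBytes quoted in the text.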
In addition, the Intel386 processor has support for:
• A 32-bit address bus that supports up to 4 GBytes of physical memory
• A segmented-memory model and a flat memory model
• Paging, with a fixed 4-KByte page size providing a method for virtual memory management
• Support for parallel stages

The Intel486™ Processor (1989)
The Intel486™ processor added more parallel execution capability by expanding the Intel386 processor's instruction decode and execution units into five pipelined stages. Each stage operates in parallel with the others on up to five instructions in different stages of execution. In addition, the processor added:
• An 8-KByte on-chip first-level cache that increased the percent of instructions that could execute at the scalar rate of one per clock
• An integrated x87 FPU
• Power saving and system management capabilities

The Intel® Pentium® Processor (1993)
The introduction of the Intel Pentium processor added a second execution pipeline to achieve superscalar performance (two pipelines, known as u and v, together can execute two instructions per clock). The on-chip first-level cache doubled, with 8 KBytes devoted to code and another 8 KBytes devoted to data. The data cache uses the MESI protocol to support the more efficient write-back mode in addition to the write-through mode previously used by the Intel486 processor. Branch prediction with an on-chip branch table was added to increase performance in looping constructs. In addition, the processor added:
• Extensions to make the virtual-8086 mode more efficient and allow for 4-MByte as well as 4-KByte pages
• Internal data paths of 128 and 256 bits to speed up internal data transfers
• A burstable external data bus, widened to 64 bits
• An APIC to support systems with multiple processors
• A dual processor mode to support glueless two-processor systems

PROCESSOR FEATURES OVERVIEW
The Pentium processor supports the features of previous Intel Architecture processors and provides significant enhancements including the following:
• Superscalar Architecture
• Dynamic Branch Prediction
• Pipelined Floating-Point Unit
• Improved Instruction Execution Time
• Separate Code and Data Caches
• Writeback MESI Protocol in the Data Cache
• 64-Bit Data Bus
• Bus Cycle Pipelining
• Address Parity
• Internal Parity Checking
• Functional Redundancy Checking and Lock Step operation
• Execution Tracing
• Performance Monitoring
• IEEE 1149.1 Boundary Scan
• System Management Mode
• Virtual Mode Extensions
• Upgradable with a Pentium OverDrive processor
• Dual processing support
• Advanced SL Power Management Features
• Fractional Bus Operation
• On-Chip Local APIC Device
• Support for the Intel 82498/82493 and 82497/82492 cache chipset products
• Split line accesses to the code cache

COMPONENT INTRODUCTION
The application instruction set of the Pentium processor family includes the complete instruction set of existing Intel Architecture processors to ensure backward compatibility, with extensions to accommodate the additional functionality of the Pentium processor. All application software written for the Intel386™ and Intel486™ microprocessors will run on the Pentium processor without modification. The on-chip memory management unit (MMU) is completely compatible with the Intel386 and Intel486 CPUs.

The two instruction pipelines and the floating-point unit on the Pentium processor are capable of independent operation. Each pipeline issues frequently used instructions in a single clock. Together, the dual pipes can issue two integer instructions in one clock, or one floating-point instruction (under certain circumstances, two floating-point instructions) in one clock.
Branch prediction is implemented in the Pentium processor. To support this, the Pentium processor implements two prefetch buffers, one to prefetch code in a linear fashion, and one that prefetches code according to the Branch Target Buffer (BTB), so the needed code is almost always prefetched before it is needed for execution.

The Pentium processor includes separate code and data caches integrated on chip to meet its performance goals. The caches on the Pentium processor are each 8 KBytes in size and 2-way set-associative. Each cache has a dedicated Translation Lookaside Buffer (TLB) to translate linear addresses to physical addresses. The Pentium processor data cache is configurable to be writeback or writethrough on a line-by-line basis and follows the MESI protocol. The data cache tags are triple ported to support two data transfers and an inquire cycle in the same clock. The code cache is an inherently write-protected cache. The code cache tags of the Pentium processor are also triple ported to support snooping and split-line accesses.

The Pentium processor has a 64-bit data bus. Burst read and burst writeback cycles are supported by the Pentium processor. In addition, bus cycle pipelining has been added to allow two bus cycles to be in progress simultaneously. The Pentium processor Memory Management Unit contains optional extensions to the architecture which allow 4-MByte page sizes.

The Pentium processor has added significant data integrity and error detection capability. Data parity checking is still supported on a byte-by-byte basis. Address parity checking and internal parity checking features have been added, along with a new exception, the machine check exception. The Pentium processor has implemented functional redundancy checking to provide maximum error detection of the processor and the interface to the processor. When functional redundancy checking is used, a second processor, the “checker”, executes in lock step with the “master” processor.
The checker samples the master’s outputs, compares those values with the values it computes internally, and asserts an error signal if a mismatch occurs. The Pentium processor with MMX technology does not support functional redundancy checking.

As more and more functions are integrated on chip, the complexity of board-level testing increases. To address this, the Pentium processor has increased test and debug capability by implementing IEEE Boundary Scan (Standard 1149.1). System management mode has been implemented along with some extensions to the SMM architecture. Enhancements to the Virtual 8086 mode have been made to increase performance by reducing the number of times it is necessary to trap to a Virtual 8086 monitor.

The Pentium processor has two instruction pipelines, the “u” pipe and the “v” pipe. The u-pipe can execute all integer and floating-point instructions. The v-pipe can execute simple integer instructions and the FXCH floating-point instruction.
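A toy model can make the u-pipe/v-pipe split above concrete. The sketch below is a deliberate simplification and not the processor's actual pairing logic: the instruction categories, the `ALLOWED_PAIRS` table, and the function name are assumptions made for illustration, and real pairing also depends on data dependencies and other conditions that are not modeled here.

```python
# Hypothetical instruction categories for this sketch:
#   "simple_int", "complex_int", "fp", "fxch"
# Assumed pairing table: (kind issued to u-pipe, kind issued to v-pipe).
ALLOWED_PAIRS = {
    ("simple_int", "simple_int"),  # two integer instructions in one clock
    ("fp", "fxch"),                # FXCH is the only FP instruction the v-pipe runs
}


def issue_clocks(kinds):
    """Count issue clocks for an in-order stream of instruction kinds."""
    clocks = 0
    i = 0
    while i < len(kinds):
        # The current instruction always goes to the u-pipe. The next one
        # joins it in the v-pipe only if the pair appears in ALLOWED_PAIRS.
        if i + 1 < len(kinds) and (kinds[i], kinds[i + 1]) in ALLOWED_PAIRS:
            i += 2
        else:
            i += 1
        clocks += 1
    return clocks
```

Under these assumptions, four simple integer instructions issue in two clocks, while two ordinary floating-point instructions in a row take one clock each because neither can go to the v-pipe.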
The code and data caches are separate. The data cache has two ports, one for each of the two pipes (the tags are triple ported to allow simultaneous inquire cycles). The data cache has a dedicated Translation Lookaside Buffer (TLB) to translate linear addresses to the physical addresses used by the data cache.

The code cache, branch target buffer and prefetch buffers are responsible for getting raw instructions into the execution units of the Pentium processor. Instructions are fetched from the code cache or from the external bus. Branch addresses are remembered by the branch target buffer. The code cache TLB translates linear addresses to the physical addresses used by the code cache.

The decode unit contains two parallel decoders which decode and issue up to the next two sequential instructions into the execution pipeline. The control ROM contains the microcode which controls the sequence of operations performed by the processor. The control unit has direct control over both pipelines.

The Pentium processor contains a pipelined floating-point unit that provides a significant floating-point performance advantage over previous generations of Intel Architecture-based processors.

The Pentium processor includes features to support multi-processor systems, namely an on-chip Advanced Programmable Interrupt Controller (APIC). This APIC implementation supports multiprocessor interrupt management (with symmetric interrupt distribution across all processors), multiple I/O subsystem support, 8259A compatibility, and inter-processor interrupt support.

The dual processor configuration allows two Pentium processors to share a single L2 cache for a low-cost symmetric multi-processor system. The two processors appear to the system as a single Pentium processor. Multiprocessor operating systems properly schedule computing tasks between the two processors. This scheduling of tasks is transparent to software applications and the end user. Logic built into the processors supports a “glueless” interface for easy system design.
Through a private bus, the two Pentium processors arbitrate for the external bus and maintain cache coherency. The Pentium processor can also be used in a conventional multi-processor system in which one L2 cache is dedicated to each processor. The Pentium processor is produced on Intel’s advanced silicon technology. The Pentium processor also includes SL enhanced power management features. When the clock to the Pentium processor is stopped, power dissipation is virtually eliminated. The low VCC operating voltages and SL enhanced power management features make the Pentium processor a good choice for energy-efficient desktop designs. 8/153 MPMC© Pawar Virendra D.
9.
PIN DESCRIPTION
Symbol (Type): Name and Function
A31-A3 (I/O): As outputs, the address lines of the processor along with the byte enables define the physical area of memory or I/O accessed. The external system drives the inquire address to the processor on A31-A5.
D63-D0 (I/O): These are the 64 data lines for the processor. Lines D7-D0 define the least significant byte of the data bus; lines D63-D56 define the most significant byte of the data bus. When the CPU is driving the data lines, they are driven during the T2, T12, or T2P clocks for that cycle. During reads, the CPU samples the data bus when BRDY# is returned.
ADS# (O): The address status indicates that a new valid bus cycle is currently being driven by the Pentium processor.
BE7#-BE5# (O), BE4#-BE0# (I/O): The byte enable pins are used to determine which bytes must be written to external memory, or which bytes were requested by the CPU for the current cycle. The byte enables are driven in the same clock as the address lines (A31-A3).
BOFF# (I): The backoff input is used to abort all outstanding bus cycles that have not yet completed. In response to BOFF#, the Pentium processor will float all pins normally floated during bus hold in the next clock. The processor remains in bus hold until BOFF# is negated, at which time the Pentium processor restarts the aborted bus cycle(s) in their entirety.
BRDY# (I): The burst ready input indicates that the external system has presented valid data on the data pins in response to a read or that the external system has accepted the Pentium processor data in response to a write request. This signal is sampled in the T2, T12 and T2P bus states.
CACHE# (O): For Pentium processor initiated cycles the cache pin indicates internal cacheability of the cycle (if a read), and indicates a burst write back cycle (if a write). If this pin is driven inactive during a read cycle, the Pentium processor will not cache the returned data, regardless of the state of the KEN# pin. This pin is also used to determine the cycle length (number of transfers in the cycle).
CPUTYP (I): CPU type distinguishes the Primary processor from the Dual processor. In a single processor environment, or when the Pentium processor is acting as the Primary processor in a dual processing system, CPUTYP should be strapped to VSS. The Dual processor should have CPUTYP strapped to VCC. For the Pentium OverDrive processor, CPUTYP will be used to determine whether the bootup handshake protocol will be used (in a dual socket system) or not (in a single socket system).
FLUSH# (I): When asserted, the cache flush input forces the Pentium processor to write back all modified lines in the data cache and invalidate its internal caches. A Flush Acknowledge special cycle will be generated by the Pentium processor indicating completion of the write back and invalidation. If FLUSH# is sampled low when RESET transitions from high to low, tristate test mode is entered. If two Pentium processors are operating in dual processing mode and FLUSH# is asserted, the Dual processor will perform a flush first (without a flush acknowledge cycle), then the Primary processor will perform a flush followed by a flush acknowledge cycle. NOTE: If the FLUSH# signal is asserted in dual processing mode, it must be deasserted at least one clock prior to BRDY# of the FLUSH Acknowledge cycle to avoid DP arbitration problems.
FRCMC# (I): The functional redundancy checking master/checker mode input is used to determine whether the Pentium processor is configured in master mode or checker mode. When configured as a master, the Pentium processor drives its output pins as required by the bus protocol. When configured as a checker, the Pentium processor tristates all outputs (except IERR# and TDO) and samples the output pins. The configuration as a master/checker is set after RESET and may not be changed other than by a subsequent RESET.
HOLD (I): In response to the bus hold request, the Pentium processor will float most of its output and input/output pins and assert HLDA after completing all outstanding bus cycles. The Pentium processor will maintain its bus in this state until HOLD is de-asserted. HOLD is not recognized during LOCK cycles. The Pentium processor will recognize HOLD during reset.
HLDA (O): The bus hold acknowledge pin goes active in response to a hold request driven to the processor on the HOLD pin. It indicates that the Pentium processor has floated most of the output pins and relinquished the bus to another local bus master. When leaving bus hold, HLDA will be driven inactive and the Pentium processor will resume driving the bus. If the Pentium processor has a bus cycle pending, it will be driven in the same clock that HLDA is de-asserted.
INIT (I): The Pentium processor initialization input pin forces the Pentium processor to begin execution in a known state. The processor state after INIT is the same as the state after RESET except that the internal caches, write buffers, and floating point registers retain the values they had prior to INIT. INIT may NOT be used in lieu of RESET after power-up. If INIT is sampled high when RESET transitions from high to low, the Pentium processor will perform built-in self test prior to the start of program execution.
INV (I): The invalidation input determines the final cache line state (S or I) in case of an inquire cycle hit. It is sampled together with the address for the inquire cycle in the clock EADS# is sampled active.
KEN# (I): The cache enable pin is used to determine whether the current cycle is cacheable or not and is consequently used to determine cycle length. When the Pentium processor generates a cycle that can be cached (CACHE# asserted) and KEN# is active, the cycle will be transformed into a burst line fill cycle.
LOCK# (O): The bus lock pin indicates that the current bus cycle is locked. The Pentium processor will not allow a bus hold when LOCK# is asserted (but AHOLD and BOFF# are allowed). LOCK# goes active in the first clock of the first locked bus cycle and goes inactive after the BRDY# is returned for the last locked bus cycle. LOCK# is guaranteed to be de-asserted for at least one clock between back-to-back locked cycles.
NA# (I): An active next address input indicates that the external memory system is ready to accept a new bus cycle although all data transfers for the current cycle have not yet completed. The Pentium processor will issue ADS# for a pending cycle two clocks after NA# is asserted. The Pentium processor supports up to 2 outstanding bus cycles.
RESET (I): RESET forces the Pentium processor to begin execution at a known state. All the Pentium processor internal caches will be invalidated upon the RESET. Modified lines in the data cache are not written back. FLUSH#, FRCMC# and INIT are sampled when RESET transitions from high to low to determine if tristate test mode or checker mode will be entered, or if BIST will be run.
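As a rough illustration of how A31-A3 and the byte enables together select the bytes of a transfer, the sketch below computes which BE# pins would be asserted for an access that fits inside one aligned 8-byte quadword. The helper name `byte_enables` and its return convention are assumptions made for this example, not part of any Intel specification. The BE# pins are active low, so a 0 bit in the mask means the byte is selected.

```python
def byte_enables(address, size):
    """Return (quadword_address, be_mask) for an access of `size` bytes.

    quadword_address models the value carried on A31-A3 (the address with
    its low three bits cleared). Bit i of be_mask models pin BEi#; the pins
    are active low, so bit i == 0 means byte i of the data bus is selected.
    Hypothetical helper: accesses that cross a quadword boundary are
    rejected here rather than split into several bus cycles.
    """
    offset = address & 0x7
    if size < 1 or offset + size > 8:
        raise ValueError("access must fit inside one aligned quadword")
    selected = ((1 << size) - 1) << offset   # 1 bits mark the selected bytes
    be_mask = ~selected & 0xFF               # invert because pins are active low
    return address & ~0x7, be_mask


quad, mask = byte_enables(0x1002, 2)         # 16-bit access at byte offset 2
print(hex(quad), format(mask, "08b"))
```

For the 16-bit access at 0x1002, only BE2# and BE3# come out low, which matches the idea that the byte enables refine the quadword address given on A31-A3.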
12.
REAL MODE
RISC
A Complex Instruction Set Computer (CISC) provides a large and powerful range of instructions, which is less flexible to implement. For example, the 8086 microprocessor family has these instructions:
JA Jump if Above
JAE Jump if Above or Equal
JB Jump if Below
By contrast, the Reduced Instruction Set Computer (RISC) concept is to identify the sub-components and use those. As these are much simpler, they can be implemented directly in silicon, so will run at the maximum possible speed. Nothing is 'translated'.
Most modern CISC processors, such as the Pentium, use a fast RISC core with an interpreter sitting between the core and the instruction. So when you are running Windows95 on a PC, it is not that much different from trying to get W95 running on a software PC emulator. Just imagine the power hidden inside the Pentium...
This is not to say that CISC processors cannot have a large number of registers; some do. However, for its use, a typical RISC processor requires more registers to give it additional flexibility. Gone are the days when you had two general purpose registers and an 'accumulator'. One thing RISC does offer, though, is register independence.
The 8086 offers you fourteen registers, but with caveats: The first four (A, B, C, and D) are Data registers (a.k.a. scratch-pad registers). They are 16 bit and accessed as two 8 bit registers, thus register A is really AH (A, high-order byte) and AL (A, low-order byte). These can be used as general purpose registers, but they can also have dedicated functions - Accumulator, Base, Count, and Data.
The advantages of RISC over CISC today are:
• RISC processors are much simpler to build, which in turn results in the following advantages:
   o easier to build, i.e. you can use already existing production facilities
   o much less expensive, just compare the price of an XScale with that of a Pentium III at 1 GHz...
   o less power consumption, which again gives two advantages:
      - much longer use of battery driven devices
      - no need for cooling of the device, which again gives two advantages:
         - smaller design of the whole device
         - no noise
• RISC processors are much simpler to program, which doesn't only help the assembler programmer, but the compiler designer, too. You'll hardly find any compiler which uses all the functions of a Pentium III optimally.
SUPER SCALAR
A superscalar CPU architecture implements a form of parallelism called instruction level parallelism within a single processor. It therefore allows faster CPU throughput than would otherwise be possible at a given clock rate. A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor. Each functional unit is not a separate CPU core but an execution resource within a single CPU such as an arithmetic logic unit, a bit shifter, or a multiplier.
While a superscalar CPU is typically also pipelined, pipelining and superscalar architecture are considered different performance enhancement techniques.
The superscalar technique is traditionally associated with several identifying characteristics (within a given CPU core):
• Instructions are issued from a sequential instruction stream
• CPU hardware dynamically checks for data dependencies between instructions at run time (versus software checking at compile time)
• The CPU accepts multiple instructions per clock cycle
The simplest processors are scalar processors. Each instruction executed by a scalar processor typically manipulates one or two data items at a time. By contrast, each instruction executed by a vector processor operates simultaneously on many data items. An analogy is the difference between scalar and vector arithmetic. A superscalar processor is sort of a mixture of the two. Each instruction processes one data item, but there are multiple redundant functional units within each CPU, thus multiple instructions can be processing separate data items concurrently.
Superscalar CPU design emphasizes improving the instruction dispatcher accuracy, and allowing it to keep the multiple functional units in use at all times. This has become increasingly important as the number of units has increased. While early superscalar CPUs would have two ALUs and a single FPU, a modern design such as the PowerPC 970 includes four ALUs, two FPUs, and two SIMD units. If the dispatcher is ineffective at keeping all of these units fed with instructions, the performance of the system will suffer.
14.
A superscalar processor usually sustains an execution rate in excess of one instruction per machine cycle. But merely processing multiple instructions concurrently does not make an architecture superscalar, since pipelined, multiprocessor or multi-core architectures also achieve that, but with different methods.
In a superscalar CPU the dispatcher reads instructions from memory and decides which ones can be run in parallel, dispatching them to redundant functional units contained inside a single CPU. Therefore a superscalar processor can be envisioned as having multiple parallel pipelines, each of which is processing instructions simultaneously from a single instruction thread.
Existing binary executable programs have varying degrees of intrinsic parallelism. In some cases instructions are not dependent on each other and can be executed simultaneously. In other cases they are inter-dependent: one instruction impacts either resources or results of the other. The instructions a = b + c; d = e + f can be run in parallel because none of the results depend on other calculations. However, the instructions a = b + c; b = e + f might not be runnable in parallel, depending on the order in which the instructions complete while they move through the units.
When the number of simultaneously issued instructions increases, the cost of dependency checking increases extremely rapidly. This is exacerbated by the need to check dependencies at run time and at the CPU's clock rate. This cost includes additional logic gates required to implement the checks.
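The dependency check described above can be sketched in a few lines. This is only an illustrative model: the function name `hazards` and the tuple representation of an instruction are assumptions made for the example, and a real dispatcher does this in hardware on register numbers rather than variable names.

```python
def hazards(first, second):
    """Classify dependencies between two instructions in program order.

    Each instruction is a (destination, sources) tuple, for example
    ("a", {"b", "c"}) for a = b + c. Returns the set of hazard names that
    apply when `second` follows `first`.
    """
    dst1, src1 = first
    dst2, src2 = second
    found = set()
    if dst1 in src2:
        found.add("RAW")   # second reads what first writes
    if dst2 in src1:
        found.add("WAR")   # second overwrites what first still has to read
    if dst1 == dst2:
        found.add("WAW")   # both write the same location
    return found


print(hazards(("a", {"b", "c"}), ("d", {"e", "f"})))   # a = b + c; d = e + f
print(hazards(("a", {"b", "c"}), ("b", {"e", "f"})))   # a = b + c; b = e + f
```

The first pair reports no hazard, so it can issue together. The second pair reports a write-after-read hazard on b, which is why its result depends on the order in which the two instructions complete.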
15.
PIPELINE AND INSTRUCTION FLOW
The integer instructions traverse a five stage pipeline in the Pentium processor. The pipeline stages are as follows:
PF Prefetch
D1 Instruction Decode
D2 Address Generate
EX Execute - ALU and Cache Access
WB Writeback
The Pentium processor is a superscalar machine, built around two general purpose integer pipelines and a pipelined floating-point unit capable of executing two instructions in parallel. Both pipelines operate in parallel allowing integer instructions to execute in a single clock in each pipeline. The figure depicts instruction flow in the Pentium processor.
The pipelines in the Pentium processor are called the “u” and “v” pipes and the process of issuing two instructions in parallel is termed “pairing.” The u-pipe can execute any instruction in the Intel architecture, while the v-pipe can execute “simple” instructions as defined in the “Instruction Pairing Rules” section of this chapter. When instructions are paired, the instruction issued to the v-pipe is always the next sequential instruction after the one issued to the u-pipe.
Pentium® Processor Pipeline Execution
The Pentium processor pipeline has been optimized to achieve higher throughput compared to previous generations of Intel Architecture processors.
The first stage of the pipeline is the Prefetch (PF) stage in which instructions are prefetched from the on-chip instruction cache or memory. Because the Pentium processor has separate caches for instructions and data, prefetches do not conflict with data references for access to the cache. If the requested line is not in the code cache, a memory reference is made. In the PF stage of the Pentium processor, two independent pairs of line-size (32-byte) prefetch buffers operate in conjunction with the Branch Target Buffer. This allows one prefetch buffer to prefetch instructions sequentially, while the other prefetches according to the branch target buffer predictions.
The pipeline stage after the PF stage in the Pentium processor is Decode 1 (D1) in which two parallel decoders attempt to decode and issue the next two sequential instructions. The decoders determine whether one or two instructions can be issued contingent upon the “Instruction Pairing Rules.” The Pentium processor requires an extra D1 clock to decode instruction prefixes. Prefixes are issued to the u-pipe at the rate of one per clock without pairing. After all prefixes have been issued, the base instruction will then be issued and paired according to the pairing rules.
The D1 stage is followed by Decode 2 (D2) in which addresses of memory resident operands are calculated. In the Intel486 processor, instructions containing both a displacement and an immediate, or instructions containing a base and index addressing mode, require an additional D2 clock to decode. The Pentium processor removes both of these restrictions and is able to issue instructions in these categories in a single clock.
The Pentium processor uses the Execute (EX) stage of the pipeline for both ALU operations and for data cache access; therefore those instructions specifying both an ALU operation and a data cache access will require more than one clock in this stage. In EX all u-pipe instructions and all v-pipe instructions except conditional branches are verified for correct branch prediction. Microcode is designed to utilize both pipelines and thus those instructions requiring microcode execute faster.
The final stage is Writeback (WB) where instructions are enabled to modify processor state and complete execution. In this stage, v-pipe conditional branches are verified for correct branch prediction.
During their progression through the pipeline, instructions may be stalled due to certain conditions. Both the u-pipe and v-pipe instructions enter and leave the D1 and D2 stages in unison. When an instruction in one pipe is stalled, then the instruction in the other pipe is also stalled at the same pipeline stage. Thus both the u-pipe and the v-pipe instructions enter the EX stage in unison. Once in EX, if the u-pipe instruction is stalled, then the v-pipe instruction (if any) is also stalled. If the v-pipe instruction is stalled then the instruction paired with it in the u-pipe is not allowed to advance. No successive instructions are allowed to enter the EX stage of either pipeline until the instructions in both pipelines have advanced to WB.
INSTRUCTION PREFETCH
In the Pentium processor PF stage, two independent pairs of line-size (32-byte) prefetch buffers operate in conjunction with the branch target buffer. Only one prefetch buffer actively requests prefetches at any given time. Prefetches are requested sequentially until a branch instruction is fetched. When a branch instruction is fetched, the branch target buffer (BTB) predicts whether the branch will be taken or not. If the branch is predicted not taken, prefetch requests continue linearly. On a predicted taken branch the other prefetch buffer is enabled and begins to prefetch as though the branch was taken. If a branch is discovered mis-predicted, the instruction pipelines are flushed and prefetching activity starts over.
Integer Instruction Pairing Rules
The Pentium processor can issue one or two instructions every clock. In order to issue two instructions simultaneously they must satisfy the following conditions:
• Both instructions in the pair must be “simple” as defined below
  Simple instructions are entirely hardwired; they do not require any microcode control and, in general, execute in one clock. The exceptions are the ALU mem, reg and ALU reg, mem instructions, which are three and two clock operations respectively.
• There must be no read-after-write or write-after-write register dependencies between them
• Neither instruction may contain both a displacement and an immediate
• Instructions with prefixes can only occur in the u-pipe.
• Instruction prefixes are treated as separate 1-byte instructions. Sequencing hardware is used to allow them to function as simple instructions.
The following integer instructions are considered simple and may be paired:
1. mov reg, reg/mem/imm
2. mov mem, reg/imm
3. alu reg, reg/mem/imm
4. alu mem, reg/imm
5. inc reg/mem
6. dec reg/mem
7. push reg/mem
8. pop reg
9. lea reg,mem
10. jmp/call/jcc near
11. nop
12. test reg, reg/mem
13. test acc, imm
In addition, conditional and unconditional branches may be paired only if they occur as the second instruction in the pair. They may not be paired with the next sequential instruction. Also, SHIFT/ROT by 1 and SHIFT by imm may pair as the first instruction in a pair.
The register dependencies that prohibit instruction pairing include implicit dependencies via registers or flags not explicitly encoded in the instruction. For example, an ALU instruction in the u-pipe (which sets the flags) may not be paired with an ADC or an SBB instruction in the v-pipe. There are two exceptions to this rule. The first is the commonly occurring sequence of compare and branch which may be paired. The second exception is pairs of pushes or pops. Although these instructions have an implicit dependency on the stack pointer, special hardware is included to allow these common operations to proceed in parallel.
Although in general two paired instructions may proceed in parallel independently, there is an exception for paired “read-modify-write” instructions. Read-modify-write instructions are ALU operations with an operand in memory. When two of these instructions are paired there is a sequencing delay of two clocks in addition to the three clocks required to execute the individual instructions.
Although instructions may execute in parallel, their behavior as seen by the programmer is exactly the same as if they were executed sequentially.
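The pairing conditions listed above can be approximated with a small checker. This is a simplified sketch, not the actual decoder logic: the `Instr` record, the `FLAGS` pseudo-register and the `can_pair` function are names invented for the example. Two simplifications are assumed and labelled in the code: flags are ignored in the write-after-write check (so that two flag-setting ALU instructions may pair), and the push/pop stack-pointer exception is not modelled.

```python
from dataclasses import dataclass

FLAGS = "flags"  # hypothetical pseudo-register standing in for the condition codes


@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()
    simple: bool = True          # on the list of "simple" instructions
    branch: bool = False         # jmp/call/jcc near
    disp_and_imm: bool = False   # has both a displacement and an immediate
    prefixed: bool = False       # carries an instruction prefix


def can_pair(u, v):
    """Return True if `u` (u-pipe) and the next instruction `v` (v-pipe) may pair."""
    if not (u.simple and v.simple):
        return False
    if u.branch:                         # branches pair only as the second instruction
        return False
    if u.disp_and_imm or v.disp_and_imm:
        return False
    if v.prefixed:                       # prefixed instructions go to the u-pipe only
        return False
    raw = u.writes & v.reads             # read-after-write dependencies
    if v.branch:
        raw = raw - {FLAGS}              # compare-and-branch exception
    waw = (u.writes & v.writes) - {FLAGS}  # assumption: flags ignored for WAW
    return not raw and not waw


mov = Instr("mov eax, ebx", frozenset({"ebx"}), frozenset({"eax"}))
add = Instr("add ecx, edx", frozenset({"ecx", "edx"}), frozenset({"ecx", FLAGS}))
adc = Instr("adc esi, edi", frozenset({"esi", "edi", FLAGS}), frozenset({"esi", FLAGS}))
cmp_ = Instr("cmp eax, 1", frozenset({"eax"}), frozenset({FLAGS}))
jle = Instr("jle target", frozenset({FLAGS}), branch=True)

for u, v in [(mov, add), (add, adc), (cmp_, jle), (jle, mov)]:
    print(f"{u.text:14} | {v.text:14} -> {can_pair(u, v)}")
```

The add/adc pair is rejected because of the implicit flag dependency, while cmp/jle is accepted through the compare-and-branch exception, mirroring the two cases called out in the text.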
18.
BRANCH PREDICTION
Branch Target Buffer (BTB)
The Pentium processor uses a Branch Target Buffer (BTB) to predict the outcome of branch instructions which minimizes pipeline stalls due to prefetch delays. The Pentium processor accesses the BTB with the address of the instruction in the D1 stage. It contains a branch prediction state machine with four states: (1) strongly not taken, (2) weakly not taken, (3) weakly taken, and (4) strongly taken. In the event of a correct prediction, a branch will execute without pipeline stalls or flushes. Branches which miss the BTB are assumed to be not taken. Conditional and unconditional near branches and near calls execute in 1 clock and may be executed in parallel with other integer instructions. A mispredicted branch (whether a BTB hit or miss) or a correctly predicted branch with the wrong target address will cause the pipelines to be flushed and the correct target to be fetched. Incorrectly predicted unconditional branches will incur an additional three clock delay, incorrectly predicted conditional branches in the u-pipe will incur an additional three clock delay, and incorrectly predicted conditional branches in the v-pipe will incur an additional four clock delay.
[Figure: branch prediction state machine. Legend: H = History, P = Prediction, T = Taken, NT = Not Taken. States shown: H: 11 (P: T), H: 10 (P: T), H: 01 (P: T), H: 00 (P: NT).]
The benefits of branch prediction are illustrated in the following example. Consider the following loop from a benchmark program for computing prime numbers:
    for(k=i+prime;k<=SIZE;k+=prime)
        flags[k]=FALSE;
A popular compiler generates the following assembly code (prime is allocated to ecx, k is allocated to edx, and al contains the value FALSE):
    inner_loop:
        mov byte ptr flags[edx],al
        add edx,ecx
        cmp edx, SIZE
        jle inner_loop
Each iteration of this loop will execute in 6 clocks on the Intel486 CPU. On the Pentium processor, the mov is paired with the add; the cmp with the jle. With branch prediction, each loop iteration executes in 2 clocks.
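To see why a four-state history helps on a loop branch like the one above, here is a minimal sketch of a textbook two-bit saturating counter using the four state names from the text. It is an assumption that this matches the Pentium's exact transitions; the real BTB state machine may differ in detail. The class name `TwoBitPredictor` and the helper `mispredictions` are invented for the example.

```python
STATES = ["strongly not taken", "weakly not taken", "weakly taken", "strongly taken"]


class TwoBitPredictor:
    """Textbook two-bit saturating counter (assumed model, not the exact BTB logic)."""

    def __init__(self, state=0):
        self.state = state               # index into STATES

    def predict(self):
        return self.state >= 2           # the two "taken" states predict taken

    def update(self, taken):
        if taken:
            self.state = min(self.state + 1, 3)   # saturate at strongly taken
        else:
            self.state = max(self.state - 1, 0)   # saturate at strongly not taken


def mispredictions(outcomes, state=0):
    """Count wrong predictions over a sequence of actual branch outcomes."""
    predictor = TwoBitPredictor(state)
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


loop_branch = [True] * 9 + [False]       # taken nine times, then falls through
print(mispredictions(loop_branch))           # predictor starts cold
print(mispredictions(loop_branch, state=3))  # predictor already trained
```

Starting cold, the sketch mispredicts the first two iterations and the loop exit; once trained, only the loop exit is mispredicted, so nearly every iteration avoids the flush penalty.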
19.
CACHE
ON-CHIP CACHES
The Pentium processor implements two internal caches for a total integrated cache size of 16 Kbytes: an 8 Kbyte data cache and a separate 8 Kbyte code cache. These caches are transparent to application software to maintain compatibility with previous Intel Architecture generations.
The data cache fully supports the MESI (modified/exclusive/shared/invalid) writeback cache consistency protocol. The code cache is inherently write protected to prevent code from being inadvertently corrupted, and as a consequence supports a subset of the MESI protocol, the S (shared) and I (invalid) states.
The caches have been designed for maximum flexibility and performance. The data cache is configurable as writeback or writethrough on a line-by-line basis. Memory areas can be defined as non-cacheable by software and external hardware. Cache writeback and invalidations can be initiated by hardware or software. Protocols for cache consistency and line replacement are implemented in hardware, easing system design.
On the Pentium processor, each of the caches is 8 Kbytes in size and each is organized as a 2-way set associative cache. There are 128 sets in each cache, each set containing 2 lines (each line has its own tag address). Each cache line is 32 bytes wide. In the Pentium processor, replacement in both the data and instruction caches is handled by the LRU mechanism which requires one bit per set in each of the caches.
Cache Structure
The instruction and data caches can be accessed simultaneously. The instruction cache can provide up to 32 bytes of raw opcodes and the data cache can provide data for two data references all in the same clock. This capability is implemented partially through the tag structure. The tags in the data cache are triple ported. One of the ports is dedicated to snooping while the other two are used to look up two independent addresses corresponding to data references from each of the pipelines. The instruction cache tags of the Pentium processor are also triple ported. Again, one port is dedicated to support snooping and the other two ports facilitate split line accesses (simultaneously accessing the upper half of one line and the lower half of the next line).
Each of the caches is parity protected. The operating modes of the caches are controlled by the CD (cache disable) and NW (not writethrough) bits in CR0.
TLB (Translation Lookaside Buffers)
Each of the caches is accessed with physical addresses and each cache has its own TLB (translation lookaside buffer) to translate linear addresses to physical addresses. The TLBs associated with the instruction cache are single ported whereas the data cache TLBs are fully dual ported to be able to translate two independent linear addresses for two data references simultaneously.
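The geometry given above (8 Kbytes, 2 ways, 32-byte lines, hence 128 sets) fixes how a 32-bit physical address is carved up during a lookup. The following sketch shows that split; the function name `split_address` and the constant names are assumptions for illustration only.

```python
LINE_SIZE = 32      # bytes per cache line (from the text)
NUM_SETS = 128      # 8 Kbytes / (2 ways * 32 bytes) = 128 sets

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # 5 bits select a byte in the line
INDEX_BITS = NUM_SETS.bit_length() - 1     # 7 bits select the set


def split_address(addr):
    """Split a 32-bit physical address into (tag, set_index, byte_offset)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)   # remaining 20 upper bits
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(hex(tag), index, offset)
```

With 5 offset bits and 7 index bits, the remaining 20 bits form the tag that is compared against both lines of the selected set.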
20.
The goal of an effective memory system is that the effective access time that the processor sees is very close to t_o, the access time of the cache. Most accesses that the processor makes to the cache are contained within this level. The achievement of this goal depends on many factors: the architecture of the processor, the behavioral properties of the programs being executed, and the size and organization of the cache. Caches work on the basis of the locality of program behavior. There are three principles involved:
1. Spatial Locality - Given an access to a particular location in memory, there is a high probability that other accesses will be made to either that or neighboring locations within the lifetime of the program.
2. Temporal Locality - This is complementary to spatial locality. Given a sequence of references to n locations, there is a high probability that references following this sequence will be made into the sequence. Elements of the sequence will again be referenced during the lifetime of the program.
3. Sequentiality - Given that a reference has been made to a particular location s, it is likely that within the next several references a reference to the location s + 1 will be made. Sequentiality is a restricted type of spatial locality and can be regarded as a subset of it.
Some common terms
Processor references that are found in the cache are called cache hits. References not found in the cache are called cache misses. On a cache miss, the cache control mechanism must fetch the missing data from memory and place it in the cache. Usually the cache fetches a spatial locality called the line from memory. The physical word is the basic unit of access in the memory.
The processor-cache interface can be characterized by a number of parameters. Those that directly affect processor performance include:
1. Access time for a reference found in the cache (a hit) - property of the cache size and organization.
2. Access time for a reference not found in the cache (a miss) - property of the memory organization.
3. Time to initially compute a real address given a virtual address (not-in-TLB-time) - property of the address translation facility, which, though strictly speaking not part of the cache, resembles the cache in most aspects and is discussed in this chapter.
Data Cache Consistency Protocol (MESI Protocol)
The Pentium processor Cache Consistency Protocol is a set of rules by which states are assigned to cached entries (lines). The rules apply for memory read/write cycles only. I/O and special cycles are not run through the data cache.
Every line in the Pentium processor data cache is assigned a state dependent on both Pentium processor generated activities and activities generated by other bus masters (snooping). The Pentium processor Data Cache Protocol consists of four states that define whether a line is valid (HIT/MISS), if it is available in other caches, and if it has been MODIFIED. The four states are the M (Modified), E (Exclusive), S (Shared) and the I (Invalid) states and the protocol is referred to as the MESI protocol. A definition of the states is given below:
M - Modified: An M-state line is available in ONLY one cache and it is also MODIFIED (different from main memory). An M-state line can be accessed (read/written to) without sending a cycle out on the bus.
E - Exclusive: An E-state line is also available in ONLY one cache in the system, but the line is not MODIFIED (i.e., it is the same as main memory). An E-state line can be accessed (read/written to) without generating a bus cycle. A write to an E-state line will cause the line to become MODIFIED.
S - Shared: This state indicates that the line is potentially shared with other caches (i.e. the same line may exist in more than one cache). A read to an S-state line will not generate bus activity, but a write to a SHARED line will generate a write through cycle on the bus. The write through cycle may invalidate this line in other caches. A write to an S-state line will update the cache.
I - Invalid: This state indicates that the line is not available in the cache. A read to this line will be a MISS and may cause the Pentium processor to execute a LINE FILL (fetch the whole line into the cache from main memory). A write to an INVALID line will cause the Pentium processor to execute a write-through cycle on the bus.
Inquire Cycles (Snooping)
The purpose of inquire cycles is to check whether the address being presented is contained within the caches in the Pentium processor.
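The state definitions above can be condensed into a small transition function. This is a simplified sketch rather than the processor's full protocol: the function names `cpu_access` and `inquire`, the `shared` parameter, and the bus-cycle labels are assumptions for illustration. Three behaviours that the text does not spell out are assumed and marked in the comments: a line fill ends in E or S depending on whether the line is shared, a write to an S-state line leaves it in S, and an inquire hit on an M-state line triggers a write-back.

```python
def cpu_access(state, op, shared=False):
    """Return (new_state, bus_cycle) for a processor read or write to a line.

    bus_cycle is None when the access completes without bus activity.
    """
    if op == "read":
        if state in ("M", "E", "S"):
            return state, None                       # hit: no bus activity
        # Miss: line fill. Assumption: final state depends on sharing.
        return ("S" if shared else "E"), "line fill"
    if op == "write":
        if state in ("M", "E"):
            return "M", None                         # E becomes MODIFIED, M stays
        if state == "S":
            return "S", "write-through"              # assumption: line stays S
        return "I", "write-through"                  # assumption: no allocation on write miss
    raise ValueError(f"unknown operation: {op}")


def inquire(state, inv):
    """Return (new_state, bus_cycle) for an inquire cycle to a line.

    On a hit the INV input selects the final state: I if asserted, else S.
    """
    if state == "I":
        return "I", None                             # inquire miss
    bus = "write-back" if state == "M" else None     # assumption: modified data is written back
    return ("I" if inv else "S"), bus


print(cpu_access("I", "read"))      # miss followed by a line fill
print(cpu_access("E", "write"))     # silent upgrade to MODIFIED
print(cpu_access("S", "write"))     # write-through on the bus
print(inquire("M", inv=True))       # snoop hit on a modified line
```

The sketch makes the key property visible: only writes to S or I lines and read misses reach the bus, while M and E lines absorb reads and writes locally.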
22.
Cache Organization
Within the cache, there are three basic types of organization:
1. Direct Mapped
2. Fully Associative
3. Set Associative
In fully associative mapping, when a request is made to the cache, the requested address is compared in a directory against all entries in the directory. If the requested address is found (a directory hit), the corresponding location in the cache is fetched and returned to the processor; otherwise, a miss occurs.
23.
[Figure: Fully Associative Cache]
In a direct mapped cache, lower order line address bits are used to access the directory. Since multiple line addresses map into the same location in the cache directory, the upper line address bits (tag bits) must be compared with the directory address to ensure a hit. If a comparison is not valid, the result is a cache miss, or simply a miss. The address given to the cache by the processor actually is subdivided into several pieces, each of which has a different role in accessing data.
24.
[Figure: Direct Mapped Cache]
The set associative cache operates in a fashion somewhat similar to the direct-mapped cache. Bits from the line address are used to address a cache directory. However, now there are multiple choices: two, four, or more complete line addresses may be present in the directory. Each of these line addresses corresponds to a location in a sub-cache. The collection of these sub-caches forms the total cache array. In a set associative cache, as in the direct-mapped cache, all of these sub-arrays can be accessed simultaneously, together with the cache directory. If any of the entries in the cache directory matches the reference address, there is a hit, and the particular sub-cache array is selected and gated out to the processor.
[Figure: Set Associative Cache]
25.
Cache Calculation
Example: a 16-Kbyte main memory and a 512-byte, 2-way set associative cache with 16 bytes per line. The address is divided into three fields: Tag, Line/Set and Byte/Block.
Line size = 16 bytes = $2^4$, so Byte/Block = 4 bits
Main memory = 16 Kbytes = $2^{14}$ bytes, so 14 address lines are needed to address main memory
Number of lines in main memory = $2^{14} / 2^4 = 2^{10}$
Cache size = 512 bytes = $2^9$ bytes
Sets or ways = 2, so each way holds $2^9 / 2 = 2^8$ bytes
Lines per way = $2^8 / 2^4 = 2^4$, so Line/Set = 4 bits
Tag size = (total number of lines in main memory) / (total number of lines in one cache way) = $2^{10} / 2^4 = 2^6$, so Tag = 6 bits
Check: $2^{14}$ (Total) = $2^6$ (Tag) $\times$ $2^4$ (Line/Set) $\times$ $2^4$ (Byte/Block)
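The field widths worked out on this slide can be checked with a few lines of Python. This is a minimal sketch under the slide's figures (16-Kbyte main memory, 512-byte cache, 16-byte lines, 2 ways); the function names `cache_fields` and `split_address` are hypothetical, and all sizes are assumed to be powers of two.

```python
def log2_exact(n):
    """Return log2(n) for a power of two, using integer arithmetic only."""
    return n.bit_length() - 1


def cache_fields(main_memory_bytes, cache_bytes, line_bytes, ways):
    """Return (tag_bits, line_set_bits, byte_bits) for the address split."""
    address_bits = log2_exact(main_memory_bytes)
    byte_bits = log2_exact(line_bytes)
    lines_per_way = cache_bytes // ways // line_bytes
    line_set_bits = log2_exact(lines_per_way)
    tag_bits = address_bits - line_set_bits - byte_bits
    return tag_bits, line_set_bits, byte_bits


def split_address(addr, line_set_bits, byte_bits):
    """Split an address into its (tag, line/set, byte) fields."""
    byte = addr & ((1 << byte_bits) - 1)
    line_set = (addr >> byte_bits) & ((1 << line_set_bits) - 1)
    tag = addr >> (byte_bits + line_set_bits)
    return tag, line_set, byte


if __name__ == "__main__":
    tag_bits, line_set_bits, byte_bits = cache_fields(16 * 1024, 512, 16, 2)
    print(tag_bits, line_set_bits, byte_bits)                 # 6 4 4
    print(split_address(0x2ABC, line_set_bits, byte_bits))    # (42, 11, 12)
```

The address 0x2ABC is an arbitrary value below 16 Kbytes chosen for the illustration.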
26.
THE X87 FPU FLOATING-POINT UNIT
The floating-point unit (FPU) of the Pentium processor is integrated with the integer unit on the first five stages of the U pipeline; the fifth stage, WB, becomes X1. It is heavily pipelined. The FPU is designed to accept one floating-point operation every clock. It can receive up to two floating-point instructions every clock, one of which must be an exchange instruction.
Floating-Point Pipeline Stages
The Pentium processor FPU has 8 pipeline stages, the first five of which it shares with the integer unit. Integer instructions pass through only the first 5 stages. Integer instructions use the fifth (X1) stage as a WB (write-back) stage. The 8 FP pipeline stages, and the activities that are performed in them, are summarized below:
PF: Prefetch
D1: Instruction decode
D2: Address generation
EX: Memory and register read; conversion of FP data to external memory format and memory write
X1: Floating-point execute stage one; conversion of external memory format to internal FP data format and write operand to FP register file; bypass 1 (described in the "Bypasses" section)
X2: Floating-point execute stage two
WF: Perform rounding and write floating-point result to register file; bypass 2 (described in the "Bypasses" section)
ER: Error reporting/update status word
FPU Bypasses
The Pentium processor stack architecture instruction set requires that all instructions have one source operand on the top of the stack. Since most instructions also have their destination as the top of the stack, most instructions see a "top of stack bottleneck." New source operands must be brought to the top of the stack before an arithmetic instruction can be issued on them. This calls for extra usage of the exchange instruction, which allows the programmer to bring an available operand to the top of the stack. The following section describes the floating-point register file bypasses that exist on the Pentium processor.
The register file has two write ports and two read ports. The read ports are used to read data out of the register file in the E stage. One write port is used to write data into the register file in the X1 stage, and the other in the WF stage. A bypass allows data that is about to be written into the register file to be available as an operand that is to be read from the register file by any succeeding floating-point instruction. A bypass is specified by a pair of ports (a write port and a read port) that get circumvented. Using the bypass, data is made available even before actually writing it to the register file.
27.
The following procedures are implemented:
1. Bypass the X1 stage register file write port and the E stage register file read port.
2. Bypass the WF stage register file write port and the E stage register file read port.
With bypass 1, the result of a floating-point load (that writes to the register file in the X1 stage) can bypass the X1 stage write and be sent directly to the operand fetch stage or E stage of the next instruction. With bypass 2, the result of any arithmetic operation can bypass the WF stage write to the register file, and be sent directly to the desired execution unit as an operand for the next instruction.
PROGRAMMING WITH THE x87 FPU
The x87 Floating-Point Unit (FPU) provides high-performance floating-point processing capabilities for use in graphics processing, scientific, engineering, and business applications. It supports the floating-point, integer, and packed BCD integer data types and the floating-point processing algorithms and exception handling architecture defined in the IEEE Standard 754 for Binary Floating-Point Arithmetic.
X87 FPU EXECUTION ENVIRONMENT
The x87 FPU represents a separate execution environment within the IA-32 architecture. This execution environment consists of eight data registers (called the x87 FPU data registers) and the following special-purpose registers:
• Status register
• Control register
• Tag word register
• Last instruction pointer register
• Last data (operand) pointer register
• Opcode register
These registers are described in the following sections.
x87 FPU Data Registers
The x87 FPU data registers consist of eight 80-bit registers. Values are stored in these registers in the double extended-precision floating-point format. When floating-point, integer, or packed BCD integer values are loaded from memory into any of the x87 FPU data registers, the values are automatically converted into double extended-precision floating-point format (if they are not already in that format). When computation results are subsequently transferred back into memory from any of the x87 FPU registers, the results can be left in the double extended-precision floating-point format or converted back into a shorter floating-point format, an integer format, or the packed BCD integer format.
28.
[Figure: x87 FPU Execution Environment]
The x87 FPU instructions treat the eight x87 FPU data registers as a register stack. All addressing of the data registers is relative to the register on the top of the stack. The register number of the current top-of-stack register is stored in the TOP (stack TOP) field in the x87 FPU status word. Load operations decrement TOP by one and load a value into the new top-of-stack register, and store operations store the value from the current TOP register in memory and then increment TOP by one. (For the x87 FPU, a load operation is equivalent to a push and a store operation is equivalent to a pop.) Note that load and store operations are also available that do not push and pop the stack.
[Figure: x87 FPU Data Register Stack]
29.
If a load operation is performed when TOP is at 0, register wraparound occurs and the new value of TOP is set to 7. The floating-point stack-overflow exception indicates when wraparound might cause an unsaved value to be overwritten. Many floating-point instructions have several addressing modes that permit the programmer to implicitly operate on the top of the stack, or to explicitly operate on specific registers relative to the TOP. Assemblers support these register addressing modes, using the expression ST(0), or simply ST, to represent the current stack top and ST(i) to specify the ith register from TOP in the stack (0 ≤ i ≤ 7). For example, if TOP contains 011B (register 3 is the top of the stack), the following instruction would add the contents of two registers in the stack (registers 3 and 5):
FADD ST, ST(2);
The figure shows an example of how the stack structure of the x87 FPU registers and instructions is typically used to perform a series of computations. Here, a two-dimensional dot product is computed, as follows:
1. The first instruction (FLD value1) decrements the stack register pointer (TOP) and loads the value 5.6 from memory into ST(0). The result of this operation is shown in snapshot (a).
2. The second instruction multiplies the value in ST(0) by the value 2.4 from memory and stores the result in ST(0), shown in snapshot (b).
3. The third instruction decrements TOP and loads the value 3.8 into ST(0).
4. The fourth instruction multiplies the value in ST(0) by the value 10.3 from memory and stores the result in ST(0), shown in snapshot (c).
5. The fifth instruction adds the value in ST(0) to the value in ST(1) and stores the result in ST(0), shown in snapshot (d).
[Figure: Example x87 FPU Dot Product Computation]
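The stack-relative addressing and the five-step dot product described above can be traced with a toy model. The Python sketch below is an illustration, not an emulator: the class `X87Stack` and its method names are hypothetical, Python floats stand in for the 80-bit registers, TOP is assumed to start at 0, and the tag word and stack-overflow exception are not modelled.

```python
class X87Stack:
    """Toy model of the eight-register x87 data stack."""

    def __init__(self):
        self.regs = [0.0] * 8
        self.top = 0  # ASSUMPTION: TOP starts at 0 for this illustration

    def phys(self, i):
        """Physical register number addressed by ST(i)."""
        return (self.top + i) % 8

    def st(self, i=0):
        return self.regs[self.phys(i)]

    def fld(self, value):
        """Load (push): decrement TOP, wrapping from 0 to 7, then write ST(0)."""
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def fmul_mem(self, value):
        """Multiply ST(0) by a memory operand and keep the result in ST(0)."""
        self.regs[self.top] *= value

    def fadd(self, i):
        """ST(0) = ST(0) + ST(i)."""
        self.regs[self.top] += self.st(i)


if __name__ == "__main__":
    fpu = X87Stack()
    fpu.fld(5.6)        # step 1
    fpu.fmul_mem(2.4)   # step 2
    fpu.fld(3.8)        # step 3
    fpu.fmul_mem(10.3)  # step 4
    fpu.fadd(1)         # step 5
    print("TOP =", fpu.top, "ST(0) =", fpu.st(0))
```

With TOP set to 3, `phys(2)` returns 5, which matches the FADD ST, ST(2) example in the text (registers 3 and 5).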
30.
MICROPROCESSOR INITIALIZATION AND CONFIGURATION
Before normal operation of the Pentium processor can begin, the Pentium processor must be initialized by driving the RESET pin active. The RESET pin forces the Pentium processor to begin execution in a known state. Several features are optionally invoked at the falling edge of RESET: Built-In Self-Test (BIST), Functional Redundancy Checking and Tristate Test Mode. In addition to the standard RESET pin, the Pentium processor has implemented an initialization pin (INIT) that allows the processor to begin execution in a known state without disrupting the contents of the internal caches or the floating-point state.
POWER UP SPECIFICATIONS
During power up, RESET must be asserted while VCC is approaching nominal operating voltage to prevent internal bus contention which could negatively affect the reliability of the processor. It is recommended that CLK begin toggling within 150 ms after VCC reaches its proper operating level. This recommendation is only to ensure long term reliability of the device. In order for RESET to be recognized, the CLK input needs to be toggling. RESET must remain asserted for 1 millisecond after VCC and CLK have reached their AC/DC specifications.
TEST AND CONFIGURATION FEATURES (BIST, FRC, TRISTATE TEST MODE)
The INIT, FLUSH#, and FRCMC# inputs are sampled when RESET transitions from high to low to determine if BIST will be run, or if tristate test mode or checker mode will be entered (respectively). If RESET is driven synchronously, these signals must be at their valid level and meet setup and hold times on the clock before the falling edge of RESET. If RESET is asserted asynchronously, these signals must be at their valid level two clocks before and after RESET transitions from high to low.
Built-In Self-Test
Self-test is initiated by driving the INIT pin high when RESET transitions from high to low. No bus cycles are run by the Pentium processor during self-test. The duration of self-test is approximately $2^{19}$ core clocks.
Approximately 70% of the devices in the Pentium processor are tested by BIST. The Pentium processor BIST consists of two parts: hardware self-test and microcode self-test. During the hardware portion of BIST, the microcode ROM and all large PLAs are tested. All possible input combinations of the microcode ROM and PLAs are tested. The constant ROMs, BTB, TLBs, and all caches are tested by the microcode portion of BIST. The array tests (caches, TLBs and BTB) have two passes. On the first pass, data patterns are written to arrays, read back and checked for mismatches. The second pass writes the complement of the initial data pattern, reads it back, and checks for mismatches. The constant ROMs are tested by using the microcode to add various constants and check the result against a stored value.
31.
Upon successful completion of BIST, the cumulative result of all tests is stored in the EAX register. If EAX contains 0h, then all checks passed; any non-zero result indicates a faulty unit.
Tristate Test Mode
When the FLUSH# pin is sampled low when RESET transitions from high to low, the Pentium processor enters tristate test mode. The Pentium processor floats all of its output pins and bidirectional pins, including pins which are never floated during normal operation (except TDO). Tristate test mode can be initiated to allow external circuitry to test board interconnects. The Pentium processor remains in tristate test mode until the RESET pin is asserted again.
Functional Redundancy Checking
The functional redundancy checking master/checker configuration input is sampled when RESET is high to determine whether the Pentium processor is configured in master mode (FRCMC# high) or checker mode (FRCMC# low). The final master/checker configuration of the Pentium processor is determined the clock before the falling edge of RESET. When configured as a master, the Pentium processor drives its output pins as required by the bus protocol. When configured as a checker, the Pentium processor tristates all outputs (except IERR#, PICD0, PICD1 and TDO) and samples the output pins (that would normally be driven in master mode). If the sampled value differs from the value computed internally, the Pentium processor asserts IERR# to indicate an error.
INITIALIZATION WITH RESET, INIT AND BIST
Two pins, RESET and INIT, are used to reset the Pentium processor in different manners. A "cold" or "power on" RESET refers to the assertion of RESET while power is initially being applied to the Pentium processor. A "warm" RESET refers to the assertion of RESET or INIT while VCC and CLK remain within specified operating limits. Table 3-1 shows the effect of asserting RESET and/or INIT.
Toggling either the RESET pin or the INIT pin individually forces the Pentium processor to begin execution at address FFFFFFF0h. The internal instruction cache and data cache are invalidated when RESET is asserted (modified lines in the data cache are NOT written back). The instruction cache and data cache are not altered when the INIT pin is asserted without RESET. In both cases, the branch target buffer (BTB) and translation lookaside buffers (TLBs) are invalidated. After RESET (with or without BIST) or INIT, the Pentium processor will start executing instructions at location FFFFFFF0h. When the first intersegment Jump or Call instruction is executed, address lines A20-A31 will be driven low for CS-relative memory cycles and the Pentium processor will only execute instructions in the lower one Mbyte of physical memory. This allows the system designer to use a ROM at the top of physical memory to initialize the system.
RESET is internally hardwired and forces the Pentium processor to terminate all execution and bus cycle activity within 2 clocks. No instruction or bus activity will occur as long as RESET is active. INIT is implemented as an edge-triggered interrupt and will be recognized when an instruction boundary is reached. As soon as the Pentium processor completes the INIT sequence, instruction execution and bus cycle activity will continue at address FFFFFFF0h even if the INIT pin is not deasserted.
At the conclusion of RESET (with or without self-test) or INIT, the DX register will contain a component identifier. The upper byte will contain 05h and the lower byte will contain a stepping identifier.
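The reset values described above lend themselves to a short worked example. The Python sketch below only illustrates the bit arithmetic: the helper names `decode_dx` and `address_after_far_jump` are hypothetical, and the stepping value 17h used in the example is an arbitrary placeholder, not a documented stepping.

```python
RESET_VECTOR = 0xFFFFFFF0  # first instruction fetch after RESET or INIT


def decode_dx(dx):
    """Split the DX component identifier into (upper byte, stepping byte)."""
    upper = (dx >> 8) & 0xFF   # 05h on the Pentium processor, per the text
    stepping = dx & 0xFF
    return upper, stepping


def address_after_far_jump(addr):
    """Model A20-A31 being driven low after the first intersegment jump/call."""
    return addr & 0x000FFFFF   # only the lower one Mbyte remains reachable


if __name__ == "__main__":
    print(hex(RESET_VECTOR))
    print(decode_dx(0x0517))   # 0x17 is a made-up stepping for the example
    print(hex(address_after_far_jump(RESET_VECTOR)))
```

Masking the reset vector with the 20-bit limit shows why the start-up ROM at the top of the 4-Gbyte space is no longer reachable by CS-relative fetches once the first far jump has executed.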
33.
BUS CYCLES
The Pentium processor bus is designed to support a 528-Mbyte/sec data transfer rate at 66 MHz. All data transfers occur as a result of one or more bus cycles.
PHYSICAL MEMORY AND I/O INTERFACE
Pentium processor memory is accessible in 8-, 16-, 32-, and 64-bit quantities. Pentium processor I/O is accessible in 8-, 16-, and 32-bit quantities. The Pentium processor can directly address up to 4 Gbytes of physical memory, and up to 64 Kbytes of I/O. In hardware, memory space is organized as a sequence of 64-bit quantities. Each 64-bit location has eight individually addressable bytes at consecutive memory addresses.
[Figure: Memory Organization]
The I/O space is organized as a sequence of 32-bit quantities. Each 32-bit quantity has four individually addressable bytes at consecutive memory addresses. See the figure for a conceptual diagram of the I/O space.
[Figure: I/O Space Organization]
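The 528-Mbyte/sec figure follows directly from the bus width and clock. The short Python check below is a sketch that assumes one 64-bit transfer per bus clock (the best case) and 1 Mbyte = 10^6 bytes; the constant names are hypothetical. It also derives the address-space sizes quoted above from 32 address bits for memory and 16 for I/O.

```python
BUS_WIDTH_BITS = 64        # width of the Pentium processor data bus
BUS_CLOCK_HZ = 66_000_000  # 66 MHz bus clock

bytes_per_transfer = BUS_WIDTH_BITS // 8
# ASSUMPTION: one data transfer completes on every clock (peak rate).
peak_mbytes_per_sec = bytes_per_transfer * BUS_CLOCK_HZ // 1_000_000

memory_space_bytes = 2 ** 32   # 32-bit physical address
io_space_bytes = 2 ** 16       # 16-bit I/O address

if __name__ == "__main__":
    print(peak_mbytes_per_sec)           # 528
    print(memory_space_bytes // 2**30)   # 4  (Gbytes)
    print(io_space_bytes // 2**10)       # 64 (Kbytes)
```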
34.
Sixty-four-bit memories are organized as arrays of physical quadwords (8-byte words). Physical quadwords begin at addresses evenly divisible by 8. The quadwords are addressable by physical address lines A31-A3. Thirty-two-bit memories are organized as arrays of physical dwords (4-byte words). Physical dwords begin at addresses evenly divisible by 4. The dwords are addressable by physical address lines A31-A3 and A2. A2 can be decoded from the byte enables. Sixteen-bit memories are organized as arrays of physical words (2-byte words). Physical words begin at addresses evenly divisible by 2.
DATA TRANSFER MECHANISM
All data transfers occur as a result of one or more bus cycles. Logical data operands of byte, word, dword, and quadword lengths may be transferred. Data may be accessed at any byte boundary, but two cycles may be required for misaligned data transfers. The Pentium processor considers a 2-byte or 4-byte operand that crosses a 4-byte boundary to be misaligned. In addition, an 8-byte operand that crosses an 8-byte boundary is misaligned. The Pentium processor address signals are split into two components. High-order address bits are provided by the address lines A31-A3. The byte enables BE7#-BE0# form the low-order address and select the appropriate bytes of the 8-byte data bus. For both memory and I/O accesses, the byte enable outputs indicate which of the associated data bus bytes are driven valid for write cycles and on which bytes data is expected back for read cycles. Non-contiguous byte enable patterns will never occur.
[Table: Generating A2-A0 from BE7-0#]
Interfacing With 8-, 16-, 32-, and 64-Bit Memories
In 64-bit physical memories, each 8-byte quadword begins at a byte address that is a multiple of eight. A31-A3 are used as an 8-byte quadword select and BE7#-BE0# select individual bytes within the quadword.
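The misalignment rule and the relationship between byte enables and A2-A0 can be sketched in Python. The function names below are hypothetical. The byte-enable value is modelled as an active-high bit mask (bit n set means BEn# is asserted), which is a convention of this sketch, since the real pins are active low. Deriving A2-A0 as the number of the lowest asserted byte enable is an assumption about the table that is not reproduced in this text.

```python
def is_misaligned(addr, size):
    """Apply the rule from the text: a 2- or 4-byte operand crossing a
    4-byte boundary, or an 8-byte operand crossing an 8-byte boundary."""
    if size == 1:
        return False
    boundary = 8 if size == 8 else 4
    return (addr % boundary) + size > boundary


def byte_enable_mask(addr, size):
    """Active-high mask of asserted byte enables for an aligned operand."""
    if is_misaligned(addr, size):
        raise ValueError("misaligned operands are split into two bus cycles")
    first = addr & 7                       # byte lane within the quadword
    return ((1 << size) - 1) << first      # contiguous run of 'size' lanes


def a2_a0_from_mask(mask):
    """ASSUMPTION: A2-A0 equals the number of the lowest asserted enable."""
    return (mask & -mask).bit_length() - 1


if __name__ == "__main__":
    print(is_misaligned(0x1003, 4))        # True: crosses a 4-byte boundary
    mask = byte_enable_mask(0x1004, 4)
    print(hex(mask), a2_a0_from_mask(mask))
```

For the dword at 0x1004 the mask is 0xF0 (BE7# to BE4# asserted), so A2 is 1 and A1-A0 are 0, which selects the upper dword of the quadword.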
35.
[Figure: Pentium® Processor with 64-Bit Memory]
The figure shows the Pentium processor data bus interface to 32-, 16- and 8-bit wide memories. External byte swapping logic is needed on the data lines so that data is supplied to and received from the Pentium processor on the correct data pins (see the table). For memory widths smaller than 64 bits, byte assembly logic is needed to return all bytes of data requested by the Pentium processor in one cycle.
[Figure: Addressing 32-, 16- and 8-Bit Memories]
36.
[Figure: Data Bus Interface to 32-, 16- and 8-Bit Memories]
Operand alignment and size dictate when two cycles are required for a data transfer.
37.
BUS STATE DEFINITION
This section describes the Pentium processor bus states in detail. See the figure for the bus state diagram.
Ti: This is the bus idle state. In this state, no bus cycles are being run. The Pentium processor may or may not be driving the address and status pins, depending on the state of the HLDA, AHOLD, and BOFF# inputs. An asserted BOFF# or RESET will always force the state machine back to this state. HLDA will only be driven in this state.
T1: This is the first clock of a bus cycle. Valid address and status are driven out and ADS# is asserted. There is one outstanding bus cycle.
T2: This is the second and subsequent clock of the first outstanding bus cycle. In state T2, data is driven out (if the cycle is a write), or data is expected (if the cycle is a read), and the BRDY# pin is sampled. There is one outstanding bus cycle.
T12: This state indicates there are two outstanding bus cycles, and that the Pentium processor is starting the second bus cycle at the same time that data is being transferred for the first. In T12, the Pentium processor drives the address and status and asserts ADS# for the second outstanding bus cycle, while data is transferred and BRDY# is sampled for the first outstanding cycle.
T2P: This state indicates there are two outstanding bus cycles, and that both are in their second and subsequent clocks. In T2P, data is being transferred and BRDY# is sampled for the first outstanding cycle. The address, status and ADS# for the second outstanding cycle were driven sometime in the past (in state T12).
TD: This state indicates there is one outstanding bus cycle, that its address, status and ADS# have already been driven sometime in the past (in state T12), and that the data and BRDY# pins are not being sampled because the data bus requires one dead clock to turn around between consecutive reads and writes, or writes and reads. The Pentium processor enters TD if in the previous clock there were two outstanding cycles, the last BRDY# was returned, and a dead clock is needed. The timing diagrams in the next section give examples of when a dead clock is needed.
The table gives a brief summary of bus activity during each bus state. The figure shows the Pentium processor bus state diagram.
[Table: Pentium® Processor Bus Activity]
38.
[Figure: Pentium® Processor Bus Control State Machine]
39.
BUS CYCLES
The Pentium processor requests data transfer cycles, bus cycles, and bus operations. A data transfer cycle is one data item, up to 8 bytes in width, being returned to the Pentium processor or accepted from the Pentium processor with BRDY# asserted. A bus cycle begins with the Pentium processor driving an address and status and asserting ADS#, and ends when the last BRDY# is returned. A bus cycle may have 1 or 4 data transfers. A burst cycle is a bus cycle with 4 data transfers. A bus operation is a sequence of bus cycles to carry out a specific function, such as a locked read-modify-write or an interrupt acknowledge.
Single-Transfer Cycle
The Pentium processor supports a number of different types of bus cycles. The simplest type of bus cycle is a single-transfer non-cacheable 64-bit cycle, either with or without wait states. Non-pipelined read and write cycles with 0 wait states are shown in the figure.
[Figure: Non Pipelined Read or Write]
40.
The Pentium processor initiates a cycle by asserting the address status signal (ADS#) in the first clock. The clock in which ADS# is asserted is by definition the first clock in the bus cycle. The ADS# output indicates that a valid bus cycle definition and address are available on the cycle definition pins and the address bus. The CACHE# output is deasserted (high) to indicate that the cycle will be a single transfer cycle. For a zero wait state transfer, BRDY# is returned by the external system in the second clock of the bus cycle. BRDY# indicates that the external system has presented valid data on the data pins in response to a read or the external system has accepted data in response to a write. The Pentium processor samples the BRDY# input in the second and subsequent clocks of a bus cycle. If the system is not ready to drive or accept data, wait states can be added to these cycles by not returning BRDY# to the processor at the end of the second clock. Cycles of this type, with one and two wait states added, are shown in the figure. Note that BRDY# must be driven inactive at the end of the second clock.
Burst Cycles
For bus cycles that require more than a single data transfer (cacheable cycles and writeback cycles), the Pentium processor uses the burst data transfer. In burst transfers, a new data item can be sampled or driven by the Pentium processor in consecutive clocks. In addition, the addresses of the data items in burst cycles all fall within the same 32-byte aligned area (corresponding to an internal Pentium processor cache line). The implementation of burst cycles is via the BRDY# pin. While running a bus cycle of more than one data transfer, the Pentium processor requires that the memory system perform a burst transfer and follow the burst order (see the table). Given the first address in the burst sequence, the address of subsequent transfers must be calculated by external hardware. This requirement exists because the Pentium processor address and byte enables are asserted for the first transfer and are not re-driven for each transfer. The burst sequence is optimized for two-bank memory subsystems and is shown in the table.
[Table: Pentium Processor Burst Order]
41.
BURST READ CYCLES
When initiating any read, the Pentium processor will present the address and byte enables for the data item requested. When the cycle is converted into a cache linefill, the first data item returned should correspond to the address sent out by the Pentium processor; however, the byte enables should be ignored, and valid data must be returned on all 64 data lines. In addition, the address of the subsequent transfers in the burst sequence must be calculated by external hardware since the address and byte enables are not re-driven for each transfer. The figure shows a cacheable burst read cycle. Note that in this case the initial cycle generated by the Pentium processor might have been satisfied by a single data transfer, but was transformed into a multiple-transfer cache fill by KEN# being returned active on the clock that the first BRDY# is returned. In this case KEN# has such an effect because the cycle is internally cacheable in the Pentium processor (CACHE# pin is driven active). KEN# is only sampled once during a cycle to determine cacheability.
[Figure: Basic Burst Read Cycle]
42.
BURST WRITE CYCLES
The figure shows the timing diagram of a basic burst write cycle. KEN# is ignored in a burst write cycle. If the CACHE# pin is active (low) during a write cycle, it indicates that the cycle will be a burst writeback cycle. Burst write cycles are always writebacks of modified lines in the data cache. Writeback cycles have several causes:
1. Writeback due to replacement of a modified line in the data cache.
2. Writeback due to an inquire cycle that hits a modified line in the data cache.
3. Writeback due to an internal snoop that hits a modified line in the data cache.
4. Writebacks caused by asserting the FLUSH# pin.
5. Writebacks caused by executing the WBINVD instruction.
The only write cycles that are burstable by the Pentium processor are writeback cycles. All other write cycles will be 64 bits or less, single transfer bus cycles.
[Figure: Basic Burst Write Cycle]
For writeback cycles, the lower five bits of the first burst address are always zero; therefore, the burst order becomes 0, 8h, 10h, and 18h. Again, note that the address of the subsequent transfers in the burst sequence must be calculated by external hardware since the Pentium processor does not drive the address and byte enables for each transfer.
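Since external hardware has to compute the addresses of the later transfers, a small model of that calculation may help. The Python sketch below is an assumption-laden illustration: the burst order table is not reproduced in this text, so the XOR pattern used here is a reconstruction that reproduces the 0, 8h, 10h, 18h writeback sequence stated above; the function name `burst_order` is hypothetical.

```python
def burst_order(first_addr):
    """Return the four quadword addresses of a burst within one 32-byte line.

    ASSUMPTION: later addresses are formed by XOR-ing the starting quadword
    offset with 0h, 8h, 10h and 18h. For a start offset of 0 this gives the
    0, 8h, 10h, 18h sequence quoted in the text for writeback cycles.
    """
    line_base = first_addr & ~0x1F   # 32-byte aligned cache line address
    start = first_addr & 0x18        # quadword offset within the line
    return [line_base | (start ^ step) for step in (0x00, 0x08, 0x10, 0x18)]


if __name__ == "__main__":
    for first in (0x1000, 0x1008, 0x1010, 0x1018):
        print([hex(a) for a in burst_order(first)])
```

Every sequence stays inside the same 32-byte aligned area, as the text requires, and with this pattern consecutive transfers alternate between the two 64-bit halves of a 16-byte pair, which fits the remark that the order is optimized for two-bank memory subsystems.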
43.
Locked Operations
The Pentium processor architecture provides a facility to perform atomic accesses of memory. For example, a programmer can change the contents of a memory-based variable and be assured that the variable was not accessed by another bus master between the read of the variable and the update of that variable. This functionality is provided for select instructions using a LOCK prefix, and also for instructions which implicitly perform locked read-modify-write cycles, such as the XCHG (exchange) instruction when one of its operands is memory based. Locked cycles are also generated when a segment descriptor or page table entry is updated and during interrupt acknowledge cycles. In hardware, the LOCK functionality is implemented through the LOCK# pin, which indicates to the outside world that the Pentium processor is performing a read-modify-write sequence of cycles, and that the Pentium processor should be allowed atomic access for the location that was accessed with the first locked cycle. Locked operations begin with a read cycle and end with a write cycle. Note that the data width read is not necessarily the data width written. For example, for descriptor access bit updates the Pentium processor fetches eight bytes and writes one byte. A locked operation is a combination of one or multiple read cycles followed by one or multiple write cycles. Programmer-generated locked cycles and locked page table/directory accesses are treated differently and are described in the following sections.
Snooping (Inquire)
When operating in an MP system, IA-32 processors (beginning with the Intel486 processor) have the ability to snoop other processors' accesses to system memory and to their internal caches. They use this snooping ability to keep their internal caches consistent both with system memory and with the caches in other processors on the bus. For example, in the Pentium and P6 family processors, if through snooping one processor detects that another processor intends to write to a memory location that it currently has cached in shared state, the snooping processor will invalidate its cache line, forcing it to perform a cache line fill the next time it accesses the same memory location.
44.
REGISTER SET
[Figure: Alternate General Purpose Register Names]
45.
• I/O ports: The IA-32 architecture supports transfers of data to and from input/output (I/O) ports.
• Control registers: The five control registers (CR0 through CR4) determine the operating mode of the processor and the characteristics of the currently executing task.
• Memory management registers: The GDTR, IDTR, task register, and LDTR specify the locations of data structures used in protected mode memory management.
• Debug registers: The debug registers (DR0 through DR7) control and allow monitoring of the processor's debugging operations.
BASIC PROGRAM EXECUTION REGISTERS
The processor provides 16 basic program execution registers for use in general system and application programming (see the figure). These registers can be grouped as follows:
• General-purpose registers. These eight registers are available for storing operands and pointers.
• Segment registers. These registers hold up to six segment selectors.
• EFLAGS (program status and control) register. The EFLAGS register reports on the status of the program being executed and allows limited (application-program level) control of the processor.
• EIP (instruction pointer) register. The EIP register contains a 32-bit pointer to the next instruction to be executed.
The general-purpose registers have the following special uses:
• EAX: Accumulator for operands and results data
• EBX: Pointer to data in the DS segment
• ECX: Counter for string and loop operations
• EDX: I/O pointer
• ESI: Pointer to data in the segment pointed to by the DS register; source pointer for string operations
• EDI: Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations
• ESP: Stack pointer (in the SS segment)
• EBP: Pointer to data on the stack (in the SS segment)
As shown in Figure 3-5, the lower 16 bits of the general-purpose registers map directly to the register set found in the 8086 and Intel 286 processors and can be referenced with the names AX, BX, CX, DX, BP, SI, DI, and SP. Each of the lower two bytes of the EAX, EBX, ECX, and EDX registers can be referenced by the names AH, BH, CH, and DH (high bytes) and AL, BL, CL, and DL (low bytes).
DATA TYPES
This chapter introduces data types defined for the IA-32 architecture.
FUNDAMENTAL DATA TYPES
The fundamental data types of the IA-32 architecture are bytes, words, doublewords, quadwords, and double quadwords (see the figure). A byte is eight bits, a word is 2 bytes (16 bits), a doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits), and a double quadword is 16 bytes (128 bits). A subset of the IA-32 architecture instructions operates on these fundamental data types without any additional operand typing. The figure shows the byte order of each of the fundamental data types when referenced as operands in memory. The low byte (bits 0 through 7) of each data type occupies the lowest address in memory and that address is also the address of the operand.
[Figure: Bytes, Words, Doublewords, Quadwords, and Double Quadwords in Memory]
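The register naming and the low-byte-first memory layout described above can be illustrated with a short Python sketch. The helper name `sub_registers` and the sample value 12345678h are assumptions made for the example; `struct.pack` with the `<` prefix is used here simply as a convenient way to produce a low-byte-first byte sequence.

```python
import struct


def sub_registers(eax):
    """Return the 16-bit and 8-bit views that alias the low part of EAX."""
    ax = eax & 0xFFFF
    return {"AX": ax, "AH": (ax >> 8) & 0xFF, "AL": ax & 0xFF}


if __name__ == "__main__":
    value = 0x12345678  # arbitrary sample doubleword

    views = sub_registers(value)
    print({name: hex(v) for name, v in views.items()})

    # Byte order in memory: the low byte occupies the lowest address.
    in_memory = struct.pack("<I", value)
    print(in_memory.hex())  # 78563412
```

The first byte of `in_memory` is 78h, the low byte of the doubleword, and its address would also be the address of the operand.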
Alignment of Words, Doublewords, Quadwords, and Double Quadwords
Words, doublewords, and quadwords do not need to be aligned in memory on natural boundaries. The natural boundaries for words, doublewords, and quadwords are even-numbered addresses, addresses evenly divisible by four, and addresses evenly divisible by eight, respectively. However, to improve the performance of programs, data structures (especially stacks) should be aligned on natural boundaries whenever possible, because the processor needs two memory accesses for an unaligned operand but only one for an aligned operand. A word or doubleword operand that crosses a 4-byte boundary, or a quadword operand that crosses an 8-byte boundary, is considered unaligned and requires two separate memory bus cycles for access.
Some instructions that operate on double quadwords require memory operands to be aligned on a natural boundary. These instructions generate a general-protection exception (#GP) if an unaligned operand is specified. A natural boundary for a double quadword is any address evenly divisible by 16. Other instructions that operate on double quadwords permit unaligned access (without generating a general-protection exception). However, additional memory bus cycles are required to access unaligned data from memory.

NUMERIC DATA TYPES
Although bytes, words, and doublewords are the fundamental data types of the IA-32 architecture, some instructions support additional interpretations of these data types to allow operations to be performed on numeric data types (signed and unsigned integers, and floating-point numbers). See the figure.
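The alignment rules given earlier in this section (natural boundaries, and the 4-byte and 8-byte crossing rule for words, doublewords, and quadwords) can be sketched in Python. This is a simplified model of the rule as stated in the text, not of any particular bus implementation; the function and table names are hypothetical, and double quadwords are left out of the cycle count because their behaviour depends on the instruction.

```python
# Operand sizes and natural boundaries in bytes, as listed in the text.
SIZE = {"word": 2, "doubleword": 4, "quadword": 8, "double quadword": 16}

# Boundary whose crossing makes an access "unaligned" per the text:
# 4 bytes for words and doublewords, 8 bytes for quadwords.
CROSSING_BOUNDARY = {"word": 4, "doubleword": 4, "quadword": 8}


def is_naturally_aligned(address, kind):
    """True if the address is a multiple of the operand's natural boundary."""
    return address % SIZE[kind] == 0


def bus_cycles(address, kind):
    """Return 2 if the operand's first and last bytes fall in different
    boundary-sized blocks (an unaligned access), otherwise 1."""
    boundary = CROSSING_BOUNDARY[kind]
    first_block = address // boundary
    last_block = (address + SIZE[kind] - 1) // boundary
    return 2 if first_block != last_block else 1


# A word at an odd address is off its natural boundary, but it costs a
# second cycle only when it straddles a 4-byte boundary.
print(bus_cycles(0x1001, "word"))      # stays inside one 4-byte block
print(bus_cycles(0x1003, "word"))      # straddles 0x1004
print(bus_cycles(0x1004, "quadword"))  # straddles 0x1008
```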
Figure: Numeric Data Types

OPERAND ADDRESSING
IA-32 machine instructions act on zero or more operands. Some operands are specified explicitly and others are implicit. The data for a source operand can be located in:
• the instruction itself (an immediate operand)
• a register
• a memory location
• an I/O port
When an instruction returns data to a destination operand, it can be returned to:
• a register
• a memory location
• an I/O port

Immediate Operands
Some instructions use data encoded in the instruction itself as a source operand. These operands are called immediate operands (or simply immediates). For example, the following ADD instruction adds an immediate value of 14 to the contents of the EAX register:
    ADD EAX, 14
All arithmetic instructions (except the DIV and IDIV instructions) allow the source operand to be an immediate value. The maximum value allowed for an immediate operand varies among instructions, but can never be greater than the maximum value of an unsigned doubleword integer (2^32 - 1).

Register Operands
Source and destination operands can be any of the following registers, depending on the instruction being executed:
• 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP)
• 16-bit general-purpose registers (AX, BX, CX, DX, SI, DI, SP, or BP)
• 8-bit general-purpose registers (AH, BH, CH, DH, AL, BL, CL, or DL)
• segment registers (CS, DS, SS, ES, FS, and GS)
• EFLAGS register
• x87 FPU registers (ST0 through ST7, status word, control word, tag word, data operand pointer, and instruction pointer)
Some instructions (such as the DIV and MUL instructions) use quadword operands contained in a pair of 32-bit registers. Register pairs are represented with a colon separating them. For example, in the register pair EDX:EAX, EDX contains the high-order bits and EAX contains the low-order bits of a quadword operand.
Several instructions (such as the PUSHFD and POPFD instructions) are provided to load and store the contents of the EFLAGS register or to set or clear individual flags in this register. Other instructions (such as the Jcc instructions) use the state of the status flags in the EFLAGS register as condition codes for branching or other decision-making operations.
The processor contains a selection of system registers that are used to control memory management, interrupt and exception handling, task management, processor management, and debugging activities. Some of these system registers are accessible by an application program, the operating system, or the executive through a set of system instructions. When accessing a system register with a system instruction, the register is generally an implied operand of the instruction.
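The register naming described above (16-bit and 8-bit names as views of a 32-bit register, and the EDX:EAX pair holding a quadword) can be sketched as plain bit arithmetic in Python. The helper names are hypothetical and the sample values are arbitrary; the sketch models only the bit layout, not instruction behaviour.

```python
MASK32 = 0xFFFFFFFF


def sub_registers(eax):
    """Return the AX, AH, and AL views of a 32-bit EAX value.
    AX is bits 0-15, AL is bits 0-7, AH is bits 8-15."""
    ax = eax & 0xFFFF
    return {"AX": ax, "AH": (ax >> 8) & 0xFF, "AL": ax & 0xFF}


def split_pair(quadword):
    """Split a 64-bit value into (EDX, EAX): EDX holds the high-order
    doubleword and EAX the low-order doubleword."""
    return (quadword >> 32) & MASK32, quadword & MASK32


def join_pair(edx, eax):
    """Rebuild the quadword represented by the pair EDX:EAX."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


views = sub_registers(0x12345678)
print({name: hex(v) for name, v in views.items()})

# A 32-bit by 32-bit unsigned product can need up to 64 bits, which is
# why a register pair is used for the result of a 32-bit MUL.
edx, eax = split_pair(0xFFFFFFFF * 2)
print(hex(edx), hex(eax))
```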

Memory Operands
Source and destination operands in memory are referenced by means of a segment selector and an offset (see Figure). Segment selectors specify the segment containing the operand. Offsets specify the linear or effective address of the operand. Offsets can be 32 bits (represented by the notation m16:32) or 16 bits (represented by the notation m16:16).
Figure: Memory Operand Address

Specifying a Segment Selector
The segment selector can be specified either implicitly or explicitly. The most common method of specifying a segment selector is to load it in a segment register and then allow the processor to select the register implicitly, depending on the type of operation being performed. The processor automatically chooses a segment according to the rules given in the table.
When storing data in memory or loading data from memory, the DS segment default can be overridden to allow other segments to be accessed. Within an assembler, the segment override is generally handled with a colon ":" operator. For example, the following MOV instruction moves a value from register EAX into the segment pointed to by the ES register. The offset into the segment is contained in the EBX register:
    MOV ES:[EBX], EAX
Table: Default Segment Selection Rules
At the machine level, a segment override is specified with a segment-override prefix, which is a byte placed at the beginning of an instruction. The following default segment selections cannot be overridden:
• Instruction fetches must be made from the code segment.
• Destination strings in string instructions must be stored in the data segment pointed to by the ES register.
• Push and pop operations must always reference the SS segment.
Some instructions require a segment selector to be specified explicitly. In these cases, the 16-bit segment selector can be located in a memory location or in a 16-bit register. For example, the following MOV instruction moves a segment selector located in register BX into segment register DS:
    MOV DS, BX
Segment selectors can also be specified explicitly as part of a 48-bit far pointer in memory. Here, the first doubleword in memory contains the offset and the next word contains the segment selector.

Specifying an Offset
The offset part of a memory address can be specified directly as a static value (called a displacement) or through an address computation made up of one or more of the following components:
• Displacement: An 8-, 16-, or 32-bit value.
• Base: The value in a general-purpose register.
• Index: The value in a general-purpose register.
• Scale factor: A value of 2, 4, or 8 that is multiplied by the index value.
The offset that results from adding these components is called an effective address. Each of these components can have either a positive or negative (two's complement) value, with the exception of the scaling factor. Figure 3-11 shows all the possible ways that these components can be combined to create an effective address in the selected segment.
Figure: Offset (or Effective Address) Computation
The uses of general-purpose registers as base or index components are restricted in the following manner:
• The ESP register cannot be used as an index register.
• When the ESP or EBP register is used as the base, the SS segment is the default segment. In all other cases, the DS segment is the default segment.
The base, index, and displacement components can be used in any combination, and any of these components can be null. A scale factor may be used only when an index is also used. Each possible combination is useful for data structures commonly used by programmers in high-level languages and assembly language. The following addressing modes suggest uses for common combinations of address components.
• Displacement: A displacement alone represents a direct (uncomputed) offset to the operand. Because the displacement is encoded in the instruction, this form of an address is sometimes called an absolute or static address. It is commonly used to access a statically allocated scalar operand.
• Base: A base alone represents an indirect offset to the operand. Since the value in the base register can change, it can be used for dynamic storage of variables and data structures.
• Base + Displacement: A base register and a displacement can be used together for two distinct purposes:
  • As an index into an array when the element size is not 2, 4, or 8 bytes: the displacement component encodes the static offset to the beginning of the array, and the base register holds the result of a calculation to determine the offset to a specific element within the array.
  • To access a field of a record: the base register holds the address of the beginning of the record, while the displacement is a static offset to the field.
An important special case of this combination is access to parameters in a procedure activation record. A procedure activation record is the stack frame created when a procedure is entered. Here, the EBP register is the best choice for the base register, because it automatically selects the stack segment. This is a compact encoding for this common function.
• (Index ∗ Scale) + Displacement: This address mode offers an efficient way to index into a static array when the element size is 2, 4, or 8 bytes. The displacement locates the beginning of the array, the index register holds the subscript of the desired array element, and the processor automatically converts the subscript into an index by applying the scaling factor.
• Base + Index + Displacement: Using two registers together supports either a two-dimensional array (the displacement holds the address of the beginning of the array) or one of several instances of an array of records (the displacement is an offset to a field within the record).
• Base + (Index ∗ Scale) + Displacement: Using all the addressing components together allows efficient indexing of a two-dimensional array when the elements of the array are 2, 4, or 8 bytes in size.

I/O Port Addressing
The processor supports an I/O address space that contains up to 65,536 8-bit I/O ports. Ports that are 16-bit and 32-bit may also be defined in the I/O address space. An I/O port can be addressed with either an immediate operand or a value in the DX register.
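The offset computation described earlier in this section (base + index × scale + displacement, with the stated restrictions on ESP and on the scale factor) can be sketched in Python. This is a minimal model under stated assumptions, not a description of the hardware: registers are passed as hypothetical (name, value) tuples, a scale of 1 stands for "no scaling", and the result is truncated to 32 bits to mimic 32-bit address arithmetic.

```python
MASK32 = 0xFFFFFFFF
VALID_SCALES = (1, 2, 4, 8)  # 1 is used here to mean "no scale factor"


def effective_address(base=None, index=None, scale=1, disp=0):
    """Compute base + (index * scale) + displacement, modulo 2**32.

    base and index are (register_name, value) tuples or None; disp may be
    negative, matching the two's complement displacement in the text."""
    if scale not in VALID_SCALES:
        raise ValueError("scale factor must be 2, 4, or 8 (or 1 for none)")
    if index is None and scale != 1:
        raise ValueError("a scale factor may be used only with an index")
    if index is not None and index[0] == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    total = disp
    if base is not None:
        total += base[1]
    if index is not None:
        total += index[1] * scale
    return total & MASK32


def default_segment(base=None):
    """SS when ESP or EBP is the base, DS in all other cases."""
    return "SS" if base is not None and base[0] in ("ESP", "EBP") else "DS"


# (Index * Scale) + Displacement: element 3 of a doubleword array at 0x1000.
element = effective_address(index=("ESI", 3), scale=4, disp=0x1000)

# Base + Displacement: a parameter 8 bytes above EBP in a stack frame.
param = effective_address(base=("EBP", 0x7FF0), disp=8)

# Base + (Index * Scale) + Displacement: row 2, column 3 of a 2-D array of
# doublewords with 5 columns; the base holds the row offset 2 * 5 * 4.
cell = effective_address(base=("EBX", 2 * 5 * 4), index=("ESI", 3),
                         scale=4, disp=0x2000)

print(hex(element), hex(param), default_segment(("EBP", 0x7FF0)), hex(cell))
```

Passing register names alongside values is a design choice made only so that the ESP restriction and the default-segment rule can be checked in the same sketch.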