Microprocessors
             and
        Microcontrollers
          Third Year BE Computers




                          Pawar Virendra D.
                           Mo. No.:9423582261



1/153         MPMC©           Pawar Virendra D.
Syllabus

EC4813 : Microprocessors and Microcontrollers
Microprocessors and Microcontrollers Prerequisites :
Understanding of Microprocessors, Peripheral Chips, Analogue Sensors, Conversion,
Interfacing Techniques.
Aim : This course covers the design of hardware and software code using a modern
microcontroller. It emphasizes assembly language programming of the microcontroller
including device drivers, exception and interrupt handling, and interfacing with higher-
level languages.
Objectives:
1. To exhibit knowledge of the architecture of microcontrollers and apply program
control structures to microcontrollers;
2. To develop the ability to use assembly language to program a microcontroller and
demonstrate the capability to program the microcontroller to communicate with external
circuitry using parallel ports;
3. To demonstrate the capability to program the microcontroller to communicate with
external circuitry using serial ports and timer ports.
Unit 1 : Introduction to Pentium microprocessor ( 7 Hrs ) Pentium Microprocessor:
History ,Feature & Architecture, Pin Description , Functional Description Real Mode,
Risc Super Scalar, Pipe lining , Instruction Pairing, Branch Prediction, Inst Data Cache.
FPU
Unit 2 : Bus Cycles and Memory Organization: ( 7 Hrs ) Bus Cycles & Memory
Organisation : Init & Configuration, Bus Operations-RST, Mem/Io
Organisation, Data Transfer Mechanism , 8/16/32 bit Data Bus I, Programmers Model,
Register Set, Instru Set , Data Types, Instructions
Unit 3 : Protected Mode: ( 6 Hrs ) Protected Mode :Intro Segmentation, Supp Registers
,Rel Int Desc, Mem Man thru Segmentation , Logical to linear translation, protection by
segmentation, Privilege Level protection, related instructions, inter - privilege level
transfer of control, paging-support registers, descriptors ,linear-physical add trans, TLB,
page level protection ,virtual memory
Unit 4 : Multitasking, Interrupts, Exceptions and I/O ( 6 Hrs ) Multitasking,
Interrupts, Exception I/O :Multi Tasking Support Reg , Rel Des, Task Switch I/O per
BitMap, Virtual Mode, Add Gen, Priv Level, Inst &Reg ,enter/Leaving V86 M, Interrupt
Structure Real/Prot V86 Mode, I/O Handling, comparison of 3 modes.
Unit 5 : 8051 Micro controller ( 7 Hrs ) Family Architecture, Data / Programme
Memory , Reg set Reg Bank SFR, Ext Data / Mem Programme Mem, Interrupt Structure
, Timer Prog ,Serial Port Prog , Misc Features, Min System
Unit 6 : PIC Micro-Controller ( 7 Hrs ) PIC Micro-Controller :OverView ,Features,
Pin Out, Capture /Compare /Pulse width modulation Mode , Block Dia Prog Model, Rest
/Clocking, Mem Org, Prog/Data, Flash Eprom, Add Mode/Inst Set Prog , I/o, Interrupt ,
Timer, ADC
Outcomes: Upon completion of the course, the student should be able to:




1. Describe and use the functional blocks utilized in a basic microcontroller based
system.
2. Describe the programmer's model of the CPU's instruction set and various addressing
modes.
3. Proficiently use the various instruction set and functional groups, when programming.
4. Integrate structured programming techniques and sub-routines into microcontroller
based hardware topologies.
5. Develop I/O port, ADC hardware, and software interfacing techniques.
6. Describe the use of sensors, interfacing, and signal conditioning when utilizing the
microcontroller in control and monitor applications.
Text Books:
1. Antonakos J., "The Pentium Microprocessor", Pearson Education, 2004, 2nd Edition.
2. Deshmukh A., "Microcontrollers - Theory and Applications", Tata McGraw-Hill, 2004.
Reference Books:
1. Mazidi M., Gillispie J., "The 8051 Microcontroller and Embedded Systems", Pearson
   Education, 2002, ISBN 81-7808-574-7.
2. Intel Pentium Data Sheets.
3. Ayala K., "The 8051 Microcontroller", Penram International, 1996, ISBN 81-900828-4-1.
4. Intel 8 bit Microcontroller manual.
5. Microchip manual for PIC 16CXX and 16FXX.




INTRODUCTION


16-bit Processors and Segmentation (1978)
The IA-32 architecture family was preceded by 16-bit processors, the 8086 and 8088.
The 8086 has 16-bit registers and a 16-bit external data bus, with 20-bit addressing giving
a 1-MByte address space. The 8088 is similar to the 8086 except it has an 8-bit external
data bus. The 8086/8088 introduced segmentation to the IA-32 architecture. With
segmentation, a 16-bit segment register contains a pointer to a memory segment of up to
64 KBytes. Using four segment registers at a time, 8086/8088 processors are able to
address up to 256 KBytes without switching between segments. The 20-bit addresses that
can be formed using a segment register and an additional 16-bit pointer provide a total
address range of 1 MByte.
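
The address arithmetic described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `physical_address` and the constants are hypothetical, and the shift-by-four rule (segment value multiplied by 16, then added to the offset) is the standard 8086 real-mode formula rather than something spelled out in the text above.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a segment:offset pair."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep 20 bits because the 8086 drives only 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


SEGMENT_SIZE = 1 << 16            # 64 KBytes reachable through one segment register
FOUR_SEGMENTS = 4 * SEGMENT_SIZE  # 256 KBytes through four non-overlapping segments
ADDRESS_SPACE = 1 << 20           # 1 MByte covered by 20-bit addresses

print(hex(physical_address(0x1234, 0x0010)))
print(FOUR_SEGMENTS // 1024, "KBytes,", ADDRESS_SPACE // 1024, "KBytes")
```

The 256-KByte figure holds only when the four segments do not overlap; overlapping segments reach less memory without a segment register reload.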

The Intel® 286 Processor (1982)
The Intel 286 processor introduced protected mode operation into the IA-32 architecture.
Protected mode uses the segment register content as selectors or pointers into descriptor
tables. Descriptors provide 24-bit base addresses with a physical memory size of up to 16
MBytes, support for virtual memory management on a segment swapping basis, and a
number of protection mechanisms. These mechanisms include:
• Segment limit checking
• Read-only and execute-only segment options
• Four privilege levels

The Intel386™ Processor (1985)
The Intel386 processor was the first 32-bit processor in the IA-32 architecture family. It
introduced 32-bit registers for use both to hold operands and for addressing. The lower
half of each 32-bit Intel386 register retains the properties of the 16-bit registers of earlier
generations, permitting backward compatibility. The processor also provides a virtual-
8086 mode that allows for even greater efficiency when executing programs created for
8086/8088 processors.
In addition, the Intel386 processor has support for:
• A 32-bit address bus that supports up to 4-GBytes of physical memory
• A segmented-memory model and a flat memory model
• Paging, with a fixed 4-KByte page size providing a method for virtual memory
management
• Support for parallel stages

The Intel486™ Processor (1989)
The Intel486™ processor added more parallel execution capability by expanding the
Intel386 processor’s instruction decode and execution units into five pipelined stages.
Each stage operates in parallel with the others on up to five instructions in different
stages of execution.
In addition, the processor added:
• An 8-KByte on-chip first-level cache that increased the percent of instructions that
could execute at the scalar rate of one per clock
• An integrated x87 FPU
• Power saving and system management capabilities

The Intel® Pentium® Processor (1993)
The introduction of the Intel Pentium processor added a second execution pipeline to
achieve superscalar performance (two pipelines, known as u and v, together can execute
two instructions per clock). The on-chip first-level cache doubled, with 8 KBytes devoted
to code and another 8 KBytes devoted to data. The data cache uses the MESI protocol to
support more efficient write-back cache in addition to the write-through cache previously
used by the Intel486 processor. Branch prediction with an on-chip branch table was
added to increase performance in looping constructs.
In addition, the processor added:
• Extensions to make the virtual-8086 mode more efficient and allow for 4-MByte as well
  as 4-KByte pages
• Internal data paths of 128 and 256 bits add speed to internal data transfers
• Burstable external data bus increased to 64 bits
• An APIC to support systems with multiple processors
• A dual processor mode to support glueless two processor systems


PROCESSOR FEATURES OVERVIEW
The Pentium processor supports the features of previous Intel Architecture processors and
provides significant enhancements including the following:
• Superscalar Architecture
• Dynamic Branch Prediction
• Pipelined Floating-Point Unit
• Improved Instruction Execution Time
• Separate Code and Data Caches.
• Writeback MESI Protocol in the Data Cache
• 64-Bit Data Bus
• Bus Cycle Pipelining
• Address Parity
• Internal Parity Checking
• Functional Redundancy Checking and Lock Step operation
• Execution Tracing
• Performance Monitoring
• IEEE 1149.1 Boundary Scan
• System Management Mode
• Virtual Mode Extensions
• Upgradable with a Pentium OverDrive processor
• Dual processing support
• Advanced SL Power Management Features
• Fractional Bus Operation
• On-Chip Local APIC Device
• Support for the Intel 82498/82493 and 82497/82492 cache chipset products
• Split line accesses to the code cache




COMPONENT INTRODUCTION
The application instruction set of the Pentium processor family includes the complete
instruction set of existing Intel Architecture processors to ensure backward compatibility,
with extensions to accommodate the additional functionality of the Pentium processor.
All application software written for the Intel386™ and Intel486™ microprocessors will
run on the Pentium processor without modification. The on-chip memory management
unit (MMU) is completely compatible with the Intel386 and Intel486 CPUs.

The two instruction pipelines and the floating-point unit on the Pentium processor are
capable of independent operation. Each pipeline issues frequently used instructions in a
single clock. Together, the dual pipes can issue two integer instructions in one clock, or
one floating-point instruction (under certain circumstances, 2 floating-point instructions)
in one clock. Branch prediction is implemented in the Pentium processor. To support this,
the Pentium processor implements two prefetch buffers, one to prefetch code in a linear
fashion, and one that prefetches code according to the Branch Target Buffer (BTB) so the
needed code is almost always prefetched before it is needed for execution.
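
The text above does not say how the Branch Target Buffer decides whether a branch will be taken. A common textbook scheme is a 2-bit saturating counter stored with each remembered branch, and the sketch below assumes that scheme. The class name, the allocation policy (a new entry starts as "strongly taken") and the unbounded dictionary are all assumptions made for illustration, not a description of the actual Pentium hardware.

```python
class BranchTargetBuffer:
    """Toy BTB: branch address -> [target address, 2-bit counter]."""

    def __init__(self):
        self.entries = {}

    def predict(self, branch_addr):
        """Return the predicted target, or None for 'predict not taken'."""
        entry = self.entries.get(branch_addr)
        if entry is not None and entry[1] >= 2:
            return entry[0]
        return None

    def update(self, branch_addr, taken, target):
        """Record the real outcome of the branch."""
        entry = self.entries.get(branch_addr)
        if entry is None:
            if taken:
                # Assumed policy: allocate on the first taken branch.
                self.entries[branch_addr] = [target, 3]
            return
        if taken:
            entry[0] = target
            entry[1] = min(3, entry[1] + 1)   # saturate at 3
        else:
            entry[1] = max(0, entry[1] - 1)   # saturate at 0


def run_loop(btb, iterations, branch_addr=0x1000, target=0x0F00):
    """Simulate a loop-closing branch: taken every time except the last."""
    mispredictions = 0
    for i in range(iterations):
        taken = i < iterations - 1
        predicted_taken = btb.predict(branch_addr) is not None
        if predicted_taken != taken:
            mispredictions += 1
        btb.update(branch_addr, taken, target)
    return mispredictions


btb = BranchTargetBuffer()
print("first pass: ", run_loop(btb, 10), "mispredictions")
print("second pass:", run_loop(btb, 10), "mispredictions")
```

Under these assumptions a loop costs one misprediction on entry (the branch is not yet in the buffer) and one on exit, which is why branch prediction pays off mostly in looping constructs.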

The Pentium processor includes separate code and data caches integrated on chip to meet
its performance goals. The caches on the Pentium processor are each 8 Kbytes in size
and 2-way set-associative. Each cache has a dedicated Translation Lookaside Buffer
(TLB) to translate linear addresses to physical addresses. The Pentium processor data
cache is configurable to be writeback or writethrough on a line-by-line basis and follows
the MESI protocol. The data cache tags are triple ported to support two data transfers and
an inquire cycle in the same clock. The code cache is an inherently write protected cache.
The code cache tags of the Pentium processor are also triple ported to support snooping
and split-line accesses.
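
As a worked example of what "8 Kbytes, 2-way set-associative" implies, the sketch below derives the number of sets and splits an address into tag, set index and byte offset. It assumes a 32-byte cache line (the line size quoted for the prefetch buffers later in these notes) and a 32-bit address; the function names are hypothetical.

```python
CACHE_BYTES = 8 * 1024   # size of each on-chip cache
WAYS = 2                 # 2-way set-associative
LINE_BYTES = 32          # assumed line size


def cache_geometry(cache_bytes, ways, line_bytes):
    """Return (number of sets, offset bits, index bits)."""
    sets = cache_bytes // (ways * line_bytes)
    offset_bits = line_bytes.bit_length() - 1   # log2 for powers of two
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, offset_bits, index_bits):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


sets, offset_bits, index_bits = cache_geometry(CACHE_BYTES, WAYS, LINE_BYTES)
tag, index, offset = split_address(0x12345678, offset_bits, index_bits)
print(sets, "sets;", offset_bits, "offset bits;", index_bits, "index bits")
print(hex(tag), index, offset)
```

With these numbers each cache has 128 sets of two lines, so a 32-bit address splits into a 20-bit tag, a 7-bit set index and a 5-bit byte offset.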

The Pentium processor has a 64-bit data bus. Burst read and burst writeback cycles are
supported by the Pentium processor. In addition, bus cycle pipelining has been added to
allow two bus cycles to be in progress simultaneously. The Pentium processor Memory
Management Unit contains optional extensions to the architecture which allow 4 MB
page sizes.

The Pentium processor has added significant data integrity and error detection capability.
Data parity checking is still supported on a byte-by-byte basis. Address parity checking,
and internal parity checking features have been added along with a new exception, the
machine check exception.

The Pentium processor has implemented functional redundancy checking to provide
maximum error detection of the processor and the interface to the processor. When
functional redundancy checking is used, a second processor, the “checker” is used to
execute in lock step with the “master” processor. The checker samples the master’s
outputs and compares those values with the values it computes internally, and asserts an
error signal if a mismatch occurs. The Pentium processor with MMX technology does not
support functional redundancy checking.

As more and more functions are integrated on chip, the complexity of board level testing
is increased. To address this, the Pentium processor has increased test and debug
capability by implementing IEEE Boundary Scan (Standard 1149.1). System
management mode has been implemented along with some extensions to the SMM
architecture.

Enhancements to the Virtual 8086 mode have been made to increase performance by
reducing the number of times it is necessary to trap to a Virtual 8086 monitor.

The block diagram of the Pentium processor shows the two instruction pipelines, the
“u” pipe and the “v” pipe. The u-pipe can execute all integer and floating-point
instructions. The v-pipe can execute simple integer instructions and the FXCH
floating-point instruction.




The separate code and data caches are shown. The data cache has two ports, one for each
of the two pipes (the tags are triple ported to allow simultaneous inquire cycles). The data
cache has a dedicated TLB to translate linear addresses to the physical addresses used by the
data cache.
The code cache, branch target buffer and prefetch buffers are responsible for getting raw
instructions into the execution units of the Pentium processor. Instructions are fetched
from the code cache or from the external bus. Branch addresses are remembered by the
branch target buffer. The code cache TLB translates linear addresses to physical
addresses used by the code cache.
The decode unit contains two parallel decoders which decode and issue up to the next
two sequential instructions into the execution pipeline. The control ROM contains the
microcode which controls the sequence of operations performed by the processor. The
control unit has direct control over both pipelines.

The Pentium processor contains a pipelined floating-point unit that provides a significant
floating-point performance advantage over previous generations of Intel Architecture-
based processors.

The Pentium processor includes features to support multi-processor systems, namely an
on chip Advanced Programmable Interrupt Controller (APIC). This APIC
implementation supports multiprocessor interrupt management (with symmetric interrupt
distribution across all processors), multiple I/O subsystem support, 8259A compatibility,
and inter-processor interrupt support.

The dual processor configuration allows two Pentium processors to share a single L2
cache for a low-cost symmetric multi-processor system. The two processors appear to the
system as a single Pentium processor. Multiprocessor operating systems properly
schedule computing tasks between the two processors. This scheduling of tasks is
transparent to software applications and the end-user. Logic built into the processors
supports a “glueless” interface for easy system design. Through a private bus, the two
Pentium processors arbitrate for the external bus and maintain cache coherency. The
Pentium processor can also be used in a conventional multi-processor system in which
one L2 cache is dedicated to each processor.


The Pentium processor is produced on Intel’s advanced silicon technology. The Pentium
processor also includes SL enhanced power management features. When the clock to the
Pentium processor is stopped, power dissipation is virtually eliminated. The low VCC
operating voltages and SL enhanced power management features make the Pentium
processor a good choice for energy-efficient desktop designs.




PIN DESCRIPTION
Symbol       Type Name and Function
A31-A3       I/O  As outputs, the address lines of the processor along with the
                  byte enables define the physical area of memory or I/O
                  accessed. The external system drives the inquire address to the
                  processor on A31-A5.
D63-D0       I/O  These are the 64 data lines for the processor. Lines D7-D0
                  define the least significant byte of the data bus; lines D63-D56
                  define the most significant byte of the data bus. When the CPU
                  is driving the data lines, they are driven during the T2, T12, or
                  T2P clocks for that cycle. During reads, the CPU samples the
                  data bus when BRDY# is returned.
ADS#         O    The address status indicates that a new valid bus cycle is
                  currently being driven by the Pentium processor
BE7#-BE5#    O    The byte enable pins are used to determine which bytes must
BE4#-BE0#    I/O  be written to external memory, or which bytes were requested
                  by the CPU for the current cycle. The byte enables are driven
                  in the same clock as the address lines (A31-A3).
BOFF#        I    The backoff input is used to abort all outstanding bus cycles
                  that have not yet completed. In response to BOFF#, the
                  Pentium processor will float all pins normally floated during
                  bus hold in the next clock. The processor remains in bus hold
                  until BOFF# is negated, at which time the Pentium processor
                  restarts the aborted bus cycle(s) in their entirety.
BRDY#        I    The burst ready input indicates that the external system has
                  presented valid data on the data pins in response to a read or
                  that the external system has accepted the Pentium processor
                  data in response to a write request. This signal is sampled in the
                  T2, T12 and T2P bus states.
CACHE#       O    For Pentium processor initiated cycles the cache pin indicates
                  internal cacheability of the cycle (if a read), and indicates a
                  burst write back cycle (if a write). If this pin is driven inactive
                  during a read cycle, the Pentium processor will not cache the
                  returned data, regardless of the state of the KEN# pin. This pin
                  is also used to determine the cycle length (number of transfers
                  in the cycle).
CPUTYP       I    CPU type distinguishes the Primary processor from the Dual
                  processor. In a single processor environment, or when the
                  Pentium processor is acting as the Primary processor in a dual
                  processing system, CPUTYP should be strapped to VSS. The
                  Dual processor should have CPUTYP strapped to VCC. For the
                  Pentium OverDrive processor, CPUTYP will be used to
                  determine whether the bootup handshake protocol will be used
                  (in a dual socket system) or not (in a single socket system).
FLUSH#       I    When asserted, the cache flush input forces the Pentium
                  processor to write back all modified lines in the data cache
                  and invalidate its internal caches. A Flush Acknowledge
             special cycle will be generated by the Pentium processor
             indicating completion of the write back and invalidation.
             If FLUSH# is sampled low when RESET transitions from high
             to low, tristate test mode is entered. If two Pentium processors
             are operating in dual processing mode and FLUSH# is asserted,
             the Dual processor will perform a flush first (without a flush
             acknowledge cycle), then the Primary processor will perform a
             flush followed by a flush acknowledge cycle.
             NOTE:
             If the FLUSH# signal is asserted in dual processing mode, it
             must be deasserted at least one clock prior to BRDY# of the
             FLUSH Acknowledge cycle to avoid DP arbitration problems.
FRCMC#   I   The functional redundancy checking master/checker mode
             input is used to determine whether the Pentium processor is
             configured in master mode or checker mode. When configured
             as a master, the Pentium processor drives its output pins as
             required by the bus protocol. When configured as a checker,
             the Pentium processor tristates all outputs (except IERR# and
             TDO) and samples the output pins. The configuration as a
             master/checker is set after RESET and may not be changed
             other than by a subsequent RESET.
HOLD     I   In response to the bus hold request, the Pentium processor
             will float most of its output and input/output pins and assert
             HLDA after completing all outstanding bus cycles. The
             Pentium processor will maintain its bus in this state until
             HOLD is de-asserted. HOLD is not recognized during LOCK
             cycles. The Pentium processor will recognize HOLD during
             reset.
HLDA     O   The bus hold acknowledge pin goes active in response to a
             hold request driven to the processor on the HOLD pin. It
             indicates that the Pentium processor has floated most of the
             output pins and relinquished the bus to another local bus
             master. When leaving bus hold, HLDA will be driven inactive
             and the Pentium processor will resume driving the bus. If the
             Pentium processor has a bus cycle pending, it will be driven in
             the same clock that HLDA is de-asserted.
INIT     I   The Pentium processor initialization input pin
             forces the Pentium processor to begin execution in a known
             state. The processor state after INIT is the same as the state
             after RESET except that the internal caches, write buffers, and
             floating point registers retain the values they had prior to INIT.
             INIT may NOT be used in lieu of RESET after power-up. If
             INIT is sampled high when RESET transitions from high to
             low, the Pentium processor will perform built-in self test prior
             to the start of program execution.
INV      I   The invalidation input determines the final cache line state (S
             or I) in case of an inquire cycle hit. It is sampled together with
             the address for the inquire cycle in the clock EADS# is
             sampled active.
KEN#     I   The cache enable pin is used to determine whether the current
             cycle is cacheable or not and is consequently used to determine
             cycle length. When the Pentium processor generates a cycle
             that can be cached (CACHE# asserted) and KEN# is active, the
             cycle will be transformed into a burst line fill cycle.
LOCK#    O   The bus lock pin indicates that the current bus cycle is locked.
             The Pentium processor will not allow a bus hold when
             LOCK# is asserted (but AHOLD and BOFF# are allowed).
             LOCK# goes active in the first clock of the first locked bus
             cycle and goes inactive after the BRDY# is returned for the last
             locked bus cycle. LOCK# is guaranteed to be de-asserted for at
             least one clock between back-to-back locked cycles.
NA#      I   An active next address input indicates that the external
             memory system is ready to accept a new bus cycle although all
             data transfers for the current cycle have not yet completed. The
             Pentium processor will issue ADS# for a pending cycle two
             clocks after NA# is asserted. The Pentium processor supports
             up to 2 outstanding bus cycles.
RESET    I   RESET forces the Pentium processor to begin execution at a
             known state. All the Pentium processor internal caches will be
             invalidated upon the RESET. Modified lines in the data cache
             are not written back. FLUSH#, FRCMC# and INIT are
             sampled when RESET transitions from high to low to
             determine if tristate test mode or checker mode will be entered,
             or if BIST will be run.
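
The A31-A3 and BE7#-BE0# rows of the table above together select which bytes of the 64-bit data bus take part in a transfer. The sketch below shows one way to derive both values for a simple access. It assumes that BE0# corresponds to data lines D7-D0 and BE7# to D63-D56, and that the access does not cross an 8-byte boundary; the function name `bus_cycle_signals` is hypothetical.

```python
def bus_cycle_signals(address, size):
    """Return (A31-A3 value, BE7#-BE0# pin levels) for one aligned access.

    The byte-enable pins are active low, so a 0 bit means "this byte
    lane is used" and a 1 bit means "this byte lane is ignored".
    """
    lane = address & 0x7                      # byte lane within the quadword
    if size < 1 or lane + size > 8:
        raise ValueError("access must fit inside one aligned 8-byte quadword")
    a31_a3 = address & 0xFFFFFFF8             # the low three address bits are not driven
    active = ((1 << size) - 1) << lane        # 1 = byte lane in use
    be_pins = ~active & 0xFF                  # invert because the pins are active low
    return a31_a3, be_pins


addr_lines, be = bus_cycle_signals(0x1006, 2)
print(hex(addr_lines), format(be, "08b"))     # bit 7 is BE7#, bit 0 is BE0#
```

A 2-byte access at address 0x1006 therefore drives A31-A3 with 0x1000 and asserts only BE7# and BE6#.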




REAL MODE




RISC

A Complex Instruction Set Computer (CISC) provides a large and powerful range of
instructions, which is less flexible to implement. For example, the 8086 microprocessor
family has these instructions:

          JA        Jump if Above
          JAE       Jump if Above or Equal
          JB        Jump if Below

By contrast, the Reduced Instruction Set Computer (RISC) concept is to identify the sub-
components and use those. As these are much simpler, they can be implemented directly
in silicon, so will run at the maximum possible speed. Nothing is 'translated'.

Most modern CISC processors, such as those of the Pentium family, use a fast RISC core
with an interpreter sitting between the core and the instruction stream. So when you are
running Windows 95 on a PC, it is not that much different from trying to get Windows 95
running on a software PC emulator. Just imagine the power hidden inside the Pentium...

This is not to say that CISC processors cannot have a large number of registers; some do.
However, for its use, a typical RISC processor requires more registers to give it additional
flexibility. Gone are the days when you had two general purpose registers and an
'accumulator'.

One thing RISC does offer, though, is register independence.

The 8086 offers you fourteen registers, but with caveats:
The first four (AX, BX, CX, and DX) are Data registers (a.k.a. scratch-pad registers). They are
16-bit and can each be accessed as two 8-bit registers; thus register AX is really AH (A, high-order
byte) and AL (A, low-order byte). These can be used as general purpose registers, but they can
also have dedicated functions - Accumulator, Base, Count, and Data.
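
The relationship between a 16-bit data register and its two 8-bit halves can be modelled as follows. This is a minimal sketch; the class name `Register16` and its attribute names are hypothetical and chosen only for illustration.

```python
class Register16:
    """A 16-bit register whose halves can be read and written separately."""

    def __init__(self, value=0):
        self.x = value & 0xFFFF          # full 16-bit view, e.g. AX

    @property
    def high(self):                      # high-order byte, e.g. AH
        return self.x >> 8

    @high.setter
    def high(self, value):
        self.x = ((value & 0xFF) << 8) | (self.x & 0x00FF)

    @property
    def low(self):                       # low-order byte, e.g. AL
        return self.x & 0xFF

    @low.setter
    def low(self, value):
        self.x = (self.x & 0xFF00) | (value & 0xFF)


ax = Register16(0x1234)
print(hex(ax.high), hex(ax.low))   # the two halves of AX
ax.low = 0xFF                      # writing AL leaves AH untouched
print(hex(ax.x))
```

Writing one half never disturbs the other, which is why AH and AL can be used as two independent 8-bit registers.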

The advantages of RISC over CISC today are:

   •     RISC processors are much simpler to build, which in turn results in the following
         advantages:
            o easier to build, i.e. you can use already existing production facilities
            o much less expensive, just compare the price of an XScale with that of a
                Pentium III at 1 GHz...
            o less power consumption, which in turn gives two advantages:
                       - much longer use of battery-driven devices
                       - no need for cooling of the device, which in turn gives two advantages:
                              - smaller design of the whole device
                              - no noise
   •     RISC processors are much simpler to program, which helps not only the assembler
         programmer but the compiler designer, too. You'll hardly find any compiler which uses
         all the functions of a Pentium III optimally.


SUPER SCALAR

A superscalar CPU architecture implements a form of parallelism called
instruction-level parallelism within a single processor. It therefore allows faster CPU throughput than
would otherwise be possible at a given clock rate. A superscalar processor executes more
than one instruction during a clock cycle by simultaneously dispatching multiple
instructions to redundant functional units on the processor. Each functional unit is not a
separate CPU core but an execution resource within a single CPU such as an arithmetic
logic unit, a bit shifter, or a multiplier.

While a superscalar CPU is typically also pipelined, pipelining and superscalar
architecture are considered different performance enhancement techniques.

The superscalar technique is traditionally associated with several identifying
characteristics (within a given CPU core):

   •     Instructions are issued from a sequential instruction stream
   •     CPU hardware dynamically checks for data dependencies between instructions at
         run time (versus software checking at compile time)
   •     The CPU accepts multiple instructions per clock cycle

The simplest processors are scalar processors. Each instruction executed by a scalar
processor typically manipulates one or two data items at a time. By contrast, each
instruction executed by a vector processor operates simultaneously on many data items.
An analogy is the difference between scalar and vector arithmetic. A superscalar
processor is sort of a mixture of the two. Each instruction processes one data item, but
there are multiple redundant functional units within each CPU thus multiple instructions
can be processing separate data items concurrently.

Superscalar CPU design emphasizes improving the instruction dispatcher accuracy, and
allowing it to keep the multiple functional units in use at all times. This has become
increasingly important as the number of units has increased. While early superscalar
CPUs would have two ALUs and a single FPU, a modern design such as the PowerPC
970 includes four ALUs, two FPUs, and two SIMD units. If the dispatcher is ineffective
at keeping all of these units fed with instructions, the performance of the system will
suffer.




A superscalar processor usually sustains an execution rate in excess of one instruction per
machine cycle. But merely processing multiple instructions concurrently does not make
an architecture superscalar, since pipelined, multiprocessor or multi-core architectures
also achieve that, but with different methods.

In a superscalar CPU the dispatcher reads instructions from memory and decides which
ones can be run in parallel, dispatching them to redundant functional units contained
inside a single CPU. Therefore a superscalar processor can be envisioned as having multiple
parallel pipelines, each of which is processing instructions simultaneously from a single
instruction thread.

Existing binary executable programs have varying degrees of intrinsic parallelism. In
some cases instructions are not dependent on each other and can be executed
simultaneously. In other cases they are inter-dependent: one instruction impacts either
resources or results of the other. The instructions a = b + c; d = e + f can be run in
parallel because none of the results depend on other calculations. However, the
instructions a = b + c; b = e + f might not be runnable in parallel, depending on the
order in which the instructions complete while they move through the units.
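
The dependency check described above can be sketched as a comparison of the registers (or variables) each instruction reads and writes. The names `Instr` and `hazards`, and the labels RAW, WAR and WAW (read-after-write, write-after-read, write-after-write), are standard textbook terminology introduced here for illustration; they do not appear in the text above.

```python
from typing import FrozenSet, NamedTuple, Set


class Instr(NamedTuple):
    dest: str                 # the one variable the instruction writes
    sources: FrozenSet[str]   # the variables the instruction reads


def hazards(first: Instr, second: Instr) -> Set[str]:
    """Return the dependencies that stop 'second' running beside 'first'."""
    found = set()
    if first.dest in second.sources:
        found.add("RAW")      # second reads what first writes
    if second.dest in first.sources:
        found.add("WAR")      # second overwrites what first still has to read
    if first.dest == second.dest:
        found.add("WAW")      # both write the same destination
    return found


a_bc = Instr("a", frozenset({"b", "c"}))   # a = b + c
d_ef = Instr("d", frozenset({"e", "f"}))   # d = e + f
b_ef = Instr("b", frozenset({"e", "f"}))   # b = e + f

print(hazards(a_bc, d_ef))   # independent pair
print(hazards(a_bc, b_ef))   # second instruction overwrites b
```

An empty result means the pair can be issued together. The second pair is flagged because `b = e + f` overwrites an operand that `a = b + c` must read first. Since every instruction in a group has to be compared with every other, the number of comparisons grows roughly with the square of the issue width.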

When the number of simultaneously issued instructions increases, the cost of dependency
checking increases extremely rapidly. This is exacerbated by the need to check
dependencies at run time and at the CPU's clock rate. This cost includes additional logic
gates required to implement the checks.




PIPELINE AND INSTRUCTION FLOW
The integer instructions traverse a five stage pipeline in the Pentium processor.

The pipeline stages are as follows:
PF Prefetch
D1 Instruction Decode
D2 Address Generate
EX Execute - ALU and Cache Access
WB Writeback
The Pentium processor is a superscalar machine, built around two general purpose integer
pipelines and a pipelined floating-point unit capable of executing two instructions in
parallel. Both pipelines operate in parallel allowing integer instructions to execute in a
single clock in each pipeline. The figure “Pentium® Processor Pipeline Execution” depicts
instruction flow in the Pentium processor.
The pipelines in the Pentium processor are called the “u” and “v” pipes and the process
of issuing two instructions in parallel is termed “pairing.” The u-pipe can execute any
instruction in the Intel architecture, while the v-pipe can execute “simple” instructions as
defined in the “Instruction Pairing Rules” section of this chapter. When instructions are
paired, the instruction issued to the v-pipe is always the next sequential instruction after
the one issued to the u-pipe.




                        Pentium® Processor Pipeline Execution

The Pentium processor pipeline has been optimized to achieve higher throughput
compared to previous generations of Intel Architecture processors.
The first stage of the pipeline is the Prefetch (PF) stage in which instructions are
prefetched from the on-chip instruction cache or memory. Because the Pentium processor
has separate caches for instructions and data, prefetches do not conflict with data
references for access to the cache. If the requested line is not in the code cache, a memory
reference is made. In the PF stage of the Pentium processor, two independent pairs of
line-size (32-byte) prefetch buffers operate in conjunction with the Branch Target
Buffer. This allows one prefetch buffer to prefetch instructions sequentially, while the
other prefetches according to the branch target buffer predictions. The pipeline stage after
the PF stage in the Pentium processor is Decode 1 (D1) in which two parallel decoders
attempt to decode and issue the next two sequential instructions. The decoders determine
whether one or two instructions can be issued contingent upon the “Instruction Pairing
Rules.” The Pentium processor requires an extra D1 clock to decode instruction
prefixes. Prefixes are issued to the u-pipe at the rate of one per clock without pairing.
After all prefixes have been issued, the base instruction will then be issued and paired
according to the pairing rules.
The D1 stage is followed by Decode 2 (D2), in which the addresses of memory-resident
operands are calculated. In the Intel486 CPU, instructions containing both a displacement
and an immediate, or instructions containing a base and index addressing mode, require an
additional D2 clock to decode. The Pentium processor removes both of these restrictions and
is able to issue instructions in these categories in a single clock.
The Pentium processor uses the Execute (EX) stage of the pipeline for both ALU
operations and for data cache access; therefore those instructions specifying both an ALU
operation and a data cache access will require more than one clock in this stage. In EX all
u-pipe instructions and all v-pipe instructions except conditional branches are verified for
correct branch prediction. Microcode is designed to utilize both pipelines and thus those
instructions requiring microcode execute faster.
The final stage is Writeback (WB) where instructions are enabled to modify processor
state and complete execution. In this stage, v-pipe conditional branches are verified for
correct branch prediction. During their progression through the pipeline, instructions may
be stalled due to certain conditions. Both the u-pipe and v-pipe instructions enter and
leave the D1 and D2 stages in unison. When an instruction in one pipe is stalled, then
the instruction in the other pipe is also stalled at the same pipeline stage. Thus both the u-
pipe and the v-pipe instructions enter the EX stage in unison. Once in EX if the u-pipe
instruction is stalled, then the v-pipe instruction (if any) is also stalled. If the v-pipe
instruction is stalled then the instruction paired with it in the u-pipe is not allowed to
advance. No successive instructions are allowed to enter the EX stage of either pipeline
until the instructions in both pipelines have advanced to WB.

INSTRUCTION PREFETCH
In the Pentium processor PF stage, two independent pairs of line-size (32-byte) prefetch
buffers operate in conjunction with the branch target buffer. Only one prefetch buffer
actively requests prefetches at any given time. Prefetches are requested sequentially until
a branch instruction is fetched. When a branch instruction is fetched, the branch target
buffer (BTB) predicts whether the branch will be taken or not. If the branch is predicted
not taken, prefetch requests continue linearly. On a predicted taken branch the other
prefetch buffer is enabled and begins to prefetch as though the branch was taken. If a
branch is discovered mis-predicted, the instruction pipelines are flushed and prefetching
activity starts over.

Integer Instruction Pairing Rules
The Pentium processor can issue one or two instructions every clock. In order to issue
two instructions simultaneously they must satisfy the following conditions:
• Both instructions in the pair must be “simple” as defined below. Simple instructions are
entirely hardwired; they do not require any microcode control and, in general, execute in
one clock. The exceptions are the ALU mem, reg and ALU reg, mem instructions, which take
three and two clocks respectively.
• There must be no read-after-write or write-after-write register dependencies between
them.
• Neither instruction may contain both a displacement and an immediate.
• Instructions with prefixes can only occur in the u-pipe. Instruction prefixes are treated
as separate 1-byte instructions. Sequencing hardware is used to allow them to function as
simple instructions.

The following integer instructions are considered simple and may be paired:
1. mov reg, reg/mem/imm
2. mov mem, reg/imm
3. alu reg, reg/mem/imm
4. alu mem, reg/imm
5. inc reg/mem
6. dec reg/mem
7. push reg/mem
8. pop reg
9. lea reg,mem
10. jmp/call/jcc near
11. nop
12. test reg, reg/mem
13. test acc, imm

In addition, conditional and unconditional branches may be paired only if they occur as
the second instruction in the pair. They may not be paired with the next sequential
instruction. Also, SHIFT/ROT by 1 and SHIFT by imm may pair as the first instruction
in a pair. The register dependencies that prohibit instruction pairing include implicit
dependencies via registers or flags not explicitly encoded in the instruction. For example,
an ALU instruction in the u-pipe (which sets the flags) may not be paired with an ADC or
an SBB instruction in the v-pipe. There are two exceptions to this rule. The first is the
commonly occurring sequence of compare and branch which may be paired. The second
exception is pairs of pushes or pops. Although these instructions have an implicit
dependency on the stack pointer, special hardware is included to allow these common
operations to proceed in parallel. Although in general two paired instructions may
proceed in parallel independently, there is an exception for paired “read-modify-write”
instructions. Read-modify-write instructions are ALU operations with an operand in
memory. When two of these instructions are paired there is a sequencing delay of two
clocks in addition to the three clocks required to execute the individual instructions.
Although instructions may execute in parallel their behavior as seen by the programmer
is exactly the same as if they were executed sequentially.
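
The four basic pairing conditions can be expressed as a small check. This is a simplified
Python sketch under stated assumptions: the Instr record and can_pair function are
hypothetical names, registers are tracked as whole registers (al counts as eax), and the
flag dependencies together with the compare/branch and push/pop exceptions are not
modelled.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical, simplified instruction record for this sketch."""
    name: str
    simple: bool = True               # appears in the list of simple instructions
    writes: frozenset = frozenset()   # registers written
    reads: frozenset = frozenset()    # registers read
    has_prefix: bool = False
    disp_and_imm: bool = False        # has both a displacement and an immediate
    is_branch: bool = False


def can_pair(u, v):
    """Return True if u (first) and v (next sequential) may issue together."""
    if not (u.simple and v.simple):
        return False
    if u.disp_and_imm or v.disp_and_imm:
        return False
    if v.has_prefix:                  # prefixed instructions only go to the u-pipe
        return False
    if u.is_branch:                   # branches pair only as the second instruction
        return False
    if u.writes & v.reads:            # read-after-write dependency
        return False
    if u.writes & v.writes:           # write-after-write dependency
        return False
    return True


store = Instr("mov byte ptr flags[edx], al", reads=frozenset({"edx", "eax"}))
add = Instr("add edx, ecx", writes=frozenset({"edx"}), reads=frozenset({"edx", "ecx"}))
inc = Instr("inc edx", writes=frozenset({"edx"}), reads=frozenset({"edx"}))

print(can_pair(store, add))  # store writes no register, so the pair can issue
print(can_pair(add, inc))    # inc reads and writes edx, which add writes
```

Write-after-read is deliberately not checked, since the rules above list only
read-after-write and write-after-write dependencies.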




BRANCH PREDICTION
Branch Target Buffer (BTB)

The Pentium processor uses a Branch Target Buffer (BTB) to predict the outcome of
branch instructions which minimizes pipeline stalls due to prefetch delays.
The Pentium processor accesses the BTB with the address of the instruction in the D1
stage. It contains a Branch prediction state machine with four states: (1) strongly not
taken, (2) weakly not taken, (3) weakly taken, and (4) strongly taken. In the event of a
correct prediction, a branch will execute without pipeline stalls or flushes. Branches
which miss the BTB are assumed to be not taken. Conditional and unconditional near
branches and near calls execute in 1 clock and may be executed in parallel with other
integer instructions. A mispredicted branch (whether a BTB hit or miss) or a correctly
predicted branch with the wrong target address will cause the pipelines to be flushed and
the correct target to be fetched. Incorrectly predicted unconditional branches will incur an
additional three clock delay, incorrectly predicted conditional branches in the u-pipe will
incur an additional three clock delay, and incorrectly predicted conditional branches in
the v-pipe will incur an additional four clock delay.
[Figure: branch prediction state diagram. Legend: H = history, P = prediction, T = taken,
NT = not taken. States shown: H:11 (P: T), H:10 (P: T), H:01 (P: T), H:00 (P: NT).]
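
A four-state predictor of this kind is commonly modelled as a two-bit saturating counter.
The Python sketch below shows that textbook model, not the exact Pentium transition table;
the class and function names are assumptions, and the starting state of 0 stands in for
the rule that a branch missing the BTB is assumed not taken.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating counter.

    States: 0 = strongly not taken, 1 = weakly not taken,
            2 = weakly taken,       3 = strongly taken.
    """

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2  # predict taken in the two "taken" states

    def update(self, taken):
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


def count_mispredictions(outcomes, predictor):
    """Run the predictor over a list of actual outcomes and count the misses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop-closing branch: taken nine times, then falls through once.
loop_branch = [True] * 9 + [False]
misses = count_mispredictions(loop_branch, TwoBitPredictor())
print(misses)
```

In this model the counter needs two taken outcomes to warm up and misses once more at
loop exit, so only 3 of the 10 branches pay the flush penalty.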
The benefits of branch prediction are illustrated in the following example. Consider the
following loop from a benchmark program for computing prime numbers:
for(k=i+prime;k<=SIZE;k+=prime)
    flags[k]=FALSE;
A popular compiler generates the following assembly code:
(prime is allocated to ecx, k is allocated to edx, and al contains the value FALSE)
inner_loop:
    mov byte ptr flags[edx],al
    add edx,ecx
    cmp edx, SIZE
    jle inner_loop
Each iteration of this loop will execute in 6 clocks on the Intel486 CPU. On the Pentium
processor, the mov is paired with the add; the cmp with the jle. With branch
prediction, each loop iteration executes in 2 clocks.




CACHE

ON-CHIP CACHES
The Pentium processor implements two internal caches for a total integrated cache size of
16 Kbytes: an 8 Kbyte data cache and a separate 8 Kbyte code cache. These caches are
transparent to application software to maintain compatibility with previous generations of
Intel Architecture processors. The data cache fully supports the MESI
(modified/exclusive/shared/invalid) writeback cache consistency protocol. The code cache is
inherently write protected to prevent code from being inadvertently corrupted, and as a
consequence supports a subset of the MESI protocol, the S (shared) and I (invalid) states.
The caches have been designed for maximum flexibility and performance. The data cache is
configurable as writeback or writethrough on a line-by-line basis. Memory areas can be
defined as non-cacheable by software and external hardware. Cache writeback and
invalidations can be initiated by hardware or software. Protocols for cache consistency and
line replacement are implemented in hardware, easing system design.
On the Pentium processor, each of the caches is 8 Kbytes in size and is organized as a
2-way set associative cache. There are 128 sets in each cache, each set containing 2 lines
(each line has its own tag address). Each cache line is 32 bytes wide. Replacement in both
the data and instruction caches is handled by the LRU mechanism, which requires one bit per
set in each of the caches.
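
The geometry quoted above (8 Kbytes, 2-way, 32-byte lines) fixes how a physical address
is divided between tag, set index and byte offset. The Python sketch below works this out;
the constant and function names are assumptions, and a 32-bit physical address is assumed.

```python
CACHE_BYTES = 8 * 1024   # size of each on-chip cache
WAYS = 2                 # 2-way set associative
LINE_BYTES = 32          # bytes per cache line

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # number of sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # log2 of the line size
INDEX_BITS = SETS.bit_length() - 1          # log2 of the number of sets


def split_address(addr):
    """Split a physical address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


tag, index, offset = split_address(0x12345678)
print(SETS, OFFSET_BITS, INDEX_BITS)
print(hex(tag), index, offset)
```

With 5 offset bits and 7 index bits, the remaining 20 bits of a 32-bit address form the
tag that is stored with each line.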

Cache Structure
The instruction and data caches can be accessed simultaneously. The instruction cache
can provide up to 32 bytes of raw opcodes and the data cache can provide data for two
data references all in the same clock. This capability is implemented partially through the
tag structure. The tags in the data cache are triple ported. One of the ports is dedicated
to snooping while the other two are used to look up two independent addresses
corresponding to data references from each of the pipelines. The instruction cache
tags of the Pentium processor are also triple ported. Again, one port is dedicated to
support snooping and the other two ports facilitate split line accesses (simultaneously
accessing the upper half of one line and the lower half of the next line). Each of the
caches is parity protected. The operating modes of the caches are controlled by the CD
(cache disable) and NW (not writethrough) bits in CR0.

TLB (Translation Lookaside Buffers)
Each of the caches is accessed with physical addresses and each cache has its own TLB
(translation lookaside buffer) to translate linear addresses to physical addresses. The
TLBs associated with the instruction cache are single ported whereas the data cache
TLBs are fully dual ported to be able to translate two independent linear addresses for
two data references simultaneously.




The goal of an effective memory system is that the effective access time that the
processor sees is very close to the access time of the cache. Most accesses that the
processor makes to the cache are contained within this level. The achievement of this
goal depends on many factors: the architecture of the processor, the behavioral properties
of the programs being executed, and the size and organization of the cache. Caches work
on the basis of the locality of program behavior. There are three principles involved:

   1. Spatial Locality - Given an access to a particular location in memory, there is a
      high probability that other accesses will be made to either that or neighboring
      locations within the lifetime of the program.
   2. Temporal Locality - This is complementary to spatial locality. Given a sequence
      of references to n locations, there is a high probability that references following
      this sequence will be made into the sequence. Elements of the sequence will again
      be referenced during the lifetime of the program.
   3. Sequentiality- Given that a reference has been made to a particular location s it is
      likely that within the next several references a reference to the location of s + 1
      will be made. Sequentiality is a restricted type of spatial locality and can be
      regarded as a subset of it.

                            Some common terms
Processor references that are found in the cache are called cache hits. References not
found in the cache are called cache misses. On a cache miss, the cache control
mechanism must fetch the missing data from memory and place it in the cache. Usually
the cache fetches a block of neighboring locations, called the line, from memory. The
physical word is the basic unit of access in the memory.
The processor-cache interface can be characterized by a number of parameters. Those
that directly affect processor performance include:

   1. Access time for a reference found in the cache (a hit) - property of the cache size
      and organization.
   2. Access time for a reference not found in the cache (a miss) - property of the
      memory organization.
   3. Time to initially compute a real address given a virtual address (not-in-TLB-time)
      - property of the address translation facility, which, though strictly speaking, is
      not part of the cache, resembles the cache in most aspects and is discussed in this
      chapter.
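
The first two parameters combine into the usual weighted average for the access time the
processor sees. The Python sketch below uses illustrative numbers only (a 95% hit rate,
1 clock on a hit, 20 clocks on a miss); these values and the function name are
assumptions, not figures from the text, and address translation time is left out.

```python
def effective_access_time(hit_rate, hit_time, miss_time):
    """Average access time, in the same unit as hit_time and miss_time.

    miss_time is the total time for an access that misses the cache.
    """
    return hit_rate * hit_time + (1.0 - hit_rate) * miss_time


example = effective_access_time(0.95, 1, 20)
print(round(example, 2))
```

The closer the hit rate is to 1, the closer the result is to the cache access time,
which is the goal stated at the start of this section.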




Data Cache Consistency Protocol (MESI Protocol)
The Pentium processor Cache Consistency Protocol is a set of rules by which states are
assigned to cached entries (lines). The rules apply for memory read/write cycles only. I/O
and special cycles are not run through the data cache. Every line in the Pentium processor
data cache is assigned a state dependent on both Pentium processor generated activities
and activities generated by other bus masters (snooping). The Pentium processor Data
Cache Protocol consists of four states that define whether a line is valid (HIT/MISS), if it
is available in other caches, and if it has been MODIFIED. The four states are the M
(Modified), E (Exclusive), S (Shared) and the I (Invalid) states and the protocol is
referred to as the MESI protocol. A definition of the states is given below:

M - Modified: An M-state line is available in ONLY one cache and it is also MODIFIED
(different from main memory). An M-state line can be accessed (read/written
to) without sending a cycle out on the bus.
E - Exclusive: An E-state line is also available in ONLY one cache in the system, but the
line is not MODIFIED (i.e., it is the same as main memory). An E-state line can be
accessed (read/written to) without generating a bus cycle. A write to an E-state line will
cause the line to become MODIFIED.
S - Shared: This state indicates that the line is potentially shared with other caches (i.e.
the same line may exist in more than one cache). A read to an S-state line will not
generate bus activity, but a write to a SHARED line will generate a write through cycle
on the bus. The write through cycle may invalidate this line in other caches. A write to an
S-state line will update the cache.
I - Invalid: This state indicates that the line is not available in the cache. A read to this
line will be a MISS and may cause the Pentium processor to execute a LINE FILL (fetch
the whole line into the cache from main memory). A write to an INVALID line will
cause the Pentium processor to execute a write-through
cycle on the bus.
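
The state definitions above can be read as a small transition table for processor reads
and writes. The Python sketch below encodes only what those definitions state. Where the
text leaves the resulting state open (after a line fill, and after a write to an S-state
or I-state line), the value returned is an assumption and is marked as such. Snoop-driven
transitions are not modelled.

```python
def access(state, op):
    """Return (next_state, bus_cycle) for a processor access to one cache line.

    state is one of "M", "E", "S", "I"; op is "read" or "write".
    bus_cycle is None when the access completes without bus activity.
    """
    if op == "read":
        if state == "I":
            # Miss: may trigger a line fill. The state after the fill is not
            # given in the definitions above, so "S" is an assumption.
            return "S", "line fill"
        return state, None             # reads that hit M, E or S stay on-chip
    if op == "write":
        if state in ("M", "E"):
            return "M", None           # E becomes M, M stays M, no bus cycle
        if state == "S":
            # The cache is updated and the write goes out on the bus.
            # Keeping the line in S afterwards is an assumption.
            return "S", "write-through"
        # Write to an invalid line: written through to the bus.
        # Leaving the line invalid is an assumption.
        return "I", "write-through"
    raise ValueError(f"unknown operation: {op}")


print(access("E", "write"))
print(access("S", "write"))
print(access("I", "read"))
```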


Inquire Cycles (Snooping)
The purpose of inquire cycles is to check whether the address being presented is
contained within the caches in the Pentium processor.

Cache Organization
Within the cache, there are three basic types of organization:

   1. Direct Mapped
   2. Fully Associative
   3. Set Associative

In fully associative mapping, when a request is made to the cache, the requested address
is compared in a directory against all entries in the directory. If the requested address is
found (a directory hit), the corresponding location in the cache is fetched and returned to
the processor; otherwise, a miss occurs.




Fully Associative Cache


In a direct mapped cache, lower order line address bits are used to access the directory.
Since multiple line addresses map into the same location in the cache directory, the upper
line address bits (tag bits) must be compared with the directory address to ensure a hit. If
a comparison is not valid, the result is a cache miss, or simply a miss. The address given
to the cache by the processor actually is subdivided into several pieces, each of which has
a different role in accessing data.




Direct Mapped Cache


The set associative cache operates in a fashion somewhat similar to the direct-mapped
cache. Bits from the line address are used to address a cache directory. However, now
there are multiple choices: two, four, or more complete line addresses may be present in
the directory. Each of these line addresses corresponds to a location in a sub-cache. The
collection of these sub-caches forms the total cache array. In a set associative cache, as in
the direct-mapped cache, all of these sub-arrays can be accessed simultaneously, together
with the cache directory. If any of the entries in the cache directory matches the reference
address, there is a hit, and the corresponding sub-cache array is selected and gated out to
the processor.
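
All three organizations can be modelled by one directory structure: one way per set gives
a direct-mapped cache, and a single set gives a fully associative cache. The Python sketch
below keeps only the tags and uses LRU replacement; the class name and the sizes in the
example are assumptions chosen for illustration.

```python
class SetAssociativeCache:
    """Minimal cache directory: tags only, LRU replacement within each set."""

    def __init__(self, num_sets, ways, line_bytes):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # Each set holds up to `ways` tags, ordered least to most recently used.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = addr // self.line_bytes
        index = line % self.num_sets      # low-order line address bits
        tag = line // self.num_sets       # upper line address bits
        entries = self.sets[index]
        if tag in entries:
            entries.remove(tag)
            entries.append(tag)           # mark as most recently used
            return True
        if len(entries) == self.ways:
            entries.pop(0)                # evict the least recently used tag
        entries.append(tag)
        return False


cache = SetAssociativeCache(num_sets=16, ways=2, line_bytes=16)
addresses = [0x0000, 0x0004, 0x0100, 0x0000, 0x0200, 0x0100]
results = [cache.access(a) for a in addresses]
print(results)
```

Addresses 0x0000, 0x0100 and 0x0200 all map to set 0. Two of them fit in the two ways at
once, and the third forces the least recently used line out.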




                                  Set Associative Cache




Cache Calculation
Address fields:   Tag | Line / Set | Byte / Block

Cache:        512 bytes, 16 bytes per line, 2 ways (sets), 2^4 lines per way
Main memory:  16 KB, 16 bytes per line, 2^10 lines

Line size          = 16 bytes                    = 2^4     Byte / Block field = 4 bits
Main memory size   = 16 KB                       = 2^14    Address lines      = 14
Cache size         = 512 bytes                   = 2^9
Ways (sets)        = 2
Bytes per way      = 512 / 2                     = 2^8
Lines per way      = 2^8 / 2^4                   = 2^4     Line / Set field   = 4 bits
Tag size           = (lines in main memory) / (lines per way)
                   = 2^10 / 2^4                  = 2^6     Tag field          = 6 bits

2^14 (Total) = 2^6 (Tag) * 2^4 (Line / Set) * 2^4 (Byte / Block)
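
The field widths in this calculation can be checked with a few lines of Python. This is a
sketch under the assumption that all sizes are powers of two; the function name
address_fields is hypothetical.

```python
def address_fields(memory_bytes, cache_bytes, line_bytes, ways):
    """Return (tag_bits, set_bits, offset_bits) for a set-associative cache.

    All sizes are assumed to be powers of two.
    """
    address_bits = memory_bytes.bit_length() - 1   # log2 of main memory size
    offset_bits = line_bytes.bit_length() - 1      # byte within a line
    lines_per_way = cache_bytes // ways // line_bytes
    set_bits = lines_per_way.bit_length() - 1      # line within a way
    tag_bits = address_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits


# 16 KB main memory, 512-byte cache, 16-byte lines, 2 ways
fields = address_fields(16 * 1024, 512, 16, 2)
print(fields)
```

The three widths add up to the 14 address bits needed for a 16 KB main memory.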



THE X87 FPU
FLOATING-POINT UNIT
The floating-point unit (FPU) of the Pentium processor is integrated with the integer unit
on the first five stages of the u-pipe. The fifth stage, WB, becomes X1. The FPU is heavily
pipelined and is designed to be able to accept one floating-point operation every
clock. It can receive up to two floating-point instructions every clock, one of which must
be an exchange instruction.

Floating-Point Pipeline Stages
The Pentium processor FPU has 8 pipeline stages, the first five of which it shares with
the integer unit. Integer instructions pass through only the first 5 stages. Integer
instructions use the fifth (X1) stage as a WB (write-back) stage. The 8 FP pipeline stages,
and the activities that are performed in them are summarized below:

PF Prefetch;
D1 Instruction Decode;
D2 Address generation;
EX Memory and register read; conversion of FP data to external memory format and
memory write;
X1 Floating-Point Execute stage one; conversion of external memory format to internal
FP data format and write operand to FP register file; bypass 1 (bypass 1 described in the
“Bypasses” section).
X2 Floating-Point Execute stage two;
WF Perform rounding and write floating-point result to register file; bypass 2 (bypass 2
described in the “Bypasses” section).
ER Error Reporting/Update Status Word.

FPU Bypasses

The Pentium processor stack architecture instruction set requires that all instructions have
one source operand on the top of the stack. Since most instructions also have their
destination as the top of the stack, most instructions see a “top of stack bottleneck.” New
source operands must be brought to the top of the stack before we can issue an arithmetic
instruction on them. This calls for extra usage of the exchange instruction, which allows
the programmer to bring an available operand to the top of the stack.

The following section describes the floating-point register file bypasses that exist on the
Pentium processor. The register file has two write ports and two read ports. The read
ports are used to read data out of the register file in the E stage. One write port is used to
write data into the register file in the X1 stage, and the other in the WF stage. A bypass
allows data that is about to be written into the register file to be available as an operand
that is to be read from the register file by any succeeding floating-point instruction. A
bypass is specified by a pair of ports (a write port and a read port) that get circumvented.
Using the bypass, data is made available even before actually writing it to the register
file.


The following procedures are implemented:
1. Bypass the X1 stage register file write port and the E stage register file read port.
2. Bypass the WF stage register file write port and the E stage register file read port.
With bypass 1, the result of a floating-point load (that writes to the register file in the X1
stage) can bypass the X1 stage write and be sent directly to the operand fetch stage or E
stage of the next instruction. With bypass 2, the result of any arithmetic operation can
bypass the WF stage write to the register file, and be sent directly to the desired execution
unit as an operand for the next instruction.

PROGRAMMING WITH THE x87 FPU

The x87 Floating-Point Unit (FPU) provides high-performance floating-point processing
capabilities for use in graphics processing, scientific, engineering, and business
applications. It supports the floating-point, integer, and packed BCD integer data types
and the floating-point processing algorithms and exception handling architecture defined
in the IEEE Standard 754 for Binary Floating-Point Arithmetic.

X87 FPU EXECUTION ENVIRONMENT

The x87 FPU represents a separate execution environment within the IA-32 architecture. This
execution environment consists of eight data registers (called the x87 FPU data registers)
and the following special-purpose registers:
• Status register
• Control register
• Tag word register
• Last instruction pointer register
• Last data (operand) pointer register
• Opcode register
 These registers are described in the following sections.

x87 FPU Data Registers

The x87 FPU data registers consist of eight 80-bit registers. Values are stored in these
registers in the double extended-precision floating-point format. When floating-point,
integer, or packed BCD integer values are loaded from memory into any of the x87 FPU
data registers, the values are automatically converted into double extended precision
floating-point format (if they are not already in that format). When computation results
are subsequently transferred back into memory from any of the x87 FPU registers, the
results can be left in the double extended-precision floating-point format or converted
back into a shorter floating-point format, an integer format, or the packed BCD integer
format.




x87 FPU Execution Environment


The x87 FPU instructions treat the eight x87 FPU data registers as a register stack. All
addressing of the data registers is relative to the register on the top of the stack. The
register number of the current top-of-stack register is stored in the TOP (stack TOP) field
in the x87 FPU status word. Load operations decrement TOP by one and load a value into
the new top-of-stack register, and store operations store the value from the current TOP
register in memory and then increment TOP by one. (For the x87 FPU, a load operation is
equivalent to a push and a store operation is equivalent to a pop.) Note that load and store
operations are also available that do not push and pop the stack.




                            x87 FPU Data Register Stack


If a load operation is performed when TOP is at 0, register wraparound occurs and the
new value of TOP is set to 7. The floating-point stack-overflow exception indicates when
wraparound might cause an unsaved value to be overwritten.
Many floating-point instructions have several addressing modes that permit the
programmer to implicitly operate on the top of the stack, or to explicitly operate on
specific registers relative to the TOP. Assemblers support these register addressing
modes, using the expression ST(0), or simply ST, to represent the current stack top and
ST(i) to specify the ith register from TOP in the stack (0 ≤ i ≤ 7). For example, if TOP
contains 011B (register 3 is the top of the stack), the following instruction would add the
contents of two registers in the stack (registers 3 and 5):

FADD ST, ST(2);
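
The mapping from ST(i) to a physical register is modulo-8 arithmetic on TOP. The Python
sketch below illustrates it; the function names are assumptions made for this example.

```python
def st_to_physical(top, i):
    """Map ST(i) to a physical x87 data register number (0 to 7)."""
    return (top + i) % 8


def top_after_load(top):
    """TOP after a load (push): decrement, wrapping from 0 to 7."""
    return (top - 1) % 8


def top_after_store_pop(top):
    """TOP after a store that pops: increment, wrapping from 7 to 0."""
    return (top + 1) % 8


# With TOP = 011B, FADD ST, ST(2) operates on physical registers 3 and 5.
print(st_to_physical(3, 0), st_to_physical(3, 2))
print(top_after_load(0))
```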

Figure shows an example of how the stack structure of the x87 FPU registers and
instructions are typically used to perform a series of computations. Here, a two-
dimensional dot product is computed, as follows:
1. The first instruction (FLD value1) decrements the stack register pointer (TOP) and
    loads the value 5.6 from memory into ST(0). The result of this operation is shown in
    snapshot (a).
2. The second instruction multiplies the value in ST(0) by the value 2.4 from memory and
    stores the result in ST(0), shown in snap-shot (b).
3. The third instruction decrements TOP and loads the value 3.8 in ST(0).
4. The fourth instruction multiplies the value in ST(0) by the value 10.3 from memory
   and stores the result in ST(0), shown in snap-shot (c).
5. The fifth instruction adds the value in ST(0) to the value in ST(1) and stores the result in
   ST(0), shown in snap-shot (d).
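
The five steps can be followed with a toy model of the register stack. This Python sketch
is illustrative only: the class X87Stack and its method names are hypothetical, values are
ordinary Python floats rather than 80-bit extended precision, and the starting value of
TOP is arbitrary.

```python
class X87Stack:
    """Toy model of the x87 register stack; values are Python floats."""

    def __init__(self):
        self.regs = [None] * 8   # physical data registers 0 to 7
        self.top = 0             # TOP field (starting value is arbitrary here)

    def _phys(self, i):
        return (self.top + i) % 8

    def fld(self, value):
        """Load (push): decrement TOP, then write the new ST(0)."""
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def fmul_mem(self, value):
        """ST(0) = ST(0) * value from memory."""
        self.regs[self.top] *= value

    def fadd_st(self, i):
        """ST(0) = ST(0) + ST(i)."""
        self.regs[self.top] += self.regs[self._phys(i)]

    def st(self, i=0):
        return self.regs[self._phys(i)]


fpu = X87Stack()
fpu.fld(5.6)         # step 1, snapshot (a)
fpu.fmul_mem(2.4)    # step 2, snapshot (b)
fpu.fld(3.8)         # step 3
fpu.fmul_mem(10.3)   # step 4, snapshot (c)
fpu.fadd_st(1)       # step 5, snapshot (d)
dot = fpu.st(0)
print(round(dot, 2))
```

The first product stays in ST(1) while the second is formed in ST(0), so the final add
needs no extra exchange instruction.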




                   Example x87 FPU Dot Product Computation


MICROPROCESSOR INITIALIZATION AND
CONFIGURATION
Before normal operation of the Pentium processor can begin, the Pentium processor must
be initialized by driving the RESET pin active. The RESET pin forces the Pentium
processor to begin execution in a known state. Several features are optionally invoked at
the falling edge of RESET: Built-in-Self-Test (BIST), Functional Redundancy Checking
and Tristate Test Mode.
In addition to the standard RESET pin, the Pentium processor has implemented an
initialization pin (INIT) that allows the processor to begin execution in a known state
without disrupting the contents of the internal caches or the floating-point state.

POWER UP SPECIFICATIONS
During power up, RESET must be asserted while VCC is approaching nominal operating
voltage to prevent internal bus contention which could negatively affect the reliability of
the processor. It is recommended that CLK begin toggling within 150 ms after VCC
reaches its proper operating level. This recommendation is only to ensure long term
reliability of the device.
In order for RESET to be recognized, the CLK input needs to be toggling. RESET must
remain asserted for 1 millisecond after VCC and CLK have reached their AC/DC
specifications.


TEST AND CONFIGURATION FEATURES (BIST, FRC,
TRISTATE TEST MODE)

The INIT, FLUSH#, and FRCMC# inputs are sampled when RESET transitions from
high to low to determine if BIST will be run, or if tristate test mode or checker mode will
be entered (respectively). If RESET is driven synchronously, these signals must be at
their valid level and meet setup and hold times on the clock before the falling edge of
RESET. If RESET is asserted asynchronously, these signals must be at their valid level
two clocks before and after RESET transitions from high to low.

Built In Self-Test
Self-test is initiated by driving the INIT pin high when RESET transitions from high
to low. No bus cycles are run by the Pentium processor during self test. The duration of
self test is approximately 2^19 core clocks. Approximately 70% of the devices in the
Pentium processor are tested by BIST. The Pentium processor BIST consists of two parts:
hardware self-test and microcode self-test. During the hardware portion of BIST, the
microcode ROM and all large PLAs are tested. All possible input combinations of the
microcode ROM and PLAs are tested. The constant ROMs, BTB, TLBs, and all caches
are tested by the microcode portion of BIST. The array tests (caches, TLBs and BTB)
have two passes. On the first pass, data patterns are written to arrays, read back and
checked for mismatches. The second pass writes the complement of the initial data
pattern, reads it back, and checks for mismatches. The constant ROMs are tested by using
the microcode to add various constants and check the result against a stored value.


Upon successful completion of BIST, the cumulative result of all tests is stored in the
EAX register. If EAX contains 0h, then all checks passed; any non-zero result indicates a
faulty unit.
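
The two-pass array test described above can be modeled in a few lines. This is only an illustrative sketch, not the processor's microcode: the function name, the `faulty_cells` fault-injection dictionary and the 8-bit cell width are assumptions made for the example. It returns 0 when every check passes, mirroring the EAX convention.

```python
def two_pass_array_test(array_size, pattern, width_bits=8, faulty_cells=None):
    """Model of the BIST array test: write a pattern, read back, then repeat
    with the complemented pattern. Returns 0 if all checks pass."""
    mask = (1 << width_bits) - 1
    faulty_cells = faulty_cells or {}  # hypothetical: cell index -> stuck value
    errors = 0
    for data in (pattern & mask, ~pattern & mask):  # pass 1, then complement
        cells = [data] * array_size                 # write the pattern
        for index, stuck in faulty_cells.items():   # inject the modeled faults
            cells[index] = stuck & mask
        # read back and count mismatches
        errors += sum(1 for value in cells if value != data)
    return errors
```

A cell stuck at the value of the first pattern goes unnoticed in pass one and is caught only by the complemented pattern, which is why the second pass is needed.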

Tristate Test Mode
If the FLUSH# pin is sampled low when RESET transitions from high to low, the
Pentium processor enters tristate test mode. The Pentium processor floats all of its output
pins and bidirectional pins including pins which are never floated during normal
operation (except TDO). Tristate test mode can be initiated in order to facilitate testing by
external circuitry to test board interconnects. The Pentium processor remains in tristate
test mode until the RESET pin is asserted again.

Functional Redundancy Checking
The functional redundancy checking master/checker configuration input is sampled when
RESET is high to determine whether the Pentium processor is configured in master mode
(FRCMC# high) or checker mode (FRCMC# low). The final master/checker
configuration of the Pentium processor is determined the clock before the falling edge of
RESET. When configured as a master, the Pentium processor drives its output pins as
required by the bus protocol. When configured as a checker, the Pentium processor
tristates all outputs (except IERR#, PICD0, PICD1 and TDO) and samples the output
pins (that would normally be driven in master mode). If the sampled value differs from
the value computed internally, the Pentium processor asserts IERR# to indicate an error.


INITIALIZATION WITH RESET, INIT AND BIST
Two pins, RESET and INIT, are used to reset the Pentium processor in different manners. A
“cold” or “power on” RESET refers to the assertion of RESET while power is initially being
applied to the Pentium processor. A “warm” RESET refers to the assertion of RESET or INIT
while VCC and CLK remain within specified operating limits.
Table 3-1 shows the effect of asserting RESET and/or INIT.




Toggling either the RESET pin or the INIT pin individually forces the Pentium processor
to begin execution at address FFFFFFF0h. The internal instruction cache and data cache
are invalidated when RESET is asserted (modified lines in the data cache are NOT
written back). The instruction cache and data cache are not altered when the INIT pin is
asserted without RESET. In both cases, the branch target buffer (BTB) and translation
lookaside buffers (TLBs) are invalidated.

After RESET (with or without BIST) or INIT, the first instruction is fetched from
FFFFFFF0h. When the first intersegment JMP or CALL instruction is executed, address
lines A20-A31 are driven low for CS-relative memory cycles and the Pentium processor
executes instructions only in the lower one Mbyte of physical memory. This allows the
system designer to use a ROM at the top of physical memory to initialize the system.

RESET is internally hardwired and forces the Pentium processor to terminate all execution
and bus cycle activity within 2 clocks. No instruction or bus activity will occur as long as
RESET is active. INIT is implemented as an edge-triggered interrupt and is recognized
when an instruction boundary is reached. As soon as the Pentium processor completes the
INIT sequence, instruction execution and bus cycle activity continue at address
FFFFFFF0h even if the INIT pin is not deasserted.

At the conclusion of RESET (with or without self-test) or INIT, the DX register contains a
component identifier. The upper byte contains 05h and the lower byte contains a stepping
identifier.
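
As a small illustration of the DX layout after reset, the sketch below splits a 16-bit DX value into its two bytes. The function name and the sample stepping value 21h are made up for the example; only the 05h upper byte comes from the text.

```python
def decode_reset_dx(dx):
    """Split the DX value left after RESET/INIT into its two identifier bytes."""
    dx &= 0xFFFF
    return {
        "component": dx >> 8,    # upper byte: 05h on the Pentium processor
        "stepping": dx & 0xFF,   # lower byte: stepping identifier
    }

# Hypothetical example value: component 05h, stepping 21h
info = decode_reset_dx(0x0521)
```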




BUS CYCLES
The Pentium processor bus is designed to support a 528-Mbyte/sec data transfer rate at 66
MHz. All data transfers occur as a result of one or more bus cycles.
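
The 528-Mbyte/sec figure follows from the 64-bit (8-byte) data bus and the 66 MHz bus clock. The sketch below reproduces the arithmetic, assuming one 8-byte transfer per bus clock (the best case) and 1 Mbyte = 10^6 bytes; the names are illustrative.

```python
BUS_WIDTH_BYTES = 8          # 64-bit data bus
BUS_CLOCK_HZ = 66_000_000    # 66 MHz bus clock

def peak_bandwidth_mbytes_per_s(width_bytes=BUS_WIDTH_BYTES, clock_hz=BUS_CLOCK_HZ):
    """Peak transfer rate assuming one full-width transfer on every clock."""
    return width_bytes * clock_hz / 1_000_000

rate = peak_bandwidth_mbytes_per_s()   # 528.0
```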

PHYSICAL MEMORY AND I/O INTERFACE
Pentium processor memory is accessible in 8-, 16-, 32-, and 64-bit quantities. Pentium
processor I/O is accessible in 8-, 16-, and 32-bit quantities. The Pentium processor can
directly address up to 4 Gbytes of physical memory, and up to 64 Kbytes of I/O.
In hardware, memory space is organized as a sequence of 64-bit quantities. Each 64-bit
location has eight individually addressable bytes at consecutive memory addresses.




                                Memory Organization

The I/O space is organized as a sequence of 32-bit quantities. Each 32-bit quantity has
four individually addressable bytes at consecutive I/O addresses. The figure gives a
conceptual diagram of the I/O space.




                                I/O Space Organization


Sixty-four-bit memories are organized as arrays of physical quadwords (8-byte words).
Physical quadwords begin at addresses evenly divisible by 8. The quadwords are
addressable by physical address lines A31-A3.
Thirty-two-bit memories are organized as arrays of physical dwords (4-byte words).
Physical dwords begin at addresses evenly divisible by 4. The dwords are addressable by
physical address lines A31-A3 and A2. A2 can be decoded from the byte enables.
Sixteen-bit memories are organized as arrays of physical words (2-byte words). Physical
words begin at addresses evenly divisible by 2.

DATA TRANSFER MECHANISM
All data transfers occur as a result of one or more bus cycles. Logical data operands of
byte, word, dword, and quadword lengths may be transferred. Data may be accessed at
any byte boundary, but two cycles may be required for misaligned data transfers. The
Pentium processor considers a 2-byte or 4-byte operand that crosses a 4-byte boundary to
be misaligned. In addition, an 8-byte operand that crosses an 8-byte boundary is
misaligned. The Pentium processor address signals are split into two components.

High-order address bits are provided by the address lines A31-A3. The byte enables
BE7#-BE0# form the low-order address and select the appropriate bytes of the 8-byte
data bus.

For both memory and I/O accesses, the byte enable outputs indicate which of the
associated data bus bytes are driven valid for write cycles and on which bytes data is
expected back for read cycles. Non-contiguous byte enable patterns will never occur.
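
The split between A31-A3 and the byte enables can be sketched as follows. This is a hedged model of a single bus cycle only: the function names are illustrative, and the byte enables are represented as an active-high mask (bit i set means BEi# is asserted), whereas the real pins are active low. It does not decide when an operand must be split into two cycles.

```python
def byte_enables(address, size):
    """Return (quadword address on A31-A3, byte-enable mask) for one bus cycle."""
    offset = address & 0x7                      # byte position inside the quadword
    if offset + size > 8:
        raise ValueError("operand crosses an 8-byte boundary; a second cycle is needed")
    mask = ((1 << size) - 1) << offset          # contiguous run of enabled bytes
    return address & ~0x7 & 0xFFFFFFFF, mask

def low_address_bits(be_mask):
    """Recover A2-A0 from the byte enables: position of the lowest enabled byte."""
    if be_mask == 0:
        raise ValueError("no byte enable asserted")
    return (be_mask & -be_mask).bit_length() - 1
```

For example, a word at address 1006h uses quadword address 1000h with BE7# and BE6# asserted, and external logic would derive A2-A0 = 110b from that pattern.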




                           Generating A2-A0 from BE7-0#



Interfacing With 8-, 16-, 32-, and 64-Bit Memories
In 64-bit physical memories, each 8-byte quadword begins at a byte address
that is a multiple of eight. A31-A3 are used as an 8-byte quadword select and
BE7#-BE0# select individual bytes within the quadword.



Pentium® Processor with 64-Bit Memory

The figure shows the Pentium processor data bus interface to 32-, 16- and 8-bit wide
memories. External byte swapping logic is needed on the data lines so that data is
supplied to and received from the Pentium processor on the correct data pins (see Table).
For memory widths smaller than 64 bits, byte assembly logic is needed to return all bytes
of data requested by the Pentium processor in one cycle.




                     Addressing 32-, 16- and 8-Bit Memories




Data Bus Interface to 32-, 16- and 8-Bit Memories

Operand alignment and size dictate when two cycles are required for a data transfer.
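
The misalignment rule stated earlier (a 2-byte or 4-byte operand crossing a 4-byte boundary, or an 8-byte operand crossing an 8-byte boundary) can be written as a short check. This is a minimal sketch with illustrative function names, not a model of the full table of cycle types.

```python
def is_misaligned(address, size):
    """True if the operand is misaligned by the Pentium processor's bus rule."""
    if size == 1:
        return False                 # a single byte never crosses a boundary
    if size in (2, 4):
        boundary = 4
    elif size == 8:
        boundary = 8
    else:
        raise ValueError("operand size must be 1, 2, 4 or 8 bytes")
    return (address % boundary) + size > boundary

def bus_cycles_needed(address, size):
    """Misaligned transfers may need two bus cycles; aligned ones need one."""
    return 2 if is_misaligned(address, size) else 1
```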




BUS STATE DEFINITION
This section describes the Pentium processor bus states in detail. See Figure for the bus
state diagram.
Ti: This is the bus idle state. In this state, no bus cycles are being run. The Pentium
processor may or may not be driving the address and status pins, depending on the state
of the HLDA, AHOLD, and BOFF# inputs. An asserted BOFF# or RESET will always
force the state machine back to this state. HLDA will only be driven in this state.
T1: This is the first clock of a bus cycle. Valid address and status are driven out and
ADS# is asserted. There is one outstanding bus cycle.
T2: This is the second and subsequent clock of the first outstanding bus cycle. In state T2,
data is driven out (if the cycle is a write), or data is expected (if the cycle is a read), and
the BRDY# pin is sampled. There is one outstanding bus cycle.
T12: This state indicates there are two outstanding bus cycles, and that the Pentium
processor is starting the second bus cycle at the same time that data is being transferred
for the first. In T12, the Pentium processor drives the address and status and asserts
ADS# for the second outstanding bus cycle, while data is transferred and BRDY# is
sampled for the first outstanding cycle.
T2P: This state indicates there are two outstanding bus cycles, and that both are in their
second and subsequent clocks. In T2P, data is being transferred and BRDY# is sampled
for the first outstanding cycle. The address, status and ADS# for the second outstanding
cycle were driven sometime in the past (in state T12).
TD: This state indicates there is one outstanding bus cycle, that its address, status and
ADS# have already been driven sometime in the past (in state T12), and that the data and
BRDY# pins are not being sampled because the data bus requires one dead clock to turn
around between consecutive reads and writes, or writes and reads. The Pentium processor
enters TD if in the previous clock there were two outstanding cycles, the last BRDY# was
returned, and a dead clock is needed. The timing diagrams in the next section give
examples when a dead clock is needed.
Table gives a brief summary of bus activity during each bus state. Figure shows the
Pentium processor bus state diagram.




                            Pentium® Processor Bus Activity




Pentium® Processor Bus Control State Machine




BUS CYCLES
The Pentium processor requests data transfer cycles, bus cycles, and bus operations.
A data transfer cycle is one data item, up to 8 bytes in width, being returned to the
Pentium processor or accepted from the Pentium processor with BRDY# asserted. A bus
cycle begins with the Pentium processor driving an address and status and asserting
ADS#, and ends when the last BRDY# is returned. A bus cycle may have 1 or 4 data
transfers. A burst cycle is a bus cycle with 4 data transfers. A bus operation is a sequence
of bus cycles to carry out a specific function, such as a locked read-modify-write or an
interrupt acknowledge.

Single-Transfer Cycle
The Pentium processor supports a number of different types of bus cycles. The simplest
type of bus cycle is a single-transfer non-cacheable 64-bit cycle, either with or without
wait states. Non-pipelined read and write cycles with 0 wait states are shown in Figure.




                               Non Pipelined Read or Write




The Pentium processor initiates a cycle by asserting the address status signal (ADS#) in
the first clock. The clock in which ADS# is asserted is by definition the first clock in the
bus cycle. The ADS# output indicates that a valid bus cycle definition and address is
available on the cycle definition pins and the address bus. The CACHE# output is
deasserted (high) to indicate that the cycle will be a single transfer cycle.

For a zero wait state transfer, BRDY# is returned by the external system in the second
clock of the bus cycle. BRDY# indicates that the external system has presented valid data
on the data pins in response to a read or the external system has accepted data in response
to a write. The Pentium processor samples the BRDY# input in the second and
subsequent clocks of a bus cycle.

If the system is not ready to drive or accept data, wait states can be added to these cycles
by not returning BRDY# to the processor at the end of the second clock. Cycles of this
type, with one and two wait states added, are shown in Figure. Note that BRDY# must be
driven inactive at the end of the second clock.
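
The clock count of a non-pipelined single-transfer cycle follows directly from this description: one clock for ADS#, one clock in which BRDY# is returned, plus one extra clock per wait state. The sketch below is an illustration under that reading; the function names are made up.

```python
def single_transfer_clocks(wait_states=0):
    """Clocks in a non-pipelined single-transfer cycle: T1, then T2 repeated
    until BRDY# is returned."""
    return 2 + wait_states

def brdy_samples(wait_states=0):
    """BRDY# as sampled from the second clock on (True = returned/asserted)."""
    return [False] * wait_states + [True]
```

With zero wait states the cycle takes two clocks; with two wait states BRDY# is sampled inactive twice and the cycle takes four clocks.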

Burst Cycles
For bus cycles that require more than a single data transfer (cacheable cycles and
writeback cycles), the Pentium processor uses the burst data transfer. In burst transfers, a
new data item can be sampled or driven by the Pentium processor in consecutive clocks.
In addition the addresses of the data items in burst cycles all fall within the same 32-byte
aligned area (corresponding to an internal Pentium processor cache line).

The implementation of burst cycles is via the BRDY# pin. While running a bus cycle of
more than one data transfer, the Pentium processor requires that the memory system
perform a burst transfer and follow the burst order (see Table). Given the first address in
the burst sequence, the address of subsequent transfers must be calculated by external
hardware. This requirement exists because the Pentium processor address and byte
enables are asserted for the first transfer and are not re-driven for each transfer. The burst
sequence is optimized for two-bank memory subsystems and is shown in Table.




                             Pentium Processor Burst Order
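
Because the burst order table did not survive in this copy, the sketch below shows one way external hardware could derive the four addresses from the first one. It assumes the usual Pentium burst order (0, 8, 10h, 18h when the first address ends in 0; 8, 0, 18h, 10h when it ends in 8, and so on), which is equivalent to XOR-ing the first offset with 0, 8, 10h and 18h and agrees with the writeback order given later in these notes. The function name is illustrative.

```python
def burst_order(first_address):
    """Addresses of the four 8-byte transfers of a burst, in the order driven.

    All four fall inside the same 32-byte aligned cache line."""
    line_base = first_address & ~0x1F        # start of the 32-byte line
    first_offset = first_address & 0x18      # which quadword is requested first
    return [line_base | (first_offset ^ step) for step in (0x00, 0x08, 0x10, 0x18)]
```

For a first address of 1008h this yields 1008h, 1000h, 1018h, 1010h.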




BURST READ CYCLES
When initiating any read, the Pentium processor will present the address and byte enables
for the data item requested. When the cycle is converted into a cache linefill, the first data
item returned should correspond to the address sent out by the Pentium processor;
however, the byte enables should be ignored, and valid data must be returned on all 64
data lines. In addition, the address of the subsequent transfers in the burst sequence must
be calculated by external hardware since the address and byte enables are not re-driven
for each transfer.

Figure shows a cacheable burst read cycle. Note that in this case the initial cycle
generated by the Pentium processor might have been satisfied by a single data transfer,
but was transformed into a multiple-transfer cache fill by KEN# being returned active on
the clock that the first BRDY# is returned. In this case KEN# has such an effect because
the cycle is internally cacheable in the Pentium processor (CACHE# pin is driven active).
KEN# is only sampled once during a cycle to determine cacheability.




                                  Basic Burst Read Cycle




BURST WRITE CYCLES
Figure shows the timing diagram of a basic burst write cycle. KEN# is ignored in burst
write cycles. If the CACHE# pin is active (low) during a write cycle, it indicates that the
cycle will be a burst writeback cycle. Burst write cycles are always writebacks of
modified lines in the data cache. Writeback cycles have several causes:

1. Writeback due to replacement of a modified line in the data cache.
2. Writeback due to an inquire cycle that hits a modified line in the data cache.
3. Writeback due to an internal snoop that hits a modified line in the data cache.
4. Writebacks caused by asserting the FLUSH# pin.
5. Writebacks caused by executing the WBINVD instruction.

The only write cycles that are burstable by the Pentium processor are writeback cycles.
All other write cycles will be 64 bits or less, single transfer bus cycles.




                                 Basic Burst Write Cycle


For writeback cycles, the lower five bits of the first burst address are always zero;
therefore, the burst order becomes 0, 8h, 10h, and 18h. Again, note that the address of the
subsequent transfers in the burst sequence must be calculated by external hardware since
the Pentium processor does not drive the address and byte enables for each transfer.




Locked Operations
The Pentium processor architecture provides a facility to perform atomic accesses of
memory. For example, a programmer can change the contents of a memory-based
variable and be assured that the variable was not accessed by another bus master between
the read of the variable and the update of that variable. This functionality is provided for
select instructions using a LOCK prefix, and also for instructions which implicitly
perform locked read modify write cycles such as the XCHG (exchange) instruction when
one of its operands is memory based. Locked cycles are also generated when a segment
descriptor or page table entry is updated and during interrupt acknowledge cycles.

In hardware, the LOCK functionality is implemented through the LOCK# pin, which
indicates to the outside world that the Pentium processor is performing a read-modify-
write sequence of cycles, and that the Pentium processor should be allowed atomic
access for the location that was accessed with the first locked cycle. Locked operations
begin with a read cycle and end with a write cycle. Note that the data width read is not
necessarily the data width written. For example, for descriptor access bit updates the
Pentium processor fetches eight bytes and writes one byte.

A locked operation is a combination of one or multiple read cycles followed by one or
multiple write cycles. Programmer generated locked cycles and locked page table /
directory accesses are treated differently and are described in the following sections.

Snooping (Inquire)
When operating in a multiprocessor (MP) system, IA-32 processors (beginning with the
Intel486 processor) have the ability to snoop other processors' accesses to system memory
and to their internal caches. They use this snooping ability to keep their internal caches
consistent both with system memory and with the caches in other processors on the bus.
For example, in the Pentium and P6 family processors, if through snooping one processor
detects that another processor intends to write to a memory location that it currently has
cached in shared state, the snooping processor will invalidate its cache line, forcing it to
perform a cache line fill the next time it accesses the same memory location.





REGISTER SET




          Alternate General Purpose Register Names




• I/O ports — The IA-32 architecture supports transfers of data to and from
input/output (I/O) ports.
• Control registers — The five control registers (CR0 through CR4) determine the
operating mode of the processor and the characteristics of the currently executing task.
• Memory management registers — The GDTR, IDTR, task register, and LDTR
specify the locations of data structures used in protected mode memory management.
• Debug registers — The debug registers (DR0 through DR7) control and allow
monitoring of the processor’s debugging operations.




BASIC PROGRAM EXECUTION REGISTERS
The processor provides 16 basic program execution registers for use in general system
and application programming (see Figure ). These registers can be grouped as follows:
• General-purpose registers. These eight registers are available for storing operands and
pointers.
• Segment registers. These registers hold up to six segment selectors.
• EFLAGS (program status and control) register. The EFLAGS register reports on the
status of the program being executed and allows limited (application-program level)
control of the processor.
• EIP (instruction pointer) register. The EIP register contains a 32-bit pointer to the
next instruction to be executed.
The general-purpose registers have the following special uses:
• EAX — Accumulator for operands and results data
• EBX — Pointer to data in the DS segment
• ECX — Counter for string and loop operations
• EDX — I/O pointer
• ESI — Pointer to data in the segment pointed to by the DS register; source pointer for
string operations
• EDI — Pointer to data (or destination) in the segment pointed to by the ES register;
destination pointer for string operations
• ESP — Stack pointer (in the SS segment)
• EBP — Pointer to data on the stack (in the SS segment)
As shown in Figure 3-5, the lower 16 bits of the general-purpose registers map directly to
the register set found in the 8086 and Intel 286 processors and can be referenced with the
names AX, BX, CX, DX, BP, SI, DI, and SP. Each of the lower two bytes of the EAX,
EBX, ECX, and EDX registers can be referenced by the names AH, BH, CH, and DH
(high bytes) and AL, BL, CL, and DL (low bytes).
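
The overlap between the 32-bit, 16-bit and 8-bit register names can be illustrated with plain bit masking. This is only a sketch of the mapping for EAX; the function name is illustrative, and the same layout applies to EBX, ECX and EDX.

```python
def sub_registers(eax):
    """Return the values seen through the AX, AH and AL names for a given EAX."""
    eax &= 0xFFFFFFFF
    return {
        "AX": eax & 0xFFFF,        # lower 16 bits
        "AH": (eax >> 8) & 0xFF,   # high byte of AX (bits 8-15)
        "AL": eax & 0xFF,          # low byte of AX (bits 0-7)
    }

parts = sub_registers(0x12345678)  # AX = 5678h, AH = 56h, AL = 78h
```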


DATA TYPES
This chapter introduces data types defined for the IA-32 architecture.

FUNDAMENTAL DATA TYPES
The fundamental data types of the IA-32 architecture are bytes, words, doublewords,
quadwords, and double quadwords (see Figure). A byte is eight bits, a word is 2 bytes
(16 bits), a doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits), and a double
quadword is 16 bytes (128 bits). A subset of the IA-32 architecture instructions operates
on these fundamental data types without any additional operand typing.




Figure shows the byte order of each of the fundamental data types when referenced as
operands in memory. The low byte (bits 0 through 7) of each data type occupies the
lowest address in memory and that address is also the address of the operand.




   Bytes, Words, Doublewords, Quadwords, and Double Quadwords in Memory
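
The byte order described above (low byte at the lowest address) is little-endian order. The sketch below uses Python's built-in integer conversion to show how a value would be laid out in memory; the function name is illustrative.

```python
def memory_image(value, size_bytes):
    """Bytes of an operand in increasing address order (low byte first)."""
    return list(value.to_bytes(size_bytes, "little"))

# Doubleword 12345678h stored at address N occupies N..N+3 as 78h 56h 34h 12h
image = memory_image(0x12345678, 4)
restored = int.from_bytes(bytes(image), "little")
```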


Alignment
Words, Doublewords, Quadwords, and Double Quadwords

Words, doublewords, and quadwords do not need to be aligned in memory on natural
boundaries. The natural boundaries for words, doublewords, and quadwords are even-numbered
addresses, addresses evenly divisible by four, and addresses evenly divisible
by eight, respectively. However, to improve the performance of programs, data structures
(especially stacks) should be aligned on natural boundaries whenever possible. The
reason for this is that the processor requires two memory accesses to make an unaligned
memory access; aligned accesses require only one memory access. A word or
doubleword operand that crosses a 4-byte boundary or a quadword operand that crosses
an 8-byte boundary is considered unaligned and requires two separate memory bus cycles
for access.
Some instructions that operate on double quadwords require memory operands to be
aligned on a natural boundary. These instructions generate a general-protection exception
(#GP) if an unaligned operand is specified. A natural boundary for a double quadword is
any address evenly divisible by 16. Other instructions that operate on double quadwords
permit unaligned access (without generating a general-protection exception). However,
additional memory bus cycles are required to access unaligned data from memory.




NUMERIC DATA TYPES
Although bytes, words, and doublewords are the fundamental data types of the IA-32
architecture, some instructions support additional interpretations of these data types to
allow operations to be performed on numeric data types (signed and unsigned integers,
and floating-point numbers). See Figure.




Numeric Data Types


OPERAND ADDRESSING
IA-32 machine-instructions act on zero or more operands. Some operands are specified
explicitly and others are implicit. The data for a source operand can be located in:
• the instruction itself (an immediate operand)
• a register
• a memory location
• an I/O port

When an instruction returns data to a destination operand, it can be returned to:
• a register
• a memory location
• an I/O port

Immediate Operands
Some instructions use data encoded in the instruction itself as a source operand. These
operands are called immediate operands (or simply immediates). For example, the
following ADD instruction adds an immediate value of 14 to the contents of the EAX
register:
ADD EAX, 14


All arithmetic instructions (except the DIV and IDIV instructions) allow the source
operand to be an immediate value. The maximum value allowed for an immediate
operand varies among instructions, but can never be greater than the maximum value of
an unsigned doubleword integer (2^32 - 1).

Register Operands
Source and destination operands can be any of the following registers, depending on the
instruction being executed:
• 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP)
• 16-bit general-purpose registers (AX, BX, CX, DX, SI, DI, SP, or BP)
• 8-bit general-purpose registers (AH, BH, CH, DH, AL, BL, CL, or DL)
• segment registers (CS, DS, SS, ES, FS, and GS)
• EFLAGS register
• x87 FPU registers (ST0 through ST7, status word, control word, tag word, data operand
pointer, and instruction pointer)

Some instructions (such as the DIV and MUL instructions) use quadword operands contained
in a pair of 32-bit registers. Register pairs are represented with a colon separating them. For
example, in the register pair EDX:EAX, EDX contains the high order bits and EAX
contains the low order bits of a quadword operand. Several instructions (such as the
PUSHFD and POPFD instructions) are provided to load and store the contents of the
EFLAGS register or to set or clear individual flags in this register. Other
instructions (such as the Jcc instructions) use the state of the status flags in the EFLAGS
register as condition codes for branching or other decision making operations.
The processor contains a selection of system registers that are used to control memory
management, interrupt and exception handling, task management, processor
management, and debugging activities. Some of these system registers are accessible by
an application program, the operating system, or the executive through a set of system
instructions. When accessing a system register with a system instruction, the register is
generally an implied operand of the instruction.
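
The EDX:EAX register-pair convention mentioned above can be illustrated by splitting and joining a 64-bit value. This sketch models only the placement of the high and low halves, using an unsigned 32-bit multiply as the example; it does not model flags or any other side effects of MUL, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def split_pair(value64):
    """Split a quadword into (EDX, EAX): high-order and low-order 32 bits."""
    return (value64 >> 32) & MASK32, value64 & MASK32

def join_pair(edx, eax):
    """Rebuild the quadword held in EDX:EAX."""
    return ((edx & MASK32) << 32) | (eax & MASK32)

def mul32(eax, operand):
    """Unsigned 32 x 32 multiply; the 64-bit product is returned as (EDX, EAX)."""
    return split_pair((eax & MASK32) * (operand & MASK32))
```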

Memory Operands
Source and destination operands in memory are referenced by means of a segment
selector and an offset (see Figure). Segment selectors specify the segment containing the
operand. Offsets specify the linear or effective address of the operand. Offsets can be 32
bits (represented by the notation m16:32) or 16 bits (represented by the notation m16:16).




                                    Memory Operand Address

Specifying a Segment Selector
The segment selector can be specified either implicitly or explicitly. The most common
method of specifying a segment selector is to load it in a segment register and then allow
the processor to select the register implicitly, depending on the type of operation being
performed. The processor automatically chooses a segment according to the rules given
in the Default Segment Selection Rules table. When storing data in memory or loading
data from memory, the DS segment default can be overridden to allow other segments to
be accessed. Within an assembler, the segment override is generally handled with a
colon “:” operator. For example, the following MOV instruction moves a value from
register EAX into the segment pointed to by the ES register. The offset into the segment
is contained in the EBX register:
MOV ES:[EBX], EAX




                            Default Segment Selection Rules

At the machine level, a segment override is specified with a segment-override prefix,
which is a byte placed at the beginning of an instruction. The following default segment
selections cannot be overridden:
• Instruction fetches must be made from the code segment.
• Destination strings in string instructions must be stored in the data segment pointed to
by the ES register.
• Push and pop operations must always reference the SS segment.
Some instructions require a segment selector to be specified explicitly. In these cases, the
16-bit segment selector can be located in a memory location or in a 16-bit register. For
example, the following MOV instruction moves a segment selector located in register BX
into segment register DS:
MOV DS, BX
Segment selectors can also be specified explicitly as part of a 48-bit far pointer in
memory. Here, the first doubleword in memory contains the offset and the next word
contains the segment selector.
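
The 48-bit far pointer layout (a 32-bit offset followed by a 16-bit selector, both little-endian) can be sketched with the standard `struct` module. The function names and the sample selector and offset values are made up for the illustration.

```python
import struct

def pack_far_pointer(selector, offset):
    """Build the 6-byte memory image: doubleword offset first, then the selector."""
    return struct.pack("<IH", offset, selector)

def unpack_far_pointer(raw):
    """Read a 6-byte far pointer back as (selector, offset)."""
    offset, selector = struct.unpack("<IH", raw)
    return selector, offset

# Hypothetical values: selector 0010h, offset 00401000h
raw = pack_far_pointer(0x0010, 0x00401000)
```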

Specifying an Offset
The offset part of a memory address can be specified directly as a static value (called a
displacement) or through an address computation made up of one or more of the
following components:
• Displacement — An 8-, 16-, or 32-bit value.
• Base — The value in a general-purpose register.
• Index — The value in a general-purpose register.
• Scale factor — A value of 2, 4, or 8 that is multiplied by the index value.



The offset which results from adding these components is called an effective address.
Each of these components can have either a positive or negative (2s complement) value,
with the exception of the scaling factor. Figure 3-11 shows all the possible ways that
these components can be combined to create an effective address in the selected segment.




                      Offset (or Effective Address) Computation


The uses of general-purpose registers as base or index components are restricted in the
following manner:
• The ESP register cannot be used as an index register.
• When the ESP or EBP register is used as the base, the SS segment is the default
segment. In all other cases, the DS segment is the default segment.
The base, index, and displacement components can be used in any combination, and any
of these components can be null. A scale factor may be used only when an index also is
used. Each possible combination is useful for data structures commonly used by
programmers in high-level languages and assembly language. The following addressing
modes suggest uses for common combinations of address components.
• Displacement: A displacement alone represents a direct (uncomputed) offset to the
operand. Because the displacement is encoded in the instruction, this form of an address
is sometimes called an absolute or static address. It is commonly used to access a
statically allocated scalar operand.
• Base: A base alone represents an indirect offset to the operand. Since the value in the
base register can change, it can be used for dynamic storage of variables and data
structures.
• Base + Displacement: A base register and a displacement can be used together for
two distinct purposes:
• As an index into an array when the element size is not 2, 4, or 8 bytes: the
displacement component encodes the static offset to the beginning of the array. The base
register holds the results of a calculation to determine the offset to a specific element
within the array.
• To access a field of a record: the base register holds the address of the beginning of the
record, while the displacement is a static offset to the field.
An important special case of this combination is access to parameters in a procedure
activation record. A procedure activation record is the stack frame created when a
procedure is entered. Here, the EBP register is the best choice for the base register,
because it automatically selects the stack segment. This is a compact encoding for this
common function.
• (Index ∗ Scale) + Displacement: This addressing mode offers an efficient way to index
into a static array when the element size is 2, 4, or 8 bytes. The displacement locates the
beginning of the array, the index register holds the subscript of the desired array element,
and the processor automatically converts the subscript into an index by applying the
scaling factor.
• Base + Index + Displacement: Using two registers together supports either a
two-dimensional array (the displacement holds the address of the beginning of the array)
or one of several instances of an array of records (the displacement is an offset to a field
within the record).
• Base + (Index ∗ Scale) + Displacement: Using all the addressing components
together allows efficient indexing of a two-dimensional array when the elements of the
array are 2, 4, or 8 bytes in size.
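
To make the combinations listed above concrete, the following sketch computes offsets for three of the data-structure cases. All addresses, sizes, and names (`TABLE`, `record`, `FIELD_OFFSET`, `COLS`) are made-up values for illustration, and `ea` is a hypothetical helper that models only the arithmetic, not a real instruction.

```python
def ea(base=0, index=0, scale=1, disp=0):
    """Offset = base + (index * scale) + disp, kept to 32 bits."""
    return (base + index * scale + disp) & 0xFFFFFFFF


TABLE = 0x2000  # hypothetical static address of an array of 4-byte elements

# (Index * Scale) + Displacement: element 5 of the static array.
elem = ea(index=5, scale=4, disp=TABLE)

# Base + Displacement: a field 12 bytes into a record whose address is in the base.
record = 0x3000      # hypothetical record address held in the base register
FIELD_OFFSET = 12    # hypothetical static offset of the field
field = ea(base=record, disp=FIELD_OFFSET)

# Base + (Index * Scale) + Displacement: element [2][3] of a two-dimensional
# array with 10 columns of 4-byte elements. The row offset is computed
# beforehand and held in the base register.
COLS = 10
row, col = 2, 3
row_offset = row * COLS * 4
elem2d = ea(base=row_offset, index=col, scale=4, disp=TABLE)

print(hex(elem), hex(field), hex(elem2d))  # 0x2014 0x300c 0x205c
```

In each case the displacement carries whatever is known at assembly time, while the registers carry the values computed at run time.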

I/O Port Addressing
The processor supports an I/O address space that contains up to 65,536 8-bit I/O ports.
Ports that are 16-bit and 32-bit may also be defined in the I/O address space. An I/O port
can be addressed with either an immediate operand or a value in the DX register.
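
A small model of the I/O address space may help here. It is a sketch under two assumptions that the notes above do not spell out: the immediate form of a port number is limited to 8 bits (ports 0 to 255), and a 16-bit or 32-bit port occupies two or four consecutive 8-bit port addresses. The function names are hypothetical.

```python
IO_SPACE_SIZE = 65536  # number of 8-bit ports in the I/O address space


def port_operand(port):
    """Say which operand form could name this port in an I/O instruction."""
    if not 0 <= port < IO_SPACE_SIZE:
        raise ValueError("port number out of range")
    # Assumption: an immediate port number is one byte, so larger port
    # numbers have to be supplied in the DX register.
    return "immediate" if port <= 0xFF else "DX"


def ports_covered(port, width_bits):
    """List the 8-bit port addresses occupied by a port of the given width."""
    if width_bits not in (8, 16, 32):
        raise ValueError("width must be 8, 16, or 32 bits")
    count = width_bits // 8
    if port < 0 or port + count > IO_SPACE_SIZE:
        raise ValueError("port does not fit in the I/O address space")
    return list(range(port, port + count))


print(port_operand(0x60))                          # immediate
print(port_operand(0x3F8))                         # DX
print([hex(p) for p in ports_covered(0x3F8, 16)])  # ['0x3f8', '0x3f9']
```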




Mpmc

  • 1. Microprocessors and Microcontrollers Third Year BE Computers Pawar Virendra D. Mo. No.:9423582261 1/153 MPMC© Pawar Virendra D.
  • 2. Syllabus EC4813 : Microprocessors and Microcontrollers Microprocessors and Microcontrollers Prerequisites : Understanding of Microprocessors, Peripheral Chips, Analogue Sensors, Conversion, Interfacing Techniques. Aim : This course covers the design of hardware and software code using a modern microcontroller. It emphasizes on assembly language programming of the microcontroller including device drivers, exception and interrupt handling, and interfacing with higher- level languages. Objectives: 1. To exhibit knowledge of the architecture of microcontrollers and apply program control structures to microcontrollers; 2. To develop the ability to use assembly language to program a microcontroller and demonstrate the capability to program the microcontroller to communicate with external circuitry using parallel ports; 3. To demonstrate the capability to program the microcontroller to communicate with external circuitry using serial ports and timer ports. Unit 1 : Introduction to Pentium microprocessor ( 7 Hrs ) Pentium Microprocessor: History ,Feature & Architecture, Pin Description , Functional Description Real Mode, Risc Super Scalar, Pipe lining , Instruction Pairing, Branch Prediction, Inst Data Cache. 
FPU Unit 2 : Bus Cycles and Memory Organization: ( 7 Hrs ) Bus Cycles & Memory Organisation : Init & Configuration, Bus Operations-RST, Bus Operations-RST, Mem/Io Organisation, Data Transfer Mechanism , 8/16/32 bit Data Bus I, Programmers Model, Register Set, Instru Set , Data Types, Instructions Unit 3 : Protected Mode: ( 6 Hrs ) Protected Mode :Intro Segmentation, Supp Registers ,Rel Int Desc, Mem Man thru Segmentation , Logical to linear translation, protection by segmentation, Privilege Level protection, related instructions, inter - privilege level transfer of control, paging-support registers, descriptors ,linear-physical add trans, TLB, page level protection ,virtual memory Unit 4 : Multitasking, Interrupts, Exceptions and I/O ( 6 Hrs ) Multitasking, Interrupts, Exception I/O :Multi Tasking Support Reg , Rel Des, Task Switch I/O per BitMap, Virtual Mode, Add Gen, Priv Level, Inst &Reg ,enter/Leaving V86 M, Interrupt Structure Real/Prot V86 Mode, I/O Handling, comparison of 3 modes. Unit 5 : 8051 Micro controller ( 7 Hrs ) Family Architecture , ,Data / Programme Memory , Reg set Reg Bank SFR, Ext Data / Mem Programme Mem, Interrupt Structure , Timer Prog ,Serial Port Prog , Misc Features, Min System Unit 6 : PIC Micro-Controller ( 7 Hrs ) PIC Micro-Controller :OverView ,Features, Pin Out, Capture /Compare /Pulse width modulation Mode , Block Dia Prog Model, Rest /Clocking, Mem Org, Prog/Data, Flash Eprom, Add Mode/Inst Set Prog , I/o, Interrupt , Timer, ADC Outcomes: Upon completion of the course, the student should be able to: 2/153 MPMC© Pawar Virendra D.
  • 3. 1. Describe and use the functional blocks utilized in a basic microcontroller based system. 2. Describe the programmer's model of the CPU's instruction set and various addressing modes. 3. Proficiently use the various instruction set and functional groups, when programming. 4. Integrate structured programming techniques and sub-routines into microcontroller based hardware topologies. 5. Develop I/O port, ADC hardware, and software interfacing techniques. 6. Describe the use of sensors, interfacing, and signal conditioning when utilizing the microcontroller in control and monitor applications. Text Books: 1. Antonakos J., "The Pentium Microprocessor", Pearson Education, 2004, 2nd Edition. 2. Deshmukh A., "Microcontrollers - Theory and Applications", Tata McGraw-Hill, 2004, Reference Books: 1. Mazidi M., Gillispie J., " The 8051 Microcontroller and embedded systems", Pearson education, 2002, ISBN - 81-7808-574-7 2 Intel Pentium Data Sheets 3. Ayala K., "The 8051 Microcontroller", Penram International, 1996, ISBN 81 -900828- 4-1 4. Intel 8 bit Microcontroller manual 5. Microchip manual for PIC 16CXX and 16FXX 3/153 MPMC© Pawar Virendra D.
  • 4. INTRODUCTION 16-bit Processors and Segmentation (1978) The IA-32 architecture family was preceded by 16-bit processors, the 8086 and 8088. The 8086 has 16-bit registers and a 16-bit external data bus, with 20-bit addressing giving a 1-MByte address space. The 8088 is similar to the 8086 except it has an 8-bit external data bus. The 8086/8088 introduced segmentation to the IA-32 architecture. With segmentation, a 16-bit segment register contains a pointer to a memory segment of up to 64 KBytes. Using four segment registers at a time, 8086/8088 processors are able to address up to 256 KBytes without switching between segments. The 20-bit addresses that can be formed using a segment register and an additional 16-bit pointer provide a total address range of 1 MByte. The Intel® 286 Processor (1982) The Intel 286 processor introduced protected mode operation into the IA-32 architecture. Protected mode uses the segment register content as selectors or pointers into descriptor tables. Descriptors provide 24-bit base addresses with a physical memory size of up to 16 Mbytes , support for virtual memory management on a segment swapping basis, and a number of protection mechanisms. These mechanisms include: • Segment limit checking • Read-only and execute-only segment options • Four privilege levels The Intel386™ Processor (1985) The Intel386 processor was the first 32-bit processor in the IA-32 architecture family. It introduced 32-bit registers for use both to hold operands and for addressing. The lower half of each 32-bit Intel386 register retains the properties of the 16-bit registers of earlier generations, permitting backward compatibility. The processor also provides a virtual- 8086 mode that allows for even greater efficiency when executing programs created for 8086/8088 processors. 
In addition, the Intel386 processor has support for:
• A 32-bit address bus that supports up to 4 GBytes of physical memory
• A segmented-memory model and a flat memory model
• Paging, with a fixed 4-KByte page size providing a method for virtual memory management
• Support for parallel stages
The Intel486™ Processor (1989)
The Intel486™ processor added more parallel execution capability by expanding the Intel386 processor’s instruction decode and execution units into five pipelined stages. Each stage operates in parallel with the others on up to five instructions in different stages of execution. In addition, the processor added:
• An 8-KByte on-chip first-level cache that increased the percent of instructions that could execute at the scalar rate of one per clock
  • 5.
• An integrated x87 FPU
• Power saving and system management capabilities
The Intel® Pentium® Processor (1993)
The introduction of the Intel Pentium processor added a second execution pipeline to achieve superscalar performance (two pipelines, known as u and v, together can execute two instructions per clock). The on-chip first-level cache doubled, with 8 KBytes devoted to code and another 8 KBytes devoted to data. The data cache uses the MESI protocol to support more efficient write-back cache in addition to the write-through cache previously used by the Intel486 processor. Branch prediction with an on-chip branch table was added to increase performance in looping constructs. In addition, the processor added:
• Extensions to make the virtual-8086 mode more efficient and allow for 4-MByte as well as 4-KByte pages
• Internal data paths of 128 and 256 bits to add speed to internal data transfers
• A burstable external data bus, increased to 64 bits
• An APIC to support systems with multiple processors
• A dual processor mode to support glueless two processor systems
PROCESSOR FEATURES OVERVIEW
The Pentium processor supports the features of previous Intel Architecture processors and provides significant enhancements including the following:
• Superscalar Architecture
• Dynamic Branch Prediction
• Pipelined Floating-Point Unit
• Improved Instruction Execution Time
• Separate Code and Data Caches
• Writeback MESI Protocol in the Data Cache
• 64-Bit Data Bus
• Bus Cycle Pipelining
• Address Parity
• Internal Parity Checking
• Functional Redundancy Checking and Lock Step operation
• Execution Tracing
• Performance Monitoring
• IEEE 1149.1 Boundary Scan
• System Management Mode
• Virtual Mode Extensions
• Upgradable with a Pentium OverDrive processor
• Dual processing support
• Advanced SL Power Management Features
• Fractional Bus Operation
• On-Chip Local APIC Device
  • 6.
• Support for the Intel 82498/82493 and 82497/82492 cache chipset products
• Split line accesses to the code cache
COMPONENT INTRODUCTION
The application instruction set of the Pentium processor family includes the complete instruction set of existing Intel Architecture processors to ensure backward compatibility, with extensions to accommodate the additional functionality of the Pentium processor. All application software written for the Intel386™ and Intel486™ microprocessors will run on the Pentium processor without modification. The on-chip memory management unit (MMU) is completely compatible with the Intel386 and Intel486 CPUs.
The two instruction pipelines and the floating-point unit on the Pentium processor are capable of independent operation. Each pipeline issues frequently used instructions in a single clock. Together, the dual pipes can issue two integer instructions in one clock, or one floating-point instruction (under certain circumstances, 2 floating-point instructions) in one clock.
  • 7. Branch prediction is implemented in the Pentium processor. To support this, the Pentium processor implements two prefetch buffers, one to prefetch code in a linear fashion, and one that prefetches code according to the Branch Target Buffer (BTB) so the needed code is almost always prefetched before it is needed for execution.
The Pentium processor includes separate code and data caches integrated on chip to meet its performance goals. The caches on the Pentium processor are each 8 Kbytes in size and 2-way set-associative. Each cache has a dedicated Translation Lookaside Buffer (TLB) to translate linear addresses to physical addresses. The Pentium processor data cache is configurable to be writeback or writethrough on a line-by-line basis and follows the MESI protocol. The data cache tags are triple ported to support two data transfers and an inquire cycle in the same clock. The code cache is an inherently write protected cache. The code cache tags of the Pentium processor are also triple ported to support snooping and split-line accesses.
The Pentium processor has a 64-bit data bus. Burst read and burst writeback cycles are supported by the Pentium processor. In addition, bus cycle pipelining has been added to allow two bus cycles to be in progress simultaneously. The Pentium processor Memory Management Unit contains optional extensions to the architecture which allow 4 MB page sizes.
The Pentium processor has added significant data integrity and error detection capability. Data parity checking is still supported on a byte-by-byte basis. Address parity checking and internal parity checking features have been added along with a new exception, the machine check exception. The Pentium processor has implemented functional redundancy checking to provide maximum error detection of the processor and the interface to the processor.
When functional redundancy checking is used, a second processor, the “checker”, is used to execute in lock step with the “master” processor. The checker samples the master’s outputs and compares those values with the values it computes internally, and asserts an error signal if a mismatch occurs. The Pentium processor with MMX technology does not support functional redundancy checking.
As more and more functions are integrated on chip, the complexity of board level testing is increased. To address this, the Pentium processor has increased test and debug capability by implementing IEEE Boundary Scan (Standard 1149.1). System management mode has been implemented along with some extensions to the SMM architecture. Enhancements to the Virtual 8086 mode have been made to increase performance by reducing the number of times it is necessary to trap to a Virtual 8086 monitor.
The block diagram of the Pentium processor shows the two instruction pipelines, the “u” pipe and the “v” pipe. The u-pipe can execute all integer and floating-point instructions. The v-pipe can execute simple integer instructions and the FXCH floating-point instruction.
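The master/checker comparison used in functional redundancy checking can be pictured with a small sketch. It is a software analogy only, under stated assumptions: the pin names and values in the example dictionaries are illustrative, and the function name is hypothetical.

```python
def checker_detects_mismatch(master_outputs: dict, checker_outputs: dict) -> bool:
    """Return True (error signal asserted) if any output sampled from the
    master differs from the value the checker computed internally."""
    return any(
        master_outputs[pin] != checker_outputs.get(pin)
        for pin in master_outputs
    )


# One clock's worth of hypothetical output values.
master = {"ADS#": 0, "CACHE#": 1, "LOCK#": 1}
agreeing_checker = dict(master)
disagreeing_checker = {**master, "CACHE#": 0}

print(checker_detects_mismatch(master, agreeing_checker))     # False
print(checker_detects_mismatch(master, disagreeing_checker))  # True
```

In the real system the comparison happens in hardware on every clock, since both processors execute in lock step.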
  • 8. The separate code and data caches are shown in the block diagram. The data cache has two ports, one for each of the two pipes (the tags are triple ported to allow simultaneous inquire cycles). The data cache has a dedicated Translation Lookaside Buffer (TLB) to translate linear addresses to the physical addresses used by the data cache.
The code cache, branch target buffer and prefetch buffers are responsible for getting raw instructions into the execution units of the Pentium processor. Instructions are fetched from the code cache or from the external bus. Branch addresses are remembered by the branch target buffer. The code cache TLB translates linear addresses to physical addresses used by the code cache.
The decode unit contains two parallel decoders which decode and issue up to the next two sequential instructions into the execution pipeline. The control ROM contains the microcode which controls the sequence of operations performed by the processor. The control unit has direct control over both pipelines.
The Pentium processor contains a pipelined floating-point unit that provides a significant floating-point performance advantage over previous generations of Intel Architecture-based processors.
The Pentium processor includes features to support multi-processor systems, namely an on chip Advanced Programmable Interrupt Controller (APIC). This APIC implementation supports multiprocessor interrupt management (with symmetric interrupt distribution across all processors), multiple I/O subsystem support, 8259A compatibility, and inter-processor interrupt support.
The dual processor configuration allows two Pentium processors to share a single L2 cache for a low-cost symmetric multi-processor system. The two processors appear to the system as a single Pentium processor. Multiprocessor operating systems properly schedule computing tasks between the two processors. This scheduling of tasks is transparent to software applications and the end-user.
Logic built into the processors supports a “glueless” interface for easy system design. Through a private bus, the two Pentium processors arbitrate for the external bus and maintain cache coherency. The Pentium processor can also be used in a conventional multi-processor system in which one L2 cache is dedicated to each processor.
The Pentium processor is produced on Intel’s advanced silicon technology. The Pentium processor also includes SL enhanced power management features. When the clock to the Pentium processor is stopped, power dissipation is virtually eliminated. The low VCC operating voltages and SL enhanced power management features make the Pentium processor a good choice for energy-efficient desktop designs.
  • 9. PIN DESCRIPTION
Symbol       Type   Name and Function
A31-A3       I/O    As outputs, the address lines of the processor along with the byte enables define the physical area of memory or I/O accessed. The external system drives the inquire address to the processor on A31-A5.
D63-D0       I/O    These are the 64 data lines for the processor. Lines D7-D0 define the least significant byte of the data bus; lines D63-D56 define the most significant byte of the data bus. When the CPU is driving the data lines, they are driven during the T2, T12, or T2P clocks for that cycle. During reads, the CPU samples the data bus when BRDY# is returned.
ADS#         O      The address status indicates that a new valid bus cycle is currently being driven by the Pentium processor.
BE7#-BE5#    O      The byte enable pins are used to determine which bytes must be written to external memory, or which bytes were requested by the CPU for the current cycle. The byte enables are driven in the same clock as the address lines (A31-A3).
BE4#-BE0#    I/O    (Same function as BE7#-BE5#.)
BOFF#        I      The backoff input is used to abort all outstanding bus cycles that have not yet completed. In response to BOFF#, the Pentium processor will float all pins normally floated during bus hold in the next clock. The processor remains in bus hold until BOFF# is negated, at which time the Pentium processor restarts the aborted bus cycle(s) in their entirety.
BRDY#        I      The burst ready input indicates that the external system has presented valid data on the data pins in response to a read or that the external system has accepted the Pentium processor data in response to a write request. This signal is sampled in the T2, T12 and T2P bus states.
CACHE#       O      For Pentium processor initiated cycles the cache pin indicates internal cacheability of the cycle (if a read), and indicates a burst write back cycle (if a write). If this pin is driven inactive during a read cycle, the Pentium processor will not cache the returned data, regardless of the state of the KEN# pin. This pin is also used to determine the cycle length (number of transfers in the cycle).
CPUTYP       I      CPU type distinguishes the Primary processor from the Dual processor. In a single processor environment, or when the Pentium processor is acting as the Primary processor in a dual processing system, CPUTYP should be strapped to VSS. The Dual processor should have CPUTYP strapped to VCC. For the Pentium OverDrive processor, CPUTYP will be used to determine whether the bootup handshake protocol will be used (in a dual socket system) or not (in a single socket system).
FLUSH#       I      When asserted, the cache flush input forces the Pentium processor to write back all modified lines in the data cache and invalidate its internal caches. A Flush Acknowledge special cycle will be generated by the Pentium processor indicating completion of the write back and invalidation. If FLUSH# is sampled low when RESET transitions from high to low, tristate test mode is entered. If two Pentium processors are operating in dual processing mode and FLUSH# is asserted, the Dual processor will perform a flush first (without a flush acknowledge cycle), then the Primary processor will perform a flush followed by a flush acknowledge cycle. NOTE: If the FLUSH# signal is asserted in dual processing mode, it must be deasserted at least one clock prior to BRDY# of the FLUSH Acknowledge cycle to avoid DP arbitration problems.
  • 10.
FRCMC#       I      The functional redundancy checking master/checker mode input is used to determine whether the Pentium processor is configured in master mode or checker mode. When configured as a master, the Pentium processor drives its output pins as required by the bus protocol. When configured as a checker, the Pentium processor tristates all outputs (except IERR# and TDO) and samples the output pins. The configuration as a master/checker is set after RESET and may not be changed other than by a subsequent RESET.
HOLD         I      In response to the bus hold request, the Pentium processor will float most of its output and input/output pins and assert HLDA after completing all outstanding bus cycles. The Pentium processor will maintain its bus in this state until HOLD is de-asserted. HOLD is not recognized during LOCK cycles. The Pentium processor will recognize HOLD during reset.
HLDA         O      The bus hold acknowledge pin goes active in response to a hold request driven to the processor on the HOLD pin. It indicates that the Pentium processor has floated most of the output pins and relinquished the bus to another local bus master. When leaving bus hold, HLDA will be driven inactive and the Pentium processor will resume driving the bus. If the Pentium processor has a bus cycle pending, it will be driven in the same clock that HLDA is de-asserted.
INIT         I      The Pentium processor initialization input pin forces the Pentium processor to begin execution in a known state. The processor state after INIT is the same as the state after RESET except that the internal caches, write buffers, and floating point registers retain the values they had prior to INIT. INIT may NOT be used in lieu of RESET after power-up. If INIT is sampled high when RESET transitions from high to low, the Pentium processor will perform built-in self test prior to the start of program execution.
  • 11.
INV          I      The invalidation input determines the final cache line state (S or I) in case of an inquire cycle hit. It is sampled together with the address for the inquire cycle in the clock EADS# is sampled active.
KEN#         I      The cache enable pin is used to determine whether the current cycle is cacheable or not and is consequently used to determine cycle length. When the Pentium processor generates a cycle that can be cached (CACHE# asserted) and KEN# is active, the cycle will be transformed into a burst line fill cycle.
LOCK#        O      The bus lock pin indicates that the current bus cycle is locked. The Pentium processor will not allow a bus hold when LOCK# is asserted (but AHOLD and BOFF# are allowed). LOCK# goes active in the first clock of the first locked bus cycle and goes inactive after the BRDY# is returned for the last locked bus cycle. LOCK# is guaranteed to be de-asserted for at least one clock between back-to-back locked cycles.
NA#          I      An active next address input indicates that the external memory system is ready to accept a new bus cycle although all data transfers for the current cycle have not yet completed. The Pentium processor will issue ADS# for a pending cycle two clocks after NA# is asserted. The Pentium processor supports up to 2 outstanding bus cycles.
RESET        I      RESET forces the Pentium processor to begin execution at a known state. All the Pentium processor internal caches will be invalidated upon the RESET. Modified lines in the data cache are not written back. FLUSH#, FRCMC# and INIT are sampled when RESET transitions from high to low to determine if tristate test mode or checker mode will be entered, or if BIST will be run.
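The way CACHE# and KEN# together determine the cycle length, as described in the pin table, can be sketched as below. This is a simplified model, not the full bus protocol: the function name is hypothetical, and the four-transfer burst is derived here from the 32-byte cache line and the 64-bit data bus given elsewhere in these notes.

```python
LINE_BYTES = 32  # cache line size
BUS_BYTES = 8    # 64-bit data bus moves 8 bytes per transfer


def transfers_in_cycle(is_read: bool, cache_asserted: bool, ken_active: bool) -> int:
    """Number of data transfers in a bus cycle (simplified)."""
    burst = LINE_BYTES // BUS_BYTES  # a whole line takes 4 transfers
    if is_read:
        # A read becomes a burst line fill only when the processor marks it
        # cacheable (CACHE# asserted) AND the system returns KEN# active.
        return burst if (cache_asserted and ken_active) else 1
    # For writes, CACHE# asserted indicates a burst writeback of a line.
    return burst if cache_asserted else 1


print(transfers_in_cycle(True, True, True))    # 4 (burst line fill)
print(transfers_in_cycle(True, True, False))   # 1 (KEN# inactive, data not cached)
print(transfers_in_cycle(False, True, False))  # 4 (burst writeback)
```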
  • 12. REAL MODE
RISC
A Complex Instruction Set Computer (CISC) provides a large and powerful range of instructions, which is less flexible to implement. For example, the 8086 microprocessor family has these instructions:
JA    Jump if Above
JAE   Jump if Above or Equal
JB    Jump if Below
By contrast, the Reduced Instruction Set Computer (RISC) concept is to identify the sub-components and use those. As these are much simpler, they can be implemented directly in silicon, so will run at the maximum possible speed. Nothing is 'translated'.
Most modern CISC processors, such as the later members of the Pentium family (Pentium Pro onward), use a fast RISC core with an interpreter sitting between the core and the instruction. So when you are running Windows 95 on a PC, it is not that much different from trying to get Windows 95 running on a software PC emulator. Just imagine the power hidden inside the Pentium...
This is not to say that CISC processors cannot have a large number of registers; some do. However, for its use, a typical RISC processor requires more registers to give it additional flexibility. Gone are the days when you had two general purpose registers and an 'accumulator'. One thing RISC does offer, though, is register independence.
The 8086 offers you fourteen registers, but with caveats: the first four (A, B, C, and D) are Data registers (a.k.a. scratch-pad registers). They are 16 bit and accessed as two 8 bit registers, thus register A is really AH (A, high-order byte) and AL (A, low-order byte). These can be used as general purpose registers, but they can also have dedicated functions - Accumulator, Base, Count, and Data.
The advantages of RISC over CISC today are:
• RISC processors are much simpler to build, which in turn results in the following advantages:
  o easier to build, i.e. you can use already existing production facilities
  o much less expensive, just compare the price of an XScale with that of a Pentium III at 1 GHz...
  o less power consumption, which again gives two advantages:
    - much longer use of battery driven devices
    - no need for cooling of the device, which again gives two advantages:
  • 13.
      - smaller design of the whole device
      - no noise
• RISC processors are much simpler to program, which helps not only the assembler programmer but the compiler designer, too. You'll hardly find any compiler which uses all the functions of a Pentium III optimally.
SUPER SCALAR
A superscalar CPU architecture implements a form of parallelism called instruction level parallelism within a single processor. It therefore allows faster CPU throughput than would otherwise be possible at a given clock rate. A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor. Each functional unit is not a separate CPU core but an execution resource within a single CPU such as an arithmetic logic unit, a bit shifter, or a multiplier.
While a superscalar CPU is typically also pipelined, pipelining and superscalar architecture are considered different performance enhancement techniques.
The superscalar technique is traditionally associated with several identifying characteristics (within a given CPU core):
• Instructions are issued from a sequential instruction stream
• CPU hardware dynamically checks for data dependencies between instructions at run time (versus software checking at compile time)
• The CPU accepts multiple instructions per clock cycle
The simplest processors are scalar processors. Each instruction executed by a scalar processor typically manipulates one or two data items at a time. By contrast, each instruction executed by a vector processor operates simultaneously on many data items. An analogy is the difference between scalar and vector arithmetic. A superscalar processor is sort of a mixture of the two. Each instruction processes one data item, but there are multiple redundant functional units within each CPU, thus multiple instructions can be processing separate data items concurrently.
Superscalar CPU design emphasizes improving the instruction dispatcher accuracy, and allowing it to keep the multiple functional units in use at all times. This has become increasingly important as the number of units has increased. While early superscalar CPUs would have two ALUs and a single FPU, a modern design such as the PowerPC 970 includes four ALUs, two FPUs, and two SIMD units. If the dispatcher is ineffective at keeping all of these units fed with instructions, the performance of the system will suffer.
  • 14. A superscalar processor usually sustains an execution rate in excess of one instruction per machine cycle. But merely processing multiple instructions concurrently does not make an architecture superscalar, since pipelined, multiprocessor or multi-core architectures also achieve that, but with different methods.
In a superscalar CPU the dispatcher reads instructions from memory and decides which ones can be run in parallel, dispatching them to redundant functional units contained inside a single CPU. Therefore a superscalar processor can be envisioned as having multiple parallel pipelines, each of which is processing instructions simultaneously from a single instruction thread.
Existing binary executable programs have varying degrees of intrinsic parallelism. In some cases instructions are not dependent on each other and can be executed simultaneously. In other cases they are inter-dependent: one instruction impacts either resources or results of the other. The instructions
    a = b + c; d = e + f
can be run in parallel because none of the results depend on other calculations. However, the instructions
    a = b + c; b = e + f
might not be runnable in parallel, depending on the order in which the instructions complete while they move through the units.
When the number of simultaneously issued instructions increases, the cost of dependency checking increases extremely rapidly. This is exacerbated by the need to check dependencies at run time and at the CPU's clock rate. This cost includes additional logic gates required to implement the checks.
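The two instruction pairs discussed on this slide can be checked with a small dependency sketch. It is a minimal software model under stated assumptions: each instruction is reduced to one destination and a set of sources, the names Instr, hazards and pairwise_checks are hypothetical, and the pair count is only a rough indication of why checking cost grows quickly with issue width.

```python
from typing import FrozenSet, List, NamedTuple


class Instr(NamedTuple):
    dest: str              # variable written
    srcs: FrozenSet[str]   # variables read


def hazards(first: Instr, second: Instr) -> List[str]:
    """List the data hazards that prevent freely reordering two instructions."""
    found = []
    if first.dest in second.srcs:
        found.append("RAW")  # second reads what first writes
    if second.dest in first.srcs:
        found.append("WAR")  # second overwrites what first reads
    if first.dest == second.dest:
        found.append("WAW")  # both write the same location
    return found


def pairwise_checks(issue_width: int) -> int:
    """Number of instruction pairs to compare when issuing n instructions together."""
    return issue_width * (issue_width - 1) // 2


a_eq = Instr("a", frozenset({"b", "c"}))  # a = b + c
d_eq = Instr("d", frozenset({"e", "f"}))  # d = e + f
b_eq = Instr("b", frozenset({"e", "f"}))  # b = e + f

print(hazards(a_eq, d_eq))  # []
print(hazards(a_eq, b_eq))  # ['WAR']
print([pairwise_checks(n) for n in (2, 4, 8)])  # [1, 6, 28]
```

The second pair conflicts because b is read by the first instruction and written by the second, so the result depends on completion order.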
  • 15. PIPELINE AND INSTRUCTION FLOW
The integer instructions traverse a five stage pipeline in the Pentium processor. The pipeline stages are as follows:
PF   Prefetch
D1   Instruction Decode
D2   Address Generate
EX   Execute - ALU and Cache Access
WB   Writeback
The Pentium processor is a superscalar machine, built around two general purpose integer pipelines and a pipelined floating-point unit capable of executing two instructions in parallel. Both pipelines operate in parallel allowing integer instructions to execute in a single clock in each pipeline. The figure depicts instruction flow in the Pentium processor.
The pipelines in the Pentium processor are called the “u” and “v” pipes and the process of issuing two instructions in parallel is termed “pairing.” The u-pipe can execute any instruction in the Intel architecture, while the v-pipe can execute “simple” instructions as defined in the “Instruction Pairing Rules” section of this chapter. When instructions are paired, the instruction issued to the v-pipe is always the next sequential instruction after the one issued to the u-pipe.
Pentium® Processor Pipeline Execution
The Pentium processor pipeline has been optimized to achieve higher throughput compared to previous generations of Intel Architecture processors. The first stage of the pipeline is the Prefetch (PF) stage in which instructions are prefetched from the on-chip instruction cache or memory. Because the Pentium processor has separate caches for instructions and data, prefetches do not conflict with data references for access to the cache. If the requested line is not in the code cache, a memory reference is made. In the PF stage of the Pentium processor, two independent pairs of line-size (32-byte) prefetch buffers operate in conjunction with the Branch Target Buffer. This allows one prefetch buffer to prefetch instructions sequentially, while the other prefetches according to the branch target buffer predictions.
  • 16. The pipeline stage after the PF stage in the Pentium processor is Decode 1 (D1) in which two parallel decoders attempt to decode and issue the next two sequential instructions. The decoders determine whether one or two instructions can be issued contingent upon the “Instruction Pairing Rules.” The Pentium processor requires an extra D1 clock to decode instruction prefixes. Prefixes are issued to the u-pipe at the rate of one per clock without pairing. After all prefixes have been issued, the base instruction will then be issued and paired according to the pairing rules.
The D1 stage is followed by Decode 2 (D2) in which addresses of memory resident operands are calculated. In the Intel486 processor, instructions containing both a displacement and an immediate, or instructions containing a base and index addressing mode, require an additional D2 clock to decode. The Pentium processor removes both of these restrictions and is able to issue instructions in these categories in a single clock.
The Pentium processor uses the Execute (EX) stage of the pipeline for both ALU operations and for data cache access; therefore those instructions specifying both an ALU operation and a data cache access will require more than one clock in this stage. In EX all u-pipe instructions and all v-pipe instructions except conditional branches are verified for correct branch prediction. Microcode is designed to utilize both pipelines and thus those instructions requiring microcode execute faster.
The final stage is Writeback (WB) where instructions are enabled to modify processor state and complete execution. In this stage, v-pipe conditional branches are verified for correct branch prediction.
During their progression through the pipeline, instructions may be stalled due to certain conditions. Both the u-pipe and v-pipe instructions enter and leave the D1 and D2 stages in unison. When an instruction in one pipe is stalled, then the instruction in the other pipe is also stalled at the same pipeline stage.
Thus both the u-pipe and the v-pipe instructions enter the EX stage in unison. Once in EX if the u-pipe instruction is stalled, then the v-pipe instruction (if any) is also stalled. If the v-pipe instruction is stalled then the instruction paired with it in the u-pipe is not allowed to advance. No successive instructions are allowed to enter the EX stage of either pipeline until the instructions in both pipelines have advanced to WB.
INSTRUCTION PREFETCH
In the Pentium processor PF stage, two independent pairs of line-size (32-byte) prefetch buffers operate in conjunction with the branch target buffer. Only one prefetch buffer actively requests prefetches at any given time. Prefetches are requested sequentially until a branch instruction is fetched. When a branch instruction is fetched, the branch target buffer (BTB) predicts whether the branch will be taken or not. If the branch is predicted not taken, prefetch requests continue linearly. On a predicted taken branch the other prefetch buffer is enabled and begins to prefetch as though the branch was taken. If a branch is discovered mis-predicted, the instruction pipelines are flushed and prefetching activity starts over.
Integer Instruction Pairing Rules
The Pentium processor can issue one or two instructions every clock. In order to issue two instructions simultaneously they must satisfy the following conditions:
• Both instructions in the pair must be “simple” as defined below
  • 17.
• There must be no read-after-write or write-after-write register dependencies between them
• Neither instruction may contain both a displacement and an immediate
• Instructions with prefixes can only occur in the u-pipe
• Instruction prefixes are treated as separate 1-byte instructions
Simple instructions are entirely hardwired; they do not require any microcode control and, in general, execute in one clock. The exceptions are the ALU mem, reg and ALU reg, mem instructions, which are three and two clock operations respectively. Sequencing hardware is used to allow them to function as simple instructions.
The following integer instructions are considered simple and may be paired:
1. mov reg, reg/mem/imm
2. mov mem, reg/imm
3. alu reg, reg/mem/imm
4. alu mem, reg/imm
5. inc reg/mem
6. dec reg/mem
7. push reg/mem
8. pop reg
9. lea reg,mem
10. jmp/call/jcc near
11. nop
12. test reg, reg/mem
13. test acc, imm
In addition, conditional and unconditional branches may be paired only if they occur as the second instruction in the pair. They may not be paired with the next sequential instruction. Also, SHIFT/ROT by 1 and SHIFT by imm may pair as the first instruction in a pair.
The register dependencies that prohibit instruction pairing include implicit dependencies via registers or flags not explicitly encoded in the instruction. For example, an ALU instruction in the u-pipe (which sets the flags) may not be paired with an ADC or an SBB instruction in the v-pipe. There are two exceptions to this rule. The first is the commonly occurring sequence of compare and branch which may be paired. The second exception is pairs of pushes or pops. Although these instructions have an implicit dependency on the stack pointer, special hardware is included to allow these common operations to proceed in parallel.
Although in general two paired instructions may proceed in parallel independently, there is an exception for paired “read-modify-write” instructions. Read-modify-write instructions are ALU operations with an operand in memory.
When two of these instructions are paired there is a sequencing delay of two clocks in addition to the three clocks required to execute the individual instructions.
Although instructions may execute in parallel their behavior as seen by the programmer is exactly the same as if they were executed sequentially.
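The integer pairing rules on this slide can be approximated in a few lines. This is a deliberately simplified sketch, not the processor's actual decode logic: the Instr fields and the can_pair function are hypothetical, only explicit register dependencies are modelled (flag dependencies, the push/pop stack pointer exception and the shift/rotate rules are left out), and cmp is classed here as an "alu" instruction.

```python
from dataclasses import dataclass

SIMPLE_KINDS = {"mov", "alu", "inc", "dec", "push", "pop",
                "lea", "jmp", "call", "jcc", "nop", "test"}
BRANCH_KINDS = {"jmp", "call", "jcc"}


@dataclass(frozen=True)
class Instr:
    kind: str                       # instruction class from the "simple" list
    reads: frozenset = frozenset()  # registers read explicitly
    writes: frozenset = frozenset() # registers written explicitly
    disp_and_imm: bool = False      # has both a displacement and an immediate
    has_prefix: bool = False


def can_pair(u: Instr, v: Instr) -> bool:
    """True if u (u-pipe) and the next sequential instruction v (v-pipe) may pair."""
    if u.kind not in SIMPLE_KINDS or v.kind not in SIMPLE_KINDS:
        return False                 # both must be simple
    if u.kind in BRANCH_KINDS:
        return False                 # branches pair only as the second instruction
    if u.disp_and_imm or v.disp_and_imm:
        return False
    if v.has_prefix:
        return False                 # prefixed instructions only in the u-pipe
    if u.writes & v.reads:
        return False                 # read-after-write dependency
    if u.writes & v.writes:
        return False                 # write-after-write dependency
    return True


mov = Instr("mov", reads=frozenset({"edx", "al"}))                           # mov flags[edx], al
add = Instr("alu", reads=frozenset({"edx", "ecx"}), writes=frozenset({"edx"}))  # add edx, ecx
use = Instr("mov", reads=frozenset({"edx"}), writes=frozenset({"eax"}))      # mov eax, edx
jle = Instr("jcc")

print(can_pair(mov, add))  # True
print(can_pair(add, use))  # False (edx is written, then read)
print(can_pair(jle, mov))  # False (branch cannot be first)
```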
  • 18. BRANCH PREDICTION
Branch Target Buffer (BTB)
The Pentium processor uses a Branch Target Buffer (BTB) to predict the outcome of branch instructions, which minimizes pipeline stalls due to prefetch delays. The Pentium processor accesses the BTB with the address of the instruction in the D1 stage. The BTB contains a branch prediction state machine with four states: (1) strongly not taken, (2) weakly not taken, (3) weakly taken, and (4) strongly taken. In the event of a correct prediction, a branch will execute without pipeline stalls or flushes. Branches which miss the BTB are assumed to be not taken. Conditional and unconditional near branches and near calls execute in 1 clock and may be executed in parallel with other integer instructions. A mispredicted branch (whether a BTB hit or miss) or a correctly predicted branch with the wrong target address will cause the pipelines to be flushed and the correct target to be fetched. Incorrectly predicted unconditional branches will incur an additional three clock delay, incorrectly predicted conditional branches in the u-pipe will incur an additional three clock delay, and incorrectly predicted conditional branches in the v-pipe will incur an additional four clock delay.
[Figure: branch prediction state diagram. Legend: H = History, P = Prediction, T = Taken, NT = Not Taken. States: H: 11 (P: T), H: 10 (P: T), H: 01 (P: T), H: 00 (P: NT).]
The benefits of branch prediction are illustrated in the following example. Consider the following loop from a benchmark program for computing prime numbers:
    for (k = i + prime; k <= SIZE; k += prime)
        flags[k] = FALSE;
A popular compiler generates the following assembly code (prime is allocated to ecx, k is allocated to edx, and al contains the value FALSE):
    inner_loop:
        mov byte ptr flags[edx], al
        add edx, ecx
        cmp edx, SIZE
        jle inner_loop
Each iteration of this loop will execute in 6 clocks on the Intel486 CPU. On the Pentium processor, the mov is paired with the add, and the cmp with the jle. With branch prediction, each loop iteration executes in 2 clocks.
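The four-state predictor can be sketched in a few lines of Python. This is a simplified model, not the Pentium's actual circuit. It assumes the history bits behave as a saturating 2-bit counter (up on taken, down on not taken) and that only history 00 predicts not taken, as the state diagram legend suggests. All function names are illustrative.

```python
def predict(history):
    """Predict taken (True) unless the 2-bit history is 00 (per the diagram legend)."""
    return history != 0b00


def update(history, taken):
    """Assumed transition rule: saturating counter, up on taken, down on not taken."""
    if taken:
        return min(history + 1, 0b11)
    return max(history - 1, 0b00)


def count_mispredictions(outcomes, history=0b00):
    """Run a sequence of branch outcomes through the predictor and count misses."""
    misses = 0
    for taken in outcomes:
        if predict(history) != taken:
            misses += 1
        history = update(history, taken)
    return misses


# A loop-closing branch such as "jle inner_loop": taken 9 times, then not taken.
loop_outcomes = [True] * 9 + [False]
print(count_mispredictions(loop_outcomes))
```

Under these assumptions the loop branch is mispredicted only on its first and last iterations, which is why a long-running loop sees almost no branch penalty.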
  • 19. CACHE
ON-CHIP CACHES
The Pentium processor implements two internal caches for a total integrated cache size of 16 Kbytes: an 8 Kbyte data cache and a separate 8 Kbyte code cache. These caches are transparent to application software to maintain compatibility with previous processor generations. The data cache fully supports the MESI (modified/exclusive/shared/invalid) writeback cache consistency protocol. The code cache is inherently write protected to prevent code from being inadvertently corrupted, and as a consequence supports a subset of the MESI protocol, the S (shared) and I (invalid) states.
The caches have been designed for maximum flexibility and performance. The data cache is configurable as writeback or writethrough on a line-by-line basis. Memory areas can be defined as non-cacheable by software and external hardware. Cache writeback and invalidations can be initiated by hardware or software. Protocols for cache consistency and line replacement are implemented in hardware, easing system design.
On the Pentium processor, each of the caches is 8 Kbytes in size and is organized as a 2-way set associative cache. There are 128 sets in each cache, each set containing 2 lines (each line has its own tag address). Each cache line is 32 bytes wide. Replacement in both the data and instruction caches is handled by the LRU mechanism, which requires one bit per set in each of the caches.
Cache Structure
The instruction and data caches can be accessed simultaneously. The instruction cache can provide up to 32 bytes of raw opcodes and the data cache can provide data for two data references, all in the same clock. This capability is implemented partially through the tag structure. The tags in the data cache are triple ported. One of the ports is dedicated to snooping while the other two are used to look up two independent addresses corresponding to data references from each of the pipelines. The instruction cache tags of the Pentium processor are also triple ported. Again, one port is dedicated to support snooping and the other two ports facilitate split line accesses (simultaneously accessing the upper half of one line and the lower half of the next line). Each of the caches is parity protected. The operating modes of the caches are controlled by the CD (cache disable) and NW (not writethrough) bits in CR0.
TLB (Translation Lookaside Buffers)
Each of the caches is accessed with physical addresses and each cache has its own TLB (translation lookaside buffer) to translate linear addresses to physical addresses. The TLBs associated with the instruction cache are single ported whereas the data cache TLBs are fully dual ported to be able to translate two independent linear addresses for two data references simultaneously.
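The cache geometry given here (32-byte lines, 128 sets, 2 lines per set) implies how a 32-bit physical address divides into a tag, a set index, and a byte offset. The sketch below works that split out. The function name and the tuple it returns are assumptions made for illustration, not part of any Intel interface.

```python
LINE_SIZE = 32   # bytes per cache line  -> 5 offset bits
NUM_SETS = 128   # sets per cache        -> 7 set-index bits


def split_address(physical_address):
    """Split a 32-bit physical address into (tag, set_index, byte_offset)."""
    byte_offset = physical_address % LINE_SIZE
    set_index = (physical_address // LINE_SIZE) % NUM_SETS
    tag = physical_address // (LINE_SIZE * NUM_SETS)   # remaining 20 bits
    return tag, set_index, byte_offset


# 2 ways * 128 sets * 32 bytes accounts for the full 8 Kbyte cache.
print(2 * NUM_SETS * LINE_SIZE)
print([hex(field) for field in split_address(0xFFFFFFF0)])
```

Two addresses with the same set index but different tags compete for the two lines of one set, which is where the one LRU bit per set comes into play.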
  • 20. The goal of an effective memory system is that the effective access time that the processor sees is very close to t0, the access time of the cache. Most accesses that the processor makes to the cache are contained within this level. The achievement of this goal depends on many factors: the architecture of the processor, the behavioral properties of the programs being executed, and the size and organization of the cache. Caches work on the basis of the locality of program behavior. There are three principles involved:
1. Spatial Locality - Given an access to a particular location in memory, there is a high probability that other accesses will be made to either that or neighboring locations within the lifetime of the program.
2. Temporal Locality - This is complementary to spatial locality. Given a sequence of references to n locations, there is a high probability that references following this sequence will be made into the sequence. Elements of the sequence will again be referenced during the lifetime of the program.
3. Sequentiality - Given that a reference has been made to a particular location s, it is likely that within the next several references a reference to the location s + 1 will be made. Sequentiality is a restricted type of spatial locality and can be regarded as a subset of it.
Some common terms
Processor references that are found in the cache are called cache hits. References not found in the cache are called cache misses. On a cache miss, the cache control mechanism must fetch the missing data from memory and place it in the cache. Usually the cache fetches a spatial locality called the line from memory. The physical word is the basic unit of access in the memory. The processor-cache interface can be characterized by a number of parameters. Those that directly affect processor performance include:
1. Access time for a reference found in the cache (a hit) - property of the cache size and organization.
2. Access time for a reference not found in the cache (a miss) - property of the memory organization.
3. Time to initially compute a real address given a virtual address (not-in-TLB-time) - property of the address translation facility, which, though strictly speaking not part of the cache, resembles the cache in most aspects and is discussed in this chapter.
Data Cache Consistency Protocol (MESI Protocol)
The Pentium processor Cache Consistency Protocol is a set of rules by which states are
  • 21. assigned to cached entries (lines). The rules apply for memory read/write cycles only. I/O and special cycles are not run through the data cache. Every line in the Pentium processor data cache is assigned a state dependent on both Pentium processor generated activities and activities generated by other bus masters (snooping). The Pentium processor Data Cache Protocol consists of four states that define whether a line is valid (HIT/MISS), if it is available in other caches, and if it has been MODIFIED. The four states are the M (Modified), E (Exclusive), S (Shared) and I (Invalid) states, and the protocol is referred to as the MESI protocol. A definition of the states is given below:
M - Modified: An M-state line is available in ONLY one cache and it is also MODIFIED (different from main memory). An M-state line can be accessed (read/written to) without sending a cycle out on the bus.
E - Exclusive: An E-state line is also available in ONLY one cache in the system, but the line is not MODIFIED (i.e., it is the same as main memory). An E-state line can be accessed (read/written to) without generating a bus cycle. A write to an E-state line will cause the line to become MODIFIED.
S - Shared: This state indicates that the line is potentially shared with other caches (i.e., the same line may exist in more than one cache). A read to an S-state line will not generate bus activity, but a write to a SHARED line will generate a write-through cycle on the bus. The write-through cycle may invalidate this line in other caches. A write to an S-state line will update the cache.
I - Invalid: This state indicates that the line is not available in the cache. A read to this line will be a MISS and may cause the Pentium processor to execute a LINE FILL (fetch the whole line into the cache from main memory). A write to an INVALID line will cause the Pentium processor to execute a write-through cycle on the bus.
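The state definitions above can be summarized as a small lookup table covering processor-initiated reads and writes only. This is a sketch rather than the full protocol. Two entries are assumptions that the text does not settle: the state a line enters after a line fill (E is used here) and the state an S-state line holds after a write (it is left in S here). The names `TRANSITIONS` and `access` are illustrative.

```python
# (current state, operation) -> (next state, bus activity or None)
TRANSITIONS = {
    ("M", "read"):  ("M", None),
    ("M", "write"): ("M", None),
    ("E", "read"):  ("E", None),
    ("E", "write"): ("M", None),             # a write makes the line MODIFIED
    ("S", "read"):  ("S", None),
    ("S", "write"): ("S", "write-through"),  # next state is an assumption
    ("I", "read"):  ("E", "line fill"),      # next state is an assumption
    ("I", "write"): ("I", "write-through"),  # the cache is not updated
}


def access(state, operation):
    """Return (next_state, bus_activity) for a processor read or write."""
    return TRANSITIONS[(state, operation)]


# Read a line that is not cached, then write it twice.
state = "I"
bus_log = []
for operation in ("read", "write", "write"):
    state, bus_activity = access(state, operation)
    if bus_activity is not None:
        bus_log.append(bus_activity)
print(state, bus_log)
```

Snoop-induced transitions (inquire cycles from other bus masters) are deliberately left out of this table.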
Inquire Cycles (Snooping)
The purpose of inquire cycles is to check whether the address being presented is contained within the caches in the Pentium processor.
  • 22. Cache Organization
Within the cache, there are three basic types of organization:
1. Direct Mapped
2. Fully Associative
3. Set Associative
In fully associative mapping, when a request is made to the cache, the requested address is compared in a directory against all entries in the directory. If the requested address is found (a directory hit), the corresponding location in the cache is fetched and returned to the processor; otherwise, a miss occurs.
  • 23. [Figure: Fully Associative Cache]
In a direct mapped cache, lower order line address bits are used to access the directory. Since multiple line addresses map into the same location in the cache directory, the upper line address bits (tag bits) must be compared with the directory address to ensure a hit. If a comparison is not valid, the result is a cache miss, or simply a miss. The address given to the cache by the processor is actually subdivided into several pieces, each of which has a different role in accessing data.
  • 24. [Figure: Direct Mapped Cache]
The set associative cache operates in a fashion somewhat similar to the direct-mapped cache. Bits from the line address are used to address a cache directory. However, now there are multiple choices: two, four, or more complete line addresses may be present in the directory. Each of these line addresses corresponds to a location in a sub-cache. The collection of these sub-caches forms the total cache array. In a set associative cache, as in the direct-mapped cache, all of these sub-arrays can be accessed simultaneously, together with the cache directory. If any of the entries in the cache directory match the reference address, there is a hit, and the particular sub-cache array is selected and gated out to the processor.
[Figure: Set Associative Cache]
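A set associative lookup can be sketched as a small simulation. The model below is a 2-way cache that tracks tags only, with one LRU bit per set. The default sizes are borrowed from the Pentium figures quoted earlier, and the class and method names are illustrative rather than taken from any real design.

```python
class TwoWaySetAssociativeCache:
    """Tag-only model of a 2-way set associative cache with one LRU bit per set."""

    def __init__(self, num_sets=128, line_size=32):
        self.num_sets = num_sets
        self.line_size = line_size
        self.tags = [[None, None] for _ in range(num_sets)]  # two ways per set
        self.lru = [0] * num_sets  # which way to replace next in each set

    def access(self, address):
        """Return True on a hit. On a miss, replace the least recently used way."""
        line_address = address // self.line_size
        index = line_address % self.num_sets   # low-order line address bits
        tag = line_address // self.num_sets    # upper line address bits
        ways = self.tags[index]
        for way in (0, 1):
            if ways[way] == tag:
                self.lru[index] = 1 - way      # the other way is now LRU
                return True
        victim = self.lru[index]
        ways[victim] = tag
        self.lru[index] = 1 - victim
        return False


cache = TwoWaySetAssociativeCache()
a, b, c = 0x0000, 0x1000, 0x2000   # all three map to set 0
results = [cache.access(address) for address in (a, b, a, c, a, b)]
print(results)
```

In a direct mapped cache the second of these addresses would already have evicted the first. With two ways, `a` survives until a third conflicting line arrives, and even then the LRU bit selects `b` as the victim.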
  • 25. Cache Calculation
Address format: Tag | Line/Set | Byte/Block
Cache: 512 bytes, 2 sets (ways), $2^4$ lines per set, 16 bytes per line
Main memory: 16 Kbytes, $2^{10}$ lines, 16 bytes per line
Line size = 16 bytes = $2^4$, so the Byte/Block field is 4 bits.
Main memory size = 16 Kbytes = $2^{14}$ bytes, so 14 address lines are needed to address main memory.
Cache size = 512 bytes = $2^9$ bytes. With 2 sets (ways), each set holds $2^9 / 2 = 2^8$ bytes, which is $2^8 / 2^4 = 2^4$ lines, so the Line/Set field is 4 bits.
Tag size = (total number of lines in main memory) / (total number of lines in one cache set) = $2^{10} / 2^4 = 2^6$, so the Tag field is 6 bits.
Check: $2^{14}$ (Total) = $2^6$ (Tag) $\times$ $2^4$ (Line/Set) $\times$ $2^4$ (Block/Byte).
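The same calculation can be checked mechanically. The helper below is a sketch that assumes every size is a power of two. Its name and parameter names are illustrative. It returns the widths of the Tag, Line/Set, and Byte/Block fields in bits.

```python
def field_widths(memory_bytes, cache_bytes, ways, line_bytes):
    """Return (tag_bits, line_set_bits, byte_bits), assuming power-of-two sizes."""
    address_bits = memory_bytes.bit_length() - 1      # log2 of the memory size
    byte_bits = line_bytes.bit_length() - 1           # log2 of the line size
    lines_per_way = cache_bytes // (ways * line_bytes)
    line_set_bits = lines_per_way.bit_length() - 1
    tag_bits = address_bits - line_set_bits - byte_bits
    return tag_bits, line_set_bits, byte_bits


# The worked example: 16 Kbyte memory, 512-byte cache, 2 ways, 16-byte lines.
print(field_widths(16 * 1024, 512, 2, 16))
# The Pentium caches: 4 Gbyte address space, 8 Kbytes, 2 ways, 32-byte lines.
print(field_widths(2**32, 8 * 1024, 2, 32))
```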
  • 26. THE X87 FPU
FLOATING-POINT UNIT
The floating-point unit (FPU) of the Pentium processor is integrated with the integer unit on the first five stages of the U pipeline. The fifth stage, WB, becomes X1. The FPU is heavily pipelined. It is designed to be able to accept one floating-point operation every clock. It can receive up to two floating-point instructions every clock, one of which must be an exchange instruction.
Floating-Point Pipeline Stages
The Pentium processor FPU has 8 pipeline stages, the first five of which it shares with the integer unit. Integer instructions pass through only the first 5 stages. Integer instructions use the fifth (X1) stage as a WB (write-back) stage. The 8 FP pipeline stages, and the activities that are performed in them, are summarized below:
PF  Prefetch
D1  Instruction decode
D2  Address generation
EX  Memory and register read; conversion of FP data to external memory format and memory write
X1  Floating-point execute stage one; conversion of external memory format to internal FP data format and write operand to FP register file; bypass 1 (bypass 1 described in the “Bypasses” section)
X2  Floating-point execute stage two
WF  Perform rounding and write floating-point result to register file; bypass 2 (bypass 2 described in the “Bypasses” section)
ER  Error reporting/update status word
FPU Bypasses
The Pentium processor stack architecture instruction set requires that all instructions have one source operand on the top of the stack. Since most instructions also have their destination as the top of the stack, most instructions see a “top of stack bottleneck.” New source operands must be brought to the top of the stack before an arithmetic instruction can be issued on them. This calls for extra usage of the exchange instruction, which allows the programmer to bring an available operand to the top of the stack. The following section describes the floating-point register file bypasses that exist on the Pentium processor. The register file has two write ports and two read ports. The read ports are used to read data out of the register file in the E stage. One write port is used to write data into the register file in the X1 stage, and the other in the WF stage. A bypass allows data that is about to be written into the register file to be available as an operand that is to be read from the register file by any succeeding floating-point instruction. A bypass is specified by a pair of ports (a write port and a read port) that get circumvented. Using the bypass, data is made available even before actually writing it to the register file.
  • 27. The following procedures are implemented:
1. Bypass the X1 stage register file write port and the E stage register file read port.
2. Bypass the WF stage register file write port and the E stage register file read port.
With bypass 1, the result of a floating-point load (that writes to the register file in the X1 stage) can bypass the X1 stage write and be sent directly to the operand fetch stage or E stage of the next instruction. With bypass 2, the result of any arithmetic operation can bypass the WF stage write to the register file, and be sent directly to the desired execution unit as an operand for the next instruction.
PROGRAMMING WITH THE x87 FPU
The x87 Floating-Point Unit (FPU) provides high-performance floating-point processing capabilities for use in graphics processing, scientific, engineering, and business applications. It supports the floating-point, integer, and packed BCD integer data types and the floating-point processing algorithms and exception handling architecture defined in the IEEE Standard 754 for Binary Floating-Point Arithmetic.
X87 FPU EXECUTION ENVIRONMENT
The x87 FPU represents a separate execution environment within the IA-32. This execution environment consists of eight data registers (called the x87 FPU data registers) and the following special-purpose registers:
• Status register
• Control register
• Tag word register
• Last instruction pointer register
• Last data (operand) pointer register
• Opcode register
These registers are described in the following sections.
x87 FPU Data Registers
The x87 FPU data registers consist of eight 80-bit registers. Values are stored in these registers in the double extended-precision floating-point format. When floating-point, integer, or packed BCD integer values are loaded from memory into any of the x87 FPU data registers, the values are automatically converted into double extended-precision floating-point format (if they are not already in that format). When computation results are subsequently transferred back into memory from any of the x87 FPU registers, the results can be left in the double extended-precision floating-point format or converted back into a shorter floating-point format, an integer format, or the packed BCD integer format.
  • 28. [Figure: x87 FPU Execution Environment]
The x87 FPU instructions treat the eight x87 FPU data registers as a register stack. All addressing of the data registers is relative to the register on the top of the stack. The register number of the current top-of-stack register is stored in the TOP (stack TOP) field in the x87 FPU status word. Load operations decrement TOP by one and load a value into the new top-of-stack register, and store operations store the value from the current TOP register in memory and then increment TOP by one. (For the x87 FPU, a load operation is equivalent to a push and a store operation is equivalent to a pop.) Note that load and store operations are also available that do not push and pop the stack.
[Figure: x87 FPU Data Register Stack]
  • 29. If a load operation is performed when TOP is at 0, register wraparound occurs and the new value of TOP is set to 7. The floating-point stack-overflow exception indicates when wraparound might cause an unsaved value to be overwritten. Many floating-point instructions have several addressing modes that permit the programmer to implicitly operate on the top of the stack, or to explicitly operate on specific registers relative to the TOP. Assemblers support these register addressing modes, using the expression ST(0), or simply ST, to represent the current stack top and ST(i) to specify the ith register from TOP in the stack (0 ≤ i ≤ 7). For example, if TOP contains 011B (register 3 is the top of the stack), the following instruction would add the contents of two registers in the stack (registers 3 and 5):
    FADD ST, ST(2);
The figure shows an example of how the stack structure of the x87 FPU registers and instructions are typically used to perform a series of computations. Here, a two-dimensional dot product is computed, as follows:
1. The first instruction (FLD value1) decrements the stack register pointer (TOP) and loads the value 5.6 from memory into ST(0). The result of this operation is shown in snapshot (a).
2. The second instruction multiplies the value in ST(0) by the value 2.4 from memory and stores the result in ST(0), shown in snapshot (b).
3. The third instruction decrements TOP and loads the value 3.8 in ST(0).
4. The fourth instruction multiplies the value in ST(0) by the value 10.3 from memory and stores the result in ST(0), shown in snapshot (c).
5. The fifth instruction adds the value in ST(0) and the value in ST(1) and stores the result in ST(0), shown in snapshot (d).
[Figure: Example x87 FPU Dot Product Computation]
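The five steps of the dot product can be replayed on a toy model of the register stack. This is a sketch only. It uses Python floats rather than the 80-bit format, it ignores tags and exceptions, and the starting value of TOP (0) is an assumption chosen to show the wraparound to 7. The class and method names are illustrative.

```python
class X87Stack:
    """Toy model of the eight-register x87 data stack addressed relative to TOP."""

    def __init__(self, top=0):
        self.regs = [0.0] * 8
        self.top = top

    def physical(self, i):
        """Physical register number of ST(i)."""
        return (self.top + i) % 8

    def fld(self, value):
        """Load (push): decrement TOP, wrapping from 0 to 7, then write ST(0)."""
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def fmul_mem(self, value):
        """Multiply ST(0) by a memory operand and store the result in ST(0)."""
        self.regs[self.top] *= value

    def fadd_st(self, i):
        """Add ST(i) to ST(0) and store the result in ST(0)."""
        self.regs[self.top] += self.regs[self.physical(i)]


fpu = X87Stack()
fpu.fld(5.6)        # step 1: ST(0) = 5.6
fpu.fmul_mem(2.4)   # step 2: ST(0) = 5.6 * 2.4
fpu.fld(3.8)        # step 3: push 3.8; the first product is now ST(1)
fpu.fmul_mem(10.3)  # step 4: ST(0) = 3.8 * 10.3
fpu.fadd_st(1)      # step 5: ST(0) = ST(0) + ST(1)
print(round(fpu.regs[fpu.top], 2))
```

The same `physical` helper reproduces the FADD ST, ST(2) example: with TOP at 3, ST(2) resolves to register 5.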
  • 30. MICROPROCESSOR INITIALIZATION AND CONFIGURATION
Before normal operation of the Pentium processor can begin, the Pentium processor must be initialized by driving the RESET pin active. The RESET pin forces the Pentium processor to begin execution in a known state. Several features are optionally invoked at the falling edge of RESET: Built-In Self-Test (BIST), Functional Redundancy Checking and Tristate Test Mode. In addition to the standard RESET pin, the Pentium processor has implemented an initialization pin (INIT) that allows the processor to begin execution in a known state without disrupting the contents of the internal caches or the floating-point state.
POWER UP SPECIFICATIONS
During power up, RESET must be asserted while VCC is approaching nominal operating voltage to prevent internal bus contention which could negatively affect the reliability of the processor. It is recommended that CLK begin toggling within 150 ms after VCC reaches its proper operating level. This recommendation is only to ensure long term reliability of the device. In order for RESET to be recognized, the CLK input needs to be toggling. RESET must remain asserted for 1 millisecond after VCC and CLK have reached their AC/DC specifications.
TEST AND CONFIGURATION FEATURES (BIST, FRC, TRISTATE TEST MODE)
The INIT, FLUSH#, and FRCMC# inputs are sampled when RESET transitions from high to low to determine if BIST will be run, or if tristate test mode or checker mode will be entered (respectively). If RESET is driven synchronously, these signals must be at their valid level and meet setup and hold times on the clock before the falling edge of RESET. If RESET is asserted asynchronously, these signals must be at their valid level two clocks before and after RESET transitions from high to low.
Built-In Self-Test
Self-test is initiated by driving the INIT pin high when RESET transitions from high to low. No bus cycles are run by the Pentium processor during self-test. The duration of self-test is approximately 2^19 core clocks. Approximately 70% of the devices in the Pentium processor are tested by BIST. The Pentium processor BIST consists of two parts: hardware self-test and microcode self-test. During the hardware portion of BIST, the microcode ROM and all large PLAs are tested. All possible input combinations of the microcode ROM and PLAs are tested. The constant ROMs, BTB, TLBs, and all caches are tested by the microcode portion of BIST. The array tests (caches, TLBs and BTB) have two passes. On the first pass, data patterns are written to arrays, read back and checked for mismatches. The second pass writes the complement of the initial data pattern, reads it back, and checks for mismatches. The constant ROMs are tested by using the microcode to add various constants and check the result against a stored value.
  • 31. Upon successful completion of BIST, the cumulative result of all tests is stored in the EAX register. If EAX contains 0h, then all checks passed; any non-zero result indicates a faulty unit.
Tristate Test Mode
When the FLUSH# pin is sampled low when RESET transitions from high to low, the Pentium processor enters tristate test mode. The Pentium processor floats all of its output pins and bidirectional pins, including pins which are never floated during normal operation (except TDO). Tristate test mode can be initiated in order to facilitate testing of board interconnects by external circuitry. The Pentium processor remains in tristate test mode until the RESET pin is asserted again.
Functional Redundancy Checking
The functional redundancy checking master/checker configuration input is sampled when RESET is high to determine whether the Pentium processor is configured in master mode (FRCMC# high) or checker mode (FRCMC# low). The final master/checker configuration of the Pentium processor is determined the clock before the falling edge of RESET. When configured as a master, the Pentium processor drives its output pins as required by the bus protocol. When configured as a checker, the Pentium processor tristates all outputs (except IERR#, PICD0, PICD1 and TDO) and samples the output pins (that would normally be driven in master mode). If the sampled value differs from the value computed internally, the Pentium processor asserts IERR# to indicate an error.
INITIALIZATION WITH RESET, INIT AND BIST
Two pins, RESET and INIT, are used to reset the Pentium processor in different manners. A “cold” or “power on” RESET refers to the assertion of RESET while power is initially being applied to the Pentium processor. A “warm” RESET refers to the assertion of RESET or INIT while VCC and CLK remain within specified operating limits. Table 3-1 shows the effect of asserting RESET and/or INIT. Toggling either the RESET pin or the INIT pin individually forces the Pentium processor to begin execution at address FFFFFFF0h. The internal instruction cache and data cache are invalidated when RESET is asserted (modified lines in the data cache are NOT written back). The instruction cache and data cache are not altered when the INIT pin is asserted without RESET. In both cases, the branch target buffer (BTB) and translation lookaside buffers (TLBs) are invalidated. After RESET (with or without BIST) or INIT, the Pentium processor will start executing instructions at location FFFFFFF0h. When the first intersegment jump or call instruction is executed, address lines A20-A31 will be driven low for CS-relative memory cycles and the Pentium processor will only execute
  • 32. instructions in the lower one Mbyte of physical memory. This allows the system designer to use a ROM at the top of physical memory to initialize the system. RESET is internally hardwired and forces the Pentium processor to terminate all execution and bus cycle activity within 2 clocks. No instruction or bus activity will occur as long as RESET is active. INIT is implemented as an edge-triggered interrupt and will be recognized when an instruction boundary is reached. As soon as the Pentium processor completes the INIT sequence, instruction execution and bus cycle activity will continue at address FFFFFFF0h even if the INIT pin is not deasserted. At the conclusion of RESET (with or without self-test) or INIT, the DX register will contain a component identifier. The upper byte will contain 05h and the lower byte will contain a stepping identifier.
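As an illustration of how start-up code might interpret the register values left behind by reset, the sketch below decodes the DX component identifier and checks the EAX self-test result described for BIST. The function names and the sample stepping value 13h are hypothetical. Only the 05h upper byte and the meaning of EAX = 0h come from the text.

```python
def decode_component_id(dx):
    """Split the 16-bit DX value into (upper byte, stepping identifier)."""
    upper_byte = (dx >> 8) & 0xFF   # 05h on the Pentium processor
    stepping = dx & 0xFF
    return upper_byte, stepping


def bist_passed(eax):
    """EAX = 0h means every self-test check passed; non-zero means a fault."""
    return eax == 0


# Hypothetical register values captured immediately after RESET with BIST.
sample_dx = 0x0513   # the stepping value 13h is made up for this example
sample_eax = 0x00000000
print(decode_component_id(sample_dx), bist_passed(sample_eax))
```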
  • 33. BUS CYCLES
The Pentium processor bus is designed to support a 528-Mbyte/sec data transfer rate at 66 MHz. All data transfers occur as a result of one or more bus cycles.
PHYSICAL MEMORY AND I/O INTERFACE
Pentium processor memory is accessible in 8-, 16-, 32-, and 64-bit quantities. Pentium processor I/O is accessible in 8-, 16-, and 32-bit quantities. The Pentium processor can directly address up to 4 Gbytes of physical memory, and up to 64 Kbytes of I/O. In hardware, memory space is organized as a sequence of 64-bit quantities. Each 64-bit location has eight individually addressable bytes at consecutive memory addresses.
[Figure: Memory Organization]
The I/O space is organized as a sequence of 32-bit quantities. Each 32-bit quantity has four individually addressable bytes at consecutive memory addresses. See the figure for a conceptual diagram of the I/O space.
[Figure: I/O Space Organization]
  • 34. Sixty-four-bit memories are organized as arrays of physical quadwords (8-byte words). Physical quadwords begin at addresses evenly divisible by 8. The quadwords are addressable by physical address lines A31-A3. Thirty-two-bit memories are organized as arrays of physical dwords (4-byte words). Physical dwords begin at addresses evenly divisible by 4. The dwords are addressable by physical address lines A31-A3 and A2. A2 can be decoded from the byte enables. Sixteen-bit memories are organized as arrays of physical words (2-byte words). Physical words begin at addresses evenly divisible by 2.
DATA TRANSFER MECHANISM
All data transfers occur as a result of one or more bus cycles. Logical data operands of byte, word, dword, and quadword lengths may be transferred. Data may be accessed at any byte boundary, but two cycles may be required for misaligned data transfers. The Pentium processor considers a 2-byte or 4-byte operand that crosses a 4-byte boundary to be misaligned. In addition, an 8-byte operand that crosses an 8-byte boundary is misaligned. The Pentium processor address signals are split into two components. High-order address bits are provided by the address lines A31-A3. The byte enables BE7#-BE0# form the low-order address and select the appropriate bytes of the 8-byte data bus. For both memory and I/O accesses, the byte enable outputs indicate which of the associated data bus bytes are driven valid for write cycles and on which bytes data is expected back for read cycles. Non-contiguous byte enable patterns will never occur.
[Table: Generating A2-A0 from BE7#-BE0#]
Interfacing With 8-, 16-, 32-, and 64-Bit Memories
In 64-bit physical memories, each 8-byte quadword begins at a byte address that is a multiple of eight. A31-A3 are used as an 8-byte quadword select and BE7#-BE0# select individual bytes within the quadword.
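The alignment rules and the relationship between byte enables and A2-A0 can be sketched as follows. This is an illustrative model only. It represents asserted byte enables as set bits (bit i set means BE<i># is driven low) and it does not model how a misaligned operand is split across two cycles. All function names are assumptions.

```python
def is_misaligned(address, size):
    """Apply the rules above: 2- and 4-byte operands must not cross a 4-byte
    boundary, and 8-byte operands must not cross an 8-byte boundary."""
    if size in (2, 4):
        return (address % 4) + size > 4
    if size == 8:
        return address % 8 != 0
    return False   # a single byte cannot cross a boundary


def active_byte_lanes(address, size):
    """Mask of asserted byte enables for an aligned operand (bit i <-> BE<i>#)."""
    if is_misaligned(address, size):
        raise ValueError("misaligned operands need two bus cycles (not modelled)")
    first_lane = address % 8
    return sum(1 << lane for lane in range(first_lane, first_lane + size))


def a2_a0_from_lanes(lanes):
    """Recover A2-A0 as the number of the lowest asserted byte enable."""
    return (lanes & -lanes).bit_length() - 1


lanes = active_byte_lanes(0x1004, 4)   # an aligned dword in the upper half
print(hex(lanes), a2_a0_from_lanes(lanes))
```

External logic for a 32-bit memory only needs A2 from this result: it is 1 whenever the lowest asserted enable is BE4# or higher.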
  • 35. [Figure: Pentium® Processor with 64-Bit Memory]
The figure shows the Pentium processor data bus interface to 32-, 16- and 8-bit wide memories. External byte swapping logic is needed on the data lines so that data is supplied to and received from the Pentium processor on the correct data pins (see the table). For memory widths smaller than 64 bits, byte assembly logic is needed to return all bytes of data requested by the Pentium processor in one cycle.
[Figure: Addressing 32-, 16- and 8-Bit Memories]
  • 36. [Table: Data Bus Interface to 32-, 16- and 8-Bit Memories]
Operand alignment and size dictate when two cycles are required for a data transfer.
  • 37. BUS STATE DEFINITION
This section describes the Pentium processor bus states in detail. See the figure for the bus state diagram.
Ti: This is the bus idle state. In this state, no bus cycles are being run. The Pentium processor may or may not be driving the address and status pins, depending on the state of the HLDA, AHOLD, and BOFF# inputs. An asserted BOFF# or RESET will always force the state machine back to this state. HLDA will only be driven in this state.
T1: This is the first clock of a bus cycle. Valid address and status are driven out and ADS# is asserted. There is one outstanding bus cycle.
T2: This is the second and subsequent clock of the first outstanding bus cycle. In state T2, data is driven out (if the cycle is a write), or data is expected (if the cycle is a read), and the BRDY# pin is sampled. There is one outstanding bus cycle.
T12: This state indicates there are two outstanding bus cycles, and that the Pentium processor is starting the second bus cycle at the same time that data is being transferred for the first. In T12, the Pentium processor drives the address and status and asserts ADS# for the second outstanding bus cycle, while data is transferred and BRDY# is sampled for the first outstanding cycle.
T2P: This state indicates there are two outstanding bus cycles, and that both are in their second and subsequent clocks. In T2P, data is being transferred and BRDY# is sampled for the first outstanding cycle. The address, status and ADS# for the second outstanding cycle were driven sometime in the past (in state T12).
TD: This state indicates there is one outstanding bus cycle, that its address, status and ADS# have already been driven sometime in the past (in state T12), and that the data and BRDY# pins are not being sampled because the data bus requires one dead clock to turn around between consecutive reads and writes, or writes and reads. The Pentium processor enters TD if in the previous clock there were two outstanding cycles, the last BRDY# was returned, and a dead clock is needed. The timing diagrams in the next section give examples of when a dead clock is needed.
The table gives a brief summary of bus activity during each bus state. The figure shows the Pentium processor bus state diagram.
[Table: Pentium® Processor Bus Activity]
  • 38. [Figure: Pentium® Processor Bus Control State Machine]
  • 39. BUS CYCLES
The Pentium processor requests data transfer cycles, bus cycles, and bus operations. A data transfer cycle is one data item, up to 8 bytes in width, being returned to the Pentium processor or accepted from the Pentium processor with BRDY# asserted. A bus cycle begins with the Pentium processor driving an address and status and asserting ADS#, and ends when the last BRDY# is returned. A bus cycle may have 1 or 4 data transfers. A burst cycle is a bus cycle with 4 data transfers. A bus operation is a sequence of bus cycles to carry out a specific function, such as a locked read-modify-write or an interrupt acknowledge.
Single-Transfer Cycle
The Pentium processor supports a number of different types of bus cycles. The simplest type of bus cycle is a single-transfer non-cacheable 64-bit cycle, either with or without wait states. Non-pipelined read and write cycles with 0 wait states are shown in the figure.
[Figure: Non Pipelined Read or Write]
  • 40. The Pentium processor initiates a cycle by asserting the address status signal (ADS#) in the first clock. The clock in which ADS# is asserted is by definition the first clock in the bus cycle. The ADS# output indicates that a valid bus cycle definition and address are available on the cycle definition pins and the address bus. The CACHE# output is deasserted (high) to indicate that the cycle will be a single-transfer cycle.
For a zero wait state transfer, BRDY# is returned by the external system in the second clock of the bus cycle. BRDY# indicates that the external system has presented valid data on the data pins in response to a read, or that the external system has accepted data in response to a write. The Pentium processor samples the BRDY# input in the second and subsequent clocks of a bus cycle.
If the system is not ready to drive or accept data, wait states can be added to these cycles by not returning BRDY# to the processor at the end of the second clock. Cycles of this type, with one and two wait states added, are shown in the figure. Note that to insert wait states, BRDY# must be driven inactive at the end of the second clock.
Burst Cycles
For bus cycles that require more than a single data transfer (cacheable cycles and writeback cycles), the Pentium processor uses the burst data transfer. In burst transfers, a new data item can be sampled or driven by the Pentium processor in consecutive clocks. In addition, the addresses of the data items in burst cycles all fall within the same 32-byte aligned area (corresponding to an internal Pentium processor cache line).
Burst cycles are implemented via the BRDY# pin. While running a bus cycle of more than one data transfer, the Pentium processor requires that the memory system perform a burst transfer and follow the burst order given in the table "Pentium Processor Burst Order". Given the first address in the burst sequence, the address of subsequent transfers must be calculated by external hardware. This requirement exists because the Pentium processor address and byte enables are asserted for the first transfer and are not re-driven for each transfer. The burst sequence is optimized for two-bank memory subsystems.
[Table: Pentium Processor Burst Order]
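The address calculation that external hardware has to perform can be sketched in a few lines. This is a minimal model, not a description of any real chipset: the function name `burst_addresses` is hypothetical, and the XOR pattern is an assumption based on the commonly documented Pentium burst order (0-8-10-18, 8-0-18-10, 10-18-0-8, 18-10-8-0), since the burst-order table itself is not reproduced in this text.

```python
# Hypothetical helper modelling the four addresses of one Pentium burst.
# Assumption: the burst order follows the XOR pattern used below; the
# table from the original slides is not available in this transcript.

LINE_SIZE = 32      # bytes in one cache line (from the text)
TRANSFER_SIZE = 8   # bytes moved per data transfer on the 64-bit bus


def burst_addresses(first_address):
    """Return the four 8-byte-aligned transfer addresses, in burst order."""
    line_base = first_address & ~(LINE_SIZE - 1)
    # Offset of the first transfer inside the line, rounded down to 8 bytes.
    first_offset = first_address & (LINE_SIZE - 1) & ~(TRANSFER_SIZE - 1)
    return [line_base | (first_offset ^ step) for step in (0x00, 0x08, 0x10, 0x18)]


if __name__ == "__main__":
    for start in (0x1000, 0x1008, 0x1010, 0x1018):
        print(hex(start), "->", [hex(a) for a in burst_addresses(start)])
```

All four addresses stay inside the same 32-byte aligned line, and each pair of consecutive transfers alternates between the two 8-byte halves of a 16-byte block, which is what makes the order convenient for two-bank memory.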
  • 41. BURST READ CYCLES
When initiating any read, the Pentium processor will present the address and byte enables for the data item requested. When the cycle is converted into a cache linefill, the first data item returned should correspond to the address sent out by the Pentium processor; however, the byte enables should be ignored, and valid data must be returned on all 64 data lines. In addition, the address of the subsequent transfers in the burst sequence must be calculated by external hardware, since the address and byte enables are not re-driven for each transfer.
The figure "Basic Burst Read Cycle" shows a cacheable burst read cycle. Note that in this case the initial cycle generated by the Pentium processor might have been satisfied by a single data transfer, but was transformed into a multiple-transfer cache fill by KEN# being returned active on the clock in which the first BRDY# is returned. KEN# has this effect here because the cycle is internally cacheable in the Pentium processor (the CACHE# pin is driven active). KEN# is only sampled once during a cycle to determine cacheability.
  • 42. BURST WRITE CYCLES
The figure "Basic Burst Write Cycle" shows the timing diagram of a basic burst write cycle. KEN# is ignored in a burst write cycle. If the CACHE# pin is active (low) during a write cycle, it indicates that the cycle will be a burst writeback cycle. Burst write cycles are always writebacks of modified lines in the data cache. Writeback cycles have several causes:
1. Writeback due to replacement of a modified line in the data cache.
2. Writeback due to an inquire cycle that hits a modified line in the data cache.
3. Writeback due to an internal snoop that hits a modified line in the data cache.
4. Writebacks caused by asserting the FLUSH# pin.
5. Writebacks caused by executing the WBINVD instruction.
The only write cycles that are burstable by the Pentium processor are writeback cycles. All other write cycles will be single-transfer bus cycles of 64 bits or less.
For writeback cycles, the lower five bits of the first burst address are always zero; therefore, the burst order becomes 0, 8h, 10h, and 18h. Again, note that the address of the subsequent transfers in the burst sequence must be calculated by external hardware, since the Pentium processor does not drive the address and byte enables for each transfer.
  • 43. Locked Operations
The Pentium processor architecture provides a facility to perform atomic accesses of memory. For example, a programmer can change the contents of a memory-based variable and be assured that the variable was not accessed by another bus master between the read of the variable and the update of that variable. This functionality is provided for select instructions using a LOCK prefix, and also for instructions that implicitly perform locked read-modify-write cycles, such as the XCHG (exchange) instruction when one of its operands is memory based. Locked cycles are also generated when a segment descriptor or page table entry is updated and during interrupt acknowledge cycles.
In hardware, the LOCK functionality is implemented through the LOCK# pin, which indicates to the outside world that the Pentium processor is performing a read-modify-write sequence of cycles, and that the Pentium processor should be allowed atomic access to the location that was accessed with the first locked cycle. Locked operations begin with a read cycle and end with a write cycle. Note that the data width read is not necessarily the data width written. For example, for descriptor access bit updates the Pentium processor fetches eight bytes and writes one byte. A locked operation is a combination of one or multiple read cycles followed by one or multiple write cycles. Programmer-generated locked cycles and locked page table / directory accesses are treated differently and are described in the following sections.
Snooping (Inquire)
When operating in a multiprocessor (MP) system, IA-32 processors (beginning with the Intel486 processor) have the ability to snoop other processors' accesses to system memory and to their internal caches. They use this snooping ability to keep their internal caches consistent both with system memory and with the caches in other processors on the bus. For example, in the Pentium and P6 family processors, if through snooping one processor detects that another processor intends to write to a memory location that it currently has cached in the shared state, the snooping processor will invalidate its cache line, forcing it to perform a cache line fill the next time it accesses the same memory location.
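The invalidate-on-snoop behaviour in the example above can be illustrated with a toy model. Everything here is an assumption made for illustration: the class `SnoopingCache`, the function `bus_write`, the state strings, and the single shared bus are hypothetical, and the model covers only the shared-line case described in the text (it does not model writebacks of modified lines).

```python
# Toy write-invalidate model. All names are illustrative assumptions;
# this is not how the Pentium's cache logic is actually implemented.

class SnoopingCache:
    def __init__(self, name):
        self.name = name
        self.lines = {}       # line address -> "shared" or "modified"
        self.line_fills = 0   # number of cache line fills performed

    def read(self, address):
        """Read a location; a miss costs one cache line fill."""
        if address not in self.lines:
            self.line_fills += 1
            self.lines[address] = "shared"

    def snoop_write(self, address):
        """Another bus master is about to write to this address."""
        if self.lines.get(address) == "shared":
            del self.lines[address]   # invalidate our copy


def bus_write(writer, others, address):
    """Writer announces the write; every other cache snoops it."""
    for cache in others:
        cache.snoop_write(address)
    writer.lines[address] = "modified"


if __name__ == "__main__":
    cpu0, cpu1 = SnoopingCache("cpu0"), SnoopingCache("cpu1")
    cpu0.read(0x40)                 # first access: line fill
    bus_write(cpu1, [cpu0], 0x40)   # cpu0 snoops the write and invalidates
    cpu0.read(0x40)                 # next access needs another line fill
    print(cpu0.line_fills)
```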
  • 44. REGISTER SET
[Figure: Alternate General Purpose Register Names]
  • 45.
    • I/O ports: The IA-32 architecture supports transfers of data to and from input/output (I/O) ports.
    • Control registers: The five control registers (CR0 through CR4) determine the operating mode of the processor and the characteristics of the currently executing task.
    • Memory management registers: The GDTR, IDTR, task register, and LDTR specify the locations of data structures used in protected mode memory management.
    • Debug registers: The debug registers (DR0 through DR7) control and allow monitoring of the processor's debugging operations.
BASIC PROGRAM EXECUTION REGISTERS
The processor provides 16 basic program execution registers for use in general system and application programming (see figure). These registers can be grouped as follows:
    • General-purpose registers. These eight registers are available for storing operands and pointers.
    • Segment registers. These registers hold up to six segment selectors.
    • EFLAGS (program status and control) register. The EFLAGS register reports on the status of the program being executed and allows limited (application-program level) control of the processor.
    • EIP (instruction pointer) register. The EIP register contains a 32-bit pointer to the next instruction to be executed.
The general-purpose registers have the following conventional uses:
    • EAX: Accumulator for operands and results data
    • EBX: Pointer to data in the DS segment
    • ECX: Counter for string and loop operations
    • EDX: I/O pointer
    • ESI: Pointer to data in the segment pointed to by the DS register; source pointer for string operations
    • EDI: Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations
    • ESP: Stack pointer (in the SS segment)
    • EBP: Pointer to data on the stack (in the SS segment)
As shown in the figure "Alternate General Purpose Register Names", the lower 16 bits of the general-purpose registers map directly to the register set found in the 8086 and Intel 286 processors and can be referenced with the names AX, BX, CX, DX, BP, SI, DI, and SP.
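The overlap between a 32-bit register and its shorter names can be sketched with bit masks. This is only a model of the naming scheme: the function `sub_registers` is a hypothetical helper, and it shows the EAX family (AX, plus the byte names AH and AL covered in the next sentence) as a representative case.

```python
# Hypothetical helper: view one 32-bit register value through its
# 16-bit and 8-bit alternate names (EAX -> AX, AH, AL).

def sub_registers(eax):
    """Return the AX, AH, and AL views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF            # keep the value within 32 bits
    ax = eax & 0xFFFF            # lower 16 bits
    return {
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,  # high byte of AX (bits 8 through 15)
        "AL": ax & 0xFF,         # low byte of AX (bits 0 through 7)
    }


if __name__ == "__main__":
    views = sub_registers(0x12345678)
    print({name: hex(value) for name, value in views.items()})
```

Writing to one of the shorter names on real hardware changes only those bits of the full register; the model above covers the read direction only.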
Each of the lower two bytes of the EAX, EBX, ECX, and EDX registers can be referenced by the names AH, BH, CH, and DH (high bytes) and AL, BL, CL, and DL (low bytes).
DATA TYPES
This section introduces the data types defined for the IA-32 architecture.
FUNDAMENTAL DATA TYPES
The fundamental data types of the IA-32 architecture are bytes, words, doublewords, quadwords, and double quadwords (see figure). A byte is eight bits, a word is 2 bytes (16 bits), a doubleword is 4 bytes (32 bits), a quadword is 8 bytes (64 bits), and a double quadword is 16 bytes (128 bits).
  • 46. A subset of the IA-32 architecture instructions operates on these fundamental data types without any additional operand typing. The figure "Bytes, Words, Doublewords, Quadwords, and Double Quadwords in Memory" shows the byte order of each of the fundamental data types when referenced as operands in memory. The low byte (bits 0 through 7) of each data type occupies the lowest address in memory, and that address is also the address of the operand.
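The byte order described above (low byte at the lowest address) can be checked with Python's built-in integer conversions. The helper name `memory_image` is hypothetical; `int.to_bytes` and `int.from_bytes` are standard library calls.

```python
# Sketch of IA-32 byte order: the low byte of an operand is stored at
# the lowest memory address. memory_image is an illustrative name.

SIZES = {"byte": 1, "word": 2, "doubleword": 4, "quadword": 8, "double quadword": 16}


def memory_image(value, type_name):
    """Return the bytes of value as they would appear at rising addresses."""
    return value.to_bytes(SIZES[type_name], byteorder="little")


if __name__ == "__main__":
    image = memory_image(0x12345678, "doubleword")
    print([hex(b) for b in image])                    # lowest address first
    print(hex(int.from_bytes(image, byteorder="little")))
```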
  • 47. Alignment of Words, Doublewords, Quadwords, and Double Quadwords
Words, doublewords, and quadwords do not need to be aligned in memory on natural boundaries. The natural boundaries for words, doublewords, and quadwords are even-numbered addresses, addresses evenly divisible by four, and addresses evenly divisible by eight, respectively. However, to improve the performance of programs, data structures (especially stacks) should be aligned on natural boundaries whenever possible. The reason is that the processor requires two memory accesses to make an unaligned memory access, whereas an aligned access requires only one. A word or doubleword operand that crosses a 4-byte boundary, or a quadword operand that crosses an 8-byte boundary, is considered unaligned and requires two separate memory bus cycles for access.
Some instructions that operate on double quadwords require memory operands to be aligned on a natural boundary. These instructions generate a general-protection exception (#GP) if an unaligned operand is specified. A natural boundary for a double quadword is any address evenly divisible by 16. Other instructions that operate on double quadwords permit unaligned access (without generating a general-protection exception). However, additional memory bus cycles are required to access unaligned data from memory.
NUMERIC DATA TYPES
Although bytes, words, and doublewords are the fundamental data types of the IA-32 architecture, some instructions support additional interpretations of these data types to allow operations to be performed on numeric data types (signed and unsigned integers, and floating-point numbers). See the figure "Numeric Data Types".
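Returning to the alignment rule stated earlier on this slide, the boundary-crossing test can be written out directly. This is a sketch of the rule as worded in the text (words and doublewords against a 4-byte boundary, quadwords against an 8-byte boundary); the function names are hypothetical and the model ignores caches and double quadwords.

```python
# Sketch of the boundary-crossing rule from the text. Function names are
# illustrative; only word (2), doubleword (4), and quadword (8) operands
# are modelled.

def is_naturally_aligned(address, size):
    """True if the address is evenly divisible by the operand size."""
    return address % size == 0


def bus_cycles_needed(address, size):
    """Return 1 if the operand stays within one boundary block, else 2."""
    if size not in (2, 4, 8):
        raise ValueError("size must be 2, 4, or 8 bytes")
    boundary = 8 if size == 8 else 4
    first_block = address // boundary
    last_block = (address + size - 1) // boundary
    return 1 if first_block == last_block else 2


if __name__ == "__main__":
    for address, size in ((0x1000, 4), (0x1002, 4), (0x1001, 2), (0x1003, 2), (0x1004, 8)):
        print(hex(address), size, bus_cycles_needed(address, size))
```

Note that a word at an odd address is not on its natural boundary, yet by the rule above it still needs only one bus cycle unless it straddles a 4-byte boundary.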
  • 48. [Figure: Numeric Data Types]
OPERAND ADDRESSING
IA-32 machine instructions act on zero or more operands. Some operands are specified explicitly and others are implicit. The data for a source operand can be located in:
    • the instruction itself (an immediate operand)
    • a register
    • a memory location
    • an I/O port
When an instruction returns data to a destination operand, it can be returned to:
    • a register
    • a memory location
    • an I/O port
Immediate Operands
Some instructions use data encoded in the instruction itself as a source operand. These operands are called immediate operands (or simply immediates). For example, the following ADD instruction adds an immediate value of 14 to the contents of the EAX register:
    ADD EAX, 14
  • 49. All arithmetic instructions (except the DIV and IDIV instructions) allow the source operand to be an immediate value. The maximum value allowed for an immediate operand varies among instructions, but can never be greater than the maximum value of an unsigned doubleword integer (2^32 - 1).
Register Operands
Source and destination operands can be any of the following registers, depending on the instruction being executed:
    • 32-bit general-purpose registers (EAX, EBX, ECX, EDX, ESI, EDI, ESP, or EBP)
    • 16-bit general-purpose registers (AX, BX, CX, DX, SI, DI, SP, or BP)
    • 8-bit general-purpose registers (AH, BH, CH, DH, AL, BL, CL, or DL)
    • segment registers (CS, DS, SS, ES, FS, and GS)
    • EFLAGS register
    • x87 FPU registers (ST0 through ST7, status word, control word, tag word, data operand pointer, and instruction pointer)
Some instructions (such as the DIV and MUL instructions) use quadword operands contained in a pair of 32-bit registers. Register pairs are represented with a colon separating them. For example, in the register pair EDX:EAX, EDX contains the high-order bits and EAX contains the low-order bits of a quadword operand.
Several instructions (such as the PUSHFD and POPFD instructions) are provided to load and store the contents of the EFLAGS register or to set or clear individual flags in this register. Other instructions (such as the Jcc instructions) use the state of the status flags in the EFLAGS register as condition codes for branching or other decision-making operations.
The processor contains a selection of system registers that are used to control memory management, interrupt and exception handling, task management, processor management, and debugging activities. Some of these system registers are accessible by an application program, the operating system, or the executive through a set of system instructions. When accessing a system register with a system instruction, the register is generally an implied operand of the instruction.
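The EDX:EAX register-pair convention described above can be modelled as splitting and joining a 64-bit value. The helper names `join_pair`, `split_pair`, and `mul32` are hypothetical; `mul32` models only the unsigned 32-bit form of MUL, whose 64-bit product is returned in EDX:EAX.

```python
# Sketch of the EDX:EAX register pair. Helper names are illustrative.

MASK32 = 0xFFFFFFFF


def join_pair(edx, eax):
    """Combine EDX (high-order bits) and EAX (low-order bits) into a quadword."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


def split_pair(value):
    """Split a quadword into (EDX, EAX)."""
    return (value >> 32) & MASK32, value & MASK32


def mul32(eax, operand):
    """Model unsigned MUL with a 32-bit operand: product goes to EDX:EAX."""
    return split_pair((eax & MASK32) * (operand & MASK32))


if __name__ == "__main__":
    edx, eax = mul32(0xFFFFFFFF, 2)
    print(hex(edx), hex(eax), hex(join_pair(edx, eax)))
```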
Memory Operands
Source and destination operands in memory are referenced by means of a segment selector and an offset (see the figure "Memory Operand Address"). Segment selectors specify the segment containing the operand. Offsets specify the linear or effective address of the operand. Offsets can be 32 bits (represented by the notation m16:32) or 16 bits (represented by the notation m16:16).
Specifying a Segment Selector
The segment selector can be specified either implicitly or explicitly. The most common method of specifying a segment selector is to load it in a segment register and then allow the processor to select the register implicitly, depending on the type of operation being performed. The processor automatically chooses a segment according to the rules given in the table "Default Segment Selection Rules".
  • 50. When storing data in memory or loading data from memory, the DS segment default can be overridden to allow other segments to be accessed. Within an assembler, the segment override is generally handled with a colon ":" operator. For example, the following MOV instruction moves a value from register EAX into the segment pointed to by the ES register. The offset into the segment is contained in the EBX register:
    MOV ES:[EBX], EAX
At the machine level, a segment override is specified with a segment-override prefix, which is a byte placed at the beginning of an instruction. The following default segment selections cannot be overridden:
    • Instruction fetches must be made from the code segment.
    • Destination strings in string instructions must be stored in the data segment pointed to by the ES register.
    • Push and pop operations must always reference the SS segment.
Some instructions require a segment selector to be specified explicitly. In these cases, the 16-bit segment selector can be located in a memory location or in a 16-bit register. For example, the following MOV instruction moves a segment selector located in register BX into segment register DS:
    MOV DS, BX
Segment selectors can also be specified explicitly as part of a 48-bit far pointer in memory. Here, the first doubleword in memory contains the offset and the next word contains the segment selector.
Specifying an Offset
The offset part of a memory address can be specified directly as a static value (called a displacement) or through an address computation made up of one or more of the following components:
    • Displacement: An 8-, 16-, or 32-bit value.
    • Base: The value in a general-purpose register.
    • Index: The value in a general-purpose register.
    • Scale factor: A value of 2, 4, or 8 that is multiplied by the index value.
  • 51. The offset that results from adding these components is called an effective address. Each of these components can have either a positive or negative (2s complement) value, with the exception of the scaling factor. The figure "Offset (or Effective Address) Computation" shows all the possible ways that these components can be combined to create an effective address in the selected segment.
The uses of general-purpose registers as base or index components are restricted in the following manner:
    • The ESP register cannot be used as an index register.
    • When the ESP or EBP register is used as the base, the SS segment is the default segment. In all other cases, the DS segment is the default segment.
The base, index, and displacement components can be used in any combination, and any of these components can be null. A scale factor may be used only when an index is also used. Each possible combination is useful for data structures commonly used by programmers in high-level languages and assembly language. The following addressing modes suggest uses for common combinations of address components.
    • Displacement: A displacement alone represents a direct (uncomputed) offset to the operand. Because the displacement is encoded in the instruction, this form of an address is sometimes called an absolute or static address. It is commonly used to access a statically allocated scalar operand.
    • Base: A base alone represents an indirect offset to the operand. Since the value in the base register can change, it can be used for dynamic storage of variables and data structures.
    • Base + Displacement: A base register and a displacement can be used together for two distinct purposes:
        • As an index into an array when the element size is not 2, 4, or 8 bytes: the displacement component encodes the static offset to the beginning of the array. The base register holds the results of a calculation to determine the offset to a specific element within the array.
        • To access a field of a record: the base register holds the address of the beginning of the record, while the displacement is a static offset to the field.
      An important special case of this combination is access to parameters in a procedure activation record. A procedure activation record is the stack frame created when a procedure is entered. Here, the EBP register is the best choice for the base register, because it automatically selects the stack segment. This is a compact encoding for this common function.
  • 52.
    • (Index ∗ Scale) + Displacement: This address mode offers an efficient way to index into a static array when the element size is 2, 4, or 8 bytes. The displacement locates the beginning of the array, the index register holds the subscript of the desired array element, and the processor automatically converts the subscript into an index by applying the scaling factor.
    • Base + Index + Displacement: Using two registers together supports either a two-dimensional array (the displacement holds the address of the beginning of the array) or one of several instances of an array of records (the displacement is an offset to a field within the record).
    • Base + (Index ∗ Scale) + Displacement: Using all the addressing components together allows efficient indexing of a two-dimensional array when the elements of the array are 2, 4, or 8 bytes in size.
I/O Port Addressing
The processor supports an I/O address space that contains up to 65,536 8-bit I/O ports. Ports that are 16-bit and 32-bit may also be defined in the I/O address space. An I/O port can be addressed with either an immediate operand or a value in the DX register.
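The offset computation and its restrictions, as described in the addressing-mode list above, can be summarised in a short model. This is a sketch under stated assumptions: the function names and the dictionary used as a register file are hypothetical, a scale of 1 is used here to mean "no scaling", and the result is wrapped to 32 bits.

```python
# Sketch of effective-address formation: base + (index * scale) + displacement.
# Names and the register-file dictionary are illustrative assumptions.

MASK32 = 0xFFFFFFFF


def effective_address(regs, base=None, index=None, scale=1, displacement=0):
    """Compute the 32-bit offset from the optional address components."""
    if index == "ESP":
        raise ValueError("ESP cannot be used as an index register")
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale factor must be 2, 4, or 8 (1 means unscaled)")
    if scale != 1 and index is None:
        raise ValueError("a scale factor may be used only with an index")
    offset = displacement                    # may be negative (2s complement)
    if base is not None:
        offset += regs[base]
    if index is not None:
        offset += regs[index] * scale
    return offset & MASK32


def default_segment(base=None):
    """SS when ESP or EBP is the base, DS in all other cases."""
    return "SS" if base in ("ESP", "EBP") else "DS"


if __name__ == "__main__":
    regs = {"EBX": 0x1000, "ESI": 3, "EBP": 0x7FF0}
    # Element 3 of an array of doublewords that starts 8 bytes into a record.
    print(hex(effective_address(regs, base="EBX", index="ESI", scale=4, displacement=8)))
    # A local variable 4 bytes below the frame pointer.
    print(default_segment("EBP"), hex(effective_address(regs, base="EBP", displacement=-4)))
```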