26. Instruction Set Architecture (ISA)
• Serves as an interface between software and
hardware.
• Provides a mechanism by which the software
tells the hardware what should be done.
High-level language code: C, C++, Java, Fortran
    ↓ compiler
Assembly language code: architecture-specific statements
    ↓ assembler
Machine language code: architecture-specific bit patterns
software → instruction set → hardware
CSCE430/830
ISA
27. Instruction Set Design Issues
• Instruction set design issues include:
– Where are operands stored?
» registers, memory, stack, accumulator
– How many explicit operands are there?
» 0, 1, 2, or 3
– How is the operand location specified?
» register, immediate, indirect, . . .
– What type & size of operands are supported?
» byte, int, float, double, string, vector. . .
– What operations are supported?
» add, sub, mul, move, compare . . .
28. Classifying ISAs
Accumulator (before 1960, e.g. 68HC11):
    1-address    add A              acc ← acc + mem[A]
Stack (1960s to 1970s):
    0-address    add                tos ← tos + next
Memory-Memory (1970s to 1980s):
    2-address    add A, B           mem[A] ← mem[A] + mem[B]
    3-address    add A, B, C        mem[A] ← mem[B] + mem[C]
Register-Memory (1970s to present, e.g. 80x86):
    2-address    add R1, A          R1 ← R1 + mem[A]
                 load R1, A         R1 ← mem[A]
Register-Register (Load/Store, RISC) (1960s to present, e.g. MIPS):
    3-address    add R1, R2, R3     R1 ← R2 + R3
                 load R1, R2        R1 ← mem[R2]
                 store R1, R2       mem[R1] ← R2
30. Code Sequence C = A + B
for Four Instruction Sets
Stack:
    Push A
    Push B
    Add
    Pop C
Accumulator:
    Load A
    Add B               acc = acc + mem[B]
    Store C
Register (register-memory):
    Load R1, A
    Add R1, B           R1 = R1 + mem[B]
    Store C, R1
Register (load-store):
    Load R1, A
    Load R2, B
    Add R3, R1, R2      R3 = R1 + R2
    Store C, R3
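The code sequences for C = A + B can be traced with a toy interpreter. The sketch below is illustrative only: the tuple-based program encoding and the names `run_stack`, `run_load_store`, `STACK_PROGRAM`, and `LOAD_STORE_PROGRAM` are assumptions made for this example, and a dictionary stands in for memory. It covers the stack and load-store styles.

```python
def run_stack(program, mem):
    """Run a 0-address (stack) program; operands are implicit on the stack."""
    stack = []
    for op, *args in program:
        if op == "push":                 # push mem[X] onto the stack
            stack.append(mem[args[0]])
        elif op == "add":                # tos <- next + tos
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "pop":                # mem[X] <- tos
            mem[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown stack op: {op}")
    return mem


def run_load_store(program, mem):
    """Run a load-store program; only load/store touch memory."""
    regs = {}
    for op, *args in program:
        if op == "load":                 # Rd <- mem[X]
            regs[args[0]] = mem[args[1]]
        elif op == "add":                # Rd <- Rs1 + Rs2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "store":              # mem[X] <- Rs
            mem[args[0]] = regs[args[1]]
        else:
            raise ValueError(f"unknown load-store op: {op}")
    return mem


STACK_PROGRAM = [("push", "A"), ("push", "B"), ("add",), ("pop", "C")]
LOAD_STORE_PROGRAM = [
    ("load", "R1", "A"),
    ("load", "R2", "B"),
    ("add", "R3", "R1", "R2"),
    ("store", "C", "R3"),
]

print(run_stack(STACK_PROGRAM, {"A": 2, "B": 3})["C"])
print(run_load_store(LOAD_STORE_PROGRAM, {"A": 2, "B": 3})["C"])
```

Both programs have four instructions, but the stack version names no registers while the load-store version names every operand explicitly.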
32. Types of Operations
• Arithmetic and Logic: AND, ADD
• Data Transfer: MOVE, LOAD, STORE
• Control: BRANCH, JUMP, CALL
• System: OS CALL, VM
• Floating Point: ADDF, MULF, DIVF
• Decimal: ADDD, CONVERT
• String: MOVE, COMPARE
• Graphics: (DE)COMPRESS
33. MIPS Instructions
• All instructions exactly 32 bits wide
• Different formats for different purposes
• Similarities in formats ease implementation
R-Format:  op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-Format:  op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)
J-Format:  op (6 bits) | address (26 bits)
(In each format, bit 31 is the leftmost bit and bit 0 the rightmost.)
ISA-2
34. MIPS Instruction Types
• Arithmetic & Logical - manipulate data in registers
    add $s1, $s2, $s3       $s1 = $s2 + $s3
    or $s3, $s4, $s5        $s3 = $s4 OR $s5
• Data Transfer - move register data to/from memory (load & store)
    lw $s1, 100($s2)        $s1 = Memory[$s2 + 100]
    sw $s1, 100($s2)        Memory[$s2 + 100] = $s1
• Branch - alter program flow
    beq $s1, $s2, 25        if ($s1 == $s2) PC = PC + 4 + 4*25
                            else PC = PC + 4
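The branch rule can be checked with a few lines of code. This is a minimal sketch; the function name `beq_next_pc` and the sample address 1000 are assumptions chosen for illustration. The offset counts 4-byte instructions relative to the instruction after the branch.

```python
def beq_next_pc(pc, rs_value, rt_value, offset):
    """Return the next PC after `beq rs, rt, offset` at byte address `pc`."""
    fall_through = pc + 4                    # address of the following instruction
    if rs_value == rt_value:
        return fall_through + 4 * offset     # taken: offset is in instructions
    return fall_through                      # not taken


print(beq_next_pc(1000, 7, 7, 25))   # taken branch
print(beq_next_pc(1000, 7, 8, 25))   # not-taken branch
```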
35. MIPS Arithmetic & Logical
Instructions
• Instruction usage (assembly)
    add dest, src1, src2    dest = src1 + src2
    sub dest, src1, src2    dest = src1 - src2
    and dest, src1, src2    dest = src1 AND src2
• Instruction characteristics
– Always 3 operands: destination + 2 sources
– Operand order is fixed
– Operands are always general purpose registers
• Design Principles:
– Design Principle 1: Simplicity favors regularity
– Design Principle 2: Smaller is faster
36. Arithmetic & Logical Instructions Binary Representation
op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
(bit 31 on the left, bit 0 on the right)
• Used for arithmetic, logical, shift instructions
– op: Basic operation of the instruction (opcode)
– rs: first register source operand
– rt: second register source operand
– rd: register destination operand
– shamt: shift amount (more about this later)
– funct: function - specific type of operation
• Also called “R-Format” or “R-Type” Instructions
37. Arithmetic & Logical Instructions Binary Representation Example
• Machine language for
add $8, $17, $18
• See reference card for op, funct values
Field:     op       rs       rt       rd       shamt    funct
Width:     6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
Decimal:   0        17       18       8        0        32
Binary:    000000   10001    10010    01000    00000    100000
(bit 31 on the left, bit 0 on the right)
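The field packing for `add $8, $17, $18` can be reproduced with shifts. The sketch below assumes the R-format field widths given in these slides; the helper name `encode_r_type` is made up for this example.

```python
def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack R-format fields into one 32-bit word, most significant field first."""
    fields = [(op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value       # append the field on the right
    return word


# add $8, $17, $18  ->  rd = 8, rs = 17, rt = 18, op = 0, funct = 32
word = encode_r_type(op=0, rs=17, rt=18, rd=8, shamt=0, funct=32)
print(f"{word:032b}")    # 00000010001100100100000000100000
print(f"0x{word:08X}")   # 0x02324020
```

Note that the assembly operand order (destination first) differs from the field order in the machine word, where `rd` comes after `rs` and `rt`.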
38. MIPS Data Transfer Instructions
• Transfer data between registers and memory
• Instruction format (assembly)
    lw $dest, offset($addr)     load word
    sw $src, offset($addr)      store word
• Uses:
– Accessing a variable in main memory
– Accessing an array element
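Array access with `lw` and `sw` comes down to base-plus-offset addressing. The following sketch models it under stated assumptions: memory is a dictionary keyed by byte address, words are 4 bytes, and the helper names `lw` and `sw`, the base address 2000, and the array contents are all invented for illustration.

```python
def lw(regs, mem, dest, offset, base):
    """lw $dest, offset($base):  $dest = Memory[$base + offset]"""
    regs[dest] = mem[regs[base] + offset]


def sw(regs, mem, src, offset, base):
    """sw $src, offset($base):  Memory[$base + offset] = $src"""
    mem[regs[base] + offset] = regs[src]


# A four-word array A starting at byte address 2000; $s2 holds the base address.
mem = {2000: 10, 2004: 20, 2008: 30, 2012: 40}
regs = {"$s1": 0, "$s2": 2000}

lw(regs, mem, "$s1", 8, "$s2")     # $s1 = A[2]; offset = 2 * 4 bytes
regs["$s1"] += 1
sw(regs, mem, "$s1", 12, "$s2")    # A[3] = A[2] + 1; offset = 3 * 4 bytes
print(mem[2012])
```

The element index is multiplied by the word size to obtain the byte offset, which is why consecutive elements are 4 apart.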
39. Review: Chapter 1
• Classes of Computers and Classes of
Parallelism
• Technology Trend
• Dependability
• Performance Measurements and Benchmarks
• Principles
40. 5 Classes of Computers
• Personal Mobile Devices
– Cost is its primary concern
– Energy, media performance, and responsiveness
• Desktop Computing
– Price-Performance is its primary concern
• Servers
– Availability, Scalability, and Throughput
• Clusters/warehouse-scale computers
– Price-Performance, Energy
• Embedded Computer
– Price
41. Classes of Parallelism & Architectures
• Data-Level Parallelism
– Data items can be operated on at the same time
• Task-Level Parallelism
– Tasks can operate independently and largely in parallel
• Instruction-Level Parallelism: exploits data-level parallelism
– Pipelining, speculative execution
• Vector Architectures & GPUs: exploit data-level parallelism
– A single instruction operates on a collection of data in parallel
• Thread-Level Parallelism: exploits either data-level or task-level parallelism
– Exploits parallelism via parallel threads
• Request-Level Parallelism: exploits task-level parallelism
– Exploits parallelism via decoupled tasks
42. 4 ways for hardware to support
parallelism
• Single Instruction stream, Single Data stream
– SISD
• Single Instruction stream, Multiple Data streams
– SIMD, e.g., GPU, targets data-level parallelism
• Multiple Instruction streams, Single Data stream
– MISD, no commercial multiprocessor of this type
• Multiple Instruction streams, Multiple Data streams
– MIMD, e.g., multi-core processors, targets task-level parallelism
43. Trend in Technology
• Integrated Circuit (IC) logic technology
– Moore’s Law: transistor count on a chip grows by about 40%-55% per year,
i.e. it doubles every 18 to 24 months.
• Semiconductor DRAM
– In 2011, a growth rate in capacity: 25%-40% per year
• Flash
– A growth rate in capacity: 50%-60% per year
• Magnetic Disk
– Since 2004, the growth rate in capacity has dropped back to about 40% per year.
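An annual growth rate and a doubling time are two ways of stating the same trend. As a rough check of the Moore's Law figures, the sketch below converts one into the other, assuming steady compound growth; the function name `doubling_time_years` is an assumption for this example.

```python
import math


def doubling_time_years(annual_growth_rate):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


for rate in (0.40, 0.55):
    months = 12 * doubling_time_years(rate)
    print(f"{rate:.0%} per year -> doubles every {months:.1f} months")
```

Growth of 55% per year corresponds to doubling in roughly 19 months, and 40% per year to roughly 25 months, which is consistent with the 18 to 24 month range usually quoted.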
44. Trend in Performance
• Bandwidth vs. Latency
– Bandwidth has improved much more significantly than latency.
45. Growth in Processor Performance
[Figure: growth in processor performance over time. Annotations: RISC; parallelism via pipelining; locality using cache; hurdle: power wall; lack of instruction-level parallelism; move to multi-processor.]
46. An example of Intel 486 CPU
Released in 1992, 66 MHz, w/ L2 cache, 4.9-6.3 W
http://www.cpu-world.com/CPUs/80486/IntelA80486DX2-66.html
47. A CPU fan for Intel 486 CPU
http://www.cnaweb.com/486-ball-bearing-cpufan.aspx
48. An example of Intel Pentium 4 CPU
released in 2002, 2.8GHz, w/ 512KB Cache, 68.4W
http://www.pcplanetsystems.com/abc/product_details.php?item_id=146&category_id=61
49. A typical CPU fan for Intel Pentium 4
http://www.dansdata.com/p4coc.htm
50. A special CPU fan for gaming/multimedia users
http://www.pcper.com/reviews/Cases-andCooling/Asus-Star-Ice-CPU-Cooler-Review
51. Trend in Power and Energy in IC
• Dynamic energy
– $\text{Energy}_{dynamic} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2$
• Dynamic power
– $\text{Power}_{dynamic} = \frac{1}{2} \times \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency switched}$
• Example
– Intel 486 66MHz Voltage: 5V
– Intel Pentium 4 2.8GHz Voltage: 1.5V
– Intel Core 990x 3.4GHz Voltage: 0.8-1.375V
• Improving Energy Efficiency
– Do nothing well; Dynamic Voltage-Frequency Scaling (DVFS); Design for typical case; Overclocking
• Static power
– $\text{Power}_{static} = \text{Current}_{static} \times \text{Voltage}$
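The quadratic dependence on voltage is the reason supply voltages fell from 5V to around 1V. The sketch below evaluates the three formulas; the function names are assumptions for this example, and holding the capacitive load constant across chips is a simplification made only to isolate the voltage effect.

```python
def dynamic_energy(capacitive_load, voltage):
    """Energy per transition: 1/2 * C * V^2."""
    return 0.5 * capacitive_load * voltage ** 2


def dynamic_power(capacitive_load, voltage, frequency_switched):
    """Power: 1/2 * C * V^2 * f."""
    return dynamic_energy(capacitive_load, voltage) * frequency_switched


def static_power(static_current, voltage):
    """Leakage power: I_static * V."""
    return static_current * voltage


# Voltage effect alone (same capacitive load): 5 V versus 1.5 V.
energy_ratio = dynamic_energy(1.0, 1.5) / dynamic_energy(1.0, 5.0)
print(f"energy at 1.5 V relative to 5 V: {energy_ratio:.2f}")

# DVFS sketch: lower both voltage and frequency by 15% (hypothetical setting).
power_ratio = dynamic_power(1.0, 0.85, 0.85) / dynamic_power(1.0, 1.0, 1.0)
print(f"power after 15% voltage and frequency reduction: {power_ratio:.3f}")
```

Scaling voltage and frequency together reduces dynamic power roughly with the cube of the scaling factor, while energy per operation falls only with the square.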
52. Dependability
• Service Accomplishment & Service Interruption
• Transitions between 2 states: Failure & Restoration
• Measurements
– Reliability: a measure of the continuous service accomplishment
from a reference initial instant.
» MTTF: Mean time to failure
» FIT: failures per billion hours, $\frac{1}{\text{MTTF}} \times 10^9$
» MTTR: Mean time to repair
» MTBF: Mean time between failures = MTTF + MTTR
– Availability: a measure of the service accomplishment with
respect to the alternation between the two states.
» MTTF/(MTTF+MTTR)
» Upper bound: 100%
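The dependability measures are simple ratios and can be written out directly. In this sketch the function names and the sample numbers (a 1,000,000-hour MTTF and a 24-hour MTTR) are hypothetical values picked for illustration.

```python
def fit(mttf_hours):
    """Failures per billion hours of operation: (1 / MTTF) * 10^9."""
    return 1e9 / mttf_hours


def mtbf(mttf_hours, mttr_hours):
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours, mttr_hours):
    """Fraction of time the service is accomplished: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


print(fit(1_000_000))                       # 1000.0
print(mtbf(1_000_000, 24))                  # 1000024
print(f"{availability(1_000_000, 24):.6f}")
```

Availability reaches its upper bound of 100% only when the repair time is zero.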
53. Performance Measurements and
Benchmarks
• Metrics
– Throughput: a total amount of work done in a given time
– Response time (Execution time): the time between the start and
the completion of an event
• Speedup of X relative to Y
– $\text{Execution time}_Y / \text{Execution time}_X$
• Execution time
– Wall clock time: a latency to complete a task
– CPU time: only computation time
• Benchmarks
– Kernels, Toy programs, Synthetic benchmarks
– Benchmark suites: SPEC [CPU] & TPC [Transaction Processing]
– SPECRatio = $\text{Execution time}_{reference} / \text{Execution time}_{target}$
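Both ratios put the slower time in the numerator, so larger values mean better performance. A minimal sketch follows, with hypothetical function names and made-up timings.

```python
def speedup(exec_time_x, exec_time_y):
    """Speedup of X relative to Y: time of Y divided by time of X."""
    return exec_time_y / exec_time_x


def spec_ratio(reference_time, target_time):
    """SPECRatio: reference machine time divided by target machine time."""
    return reference_time / target_time


# Hypothetical timings in seconds.
print(speedup(exec_time_x=10, exec_time_y=15))      # 1.5
print(spec_ratio(reference_time=100, target_time=25))  # 4.0

# Comparing two machines by SPECRatio cancels the reference time.
print(spec_ratio(100, 20) / spec_ratio(100, 25))    # 1.25
print(speedup(exec_time_x=20, exec_time_y=25))      # 1.25
```

The last two lines show that the ratio of two SPECRatios equals the ratio of the execution times, so the choice of reference machine does not affect the comparison.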
54. Design Principles
• Take Advantage of Parallelism
• Principle of Locality
• Focus on the Common Case
– Amdahl’s Law
– Upper bound of the speedup: ?
56. If we can have two drying machines
Dirty Laundry → Washing Machine (30 minutes washing) → 2 Drying Machines (90/2 = 45 minutes drying) → Clean Laundry
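The gain from the second dryer can be worked out as follows. This assumes a single load processed start to finish, with the original drying time of 90 minutes taken from the "90/2" shown on the slide.

```latex
\begin{align*}
T_{\text{one dryer}}  &= 30 + 90 = 120 \text{ minutes} \\
T_{\text{two dryers}} &= 30 + \tfrac{90}{2} = 75 \text{ minutes} \\
\text{Speedup}        &= \frac{120}{75} = 1.6
\end{align*}
```

Doubling the drying capacity gives a speedup of only 1.6 rather than 2, because the 30 minutes of washing are unaffected.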
60. Design Principles
• Take Advantage of Parallelism
• Principle of Locality
• Focus on the Common Case
– Amdahl’s Law
– Upper bound of the speedup:
» $1 / (1 - \text{Fraction}_{enhanced})$
61. Exercise 1
• If the new processor is 10 times faster than
the original processor, and we assume that the
original processor is busy with computation
40% of the time and is waiting for I/O 60% of
the time, what is the overall speedup gained
by incorporating the enhancement?
• $\text{Fraction}_{enhanced} = 0.4$, $\text{Speedup}_{enhanced} = 10$
• $\text{Speedup}_{overall} = 1/(0.6 + 0.4/10) = 1.56$
• What is the upper bound of the overall
speedup?
• Upper bound = 1/0.6 = 1.67
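The numbers in Exercise 1 can be reproduced with a short script. This is a sketch of Amdahl's Law as used in the exercise; the function names `amdahl_speedup` and `amdahl_upper_bound` are assumptions for this example.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is enhanced."""
    unenhanced = 1.0 - fraction_enhanced
    return 1.0 / (unenhanced + fraction_enhanced / speedup_enhanced)


def amdahl_upper_bound(fraction_enhanced):
    """Limit of the overall speedup as the enhancement becomes infinitely fast."""
    return 1.0 / (1.0 - fraction_enhanced)


overall = amdahl_speedup(fraction_enhanced=0.4, speedup_enhanced=10)
bound = amdahl_upper_bound(fraction_enhanced=0.4)
print(f"overall speedup: {overall:.2f}")   # 1.56
print(f"upper bound:     {bound:.2f}")     # 1.67
```

Even a tenfold faster processor yields only 1.56 overall, because the 60% of time spent waiting for I/O is untouched.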
62. Exercise 2
• In a disk subsystem:
– 10 disks, each rated at 1,000,000-hour MTTF
– 1 ATA controller, 500,000-hour MTTF
– 1 power supply, 200,000-hour MTTF
– 1 fan, 200,000-hour MTTF
– 1 ATA cable, 1,000,000-hour MTTF
• Assuming the lifetimes are exponentially
distributed and that failures are independent,
compute the MTTF of the system as a whole
63. Exercise 2
• Because the overall failure rate of the
collection is the sum of the failure rates of the
modules, the failure rate of the system
– = 10*(1/1,000,000) + 1/500,000 + 1/200,000 + 1/200,000 +
1/1,000,000
– = 23/1,000,000 or 23,000 FIT
• Because MTTF is the inverse of the failure
rate
– $\text{MTTF}_{system} = 1/(23/1{,}000{,}000) \approx 43{,}500$ hours
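The same calculation can be scripted by summing the component failure rates. The sketch assumes, as the exercise does, exponentially distributed lifetimes and independent failures; the names `system_mttf` and `DISK_SUBSYSTEM` are invented for this example.

```python
def system_mttf(components):
    """MTTF of a system that fails when any component fails.

    `components` is a list of (count, mttf_hours) pairs; failure rates add.
    """
    failure_rate = sum(count / mttf_hours for count, mttf_hours in components)
    return 1.0 / failure_rate


DISK_SUBSYSTEM = [
    (10, 1_000_000),   # disks
    (1, 500_000),      # ATA controller
    (1, 200_000),      # power supply
    (1, 200_000),      # fan
    (1, 1_000_000),    # ATA cable
]

mttf = system_mttf(DISK_SUBSYSTEM)
print(f"system MTTF: {mttf:,.0f} hours")   # about 43,478 hours
print(f"system FIT:  {1e9 / mttf:,.0f}")   # 23,000
```

The exact value is about 43,478 hours, which the exercise rounds to 43,500.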