IEEE floating point representation

Computer Organization and Architecture
Presented by: Maskur Al Shal Sabil
ID: IT18021
Dept: Information & Communication Technology
Mawlana Bhashani Science & Technology University
10/20/2020 IT18021

Learning Outcomes
• Floating Point Representation
• IEEE 754 Standard for Floating Point Representation
• Single Precision
• Double Precision
• Single Precision Addition

Floating Point Representation
The floating point representation does not reserve a fixed number of bits for the integer part or the fractional part. Instead, it reserves a certain number of bits for the significant digits of the number (the mantissa) and a certain number of bits, called the exponent, that record where within those digits the binary point sits.

IEEE 754 Floating Point Representation
According to the IEEE 754 standard, a floating point number is represented in one of the following formats:
• Half precision (16-bit): 1 sign bit, 5-bit exponent & 10-bit mantissa
• Single precision (32-bit): 1 sign bit, 8-bit exponent & 23-bit mantissa
• Double precision (64-bit): 1 sign bit, 11-bit exponent & 52-bit mantissa
• Quadruple precision (128-bit): 1 sign bit, 15-bit exponent & 112-bit mantissa
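As a quick sanity check on the widths listed above, the Python sketch below verifies that sign, exponent and mantissa bits add up to each total and computes each format's exponent bias with the rule $2^{n-1} - 1$ for an n-bit exponent. The `FORMATS` table and the `bias` helper are illustrative names, not part of any library; only the first three formats are checked against Python's standard `struct` module, which has no 128-bit float code.

```python
import struct

# (total bits, exponent bits, mantissa bits) for each format listed above
FORMATS = {
    "half": (16, 5, 10),
    "single": (32, 8, 23),
    "double": (64, 11, 52),
    "quadruple": (128, 15, 112),
}

def bias(exponent_bits: int) -> int:
    """Exponent bias for an n-bit exponent field: 2**(n-1) - 1."""
    return 2 ** (exponent_bits - 1) - 1

for name, (total, e_bits, m_bits) in FORMATS.items():
    assert 1 + e_bits + m_bits == total   # sign + exponent + mantissa
    print(f"{name:>9}: {total:3d} bits, bias {bias(e_bits)}")

# struct packs half ('e'), single ('f') and double ('d') natively.
for code, total in (("e", 16), ("f", 32), ("d", 64)):
    assert struct.calcsize(code) * 8 == total
```

Running it prints biases of 15, 127, 1023 and 16383.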

Floating Point Representation
A floating point number has two parts: a signed part called the mantissa and another part called the exponent.
$(\text{sign}) \times \text{mantissa} \times 2^{\text{exponent}}$
Field layout: Sign Bit | Exponent | Mantissa
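To see the sign × mantissa × 2^exponent split on a concrete value, here is a small Python sketch built on the standard `math.frexp`, which returns a mantissa in [0.5, 1). The hypothetical `decompose` helper rescales it to the 1.xxx form IEEE 754 uses. Python floats are double precision, so this illustrates the idea rather than the stored bit fields.

```python
import math

def decompose(x: float):
    """Split a nonzero finite float into (sign, mantissa, exponent) so that
    x == sign * mantissa * 2**exponent with 1 <= mantissa < 2."""
    if x == 0 or not math.isfinite(x):
        raise ValueError("sketch handles nonzero finite values only")
    sign = -1 if x < 0 else 1
    m, e = math.frexp(abs(x))   # abs(x) == m * 2**e, with 0.5 <= m < 1
    return sign, m * 2, e - 1   # rescale to the 1.xxx form IEEE 754 uses

for x in (45.45, -0.75, 1024.0):
    s, m, e = decompose(x)
    print(x, "=", s, "*", m, "* 2 **", e)
```

For 45.45 it gives sign 1, a mantissa of about 1.4203125 and exponent 5, the same normalization worked out by hand later in this deck.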

Decimal to Binary Conversion
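$(55.35)_{10} = (?)_2$
$(55)_{10} = (110111)_2$
$(0.35)_{10} = (0.010110\ldots)_2$ (from the third bit on, the pattern 0110 repeats; six bits shown)
$(55.35)_{10} \approx (110111.010110)_2$
Integer part, by place value:
32 16 8 4 2 1
 1  1 0 1 1 1
Fractional part, by repeated multiplication by 2 (the integer part of each product is the next bit):
0.35 × 2 = 0.7 → 0
0.7 × 2 = 1.4 → 1
0.4 × 2 = 0.8 → 0
0.8 × 2 = 1.6 → 1
0.6 × 2 = 1.2 → 1
0.2 × 2 = 0.4 → 0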
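The two procedures on this slide (repeated division by 2 for the integer part, repeated multiplication by 2 for the fraction) can be sketched in Python as below. `to_binary` is a hypothetical helper; it uses `fractions.Fraction` so that 0.35 is handled exactly instead of through a rounded float, and it truncates after `frac_bits` bits just as the slide stops after six.

```python
from fractions import Fraction

def to_binary(value: str, frac_bits: int = 6) -> str:
    """Convert a non-negative decimal string such as "55.35" to binary,
    keeping only the first frac_bits fractional bits (truncation)."""
    x = Fraction(value)                      # exact decimal value, no float rounding
    integer = x.numerator // x.denominator   # integer part
    frac = x - integer                       # fractional part
    int_bits = bin(integer)[2:]              # same digits as repeated division by 2
    bits = []
    for _ in range(frac_bits):               # repeated multiplication by 2
        frac *= 2
        bit = 1 if frac >= 1 else 0          # integer part of the product is the next bit
        bits.append(str(bit))
        frac -= bit
    return int_bits + "." + "".join(bits)

print(to_binary("55.35"))      # 110111.010110
print(to_binary("45.45"))      # 101101.011100
print(to_binary("0.35", 14))   # 0.01011001100110
```

Asking for more bits, as in the last call, makes the repeating 0110 pattern of 0.35 visible.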

Scientific Notation
$-1.602 \times 10^{-19}$
Here "-" is the sign, 1.602 is the significand, 10 is the base and -19 is the exponent.

IEEE 32-bit Floating Point Representation
Field widths: Sign Bit (1 bit) | Biased Exponent (8 bits) | Trailing Significand or Mantissa (23 bits)
Number representation: $(-1)^S \times 1.M \times 2^{E-127}$
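The formula above can be checked directly: the sketch below pulls S, E and M out of a 32-bit pattern with shifts and masks, evaluates $(-1)^S \times 1.M \times 2^{E-127}$, and compares the result with the platform's own decoding through `struct.unpack`. `decode32` is an illustrative name, and only normal numbers (0 < E < 255) are handled; zeros, subnormals, infinities and NaNs use special encodings this formula does not cover.

```python
import struct

def decode32(bits: int) -> float:
    """Evaluate (-1)**S * 1.M * 2**(E - 127) for a 32-bit pattern.
    Sketch: only normal numbers (0 < E < 255) are handled."""
    s = bits >> 31                   # 1-bit sign
    e = (bits >> 23) & 0xFF          # 8-bit biased exponent
    m = bits & 0x7FFFFF              # 23-bit trailing significand
    if not 0 < e < 255:
        raise ValueError("zero, subnormal, infinity and NaN are not handled")
    significand = 1 + m / 2**23      # restore the hidden leading 1
    return (-1) ** s * significand * 2.0 ** (e - 127)

for bits in (0x3F800000, 0xC0000000, 0x4235CCCD):
    hw = struct.unpack(">f", bits.to_bytes(4, "big"))[0]  # platform decoding
    assert decode32(bits) == hw
    print(hex(bits), decode32(bits))
```

Every step is exact in double precision, so the hand-written formula and `struct` agree bit for bit.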

$(45.45)_{10} \approx (101101.011100)_2$
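Step 1: Normalize the number to the form $1.xxx \times 2^{e}$.
Step 2: Take the exponent e and the mantissa (the bits after the binary point).
Step 3: Find the biased exponent by adding the bias, 127.
Step 4: Drop the leading 1, which is implicit, and fill the 23-bit mantissa field with the bits after the binary point (continuing the binary expansion, or padding with zeros if it ends early).
Step 5: Set the sign bit to 0 if the number is positive, otherwise 1.
For an n-bit exponent the bias is $2^{n-1} - 1$.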
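The steps above can be followed mechanically. This Python sketch (the `encode_single` name and its `truncate` flag are assumptions for illustration) normalizes an exact `Fraction`, adds the bias of 127, keeps the 23 bits after the implicit leading 1 and sets the sign bit. With `truncate=True` it simply cuts the expansion after 23 bits; otherwise it rounds to nearest with ties to even, the IEEE 754 default.

```python
from fractions import Fraction

def encode_single(value: str, truncate: bool = False) -> str:
    """Encode a nonzero decimal string as 'S EEEEEEEE MMM...' (1/8/23 bits).
    Sketch only: zero, subnormals, overflow and NaN are not handled."""
    x = Fraction(value)                  # exact value, e.g. 45.45 == 909/20
    if x == 0:
        raise ValueError("zero has a special encoding")
    sign = 1 if x < 0 else 0             # Step 5: 0 if positive, otherwise 1
    x = abs(x)
    exp = 0                              # Step 1: normalize to 1.xxx * 2**exp
    while x >= 2:
        x /= 2
        exp += 1
    while x < 1:
        x *= 2
        exp -= 1
    # Steps 2 and 4: the 23 bits after the binary point; the leading 1 is implicit.
    scaled = (x - 1) * 2**23
    mant = scaled.numerator // scaled.denominator   # truncated mantissa
    rest = scaled - mant                            # part that was cut off
    half = Fraction(1, 2)
    if not truncate and (rest > half or (rest == half and mant & 1)):
        mant += 1                        # round to nearest, ties to even
        if mant == 2**23:                # carry turned 1.111... into 10.000...
            mant = 0
            exp += 1
    biased = exp + 127                   # Step 3: add the bias
    if not 0 < biased < 255:
        raise ValueError("outside the normal single-precision range")
    return f"{sign} {biased:08b} {mant:023b}"

print(encode_single("45.45", truncate=True))  # 0 10000100 01101011100110011001100
print(encode_single("45.45"))                 # 0 10000100 01101011100110011001101
```

The two results differ only in the last mantissa bit: the bits after the 23rd are 1100…, which is more than half a unit in the last place, so rounding carries into it.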

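$(45.45)_{10} = (?)_2$
$(45)_{10} = (101101)_2$
$(0.45)_{10} = (0.011100\ldots)_2$ (from the third bit on, the pattern 1100 repeats; six bits shown)
$(45.45)_{10} \approx (101101.011100)_2$
Integer part, by place value:
32 16 8 4 2 1
 1  0 1 1 0 1
Fractional part, by repeated multiplication by 2:
0.45 × 2 = 0.9 → 0
0.9 × 2 = 1.8 → 1
0.8 × 2 = 1.6 → 1
0.6 × 2 = 1.2 → 1
0.2 × 2 = 0.4 → 0
0.4 × 2 = 0.8 → 0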

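$(45.45)_{10} \approx (101101.011100)_2$
$101101.011100 = 1.01101011100 \times 2^5$
Here the biased exponent = 5 + 127 = 132
Mantissa = 01101011100, followed by further bits of the repeating fraction (1100 1100 …) up to 23 bits
Field layout: Sign Bit | Biased Exponent | Trailing Significand or Mantissa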

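$(132)_{10} = (?)_2$
128 64 32 16 8 4 2 1
  1  0  0  0 0 1 0 0
$(132)_{10} = (10000100)_2$
0 10000100 01101011100110011001100
This mantissa is the binary expansion truncated after 23 bits. Under IEEE 754's default rounding mode (round to nearest, ties to even) the following bits, 1100…, exceed half a unit in the last place, so the stored pattern is 0 10000100 01101011100110011001101.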
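As a cross-check of the hand calculation, Python's `struct` module can pack a float into binary32 using the platform's IEEE 754 conversion (assuming, as on mainstream hardware, that the platform float is IEEE 754 single precision). `single_bits` is a hypothetical helper that prints the three fields separately.

```python
import struct

def single_bits(x: float) -> str:
    """Return the binary32 encoding of x as 'S EEEEEEEE MMM...' (1/8/23 bits)."""
    # '>f' packs x as a big-endian single-precision float (rounding as needed);
    # '>I' reads the same 4 bytes back as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return f"{bits >> 31} {(bits >> 23) & 0xFF:08b} {bits & 0x7FFFFF:023b}"

print(struct.pack(">f", 45.45).hex())   # 4235cccd
print(single_bits(45.45))               # 0 10000100 01101011100110011001101
```

The sign, the exponent and the first 22 mantissa bits match the hand-derived pattern; the final bit is 1 because the conversion rounds rather than truncates.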

IEEE 64-bit Floating Point Representation
Field widths: Sign Bit (1 bit) | Biased Exponent (11 bits) | Trailing Significand or Mantissa (52 bits)
Here we use $2^{11-1} - 1 = 1023$ as the bias value.

$(45.45)_{10} \approx (101101.011100)_2$
$101101.011100 = 1.01101011100 \times 2^5$
Here the biased exponent = 5 + 1023 = 1028 = $(10000000100)_2$
Mantissa = 01101011100, followed by further bits of the repeating fraction up to 52 bits
Field widths: 1 bit | 11 bits | 52 bits
0 10000000100 01101011100110011001100……
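Python floats are already IEEE 754 double precision, so `struct.pack('>d', ...)` exposes the stored 64 bits directly. The sketch below (the `double_bits` helper is an illustrative name) splits them into 1 + 11 + 52 bits for 45.45.

```python
import struct

def double_bits(x: float) -> str:
    """Return the binary64 encoding of x as 'S EEEEEEEEEEE MMM...' (1/11/52 bits)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    mantissa = bits & ((1 << 52) - 1)        # 52-bit trailing significand
    return f"{sign} {exponent:011b} {mantissa:052b}"

print(struct.pack(">d", 45.45).hex())   # 4046b9999999999a
print(double_bits(45.45))
```

The exponent field is 10000000100 (1028), and the mantissa starts with 01101011100110011001100 and continues the repeating 1100 pattern until the rounded final bits.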

Convert Floating Point to Decimal
0100 0000 0100 0110 1011 0000 0000 0000
Fields: sign 0 | exponent 1000 0000 | mantissa 100 0110 1011 0000 0000 0000
Number representation: $(-1)^S \times 1.M \times 2^{E-127}$
S = 0
$E = (10000000)_2 = (128)_{10}$
$M = (.100\ 0110\ 1011\ 0000\ 0000\ 0000)_2 = (0.55224609375)_{10}$
$(-1)^0 \times 1.55224609375 \times 2^{128-127} = 3.1044921875$
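The decoding can be repeated exactly in Python: the sketch splits the slide's bit pattern with shifts and masks, keeps .M as a `fractions.Fraction` so no rounding creeps in, and compares the result with `struct.unpack('>f', ...)`. It assumes the platform float is IEEE 754 binary32, which holds on mainstream hardware.

```python
import struct
from fractions import Fraction

bits = 0b0100_0000_0100_0110_1011_0000_0000_0000  # the slide's 32-bit pattern
s = bits >> 31                                     # sign bit
e = (bits >> 23) & 0xFF                            # 8-bit biased exponent
m = Fraction(bits & 0x7FFFFF, 2**23)               # .M as an exact fraction
value = (-1) ** s * (1 + m) * Fraction(2) ** (e - 127)

print(s, e, m, float(m))      # 0 128 1131/2048 0.55224609375
print(float(value))           # 3.1044921875
# Cross-check with the platform's binary32 decoding.
print(struct.unpack(">f", bits.to_bytes(4, "big"))[0])   # 3.1044921875
```

Counting the exponent field carefully matters here: it is the eight bits after the sign bit, 1000 0000, so E = 128 and the scale factor is $2^{1}$.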

Addition of Floating Point Numbers
First consider addition in base 10. If the exponents are the same, just add the significands, then renormalize if needed:
  5.0E+2
+ 7.0E+2
 12.0E+2 = 1.2E+3

1.2232E+3 + 4.211E+5
If the exponents differ, first align both numbers to the larger exponent:
a. Find the difference between the exponents: 5 - 3 = 2.
b. Shift the significand of the smaller number right by that many digits.
1.2232E+3 = 0.012232E+5

  4.211    E+5
+ 0.012232 E+5
  4.223232 E+5
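The base-10 procedure from these slides (align to the larger exponent, add the significands, renormalize) can be sketched with Python's `decimal` module, which keeps decimal digits exact. `add_sci` is a hypothetical helper that takes each operand as a (significand, exponent) pair and assumes the inputs are already normalized.

```python
from decimal import Decimal

def add_sci(m1: Decimal, e1: int, m2: Decimal, e2: int):
    """Add m1 x 10**e1 and m2 x 10**e2 the way the slides do by hand:
    align to the larger exponent, add significands, then renormalize."""
    if e1 < e2:                          # make operand 1 the larger exponent
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 = m2.scaleb(e2 - e1)              # shift the smaller number right
    m, e = m1 + m2, e1                   # exponents now match: add significands
    while abs(m) >= 10:                  # renormalize so that 1 <= |m| < 10
        m, e = m.scaleb(-1), e + 1
    while m != 0 and abs(m) < 1:
        m, e = m.scaleb(1), e - 1
    return m, e

print(add_sci(Decimal("5.0"), 2, Decimal("7.0"), 2))        # (Decimal('1.20'), 3)
print(add_sci(Decimal("1.2232"), 3, Decimal("4.211"), 5))   # (Decimal('4.223232'), 5)
```

`Decimal.scaleb` moves the decimal point without touching the digits, which mirrors the "shift right by the exponent difference" step exactly.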

32-bit Floating Point Addition
a = 0 1101 0111 111 0011 1010 0000 1100 0011
b = 0 1101 0111 000 1110 0101 1111 0001 1100
Find the 32-bit floating point representation of a + b.
Here, for a:
$e = (11010111)_2 = (215)_{10}$
$m = (111\ 0011\ 1010\ 0000\ 1100\ 0011)_2$

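$a = (-1)^0 \times 1.111\ 0011\ 1010\ 0000\ 1100\ 0011 \times 2^{215-127} = 1.111\ 0011\ 1010\ 0000\ 1100\ 0011 \times 2^{88}$
For b: $e = (11010111)_2 = (215)_{10}$, $m = (000\ 1110\ 0101\ 1111\ 0001\ 1100)_2$
$b = 1.000\ 1110\ 0101\ 1111\ 0001\ 1100 \times 2^{88}$
The exponents are equal, so no shift is needed; add the significands:
  $b = 1.000\ 1110\ 0101\ 1111\ 0001\ 1100 \times 2^{88}$
+ $a = 1.111\ 0011\ 1010\ 0000\ 1100\ 0011 \times 2^{88}$
  $11.000\ 0001\ 1111\ 1111\ 1101\ 1111 \times 2^{88}$
Normalize: $1.100\ 0000\ 1111\ 1111\ 1110\ 1111\ 1 \times 2^{89}$ (24 bits after the point)
Round to 23 bits (round to nearest, ties to even): the extra bit is exactly one half and the last kept bit is 1, so round up to $1.100\ 0000\ 1111\ 1111\ 1111\ 0000 \times 2^{89}$
Biased exponent: 89 + 127 = 216 = $(11011000)_2$
a + b = 0 1101 1000 100 0000 1111 1111 1111 0000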
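The binary procedure can be sketched in Python on raw 32-bit patterns. This is an illustrative implementation (`split32` and `add32` are hypothetical names) limited to positive normal operands. Instead of shifting the smaller significand right and tracking guard, round and sticky bits, it shifts the larger one left so the sum stays exact, then normalizes and rounds once to nearest-even, which gives the correctly rounded result IEEE 754 requires.

```python
import struct

def split32(bits: int):
    """Return (sign, biased exponent, 23-bit fraction) of a binary32 pattern."""
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def add32(a_bits: int, b_bits: int) -> int:
    """Add two positive normal binary32 numbers given as 32-bit integers.
    Sketch only: negative operands, zeros, subnormals, Inf/NaN are rejected."""
    sa, ea, fa = split32(a_bits)
    sb, eb, fb = split32(b_bits)
    if sa or sb or not (0 < ea < 255 and 0 < eb < 255):
        raise ValueError("sketch handles positive normal numbers only")
    ma, mb = fa | 1 << 23, fb | 1 << 23       # restore the hidden leading 1
    if ea < eb:                               # let a hold the larger exponent
        ea, eb, ma, mb = eb, ea, mb, ma
    # Exact sum in units of 2**(eb - 150). Shifting a left loses no bits and is
    # equivalent to shifting b right while keeping every shifted-out bit.
    total = (ma << (ea - eb)) + mb
    shift = total.bit_length() - 24           # >= 1, since total >= 2**24
    exp = eb + shift                          # biased exponent after normalizing
    kept, dropped = total >> shift, total & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if dropped > half or (dropped == half and kept & 1):
        kept += 1                             # round to nearest, ties to even
        if kept == 1 << 24:                   # carry made it 10.000...: renormalize
            kept >>= 1
            exp += 1
    if exp >= 255:
        raise OverflowError("sum overflows binary32")
    return (exp << 23) | (kept & 0x7FFFFF)    # drop the hidden bit again

a = 0b0_11010111_11100111010000011000011
b = 0b0_11010111_00011100101111100011100
s, e, f = split32(add32(a, b))
print(s, f"{e:08b}", f"{f:023b}")   # 0 11011000 10000001111111111110000
```

For the slide's operands the raw sum is $11.000\ 0001\ 1111\ 1111\ 1101\ 1111 \times 2^{88}$; normalizing moves it to exponent 89 (biased 216), and the single dropped bit is a tie that rounds up to the even neighbour.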
