Quick tutorial on IEEE 754 FLOATING POINT representation

by RITU RANJAN SHRIVASTWA

Decimal to IEEE 754 Floating Point Representation
The standard IEEE 754 single-precision representation stores a binary floating-point number in 32 bits, divided into three parts:
• Sign bit
• Exponent
• Mantissa
The bit layout is as follows:

Sign bit       | Exponent | Mantissa
1 bit (0 or 1) | 8 bits   | 23 bits
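To see the three fields on a real machine, here is a minimal Python sketch. It assumes the platform uses IEEE 754 single precision for the `struct` format `">f"` (true on all common hardware); `float32_fields` is a hypothetical helper name, not part of any library. It packs a number into 4 bytes, reads them back as an unsigned 32-bit integer, and slices out the sign, exponent and mantissa with shifts and masks.

```python
import struct

def float32_fields(x: float) -> tuple:
    """Split x (rounded to IEEE 754 single precision) into its three bit fields."""
    # Pack x as a big-endian 32-bit float, then reinterpret those 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # top 1 bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits
    mantissa = bits & 0x7FFFFF       # low 23 bits
    return sign, exponent, mantissa

s, e, m = float32_fields(4.6)
print(s, format(e, "08b"), format(m, "023b"))
```

For 4.6 this prints `0 10000001 00100110011001100110011`, the same fields that are worked out by hand in the steps below.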

To be represented in this format, a number must first be written in the following normalized form:
(+ or -) 1.(mantissa) x 2^(exponent)
Sometimes a question asks you not to convert the number into normalized form; otherwise, always convert it to its normalized form first.

To convert a number into its normalized form, proceed as follows.
For example, take the decimal number +4.6.
The digit before the decimal point is not 1, so the number must be normalized to bring a 1 there. To do this, keep dividing it by 2 until exactly 1 is left before the decimal point, counting how many times you divided. (For a number smaller than 1, you would instead keep multiplying by 2, and the power becomes negative.)
This means
4.6 / 2 = 2.3
2.3 / 2 = 1.15

Hence we get the normalized form (we divided by 2 twice, so the power of 2 is 2) and can write
+4.6 = 1.15 x 2^2
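The repeated halving can be sketched in Python as follows. This is a minimal sketch under two assumptions: the input is a positive magnitude (the sign goes in the sign bit separately), and it is held as a `fractions.Fraction` so that the decimal 4.6 is not rounded to binary before we start. The doubling loop for numbers below 1 goes slightly beyond the worked example; `normalize` is a hypothetical helper name.

```python
from fractions import Fraction

def normalize(x: Fraction) -> tuple:
    """Return (significand, power) with 1 <= significand < 2 and x == significand * 2**power."""
    if x <= 0:
        raise ValueError("expects a positive magnitude; the sign bit is handled separately")
    power = 0
    while x >= 2:    # halve until a single '1' is left before the point: 4.6 -> 2.3 -> 1.15
        x /= 2
        power += 1
    while x < 1:     # assumption beyond the text: values below 1 are doubled, giving a negative power
        x *= 2
        power -= 1
    return x, power

significand, power = normalize(Fraction("4.6"))
print(float(significand), power)   # 1.15 2
```

Using exact fractions keeps every intermediate value (2.3, 1.15) exactly as in the hand calculation.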
Now we will represent this using the IEEE 754 standard.

We have +1.15 x 2^2 to represent.
1. The sign bit will be ‘0’ because the number is positive.
2. The exponent will be 127 + 2 = 129. Here 127 is the bias. The 8-bit exponent field can hold 256 values (0 to 255), and it must represent both negative and positive powers, so the bias is added to the true power before storing it: stored values below 127 denote negative powers, 127 denotes a power of 0, and values above 127 denote positive powers. (The extreme values 0 and 255 are reserved for special cases such as zero, infinity and NaN.) Unless the question specifies another bias, such as Excess-128 or Excess-64, use 2^(n-1) - 1 as the bias, where n is the number of bits in the exponent part; for n = 8 this gives 127. Hence, if the power had been -2 instead, the stored exponent would have been
127 + (-2) = 127 - 2 = 125
3. Now that we have the sign bit and the exponent, let's fill them into the bit pattern, using (129)10 = (10000001)2:
Sign bit | Exponent | Mantissa
0        | 10000001 | (23 bits, still to be found)
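The bias rule from step 2 and the binary exponent from step 3 can be checked with a small Python sketch. It assumes the default bias 2^(n-1) - 1 and rejects powers that would land on the reserved all-zeros or all-ones fields; `default_bias` and `exponent_field` are hypothetical helper names. If a question gives its own bias (for example Excess-64), add that value instead of `default_bias(...)`.

```python
def default_bias(n_exponent_bits: int) -> int:
    """Bias 2**(n-1) - 1 for an n-bit exponent field: 127 for 8 bits, 63 for 7 bits."""
    return 2 ** (n_exponent_bits - 1) - 1

def exponent_field(power: int, n_exponent_bits: int = 8) -> str:
    """Biased exponent of a normalized number, as an n-bit binary string."""
    stored = power + default_bias(n_exponent_bits)
    # All-zeros and all-ones fields are reserved for special values (zero/subnormals, infinity/NaN).
    if not 0 < stored < 2 ** n_exponent_bits - 1:
        raise ValueError("power out of range for a normalized number")
    return format(stored, f"0{n_exponent_bits}b")

print(exponent_field(2))    # 10000001  (127 + 2 = 129)
print(exponent_field(-2))   # 01111101  (127 - 2 = 125)
```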

Now we need to find the mantissa part.
First of all, note that the leading ‘1’ is NOT stored in the bit pattern: since the number is in normalized form, it is known that the ‘1’ is there (it is often called the hidden bit). Thus only the fractional part, i.e., 0.15, needs to be represented in the mantissa.
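Let us convert 0.15 to binary by repeatedly multiplying the fractional part by 2; the integer part of each product is the next bit:
0.15 x 2 = 0.3 → 0
0.3 x 2 = 0.6 → 0
0.6 x 2 = 1.2 → 1   (i)
0.2 x 2 = 0.4 → 0
0.4 x 2 = 0.8 → 0
0.8 x 2 = 1.6 → 1   (ii)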
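After (ii) the fractional part is 0.6 again, so the steps from (i) to (ii) repeat forever and the bits 1001 keep recurring. We keep repeating them until all 23 mantissa bits are filled. (Strictly, IEEE 754 rounds to the nearest representable value by default; here the 24th bit is 0, so simply cutting off after 23 bits gives the same result.)
Thus the bits obtained are 00100110011001100110011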
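The repeated-multiplication method can be sketched in Python like this. It is a minimal sketch that assumes the fraction is given exactly as a `fractions.Fraction` and, like the hand method, truncates after `n_bits` bits rather than applying IEEE rounding; `fraction_bits` is a hypothetical helper name.

```python
from fractions import Fraction

def fraction_bits(frac: Fraction, n_bits: int = 23) -> str:
    """First n_bits binary digits of a fraction 0 <= frac < 1 (truncated, not rounded)."""
    bits = []
    for _ in range(n_bits):
        frac *= 2
        bit = 1 if frac >= 1 else 0   # the integer part of the product is the next bit
        bits.append(str(bit))
        frac -= bit                   # keep only the fractional part for the next step
    return "".join(bits)

print(fraction_bits(Fraction("0.15")))   # 00100110011001100110011
```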
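Hence the bit pattern in the 32-bit format is
Sign bit | Exponent | Mantissa
0        | 10000001 | 00100110011001100110011
Grouped into 4-bit nibbles: 0100 0000 1001 0011 0011 0011 0011 0011 = (40933333)16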

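As a cross-check of the final answer, the following Python sketch joins the three fields into one 32-bit word, prints it in hexadecimal, and compares it with the single-precision encoding that `struct` produces for 4.6 (assuming, as on common hardware, that `">f"` is IEEE 754 single precision).

```python
import struct

sign_bit = "0"
exponent_bits = "10000001"
mantissa_bits = "00100110011001100110011"

# Concatenate the three fields into one 32-bit integer and print it as 8 hex digits.
word = int(sign_bit + exponent_bits + mantissa_bits, 2)
print(f"{word:08X}")                          # 40933333

# Cross-check: let the machine encode 4.6 as a big-endian single-precision float.
print(struct.pack(">f", 4.6).hex().upper())   # 40933333
```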
EXAMPLE PROBLEM

NOTE: In this question, the total number of bits is only 16. The question gives the bias as 64, whereas the usual formula would give 63, so you need to use 64 as given. Also, the given number need not be converted into its normalized form.

IEEE 754 Floating Point to Decimal Conversion
Here you simply do the reverse of the steps above.
For example:
Given binary representation: 11000001101111110……0
We break it into three parts:
Sign bit | Exponent | Mantissa
1        | 10000011 | 01111110000000000000000

The sign bit is 1, so the number is negative. The exponent 10000011 is 131 in decimal, so the power is 131 - 127 = 4.
Mantissa: 2^-1 x 0 + 2^-2 x 1 + 2^-3 x 1 + 2^-4 x 1 + 2^-5 x 1 + 2^-6 x 1 + 2^-7 x 1 = 0.4921875
The number is -1.4921875 x 2^4 [note that the hidden ‘1’ is added back before the fractional part, as in the normalized form]
Which is equal to -23.875
ANS: -23.875
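The decoding steps can be sketched in Python as below. This is a minimal sketch that handles normalized numbers only (it rejects the reserved exponent fields 0 and 255) and assumes the input is a 32-character string of bits; `decode_float32` is a hypothetical helper name. The last line cross-checks the result against `struct`, assuming IEEE 754 single precision for `">f"`.

```python
import struct

def decode_float32(bits: str) -> float:
    """Value of a 32-character IEEE 754 single-precision bit string (normalized numbers only)."""
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    sign = -1 if bits[0] == "1" else 1
    exponent = int(bits[1:9], 2)
    if exponent in (0, 255):
        raise ValueError("zero, subnormal, infinity and NaN are not handled in this sketch")
    power = exponent - 127                       # remove the bias
    # Mantissa value: add 2**-i for every '1' at position i = 1..23 after the point.
    mantissa = sum(2.0 ** -i for i, b in enumerate(bits[9:], start=1) if b == "1")
    return sign * (1 + mantissa) * 2.0 ** power  # the hidden '1' is added back

pattern = "1" + "10000011" + "0111111" + "0" * 16
print(decode_float32(pattern))                                      # -23.875
print(struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0])   # -23.875
```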
