1. CENTRAL UNIVERSITY OF KASHMIR
Tulmulla Campus, Ganderbal
BTCS-303
DATA STRUCTURES
Huffman Tree
Presented by: Sameem Makhdoomi
Enrollment no. 2124CUKmr09
2. Contents:
1) What is Huffman coding?
2) Why do we need it?
3) What are fixed-size codes?
4) What are variable-size codes?
5) How to encode and decode with Huffman coding?
3. What is Huffman Coding?
Huffman Coding is a technique for compressing data to reduce
its size without losing any of the details. It was first developed
by David Huffman.
Huffman Coding is generally useful for compressing data in
which some characters occur frequently.
4. Why do we need Huffman Coding?
1) Suppose you want to store data in a file. You can store
it in compressed form to reduce the size of the file.
2) If we want to send data over a network, it can be
compressed before sending to reduce the cost of transmission.
5. The cost (size) of a message is measured in bits.
Computers and other electronic devices use binary codes to store and
send characters such as the letters of the English alphabet.
Suppose we have a message that we want to store in a file or send over a network:
Message: BCCABBDDAECCBBAEDDCC
Length: 20 characters. ASCII: 8 bits per character.
Character Hex Binary
A 41 01000001
B 42 01000010
C 43 01000011
D 44 01000100
E 45 01000101
At 8 bits per letter, this 20-letter message
takes:
8 * 20 = 160 bits
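The arithmetic above can be checked with a short Python sketch. The variable names are illustrative, and it assumes plain ASCII stored in 8 bits per character, as on the slide.

```python
# Sketch: size of the message when every character is stored as 8-bit ASCII.
message = "BCCABBDDAECCBBAEDDCC"

# Print the hex and 8-bit binary form of each distinct character.
for ch in sorted(set(message)):
    print(ch, format(ord(ch), "X"), format(ord(ch), "08b"))

ascii_bits = 8 * len(message)
print(ascii_bits)  # 160
```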
7. Q1: Do I need 8-bit codes for just 5 capital letters?
Ans. No! Fewer bits are sufficient.
1 bit (_) gives 2 combinations: [0, 1]
2 bits (_ _) give 4 combinations: [00, 01, 10, 11]
So we take 3 bits (_ _ _), which give 8 combinations, enough for 5 letters.
Q2: Can I use my own codes with fewer bits?
Ans. Yes! We will see how in the next slides.
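The counting argument above can be sketched in Python. The helper name `bits_needed` is made up for this illustration; it returns the smallest fixed code length that gives at least as many combinations as there are symbols.

```python
def bits_needed(symbol_count: int) -> int:
    """Smallest fixed code length with at least symbol_count combinations."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without floats.
    return max(1, (symbol_count - 1).bit_length())

for n_bits in (1, 2, 3):
    print(n_bits, "bit(s):", 2 ** n_bits, "combinations")

print(bits_needed(5))  # 3
```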
8. Fixed-size coding:
Message: BCCABBDDAECCBBAEDDCC (20 characters)
Character Count/Frequency Code
A 3 000
B 5 001
C 6 010
D 4 011
E 2 100
Coded message: 001010010000001001011011000100010010001001000100011011010010
At 3 bits per letter, this 20-letter message takes:
3 * 20 = 60 bits
But that's not enough!
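As a minimal sketch, the fixed-size encoding can be reproduced in Python by looking up each character in the 3-bit code table from the slide. The names `fixed_codes` and `encoded` are illustrative.

```python
# Sketch: encode the message with the 3-bit fixed-size codes from the slide.
message = "BCCABBDDAECCBBAEDDCC"
fixed_codes = {"A": "000", "B": "001", "C": "010", "D": "011", "E": "100"}

# Replace every character by its code and join the pieces.
encoded = "".join(fixed_codes[ch] for ch in message)
print(encoded)
print(len(encoded))  # 60
```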
9. We must send the table of codes along with the coded
message so that the receiver can decode it.
Character Code
A 000
B 001
C 010
D 011
E 100
5 letters in their original ASCII form: 5 * 8 = 40 bits
3 bits for each code: 3 * 5 = 15 bits
So the table takes 40 + 15 = 55 bits.
Total size of the message along with the table:
60 + 55 = 115 bits
(about a 28% reduction in size, from 160 to 115 bits)
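A quick Python check of these totals, assuming (as the slide does) that the table costs 8 bits per character plus 3 bits per code. Running the numbers gives a reduction of about 28% relative to the 160-bit ASCII message.

```python
# Sketch: total cost of the fixed-size scheme, including the code table.
message_bits = 3 * 20            # 20 letters at 3 bits each
table_bits = 5 * 8 + 5 * 3       # 5 ASCII characters + 5 three-bit codes
total_bits = message_bits + table_bits
original_bits = 8 * 20           # plain 8-bit ASCII

reduction = (original_bits - total_bits) / original_bits
print(total_bits)           # 115
print(f"{reduction:.1%}")   # 28.1%
```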
10. Variable-size coding (Huffman coding):
I. Huffman says that we don't have to use fixed-size codes for the
letters/elements.
II. Some characters appear fewer times and some appear more often,
so if we give shorter codes to the more frequent characters, the
total size of the message shrinks.
11. How to encode using a Huffman tree?
Huffman coding follows the Optimal Merge Pattern (i.e. repeatedly merging
the two smallest elements of a sorted list into a single element).
Huffman gave an approach for building our own variable-size codes.
Suppose the string below is to be sent over a network:
B C A A D D D C C A C A C A C
12. Given string:
B C A A D D D C C A C A C A C
Size of this message without coding: 15 * 8 = 120 bits
Calculate the frequency of each character in the string:
Character Frequency
B 1
C 6
A 5
D 3
13. Sort the characters in increasing order of frequency.
Before sorting: B (1), C (6), A (5), D (3)
After sorting: B (1), D (3), A (5), C (6)
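Counting and sorting the frequencies can be sketched with Python's standard `collections.Counter`. The variable names are illustrative.

```python
from collections import Counter

message = "BCAADDDCCACACAC"

# Count how often each character occurs.
frequencies = Counter(message)

# Sort the (character, frequency) pairs by frequency, smallest first.
queue = sorted(frequencies.items(), key=lambda item: item[1])
print(queue)  # [('B', 1), ('D', 3), ('A', 5), ('C', 6)]
```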
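14. Sum w1 and w2, where w1 and w2 are the first two weights
in the increasing-order queue.
Queue: B (1), D (3), A (5), C (6)
w1 = 1 (B), w2 = 3 (D), w1 + w2 = 4
B and D become the children of a new node * with weight 4:
* (4)
  B (1)
  D (3)
Queue after merging: * (4), A (5), C (6)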
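15. Queue: * (4), A (5), C (6)
w1 = 4 (*), w2 = 5 (A), w1 + w2 = 9
The node * (4) and A become the children of a new node * with weight 9:
* (9)
  * (4)
    B (1)
    D (3)
  A (5)
Queue after merging: * (9), C (6)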
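16. Queue (sorted): C (6), * (9)
w1 = 6 (C), w2 = 9 (*), w1 + w2 = 15
C and the node * (9) become the children of the root, with weight 15.
Every left edge is labelled 0 and every right edge is labelled 1:
15
  0: C (6)
  1: * (9)
    0: * (4)
      0: B (1)
      1: D (3)
    1: A (5)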
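The merging steps above can be sketched in Python with a min-heap standing in for the sorted queue. This is one possible implementation, not necessarily the presenter's: the function name `huffman_codes`, the tuple-based tree, and the tie-breaking counter are assumptions made for the illustration, and the message is assumed to be non-empty.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(message: str) -> dict:
    """Build a Huffman tree for a non-empty message; return {character: code}."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # Heap entries are (weight, tie_breaker, node). A node is either a
    # character (leaf) or a (left, right) tuple (internal node).
    heap = [(freq, next(tie), ch) for ch, freq in Counter(message).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # smallest weight, gets edge 0
        w2, _, right = heapq.heappop(heap)   # next smallest, gets edge 1
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
    root = heap[0][2]

    codes = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, str):            # leaf: record the path as its code
            codes[node] = prefix or "0"      # lone-symbol message gets "0"
        else:
            walk(node[0], prefix + "0")      # left edge
            walk(node[1], prefix + "1")      # right edge

    walk(root, "")
    return codes

print(huffman_codes("BCAADDDCCACACAC"))
```

For this message no two weights are ever equal during merging, so the result does not depend on the tie-breaking rule. When ties occur, a different rule can give different but equally short codes.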
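17. Tree (0 = left edge, 1 = right edge):
15
  0: C (6)
  1: * (9)
    0: * (4)
      0: B (1)
      1: D (3)
    1: A (5)
Character Frequency Code Code size (bits)
A 5 11 5 * 2 = 10
B 1 100 1 * 3 = 3
C 6 0 6 * 1 = 6
D 3 101 3 * 3 = 9
Totals: characters 4 * 8 = 32 bits; frequency 15; codes 9 bits; message 28 bits
Message: B C A A D D D C C A C A C A C
Coded message: 100 0 11 11 101 101 101 0 0 11 0 11 0 11 0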
18. We must send the tree/table along with the coded message
so that the receiver can decode it.
Tree (0 = left edge, 1 = right edge):
15
  0: C (6)
  1: * (9)
    0: * (4)
      0: B (1)
      1: D (3)
    1: A (5)
Character Code
A 11
B 100
C 0
D 101
4 letters in their original ASCII form: 4 * 8 = 32 bits
A variable number of bits for each code: 2 + 3 + 1 + 3 = 9 bits
So the tree/table takes 32 + 9 = 41 bits.
With a variable number of bits per letter, this 15-letter message takes:
5 * 2 + 1 * 3 + 6 * 1 + 3 * 3 = 28 bits
19. Message: B C A A D D D C C A C A C A C
Character Code
A 11
B 100
C 0
D 101
Coded message: 100 0 11 11 101 101 101 0 0 11 0 11 0 11 0
Total size of the coded message along with the tree/table:
coded message 28 + tree 41 = 69 bits
Size of this message without coding: 15 * 8 = 120 bits
Thus Huffman coding reduced the size of the message from
120 bits to just 69 bits.
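These totals can be recomputed with a short Python sketch. It assumes, as the slides do, that the table costs 8 bits per character plus the length of each code. The variable names are illustrative.

```python
message = "BCAADDDCCACACAC"
codes = {"A": "11", "B": "100", "C": "0", "D": "101"}

# Each character costs as many bits as its code is long.
message_bits = sum(len(codes[ch]) for ch in message)
# Table: 8 bits per ASCII character plus the bits of every code.
table_bits = 8 * len(codes) + sum(len(code) for code in codes.values())
total_bits = message_bits + table_bits
original_bits = 8 * len(message)

print(message_bits, table_bits, total_bits, original_bits)  # 28 41 69 120
print(f"{1 - total_bits / original_bits:.1%}")              # 42.5%
```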
20. How to decode using a Huffman tree?
To decode the coded message we need both the tree/table
and the coded message.
Tree (0 = left edge, 1 = right edge):
15
  0: C (6)
  1: * (9)
    0: * (4)
      0: B (1)
      1: D (3)
    1: A (5)
I. Start at the root and traverse the tree following the bits of the
coded message: the 0s and 1s on the edges tell us which branch to
take. Each time we reach a leaf, we output its character and go
back to the root.
Coded message: 100 0 11 11 101 101 101 0 0 11 0 11 0 11 0
Decoded message: BCAADDDCCACACAC
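The traversal described above can be sketched in Python. The nested-tuple representation of the tree (a string is a leaf, a pair is an internal node with its left and right child) is an assumption made for this illustration.

```python
# Tree from the slides: left edge = 0, right edge = 1.
tree = ("C", (("B", "D"), "A"))

def decode(bits: str, root) -> str:
    """Walk the tree bit by bit; emit a character at every leaf."""
    out = []
    node = root
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if isinstance(node, str):   # reached a leaf
            out.append(node)
            node = root             # restart from the root
    return "".join(out)

coded = "100 0 11 11 101 101 101 0 0 11 0 11 0 11 0".replace(" ", "")
print(decode(coded, tree))  # BCAADDDCCACACAC
```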
21. Example.
Character Frequency
M 1
P 2
Y 4
S 4
Tree (0 = left edge, 1 = right edge):
11
  0: S (4)
  1: * (7)
    0: * (3)
      0: M (1)
      1: P (2)
    1: Y (4)
Coded message: 100101110011101011011
22. Element Code Code length
M 100 3
P 101 3
Y 11 2
S 0 1
Tree (0 = left edge, 1 = right edge):
11
  0: S (4)
  1: * (7)
    0: * (3)
      0: M (1)
      1: P (2)
    1: Y (4)
Coded message: 100101110011101011011
Decoded message: MPYSSYPSYSY
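As an alternative to walking the tree, the example can also be decoded from the code table alone. The sketch below is an illustrative variant, with the made-up function name `decode_with_table`. It relies on the fact that no code in the table is a prefix of another, so the first match while reading bits is the only possible one.

```python
codes = {"M": "100", "P": "101", "Y": "11", "S": "0"}
lookup = {code: ch for ch, code in codes.items()}  # code -> character

def decode_with_table(bits: str) -> str:
    """Accumulate bits until they match a code, then emit that character."""
    out = []
    current = ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

print(decode_with_table("100101110011101011011"))  # MPYSSYPSYSY
```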