2. Contents:
Introduction
Need for Data Compression
Types of Compression
Lossless Compression
Lossy Compression
Huffman Coding
Applications
Summary
3. Introduction
What is Data compression?
Data compression is a technique in which
information in the file is encoded in such a way
that it takes up less space.
4. Need for Data Compression
Compression reduces the size of file:
a. To save space when storing it.
b. To save time when transmitting it.
c. Most files have lot of redundancy.
5. Continued….
Who needs Compression?
Parkinson’s Law: Data expands to fill available
space.
Moore’s Law: # Transistors on a chip doubles
every 18-24 months.
Text , images ,sounds ,videos ,etc.
6. Types of Compression
Lossless Compression
Lossy Compression
Origin
al
Data
Decompres
sor
(decoder)
Compresso
r
(encoder)
Compress
ed
data
7. Lossless Compression
A lossless compression algorithm eliminates only
redundant information, so one can easily recover
the data upon decompression of the file.
Lossless data compression is compression
without any loss of data quality. The
decompressed is replica of original file.
It is used when it is important that original and
decompressed data be identical.
Some image formats, notably PNG, use only
loseless compression, while those likes TIFF may
use either lossy or lossless methods.
9. Lossy Compression
A Lossy compression method is one where that
compressing data and then decompressing it
retrieves data that may well be different from the
original, but close enough to be useful in some
way.
The algorithm eliminates irrelevant data as well,
and permits only an approximate reconstruction
of original file.
Lossy compression is dangerously attractive
because it can provide compression ratios of
100:1 and 200:1, depending on the type of
information being compressed. But the cost is
loss of data.
10. Examples of LOSSY Methods
are:
PCM
JPEG
MPEG
JPEG-Compression (left to right:
decreasing quality setting results in a 8x8
block generation of pixels)
11. Huffman Coding
Huffman coding compression technique involves
preliminary analysis of the frequency of
occurrence of symbols. Huffman technique
creates, for each symbol, a binary data code, the
length of which is inversely related to the
frequency of occurrence.
In this technique we just add symbols at leaf node
and root of this leaf node given values of addition
of two leafs, and tree is generated.
It gives us the best way for encoding any data
string using minimum bits.
12. Example:
Consider a string ABRA KA DABRA.
OCCURRENCE-
A=5 Let A=010
B=2 B=11
R=2 K=00
D=1 D=10
K=1 R=011
13. Therefore, if we write the given string in Binary form then,
i.e. ABRAKADABRA
BINARY FORM….
01011011010000101001011011010
29 bits
Now apply Huffman coding logic:
Here, the weight of D and K is 1. As occurrence of both the characters
is 1 time. Therefore, the weight of parent node is 1+1=2 i.e.
Addition of weight oh two leaf nodes.
Similarly for other characters……
D K
2
0 1
Weight => 1 1
14. 11
6
4 2
R
B
D K
A
0 1
0 1
0
0 1
1
Here, LHS nodes have
value = 0 and
RHS nodes have
value= 1
So,
from fig. the values will
be-
A=0
B=101
D=110
K=111
R=100
Now the representation of
string is-
0101100011101100101100
0
Tree diagram for given
string by
Huffman coding
5
2 2
1 1
16. Summary:
Data Compression:
Is the substitution of frequently occurring data
items, or symbols with short code that requires
fewer bits for storage than the original symbols.
Saves space, but requires time to save and
extract. Success varies with types of data. Works
best on data with low spatial variability and limited
possible values. Works poorly on high spatial
variability data or continuous surface. Exploits
inherent redundancy and irrelevancy by
transforming a data file into a smaller one.
17. Multiple Choice Questions:
I. Which one of the image format uses only Lossless Data
Compression?
a. GIF
b. PNG
c. TIFF
d. JPEG
II. On which type of data, compression is not suitable…
a. Matrix
b. Highly spatial
c. Vector
d. None of these
18. Continued…
III. Which one of the following technique is irreversible in nature.
a. Run-length
b. PCM
c. Both a and b
d. None of these
IV. MPEG ______ type of compression.
a. Lossy compression
b. Lossless compression
c. Both
d. None of these