Data Compression (Huffman)

  1. 1. Turn off the lights
  2. 2. Clip End
  3. 3. Data Compression • Muhammad Raza Master (B12101085) • Muhammad Ali Mehmood (B12101065) • Syed Faraz Naqvi (B12101123) • Department of Computer Science, University of Karachi
  4. 4. Reduction in size of data
  5. 5. • Save storage when saving information • Save time when communicating information
  6. 6. Compression • Lossless • Lossy
  7. 7. • Image compression • Audio compression • Video compression • All sorts of data compression
  8. 8. TREE • Sum of children’s frequencies • Reference to binary tree (0/1) • Char variable • Frequency • Reference to binary tree (0/1)
  9. 9. APPLICATION • Finding an object with a certain property in a collection of objects of a certain type • Storing items in a list so that an item can be easily located • Efficient encoding of a set of characters by bit strings
  10. 10. TRAVERSING IN TREE • IN-ORDER TRAVERSAL • PREORDER TRAVERSAL • POSTORDER TRAVERSAL
  11. 11. Example binary search tree: root 25; its children 15 and 50; below them 10, 22, 35, 70; leaves 4, 12, 18, 24, 31, 44, 66, 90 • Pre-order: 1. Visit the root 2. Traverse the left subtree 3. Traverse the right subtree • In-order: 1. Traverse the left subtree 2. Visit the root 3. Traverse the right subtree • Post-order: 1. Traverse the left subtree 2. Traverse the right subtree 3. Visit the root • Pre-order: 25, 15, 10, 4, 12, 22, 18, 24, 50, 35, 31, 44, 70, 66, 90 • In-order: 4, 10, 12, 15, 18, 22, 24, 25, 31, 35, 44, 50, 66, 70, 90 • Post-order: 4, 12, 10, 18, 24, 22, 15, 31, 44, 35, 66, 90, 70, 50, 25
  12. 12. Introduction • By Dr. David Huffman (1952) • One of the earliest data compression algorithms (Shannon-Fano coding predates it) • An example of ‘LOSSLESS DATA COMPRESSION’ • A binary tree is used to construct the Huffman encoding
  13. 13. Basic Idea • The most frequently occurring character gets the shortest code • Save bits by encoding frequently used characters with fewer bits than rarely used characters
  14. 14. HUFFMAN(X) • Compute frequency f(c) for each character c in X • Let Q be an empty priority queue • Insert every character c into Q as a singleton tree with key f(c) • While Q.SIZE() > 1, do: – f1 ← Q.MIN-KEY() – T1 ← Q.REMOVE-MIN() – f2 ← Q.MIN-KEY() – T2 ← Q.REMOVE-MIN() – Let T be a new tree with left subtree T1 and right subtree T2 – Q.INSERT(T, f1 + f2) • Return Q.REMOVE-MIN()
  15. 15. Example: it was the best of times it was the worst of times. (the full stop stands for LF) • Symbol: count • LF 1, b 1, r 1, f 2, h 2, m 2, a 2, w 3, o 3, i 4, e 5, s 6, t 8, space 11
  16. 16. Symbol: bits • LF 101010, b 101011, r 10100, f 11000, h 11001, m 11010, a 11011, w 0010, o 0011, i 1011, e 000, s 100, t 111, space 01
  17. 17. Example #1: HumeraTariq • Symbol: count • H 1, u 1, m 1, e 1, r 2, a 2, T 1, i 1, q 1 • Tree (left edge = 0, right edge = 1): H + u = 2, m + e = 2, T + i = 2; 2 + 2 = 4 (H, u, m, e); 2 + q = 3 (T, i, q); r + a = 4; 4 + 3 = 7; 7 + 4 = 11 (root)
  18. 18. m = HumeraTariq • Symbol: bits • H 0000, u 0001, m 0010, e 0011, r 10, a 11, T 0100, i 0101, q 011 • Compressed bit-stream C(m) = 00000001001000111011010011100101011 (35 bits)
  19. 19. The length of the encoded bit-stream is the sum, over all letters, of the number of occurrences times the number of bits per occurrence • Length of compressed bit-stream = Σ (frequency × distance), where distance is the depth of the letter’s leaf in the tree
  20. 20. E.g.: m = HumeraTariq • At distance: – 4: six leaves (‘H’, ‘u’, ‘m’, ‘e’, ‘T’, ‘i’, with total frequency 6) – 3: one leaf (‘q’, with frequency 1) – 2: two leaves (‘r’ and ‘a’, with total frequency 4) • Length of compressed bit-stream = Σ (frequency × distance) • Total = 4·6 + 3·1 + 2·4 = 35 is the length of the compressed bit-stream, as expected. Proved!!
  21. 21. Let d be the number of symbols and n the length of the input. Huffman’s algorithm runs in O(n + d log d) time
  22. 22. • We can apply it to any byte stream • A milestone that preceded LZW compression
  23. 23. REFERENCES • Robert Sedgewick and Kevin Wayne, Algorithms (4th edition) • https://blog.itu.dk/BADS-F2009/files/2009/04/46-huffman.pdf • Kenneth H. Rosen, Discrete Mathematics and Its Applications (7th edition)
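
The three traversal orders from the tree-traversal slide can be checked with a short sketch. This is an illustration only, not code from the slides: the `Node` class and the `insert`, `pre_order`, `in_order` and `post_order` helpers are hypothetical names, and the tree is rebuilt by inserting the keys in the pre-order sequence given on the slide, which reproduces the same binary search tree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Node:
    """Plain (unbalanced) binary-search-tree insertion."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def pre_order(n: Optional[Node]) -> List[int]:
    # Visit the root, then the left subtree, then the right subtree.
    return [] if n is None else [n.key] + pre_order(n.left) + pre_order(n.right)


def in_order(n: Optional[Node]) -> List[int]:
    # Left subtree, then the root, then the right subtree.
    return [] if n is None else in_order(n.left) + [n.key] + in_order(n.right)


def post_order(n: Optional[Node]) -> List[int]:
    # Left subtree, then the right subtree, then the root.
    return [] if n is None else post_order(n.left) + post_order(n.right) + [n.key]


# Keys in the pre-order sequence shown on the slide.
KEYS = [25, 15, 10, 4, 12, 22, 18, 24, 50, 35, 31, 44, 70, 66, 90]

root = None
for k in KEYS:
    root = insert(root, k)

print(pre_order(root))
print(in_order(root))
print(post_order(root))
```

For a binary search tree the in-order traversal always comes out sorted, which is a quick sanity check on the slide's second sequence.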
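
The HUFFMAN(X) pseudocode and the "frequency × distance" length rule can be tried out with a minimal Python sketch. It is one possible rendering under stated assumptions, not the authors' implementation: the function names are hypothetical, Python's `heapq` stands in for the priority queue Q, a tree is either a single character (leaf) or a `(left, right)` tuple, and the input is assumed to contain at least two distinct symbols. Ties between equal frequencies are broken by insertion order, so individual codes may differ from the tables on the slides, although the total encoded length is the same for any Huffman tree.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_tree(text):
    """Build a Huffman tree following the HUFFMAN(X) steps."""
    freq = Counter(text)              # f(c) for each character c
    tie = count()                     # unique tie-breaker so trees are never compared
    heap = [(f, next(tie), ch) for ch, f in freq.items()]
    heapq.heapify(heap)               # Q holds singleton trees keyed by f(c)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)   # two least frequent trees
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (t1, t2)))
    return heap[0][2]


def code_table(tree, prefix=""):
    """Map each symbol to its bit string (left edge = 0, right edge = 1)."""
    if isinstance(tree, str):
        return {tree: prefix}
    left, right = tree
    table = code_table(left, prefix + "0")
    table.update(code_table(right, prefix + "1"))
    return table


def encode(text, table):
    return "".join(table[ch] for ch in text)


def decode(bits, tree):
    """Walk the tree bit by bit, emitting a symbol at every leaf."""
    out, node = [], tree
    for b in bits:
        node = node[0] if b == "0" else node[1]
        if isinstance(node, str):
            out.append(node)
            node = tree
    return "".join(out)


def encoded_length(text, table):
    """Sum over symbols of frequency times code length (leaf depth)."""
    return sum(f * len(table[ch]) for ch, f in Counter(text).items())


message = "HumeraTariq"
tree = huffman_tree(message)
table = code_table(tree)
bits = encode(message, table)
print(len(bits))  # 35, matching 4·6 + 3·1 + 2·4 on the slide
```

Decoding works without separators because no code is a prefix of another: every symbol sits at a leaf, so the walk from the root is unambiguous.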
