How GZIP compression works - JS Conf EU 2014

Data compression is an amazing topic. Even in today’s world, with fast networks and almost unlimited storage, data compression is still relevant, especially for mobile devices and countries with poor Internet connections.

For better or worse, GZIP is the de facto lossless method for compressing text on websites. It is neither the fastest nor the best, but it offers an excellent trade-off between speed and compression ratio. The way the Internet works also makes it difficult to adopt newer compression methods.
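
The speed versus ratio trade-off can be observed with a rough sketch like the one below. It uses the `gzip` module from Python's standard library; the sample input and the chosen levels are assumptions made for illustration, and timings will vary between machines.

```python
import gzip
import time


def compare_levels(data: bytes, levels=(0, 1, 6, 9)) -> dict:
    """Return {level: (compressed_size, seconds)} for a few gzip levels."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        packed = gzip.compress(data, compresslevel=level)
        elapsed = time.perf_counter() - start
        # Lossless: every level must decompress back to the original bytes.
        assert gzip.decompress(packed) == data
        results[level] = (len(packed), elapsed)
    return results


# Hypothetical sample input: an HTML-like fragment repeated many times.
sample = b'<li class="item"><a href="/page">Some link text</a></li>\n' * 2000

if __name__ == "__main__":
    for level, (size, seconds) in compare_levels(sample).items():
        print(f"level {level}: {size} bytes in {seconds * 1000:.2f} ms")
```

Level 0 stores the data without compressing it, so its output is slightly larger than the input. Higher levels spend more time searching for matches.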

This talk examines how GZIP works internally, focusing on the DEFLATE algorithm, which combines LZ77 and Huffman coding. It compares different implementations, such as GNU GZIP, 7-ZIP and zopfli, and looks at why and how some of them perform better than others.
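
To make the LZ77 half of DEFLATE more concrete, here is a minimal greedy sketch, not the matcher used by any real GZIP implementation. The function names, the brute-force search and the token format (a literal character or a `(distance, length)` tuple) are assumptions for illustration. The 32 KB window, the minimum match length of 3 and the maximum of 258 follow the DEFLATE format.

```python
def lz77_encode(text, window=32 * 1024, min_len=3, max_len=258):
    """Greedy LZ77 sketch: emit literal characters or (distance, length) pairs."""
    tokens = []
    i = 0
    while i < len(text):
        best_len, best_dist = 0, 0
        # Brute-force scan of the search buffer (real encoders use hash chains).
        for j in range(max(0, i - window), i):
            length = 0
            while (length < max_len and i + length < len(text)
                   and text[j + length] == text[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))
            i += best_len
        else:
            tokens.append(text[i])
            i += 1
    return tokens


def lz77_decode(tokens):
    """Rebuild the text by copying `length` characters from `distance` back."""
    out = []
    for token in tokens:
        if isinstance(token, tuple):
            distance, length = token
            for _ in range(length):
                out.append(out[-distance])  # one at a time, so overlaps work
        else:
            out.append(token)
    return "".join(out)


SENTENCE = "THIS FILE IS HUGE! THAT'S BECAUSE THE FILE IS NOT COMPRESSED"

if __name__ == "__main__":
    print(lz77_encode(SENTENCE))
```

For the sentence used in the slides, the token list contains `(33, 9)`: the second " FILE IS " is encoded as "go back 33 characters and copy 9".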

Finally, we will try to go beyond GZIP by preprocessing our data to achieve better results, for example by transposing JSON.
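
A possible sketch of the JSON transposition idea follows. The helper names `transpose`, `untranspose` and `gzip_size` are hypothetical, and the sketch assumes a list of flat objects that all share the same keys. Whether the transposed form really compresses better depends on the data, so the sizes are printed rather than assumed.

```python
import gzip
import json


def transpose(rows):
    """Turn a list of flat objects with identical keys into an object of lists."""
    keys = list(rows[0]) if rows else []
    return {key: [row[key] for row in rows] for key in keys}


def untranspose(columns):
    """Inverse of transpose(): rebuild the list of objects."""
    keys = list(columns)
    length = len(columns[keys[0]]) if keys else 0
    return [{key: columns[key][i] for key in keys} for i in range(length)]


def gzip_size(value):
    """Size in bytes of the gzipped JSON serialization of `value`."""
    return len(gzip.compress(json.dumps(value).encode("utf-8")))


rows = [
    {"name": "John", "country": "USA"},
    {"name": "Stephan", "country": "Germany"},
    {"name": "Rob", "country": "USA"},
]

if __name__ == "__main__":
    columns = transpose(rows)
    print(json.dumps(columns))
    print("rows:", gzip_size(rows), "bytes; transposed:", gzip_size(columns), "bytes")
```

The receiving side has to call something like `untranspose` to restore the original structure, so the transformation T must be reversible.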

  1. How GZIP compression works. Raul Fraile. JSConf EU, Berlin.
  2. About me: PHP/JS software developer. MS (Res) student in Computing Technologies. Made in Spain.
  3. Data compression
  4. Not an expert*
  5. Data compression is an amazing topic
  6. Really!
  7. It can be seen like … magic. flickr.com/photos/jeffkrause/6799254170
  8. … it's not. flickr.com/photos/t_e_brown/8677750589
  9. Information theory: Claude Shannon
  10. Entropy. flickr.com/photos/95303997@N07/10074330416
  11. $H = -\sum_{x} p(x) \log_2 p(x)$: average amount of information contained in each message ≈ number of bits to represent the message
  12. 225 days/year, 62%. 17 days/year, 6%. flickr.com/photos/aigle_dore/5952296478 flickr.com/photos/mariano-mantel/13955110319
  13. The human brain is designed to compress data. flickr.com/photos/birthintobeing/11841180046
  14. flickr.com/photos/neolao/3105372669 flickr.com/photos/tommiephotography/6840025942 flickr.com/photos/earlysound/2186172726
  15. Morse code: shorter sequences for common characters. flickr.com/photos/amboo213/9044879245
  16. Data compression in HTTP
  17. GZIP + HTTP: GET index.html, Accept-Encoding: gzip, deflate
  18. GZIP compression
  19. GZIP: uses the DEFLATE algorithm, designed by Phil Katz; used in HTTP, PNG and PDF.
  20. DEFLATE: LZ77 + Huffman coding
  21. LZ77 (variation): "THIS FILE IS HUGE! THAT'S BECAUSE THE FILE IS NOT COMPRESSED". The second " FILE IS " becomes the back-reference <33,9> (distance 33, length 9). Search buffer (up to 32 KB) and look-ahead buffer.
  22. LZ77 (variation): same sentence with <33,9>. Output: literals · lengths · distances
  23. Huffman coding, fixed-length codes. HELLO WORLD in 8-bit ASCII: 01001000 01000101 01001100 01001100 01001111 00100000 01010111 01001111 01010010 01001100 01000100 (88 bits). With 3-bit codes (H 000, E 001, L 010, O 011, W 100, R 101, D 110, _ 111): 000 001 010 010 011 111 100 011 101 010 110 (33 bits).
  24. Huffman coding, variable-length codes. Character frequency and code: L 3 → 0, O 2 → 1, H 1 → 00, E 1 → 01, W 1 → 10, R 1 → 11, D 1 → 000, _ 1 → 001. HELLO WORLD: 0001001001101110000 (19 bits). It's ambiguous: HELHODO…
  25. Huffman coding. Character frequency and prefix-free code: L 3 → 10, O 2 → 111, H 1 → 001, E 1 → 1100, W 1 → 011, R 1 → 000, D 1 → 1101, _ 1 → 010.
  26. Huffman coding. Same table. HELLO WORLD: 001 1100 10 10 111 010 011 111 000 10 1101 (32 bits).
  27. Huffman coding. Table 1: literals + lengths. Table 2: distances.
  28. Blocks: M block 1, M block 2, …, M block N (M = mode). Mode 1: no compression. Mode 2: fixed code tables. Mode 3: generated code tables.
  29. flickr.com/photos/functoruser/2436979033
  30. GZIP compression: implementations
  31. Implementations: GNU GZIP, 7-ZIP, Zopfli. Modes: fast, normal, high compression. General rule: more time, better compression ratio.
  32. GZIP compression: why GZIP?
  33. Tradeoff: good compression ratio; fast to (un)compress; in the worst case, expands the data slightly; memory independent; free implementations that avoid patents.
  34. Newer algorithms: issues trying to add bzip2 support to Chrome
  35. GZIP compression: beyond GZIP
  36. Preprocess data to optimize matches
  37. GZIP(T(DATA)) < GZIP(DATA)
  38. Transposing JSON: { "name": "John", "country": "USA" }, { "name": "Stephan", "country": "Germany" }, { "name": "Rob", "country": "USA" } becomes { "name": [ "John", "Stephan", "Rob" ], "country": [ "USA", "Germany", "USA" ] }
  39. XML/HTML attributes order (http://goo.gl/GgMw26):
      <input id='f1' class='field' name="f1" type="text" /> <input class="field" id="f2" type="text" name="f2" /> → 17.76%
      <input id="f1" class="field" name="f1" type="text" /> <input class="field" id="f2" type="text" name="f2" /> → 27.10%
      <input id="f1" class="field" name="f1" type="text" /> <input id="f2" class="field" name="f2" type="text" /> → 38.32%
      <input type="text" class="field" id="f1" name="f1" /> <input type="text" class="field" id="f2" name="f2" /> → 38.32%
  40. References
  41. "Compressor Head", Colt McAnlis
  42. "Data Compression: The Complete Reference", David Salomon
  43. "A Universal Algorithm for Sequential Data Compression", Jacob Ziv & Abraham Lempel
  44. "A Method for the Construction of Minimum-Redundancy Codes", David A. Huffman
  45. Thank you. Raúl Fraile, @raulfraile
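
Slides 23 to 26 compare fixed-length and Huffman codes for HELLO WORLD. The sketch below rebuilds that comparison under a few assumptions: the function names are hypothetical, ties between equal frequencies are broken by insertion order, and bit strings are assigned canonically. The individual codes may therefore differ from the ones on the slides, while the total encoded length should not.

```python
import heapq
from collections import Counter


def huffman_code_lengths(text):
    """Return {symbol: code length in bits} for a Huffman code of `text`."""
    freq = Counter(text)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie breaker, {symbol: depth so far}).
    heap = [(w, n, {sym: 0}) for n, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


def canonical_codes(lengths):
    """Assign prefix-free bit strings from code lengths (canonical Huffman)."""
    codes, code, prev_len = {}, 0, 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)
        codes[sym] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codes


if __name__ == "__main__":
    message = "HELLO WORLD"
    codes = canonical_codes(huffman_code_lengths(message))
    encoded = "".join(codes[ch] for ch in message)
    print(codes)
    print(len(message) * 8, "bits as ASCII,", len(encoded), "bits with Huffman codes")
```

Because no code is a prefix of another, the bit stream can be decoded unambiguously, unlike the naive variable-length codes on slide 24.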