SlideShare uma empresa Scribd logo
1 de 53
Compression in Open Source
Databases
Peter Zaitsev
CEO, Percona
MySQL Central @ OOW
26 Oct 2015
Few Words About Percona
Your Partner in
MySQL and
MongoDB Success
100% Open Source
Software
“No Lock in
Required”
Solutions and
Services
We work with
MySQL, MariaDB,
MongoDB, Amazon
RDS and Aurora
2
About the Talk
A bit of the History
Approaches to Data Compression
What some of the popular systems
implement
3
Lets Define The Term
Compression - Any
Technique to make
data size smaller
4
A bit of History
Early Computers were too slow to compress
data in Software
Hardware Encryption (ie Tape)
Compression first appears for non
performance critical data
5
We did not need it much for space…
6
Welcome to the modern age
Data Growth
outpaces HDD
improvements
Powerful CPUs Flash
Cloud
Data we store
now
7
Exponential Data Size Growth
8
Powerful CPUs
High
Performance
Multiple
Cores
9
Can Compress and Decompress Fast!
•Up to 1GB/sec
compression
•Up to 2GB/sec
decompression
Snappy,
LZ4
10
Flash
Disk space is more costly than for HDDs
Write Endurance is expensive
Want to write less data
Decent at handling fragmentation
11
Cloud
Pay for Space Pay for IOPS
More limited
Storage
Performance
Network
Performance
may be limited
12
Data we store in Databases
•Text
•JSON
•XML
•Time Series Data
•Log Files
Modern
Data
Compresses
Well!
13
COMPRESSION BASICS
Introduction into a ways of making your data smaller
14
Lossy and Lossless
Database generally use
Lossless Compression
Lossy compression done on
the application level
15
Some ways of getting data smaller
Layout
Optimizations
“Encoding”
Dictionary
Compression
Block
Compression
16
Layout Optimizations
Column Store versus Row Store
Hybrid Formats
Variable Block Sizes
17
Encoding
Depends on Data Type and Domain
Delta Encoding, Run Length Encoding (RLE)
Can be faster than read of uncompressed data
UTF8 (strings) and VLQ (Integers)
Index Prefix Compression
18
Dictionary Compression
Replacing frequent
values with
Dictionary Pointers
Kind of like STL
String
ENUM type in
MySQL
19
Block Compression
Compress “block” of data so it is smaller for
storage
Finding Patterns in Data and Efficiently
encoding them
Many Algorithms Exist: Snappy, Zlib, LZ4,
LZMA
20
Block Compression Details
Compression rate highly depends on data
Compression rate depends on block size
Speed depends on block size and data
21
Block Size Dependence (by Leif Walsh)
22
There is no one size fits all
Typically Compression
Algorithm can be selected
Often with additional
settings
23
WHERE AND HOW
Where do we compress data and how do we do that
24
Where to Compress Data
In Memory ?
In the Database Data Store ?
As Part of File System ?
Storage Hardware ?
Application ?
25
Compression in Memory
Reduce amount of
memory needed for
same working set
Reduce IO for Fixed
amount of Memory
Typically in-Memory
Performance Hit
Encoding/Dictionary
Compression are
good fit
26
Database Data Store
Reduce Database
Size on Disk
Works with all file
systems and storage
With OS cache can
be used as In-
Memory
compression variant
Dealing with
fragmentation is
common issue
27
Compression on File System Level
Works with all
Databases/Storage
Engines
Performance
Impact can be
significant
Logical Space on
disk is not reduced
ZFS
28
Compression on Storage Hardware
Hardware
Dependent
Does not reduce
space on disk
Can result in
Performance Gains
rather than free
space (SSD)
Can become a
choke point
29
By Application
No Database Support needed
Reduce Database Load and Network Traffic
Application may know more about data
More Complexity
Give up many DBMS features (search, index)
30
DESIGN CONSIDERATIONS
What makes database system to do well with compression
31
The Goal
Minimize Negative
Impact for User
Operations (Reads and
Writes)
32
Design Principles
Fast Decompression Compression in Background
Parallel
Compression/Decompression
Reduce need of Re-
Compression on Update
33
Choosing Block Size
Large Blocks
•Most efficient
for compression
•Bulky Read
Writes
Small Blocks
•Fastest to
Decompress
•Best for point
lookups
34
IMPLEMENTATION EXAMPLES
What Database systems Really do with Compression
35
MySQL “Packed” MyISAM
Compress table “offline” with myisampack
Table Becomes Read Only
Variety of compression methods are used
Only data is compressed, not indexes
Note MyISAM support index prefix compression for all indexes
36
MySQL Archive Storage Engine
Does not support indexes
Essentially file of rows with sequential
access
Uses zlib compression
37
Innodb Table Compression
Available Since MySQL 5.1
Pages compressed using zlib
Compressed page target (1K, 4K, 8K) has to be set
Both Compressed and Uncompressed pages can be cached in Buffer Pool
Per Page “log” of changes to avoid recompression
Extenrally Stored BLOBs are compressed as single entity
38
Innodb Transparent Page Compression
Available in MySQL 5.7
Zib and LZ4 Compression
Compresses pages as they are written to disk
Free space on the page is given back using “hole punching”
Originally designed to work with FusionIO NVMFS
Can cause problems for current filesystem due to very high hole number
39
Disk usage (Linkbench data set by Sunny Bains)
40
Performance on Fast SSD (FusionIO NVMFS)
41
Results on Slower SSD (Intel 730*2, EXT4)
42
Fractal Trees Compression
Available as Storage Engine for MySQL and MongoDB
Can use many compression libraries
Tunable Compression Block Size
Reduce Re-Compression due to message buffering
43
Can get a lot of compression
44
MongoDB WiredTiger Storage Engine
Engine Has many compression settings
Indexes are using Index Prefix Compression
Data Pages can be compressed using zlib or
Snappy
45
Compression Size (results by Asya Kamsky)
46
Compression in RocksDB
RocksDB – LSM Based Storage Engine for MongoDB and
MySQL
LSM works very well with compression
Supports, zlib, lz4, bzip2 compression
Can use different compression methods for different Levels
in LSM
47
Compression results from Mike Kania
48
PostgreSQL
Uses compression by default with TOAST
2KB (default) or longer Strings, BlOBs
Unlike Innodb External Storage is not required for
Compression
Recommended to use File system compression ie ZFS if
Compression is Desired
49
Summary
Compression is Important in Modern Age
Consider it for your system
Many different techniques are used to make data smaller
by databases
Compression support is rapidly changing and improving
50
Want More ?
I’m talking about MySQL Replication Options
Free (as in Beer) Moscow MySQL Users Group meetup
November 6th,
Hosted by Mail.ru
http://www.meetup.com/moscowmysql/
51
Percona Live 2016 call for paper is Open
Call for Papers Open until November 29, 2016
MySQL, MongoDB, NoSQL, Data in The Cloud
Anything to make Data Happy!
http://bit.ly/PL16Call
52
53
Thank You!
Peter Zaitsev
pz@percona.com
https://www.linkedin.com/in/peterzaitsev

Mais conteúdo relacionado

Mais procurados

[db tech showcase Tokyo 2015] D25:The difference between logical and physical...
[db tech showcase Tokyo 2015] D25:The difference between logical and physical...[db tech showcase Tokyo 2015] D25:The difference between logical and physical...
[db tech showcase Tokyo 2015] D25:The difference between logical and physical...
Insight Technology, Inc.
 
GlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 MeetupGlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 Meetup
GlusterFS
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Community
 

Mais procurados (20)

Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data Ceph Day San Jose - Object Storage for Big Data
Ceph Day San Jose - Object Storage for Big Data
 
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
Ceph Day Beijing - Our journey to high performance large scale Ceph cluster a...
 
WiredTiger Overview
WiredTiger OverviewWiredTiger Overview
WiredTiger Overview
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
[db tech showcase Tokyo 2015] D25:The difference between logical and physical...
[db tech showcase Tokyo 2015] D25:The difference between logical and physical...[db tech showcase Tokyo 2015] D25:The difference between logical and physical...
[db tech showcase Tokyo 2015] D25:The difference between logical and physical...
 
Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce Ceph Day San Jose - Ceph at Salesforce
Ceph Day San Jose - Ceph at Salesforce
 
How to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
How to Get a Game Changing Performance Advantage with Intel SSDs and AerospikeHow to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
How to Get a Game Changing Performance Advantage with Intel SSDs and Aerospike
 
OVHcloud Partner Webinar - Data Processing
OVHcloud Partner Webinar - Data ProcessingOVHcloud Partner Webinar - Data Processing
OVHcloud Partner Webinar - Data Processing
 
MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine
MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage EngineMongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine
MongoDB Evenings Boston - An Update on MongoDB's WiredTiger Storage Engine
 
A Technical Introduction to WiredTiger
A Technical Introduction to WiredTigerA Technical Introduction to WiredTiger
A Technical Introduction to WiredTiger
 
Disrupt the Storage & Memory Hierarchy
Disrupt the Storage & Memory HierarchyDisrupt the Storage & Memory Hierarchy
Disrupt the Storage & Memory Hierarchy
 
Redis Developers Day 2014 - Redis Labs Talks
Redis Developers Day 2014 - Redis Labs TalksRedis Developers Day 2014 - Redis Labs Talks
Redis Developers Day 2014 - Redis Labs Talks
 
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
Ceph Day San Jose - All-Flahs Ceph on NUMA-Balanced Server
 
GlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 MeetupGlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 Meetup
 
Engineering an Encrypted Storage Engine
Engineering an Encrypted Storage EngineEngineering an Encrypted Storage Engine
Engineering an Encrypted Storage Engine
 
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
Ceph Day San Jose - Enable Fast Big Data Analytics on Ceph with Alluxio
 
Ceph Day Melbourne - Walk Through a Software Defined Everything PoC
Ceph Day Melbourne - Walk Through a Software Defined Everything PoCCeph Day Melbourne - Walk Through a Software Defined Everything PoC
Ceph Day Melbourne - Walk Through a Software Defined Everything PoC
 
How can you successfully migrate to hosted private cloud 2020
How can you successfully migrate to hosted private cloud 2020How can you successfully migrate to hosted private cloud 2020
How can you successfully migrate to hosted private cloud 2020
 
Walk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoCWalk Through a Software Defined Everything PoC
Walk Through a Software Defined Everything PoC
 
No sql but even less security
No sql but even less securityNo sql but even less security
No sql but even less security
 

Destaque

MyRocks Табличный Движок для MySQL / Алексей Майков (Facebook) / Сергей Петру...
MyRocks Табличный Движок для MySQL / Алексей Майков (Facebook) / Сергей Петру...MyRocks Табличный Движок для MySQL / Алексей Майков (Facebook) / Сергей Петру...
MyRocks Табличный Движок для MySQL / Алексей Майков (Facebook) / Сергей Петру...
Ontico
 
Deep dive into PostgreSQL internal statistics / Алексей Лесовский (PostgreSQL...
Deep dive into PostgreSQL internal statistics / Алексей Лесовский (PostgreSQL...Deep dive into PostgreSQL internal statistics / Алексей Лесовский (PostgreSQL...
Deep dive into PostgreSQL internal statistics / Алексей Лесовский (PostgreSQL...
Ontico
 
Мониторинг ожиданий в PostgreSQL / Курбангалиев Ильдус (Postgres Professional)
Мониторинг ожиданий в PostgreSQL / Курбангалиев Ильдус (Postgres Professional)Мониторинг ожиданий в PostgreSQL / Курбангалиев Ильдус (Postgres Professional)
Мониторинг ожиданий в PostgreSQL / Курбангалиев Ильдус (Postgres Professional)
Ontico
 
Танцующий кластер. Практическое руководство дрессировщика PostgreSQL / Алексе...
Танцующий кластер. Практическое руководство дрессировщика PostgreSQL / Алексе...Танцующий кластер. Практическое руководство дрессировщика PostgreSQL / Алексе...
Танцующий кластер. Практическое руководство дрессировщика PostgreSQL / Алексе...
Ontico
 
Основы индексирования и расширенные возможности EXPLAIN в MySQL / Василий Лук...
Основы индексирования и расширенные возможности EXPLAIN в MySQL / Василий Лук...Основы индексирования и расширенные возможности EXPLAIN в MySQL / Василий Лук...
Основы индексирования и расширенные возможности EXPLAIN в MySQL / Василий Лук...
Ontico
 
Как балансировать на «сетевом» канате под куполом тяжелой нагрузки? / Сергей ...
Как балансировать на «сетевом» канате под куполом тяжелой нагрузки? / Сергей ...Как балансировать на «сетевом» канате под куполом тяжелой нагрузки? / Сергей ...
Как балансировать на «сетевом» канате под куполом тяжелой нагрузки? / Сергей ...
Ontico
 
Эволюция процесса деплоя в проекте / Денис Яковлев (2ГИС)
Эволюция процесса деплоя в проекте / Денис Яковлев (2ГИС)Эволюция процесса деплоя в проекте / Денис Яковлев (2ГИС)
Эволюция процесса деплоя в проекте / Денис Яковлев (2ГИС)
Ontico
 

Destaque (20)

Движок LMDB - особенный чемпион
Движок LMDB - особенный чемпионДвижок LMDB - особенный чемпион
Движок LMDB - особенный чемпион
 
ORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, SmallerORC: 2015 Faster, Better, Smaller
ORC: 2015 Faster, Better, Smaller
 
nginx.CHANGES.2015 / Игорь Сысоев, Валентин Бартенев (Nginx)
nginx.CHANGES.2015 / Игорь Сысоев, Валентин Бартенев (Nginx)nginx.CHANGES.2015 / Игорь Сысоев, Валентин Бартенев (Nginx)
nginx.CHANGES.2015 / Игорь Сысоев, Валентин Бартенев (Nginx)
 
MyRocks Табличный Движок для MySQL / Алексей Майков (Facebook) / Сергей Петру...
MyRocks Табличный Движок для MySQL / Алексей Майков (Facebook) / Сергей Петру...MyRocks Табличный Движок для MySQL / Алексей Майков (Facebook) / Сергей Петру...
MyRocks Табличный Движок для MySQL / Алексей Майков (Facebook) / Сергей Петру...
 
Deep dive into PostgreSQL internal statistics / Алексей Лесовский (PostgreSQL...
Deep dive into PostgreSQL internal statistics / Алексей Лесовский (PostgreSQL...Deep dive into PostgreSQL internal statistics / Алексей Лесовский (PostgreSQL...
Deep dive into PostgreSQL internal statistics / Алексей Лесовский (PostgreSQL...
 
Бигдата — как добывать золото из данных / Александр Сербул (1С-Битрикс)
Бигдата — как добывать золото из данных / Александр Сербул (1С-Битрикс)Бигдата — как добывать золото из данных / Александр Сербул (1С-Битрикс)
Бигдата — как добывать золото из данных / Александр Сербул (1С-Битрикс)
 
Производительность WebGL-приложений / Дмитренко Кирилл (Яндекс)
Производительность WebGL-приложений / Дмитренко Кирилл (Яндекс)Производительность WebGL-приложений / Дмитренко Кирилл (Яндекс)
Производительность WebGL-приложений / Дмитренко Кирилл (Яндекс)
 
Мониторинг ожиданий в PostgreSQL / Курбангалиев Ильдус (Postgres Professional)
Мониторинг ожиданий в PostgreSQL / Курбангалиев Ильдус (Postgres Professional)Мониторинг ожиданий в PostgreSQL / Курбангалиев Ильдус (Postgres Professional)
Мониторинг ожиданий в PostgreSQL / Курбангалиев Ильдус (Postgres Professional)
 
Cautious: IPv6 is here / Александр Азимов (Qrator Labs)
Cautious: IPv6 is here / Александр Азимов (Qrator Labs)Cautious: IPv6 is here / Александр Азимов (Qrator Labs)
Cautious: IPv6 is here / Александр Азимов (Qrator Labs)
 
Танцующий кластер. Практическое руководство дрессировщика PostgreSQL / Алексе...
Танцующий кластер. Практическое руководство дрессировщика PostgreSQL / Алексе...Танцующий кластер. Практическое руководство дрессировщика PostgreSQL / Алексе...
Танцующий кластер. Практическое руководство дрессировщика PostgreSQL / Алексе...
 
PostgreSQL - масштабирование в моде, Valentine Gogichashvili (Zalando SE)
PostgreSQL - масштабирование в моде, Valentine Gogichashvili (Zalando SE)PostgreSQL - масштабирование в моде, Valentine Gogichashvili (Zalando SE)
PostgreSQL - масштабирование в моде, Valentine Gogichashvili (Zalando SE)
 
Как строить архитектуру для отказоустойчивой службы такси / Минкин Андрей (Na...
Как строить архитектуру для отказоустойчивой службы такси / Минкин Андрей (Na...Как строить архитектуру для отказоустойчивой службы такси / Минкин Андрей (Na...
Как строить архитектуру для отказоустойчивой службы такси / Минкин Андрей (Na...
 
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
Как понять, что происходит на сервере? / Александр Крижановский (NatSys Lab.,...
 
Основы индексирования и расширенные возможности EXPLAIN в MySQL / Василий Лук...
Основы индексирования и расширенные возможности EXPLAIN в MySQL / Василий Лук...Основы индексирования и расширенные возможности EXPLAIN в MySQL / Василий Лук...
Основы индексирования и расширенные возможности EXPLAIN в MySQL / Василий Лук...
 
Принципы и приёмы обработки очередей / Константин Осипов (Mail.ru)
Принципы и приёмы обработки очередей / Константин Осипов (Mail.ru)Принципы и приёмы обработки очередей / Константин Осипов (Mail.ru)
Принципы и приёмы обработки очередей / Константин Осипов (Mail.ru)
 
Что нового и полезного в PostgreSQL 9.5 / Илья Космодемьянский (PostgreSQL-Co...
Что нового и полезного в PostgreSQL 9.5 / Илья Космодемьянский (PostgreSQL-Co...Что нового и полезного в PostgreSQL 9.5 / Илья Космодемьянский (PostgreSQL-Co...
Что нового и полезного в PostgreSQL 9.5 / Илья Космодемьянский (PostgreSQL-Co...
 
Как устроен поиск / Андрей Аксенов (Sphinx)
Как устроен поиск / Андрей Аксенов (Sphinx)Как устроен поиск / Андрей Аксенов (Sphinx)
Как устроен поиск / Андрей Аксенов (Sphinx)
 
Горизонтальное масштабирование: что, зачем, когда и как /Александр Макаров (Y...
Горизонтальное масштабирование: что, зачем, когда и как /Александр Макаров (Y...Горизонтальное масштабирование: что, зачем, когда и как /Александр Макаров (Y...
Горизонтальное масштабирование: что, зачем, когда и как /Александр Макаров (Y...
 
Как балансировать на «сетевом» канате под куполом тяжелой нагрузки? / Сергей ...
Как балансировать на «сетевом» канате под куполом тяжелой нагрузки? / Сергей ...Как балансировать на «сетевом» канате под куполом тяжелой нагрузки? / Сергей ...
Как балансировать на «сетевом» канате под куполом тяжелой нагрузки? / Сергей ...
 
Эволюция процесса деплоя в проекте / Денис Яковлев (2ГИС)
Эволюция процесса деплоя в проекте / Денис Яковлев (2ГИС)Эволюция процесса деплоя в проекте / Денис Яковлев (2ГИС)
Эволюция процесса деплоя в проекте / Денис Яковлев (2ГИС)
 

Semelhante a Ужимай и властвуй алгоритмы компрессии в базах данных / Петр Зайцев (Percona)

RAID: High-Performance, Reliable Secondary Storage
RAID: High-Performance, Reliable Secondary StorageRAID: High-Performance, Reliable Secondary Storage
RAID: High-Performance, Reliable Secondary Storage
Uğur Tılıkoğlu
 

Semelhante a Ужимай и властвуй алгоритмы компрессии в базах данных / Петр Зайцев (Percona) (20)

Unlocking Efficiency: B-Trees in Disk Storage Management.pptx
Unlocking Efficiency: B-Trees in Disk Storage Management.pptxUnlocking Efficiency: B-Trees in Disk Storage Management.pptx
Unlocking Efficiency: B-Trees in Disk Storage Management.pptx
 
MySQL Oslayer performace optimization
MySQL  Oslayer performace optimizationMySQL  Oslayer performace optimization
MySQL Oslayer performace optimization
 
Webinar NETGEAR - Storage ReadyNAS, le novità
Webinar NETGEAR - Storage ReadyNAS, le novitàWebinar NETGEAR - Storage ReadyNAS, le novità
Webinar NETGEAR - Storage ReadyNAS, le novità
 
S de2784 footprint-reduction-edge2015-v2
S de2784 footprint-reduction-edge2015-v2S de2784 footprint-reduction-edge2015-v2
S de2784 footprint-reduction-edge2015-v2
 
Linux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQLLinux and H/W optimizations for MySQL
Linux and H/W optimizations for MySQL
 
RAID: High-Performance, Reliable Secondary Storage
RAID: High-Performance, Reliable Secondary StorageRAID: High-Performance, Reliable Secondary Storage
RAID: High-Performance, Reliable Secondary Storage
 
Management and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - SlidesManagement and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - Slides
 
HPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big DataHPC DAY 2017 | HPE Storage and Data Management for Big Data
HPC DAY 2017 | HPE Storage and Data Management for Big Data
 
MongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of viewMongoDB vs Mysql. A devops point of view
MongoDB vs Mysql. A devops point of view
 
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
Building Data Pipelines with SMACK: Designing Storage Strategies for Scale an...
 
Open Ware Ramsan Dram Ssd
Open Ware Ramsan  Dram SsdOpen Ware Ramsan  Dram Ssd
Open Ware Ramsan Dram Ssd
 
A Front-Row Seat to Ticketmaster’s Use of MongoDB
A Front-Row Seat to Ticketmaster’s Use of MongoDBA Front-Row Seat to Ticketmaster’s Use of MongoDB
A Front-Row Seat to Ticketmaster’s Use of MongoDB
 
R1Soft CDP 3.0 Key Features
R1Soft CDP 3.0 Key FeaturesR1Soft CDP 3.0 Key Features
R1Soft CDP 3.0 Key Features
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
 
Oracle Exec Summary 7000 Unified Storage
Oracle Exec Summary 7000 Unified StorageOracle Exec Summary 7000 Unified Storage
Oracle Exec Summary 7000 Unified Storage
 
Final presentasi gnome asia
Final presentasi gnome asiaFinal presentasi gnome asia
Final presentasi gnome asia
 
Shared Hosting Hepsia.Co plan and feature
Shared Hosting Hepsia.Co plan and featureShared Hosting Hepsia.Co plan and feature
Shared Hosting Hepsia.Co plan and feature
 
Ovh webmail | Ovh Server
Ovh webmail | Ovh ServerOvh webmail | Ovh Server
Ovh webmail | Ovh Server
 
Dba tuning
Dba tuningDba tuning
Dba tuning
 
Apresentacao Solid Access Corp Presentation Openware 5 20 10
Apresentacao Solid Access Corp Presentation Openware 5 20 10Apresentacao Solid Access Corp Presentation Openware 5 20 10
Apresentacao Solid Access Corp Presentation Openware 5 20 10
 

Mais de Ontico

Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Ontico
 

Mais de Ontico (20)

One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
 
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
 
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
 
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
 
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
 
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
PostgreSQL Configuration for Humans / Alvaro Hernandez (OnGres)
 
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
 
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
 
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
 
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
 
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
 
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
 
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
 
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
 
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
 
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
 
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
 

Último

Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
Hazard Identification (HAZID) vs. Hazard and Operability (HAZOP): A Comparati...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 

Ужимай и властвуй алгоритмы компрессии в базах данных / Петр Зайцев (Percona)

  • 1. Compression in Open Source Databases Peter Zaitsev CEO, Percona MySQL Central @ OOW 26 Oct 2015
  • 2. Few Words About Percona Your Partner in MySQL and MongoDB Success 100% Open Source Software “No Lock in Required” Solutions and Services We work with MySQL, MariaDB, MongoDB, Amazon RDS and Aurora 2
  • 3. About the Talk A bit of the History Approaches to Data Compression What some of the popular systems implement 3
  • 4. Lets Define The Term Compression - Any Technique to make data size smaller 4
  • 5. A bit of History Early Computers were too slow to compress data in Software Hardware Encryption (ie Tape) Compression first appears for non performance critical data 5
  • 6. We did not need it much for space… 6
  • 7. Welcome to the modern age Data Growth outpaces HDD improvements Powerful CPUs Flash Cloud Data we store now 7
  • 10. Can Compress and Decompress Fast! •Up to 1GB/sec compression •Up to 2GB/sec decompression Snappy, LZ4 10
  • 11. Flash Disk space is more costly than for HDDs Write Endurance is expensive Want to write less data Decent at handling fragmentation 11
  • 12. Cloud Pay for Space Pay for IOPS More limited Storage Performance Network Performance may be limited 12
  • 13. Data we store in Databases •Text •JSON •XML •Time Series Data •Log Files Modern Data Compresses Well! 13
  • 14. COMPRESSION BASICS Introduction into a ways of making your data smaller 14
  • 15. Lossy and Lossless Database generally use Lossless Compression Lossy compression done on the application level 15
  • 16. Some ways of getting data smaller Layout Optimizations “Encoding” Dictionary Compression Block Compression 16
  • 17. Layout Optimizations Column Store versus Row Store Hybrid Formats Variable Block Sizes 17
  • 18. Encoding Depends on Data Type and Domain Delta Encoding, Run Length Encoding (RLE) Can be faster than read of uncompressed data UTF8 (strings) and VLQ (Integers) Index Prefix Compression 18
  • 19. Dictionary Compression Replacing frequent values with Dictionary Pointers Kind of like STL String ENUM type in MySQL 19
  • 20. Block Compression Compress “block” of data so it is smaller for storage Finding Patterns in Data and Efficiently encoding them Many Algorithms Exist: Snappy, Zlib, LZ4, LZMA 20
  • 21. Block Compression Details Compression rate highly depends on data Compression rate depends on block size Speed depends on block size and data 21
  • 22. Block Size Dependence (by Leif Walsh) 22
  • 23. There is no one size fits all Typically Compression Algorithm can be selected Often with additional settings 23
  • 24. WHERE AND HOW Where do we compress data and how do we do that 24
  • 25. Where to Compress Data In Memory ? In the Database Data Store ? As Part of File System ? Storage Hardware ? Application ? 25
  • 26. Compression in Memory Reduce amount of memory needed for same working set Reduce IO for Fixed amount of Memory Typically in-Memory Performance Hit Encoding/Dictionary Compression are good fit 26
  • 27. Database Data Store Reduce Database Size on Disk Works with all file systems and storage With OS cache can be used as In- Memory compression variant Dealing with fragmentation is common issue 27
  • 28. Compression on File System Level Works with all Databases/Storage Engines Performance Impact can be significant Logical Space on disk is not reduced ZFS 28
  • 29. Compression on Storage Hardware Hardware Dependent Does not reduce space on disk Can result in Performance Gains rather than free space (SSD) Can become a choke point 29
  • 30. By Application No Database Support needed Reduce Database Load and Network Traffic Application may know more about data More Complexity Give up many DBMS features (search, index) 30
  • 31. DESIGN CONSIDERATIONS What makes database system to do well with compression 31
  • 32. The Goal Minimize Negative Impact for User Operations (Reads and Writes) 32
  • 33. Design Principles Fast Decompression Compression in Background Parallel Compression/Decompression Reduce need of Re- Compression on Update 33
  • 34. Choosing Block Size Large Blocks •Most efficient for compression •Bulky Read Writes Small Blocks •Fastest to Decompress •Best for point lookups 34
  • 35. IMPLEMENTATION EXAMPLES What Database systems Really do with Compression 35
  • 36. MySQL “Packed” MyISAM Compress table “offline” with myisampack Table Becomes Read Only Variety of compression methods are used Only data is compressed, not indexes Note MyISAM support index prefix compression for all indexes 36
  • 37. MySQL Archive Storage Engine Does not support indexes Essentially file of rows with sequential access Uses zlib compression 37
  • 38. Innodb Table Compression Available Since MySQL 5.1 Pages compressed using zlib Compressed page target (1K, 4K, 8K) has to be set Both Compressed and Uncompressed pages can be cached in Buffer Pool Per Page “log” of changes to avoid recompression Extenrally Stored BLOBs are compressed as single entity 38
  • 39. Innodb Transparent Page Compression Available in MySQL 5.7 Zib and LZ4 Compression Compresses pages as they are written to disk Free space on the page is given back using “hole punching” Originally designed to work with FusionIO NVMFS Can cause problems for current filesystem due to very high hole number 39
  • 40. Disk usage (Linkbench data set by Sunny Bains) 40
  • 41. Performance on Fast SSD (FusionIO NVMFS) 41
  • 42. Results on Slower SSD (Intel 730*2, EXT4) 42
  • 43. Fractal Trees Compression Available as Storage Engine for MySQL and MongoDB Can use many compression libraries Tunable Compression Block Size Reduce Re-Compression due to message buffering 43
  • 44. Can get a lot of compression 44
  • 45. MongoDB WiredTiger Storage Engine Engine Has many compression settings Indexes are using Index Prefix Compression Data Pages can be compressed using zlib or Snappy 45
  • 46. Compression Size (results by Asya Kamsky) 46
  • 47. Compression in RocksDB RocksDB – LSM Based Storage Engine for MongoDB and MySQL LSM works very well with compression Supports, zlib, lz4, bzip2 compression Can use different compression methods for different Levels in LSM 47
  • 48. Compression results from Mike Kania 48
  • 49. PostgreSQL Uses compression by default with TOAST 2KB (default) or longer Strings, BlOBs Unlike Innodb External Storage is not required for Compression Recommended to use File system compression ie ZFS if Compression is Desired 49
  • 50. Summary Compression is Important in Modern Age Consider it for your system Many different techniques are used to make data smaller by databases Compression support is rapidly changing and improving 50
  • 51. Want More ? I’m talking about MySQL Replication Options Free (as in Beer) Moscow MySQL Users Group meetup November 6th, Hosted by Mail.ru http://www.meetup.com/moscowmysql/ 51
  • 52. Percona Live 2016 call for paper is Open Call for Papers Open until November 29, 2016 MySQL, MongoDB, NoSQL, Data in The Cloud Anything to make Data Happy! http://bit.ly/PL16Call 52