HBaseCon 2013: Compaction Improvements in Apache HBase
Presented by: Sergey Shelukhin, Hortonworks
HBaseCon 2013: Compaction Improvements in Apache HBase
1.
Compaction Improvements in Apache HBase
Sergey Shelukhin
sergey@hortonworks.com
© Hortonworks Inc. 2011
2.
About me
• HBase committer since February 2013
• Member of Technical Staff at Hortonworks
• Twitter: @sershe84
Architecting the Future of Big Data
3.
Overview
• What are compactions?
• Default algorithm and improvements
• Enabling different implementations
• Algorithms for various scenarios
• Conclusions
4.
What are compactions?
5.
What are compactions?
• HBase writes out immutable files as data is added
  – Each Store (CF + region) consists of these rowkey-ordered files
  – Immutable => more files accumulate over time
  – More files => slower reads
• Compaction rewrites several files into one
  – Fewer files => faster reads
• Major compaction rewrites all files in a Store into one
  – Can drop deleted records, tombstones and old versions
• In minor compaction, the files to compact are selected based on a heuristic
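To make the rewrite step concrete, here is a minimal sketch of merging rowkey-ordered files into one. It is not HBase code: the list-of-pairs file model, the `compact` function, and the `TOMBSTONE` marker are hypothetical, and it keeps only the newest version of each key, whereas HBase can retain several versions.

```python
import heapq

TOMBSTONE = None  # hypothetical marker for a deleted row


def compact(files, major=False):
    """Merge rowkey-ordered files into a single rowkey-ordered file.

    `files` is ordered oldest to newest. Each file is a list of
    (rowkey, value) pairs sorted by rowkey, with unique keys per file.
    """
    # Tag entries with the negated file age so that, for equal keys,
    # the entry from the newest file sorts first.
    tagged = (
        [(key, -age, value) for key, value in f]
        for age, f in enumerate(files)
    )
    merged = []
    last_key = object()  # sentinel that equals no real key
    for key, _, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # an older version of a key that was already handled
        last_key = key
        if major and value is TOMBSTONE:
            continue  # all files are included, so the delete marker can go
        merged.append((key, value))
    return merged
```

A minor compaction has to keep the tombstone, because an older file outside the selection may still hold the deleted key; only when every file of the Store is rewritten can the marker be dropped safely.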
6.
Compactions example
• Memstore fills up, files are flushed
• When enough files accumulate, they are compacted
[Diagram: MemStore flushes are written to HDFS as HFiles; accumulated HFiles are compacted into a larger HFile]
7.
Reads slow down without compactions
• If too many files accumulate, reads slow down
• Read latency over time without compactions:
[Chart: read latency (ms) vs. load test time (sec)]
8.
But compactions cause slowdowns
• Looks like lots of I/O for no apparent benefit
• Example effect on reads (note the better average)
[Chart: read latency (ms) vs. load test time (sec)]
9.
Default algorithm and improvements
10.
Compaction tradeoffs
• HBase resolves key conflicts by file age
  – Therefore, it can only compact contiguous files
• Large compactions are more efficient (less total I/O)
  – However, they can cause long slowdowns for clients
• Small compactions have less effect on clients
  – However, in total you do more rewriting
• We want to compact similar files
11.
Default algorithm in 0.94
• Ratio-based selection
  – Look for files at most F times larger than the following files
  – Also allows limiting file numbers and sizes
• Higher ratio => more aggressive (default 1.2)
• Example: 2 files minimum, 3 maximum, ratio 1.2
[Diagram: a row of HFiles with candidate selections labeled "Too big!", "Too many files!" and "OK."]
• Usually good for typical accumulation of flushed files
• Not good for bulk load: unpredictable file sizes!
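The ratio rule can be sketched as follows. This is a simplified illustration, not the HBase 0.94 source: `select_ratio_based` is a hypothetical name, "the following files" is assumed to mean the sum of the sizes of the next `max_files - 1` newer files, and the size limits mentioned on the slide are left out.

```python
def select_ratio_based(sizes, ratio=1.2, min_files=2, max_files=3):
    """Return (start, end) indices of the files to compact, or None.

    `sizes` lists file sizes from oldest to newest; `end` is exclusive.
    """
    start = 0
    # Skip older files that are too large relative to the files after them.
    while start < len(sizes):
        following = sizes[start + 1:start + max_files]
        if sizes[start] <= ratio * sum(following):
            break  # first file that is "in ratio": the selection starts here
        start += 1
    end = min(start + max_files, len(sizes))
    if end - start < min_files:
        return None  # not enough files left to be worth compacting
    return start, end
```

With sizes `[100, 50, 23, 12, 12]` and the slide's example settings, the two oldest files are skipped as too big and the three newest are selected. Only the first file of the selection is compared against the ratio in this sketch.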
12.
Off-peak compactions
• Good if you have variable load through the day
• HBASE-4463: present in 0.94 (since 2011)
• Compact more aggressively during certain hours of the day, when load is lower
• Set the off-peak period via
  – hbase.offpeak.start.hour, hbase.offpeak.end.hour (0-23)
• Then, set the ratio via
  – hbase.hstore.compaction.ratio.offpeak (default is 5)
• Only one "off-peak" compaction at a time, so load is not totally prohibitive
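A small sketch of how these settings might combine when picking the ratio for a given hour. The off-peak property names are the ones on the slide. The rest is assumed for illustration: the helper names, the regular-hours key `hbase.hstore.compaction.ratio`, the exclusive end hour, the wrap past midnight, and treating unset hours as "off-peak disabled".

```python
# Off-peak keys are from the slide; NORMAL_RATIO is an assumed key name.
OFFPEAK_START = "hbase.offpeak.start.hour"
OFFPEAK_END = "hbase.offpeak.end.hour"
OFFPEAK_RATIO = "hbase.hstore.compaction.ratio.offpeak"
NORMAL_RATIO = "hbase.hstore.compaction.ratio"


def is_off_peak(hour, start_hour, end_hour):
    """True if `hour` (0-23) falls in [start_hour, end_hour)."""
    if start_hour < 0 or end_hour < 0 or start_hour == end_hour:
        return False  # assumption: unset or empty period disables off-peak
    if start_hour < end_hour:
        return start_hour <= hour < end_hour
    # The period wraps past midnight, e.g. 22 to 6.
    return hour >= start_hour or hour < end_hour


def effective_ratio(hour, conf):
    """Pick the compaction ratio to apply at the given hour."""
    start = conf.get(OFFPEAK_START, -1)
    end = conf.get(OFFPEAK_END, -1)
    if is_off_peak(hour, start, end):
        return conf.get(OFFPEAK_RATIO, 5.0)
    return conf.get(NORMAL_RATIO, 1.2)
```

For example, with a period of 22 to 6, the sketch returns the off-peak ratio at 23:00 and 03:00 and the regular ratio at noon.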
13.
Inefficiencies in default algorithm
• First valid selection is chosen
• Ratio is only considered for the first selected file
  – Thus, other files in the compaction may not be similar
• The solution found may not be the best one
  – Especially for bulk load, with unpredictable file sizes
[Diagram: a row of HFiles with a selection labeled "Matches the ratio, but this is a bad selection"]
14.
Exploring compaction selection
• There are usually not so many files, so looking at all valid permutations and comparing quality is viable
• HBASE-7842: "exploring" compaction selection
  – Ratio checked for each file to choose good permutations
  – When the store is OK, try to compact the most files
  – When the store has too many files, try to eliminate some as fast as possible
• On by default in 0.95/0.96
• Works with your old configuration settings
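The exploring idea can be sketched as an exhaustive scan over contiguous selections. This is an illustration under assumptions, not the HBASE-7842 code: the function names are hypothetical, only the "store is OK" mode (prefer the most files) is shown, and ties are broken by smaller total size as an assumed rule.

```python
def in_ratio(window, ratio):
    """True if every file is at most `ratio` times the sum of the others."""
    total = sum(window)
    return all(size <= ratio * (total - size) for size in window)


def select_exploring(sizes, ratio=1.2, min_files=2, max_files=3):
    """Return the best (start, end) selection, or None if none is valid.

    `sizes` lists file sizes from oldest to newest; `end` is exclusive.
    """
    best = None
    for start in range(len(sizes)):
        last_end = min(start + max_files, len(sizes))
        for end in range(start + min_files, last_end + 1):
            window = sizes[start:end]
            if not in_ratio(window, ratio):
                continue
            if best is None:
                best = (start, end)
                continue
            best_window = sizes[best[0]:best[1]]
            # Prefer more files; on a tie, prefer less data to rewrite.
            if (len(window), -sum(window)) > (len(best_window), -sum(best_window)):
                best = (start, end)
    return best
```

For sizes `[10, 100, 5, 5]`, a first-file-only ratio check would accept a selection containing the 100-unit file, while this sketch rejects every window containing it and returns the two similar 5-unit files.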
15.
Examples and results
• In the previous example:
[Diagram: a row of HFiles with candidate selections labeled "Not in ratio, dissimilar files", "In ratio, may be valid…" and "But this has more files!"]
• On bulk loads of random size, depending on settings:
  – Loses only 0-10% efficiency in reducing file count
  – While reducing I/O 3-10 times
• Best results with ratio 1.3-1.4, 4 minimum files
16.
Enabling different implementations
17.
Making compactions pluggable
• To allow further improvements, the code should be easy to replace; this is not the case as of 0.94
• Initial implementation (part of HBASE-7055, HBASE-7516): make just the selection pluggable
• This is called a "policy" (CompactionPolicy)
• Example usages
  – Exploring selection, mentioned previously
  – Tier-based selection (port from Facebook)
18.
Making compactions more pluggable
• Other potential improvements are more involved
• Need to change other things (HBASE-7678)
• The meta-structure of the files (StoreFileManager, HBASE-7603)
  – Group files by some key/time/… based scheme
  – In memory/metadata only: filesystem structure or file format changes would be a compatibility nightmare
  – Example: LevelDB-style compactions, stripes
• Compactor to compact the files (Compactor)
  – Example: large object store, levels, stripes
• Can replace parts together or separately (StoreEngine)
  – E.g. a level compactor only makes sense with a level-aware store
19.
Enabling compaction tuning
• Different tables (or even column families) have different data and access patterns
• Compactions already have a large number of knobs
• Starting with 0.96, they can be configured on table/CF level (HBASE-7236)
• Example from the shell:
    alter 'table1', CONFIGURATION => {'hbase.hstore.engine.class' => 'org.apache.hadoop.hbase.regionserver.StripeStoreEngine', ... }
20.
Algorithms for various scenarios
21.
Key ways to improve compactions
• Read from fewer files
  – Separate files by row key, version, time, etc.
  – Allows a large number of files to be present, uncompacted
• Don't compact the data you don't need to compact
  – For example, old data in OpenTSDB-like systems
  – Obviously, results in less I/O
• Make compactions smaller
  – Without too much I/O amplification or too many files
  – Results in fewer compaction-related outages
• HBase works better with few large regions; however, large compactions cause unavailability
22.
How to avoid large compactions
• LevelDB compactions
  – Files live on multiple levels
  – Files on each level have non-overlapping row-key ranges
  – …except level 0 (L0), where memstore flushes go
  – Compact overlapping subsets of 2 levels; data goes up a level
  – Most read requests need only one file per level, plus all of L0
• Small compactions, few files per read, however...
  – More I/O, as the data moves from level to level
  – No major compactions: dropping deletes is not trivial
  – Messes up file ordering due to file boundary overlaps between levels; not readable correctly by the default store
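The "one file per level, plus all of L0" read cost can be illustrated with a small lookup sketch. The data model here is hypothetical (each level is a sorted list of `(start_key, end_key, name)` tuples with exclusive end keys) and is not taken from LevelDB or HBase.

```python
import bisect


def files_read(levels, l0_files, row_key):
    """Return the names of the files a get for `row_key` would touch.

    `levels` holds levels 1..N; within a level, key ranges do not overlap.
    """
    touched = list(l0_files)  # L0 files may overlap, so all of them are read
    for level in levels:
        starts = [start for start, _, _ in level]
        # Find the last file whose start key is <= row_key.
        i = bisect.bisect_right(starts, row_key) - 1
        if i >= 0 and row_key < level[i][1]:
            touched.append(level[i][2])  # at most one file per level
    return touched
```

With two L0 files and two levels, a get touches at most four files, however many files each level holds in total.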
23.
Stripe compactions (HBASE-7667)
• Somewhat like LevelDB, partition the keys inside each region/store
• But, only 1 level (plus optional L0)
• Compared to regions, partitioning is more flexible
  – The default is a number of ~equal-sized stripes
• To read, just read relevant stripes + L0, if present
[Diagram: row-key axis from region start key ccc to region end key iii, with stripe boundaries at eee and ggg; HFiles in each stripe plus L0 files; example read: get 'hbase']
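The read path described above can be sketched as a boundary lookup. This is an illustration only: `stripe_index` and `files_for_get` are hypothetical names, stripes are modeled as plain lists of file names, and each boundary key is assumed to be the inclusive start of the next stripe.

```python
import bisect


def stripe_index(boundaries, row_key):
    """Index of the stripe holding `row_key`.

    `boundaries` is the sorted list of interior stripe start keys,
    so N boundaries describe N + 1 stripes.
    """
    return bisect.bisect_right(boundaries, row_key)


def files_for_get(stripes, l0_files, boundaries, row_key):
    """Files a get has to read: the matching stripe's files plus all of L0."""
    return stripes[stripe_index(boundaries, row_key)] + l0_files
```

Using the boundaries from the diagram (eee and ggg inside a region spanning ccc to iii), a get for 'hbase' falls into the last stripe, so the files of the other two stripes are never opened.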
24.
Stripe compactions – writes
• Data flushed from MemStore into several files
• Each stripe compacts separately most of the time
[Diagram: MemStore flushes to HDFS as several small HFiles, one per stripe; HFiles within each stripe are compacted separately]
25.
Stripe compactions – other
• Why L0?
  – Bulk loaded files go to L0
  – Flushes can also go into single L0 files (to avoid tiny files)
  – Several L0 files are then compacted into striped files
• Can drop deletes if compacting one entire stripe + L0
  – No need for major compactions, ever
• Compact 2 stripes together: rebalance if unbalanced
  – Very rare, however; unbalanced stripes are not a huge deal
• Boundaries could be used to improve region splits in future
26.
Stripe compactions – performance
• EC2, c1.xlarge, preload; then measure random read performance
  – LoadTestTool + deletes + overwrites; measure random reads
[Chart: random gets per second vs. test time (sec); series: Default and Stripe, each as a 30-sec. moving average]
27.
Stripe compactions – performance
• On individual request level: median latency is the same (1.6 ms)
• However, 90th percentile: 15% improvement (~13 ms to ~11 ms)
• 99th percentile: 20% improvement (~60 ms to ~47 ms)
• While also sending ~18% more reads in ~4% less time
[Chart: CDF of latency (ms), 0-20 ms; series: Default and Stripes (12)]
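As a quick arithmetic check of the percentile figures, the relative improvement can be computed from the rounded latencies on the slide. The helper name is hypothetical, and because the inputs are approximate the results only roughly match the quoted percentages.

```python
def improvement_pct(before_ms, after_ms):
    """Relative latency reduction, in percent of the original latency."""
    return 100.0 * (before_ms - after_ms) / before_ms


p90 = improvement_pct(13, 11)  # 90th percentile, rounded inputs
p99 = improvement_pct(60, 47)  # 99th percentile, rounded inputs
```

From the rounded inputs this gives a little over 15% at the 90th percentile and a little under 22% at the 99th, in line with the slide's 15% and 20% given that the latencies are quoted approximately.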
28.
Other stripe boundary schemes
• For sharded sequential keys (like OpenTSDB), compacting old data again and again is not useful
• What if stripes split dynamically as they grow?
  – If data is sequential, only a subset of stripes will grow
  – Non-growing stripes never need to be compacted
[Diagram: stripes along the rowkey space; a stripe labeled "Too big!" is split, and an older stripe is labeled "Now this will hardly ever compact"]
29.
Others in development – tier-based
• Tier-based compaction selection (HBASE-7055; originally developed at Facebook)
  – Old data may not be read as frequently, new data may all be in cache so the number of files does not matter, etc.
  – So, during selection, dynamically arrange files into tiers, and apply different rules (ratios, etc.) to them
• Simple example (only 2 tiers)
[Diagram: a row of HFiles in two tiers; a selection of older files is labeled "Looks like a good selection…", with the note "However, if old files are rarely read, it's better to compact new first"]
30.
Others in development, or considered
• Large Object store (HBASE-7949)
• Partition files based on versions, timestamp, etc.
• LevelDB compactions (HBASE-7519)
• …more to come?
31.
Resources
• HBase book section contains a lot of details on tuning the default selection
  – http://hbase.apache.org/book.html#compaction
  – There are other knobs that may be poorly documented
• JIRAs to track the work done for compactions
  – https://issues.apache.org/jira/browse/HBASE/component/12319905
• Design and configuration documentation for the new compactions is attached to JIRAs
  – Tier-based: HBASE-7055, stripe: HBASE-7667
  – Book will be updated as things make it into trunk
32.
Summary
• Compactions are a way to reduce the number of files to read when getting data
• Compactions are expensive, so efficiency is important
• HBase 0.96 compactions
  – Contain automatic improvements to the default algorithm
  – Are easier to improve, build upon, and configure
• Work in progress to improve compactions for Big Data
• Scenario-specific compaction algorithms are also possible, and being worked on
33.
Q & A
Editor's notes
Example of CF delete processing