- The document summarizes the state of Apache HBase, including recent releases, compatibility between versions, and new developments.
- Key releases include HBase 1.1, 1.2, and 1.3, which added features like async RPC client, scan improvements, and date-tiered compaction. HBase 2.0 is targeting compatibility improvements and major changes to data layout and assignment.
- New developments include date-tiered compaction for time series data, Spark integration, and ongoing work on async operations, replication 2.0, and reducing garbage collection overhead.
5. Semantic Versioning
Starting with the 1.0 release, HBase works toward Semantic Versioning
MAJOR.MINOR.PATCH[-identifiers]
PATCH: only backward-compatible (BC) bug fixes
MINOR: BC new features
MAJOR: incompatible changes
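As a rough illustration of the MAJOR.MINOR.PATCH[-identifiers] scheme above, the following sketch classifies an upgrade between two version strings. The function names `parse_version` and `upgrade_kind` are hypothetical helpers made up for this example; they are not part of HBase.

```python
import re


def parse_version(version):
    """Split 'MAJOR.MINOR.PATCH[-identifiers]' into its parts."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:-(.+))?", version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch, identifiers = match.groups()
    return int(major), int(minor), int(patch), identifiers


def upgrade_kind(current, target):
    """Name the most significant version component that differs."""
    cur = parse_version(current)[:3]
    new = parse_version(target)[:3]
    if new[0] != cur[0]:
        return "MAJOR"  # incompatible changes are allowed
    if new[1] != cur[1]:
        return "MINOR"  # backward-compatible new features
    if new[2] != cur[2]:
        return "PATCH"  # backward-compatible bug fixes only
    return "SAME"
```

For example, `upgrade_kind("1.1.2", "1.1.5")` is a PATCH upgrade, while going from a 1.x release to any 2.x release is a MAJOR one.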
6. SemVer in Action
1.0 Released last year. Started following semantic versioning
10 releases with 1.x.y versions. More coming!
Release notes contain “compatibility” report for source / binary
Patch upgrades do not have new features. Drop-in replacement.
Minor versions are “compatible”
8. To be, or not to be (Compatible)
Compatibility is NOT a simple yes or no
Many dimensions
• source, binary, wire, command line, dependencies etc
What is client interface?
• InterfaceAudience.{Public,Private,LimitedPrivate}
Read https://hbase.apache.org/book.html#upgrading
9. Major Minor Patch
Compatibility dimension                  Major   Minor     Patch
Client-Server Wire Compatibility         ✗       ✓         ✓
Server-Server Compatibility              ✗       ✓         ✓
File Format Compatibility                ✗*      ✓         ✓
Client API Compatibility                 ✗       ✓         ✓
Client Binary Compatibility              ✗       ✗         ✓
Server Side Limited API Compatibility    ✗       ✗*/✓*     ✓
Dependency Compatibility                 ✗       ✓         ✓
Operation Compatibility                  ✗       ✗         ✓
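One way to make the matrix above actionable is to encode it as a lookup table and ask which dimensions are not guaranteed for a given kind of upgrade. This is only a sketch: the dictionary keys and the `breaking_dimensions` helper are hypothetical, and the mixed ✗*/✓* cell is represented as `None` ("depends") because the slide does not spell out its footnotes.

```python
# Each value is (major, minor, patch).
# True = compatible (✓), False = not guaranteed (✗), None = depends (✗*/✓*).
COMPATIBILITY = {
    "client-server wire": (False, True, True),
    "server-server": (False, True, True),
    "file format": (False, True, True),  # ✗* for major in the slide
    "client API": (False, True, True),
    "client binary": (False, False, True),
    "server-side limited API": (False, None, True),
    "dependency": (False, True, True),
    "operation": (False, False, True),
}

COLUMN = {"MAJOR": 0, "MINOR": 1, "PATCH": 2}


def breaking_dimensions(upgrade_kind):
    """List dimensions that are not guaranteed compatible for this upgrade."""
    column = COLUMN[upgrade_kind]
    return [
        dimension
        for dimension, row in COMPATIBILITY.items()
        if row[column] is not True
    ]
```

With this encoding, a PATCH upgrade returns an empty list, while a MINOR upgrade flags client binary, server-side limited API, and operation compatibility as things to check.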
13. RTFM – HBase-1.1 Release Notes
• Async RPC client
• Simple RPC throttling
• Improved compaction controls
• Scan improvements
• Procedure V2 for improved reliability of cluster operations (HBASE-12439)
• New extension interfaces for coprocessor users
• Per-column family flush
• WAL on SSD
• BlockCache in Memcached
• Region replica enhancements around META, WAL, and bulk loading
14. RTFM – HBase-1.2 Release Notes
• JDK8 is now supported
• Hadoop 2.6.1+ and Hadoop 2.7.1+ are now supported
• Per column-family time ranges for scan
• Daemons respond to SIGHUP to reload configs
• Region location methods added to thrift2 proxy
• Table-level sync that sends deltas
• Client side metrics via JMX
15. RTFM – HBase-1.3 Release Notes
• Date-based tiered compactions
• Maven archetypes for HBase client applications
• Throughput controller for flushes
• Controlled delay (CoDel) based RPC scheduler (HBASE-15136)
• Bulk loaded HFile replication
• More improvements to Procedure V2
• Improvements to Multi WAL
• Many improvements and optimizations in metrics subsystem
• Reduced memory allocation in RPC layer
• Region location lookup optimizations in HBase client
16. Releases – How to choose
0.98 is still released frequently, likely will continue till end of 2016
1.0 is EOL’ed. Move to 1.1 at least
Both 1.1 and 1.2 are pretty stable
Starting from scratch, use 1.2 or 1.3
1.3 is coming shortly
Moving between minor versions is easy for 1.x
18. New Compaction Policies for Time series
FIFO: First In, First Out
• No Compaction!
• Only data with very short TTL
Date Tiered Compaction
• Dramatic reduction in IO!
• Partition hfiles and compaction by time windows
• Scans with time ranges filter out whole files
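The ideas behind both policies can be sketched in a few lines. This is a simplified model, not HBase's implementation: it assumes fixed-size time windows, represents each HFile as a hypothetical `(name, min_ts, max_ts)` tuple, and treats the scan range as half-open `[scan_start, scan_end)`.

```python
from collections import defaultdict


def window_start(timestamp, window_size):
    """Round a timestamp down to the start of its time window."""
    return timestamp - (timestamp % window_size)


def group_by_window(files, window_size):
    """Date-tiered idea: group files by the window holding their newest cell."""
    windows = defaultdict(list)
    for name, _min_ts, max_ts in files:
        windows[window_start(max_ts, window_size)].append(name)
    return dict(windows)


def files_for_scan(files, scan_start, scan_end):
    """Keep only files whose timestamp range overlaps [scan_start, scan_end)."""
    return [
        name
        for name, min_ts, max_ts in files
        if max_ts >= scan_start and min_ts < scan_end
    ]


def expired_files(files, now, ttl):
    """FIFO idea: a file can be dropped once its newest cell is past the TTL."""
    return [name for name, _min_ts, max_ts in files if max_ts + ttl < now]
```

Because files are partitioned by time, a scan restricted to one window can skip every file outside it without reading any of its blocks, and compactions only need to rewrite files within the same window.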
23. New Development – In Progress
RPC Scheduling improvements
Replication 2.0
Reduce Garbage
C++ Client
Backup / Restore
24. New Development – In Progress
Offheaping
Read path (done)
Write path in development
In-memory flushes/compactions
Compact in-memory representations
Fatter flushes
Assignment Manager/Master
26. HBase-2.0
Target is 2016 EOY
Learnt from singularity (0.94 -> 0.96+)
2.0 will be rolling upgradable!
• Disclaimer: to the extent that we can make it
JDK-8 only
Will work with Hadoop-3?
Assignment and data layout changes are the big driver
27. How to prepare for HBase-2.0
2.0 contains more API cleanup
Clean up protobuf (PB) and Guava “leaks” into the API
Some deprecated APIs (HConnection, HTable, HBaseAdmin, etc) going away
Start using JDK-8 (and G1). You will like it.
1.x client should be able to do read / write / scan against 2.0 clusters
Some DDL / Admin operations may not work
28. Other HBase talks
Today
(3:00pm) Omid: A Transactional Framework for HBase
(4:10pm) Hive Hbase Metastore - Improving Hive with a Big Data Metadata Storage
(5:00pm) Operating and Supporting Apache HBase - Best Practices and Improvements
Thursday
(2:10pm) Managing Hadoop, HBase, and Storm Clusters at Yahoo Scale
(3:00pm) Phoenix + HBase: An Enterprise Grade Data-Warehouse Appliance for Interactive Analytics?
(4:10pm) The DAP: Where Yarn, HBase, Kafka and Spark go to Production
(5:00pm) HBase BoF