Apache HBase, Accelerated: In-Memory Flush and Compaction
Eshcar Hillel and Anastasia Braginsky (Yahoo!)
Real-time HBase application performance depends critically on the amount of I/O in the datapath. Here we’ll describe an optimization of HBase for high-churn applications that frequently insert, update, and delete the same keys, such as high-speed queuing and e-commerce.
1. HBase Accelerated: In-Memory Flush and Compaction
Eshcar Hillel, Anastasia Braginsky, Edward Bortnikov ⎪ HBaseCon, San Francisco, May 24, 2016
3. Motivation: Dynamic Content Processing on Top of HBase
Real-time content processing pipelines
› Store intermediate results in a persistent map
› A notification mechanism is prevalent
Storage and notifications on the same platform
Sieve – Yahoo’s real-time content management platform
[Diagram: the Sieve pipeline, with components Crawl, Docproc, Link Analysis, and Serving connected by a crawl schedule, a content queue, a link-analysis queue, and a links store; processing runs on Apache Storm, storage on Apache HBase]
4. Notification Mechanism is Like a Sliding Window
Small working set, but not necessarily a FIFO queue
› Short life-cycle: a message is deleted right after it is processed
› High-churn workload: a message’s state can be updated multiple times
› Frequent scans to consume messages
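To make the workload concrete, here is a toy model of such a notification queue on top of a key-value map (the map stands in for an HBase table; all names are illustrative, not Sieve APIs): messages are inserted, updated in place, scanned by consumers, and deleted once processed.

```python
class NotificationQueue:
    """Toy model of the high-churn workload: small working set,
    short message life-cycle, frequent scans, not strictly FIFO."""

    def __init__(self):
        self.store = {}            # key -> state; stands in for an HBase table

    def put(self, key, state):     # insert, or update a message's state (churn)
        self.store[key] = state

    def scan_ready(self):          # consumers scan for processable messages
        return [k for k in sorted(self.store) if self.store[k] == "ready"]

    def ack(self, key):            # short life-cycle: delete after processing
        del self.store[key]

q = NotificationQueue()
q.put("doc-1", "new")
q.put("doc-2", "ready")
q.put("doc-1", "ready")            # high churn: the same key updated in place
for k in q.scan_ready():
    q.ack(k)
assert q.store == {}               # working set drains quickly
```

The same few keys are written, rewritten, and deleted within a short window, which is exactly the redundancy the in-memory optimizations below exploit.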
5. HBase Accelerated: Mission Definition
Goal:
Real-time performance in persistent KV-stores
How:
Use less in-memory space ⇒ incur less I/O
6. HBase Accelerated: Two Base Ideas
In-Memory Compaction
› Exploit redundancies in the workload to eliminate duplicates in memory
› Gain is proportional to the duplicate ratio
In-Memory Index Reduction
› Reduce the index memory footprint: less overhead per cell
› Gain is proportional to the cell size
Both prolong data’s in-memory lifetime before it is flushed to disk, and thereby
› Reduce the number of new files
› Reduce the write amplification effect (overall I/O)
› Reduce retrieval latencies
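A minimal sketch of the first idea, duplicate elimination. Each write appends a new (key, sequence, value) version; compacting in memory keeps only the latest version per key, so the saving is proportional to the duplicate ratio. This is an illustrative model, not HBase code.

```python
def compact(versions):
    """Keep only the newest version of each key (newest = highest seq)."""
    latest = {}
    for key, seq, value in versions:
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, value)
    return [(k, s, v) for k, (s, v) in sorted(latest.items())]

# Four cells arrive, but "k1" is overwritten twice:
writes = [("k1", 1, "a"), ("k2", 2, "b"), ("k1", 3, "c"), ("k1", 4, "d")]
compacted = compact(writes)
assert compacted == [("k1", 4, "d"), ("k2", 2, "b")]
# 4 cells in, 2 out: a 50% duplicate ratio halves both the memory
# footprint and the data that must eventually be flushed to disk.
```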
8. HBase Writes
Random writes are absorbed in the active segment (Cm)
When the active segment is full
› It becomes an immutable segment (snapshot, C’m)
› A new mutable (active) segment serves writes
› The snapshot is flushed to disk (Cd), and the WAL is truncated
On-disk compaction reads a few files, merge-sorts them, and writes back new files
[Diagram: write path: a write goes to the WAL and to the active segment Cm in memory; prepare-for-flush turns Cm into the snapshot C’m; flush writes C’m out to a file Cd in HDFS]
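The write path above can be sketched as a toy model (illustrative names, a tiny flush threshold, and dicts standing in for segments and HFiles; not HBase classes):

```python
class Store:
    """Toy write path: WAL append, active segment, snapshot, disk flush."""

    def __init__(self, threshold=2):
        self.active = {}           # Cm: mutable segment absorbing writes
        self.snapshot = None       # C'm: immutable segment awaiting flush
        self.disk = []             # Cd: flushed files in "HDFS"
        self.wal = []              # write-ahead log for durability
        self.threshold = threshold

    def write(self, key, value):
        self.wal.append((key, value))        # durability first
        self.active[key] = value
        if len(self.active) >= self.threshold:
            self.prepare_for_flush()
            self.flush()

    def prepare_for_flush(self):             # Cm becomes immutable C'm
        self.snapshot, self.active = self.active, {}

    def flush(self):                         # C'm -> new file, truncate WAL
        self.disk.append(dict(self.snapshot))
        self.snapshot, self.wal = None, []

s = Store()
s.write("a", 1)
s.write("b", 2)                              # segment full: flush to disk
s.write("c", 3)                              # new active segment serves writes
assert s.disk == [{"a": 1, "b": 2}] and s.wal == [("c", 3)]
```

Every flush creates a new on-disk file, which is why on-disk compaction is later needed to merge-sort those files back together.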
9. HBase Reads
Random reads are served from Cm or C’m or Cd (via the block cache)
When data piles up on disk
› Block-cache hit ratio drops
› Retrieval latency goes up
Compaction re-writes small files into fewer, bigger files
› Causes replication-related network and disk I/O
[Diagram: read path: a read consults Cm and C’m in memory and the files Cd in HDFS, through the block cache]
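A sketch of that read path, as a toy merge over the components (newer components shadow older ones; the block cache and versioning are omitted for brevity, and all names are illustrative):

```python
def read(key, cm, cm_snapshot, cd_files):
    """Probe the active segment (Cm), then the snapshot (C'm),
    then the on-disk files (Cd), newest first. Newer data wins."""
    for component in [cm, cm_snapshot] + list(reversed(cd_files)):
        if component is not None and key in component:
            return component[key]
    return None                    # key not present anywhere

cm = {"a": "mem"}                  # active segment
snapshot = {"b": "snap"}           # immutable in-memory snapshot
cd = [{"b": "old-disk", "c": "disk"}]  # flushed files, oldest first

assert read("a", cm, snapshot, cd) == "mem"
assert read("b", cm, snapshot, cd) == "snap"   # snapshot shadows disk
assert read("c", cm, snapshot, cd) == "disk"
# The more small files pile up in cd, the more components a miss
# must probe: hence the latency penalty the slide describes.
```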
11. New Design: In-Memory Flush and Compaction
New compaction pipeline
› Active segment flushed to the pipeline
› Pipeline segments compacted in memory
› Flushed to disk only when needed
[Diagram: an in-memory flush moves the active segment Cm into the compaction pipeline; prepare-for-flush and flush-to-disk push the compacted segment C’m out to a file Cd in HDFS; the WAL and block cache remain as before]
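The pipeline can be sketched as follows. This is a toy model named after the idea, not the actual HBase class; segment bookkeeping, locking, and thresholds are omitted.

```python
class CompactingMemStore:
    """Toy model: active segment -> in-memory pipeline -> (rarely) disk."""

    def __init__(self):
        self.active = {}           # mutable segment absorbing writes
        self.pipeline = []         # immutable segments, compacted in memory
        self.disk = []             # files actually flushed to "HDFS"

    def in_memory_flush(self):     # cheap: no I/O, active joins the pipeline
        self.pipeline.append(self.active)
        self.active = {}

    def in_memory_compact(self):   # merge segments, eliminating duplicates
        merged = {}
        for seg in self.pipeline:  # oldest first, so newer values win
            merged.update(seg)
        self.pipeline = [merged]

    def flush_to_disk(self):       # only under real memory pressure
        self.in_memory_compact()
        self.disk.append(self.pipeline.pop())

m = CompactingMemStore()
m.active = {"k": 1}
m.in_memory_flush()
m.active = {"k": 2, "j": 3}
m.in_memory_flush()
m.in_memory_compact()
assert m.pipeline == [{"k": 2, "j": 3}]   # duplicate of "k" eliminated
assert m.disk == []                       # and no disk I/O was paid for it
```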
12. New Design: In-Memory Flush and Compaction
Trade read cache (block cache) for write cache (compaction pipeline)
[Same diagram, with the compaction pipeline annotated as the write cache and the block cache as the read cache]
13. New Design: In-Memory Flush and Compaction
Trade read cache (block cache) for write cache (compaction pipeline)
More CPU cycles for less I/O
[Same diagram, annotated: the in-memory compaction work costs CPU, while the avoided flushes cost I/O]
27. Exploit the Immutability of a Segment after Compaction
Current design
› Data is stored in flat buffers, the index is a skip-list
› All memory is allocated on-heap
New design: flat layout for the immutable segments’ index
› Less overhead per cell
› Manage (allocate, store, release) data buffers off-heap
Pros
› Locality in index access
› Reduced memory fragmentation
› Significantly reduced GC work
› Better utilization of memory and CPU
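A sketch of the flattening idea: once a segment is immutable, a skip-list index (one node object plus pointers per cell) can be replaced by a flat sorted array searched with binary search, giving less overhead per cell and better locality. Toy model, with a dict standing in for the skip-list:

```python
import bisect

cells = {"b": 2, "a": 1, "c": 3}   # mutable segment (dict stands in for
                                   # the pointer-heavy skip-list index)

# Flatten: one contiguous sorted array instead of per-cell index nodes.
keys = sorted(cells)
vals = [cells[k] for k in keys]

def get(key):
    """Point lookup via binary search on the flat index."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return vals[i]
    return None

assert get("b") == 2 and get("z") is None
# Immutability is what makes this safe: no inserts ever need to
# shift the array, so the flat layout never has to be rebuilt.
```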
29. New Design: Effective In-Memory Representation
Upon in-memory flush, ask: are there redundancies to compact?
› No: flatten the index (less overhead per cell)
› Yes: flatten the index and compact (fewer cells and less overhead per cell)
[Diagram: as before, the in-memory flush feeds the compaction pipeline; the redundancy check decides between flattening only and flattening plus compaction]
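The decision above can be sketched as follows; the duplicate-ratio threshold and all names are illustrative, not the heuristic HBase actually uses.

```python
def process_pipeline(segments, dup_threshold=0.1):
    """Flatten the pipeline's index; also compact if duplication pays off."""
    total = sum(len(seg) for seg in segments)
    unique = len({k for seg in segments for k in seg})
    dup_ratio = 1 - unique / total            # share of redundant cells
    merged = {}
    for seg in segments:                      # oldest first: newer values win
        merged.update(seg)
    action = "flatten+compact" if dup_ratio > dup_threshold else "flatten"
    flat = sorted(merged.items())             # flat index in either case
    return action, flat

# Two segments overlap on "a": 1/3 of the cells are redundant.
action, flat = process_pipeline([{"a": 1, "b": 2}, {"a": 3}])
assert action == "flatten+compact"
assert flat == [("a", 3), ("b", 2)]           # duplicate of "a" eliminated
```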
34. Summary
Feature intended for HBase 2.0.0
New design’s pros over the default implementation
› Predictable retrieval latency, by serving (mainly) from memory
› Less on-disk compaction reduces the write amplification effect
› Less disk I/O and network traffic reduces the load on HDFS
› New space-efficient index representation
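For readers who want to try the feature, HBase 2.0 exposes an in-memory compaction policy per column family; assuming the configuration key documented for that release (values NONE, BASIC, or EAGER), a cluster-wide default can be set in `hbase-site.xml`:

```xml
<!-- Assumed HBase 2.0 setting: default in-memory compaction policy.
     BASIC pipelines and flattens segments; EAGER also compacts
     (eliminates duplicates) in memory. -->
<property>
  <name>hbase.hregion.compacting.memstore.type</name>
  <value>BASIC</value>
</property>
```

The policy can also be set per column family (e.g. via the `IN_MEMORY_COMPACTION` attribute in the HBase shell); consult the HBase reference guide for the authoritative option names in your version.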
We would like to thank the reviewers
› Michael Stack, Anoop Sam John, Ramkrishna S. Vasudevan, Ted Yu