HBase Cache & Performance
Biju Nair
Boston Hadoop User Group Meet-up
28 May 2015
  
HBase Overview
• Key value store
• Column family oriented
• Data stored as byte[]
• Data indexed by key value
• Data stored in sorted order by key
• Data model doesn’t have to be pre-defined
• Scales horizontally
  
HBase Overview
create 'stock', 'company', 'financials'

Physical Storage
…
msft,company,loc,ts1,Seattle
msft,company,name,ts1,Microsoft
…
orcl,company,loc,ts1,Redwood
orcl,company,name,ts1,Oracle
…

…
msft,financials,cap,ts1,379B
msft,financials,pe,ts1,20
…
orcl,financials,cap,ts1,190B
orcl,financials,pe,ts1,18
…

put 'stock', 'msft', 'company:name', 'microsoft'
get 'stock', 'msft'
company:loc,ts1,Seattle
company:name,ts1,Microsoft
financials:cap,ts1,379B
financials:pe,ts1,20
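
The sorted, per-column-family layout shown here can be imitated in a few lines of Python. This is only a toy sketch, not HBase code: the class name `ColumnFamilyStore` and its methods are made up for illustration, timestamps are plain integers, and values are strings rather than byte[].

```python
from bisect import insort


class ColumnFamilyStore:
    """Toy stand-in for one column family: cells kept sorted by key."""

    def __init__(self):
        # Each cell is (row, qualifier, timestamp, value).
        self.cells = []

    def put(self, row, qualifier, timestamp, value):
        # insort keeps the list ordered by row, then qualifier, then timestamp.
        insort(self.cells, (row, qualifier, timestamp, value))

    def get(self, row):
        # Return every cell of the row, already in key order.
        return [cell for cell in self.cells if cell[0] == row]


# One store per column family, mirroring create 'stock', 'company', 'financials'.
stock = {"company": ColumnFamilyStore(), "financials": ColumnFamilyStore()}

# Insert out of order; each store still ends up sorted by row key.
stock["company"].put("orcl", "name", 1, "Oracle")
stock["company"].put("msft", "name", 1, "Microsoft")
stock["company"].put("msft", "loc", 1, "Seattle")
stock["financials"].put("msft", "pe", 1, "20")

# A "get" for one row gathers its cells from every column family.
for family, store in stock.items():
    for row, qualifier, ts, value in store.get("msft"):
        print(f"{family}:{qualifier},ts{ts},{value}")
```

Keeping each column family in its own sorted store is what lets a read for one family skip the data of the others.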
HBase Overview
Regions
[Diagram: the rows of each column family (company, financials), sorted by row key (appl, ge, ibm, msft, orcl), are split by key range into regions.]
  
HBase Overview
[Diagram: a Client, ZooKeeper, the HBase Master, and five Region Servers, which host the regions (appl, ge, ibm, msft, orcl).]
  
Use Case: Data and Query
• Time series data
– Tickers and attributes
– Monthly data stored in a column; 256 bytes
– Up to 20 years' worth of data
• Queries
– “get”s for up to 1 year of data; 3072 bytes
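
The 3072-byte figure is consistent with twelve 256-byte monthly columns. A quick check, assuming one column per month and ignoring row key and cell metadata overhead (both are assumptions, not stated on the slide):

```python
BYTES_PER_MONTH = 256   # size of one monthly column, from the slide
MONTHS_PER_YEAR = 12


def bytes_for_years(years):
    """Payload size of `years` worth of monthly columns for one ticker."""
    return years * MONTHS_PER_YEAR * BYTES_PER_MONTH


print(bytes_for_years(1))    # 3072, the size of a one-year "get"
print(bytes_for_years(20))   # 61440, the full 20-year history of one ticker
```

Under the same assumptions, a full 20-year row is close to the size of one default HBase block, while a typical query needs only a small part of it.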
  
Use Case: Requirements
• Meet “get” query performance requirements
– Under 10 ms for 99% of queries
– Median latency 2 to 3 ms
– 99.99% latency under 50 ms
• Efficient HBase cluster capacity utilization
– 32 cores per node
– 128 GB of memory per node
– SSD storage in all nodes
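
One way to check measured "get" latencies against these targets is sketched below. The function names are hypothetical, the nearest-rank percentile definition is an assumption (the deck does not say how percentiles were computed), and "median 2 to 3 ms" is read here as "median of at most 3 ms".

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of n) in sorted order."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_requirements(latencies_ms):
    """Check the three latency targets listed on the slide."""
    return (
        percentile(latencies_ms, 99) < 10         # under 10 ms for 99% of queries
        and percentile(latencies_ms, 50) <= 3     # median of at most 3 ms (assumed reading)
        and percentile(latencies_ms, 99.99) < 50  # 99.99% under 50 ms
    )


# Hypothetical sample: mostly 2 ms reads with a few slower ones.
good_run = [2] * 9990 + [8] * 9 + [40]
# Hypothetical sample where 10% of the reads take 20 ms.
bad_run = [2] * 90 + [20] * 10

print(meets_requirements(good_run))
print(meets_requirements(bad_run))
```

Tail percentiles such as 99.99% need many samples to be meaningful; with only a few hundred requests the value is effectively the maximum.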
  
Baseline Test Observations
• Spikes in read response times
• Less than 10% utilization of RS node CPUs
• Less than 15% utilization of RS node memory
• Block cache utilization was inefficient
– Low hit ratio and high eviction rates
  
HBase Internals (Simplified)
[Diagram: HBase Memory (RS) holds the MemStore and the Block cache; HBase Storage holds the WAL and HFiles.]
  
HBase Write Path (Simplified)
[Diagram: HBase Memory (RS) with MemStore and Block cache, HBase Storage with WAL and HFiles; the write steps are numbered 1 to 3.]
HBase Read Path (Simplified)
[Diagram: HBase Memory (RS) with MemStore and Block cache, HBase Storage with WAL and HFiles; the read steps are numbered 1 and 2.]
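
The write and read paths can be sketched with a toy model. It assumes the commonly documented HBase behavior: a write is appended to the WAL and applied to the MemStore, which is later flushed to an HFile, and a read checks the MemStore, then the block cache, then the HFiles. The class and its names are hypothetical, and the step numbers in the comments are not taken from the diagrams.

```python
class ToyRegionServer:
    """Toy model of the simplified write and read paths (not HBase code)."""

    def __init__(self, flush_threshold=2):
        self.wal = []            # append-only log of edits, for durability
        self.memstore = {}       # in-memory buffer of recent writes
        self.hfiles = []         # flushed, immutable files; newest last
        self.block_cache = {}    # data previously read from HFiles
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))   # 1. record the edit in the WAL
        self.memstore[key] = value      # 2. apply it to the MemStore
        if len(self.memstore) >= self.flush_threshold:
            # 3. flush the MemStore to a new HFile.
            # Toy simplification: drop cached copies of the flushed keys
            # so the cache never serves an older value.
            for flushed_key in self.memstore:
                self.block_cache.pop(flushed_key, None)
            self.hfiles.append(dict(self.memstore))
            self.memstore.clear()

    def get(self, key):
        if key in self.memstore:                 # newest data first
            return self.memstore[key], "memstore"
        if key in self.block_cache:              # then cached HFile data
            return self.block_cache[key], "block cache"
        for hfile in reversed(self.hfiles):      # then storage, newest file first
            if key in hfile:
                self.block_cache[key] = hfile[key]   # cache it for the next read
                return hfile[key], "hfile"
        return None, "miss"


rs = ToyRegionServer(flush_threshold=2)
rs.put("msft", "Microsoft")
print(rs.get("msft"))   # served from the MemStore
rs.put("orcl", "Oracle")   # second put triggers a flush
print(rs.get("msft"))   # first read after the flush goes to the HFile
print(rs.get("msft"))   # repeated read is served from the block cache
```

The model shows why the block cache hit ratio matters for read latency: every miss falls through to storage.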
  
Cache Utilization
• Low hit ratio and high eviction rates
• Frequently read data size
– ~3 K
• Table block size
– 64 K (65,536 bytes)
• Proposed change
– Reduce block size
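
A back-of-the-envelope view of why the block size matters for the cache is sketched below. It assumes each "get" needs about 3 K from a single block and that the rest of the cached block is not reused by other reads; both are simplifying assumptions, and the function name is hypothetical.

```python
def useful_fraction(read_kb, block_kb):
    """Share of a cached block that a single read actually uses."""
    return min(read_kb, block_kb) / block_kb


READ_KB = 3  # frequently read data size from the slide

for block_kb in (64, 16):
    share = useful_fraction(READ_KB, block_kb)
    print(f"{block_kb} K block: {share:.1%} of each cached block is useful")
```

With these assumptions, a 16 K block makes four times better use of the same cache memory than a 64 K block.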
  
Impact of Table Block Size Change
  
Avg 3.002 5.362 5.361 5.357 6.419 6.369 6.405 6.383 6.188 6.196 6.182 6.174 6.246 6.264 6.268 6.253 5.194 5.207 5.219 3.031
Median 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
95% 10 15 15 15 18 18 18 18 18 18 17 17 18 18 18 18 15 15 15 10
99% 15 26 26 26 30 30 30 30 28 28 28 28 29 29 29 29 25 24 25 15
99.90% 26 41 41 41 45 45 45 45 43 43 43 43 44 44 44 44 41 41 41 26
Max 2261 127 185 102 90 106 92 102 93 106 119 114 89 140 132 82 81 150 93 1910
BAvg 16.731 16.728 16.761 16.763 16.418 16.371 16.37 16.431 16.152 16.14 16.169 16.158 16.308 16.29 16.325 16.307 16.34 16.381 16.391 16.352
BMedian 14 14 14 14 13 13 13 13 15 15 15 15 13 13 13 13 13 13 13 13
B95% 41 41 41 41 41 41 41 41 43 43 43 43 40 40 40 40 41 41 41 41
B99% 55 55 55 55 54 54 54 54 55 55 55 55 54 54 54 54 54 54 55 54
B99.9% 71 71 71 71 70 70 70 70 67 67 67 67 71 70 70 71 71 71 71 70
BMax 545 1062 559 567 1075 1027 561 567 564 541 558 1062 1062 561 1075 1072 1067 563 1035 1032
Get Performance (ms) – 64 K Blk
Get Performance (ms) – 16 K Blk
Note: A smaller block size increases overhead, because more index blocks are needed.
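
The index overhead mentioned in the note can be estimated roughly. The sketch assumes one index entry per data block and a hypothetical 1 GB HFile; real HFiles also carry Bloom filter and metadata blocks that are ignored here.

```python
def data_blocks(hfile_bytes, block_bytes):
    """Number of data blocks, and so index entries, assuming one entry per block."""
    return -(-hfile_bytes // block_bytes)  # ceiling division


ONE_GB = 1024 ** 3  # hypothetical HFile size

for block_kb in (64, 16):
    entries = data_blocks(ONE_GB, block_kb * 1024)
    print(f"{block_kb} K blocks: {entries} index entries per GB")
```

Going from 64 K to 16 K blocks quadruples the number of index entries, which is the cost to weigh against the better cache utilization.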
  
Memory Utilization/Latency Spikes
• JVM GC contributed to latency spikes
• Increase in heap size increased GC time
– Prevented using all the available memory
• Proposed change: Use off-heap caching
– Minimize spikes in response time due to GC
– Increased utilization of node memory
  
HBase Off-Heap Caching
[Diagram: HBase Memory (RS) holds the MemStore and the Block cache (L1) for index and Bloom filter data; the off-heap cache (L2, Bucket Cache) holds table data; HBase Storage holds the WAL and HFiles.]
  
HBase Read Path (Simplified)
[Diagram: HBase Memory (RS) with MemStore and Block cache, an L2 Cache, and HBase Storage with WAL and HFile; the read steps are numbered 1 to 4.]
  
Bucket Cache Configuration
• hbase-env.sh HBASE_REGIONSERVER_OPTS parameters
– -Xmx
– -XX:MaxDirectMemorySize
• hbase-site.xml properties
– hbase.regionserver.global.memstore.upperLimit
– hfile.block.cache.size
– hbase.bucketcache.size
– hbase.bucketcache.ioengine
– hbase.bucketcache.percentage.in.combinedcache
  
Bucket Cache Configuration
Item                                            id      Values
Total RS memory                                 Tot
Memstore size                                   MSz
L1 (LRU) Cache                                  L1Sz
Heap for JVM                                    JHSz
XX:MaxDirectMemorySize                          DMem    Tot-MSz-L1Sz-JHSz
Xmx                                             Xmx     MSz+L1Sz+JHSz
hbase.regionserver.global.memstore.upperLimit   ULim    MSz/Xmx
hfile.block.cache.size                          blksz   0.8-ULim
hbase.bucketcache.size                          bucsz   DMem+(blksz*Xmx)
hbase.bucketcache.percentage.in.combinedcache   ccsz    1-((blksz*Xmx)/bucsz)
hbase.bucketcache.ioengine                              offheap/”file:/localfile”
  
Bucket Cache Configuration
Item                                            id      Values
Total RS memory                                 Tot     96000
Memstore size                                   MSz     2000
L1 (LRU) Cache                                  L1Sz    2000
Heap for JVM                                    JHSz    1000
XX:MaxDirectMemorySize                          DMem    91000
Xmx                                             Xmx     5000
hbase.regionserver.global.memstore.upperLimit   ULim    0.4
hfile.block.cache.size                          blksz   0.4
hbase.bucketcache.size                          bucsz   93000
hbase.bucketcache.percentage.in.combinedcache   ccsz    0.97849
hbase.bucketcache.ioengine                              ”file:/localfile”
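
The sizing formulas from the configuration table can be applied mechanically. The sketch below is a hypothetical helper, not part of HBase; it assumes all sizes are in MB (the slides do not state the unit) and uses the four input values from the worked example.

```python
def bucket_cache_settings(tot, msz, l1sz, jhsz):
    """Derive the settings from the table's formulas; sizes assumed to be in MB."""
    xmx = msz + l1sz + jhsz            # Xmx = MSz+L1Sz+JHSz
    dmem = tot - msz - l1sz - jhsz     # DMem = Tot-MSz-L1Sz-JHSz
    ulim = msz / xmx                   # ULim = MSz/Xmx
    blksz = 0.8 - ulim                 # blksz = 0.8-ULim
    bucsz = dmem + blksz * xmx         # bucsz = DMem+(blksz*Xmx)
    ccsz = 1 - (blksz * xmx) / bucsz   # ccsz = 1-((blksz*Xmx)/bucsz)
    return {
        "Xmx": xmx,
        "DMem": dmem,
        "ULim": ulim,
        "blksz": blksz,
        "bucsz": bucsz,
        "ccsz": ccsz,
    }


# Input values from the worked example in the table.
settings = bucket_cache_settings(tot=96000, msz=2000, l1sz=2000, jhsz=1000)
for name, value in settings.items():
    print(f"{name} = {round(value, 5)}")
```

The derived values agree with the worked example: a 5000 heap, 91000 of direct memory, a 93000 bucket cache, and a combined-cache percentage of about 0.97849.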
  
Impact of Using Off-Heap Cache
Get Performance with L1 cache
Get Performance with L1 & L2 cache
Note: L1 cache test used 38 GB of data, L1+L2 test used 3 TB of data
  
Avg 3.872 3.995 3.936 4.007 4.052
Median 1 1 1 1 1
95% 14 14 14 15 15
99% 20 20 20 20 20
99.90% 27 27 27 28 28
99.99% 36 36 36 37 37
99.999% 208 310 332 207 232
Max 1360 1906 1736 1359 1363
807Mil797107Mil7Requests
BAvg 3.429 2.552 3.447 3.502 3.554
BMedian 2 2 2 2 2
B95% 10 8 10 10 10
B99% 18 14 18 18 18
B99.9% 30 23 30 30 31
BMax 78 1135 58 77 67
18Mil8Rows8>818Mil8Requests
  
Maximize CPU & Memory Utilization
• Run additional RS per node
• Throughput increased 50% when RS increased to 2
– Throughput reduced on AWS cluster
– There was no degradation on the response time
– Throughput increase tapered after 3 RS per node
• Note: Maintenance overhead using multi-RS
  
Known Issues
• Using “offheap” option of BucketCache prevents RS start
– [HBASE-10643]
– Can be mitigated using tmpfs
• LoadIncrementalHFiles doesn’t work with BucketCache
– [HBASE-10500]
• BucketCache for different block sizes is not configurable
– [HBASE-10641] Fixed
  
Key Takeaways
• Store what is really required
– Understand the query pattern
– Leverage column family (CF) to group data
• Choose appropriate block size for table/CF
• Use off-heap cache to minimize latency spikes
• Test all assumptions
  
Further Reading
• http://blog.asquareb.com/blog/2014/11/21/leverage-hbase-cache-and-improve-read-performance
• http://blog.asquareb.com/blog/2014/11/24/how-to-leverage-large-physical-memory-to-improve-hbase-read-performance
• https://issues.apache.org/jira/browse/HBASE-7404
• http://www.n10k.com/blog/blockcache-101/
• http://www.n10k.com/blog/blockcache-showdown/
  
bnair@asquareb.com
blog.asquareb.com
https://github.com/bijugs
@gsbiju

Mais conteúdo relacionado

Mais procurados

Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deploymentYoshinori Matsunobu
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
Tutorial ceph-2
Tutorial ceph-2Tutorial ceph-2
Tutorial ceph-2Tommy Lee
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...Michael Stack
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Databricks
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveDataWorks Summit
 
Getting started with DSpace 7 REST API
Getting started with DSpace 7 REST APIGetting started with DSpace 7 REST API
Getting started with DSpace 7 REST API4Science
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...DataWorks Summit/Hadoop Summit
 
Local Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixLocal Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixRajeshbabu Chintaguntla
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Understanding oracle rac internals part 1 - slides
Understanding oracle rac internals   part 1 - slidesUnderstanding oracle rac internals   part 1 - slides
Understanding oracle rac internals part 1 - slidesMohamed Farouk
 
Ceph - A distributed storage system
Ceph - A distributed storage systemCeph - A distributed storage system
Ceph - A distributed storage systemItalo Santos
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberFlink Forward
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesFlink Forward
 

Mais procurados (20)

HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
Internal Hive
Internal HiveInternal Hive
Internal Hive
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Tutorial ceph-2
Tutorial ceph-2Tutorial ceph-2
Tutorial ceph-2
 
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HB...
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 
Hive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep DiveHive + Tez: A Performance Deep Dive
Hive + Tez: A Performance Deep Dive
 
Getting started with DSpace 7 REST API
Getting started with DSpace 7 REST APIGetting started with DSpace 7 REST API
Getting started with DSpace 7 REST API
 
How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...How to understand and analyze Apache Hive query execution plan for performanc...
How to understand and analyze Apache Hive query execution plan for performanc...
 
Local Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache PhoenixLocal Secondary Indexes in Apache Phoenix
Local Secondary Indexes in Apache Phoenix
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Understanding oracle rac internals part 1 - slides
Understanding oracle rac internals   part 1 - slidesUnderstanding oracle rac internals   part 1 - slides
Understanding oracle rac internals part 1 - slides
 
Ceph - A distributed storage system
Ceph - A distributed storage systemCeph - A distributed storage system
Ceph - A distributed storage system
 
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, UberDemystifying flink memory allocation and tuning - Roshan Naik, Uber
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
 
Extending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use casesExtending Flink SQL for stream processing use cases
Extending Flink SQL for stream processing use cases
 

Destaque

Apache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseApache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseDataWorks Summit
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster Cloudera, Inc.
 
HBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationHBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationSchubert Zhang
 
HBaseCon 2013: A Developer’s Guide to Coprocessors
HBaseCon 2013: A Developer’s Guide to CoprocessorsHBaseCon 2013: A Developer’s Guide to Coprocessors
HBaseCon 2013: A Developer’s Guide to CoprocessorsCloudera, Inc.
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceBiju Nair
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsBiju Nair
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk ManagementBiju Nair
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar DatabaseBiju Nair
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User ReferenceBiju Nair
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload managementBiju Nair
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developersBiju Nair
 
NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaBiju Nair
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...StampedeCon
 

Destaque (15)

Apache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL DatabaseApache Phoenix: Transforming HBase into a SQL Database
Apache Phoenix: Transforming HBase into a SQL Database
 
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
HBaseCon 2013: Using Coprocessors to Index Columns in an Elasticsearch Cluster
 
HBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance EvaluationHBase 0.20.0 Performance Evaluation
HBase 0.20.0 Performance Evaluation
 
HBase 훑어보기
HBase 훑어보기HBase 훑어보기
HBase 훑어보기
 
HBaseCon 2013: A Developer’s Guide to Coprocessors
HBaseCon 2013: A Developer’s Guide to CoprocessorsHBaseCon 2013: A Developer’s Guide to Coprocessors
HBaseCon 2013: A Developer’s Guide to Coprocessors
 
Using Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve PerformaceUsing Netezza Query Plan to Improve Performace
Using Netezza Query Plan to Improve Performace
 
Websphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentalsWebsphere MQ (MQSeries) fundamentals
Websphere MQ (MQSeries) fundamentals
 
Project Risk Management
Project Risk ManagementProject Risk Management
Project Risk Management
 
Row or Columnar Database
Row or Columnar DatabaseRow or Columnar Database
Row or Columnar Database
 
HDFS User Reference
HDFS User ReferenceHDFS User Reference
HDFS User Reference
 
Netezza workload management
Netezza workload managementNetezza workload management
Netezza workload management
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Netezza fundamentals for developers
Netezza fundamentals for developersNetezza fundamentals for developers
Netezza fundamentals for developers
 
NENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezzaNENUG Apr14 Talk - data modeling for netezza
NENUG Apr14 Talk - data modeling for netezza
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 

Semelhante a HBase Application Performance Improvement

Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction HBaseCon
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme MakeoverHBaseCon
 
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephBuild an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephRongze Zhu
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New FeaturesAmazon Web Services
 
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...Joao Galdino Mello de Souza
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best PracticesVenu Anuganti
 
Revisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS SchedulerRevisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS SchedulerYongseok Oh
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalabilityWim Godden
 
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesHBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesMichael Stack
 
Sql server scalability fundamentals
Sql server scalability fundamentalsSql server scalability fundamentals
Sql server scalability fundamentalsChris Adkin
 
Deploying ssd in the data center 2014
Deploying ssd in the data center 2014Deploying ssd in the data center 2014
Deploying ssd in the data center 2014Howard Marks
 
HBase: Extreme makeover
HBase: Extreme makeoverHBase: Extreme makeover
HBase: Extreme makeoverbigbase
 
How Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver ClusterHow Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver ClusterAaron Joue
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar storesIstvan Szukacs
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar storesIstvan Szukacs
 
FlashSQL 소개 & TechTalk
FlashSQL 소개 & TechTalkFlashSQL 소개 & TechTalk
FlashSQL 소개 & TechTalkI Goo Lee
 
S016827 pendulum-swings-nola-v1710d
S016827 pendulum-swings-nola-v1710dS016827 pendulum-swings-nola-v1710d
S016827 pendulum-swings-nola-v1710dTony Pearson
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Community
 
VMworld 2013: Extreme Performance Series: Storage in a Flash
VMworld 2013: Extreme Performance Series: Storage in a Flash VMworld 2013: Extreme Performance Series: Storage in a Flash
VMworld 2013: Extreme Performance Series: Storage in a Flash VMworld
 
Thoughts on kafka capacity planning
Thoughts on kafka capacity planningThoughts on kafka capacity planning
Thoughts on kafka capacity planningJamieAlquiza
 

Semelhante a HBase Application Performance Improvement (20)

Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction
 
HBase: Extreme Makeover
HBase: Extreme MakeoverHBase: Extreme Makeover
HBase: Extreme Makeover
 
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on CephBuild an High-Performance and High-Durable Block Storage Service Based on Ceph
Build an High-Performance and High-Durable Block Storage Service Based on Ceph
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
 
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
z/VM 6.3 - Mudanças de Comportamento do hypervisor para suporte de partições ...
 
HBase Operations and Best Practices
HBase Operations and Best PracticesHBase Operations and Best Practices
HBase Operations and Best Practices
 
Revisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS SchedulerRevisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS Scheduler
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalability
 
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devicesHBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
HBaseConAsia2018 Track1-2: WALLess HBase with persistent memory devices
 
Sql server scalability fundamentals
Sql server scalability fundamentalsSql server scalability fundamentals
Sql server scalability fundamentals
 
Deploying ssd in the data center 2014
Deploying ssd in the data center 2014Deploying ssd in the data center 2014
Deploying ssd in the data center 2014
 
HBase: Extreme makeover
HBase: Extreme makeoverHBase: Extreme makeover
HBase: Extreme makeover
 
How Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver ClusterHow Ceph performs on ARM Microserver Cluster
How Ceph performs on ARM Microserver Cluster
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar stores
 
Optimizing columnar stores
Optimizing columnar storesOptimizing columnar stores
Optimizing columnar stores
 
FlashSQL 소개 & TechTalk
FlashSQL 소개 & TechTalkFlashSQL 소개 & TechTalk
FlashSQL 소개 & TechTalk
 
S016827 pendulum-swings-nola-v1710d
S016827 pendulum-swings-nola-v1710dS016827 pendulum-swings-nola-v1710d
S016827 pendulum-swings-nola-v1710d
 
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Melbourne - Ceph on All-Flash Storage - Breaking Performance Barriers
 
VMworld 2013: Extreme Performance Series: Storage in a Flash
VMworld 2013: Extreme Performance Series: Storage in a Flash VMworld 2013: Extreme Performance Series: Storage in a Flash
VMworld 2013: Extreme Performance Series: Storage in a Flash
 
Thoughts on kafka capacity planning
Thoughts on kafka capacity planningThoughts on kafka capacity planning
Thoughts on kafka capacity planning
 

Mais de Biju Nair

Chef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleChef conf-2015-chef-patterns-at-bloomberg-scale
Chef conf-2015-chef-patterns-at-bloomberg-scaleBiju Nair
 
HBase Internals And Operations
HBase Internals And OperationsHBase Internals And Operations
HBase Internals And OperationsBiju Nair
 
Apache Kafka Reference
Apache Kafka ReferenceApache Kafka Reference
Apache Kafka ReferenceBiju Nair
 
Serving queries at low latency using HBase
Serving queries at low latency using HBaseServing queries at low latency using HBase
Serving queries at low latency using HBaseBiju Nair
 
Multi-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalMulti-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalBiju Nair
 
Cursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixCursor Implementation in Apache Phoenix
Cursor Implementation in Apache PhoenixBiju Nair
 
Hadoop security
Hadoop securityHadoop security
Hadoop securityBiju Nair
 
Chef patterns
Chef patternsChef patterns
Chef patternsBiju Nair
 

HBase Application Performance Improvement

  • 1. HBase Cache & Performance
    Biju Nair
    Boston Hadoop User Group Meet-up
    28 May 2015
  • 2. HBase Overview
    • Key value store
    • Column family oriented
    • Data stored as byte[]
    • Data indexed by key value
    • Data stored in sorted order by key
    • Data model doesn’t have to be pre-defined
    • Scales horizontally
  • 3. HBase Overview
    create 'stock', 'company', 'financials'
    Physical Storage:
      …
      msft,company,loc,ts1,Seattle
      msft,company,name,ts1,Microsoft
      …
      orcl,company,loc,ts1,Redwood
      orcl,company,name,ts1,Oracle
      …

      …
      msft,financials,cap,ts1,379B
      msft,financials,pe,ts1,20
      …
      orcl,financials,cap,ts1,190B
      orcl,financials,pe,ts1,18
      …
    put 'stock', 'msft', 'company:name', 'microsoft'
    get 'stock', 'msft'
      company:loc,ts1,Seattle
      company:name,ts1,Microsoft
      financials:cap,ts1,379B
      financials:PE,ts1,20
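The storage layout on this slide can be imitated with a small in-memory model. The sketch below is a toy, not the HBase client API: the class `ToyTable` and its methods are hypothetical names, each column family is kept as its own sorted list of cells, and timestamp ordering is simplified (real HBase returns newer versions first).

```python
import bisect


class ToyTable:
    """Toy model: one sorted list of (row, qualifier, ts, value) per column family."""

    def __init__(self, *families):
        self.stores = {cf: [] for cf in families}

    def put(self, row, column, value, ts="ts1"):
        # "company:name" -> family "company", qualifier "name"
        cf, qualifier = column.split(":", 1)
        # insort keeps the cells of this family sorted by row key, then qualifier
        bisect.insort(self.stores[cf], (row, qualifier, ts, value))

    def get(self, row):
        # Gather the row's cells from every column family store
        result = []
        for cf, cells in self.stores.items():
            start = bisect.bisect_left(cells, (row,))  # first cell with this row key
            for r, qualifier, ts, value in cells[start:]:
                if r != row:
                    break
                result.append((f"{cf}:{qualifier}", ts, value))
        return result


table = ToyTable("company", "financials")
table.put("orcl", "company:name", "Oracle")
table.put("msft", "company:name", "Microsoft")
table.put("msft", "company:loc", "Seattle")
table.put("msft", "financials:pe", "20")
table.put("msft", "financials:cap", "379B")
msft_cells = table.get("msft")
```

Because every store is sorted by row key, all cells for one ticker sit next to each other, which is why a single-row `get` only needs one contiguous range per column family.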
  • 4. HBase Overview
    … appl,company… … ge,company… … ibm,company… … msft,company… msft,company… … orcl,company… orcl,company… …
    … appl,financials… … ge,financials… … ibm,financials… … msft,financials… msft,financials… … orcl,financials… orcl,financials… …
    Regions:
      … appl,company… …
      … ge,company… …
      … ibm,company… …
      … msft,company… …
      … orcl,company… …
      … appl,financials… …
      … ge,financials… …
      … ibm,financials… …
      … msft,financials… …
      … orcl,financials… …
  • 5. HBase Overview
    Region Server | Region Server | Region Server | Region Server | Region Server
    … appl,company… … | … ge,company… … | … ibm,company… … | … msft,company… … | … orcl,company… …
    HBase Master
    ZooKeeper
    Client
  • 6. Use Case: Data and Query
    • Time series data
      – Tickers and attributes
      – Monthly data stored in a column; 256 bytes
      – Up to 20 years worth of data
    • Queries
      – “get”s for up to 1 year data; 3072 bytes
  • 7. Use Case: Requirements
    • Meet “get” query performance requirements
      – Under 10 ms for 99% of queries
      – Median latency 2 to 3 ms
      – 99.99% latency under 50 ms
    • Efficient HBase cluster capacity utilization
      – 32 cores per node
      – 128 GB of memory per node
      – SSD storage in all nodes
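One way to check a test run against these latency targets is sketched below. It assumes the nearest-rank percentile definition, which the slides do not specify, and the function names and the sample data are hypothetical.

```python
import math
from fractions import Fraction


def percentile(sorted_ms, pct):
    """Nearest-rank percentile of an ascending list; pct is given as a string such as "99.99"."""
    # Fraction avoids floating-point rounding when computing the rank
    rank = max(1, math.ceil(Fraction(pct) * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def check_requirements(latencies_ms):
    """Compare measured "get" latencies with the targets listed on the slide."""
    s = sorted(latencies_ms)
    return {
        "median_at_most_3_ms": percentile(s, "50") <= 3,
        "p99_under_10_ms": percentile(s, "99") < 10,
        "p99_99_under_50_ms": percentile(s, "99.99") < 50,
    }


# Hypothetical sample: 10,000 reads, mostly 2 ms with a small slow tail
sample = [2] * 9900 + [20] * 99 + [100]
report = check_requirements(sample)
```

Note that a single 100 ms outlier in 10,000 reads does not violate any of the three targets, which is why the maximum is reported separately from the percentiles in the result tables.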
  • 8. Baseline Test Observations
    • Spikes in read response times
    • Less than 10% utilization of RS node CPUs
    • Less than 15% utilization of RS node memory
    • Block cache utilization was inefficient
      – Low hit ratio and high eviction rates
  • 9. HBase Internals (Simplified)
    HBase Memory (RS): Mem Store, Block cache
    HBase Storage: WAL, HFiles
  • 10. HBase Write Path (Simplified)
    HBase Memory (RS): Mem Store, Block cache
    HBase Storage: WAL, HFiles
    (Diagram: write steps numbered 1 to 3)
  • 11. HBase Read Path (Simplified)
    HBase Memory (RS): Mem Store, Block cache
    HBase Storage: WAL, HFiles
    (Diagram: read steps numbered 1 and 2)
  • 12. Cache Utilization
    • Low hit ratio and high eviction rates
    • Frequently read data size
      – ~ 3 K
    • Table block size
      – 65 K
    • Proposed change
      – Reduce block size
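Why a large block size can hurt the hit ratio when reads are small can be illustrated with a toy LRU simulation. Everything in it is an assumption made for illustration: the 1 MiB cache, the 40 hot rows, the round-robin access pattern, and the premise that each hot row lives in a different block. It is not a model of HBase's actual block cache.

```python
from collections import OrderedDict


def simulate_lru(block_size, cache_bytes, hot_rows, passes):
    """Round-robin reads over hot_rows rows, each assumed to sit in its own block.

    Returns (hit_ratio, evictions) for an LRU cache that stores whole blocks.
    """
    capacity = cache_bytes // block_size  # number of blocks the cache can hold
    cache = OrderedDict()                 # block id -> None, least recently used first
    hits = evictions = 0
    for _ in range(passes):
        for block_id in range(hot_rows):
            if block_id in cache:
                hits += 1
                cache.move_to_end(block_id)    # mark as most recently used
            else:
                if len(cache) >= capacity:
                    cache.popitem(last=False)  # evict the least recently used block
                    evictions += 1
                cache[block_id] = None
    return hits / (hot_rows * passes), evictions


CACHE_BYTES = 1024 * 1024  # hypothetical 1 MiB cache
large_blocks = simulate_lru(64 * 1024, CACHE_BYTES, hot_rows=40, passes=10)
small_blocks = simulate_lru(16 * 1024, CACHE_BYTES, hot_rows=40, passes=10)
```

Under these assumptions the 64 K configuration holds only 16 blocks, so the 40 hot blocks keep evicting each other and no read ever hits; with 16 K blocks all 40 fit and every read after the first pass is a hit. Each cached 64 K block also carries roughly 61 K of data that the ~3 K read does not need.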
  • 13. Impact of Table Blk Size Change
    Avg      3.002 5.362 5.361 5.357 6.419 6.369 6.405 6.383 6.188 6.196 6.182 6.174 6.246 6.264 6.268 6.253 5.194 5.207 5.219 3.031
    Median   2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
    95%      10 15 15 15 18 18 18 18 18 18 17 17 18 18 18 18 15 15 15 10
    99%      15 26 26 26 30 30 30 30 28 28 28 28 29 29 29 29 25 24 25 15
    99.90%   26 41 41 41 45 45 45 45 43 43 43 43 44 44 44 44 41 41 41 26
    Max      2261 127 185 102 90 106 92 102 93 106 119 114 89 140 132 82 81 150 93 1910

    BAvg     16.731 16.728 16.761 16.763 16.418 16.371 16.37 16.431 16.152 16.14 16.169 16.158 16.308 16.29 16.325 16.307 16.34 16.381 16.391 16.352
    BMedian  14 14 14 14 13 13 13 13 15 15 15 15 13 13 13 13 13 13 13 13
    B95%     41 41 41 41 41 41 41 41 43 43 43 43 40 40 40 40 41 41 41 41
    B99%     55 55 55 55 54 54 54 54 55 55 55 55 54 54 54 54 54 54 55 54
    B99.9%   71 71 71 71 70 70 70 70 67 67 67 67 71 70 70 71 71 71 71 70
    BMax     545 1062 559 567 1075 1027 561 567 564 541 558 1062 1062 561 1075 1072 1067 563 1035 1032

    Get Performance (ms) – 64 K Blk
    Get Performance (ms) – 16 K Blk
    Note: Smaller block size increases the overhead of increased index blocks
  • 14. Memory Utilization/Latency Spikes
    • JVM GC contributed to latency spikes
    • Increase in heap size increased GC time
      – Prevented using all the available memory
    • Proposed change: Use off-heap caching
      – Minimize spikes in response time due to GC
      – Increased utilization of node memory
  • 15. HBase Off-Heap Caching
    HBase Memory (RS): Mem Store; Block cache (L1), Idx & BF data
    Off-heap cache (L2), Tbl Data (Bucket Cache)
    HBase Storage: WAL, HFiles
  • 16. HBase Read Path (Simplified)
    HBase Memory (RS): Mem Store, Block cache
    L2 Cache
    HBase Storage: WAL, HFile
    (Diagram: read steps numbered 1 to 4)
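A toy lookup through the tiers named on this slide might look like the sketch below. It follows the split described on the off-heap caching slide (index data in L1, table data blocks in L2), but the function, the dictionaries, and the block ids are hypothetical, and the logic is heavily simplified: a real read merges MemStore and HFile data rather than stopping at the first tier that holds the key.

```python
def read_cell(key, memstore, l1_index, l2_blocks, hfile_blocks):
    """Return (value, tier) for key, filling the L2 cache on a miss."""
    if key in memstore:                # recent writes still held in memory
        return memstore[key], "memstore"
    block_id = l1_index[key]           # L1 holds index data that locates the block
    if block_id in l2_blocks:          # data block already cached off-heap
        return l2_blocks[block_id][key], "L2"
    block = hfile_blocks[block_id]     # cache miss: read the whole block from storage
    l2_blocks[block_id] = block        # keep it for later reads
    return block[key], "HFile"


memstore = {}
l1_index = {"msft": 0, "orcl": 0}      # both rows assumed to live in block 0
l2_blocks = {}
hfile_blocks = {0: {"msft": "Microsoft", "orcl": "Oracle"}}

first = read_cell("msft", memstore, l1_index, l2_blocks, hfile_blocks)
second = read_cell("orcl", memstore, l1_index, l2_blocks, hfile_blocks)
```

The second read is served from L2 even though that key was never read before, because caching works on whole blocks rather than individual cells.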
  • 17. Bucket Cache Configuration
    • hbase-env.sh HBASE_REGIONSERVER_OPTS parameters
      – Xmx
      – XX:MaxDirectMemorySize
    • hbase-site.xml properties
      – hbase.regionserver.global.memstore.upperLimit
      – hfile.block.cache.size
      – hbase.bucketcache.size
      – hbase.bucketcache.ioengine
      – hbase.bucketcache.percentage.in.combinedcache
  • 18. Bucket Cache Configuration
    Item                                            id     Values
    Total RS memory                                 Tot
    Memstore size                                   MSz
    L1 (LRU) Cache                                  L1Sz
    Heap for JVM                                    JHSz
    XX:MaxDirectMemorySize                          DMem   Tot-MSz-L1Sz-JHSz
    Xmx                                             Xmx    MSz+L1Sz+JHSz
    hbase.regionserver.global.memstore.upperLimit   ULim   MSz/Xmx
    hfile.block.cache.size                          blksz  0.8-ULim
    hbase.bucketcache.size                          bucsz  DMem+(blksz*Xmx)
    hbase.bucketcache.percentage.in.combinedcache   ccsz   1-((blksz*Xmx)/bucsz)
    hbase.bucketcache.ioengine                             offheap/"file:/localfile"
  • 19. Bucket Cache Configuration
    Item                                            id     Values
    Total RS memory                                 Tot    96000
    Memstore size                                   MSz    2000
    L1 (LRU) Cache                                  L1Sz   2000
    Heap for JVM                                    JHSz   1000
    XX:MaxDirectMemorySize                          DMem   91000
    Xmx                                             Xmx    5000
    hbase.regionserver.global.memstore.upperLimit   ULim   0.4
    hfile.block.cache.size                          blksz  0.4
    hbase.bucketcache.size                          bucsz  93000
    hbase.bucketcache.percentage.in.combinedcache   ccsz   0.97849
    hbase.bucketcache.ioengine                             "file:/localfile"
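The example values can be reproduced from the four sizing inputs with the formulas given on the previous slide. The sketch below is plain arithmetic; the function name is hypothetical, and the slides do not state the unit of the sizes, so the code only assumes that all four inputs share one unit.

```python
def bucket_cache_settings(total, memstore, l1_cache, jvm_heap):
    """Derive the configuration values from the four sizing inputs (all in the same unit)."""
    xmx = memstore + l1_cache + jvm_heap                  # Xmx   = MSz + L1Sz + JHSz
    direct_mem = total - memstore - l1_cache - jvm_heap   # DMem  = Tot - MSz - L1Sz - JHSz
    upper_limit = memstore / xmx                          # ULim  = MSz / Xmx
    block_cache = 0.8 - upper_limit                       # blksz = 0.8 - ULim
    bucket_size = direct_mem + block_cache * xmx          # bucsz = DMem + (blksz * Xmx)
    combined_pct = 1 - (block_cache * xmx) / bucket_size  # ccsz  = 1 - ((blksz * Xmx) / bucsz)
    return {
        "Xmx": xmx,
        "XX:MaxDirectMemorySize": direct_mem,
        "hbase.regionserver.global.memstore.upperLimit": upper_limit,
        "hfile.block.cache.size": block_cache,
        "hbase.bucketcache.size": bucket_size,
        "hbase.bucketcache.percentage.in.combinedcache": combined_pct,
    }


# Inputs taken from the example table: Tot, MSz, L1Sz, JHSz
settings = bucket_cache_settings(total=96000, memstore=2000, l1_cache=2000, jvm_heap=1000)
```

With these inputs the memstore limit and the block cache fraction each come to 0.4 of a 5000 heap, and the combined-cache percentage works out to about 0.9785, matching the table.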
  • 20. Impact of Using Off-Heap Cache
    Get Performance with L1 cache
    Get Performance with L1 & L2 cache
    Note: L1 cache test used 38 GB of data, L1+L2 test used 3 TB of data

    Avg      3.872 3.995 3.936 4.007 4.052
    Median   1 1 1 1 1
    95%      14 14 14 15 15
    99%      20 20 20 20 20
    99.90%   27 27 27 28 28
    99.99%   36 36 36 37 37
    99.999%  208 310 332 207 232
    Max      1360 1906 1736 1359 1363
    807Mil797107Mil7Requests

    BAvg     3.429 2.552 3.447 3.502 3.554
    BMedian  2 2 2 2 2
    B95%     10 8 10 10 10
    B99%     18 14 18 18 18
    B99.9%   30 23 30 30 31
    BMax     78 1135 58 77 67
    18Mil8Rows8>818Mil8Requests
  • 21. Maximize CPU & Memory Utilization
    • Run additional RS per node
    • Throughput increased 50% when RS increased to 2
      – Throughput reduced on AWS cluster
      – There was no degradation in the response time
      – Throughput increase tapered after 3 RS per node
    • Note: Maintenance overhead using multi-RS
  • 22. Known Issues
    • Using “offheap” option of BucketCache prevents RS start
      – [HBASE-10643]
      – Can be mitigated using tmpfs
    • LoadIncrementalHFiles doesn’t work with BucketCache
      – [HBASE-10500]
    • BucketCache for different block sizes is not configurable
      – [HBASE-10641] Fixed
  • 23. Key Takeaways
    • Store what is really required
      – Understand the query pattern
      – Leverage column family (CF) to group data
    • Choose appropriate block size for table/CF
    • Use off-heap cache to minimize latency spikes
    • Test all assumptions
  • 24. Further Reading
    • http://blog.asquareb.com/blog/2014/11/21/leverage-hbase-cache-and-improve-read-performance
    • http://blog.asquareb.com/blog/2014/11/24/how-to-leverage-large-physical-memory-to-improve-hbase-read-performance
    • https://issues.apache.org/jira/browse/HBASE-7404
    • http://www.n10k.com/blog/blockcache-101/
    • http://www.n10k.com/blog/blockcache-showdown/