Supporting HBase:
How to Stabilize, Diagnose and Repair


     Jeff Bean, Jonathan Hsieh, Kathleen Ting
      {jwfbean,jon,kathleen}@cloudera.com
                     5/22/12

                         HBaseCon 2012. 5/22/12
              Copyright 2012 Cloudera Inc. All rights reserved
Who Are We?

• Jeff Bean
   • Designated Support Engineer, Cloudera
   • Education Program Lead, Cloudera
• Kathleen Ting
   • Support Manager, Cloudera
   • ZooKeeper Subject Matter Expert
• Jonathan Hsieh
   • Software Engineer, Cloudera
   • Apache HBase Committer and PMC member


Outline

• Preventative HBase Medicine:
  • Tips for a healthy HBase


• The HBase Triage:
  • Fixes for acute HBase pains


• The HBase Surgery:
  • Repairing a Corrupted HBase




“Monitor your system, exercise your
workload, and eat your vegetables.”




HBase Cross-Section

                       App                             MR



                                      HBase



                  ZooKeeper                           HDFS



                                 JVM / Linux



                              Disk / Network


Doctor’s Advice: “An ounce of prevention is worth a pound of cure.”

• Understand your workload and test for it
• Size your cluster properly (see Cluster Sizer)
• Monitor, alert, and manage your cluster with Ganglia, Nagios, and/or Cloudera Manager
   • Don’t be Dr. House!
A Case Study




Symptom: Long-running MapReduce job with blacklisted TaskTrackers

       TaskTracker    No. of Failures
       NodeX          4
       NodeY          3
       NodeQ          7
       NodeB          10
       NodeP          8
       NodeV          6
Symptom: Node B Task Logs
$ find . | xargs grep "giving up"
./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:34,248 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.
./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:37,328 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.
./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:40,465 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.
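The grep above finds the "giving up" lines; a small script can also tally which server the tasks gave up on. This is a minimal sketch, assuming the message format shown on the slide; the function name `count_unreachable` and the sample list are illustrative, not part of the talk.

```python
import re
from collections import Counter

# Matches the HbaseRPC message format shown on the slide.
GIVING_UP = re.compile(
    r"Server at (\S+) could not be reached after \d+ tries, giving up"
)

def count_unreachable(lines):
    """Count 'giving up' messages per target server (host:port)."""
    counts = Counter()
    for line in lines:
        match = GIVING_UP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample lines copied from the Node B task log on the slide.
sample = [
    "2011-08-02 11:09:34,248 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.",
    "2011-08-02 11:09:37,328 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.",
    "2011-08-02 11:09:40,465 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.",
]

for server, failures in count_unreachable(sample).most_common():
    print(server, failures)
```

When every failing task on Node B points at the same server, the next log to read is that server's RegionServer log.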
Symptom: RegionServer logs of Node A:
2011-08-02 11:04:20,324 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=NodeA,60020,1312228900706, load=(requests=10847, regions=342, usedHeap=8193, maxHeap=15350): regionserver:60020-0x4316487a73e1626 regionserver:60020-0x4316487a73e1626 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
Cascading failure! Some other node says ouch…
2011-08-01 12:55:39,356 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-15,5,main]
2011-08-01 12:55:39,629 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
2011-08-01 12:55:39,629 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2011-08-01 12:55:39,695 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logsNodeA,60020,1311651881177NodeA%3A60020.1311656326143 : java.io.IOException: Error Recovery for block blk_1102151039331207284_16350929 failed because recovery from primary datanode NodeA:50010 failed 6 times. Pipeline was NodeA:50010. Aborting...
java.io.IOException: Error Recovery for block blk_1102151039331207284_16350929 failed because recovery from primary datanode NodeA:50010 failed 6 times. Pipeline was NodeA:50010. Aborting...
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2841)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2305)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2477)
2011-08-01 12:55:39,842 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
Symptom: Ganglia Memory Graph on Node A…

[Figure: Ganglia memory graph for Node A]
Symptom: Ganglia swap_free on Node A…

[Figure: Ganglia swap_free graph for Node A]
A Case Study: Radiant Pain
“I was having back pains, and it turned out to be my heart!”

Node A under load → Node A swaps → Node B can’t connect to node A → Masters take action

• Node A under load
   • Too many MR slots
   • MR slots too large
   • Too many non-HBase small files (HDFS-2379)
• Node A swaps
   • “Arbitrary” processes pause or become unresponsive
• Node B can’t connect to node A
   • MapReduce tasks fail
   • HDFS datanode operations time out
   • HBase client operations fail
• Masters take action
   • JobTracker blacklists TT on node B
   • Jobs fail or run slow
   • NameNode re-replicates blocks from node A
Event Trail and Evidence Trail

    Event trail                      Evidence trail
    Node A condition (load)          Node A monitoring
    Node A event (swap)              Transient swap not logged!
    Node B symptom (connect)         Node B logs
    Master action (blacklist)        Master logs and UIs
DOs and DON’Ts for keeping HBase Healthy

 DOs                                            DON’Ts
 • Monitor and Alert                            • Swap
 • Optimize network                             • Oversubscribe MR
 • Know your logs                               • Share the network




“Cloudera 911 here, how can we help?”
HBase Support Tickets

[Pie chart of HBase support tickets. Legend: HBase, ZK, MR, HDFS Misconfig; Patch Required; Fix HW/NW; Repair Needed. Slice labels: 44%, 28%, 16%, 12%.]
Understanding the logs helps us diagnose issues
• Related events logged by different processes in different places
• Log messages point at each other
   • HDFS accesses by RS logged by NN and DN
   • HBase accesses by MR logged by JT, RS, NN, ZK
   • ZK logs indicate HBase health
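
Because related events land in different logs, it can help to interleave them into one timeline. The sketch below assumes each log line starts with the timestamp format seen in the excerpts in this deck; the source labels and the helper names (`parse_ts`, `tagged`, `merge_logs`) are hypothetical.

```python
import heapq
from datetime import datetime

# Timestamp layout used by the log excerpts, e.g. "2011-08-02 11:04:20,324".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23

def parse_ts(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None

def tagged(source, lines):
    """Yield (timestamp, source, line) for each line that starts with a timestamp."""
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            yield ts, source, line.rstrip("\n")

def merge_logs(logs):
    """Merge several time-ordered logs into one timeline.

    logs maps a source label to an iterable of lines. Lines without a
    leading timestamp (such as stack-trace continuations) are skipped.
    """
    streams = [tagged(source, lines) for source, lines in logs.items()]
    return list(heapq.merge(*streams))

logs = {
    "RS NodeA": [
        "2011-08-02 11:04:20,324 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server",
        "org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired",
    ],
    "task NodeB": [
        "2011-08-02 11:09:34,248 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up.",
    ],
}

for ts, source, line in merge_logs(logs):
    print(ts, source, line[TS_LENGTH + 1:])
```

The merge only orders events correctly if the clocks on the nodes are reasonably in sync, which is an assumption of this sketch.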




The HBase Triage: Fixes for acute HBase pains


• Severe Pain

• Complete Unconsciousness




Connection Reset
WARN - Session <id> for server <server id>, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer

What causes this?
• Running out of ZK connections

How can it be resolved?
• Manually close connections
• Fixed in HBASE-5466 and HBASE-4773
Running out of DN Threads & File Descriptors
INFO hdfs.DFSClient: Could not obtain block <blk id> from any node: java.io.IOException: No live nodes contain current block.
ERROR java.io.IOException: Too many open files

What causes this?
• HBase likes to keep data files open

How can it be resolved?
• Increase dfs.datanode.max.xcievers to 4096
• Increase the open-file limit in /etc/security/limits.conf
   • hbase - nofile 32768
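To see whether the open-file limit has actually taken effect, the limit of a running process can be compared with the value from the slide. This is a rough sketch using Python's standard `resource` module (Unix only); it reports the limit of the process that runs it, so it is assumed to be run as the same user that runs HBase. The helper name `nofile_ok` is illustrative.

```python
import resource

RECOMMENDED_NOFILE = 32768  # value from the limits.conf entry on the slide

def nofile_ok(soft_limit, recommended=RECOMMENDED_NOFILE):
    """Return True if the soft open-file limit meets the recommended value."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= recommended

# Soft and hard open-file limits of the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
status = "ok" if nofile_ok(soft) else "too low"
print(f"nofile soft={soft} hard={hard}: {status}")
```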
“Long Garbage Collecting Pause”
WARN org.apache.hadoop.hbase.util.Sleeper: We slept 19118ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad

How can it be resolved?
• zoo.cfg: maxSessionTimeout=180000
   hbase-site.xml: zookeeper.session.timeout=180000
• Nodes may be oversubscribed if MR & HBase are co-located
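The Sleeper warning carries the pause length, so it can be compared against the ZooKeeper session timeout. A minimal sketch, assuming the message format shown on the slide; the function names are hypothetical, and treating "pause at least as long as the timeout" as the danger threshold is a rough heuristic, not HBase's own logic.

```python
import re

# Matches the Sleeper warning shown on the slide.
SLEEPER = re.compile(r"We slept (\d+)ms instead of (\d+)ms")

def pause_overrun_ms(line):
    """Return how much longer than planned the thread slept, or None if no match."""
    match = SLEEPER.search(line)
    if match is None:
        return None
    return int(match.group(1)) - int(match.group(2))

def risks_session_expiry(line, session_timeout_ms):
    """Heuristic: a pause at least as long as the session timeout risks expiry."""
    overrun = pause_overrun_ms(line)
    return overrun is not None and overrun >= session_timeout_ms

warning = (
    "WARN org.apache.hadoop.hbase.util.Sleeper: We slept 19118ms instead of "
    "1000ms, this is likely due to a long garbage collecting pause and it's usually bad"
)

print(pause_overrun_ms(warning))               # 18118
print(risks_session_expiry(warning, 180000))   # False with the timeout from the slide
print(risks_session_expiry(warning, 10000))    # True with a hypothetical 10 s timeout
```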
Heap Allocation Per Node

Total RAM must cover:
     (Map + Red) x Child Heap
   + DN heap
   + TT heap
   + RS heap
   + OS (20% of RAM)
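The budget above is simple arithmetic, so it is easy to check a node's settings before they cause swapping. The sketch below reads "Map + Red" as map slots plus reduce slots; all sizes in the examples are hypothetical, and the function name `heap_headroom_mb` is illustrative.

```python
def heap_headroom_mb(total_ram_mb, map_slots, reduce_slots, child_heap_mb,
                     dn_heap_mb, tt_heap_mb, rs_heap_mb, os_percent=20):
    """Return RAM left over (in MB) after the per-node heap budget.

    A negative result means the node is oversubscribed and may swap.
    """
    os_reserve_mb = total_ram_mb * os_percent // 100
    committed_mb = (
        (map_slots + reduce_slots) * child_heap_mb
        + dn_heap_mb
        + tt_heap_mb
        + rs_heap_mb
        + os_reserve_mb
    )
    return total_ram_mb - committed_mb

# Hypothetical 48 GB node: 8 map + 4 reduce slots, 1 GB child heap,
# 1 GB DataNode, 1 GB TaskTracker, 16 GB RegionServer.
print(heap_headroom_mb(49152, 8, 4, 1024, 1024, 1024, 16384))   # 8602

# Hypothetical 24 GB node with 12 map + 6 reduce slots and an 8 GB RegionServer.
print(heap_headroom_mb(24576, 12, 6, 1024, 1024, 1024, 8192))   # -9011
```

The second node commits more memory than it has, which is the "too many MR slots" condition from the case study.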
ZK can’t start & HBase hangs
INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file <name> retrying…

What causes this?
• A high dfs.replication.min causes HBase to hang: a file can’t be closed until all of the required replicas have been created

How can it be resolved?
• Remove dfs.replication.min
• Temporarily increase dfs.balance.bandwidthPerSec
• Fixed in HDFS-2936
Unable to Load Database
FATAL org.apache.zookeeper.server.quorum.QuorumPeer: Unable to load database on disk

What causes this?
• ZK data directories filled up

How can it be resolved?
• Wipe out /var/zookeeper/version-2
• Run zkCleanup.sh script via cron
Downed HBase Master and RegionServers
WARN org.apache.zookeeper.server.quorum.Learner: Exception when following the leader
java.net.SocketTimeoutException: Read timed out

What causes this?
• Session Timeout + Session Expiration = NW Prob

How can it be resolved?
• Monitor network (e.g. ifconfig)
• Run ≥ 3 ZK servers (majority rules)
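"Majority rules" can be made concrete with a little arithmetic: a ZooKeeper ensemble stays available only while more than half of its servers are up. The sketch below works that out for a few ensemble sizes; the function names are illustrative.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    """Number of servers that can fail while a majority remains."""
    return ensemble_size - quorum_size(ensemble_size)

for size in (1, 2, 3, 4, 5):
    print(size, quorum_size(size), tolerated_failures(size))
```

One or two servers tolerate no failures, three tolerate one, and a fourth adds nothing over three, which is why odd ensemble sizes of three or more are the usual choice.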
“To the operating room, please”

• HBase refuses to start
• HBase’s HBCK reports inconsistencies
Detecting internal problems with hbck
• HBase since 0.90 has included a tool for scanning an HBase instance’s internals to find corruptions.

hbase hbck

hbase hbck -details
Tables are sharded into regions

[Figure: a table with rows 0000000000 through 7777777777, sharded by row key into the regions [‘’, A), [A, B), and [B, ‘’)]

Invariants: Maintain table integrity and region consistency!
Table Integrity Invariants

    [‘’, A)
    [A, B)
    [B, C)
    [C, D)
    [D, E)
    [E, F)
    [F, G)
    [G, ‘’)

• Every key shall get assigned to a single region.
• Table Regions shall:
   • Cover the entire range of possible keys,
   • from the absolute start (‘’)
   • to the absolute end (unfortunately, also ‘’).
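The table integrity invariants can be checked mechanically once the region boundaries are known. The following is a minimal sketch of such a check, not hbck's actual implementation: it assumes regions are given as (start, end) pairs of Python strings and compares keys as strings, whereas HBase compares row keys as bytes. The function name `check_table_integrity` is hypothetical.

```python
def check_table_integrity(regions):
    """Return a list of integrity problems for one table's regions.

    regions is a list of (start_key, end_key) pairs describing [start, end).
    An empty string means the absolute start (as a start key) or the
    absolute end (as an end key), as on the slide.
    """
    if not regions:
        return ["table has no regions"]
    problems = []
    # '' sorts before every other string, so the first region should start at ''.
    ordered = sorted(regions, key=lambda region: region[0])
    if ordered[0][0] != "":
        problems.append(f"hole before {ordered[0][0]!r}")
    covered_to = ordered[0][1]  # end of the key range covered so far
    for start, end in ordered[1:]:
        if covered_to == "" or start < covered_to:
            problems.append(f"overlap at {start!r}")
        elif start > covered_to:
            problems.append(f"hole between {covered_to!r} and {start!r}")
        # Extend coverage if this region reaches further than what we have.
        if covered_to != "" and (end == "" or end > covered_to):
            covered_to = end
    if covered_to != "":
        problems.append(f"hole after {covered_to!r}")
    return problems

healthy = [("", "A"), ("A", "B"), ("B", "C"), ("C", "D"),
           ("D", "E"), ("E", "F"), ("F", "G"), ("G", "")]
with_hole = [region for region in healthy if region != ("C", "D")]
with_overlap = healthy + [("B", "D")]

print(check_table_integrity(healthy))
print(check_table_integrity(with_hole))
print(check_table_integrity(with_overlap))
```

The healthy table yields no problems, removing [C, D) yields one hole, and adding [B, D) on top of [B, C) and [C, D) yields overlaps.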
Region Consistency Invariants

[Venn diagram of three circles: info:regioninfo in META, assigned on a region server, and .regioninfo in HDFS. The intersection of all three is labeled “Region Consistent”; a separate area is labeled “Orphans”.]
Repairing internal problems with hbck
• Newer and upcoming versions of HBase include an hbck that can fix internal problems as well as detect them.
   • 0.90.7
   • 0.92.2
   • 0.94.0
   • CDH3u4+
   • CDH4b2+

Looks like you’ve broken an invariant
Bad region assignment

 hbck -fix (0.90.x)
 hbck -fixAssignments (0.90.7+, 0.92.2+, 0.94+)
Region not in META

 hbck -fixAssignments -fixMeta
Regioninfo not in HDFS

 hbck -fixAssignments -fixMeta
Table Regions must not have holes

    [‘’, A)
    [A, B)
    [B, C)
       ?
    [D, E)
    [E, F)
    [F, G)
    [G, ‘’)

• Where do I put row key “CRUD”?
• Where is region [C, D)?
• Repair:
   • Find the orphan and adopt it.
   • Fabricate a new region to fill the hole

 # NOTE! HBase should be idle (no get/put/split/compacts)
 hbck -fixHdfsHoles -fixHdfsOrphans -fixAssignments -fixMeta
Table Regions must not overlap

    [‘’, A)
    [A, B)
    [B, D)     [B, C)
               [C, D)
    [D, E)
    [E, F)
    [F, G)
    [G, ‘’)

• Hm... Which region should “BAD” go to?
• Is it [B, D) or is it [B, C)?
• Likely due to a bad split.
• Repair:
   • Merge regions or,
   • Sideline and bulk load

 # NOTE! HBase should be idle (no get/put/split/compacts)
 hbck -fixHdfsOverlaps -fixAssignments -fixMeta
Consistency problem summary

 hbck -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps
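The repair slides pair each kind of problem with a set of hbck options, which can be captured in a small lookup. This is only a sketch: the problem labels (such as `region_not_in_meta`) are hypothetical names chosen here, the options are the ones listed on the slides for 0.90.7+, 0.92.2+ and 0.94+, and the code builds the command without running it.

```python
# Problem label (hypothetical) -> hbck options listed on the slides.
HBCK_FIXES = {
    "bad_assignment": ["-fixAssignments"],
    "region_not_in_meta": ["-fixAssignments", "-fixMeta"],
    "regioninfo_not_in_hdfs": ["-fixAssignments", "-fixMeta"],
    "hole": ["-fixHdfsHoles", "-fixHdfsOrphans", "-fixAssignments", "-fixMeta"],
    "overlap": ["-fixHdfsOverlaps", "-fixAssignments", "-fixMeta"],
}

def hbck_command(problems):
    """Build an hbck command line covering every listed problem, without duplicates."""
    options = []
    for problem in problems:
        for option in HBCK_FIXES[problem]:
            if option not in options:
                options.append(option)
    return ["hbase", "hbck"] + options

print(" ".join(hbck_command(["bad_assignment"])))
print(" ".join(hbck_command(["hole", "overlap"])))
```

As the slides note, the hole and overlap repairs should only be run while HBase is idle.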
Investigating further

• HFile – examine contents of HFiles
• HLog – examine contents of HLog file
• OfflineMetaRepair – Rebuild meta table from file system.
• Also, some scripts for manual repairs:
  https://github.com/jmhsieh/hbase-repair-scripts
Questions?



           HBaseCon 2012. 5/22/12                  52
Copyright 2012 Cloudera Inc. All rights reserved

4 supporting h base jeff, jon, kathleen - cloudera - final 2

  • 1. Supporting HBase: How to Stabilize, Diagnose and Repair Jeff Bean, Jonathan Hsieh, Kathleen Ting {jwfbean,jon,kathleen}@cloudera.com 5/22/12 HBaseCon 2012. 5/22/12 Copyright 2012 Cloudera Inc. All rights reserved
  • 2. Who Are We? • Jeff Bean • Designated Support Engineer, Cloudera • Education Program Lead, Cloudera • Kathleen Ting • Support Manager, Cloudera • ZooKeeper Subject Matter Expert • Jonathan Hsieh • Software Engineer, Cloudera • Apache HBase Committer and PMC member HBaseCon 2012. 5/22/12 2 Copyright 2012 Cloudera Inc. All rights reserved
  • 3. Outline • Preventative HBase Medicine: • Tips for a healthy HBase • The HBase Triage: • Fixes for acute HBase pains • The HBase Surgery: • Repairing a Corrupted HBase HBaseCon 2012. 5/22/12 3 Copyright 2012 Cloudera Inc. All rights reserved
  • 4. Outline • Preventative HBase Medicine: • Tips for a healthy HBase • The HBase Triage: • Fixes for acute HBase pains • The HBase Surgery: • Repairing a Corrupted HBase HBaseCon 2012. 5/22/12 4 Copyright 2012 Cloudera Inc. All rights reserved
  • 5. “Monitor your system, exercise your workload, and eat your vegetables.” HBaseCon 2012. 5/22/12 5 Copyright 2012 Cloudera Inc. All rights reserved
  • 6. HBaseCon 2012. 5/22/12 6 Copyright 2012 Cloudera Inc. All rights reserved
  • 7. HBaseCon 2012. 5/22/12 7 Copyright 2012 Cloudera Inc. All rights reserved
  • 8. HBase Cross-Section App MR HBase ZooKeeper HDFS JVM / Linux Disk / Network HBaseCon 2012. 5/22/12 8 Copyright 2012 Cloudera Inc. All rights reserved
  • 9. Doctor’s Advice: “A ounce of prevention worth a pound of cure.” • Understand your workload and test for it • Size your cluster properly (see Cluster Sizer) • Monitor, alert, and manage your cluster with Ganglia, Nagios, and/or Cloudera Manager • Don’t be Dr. House! HBaseCon 2012. 5/22/12 9 Copyright 2012 Cloudera Inc. All rights reserved
  • 10. A Case Study Copyright 2012 Cloudera Inc. All rights reserved 10
  • 11. Symptom: Long Running MapReduce job with blacklisted TaskTrackers App MR TaskTracker No. of Failures NodeX 4 HBase NodeY 3 NodeQ 7 ZK HDFS NodeB 10 NodeP 8 NodeV 6 JVM / Linux Disk / Network HBaseCon 2012. 5/22/12 11 Copyright 2012 Cloudera Inc. All rights reserved
  • 12. Symptom: Node B Task Logs $ find . | xargs grep "giving up“ App MR ./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:34,248 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up. ./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:37,328 INFO org.apache.hadoop.ipc.HbaseRPC: Server at HBase NodeA:60020 could not be reached after 1 tries, giving up. ./attempt_201107261334_0221_m_000962_1/syslog:2011-08-02 11:09:40,465 INFO org.apache.hadoop.ipc.HbaseRPC: Server at NodeA:60020 could not be reached after 1 tries, giving up. ZK HDFS JVM / Linux Disk / Network HBaseCon 2012. 5/22/12 12 Copyright 2012 Cloudera Inc. All rights reserved
  • 13. Symptom: RegionServer logs of Node A: 2011-08-02 11:04:20,324 FATAL MR App org.apache.hadoop.hbase.regionserver.HRegionServer : ABORTING region server serverName=NodeA,60020,1312228900706, load=(requests=10847, regions=342, usedHeap=8193, HBase maxHeap=15350): regions erver:60020-0x4316487a73e1626 regionserver:60020- 0x4316487a73e1626 received expired from ZooKeeper, aborting ZK HDFS org.apache.zookeeper.KeeperException$SessionExpiredEx ception: KeeperErrorCode = Session expired JVM / Linux Disk / Network HBaseCon 2012. 5/22/12 13 Copyright 2012 Cloudera Inc. All rights reserved
  • 14. Cascading failure! Some other node says ouch… 2011-08-01 12:55:39,356 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread- 15,5,main] App MR 2011-08-01 12:55:39,629 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook 2011-08-01 12:55:39,629 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread. 2011-08-01 12:55:39,695 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logsNodeA,60020,1311651881177NodeA%3A60020.1311656326143 : java.io.IOException: Error Recovery for block blk_1102151039331207284_16350929 HBase failed because recovery from primary datanode NodeA:50010 failed 6 times. Pipeline was NodeA:50010. Aborting... java.io.IOException: Error Recovery for block blk_1102151039331207284_16350929 failed because recovery from primary datanode NodeA:50010 failed 6 times. Pipeline was NodeA:50010. Aborting... at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSCli ZK HDFS ent.java:2841) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2 305) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.j ava:2477) JVM / Linux 2011-08-01 12:55:39,842 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished. Disk / Network HBaseCon 2012. 5/22/12 14 Copyright 2012 Cloudera Inc. All rights reserved
  • 15. Symptom: Ganglia Memory Graph on Node A… App MR HBase ZK HDFS JVM / Linux Disk / Network HBaseCon 2012. 5/22/12 15 Copyright 2012 Cloudera Inc. All rights reserved
  • 16. Symptom: Ganglia swap_free on Node A… App MR HBase ZK HDFS JVM / Linux Disk / Network HBaseCon 2012. 5/22/12 16 Copyright 2012 Cloudera Inc. All rights reserved
  • 17. A Case study: Radiant Pain. “I was having back pains, and it turned out to be my heart!” Node A under load (too many MR slots; MR slots too large; too many non-HBase small files (HDFS-2379)) → Node A swaps (“arbitrary” processes pause or become unresponsive) → Node B can’t connect to node A (MapReduce tasks fail; HDFS datanode operations time out; HBase client operations fail) → Masters take action (JobTracker blacklists TT on node B; jobs fail or run slow; NameNode re-replicates blocks from node A).
  • 18. Event Trail and Evidence Trail. Event trail: Node A condition (load) → Node A event (swap) → Node B symptom (connect) → Master action (blacklist). Evidence trail: Node A monitoring; transient swap (not logged!); Node B logs; Master logs and UIs.
  • 19. DOs and DON’Ts for keeping HBase Healthy. DOs: • Monitor and Alert • Optimize network • Know your logs. DON’Ts: • Swap • Oversubscribe MR • Share the network
  • 20. Outline • Preventative HBase Medicine: • Tips for a healthy HBase • The HBase Triage: • Fixes for acute HBase pains • The HBase Surgery: • Repairing a Corrupted HBase
  • 21. “Cloudera 911 here, how can we help?”
  • 22. HBase Support Tickets (pie chart): HBase, ZK, MR, HDFS Misconfig 44%; Repair Needed 28%; Fix HW/NW 16%; Patch Required 12%
  • 23. Understanding the logs helps us diagnose issues • Related events logged by different processes in different places • Log messages point at each other • HDFS accesses by RS logged by NN and DN • HBase accesses by MR logged by JT, RS, NN, ZK • ZK logs indicate HBase health
  • 24. The HBase Triage: Fixes for acute HBase pains • Severe Pain • Complete Unconsciousness
  • 25. The HBase Triage: Fixes for acute HBase pains • Severe Pain • Complete Unconsciousness
  • 26. Connection Reset WARN - Session <id> for server <server id>, unexpected error, closing socket connection and attempting reconnect java.io.IOException: Connection reset by peer What causes this? • Running out of ZK connections How can it be resolved? • Manually close connections • Fixed in HBASE-5466 and HBASE-4773
  • 27. Running out of DN Threads & File Descriptors INFO hdfs.DFSClient: Could not obtain block <blk id> from any node: java.io.IOException: No live nodes contain current block. ERROR java.io.IOException: Too many open files What causes this? • HBase likes to keep data files open How can it be resolved? • Increase dfs.datanode.max.xcievers to 4096 • Raise the open-file limit in /etc/security/limits.conf: • hbase - nofile 32768
  • 28. “Long Garbage Collecting Pause” WARN org.apache.hadoop.hbase.util.Sleeper: We slept 19118ms instead of 1000ms, this is likely due to a long garbage collecting pause and it's usually bad How can it be resolved? • zoo.cfg: maxSessionTimeout=180000; hbase-site.xml: zookeeper.session.timeout=180000 • Oversubscribed if MR & HBase are co-located
  • 29. Heap Allocation Per Node: Total RAM ≥ (Map + Red) x Child Heap + DN heap + TT heap + RS heap + OS (20% of RAM)
  • 30. The HBase Triage: Fixes for acute HBase pains • Severe Pain • Complete Unconsciousness
  • 31. ZK can’t start & HBase hangs INFO org.apache.hadoop.hdfs.DFSClient: Could not complete file <name> retrying… What causes this? • A high dfs.replication.min makes HBase hang: a file can’t be closed until all the required replicas have been created How can it be resolved? • Remove dfs.replication.min • Temporarily increase dfs.balance.bandwidthPerSec • Fixed in HDFS-2936
  • 32. Unable to Load Database FATAL org.apache.zookeeper.server.quorum.QuorumPeer: Unable to load database on disk What causes this? • ZK data directories filled up How can it be resolved? • Wipe out /var/zookeeper/version-2 • Run zkCleanup.sh script via cron
  • 33. Downed HBase Master and RegionServers WARN org.apache.zookeeper.server.quorum.Learner: Exception when following the leader java.net.SocketTimeoutException: Read timed out What causes this? • Session Timeout + Session Expiration = network problem How can it be resolved? • Monitor network (e.g. ifconfig) • Run ≥ 3 ZK servers (majority rules)
  • 34. The HBase Triage: Fixes for acute HBase pains • Severe Pain • Complete Unconsciousness
  • 35. Outline • Preventative HBase Medicine: • Tips for a healthy HBase • The HBase Triage: • Fixes for acute HBase pains • The HBase Surgery: • Repairing a Corrupted HBase
  • 36. “To the operating room, please” • HBase refuses to start • HBase’s hbck reports inconsistencies
  • 37. HBase Support Tickets (pie chart): HBase, ZK, MR, HDFS Misconfig 44%; Repair Needed 28%; Fix HW/NW 16%; Patch Required 12%
  • 38. HBase Support Tickets (pie chart): HBase, ZK, MR, HDFS Misconfig 44%; Repair Needed 28%; Fix HW/NW 16%; Patch Required 12%
  • 39. Detecting internal problems with hbck • HBase since 0.90 has included a tool for scanning an HBase instance’s internals to find corruptions: hbase hbck; hbase hbck -details
  • 40. Tables are sharded into regions. (Figure: rows 0000000000 through 7777777777 split across regions [‘’, A), [A, B) and [B, ‘’).) Invariants: maintain table integrity and region consistency!
  • 41. Table Integrity Invariants • Every key shall get assigned to a single region. • Table Regions shall: • Cover the entire range of possible keys, • from the absolute start (‘’) • to the absolute end (unfortunately, also ‘’). (Figure: regions [‘’, A), [A, B), [B, C), [C, D), [D, E), [E, F), [F, G), [G, ‘’).)
  • 42. Region Consistency Invariants. (Venn diagram of three sets: info:regioninfo in META; assigned on region server; .regioninfo in HDFS. Their intersection is labeled “Region Consistent”; the diagram also labels “Orphans”.)
  • 43. Repairing internal problems with hbck. “Looks like you’ve broken an invariant.” • Newer and upcoming versions of HBase include an hbck that can fix internal problems as well as detect them. • 0.90.7 • 0.92.2 • 0.94.0 • CDH3u4+ • CDH4b2+
  • 44. Bad region assignment. (Region consistency Venn diagram.) Repair: hbck -fix (0.90.x); hbck -fixAssignments (0.90.7+, 0.92.2+, 0.94+)
  • 45. Region not in META. (Region consistency Venn diagram.) Repair: hbck -fixAssignments -fixMeta
  • 46. Regioninfo not in HDFS. (Region consistency Venn diagram.) Repair: hbck -fixAssignments -fixMeta
  • 47. Table Regions must not have holes • Where do I put row key “CRUD”? • Where is region [C, D)? • Repair: • Find the orphan and adopt it. • Fabricate a new region to fill the hole. (Figure: regions [‘’, A), [A, B), [B, C), ?, [D, E), [E, F), [F, G), [G, ‘’).) # NOTE! HBase should be idle (no get/put/split/compacts) hbck -fixHdfsHoles -fixHdfsOrphans -fixAssignments -fixMeta
  • 48. Table Regions must not overlap • Hm... Which region should “BAD” go to? • Is it [B, D) or is it [B, C)? • Likely due to a bad split. • Repair: • Merge regions or, • Sideline and bulk load. (Figure: regions [‘’, A), [A, B), then [B, D) overlapping [B, C) and [C, D), then [D, E), [E, F), [F, G), [G, ‘’).) # NOTE! HBase should be idle (no get/put/split/compacts) hbck -fixHdfsOverlaps -fixAssignments -fixMeta
  • 49. Consistency problem summary. (Region consistency Venn diagram.) hbck -fixAssignments -fixMeta -fixHdfsHoles -fixHdfsOrphans -fixHdfsOverlaps
  • 50. Investigating further • HFile – examine contents of HFiles • HLog – examine contents of HLog file • OfflineMetaRepair – Rebuild meta table from file system. • Also, some scripts for manual repairs: https://github.com/jmhsieh/hbase-repair-scripts
  • 51. Outline • Preventative HBase Medicine: • Tips for a healthy HBase • The HBase Triage: • Fixes for acute HBase pains • The HBase Surgery: • Repairing a Corrupted HBase
  • 52. Questions?
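
The table integrity invariants from slides 41, 47 and 48 (every key belongs to exactly one region, with no holes and no overlaps) can be sketched as a small Python check. This is only an illustration of the invariant, not how hbck is implemented; the function name `check_table_integrity` and its helpers are hypothetical, and keys are compared as Python strings, which simplifies HBase's byte-wise row key ordering.

```python
END = (1, "")  # sorts after every real key; stands for the absolute end


def _start(key):
    return (0, key)  # '' as a start key is the absolute start


def _end(key):
    return END if key == "" else (0, key)  # '' as an end key is the absolute end


def check_table_integrity(regions):
    """Return (holes, overlaps) for a list of (start_key, end_key) regions.

    Each region covers [start_key, end_key). This is an illustrative check,
    not the algorithm hbck uses.
    """
    holes, overlaps = [], []
    ordered = sorted(regions, key=lambda r: (_start(r[0]), _end(r[1])))
    covered = _start("")  # the keyspace below this point is already covered
    widest = None         # the region that currently reaches furthest
    for region in ordered:
        start, end = _start(region[0]), _end(region[1])
        if start > covered:
            holes.append((covered[1], region[0]))  # no region owns this range
        elif start < covered:
            overlaps.append((widest, region))      # two regions own some keys
        if end > covered:
            covered, widest = end, region
    if covered != END:
        holes.append((covered[1], ""))             # tail of the keyspace is uncovered
    return holes, overlaps


if __name__ == "__main__":
    # Slide 47: region [C, D) is missing.
    print(check_table_integrity(
        [("", "A"), ("A", "B"), ("B", "C"), ("D", "E"), ("E", "")]))
    # Slide 48: [B, D) overlaps [B, C) and [C, D).
    print(check_table_integrity(
        [("", "A"), ("A", "B"), ("B", "D"), ("B", "C"), ("C", "D"), ("D", "")]))
```

A hole is reported as the uncovered key range, and an overlap as the pair of regions that both claim some keys; deciding whether to adopt an orphan, fabricate a region, merge, or sideline is left to the operator and to hbck.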

Editor's Notes

  1. We get a lot of complaints about HBase. Here are some classic objections from real conversations: “I thought this was a scalable system!” “A process should never crash...” “Randomly!” “When it’s running on a node that swaps!” “(but only sometimes)” “When it can’t connect to another node!” “When it’s sharing a node with misbehaving processes” “When it’s colocated with MapReduce” EVER! “I thought Hadoop was meant to tolerate failures!” “HBASE CRASHES TOO MUCH!” “HBASE IS UNSTABLE!” I’m a big fan of Bloom County, and since Cloudera is a startup that sells its services and subscriptions to more mature companies, I often feel like Binkley trying to get his father to stop smoking. When people stand up HBase they often take the default configurations, or do what feels good, like allocating lots of MR slots per server running HBase, causing HBase to get sick. It’s important to sit down and discuss the recommendations.
  2. Because we support a lot of HBase clusters that get really unhealthy, we have the luxury of having taken some full color photos of unhealthy HBase clusters. Common complaints, from real support tickets: “Why Did HBase fall down and go boom?” “en masse failure of HBase Region Servers” “Region Server Massive Failure” “Problems with Zookeeper and/or regionserver” “Regionservers becoming unavailable” “Hbase region servers crashing” “Hbase fail-over/reliability issues” “Yesterday Hbase daemons started crashing” “hbase RS keep crashing” “Hbase regionservers shut down running large mapreduce jobs” “Hbase region server shut down one by one” Etc.
  3. HBase has two requirements: strong consistency and low latency access. HBase is a distributed, scalable, fault-tolerant system with dependencies on two other distributed, scalable, fault-tolerant systems: HDFS and ZK. It’s the complexity of one distributed system depending on two other distributed systems, in the light of strong consistency and low-latency access, that leads to the perceived instability of HBase. When a connection to or integration with either HDFS or ZK becomes unreliable, HBase does not trust that it can meet its requirements and crashes rather than serve old data or serve with bad latency. You also have a stack of dependencies: JVM and OS depend on the network/disk. HDFS and ZK depend on JVM and the OS. HBase depends on ZK and HDFS. MR and the HBase application require HBase being available. And this whole thing is a distributed system, so the dependencies have to be met between and across nodes as well.
  4. Is this the right position?
  5. First symptom: MR: failed tasks; blacklisted tasktrackers; long execution time; ultimate failure of the job. Due to HBase: dead region servers! Due to HDFS: lease expiry. Due to ZooKeeper: connection timeouts.
  6. First symptom: MR: failed tasks; blacklisted tasktrackers; long execution time; ultimate failure of the job. Due to HBase: dead region servers! Due to HDFS: lease expiry. Due to ZooKeeper: connection timeouts.
  7. When the preventative measures mentioned by Jeff are not taken, your cluster might land up in the ER with a critical production failure.
  8. This pie chart comes from analyzing critical production HBase tickets over the past 6 months: misconfig 44%, patch 12%, hw/nw 16%, repair 28%. In the misconfig cases, correcting a misconfiguration was all that it took to bring HBase back up again. As you can see, misconfigurations and bugs break the most HBase clusters. Fixing bugs is up to the community. Fixing misconfigurations is up to you and is the focus of the next segment. Because they’re hard to diagnose, misconfigurations are not what you want to spend your time on. If your cluster is broken, it’s probably a misconfiguration. This is a hard problem because the error messages are not tightly tied to the root cause.
  9. Unlike a patient, a cluster can’t tell us what’s wrong. Therefore, when support tickets come in, the first things we generally ask for are logs and configurations. These are what we look at for evidence and symptoms. Given that HBase depends on both ZK and HDFS, a ZK or HDFS misconfig can cause erratic behavior in HBase. Thus we need to look at those logs too.
  10. Just as an ER triage doctor will treat patients based on their symptoms’ severity, we do the same with clusters.
  11. HConnectionManager.deleteConnection(config,true);
  12. A datanode has an upper bound on the number of files that it will serve at any one time, called xcievers. By increasing dfs.datanode.max.xcievers, you are increasing the max limit of transfer threads allowed at a DN. HBase, in particular, needs the extra transfer threads since its use of HDFS is very different from how it’s used by MR. In HBase, data files are opened on cluster startup and kept open so that we avoid paying the file open costs on each access. This can lead to HDFS running out of file descriptors and datanode threads. Although there will be exceptions in logs, often you’ll first see erratic behavior in HBase.
  13. This is a deceptive error message since in some cases it’s not caused by a long GC pause. In the RS log: INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, sessionTimeout=180000 Seeing a different value in the ZK log: INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session <id>, timeout of 40000ms exceeded This needs to be set to the same value on the ZK servers as what is used by the HBase clients. Then restart ZK and HBase.
  14. @tlipcon: if sum(max heap size) > physical RAM - 3GB, go directly to jail. do not pass go. do not collect $200. TT counted twice because it forks. JT not accounted for because it runs on a separate node. Hadoop streaming jobs don’t pay attention to heap size specified in mapred.child.java.opts because they spawn processes. That’s where ulimit takes over. OS needs 20% of RAM because of the optimizations the filesystem cache provides for HBase. MR jobs have a way of completely clearing out FS cache which can impact HBase performance.
  15. It is not good to raise dfs.replication.min from its default of 1, because the moment you run into a DFS replication issue or a single failure in the pipeline, your file.close() is going to hang to guarantee minimum replication. As a result, this form of hard guarantee can bring down HBase clusters during high xceiver load on DNs, or disk fill-ups on many of them.
  16. Given the many cluster resources leveraged by distributed ZooKeeper, it’s frequently the first to notice issues affecting cluster health, which explains its moniker, the canary in the Hadoop coal mine.
  17. The more members an ensemble has, the more tolerant the ensemble is of host failures. ZK achieves high availability through replication, and can provide a service as long as a majority of the machines in the ensemble are up. This is why there is usually an odd number of machines in an ensemble.
  18. Because they’re hard to diagnose, misconfigurations are not what you want to spend your time on. If your cluster is broken, it’s probably a misconfiguration. This is a hard problem because the error messages are not tightly tied, as you saw, to the root cause. Keeping in mind these six common misconfigurations, it’ll be less painful when you configure your HBase cluster. But if the pain persists, you know who to call.
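
Two of the misconfiguration checks described in notes 13 and 14 can be sketched in a few lines of Python. The first models how a ZooKeeper server clamps a client's requested session timeout between minSessionTimeout and maxSessionTimeout, which default to 2x and 20x tickTime; with an assumed tickTime of 2000 ms this gives the 40000 ms seen in the ZK log of note 13. The second applies the heap formula from slide 29. The function names, parameter names and example node sizes are hypothetical and chosen only for illustration.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_timeout_ms=None,
                               max_session_timeout_ms=None):
    """Timeout (ms) a ZooKeeper server grants for a requested session timeout.

    tick_time_ms=2000 is an assumed tickTime; when the bounds are not set in
    zoo.cfg they default to 2x and 20x tickTime.
    """
    lower = 2 * tick_time_ms if min_session_timeout_ms is None else min_session_timeout_ms
    upper = 20 * tick_time_ms if max_session_timeout_ms is None else max_session_timeout_ms
    return max(lower, min(requested_ms, upper))


def heap_headroom_mb(total_ram_mb, map_slots, reduce_slots, child_heap_mb,
                     dn_heap_mb, tt_heap_mb, rs_heap_mb, os_percent=20):
    """RAM (MB) left after the slide-29 formula; negative means oversubscribed."""
    os_reserve_mb = total_ram_mb * os_percent // 100
    committed_mb = ((map_slots + reduce_slots) * child_heap_mb
                    + dn_heap_mb + tt_heap_mb + rs_heap_mb + os_reserve_mb)
    return total_ram_mb - committed_mb


if __name__ == "__main__":
    # HBase asks for 180000 ms, but an untuned server caps it at 20 * tickTime.
    print(negotiated_session_timeout(180000))
    # With maxSessionTimeout=180000 in zoo.cfg the request is honored.
    print(negotiated_session_timeout(180000, max_session_timeout_ms=180000))
    # Hypothetical 24000 MB node: 6 map + 2 reduce slots fit, 12 + 4 do not.
    print(heap_headroom_mb(24000, 6, 2, 1000, 1000, 1000, 8000))
    print(heap_headroom_mb(24000, 12, 4, 1000, 1000, 1000, 8000))
```

Checks like these are cheap to run against a proposed configuration before it reaches production, which is where the notes suggest most of the pain comes from.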