The document discusses HBCK2, a tool for fixing issues in HBase 2. Some key points:
1. HBCK2 is simpler than HBCK1, with fewer fix commands and no diagnosis commands. It requires a deeper understanding of HBase internals.
2. HBCK2 commands are master-oriented and fix issues one at a time. Common issues include regions not online, stuck procedures, and tables in the wrong state.
3. Recipes are provided for specific issues, such as meta regions not coming online, regions stuck in transition and stuck procedures, using HBCK2 commands like assigns and bypass.
4. HBCK2 is still a work in progress, but contributions are welcome.
hbaseconasia2019 HBCK2: Concepts, trends, and recipes for fixing issues in HBase 2
2. HBCK2: Concepts, trends and recipes
for fixing issues within HBase 2
Wellington Chevreuil
HBase Committer, Cloudera HBase SW Engineer
3. HBCK (1) - A little bit of history
• Main tool for dealing with general inconsistencies in hbase-1.x
• The Swiss Army knife for operators
• Packaged together with the main hbase project
• Provides both diagnosing and fixing commands
• Some reports may be misleading, e.g., "holes in the region chain"
• Some options can cause damage if not well understood, e.g., "-sidelineBigOverlaps", "-removeParents"
• Commands often work independently of the Master
• Can introduce conflicts in the meta information maintained by the Master
• Documentation/help guide lacks implementation details
5. HBCK2 in a nutshell
• Simpler tool
• Fewer fix commands
• No diagnosis commands
• Requires deeper knowledge of HBase internal workings from operators
• Shipped independently of hbase
• Packaged with the hbase-operator-tools project
• https://github.com/apache/hbase-operator-tools
• Can evolve at its own pace
• New versions can be run without a whole hbase upgrade
• Master oriented (more later)
• More detailed documentation about each command
• Still a WIP
• At the time of this presentation, there is still no official HBCK2 release
6. HBCK2 Concepts
• AMv2 compliant
• HBCK1 does not work with the HBase 2 AssignmentManager re-implementation
• Thinner, but more interactive commands
• No equivalent of the hbck1 -fix command
• Operators are required to fix one issue at a time
• Master oriented
• Master must be online
• Command implementations should use the Master HbckService as much as possible
• However, new commands may initially require a client-side implementation, then get ported to the Master's HbckService facade
• Fix only; other tools are required for diagnosing issues
• Available only for hbase 2.0.3 onwards and 2.1.1 onwards
8. HBCK2 Usage trends
• Master not completing initialisation
• Meta/Namespace table "NOT online" issues
• Table RIT issues
• Procedures stuck
• Table in wrong state
• Missing regions in META
• User-induced, via the incompatible OfflineMetaRepair tool
9. HBCK2 for Operators: How do I get and run it?
• Not released so far; requires a local build
• Requirements
• JDK 1.8 or higher
• Git
• Maven
• Check out the related Apache GitHub repository:
• $ git clone https://github.com/apache/hbase-operator-tools.git
• Build HBCK2 against the desired hbase version:
• $ mvn -Dhbase.version=2.1.5 clean install
• The above command produces the HBCK2 jar file under ./hbase-hbck2/target/, named hbase-hbck2-1.0.0-SNAPSHOT.jar (assuming the current version is 1.0.0-SNAPSHOT)
• Upload the generated jar to the given hbase cluster and run it as below:
• $ hbase hbck -j ../hbase-hbck2-1.0.0-SNAPSHOT.jar
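Every recipe below repeats the same "hbase hbck -j <jar>" prefix, so a small shell wrapper can save typing and print the exact command before anything runs. This is a minimal sketch of a hypothetical helper, not part of HBCK2: the function name hbck2 and the HBCK2_JAR/HBCK2_DRY_RUN variables are illustrative assumptions. The default jar path is the one used throughout this presentation.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "hbase hbck -j <jar> <command> [args...]".
# HBCK2_JAR and HBCK2_DRY_RUN are illustrative names, not HBCK2 options.
HBCK2_JAR="${HBCK2_JAR:-../hbase-hbck2-1.0.0-SNAPSHOT.jar}"

hbck2() {
  if [ "$#" -eq 0 ]; then
    echo "usage: hbck2 <command> [args...]" >&2
    return 2
  fi
  if [ -n "${HBCK2_DRY_RUN:-}" ]; then
    # Dry run: only show the command that would be executed.
    echo "hbase hbck -j $HBCK2_JAR $*"
    return 0
  fi
  if [ ! -f "$HBCK2_JAR" ]; then
    echo "HBCK2 jar not found: $HBCK2_JAR" >&2
    return 1
  fi
  # Pass arguments through unchanged, preserving quoting.
  hbase hbck -j "$HBCK2_JAR" "$@"
}
```

For example, HBCK2_DRY_RUN=1 hbck2 assigns 1588230740 only prints the command. The same call without HBCK2_DRY_RUN actually runs it. Keeping a dry-run path is useful because HBCK2 commands act on a live Master.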
10. HBCK2 for Operators: Recipes
• Meta/Namespace table regions "NOT online"
• Due to corruption or manual deletion of /hbase/MasterProcWALs files
• Meta may be missing information about RS assignment
• Master logs show regions assigned to an old RS start code
• Run the HBCK2 assigns command for the META region 1588230740:
• $ hbase hbck -j ../hbase-hbck2-1.0.0-SNAPSHOT.jar assigns 1588230740
• A similar issue may affect namespace and user table regions
• Affected region names are mentioned in log messages similar to the following
WARN org.apache.hadoop.hbase.master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPENING,
ts=1550754721289, server=regionserver01.example.com,16020,1550676598448}; ServerCrashProcedures=true. Master startup cannot
progress, in holding-pattern until region onlined.
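When several regions are reported as NOT online, you can collect their encoded names from the Master log instead of copying them by hand. This is a minimal sketch. It assumes each warning sits on a single log line shaped like the HMaster sample above, with the encoded name right after "state={". The function names are hypothetical. The second function only prints the assigns command, so the list can be reviewed before anything is run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assume one HMaster warning per log line, shaped like
#   ... is NOT online; state={<encoded name> state=OPENING, ...

# Reads Master log text on stdin; prints each NOT-online encoded region name once.
not_online_regions() {
  sed -n 's/.* is NOT online; state={\([0-9a-f][0-9a-f]*\) state=.*/\1/p' | sort -u
}

# Prints (does not run) the HBCK2 assigns command for those regions.
# $1 is the path to the HBCK2 jar; returns 1 if no region was found.
assigns_cmd_for_not_online() {
  local jar="$1" regions
  regions=$(not_online_regions | tr '\n' ' ')
  regions=${regions% }   # drop the trailing space left by tr
  [ -n "$regions" ] || return 1
  echo "hbase hbck -j $jar assigns $regions"
}
```

For the sample warning above, assigns_cmd_for_not_online ../hbase-hbck2-1.0.0-SNAPSHOT.jar < master.log (a hypothetical log path) prints the same assigns 1588230740 command shown in the recipe.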
11. HBCK2 for Operators: Recipes
• Table RIT issues
• Usually due to several RS crashes or slowness while regions are transitioning
• Run the HBCK2 assigns command for the given encoded region name 11bf6b18ddacdd864728e6cf1199b2a7:
• $ hbase hbck -j ../hbase-hbck2-1.0.0-SNAPSHOT.jar assigns 11bf6b18ddacdd864728e6cf1199b2a7
WARN org.apache.hadoop.hbase.master.assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING,
location=regionserver01.example.com,16020,1542314816394, table=hbase:acl, region=11bf6b18ddacdd864728e6cf1199b2a7
...
WARN org.apache.hadoop.hbase.ipc.RpcServer: Dropping timed out call: callId: 702 service: ClientService methodName: Mutate size: 272
connection: 1.1.1.1:56492 deadline: 1542316740911 param: region= hbase:meta,,1,
row=hbase:acl,,1404406671604.11bf6b18ddacdd864728e6cf1199b2a7. connection: 1.1.1.1:56492
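For table RITs, the AssignmentManager STUCK warnings carry both the table and the encoded region name, so you can filter the candidates for assigns per table. This is a sketch under the assumption that each warning is on one log line shaped like the sample above, with "table=..., region=..." at the end. stuck_rit_regions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes one AssignmentManager warning per log line, shaped like
#   ... STUCK Region-In-Transition rit=OPENING, location=..., table=<table>, region=<encoded name>
# Reads log text on stdin; prints each stuck encoded region name once,
# optionally only for the table given as $1.
stuck_rit_regions() {
  local table="${1:-}"
  sed -n 's/.*STUCK Region-In-Transition .* table=\([^,]*\), region=\([0-9a-f][0-9a-f]*\).*/\1 \2/p' |
    awk -v t="$table" 't == "" || $1 == t { print $2 }' |
    sort -u
}
```

The HBCK2 assigns command accepts more than one encoded region name, so the output can be reviewed and then passed in one go. For example: hbase hbck -j ../hbase-hbck2-1.0.0-SNAPSHOT.jar assigns $(stuck_rit_regions hbase:acl < master.log), where master.log is a hypothetical path. Lines such as the RpcServer timeout above do not contain the STUCK marker and are ignored.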
12. HBCK2 for Operators: Recipes
• Procedures stuck
• While troubleshooting causes of RITs, check for procedures attempting to transition region states:
• $ echo "list_procedures" | hbase shell
• Output of list_procedures shows procedures in WAITING_TIMEOUT state and/or procedures running for days
• Other procedures fail to acquire a lock owned by one of the stuck procedures
• Run the HBCK2 bypass command to get rid of stuck procedures:
• $ hbase hbck -j ../hbase-hbck2-1.0.0-SNAPSHOT.jar bypass 6 7
PID Name State Submitted Last_Update Parameters
6 org.apache.hadoop.hbase.master.assignment.UnassignProcedure WAITING_TIMEOUT 2019-03-29 11:15:06 2019-04-08 06:33:35 ...
7 org.apache.hadoop.hbase.master.procedure.DeleteTableProcedure RUNNABLE 2019-03-29 11:24:39 2019-03-29 11:24:39 ...
ERROR: org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: f7910bfc9c9... owned by pid=6, CANNOT run 'this' (pid=347).
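The two signals described in this recipe can be pulled out automatically. The first is procedures in WAITING_TIMEOUT state in the list_procedures output. The second is the pid named in "owned by pid=" lock errors. This is a sketch that assumes the whitespace-separated column layout shown above, with PID first and State third. The function names are hypothetical. The output is meant for review, not for feeding blindly into bypass.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for spotting bypass candidates.

# Reads "list_procedures" output on stdin (columns: PID Name State ...)
# and prints the PID of every procedure in WAITING_TIMEOUT state.
waiting_timeout_pids() {
  awk '$1 ~ /^[0-9]+$/ && $3 == "WAITING_TIMEOUT" { print $1 }'
}

# Reads error text on stdin and prints, once each, the PIDs named as lock
# owners in messages like "... owned by pid=6, CANNOT run 'this' (pid=347)."
lock_owner_pids() {
  sed -n 's/.* owned by pid=\([0-9][0-9]*\),.*/\1/p' | sort -u
}
```

For the sample output above, echo "list_procedures" | hbase shell | waiting_timeout_pids prints 6. lock_owner_pids on the ERROR message also prints 6, pointing at the UnassignProcedure as the lock holder.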
13. HBCK2 for Operators: Recipes
• Table in wrong state
• Can happen after hanging enable/disable table procedures, or related sub-procedures
• Bypassing procedures can lead to this as well
• Table stuck indefinitely in the temporary ENABLING/DISABLING states
• scan 'hbase:meta', {COLUMN => "table:state"}
• enable 'usertable'
• Run HBCK2 setTableState to manually bring the table state to one of the final ones, ENABLED/DISABLED:
• $ hbase hbck -j ../hbase-hbck2-1.0.0-SNAPSHOT.jar setTableState usertable DISABLED
usertable column=table:state, timestamp=1555406568751, value=\x08\x03
ERROR: Table tableName=usertable, state=ENABLING should be disabled!
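The table:state value printed by the scan is a small serialized protobuf message. \x08 is the tag of field 1, and the next byte is the TableState.State enum value: ENABLED=0, DISABLED=1, DISABLING=2, ENABLING=3. So \x08\x03 above means ENABLING, which matches the error. Below is a sketch that decodes these rows from the scan output. It assumes one row per line in the format shown above. table_states is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Reads hbase shell scan output on stdin and prints
# "<table> <STATE>" for each table:state row, assuming rows shaped like
#   usertable column=table:state, timestamp=..., value=\x08\x03
# and the TableState.State enum ENABLED=0, DISABLED=1, DISABLING=2, ENABLING=3.
table_states() {
  sed -n 's/^ *\([^ ][^ ]*\) *column=table:state, .*value=\\x08\\x0\([0-3]\).*/\1 \2/p' |
    while read -r table code; do
      case "$code" in
        0) echo "$table ENABLED" ;;
        1) echo "$table DISABLED" ;;
        2) echo "$table DISABLING" ;;
        3) echo "$table ENABLING" ;;
      esac
    done
}
```

Piping the scan from this slide through table_states and then grep -E ' (ENABLING|DISABLING)$' lists the tables stuck in temporary states, which are the candidates for setTableState.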
14. HBCK2 for Operators: Recipes
• Missing regions in META
• Operator-induced, when running the incompatible OfflineMetaRepair tool (HBASE-21665)
• Typically manifests as holes in the region chain or, when the namespace region is missing, as the Master failing initialisation
• scan 'hbase:meta', {COLUMN => "table:state", ROWPREFIXFILTER => 'hbase:namespace'}
• HBCK2 addMissingRegionsInMeta, still under development through HBASE-22567, can be used to re-add missing regions:
• $ hbase hbck -j ../hbase-hbck2-1.0.0-SNAPSHOT.jar addMissingRegionInMeta hbase:namespace
• Still WIP, so syntax might change.
• Check HBASE-22567 for the latest developments
ROW COLUMN+CELL
0 row(s)
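The symptom to look for here is the scan above returning 0 row(s). Below is a sketch that reads the row count from the summary line the hbase shell prints after a scan. It assumes that line starts with "<N> row(s)", as in the sample. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking an hbase shell scan result.

# Reads scan output on stdin; prints N from the last "<N> row(s)" summary line,
# or nothing if no summary line is present.
scan_row_count() {
  sed -n 's/^ *\([0-9][0-9]*\) row(s).*/\1/p' | tail -n 1
}

# Succeeds only if the scan explicitly reported 0 rows.
meta_scan_is_empty() {
  [ "$(scan_row_count)" = "0" ]
}
```

meta_scan_is_empty fails when no summary line is found. This means a failed shell invocation is not mistaken for an empty hbase:meta result. Piping the scan from this slide into it is a quick pre-check before reaching for the WIP addMissingRegionInMeta command.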
15. HBCK2 for Contributors
• Apache GitHub repository: https://github.com/apache/hbase-operator-tools
• HBCK2 is defined as the hbase-hbck2 sub-module of hbase-operator-tools
• HBASE-21745
• Umbrella JIRA for tracking potential new HBCK2 features
• Faced a new issue in HBase 2? Have a new idea for an HBCK2 command?
• Great! Contributions are welcome!
• Start a [DISCUSS] mail thread on dev@hbase.apache.org
• Post a comment on HBASE-21745 describing your idea