The document discusses backup strategies for HBase at IBM BigInsights and Salesforce. It describes:
1) IBM's backups use HBase snapshots and write-ahead logs to enable full and incremental backups. Their restore process involves replaying write-ahead logs.
2) Salesforce runs table-level MapReduce jobs to back up tables, storing backups as file chunks with manifests. They validate backups by restoring them and comparing against live data.
3) Both companies aim to optimize for fast restores, with IBM focusing on tools and Salesforce on validation after restoring from backups.
5. Backup Solution - IBM
• Customer Requirements
• Feature Overview
• Technical Design
• User Interface: CLI and Web UI
• Data Structures
5 HBase Backups - HBaseCon 2014
6. Customer Requirements
• Backup and Restore
– Critical requirements from enterprise customers
– General solution
– Easy-to-use user interfaces: CLI and Web UI
– Multiple file systems: HDFS and GPFS*
– Multiple MR frameworks: Hadoop and PSMR*
*GPFS: IBM General Parallel File System
*PSMR: Platform Symphony MapReduce
7. Feature Overview
• Full Backup based on HBase Snapshot
• Incremental Backup based on HBase transaction logs
• Table-level Incremental Backup
• Point-In-Time Restore
• On-the-fly and Off-line Convert from HLogs to HFiles
• Off-line Merge Backup Images
• Self-contained Backup Image with Manifest File
• Usability features:
– progress, status, and history reports
– purge old Backup Images
16. User Interface - CLI
$ hbase backup help
Usage: hbase backup COMMAND
where COMMAND is one of:
  create      create a new backup
  cancel      cancel an ongoing backup
  delete      delete an existing backup
  describe    show the detailed information of a backup
  history     show history of all successful backups
  status      show the status of the latest backup request
  convert     convert incremental backup WAL files into HFiles
  merge       merge backup images
  stop        remove table(s) from backup table set
  show        show table(s) in backup table set
Enter 'help COMMAND' to see help message for each command
18. User Interface – Web UI Restore
19. Data Structure - Backup Image
• Table Info and Region Info
• Backup Manifest
– Table Name
– Type: Full or Incremental
– Size
– Timestamp Info
– State Info: Converted, Merged, Compacted, etc.
– Dependency Lineage
• Data
– HFiles
– WALs (For Incremental Backup before convert)
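The manifest fields above, in particular the dependency lineage, suggest how a restore could decide which images to apply. Below is a minimal sketch, not IBM's implementation: the `BackupManifest` class, its field names, and the single `parent_id` used to encode lineage are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BackupManifest:
    # Field names are illustrative; the slide only lists the kinds of info stored.
    backup_id: str
    table_name: str
    backup_type: str                 # "FULL" or "INCREMENTAL"
    size_bytes: int
    timestamp: int
    state: str = "NONE"              # e.g. Converted, Merged, Compacted
    parent_id: Optional[str] = None  # hypothetical encoding of dependency lineage


def restore_chain(manifests: Dict[str, BackupManifest], target_id: str) -> List[str]:
    """Return the backup ids to apply, oldest first, ending at target_id."""
    chain = []
    current = manifests[target_id]
    while True:
        chain.append(current.backup_id)
        if current.backup_type == "FULL":
            break  # a full backup has no dependencies
        if current.parent_id is None:
            raise ValueError(f"incremental backup {current.backup_id} has no parent")
        current = manifests[current.parent_id]
    chain.reverse()
    return chain


manifests = {
    "b1": BackupManifest("b1", "t1", "FULL", 1000, 100),
    "b2": BackupManifest("b2", "t1", "INCREMENTAL", 50, 200, parent_id="b1"),
    "b3": BackupManifest("b3", "t1", "INCREMENTAL", 70, 300, parent_id="b2"),
}
print(restore_chain(manifests, "b3"))
```

Walking the lineage back to the nearest full backup is one simple way to make each image usable on its own metadata, which is what a self-contained manifest is meant to allow.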
20. Data Structure - ZooKeeper
/backup/hbase
    startcode {backup marker}
    complete/
        backupId_1 {contains backup metadata}
        ……
        backupId_n
    ongoing {contains the progress status of the current operation}
    failed {contains error code and message of the current operation}
    cancel {triggers a cancel operation}
    incr/
        tablelogtimestamp/
            table_1 {list of region servers and associated log timestamp for this table}
            ……
            table_n
        last-roll-log-ts/
            rs_1 {contains the log timestamp from last roll log}
            ……
            rs_n
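One plausible use of the per-table, per-region-server log timestamps is to pick which WAL files the next incremental backup must include. The sketch below assumes that reading; the function name, the in-memory dictionaries standing in for znodes, and the WAL file names are hypothetical, and no ZooKeeper client is involved.

```python
from typing import Dict, List, Tuple


def wals_for_incremental(
    table_log_ts: Dict[str, int],
    wal_files: List[Tuple[str, int, str]],
) -> List[str]:
    """Select WAL files newer than the timestamp recorded for each region server.

    table_log_ts: region server -> log timestamp already covered by a backup
    wal_files:    (region server, WAL timestamp, path) for every available WAL
    """
    selected = []
    for region_server, ts, path in wal_files:
        # A region server with no recorded timestamp has never been backed up,
        # so all of its WALs are included.
        if ts > table_log_ts.get(region_server, 0):
            selected.append(path)
    return sorted(selected)


table_log_ts = {"rs_1": 100, "rs_2": 150}
wal_files = [
    ("rs_1", 90, "rs_1.90"),
    ("rs_1", 120, "rs_1.120"),
    ("rs_2", 150, "rs_2.150"),
    ("rs_2", 160, "rs_2.160"),
]
print(wals_for_incremental(table_log_ts, wal_files))
```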
21. HBase Backups - HBaseCon 2014
Sincere gratitude is hereby extended to the following
developers who contributed to this effort:
Richard Ding, Jing Chen He, Enoch Hsu, Yu Li, Jihong Ma,
Demai Ni, Kan Zhang, Liping Zhang, Xiang Zhou
* ordered by last name
24. Salesforce Environment
• Many tenants per cluster
• At least 90 days of recovery
• DR failover to remote DC
• All writes through Phoenix
– Timestamp control
25. Design Goals
• Validate backups regularly
• Minimize time to restore a tenant
• Validate replication is up to date
• Minimize data storage
26. Backups
• M/R a table at a given point in time
– Point-in-time view of the table
• Chunked by file size + tenant (per server)
• Chunk manifest
– Chunk info (min/max/hash/tenant ids)
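The chunking and manifest idea can be sketched in a few lines. This is an illustration under stated assumptions, not the Salesforce job: rows are held in memory instead of being produced by MapReduce, a chunk is closed at every tenant boundary, and the names `build_chunks`, `max_chunk_bytes`, and the manifest keys are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

Row = Tuple[str, str, bytes]  # (tenant_id, row_key, value)


def build_chunks(rows: List[Row], max_chunk_bytes: int) -> List[Dict]:
    """Group rows into chunks by tenant and size; return one manifest per chunk."""
    manifests: List[Dict] = []
    current: List[Row] = []
    current_size = 0

    def flush() -> None:
        nonlocal current, current_size
        if not current:
            return
        digest = hashlib.sha256()
        for _, key, value in current:
            digest.update(key.encode("utf-8"))
            digest.update(value)
        manifests.append({
            "min_key": current[0][1],   # rows are sorted, so first is the minimum
            "max_key": current[-1][1],
            "hash": digest.hexdigest(),
            "tenant_ids": sorted({tenant for tenant, _, _ in current}),
        })
        current, current_size = [], 0

    for row in sorted(rows, key=lambda r: (r[0], r[1])):
        tenant, key, value = row
        size = len(key) + len(value)
        new_tenant = bool(current) and current[-1][0] != tenant
        too_big = bool(current) and current_size + size > max_chunk_bytes
        if new_tenant or too_big:
            flush()
        current.append(row)
        current_size += size
    flush()
    return manifests


rows = [("t1", "t1_a", b"xx"), ("t1", "t1_b", b"yy"), ("t2", "t2_a", b"zz")]
for manifest in build_chunks(rows, max_chunk_bytes=100):
    print(manifest["min_key"], manifest["max_key"], manifest["tenant_ids"])
```

Keeping the min/max keys and tenant ids in the manifest means a per-tenant restore can skip chunks without opening them.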
27. Backups
Key      CF   CQ    TS  Value
user1_a  fam  qual  14  value10
user1_a  fam  qual  12  Value5
user1_a  fam  qual  10  Valu2
user1_a  fam  qual  8   value4
user1_a  fam  qual  3   value13
user1_a  fam  qual  2   value56
1. http://phoenix.incubator.apache.org/
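The notes at the end of the deck describe this slide as a view of the table "from time 11 backwards". A minimal sketch of that filter, using the cells from the slide, is shown here; the function name `visible_cells` is hypothetical, and a real job would set a time range on the scan instead of filtering in memory.

```python
from typing import List, Tuple

Cell = Tuple[str, str, str, int, str]  # (key, column family, qualifier, ts, value)

cells: List[Cell] = [
    ("user1_a", "fam", "qual", 14, "value10"),
    ("user1_a", "fam", "qual", 12, "Value5"),
    ("user1_a", "fam", "qual", 10, "Valu2"),
    ("user1_a", "fam", "qual", 8, "value4"),
    ("user1_a", "fam", "qual", 3, "value13"),
    ("user1_a", "fam", "qual", 2, "value56"),
]


def visible_cells(all_cells: List[Cell], as_of_ts: int) -> List[Cell]:
    """Keep only the versions written at or before as_of_ts."""
    return [cell for cell in all_cells if cell[3] <= as_of_ts]


# Writes at timestamps 14 and 12 happen after the backup point and are not seen.
for cell in visible_cells(cells, as_of_ts=11):
    print(cell)
```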
28. Backups
[Diagram: Some HBase Table, a row of map tasks (M), and the Hadoop Distributed File System]
29. Backups
• Each backup is an incremental
– Lineage by convention
• Never write too far back in time
• Data retained by custom coprocessor
– Retained up to last successful backup
30. “Backup isn’t a backup until you’ve restored it and tested it”
-- Some Ops Guy
31. Restore + Validation
• Restore each backup to a new table
• Validate that the backup has the same data as the existing table
– Within backup timerange
• Move ‘retained timestamps’ forward
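The comparison step can be sketched as follows. This is an illustration only: both tables are modelled as in-memory dictionaries keyed by (row key, timestamp), and the function name and the half-open time range are assumptions, not details from the talk.

```python
from typing import Dict, List, Tuple

Table = Dict[Tuple[str, int], str]  # (row key, timestamp) -> value


def validate_restore(live: Table, restored: Table,
                     start_ts: int, end_ts: int) -> List[Tuple[str, int]]:
    """Return the cells that differ between the tables inside (start_ts, end_ts]."""
    def in_range(table: Table) -> Table:
        return {cell: v for cell, v in table.items() if start_ts < cell[1] <= end_ts}

    live_cells, restored_cells = in_range(live), in_range(restored)
    mismatches = []
    for cell in sorted(set(live_cells) | set(restored_cells)):
        # A cell missing on either side counts as a mismatch.
        if live_cells.get(cell) != restored_cells.get(cell):
            mismatches.append(cell)
    return mismatches


live = {("a", 5): "x", ("a", 20): "y", ("b", 7): "z"}
restored = {("a", 5): "x", ("b", 7): "CORRUPT"}
# ("a", 20) was written after the backup's time range, so it is ignored.
print(validate_restore(live, restored, start_ts=0, end_ts=10))
```

Restricting the comparison to the backup's time range is what lets validation run against a table that is still taking writes.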
32. Restore
[Diagram: backup files in HDFS, map tasks (M), and the restored table SomeHBaseTable_Restore]
HDFS
    /hbase
        …
    /salesforce
        /backup
            /somehbasetable
                /03/14/14
                    backup.properties
                    chunk1
                    chunk1.manifest
                    ….
                    chunk1000
                    chunk1000.manifest
33. Restore
• Configurable validation percent
– Start high, move lower
• Backup only valid if restore is successful
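One way to make the validation percent configurable is deterministic sampling on the row key, sketched below. The talk does not say how rows are chosen, so treating the percent as "share of rows compared" is an assumption, and `should_validate` is a hypothetical helper.

```python
import hashlib


def should_validate(row_key: str, validation_percent: float) -> bool:
    """Decide whether a row is included in validation at the given percent."""
    digest = hashlib.sha256(row_key.encode("utf-8")).digest()
    # Map the key to a stable bucket in [0, 10000) so the same rows are
    # picked on every run and lowering the percent only removes rows.
    bucket = int.from_bytes(digest[:8], "big") % 10000
    return bucket < validation_percent * 100


keys = [f"user{i}_a" for i in range(1000)]
for percent in (100, 50, 10):
    sampled = sum(should_validate(k, percent) for k in keys)
    print(percent, sampled)
```

Because the bucket depends only on the key, "start high, move lower" shrinks the sample to a subset of what was validated before.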
34. 90 Days of Backup is LOTS of Data
Even without any duplicates!
35. Granularity Reduction
• Combine backups every ‘period’
– Week, month, 3 months
– Specified in table metadata
• Keep latest version of the row
• Helpful with lots of updates
– Not useful for unique data (e.g. time series)
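Combining a period's backups while keeping only the latest version of each row can be sketched like this. The in-memory cell lists and the name `reduce_granularity` are hypothetical; the real process would operate on backup chunks.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int, str]  # (row key, timestamp, value)


def reduce_granularity(backups: List[List[Cell]]) -> Dict[str, Tuple[int, str]]:
    """Merge several backups, keeping only the newest version of each row."""
    merged: Dict[str, Tuple[int, str]] = {}
    for backup in backups:
        for key, ts, value in backup:
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged


day1 = [("a", 1, "v1"), ("b", 2, "w1")]
day2 = [("a", 5, "v2")]
day3 = [("a", 9, "v3"), ("c", 8, "x1")]
# Five stored cells collapse to three because row "a" was updated repeatedly.
print(reduce_granularity([day1, day2, day3]))
```

If every key were unique, as in time-series data, the merged result would be as large as the inputs, which is why the reduction helps only with update-heavy tables.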
40. Validation By Backup
• Validate replication is working
• Validate backup process consistent
• Validate granularity reduction consistent
41. Validation By Backup
• Build up hash of hashes
– Two level Merkle Tree
• Check that both DCs have the same hash
– Can easily identify differences per-manifest
• Requires time-delay for backups
– <= replication delay
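A two-level hash of hashes can be sketched with `hashlib`. This is a minimal illustration, not the Salesforce code: SHA-256, the chunk names, and the function names are assumptions, and each data center's backup is modelled as a dictionary from chunk name to chunk hash.

```python
import hashlib
from typing import Dict, List


def chunk_hash(data: bytes) -> str:
    """First level: hash of one chunk's contents."""
    return hashlib.sha256(data).hexdigest()


def combined_hash(chunk_hashes: Dict[str, str]) -> str:
    """Second level: hash over all chunk hashes, in a stable chunk order."""
    digest = hashlib.sha256()
    for name in sorted(chunk_hashes):
        digest.update(name.encode("utf-8"))
        digest.update(chunk_hashes[name].encode("ascii"))
    return digest.hexdigest()


def differing_chunks(primary: Dict[str, str], buddy: Dict[str, str]) -> List[str]:
    """Compare the top-level hashes first; descend only on a mismatch."""
    if combined_hash(primary) == combined_hash(buddy):
        return []
    names = sorted(set(primary) | set(buddy))
    return [n for n in names if primary.get(n) != buddy.get(n)]


primary = {"chunk1": chunk_hash(b"rows a-m"), "chunk2": chunk_hash(b"rows n-z")}
buddy = {"chunk1": chunk_hash(b"rows a-m"), "chunk2": chunk_hash(b"rows n-Z")}
print(differing_chunks(primary, buddy))
```

In the common case only the two combined hashes are exchanged between data centers; the per-chunk hashes are needed only to localize a mismatch.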
42. Hash Validation
[Diagram: the Primary Data Center and the Buddy Data Center each hold the structure below; a difference between the two is flagged as "Mismatch!"]
Backup Manifest
• chunk size
• start time
• end time
• combined hash
• version
Chunk Manifest (one per chunk)
• key prefix
• stats
• hash
43. Tracking Status
• Daily emails
• Progress stored in Phoenix Table
• Easy access for auditing
• Easy display for UI (coming soon)
44. Future Work
• Extensive tooling around per-tenant restore
• M/R from snapshot
45. Lessons Learned
• Track Properties
– Version, table, lineage, etc
• Fast Restore is Important
– Consider your business case
• Validation!
46. Special Thanks
All the members of the Salesforce HBase team,
particularly:
Vasu Mariyala, Sukumar Maddineni, Alex Araujo, Lars Hofhansl,
Ian Varley, Santosh Rau
47. Summary
• Per-Table Backups
• IBM
– WAL based
– Extra tooling for fast restores
– Extensive lineage tracking
• Salesforce
– M/R over HTable
– Multi-tenant
– Multiple Validation vectors
48. Thanks!
Questions?
Jesse Yates, Demai Ni, Jing He Chen, Richard Ding
Editor's Notes
Provides a snapshot of the table from time 11 backwards. Even if we are writing to the table from the client, we won’t see any of those updates.
Caveat: special coprocessors (CPs) ensure we don’t lose data that we haven’t backed up yet, at the cost of some extra versions every day.