SlideShare a Scribd company logo
1 of 23
Download to read offline
© Hortonworks Inc. 2014
HBase Backup and Restore
Vladimir Rodionov
Ted Yu
Page 1
© Hortonworks Inc. 2014
About the authors
• Ted Yu:
• Been working on HBase for over 6 years
• HBase committer / PMC
• Senior Staff Engineer at Hortonworks
• Vladimir:
• active contributor to hbase (over 100 HBase JIRAs)
• completed most of the backup work based on IBM’s initial contribution
• Senior Staff Engineer at Hortonworks
Page 2
© Hortonworks Inc. 2014
HBase Backup – Why We Need It
• Database needs disaster recovery tool
• Previously users can perform snapshot
• However, execution cost for snapshot may be high – flush across region
servers is involved
• There was no incremental snapshot – whole dataset is captured by
snapshot
• Incremental backup doesn’t involve flushing, making continuous backup
possible
Page 3
© Hortonworks Inc. 2014
Brief History of Backup / Restore work
• Started by engineers at IBM – see HBASE-7912
• Initial design included backup manifest
• Vladimir / Ted picked up the work last year
• Vladimir rendered many iterations of patches for phase 2 work (see
HBASE-14123)
• Due to feedback from community, the design has gone thru major
changes
• Mostly tested by developers and QA engineers so far
Page 4
© Hortonworks Inc. 2014
HBase Backup Types
• Full backup – foundation for incremental backups
• Incremental backup – can be periodic to capture changes over time
• Supports table level backup
Page 5
© Hortonworks Inc. 2014
Required Configuration
• Set hbase.backup.enable to true
• BackupLogCleaner for
hbase.master.logcleaner.plugins
• LogRollMasterProcedureManager for
hbase.procedure.master.classes
• LogRollRegionServerProcedureManager for
hbase.procedure.regionserver.classes
• Backup may get stuck if not configured properly
Page 6
© Hortonworks Inc. 2014
Backup Strategy
• Intra-cluster backup is appropriate for testing
Page 7
© Hortonworks Inc. 2014
Backup Strategy: Dedicated HDFS Cluster
• backup on a separate HDFS archive cluster
Page 8
© Hortonworks Inc. 2014
Backup Strategy: Cloud or a Storage Vendor
• vendor can be a public cloud provider or a storage vendor who uses a
Hadoop compatible file system
Page 9
© Hortonworks Inc. 2014
Best Practices for Backup-and-Restore
• Secure a full backup image first
• Formulate a restore strategy and test it
• Define and use backup sets for groups of tables that are logical subsets
of the entire dataset
• Document the backup-and-restore strategy, and ideally log information
about each backup
Page 10
© Hortonworks Inc. 2014
Creating/Maintaining Backup Image
• Run the following command as hbase superuser:
• hbase backup create {{ full | incremental }
{backup_root_path} {[-t tables] | [-set
backup_set_name]}} [[-silent] | [-w
number_of_workers] | [-b bandwidth_per_worker]]
Page 11
© Hortonworks Inc. 2014
Using Backup Sets
• Reduces the amount of repetitive input of table names.
• “hbase backup set add” command.
• You can have multiple backup sets
• Backup set can be used in the “hbase backup create” or “hbase backup
restore” commands
Page 12
© Hortonworks Inc. 2014
Restoring a Backup Image
• You can only restore on a live HBase cluster
• Run the following command as hbase superuser
• hbase restore {[-set backup_set_name] | [backup_root_path] | [backupId]
| -t [tables]} [-m [table_mapping] | [-overwrite] | [-check]]
• hbase restore /tmp/backup_incremental backupId_1467823988425 -t
mytable1,mytable2 -overwrite
Page 13
© Hortonworks Inc. 2014
Backup table
• Backup table will keep track of all backup sessions
–Write/Read backup session state
–Write/Read backup session progress (per region server).
–Stores last backed up WAL file timestamp (per region server).
–Stores list of all backed up WAL files (for BackupLogCleaner )
–Stores backup sets
• Must be backed up and restored separately from other tables
• Information needed for restore is on hdfs
Page 14
© Hortonworks Inc. 2014
Incremental backups
• Use Write Ahead Logs (WALs) to capture the data changes since the
previous backup
• Log roll is executed across all RegionServers
• All the WAL files from incremental backups between the last full backup
and the incremental backup are converted to HFiles
• A process similar to the DistCp tool is used to move the source backup
files to the target file system
Page 15
© Hortonworks Inc. 2014
Filter WALs on backup to only include relevant edits
• Suppose incremental backup request is for table t, all the tables already
registered in a backup system, T, are union’ed with t
• For every table K in the union:
1. Convert new WAL files into HFile applying table filter for K
2. Move these HFile(s) to backup destination
Page 16
© Hortonworks Inc. 2014
Restore
• The full backup is restored from the full backup image.
• HFileSplitter job will collect all HFile(s), split them into new region
boundaries
• HBase Bulk Load utility is invoked by restore to import the HFiles as
restored data in the table.
Page 17
© Hortonworks Inc. 2014
Backup Manifest
• Backup image has the following:
• Backup Id, Backup Type, Backup Rootdir, Table List, start timestamp,
completion timestamp
• Mapping between region server and last recorded WAL timestamp
• Backup image keeps lineage of all previously created backup images
(ancestors)
• When backup image list covers the image being considered, it is
removed from restore
• See message BackupImage in Backup.proto
Page 18
© Hortonworks Inc. 2014
Bulk load support
• Bulk loaded Hfiles are recorded in backup table at the end of bulk load,
thru preCommitStoreFile() hook
• During incremental backup, these Hfiles are copied to backup
destination
• During restore, these Hfiles are loaded into target table
Page 19
© Hortonworks Inc. 2014
Limitations of the Backup-Restore
• Only one active backup session is supported.
• Both backup and restore can’t be canceled while in progress. (HBASE-
15997,15998)
• Single backup destination only is supported. HBASE-15476
• There is no merge for incremental images (HBASE-14135)
• Only superuser (hbase) is allowed to perform backup/restore
Page 20
© Hortonworks Inc. 2014
Credit
• Richard Ding
• Vladimir Rodionov
Page 21
© Hortonworks Inc. 2014
Q/A
Page 22
© Hortonworks Inc. 2014
Thank you.
Page 23

More Related Content

More from HBaseCon

HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon
 
HBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ SalesforceHBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon
 

More from HBaseCon (20)

hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
 
HBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environmentHBaseCon2017 Improving HBase availability in a multi tenant environment
HBaseCon2017 Improving HBase availability in a multi tenant environment
 
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
HBaseCon2017 Spark HBase Connector: Feature Rich and Efficient Access to HBas...
 
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBaseHBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
HBaseCon2017 Efficient and portable data processing with Apache Beam and HBase
 
HBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at XiaomiHBaseCon2017 HBase at Xiaomi
HBaseCon2017 HBase at Xiaomi
 
HBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ SalesforceHBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
HBaseCon2017 HBase/Phoenix @ Scale @ Salesforce
 

Recently uploaded

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

hbaseconasia2017: Backup / Restore feature in HBase

  • 1. © Hortonworks Inc. 2014 HBase Backup and Restore Vladimir Rodionov Ted Yu Page 1
  • 2. © Hortonworks Inc. 2014 About the authors • Ted Yu: • Been working on HBase for over 6 years • HBase committer / PMC • Senior Staff Engineer at Hortonworks • Vladimir: • active contributor to hbase (over 100 HBase JIRAs) • completed most of the backup work based on IBM’s initial contribution • Senior Staff Engineer at Hortonworks Page 2
  • 3. © Hortonworks Inc. 2014 HBase Backup – Why We Need It • Database needs disaster recovery tool • Previously users can perform snapshot • However, execution cost for snapshot may be high – flush across region servers is involved • There was no incremental snapshot – whole dataset is captured by snapshot • Incremental backup doesn’t involve flushing, making continuous backup possible Page 3
  • 4. © Hortonworks Inc. 2014 Brief History of Backup / Restore work • Started by engineers at IBM – see HBASE-7912 • Initial design included backup manifest • Vladimir / Ted picked up the work last year • Vladimir rendered many iterations of patches for phase 2 work (see HBASE-14123) • Due to feedback from community, the design has gone thru major changes • Mostly tested by developers and QA engineers so far Page 4
  • 5. © Hortonworks Inc. 2014 HBase Backup Types • Full backup – foundation for incremental backups • Incremental backup – can be periodic to capture changes over time • Supports table level backup Page 5
  • 6. © Hortonworks Inc. 2014 Required Configuration • Set hbase.backup.enable to true • BackupLogCleaner for hbase.master.logcleaner.plugins • LogRollMasterProcedureManager for hbase.procedure.master.classes • LogRollRegionServerProcedureManager for hbase.procedure.regionserver.classes • Backup may get stuck if not configured properly Page 6
  • 7. © Hortonworks Inc. 2014 Backup Strategy • Intra-cluster backup is appropriate for testing Page 7
  • 8. © Hortonworks Inc. 2014 Backup Strategy: Dedicated HDFS Cluster • backup on a separate HDFS archive cluster Page 8
  • 9. © Hortonworks Inc. 2014 Backup Strategy: Cloud or a Storage Vendor • vendor can be a public cloud provider or a storage vendor who uses a Hadoop compatible file system Page 9
  • 10. © Hortonworks Inc. 2014 Best Practices for Backup-and-Restore • Secure a full backup image first • Formulate a restore strategy and test it • Define and use backup sets for groups of tables that are logical subsets of the entire dataset • Document the backup-and-restore strategy, and ideally log information about each backup Page 10
  • 11. © Hortonworks Inc. 2014 Creating/Maintaining Backup Image • Run the following command as hbase superuser: • hbase backup create {{ full | incremental } {backup_root_path} {[-t tables] | [-set backup_set_name]}} [[-silent] | [-w number_of_workers] | [-b bandwidth_per_worker]] Page 11
  • 12. © Hortonworks Inc. 2014 Using Backup Sets • Reduces the amount of repetitive input of table names. • “hbase backup set add” command. • You can have multiple backup sets • Backup set can be used in the “hbase backup create” or “hbase backup restore” commands Page 12
  • 13. © Hortonworks Inc. 2014 Restoring a Backup Image • You can only restore on a live HBase cluster • Run the following command as hbase superuser • hbase restore {[-set backup_set_name] | [backup_root_path] | [backupId] | -t [tables]} [-m [table_mapping] | [-overwrite] | [-check]] • hbase restore /tmp/backup_incremental backupId_1467823988425 -t mytable1,mytable2 -overwrite Page 13
  • 14. © Hortonworks Inc. 2014 Backup table • Backup table will keep track of all backup sessions –Write/Read backup session state –Write/Read backup session progress (per region server). –Stores last backed up WAL file timestamp (per region server). –Stores list of all backed up WAL files (for BackupLogCleaner ) –Stores backup sets • Must be backed up and restored separately from other tables • Information needed for restore is on hdfs Page 14
  • 15. © Hortonworks Inc. 2014 Incremental backups • Use Write Ahead Logs (WALs) to capture the data changes since the previous backup • Log roll is executed across all RegionServers • All the WAL files from incremental backups between the last full backup and the incremental backup are converted to HFiles • A process similar to the DistCp tool is used to move the source backup files to the target file system Page 15
  • 16. © Hortonworks Inc. 2014 Filter WALs on backup to only include relevant edits • Suppose incremental backup request is for table t, all the tables already registered in a backup system, T, are union’ed with t • For every table K in the union: 1. Convert new WAL files into HFile applying table filter for K 2. Move these HFile(s) to backup destination Page 16
  • 17. © Hortonworks Inc. 2014 Restore • The full backup is restored from the full backup image. • HFileSplitter job will collect all HFile(s), split them into new region boundaries • HBase Bulk Load utility is invoked by restore to import the HFiles as restored data in the table. Page 17
  • 18. © Hortonworks Inc. 2014 Backup Manifest • Backup image has the following: • Backup Id, Backup Type, Backup Rootdir, Table List, start timestamp, completion timestamp • Mapping between region server and last recorded WAL timestamp • Backup image keeps lineage of all previously created backup images (ancestors) • When backup image list covers the image being considered, it is removed from restore • See message BackupImage in Backup.proto Page 18
  • 19. © Hortonworks Inc. 2014 Bulk load support • Bulk loaded Hfiles are recorded in backup table at the end of bulk load, thru preCommitStoreFile() hook • During incremental backup, these Hfiles are copied to backup destination • During restore, these Hfiles are loaded into target table Page 19
  • 20. © Hortonworks Inc. 2014 Limitations of the Backup-Restore • Only one active backup session is supported. • Both backup and restore can’t be canceled while in progress. (HBASE- 15997,15998) • Single backup destination only is supported. HBASE-15476 • There is no merge for incremental images (HBASE-14135) • Only superuser (hbase) is allowed to perform backup/restore Page 20
  • 21. © Hortonworks Inc. 2014 Credit • Richard Ding • Vladimir Rodionov Page 21
  • 22. © Hortonworks Inc. 2014 Q/A Page 22
  • 23. © Hortonworks Inc. 2014 Thank you. Page 23