SlideShare uma empresa Scribd logo
1 de 61
Apache HBase Table Snapshots
Matteo Bertozzi | Cloudera | Software Engineer / HBase Committer
Jonathan Hsieh | Cloudera | Software Engineer / HBase Committer
Jesse Yates | Salesforce.com | Software Engineer / HBase Committer
June 13, 2013
HBaseCon 2013
Outline
• Intro and Use Cases
• Usage Instructions
• Internals
• Snapshot Layout
• Snapshot Restoration
• Online Snapshots
• Conclusion
HBase Table Snapshots
Snapshot is a collection of
metadata required to
reconstitute the data near
a particular point in time
HBaseCon 2013 6/13/20133
HBase Table Snapshots
• An inexpensive way to freeze state of a
table
• A mechanism that helps backup data to
in the cluster or to a remote cluster
• Recover from user error
• Bootstrap Replication
HBaseCon 2013 6/13/20134
Old: HBase-Supported Batch Backups
• Export / Dist CP / Import
• 3 batch MR jobs
• Several extra copies of data
• High latency (hours)
• Impacts existing low-latency
workloads
• Copy Table
• 1 MR Job
• Single copy of data
• Incremental table copies
• High Latency (hours)
• Impacts existing workloads
Export
MR Job
Import
MR Job
Dist CP
MR Job
Copy Table
MR Job
HBaseCon 2013 6/13/20135
Upcoming: HDFS Snapshots (or DistCP backup)
• Take an hdfs snapshot of all the
files in the underlying HBase’s
data directory.
• Hfiles, hlogs, and other
metadata.
• Snapshots all tables in Hbase
• Cannot Clone tables
• “Restore As”
• Targeted for Hadoop 2.1 /
Hadoop 3.0 DistCP
HLog Append
Flush
Compact
Restart
Recover
HBaseCon 2013 6/13/20136
New: HBase Snapshot-based Backups
• Snapshot, then Export
• 1 MR Job
• Single copy of data
• Little impact on low-latency
workloads
• Export is like distcp directly
from hfds
• No incremental snapshot
copy
HBaseCon 2013
Export
Snapshot
6/13/20137
Export
• Like distcp for a snapshot manifest
• Copy data files without going through HBase’s “front door”
Export
HBaseCon 2013 6/13/20138
Recover from User Error
• How do we recover from user error?
Recovery Time
time
User Error:
drop ‘table’
Service is
restored, major data
loss
Service is down!
Panic! Black magic!
HBaseCon 2013 6/13/20139
Recovering from User Mistakes: Table Snapshots
• Snapshot the state of a table at a certain moment in time
• Restore it or Clone it later, creating a new read write table
• Export it to another cluster with minimal impact on HBase
time
User Error:
drop ‘table’
Service restored, Minor
data loss. Carry on.
Periodic
snapshot
Service is down!
Keep calm!
restore
Periodic
snapshot
HBaseCon 2013 6/13/201310
Usage
What an Admin needs to know
Configuration
• Simple hbase-site.xml configuration
<property>
<name>hbase.snapshot.enabled</name>
<value>true</value>
</property>
• Enabled by default in 0.95+
• Requires user to enable in 0.94.6.1+.
HBaseCon 2013 6/13/201312
Usage: Shell Commands
• snapshot ‘table’, ‘snapshot’
• Table can be offline or online
• list_snapshot [<regex>]
• clone_snapshot ‘snapshot’, ‘dsttable’
• restore_snapshot ‘snapshot’
• delete_snapshot ‘snapshot’
HBaseCon 2013 6/13/201313
Usage: Web UI
HBaseCon 2013 6/13/201314
Usage: Web UI
HBaseCon 2013 6/13/201315
Export: Usage
• Copy “MySnapshot” to a remote HDFS
• $ hbase class org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
MySnapshot -copy-to hdfs:///srv2:8082/hbase -mappers 16
• With permission change on the copy
• $ hbase class org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
MySnapshot -copy-to hdfs:///srv2:8082/hbase -chuser MyUser -chgroup MyGroup
-chmod 700 -mappers 16
HBaseCon 2013 6/13/201316
Debugging and Info
• Dump a snapshot manifest
• Writes to standard out
• Usage
• $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot
test-snapshot
• $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot
test-snapshot -files
HBaseCon 2013 6/13/201317
Metrics
• Histograms of operation completion
• snapshotTime
• cloneTime
• restoreTime
• Includes ‘extended’ metrics
• Std deviation
• Min/max
HBaseCon 2013 6/13/201318
Table Snapshot Internals
Internals
• HBase Table HDFS Layout
• Snapshot HDFS layout
• Offline Snapshots
• Restore and Clone Snapshot
• Online Snapshots
HBaseCon 2013 6/13/201320
Primer: HBase Table Layout in HDFS
• HRegions map directly to a directory structure
with table name, encoded region
name, column family and hfiles.
• In HDFS:
/hbase/Table/<enc R1>/cf/<hfile f11>
/hbase/Table/<enc R1>/cf/<hfile f12>
/hbase/Table/<enc R2>/cf/<hfile f21>
/hbase/Table/<enc R2>/cf/<hfile f22>
/hbase/Table/<enc R3>/cf/<hfile f31>
/hbase/Table/<enc R3>/cf/<hfile f32>
Table
F11 F21 F31
R1 R2 R3
6/13/2013HBaseCon 201321
Table Snapshots in the File System
• A Snapshot manifest contains references to files in the original
table.
./.hbase-snapshots
Table
F11 F21 F31
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201322
Table Snapshots in the File System
• A Snapshot manifest contains references to files in the original
table.
• Each snapshot is stored in the hbase/.hbase-snapshots dir.
./.hbase-snapshots
Table
F11 F21 F31
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201323
Offline Snapshots
• Disable table, then create Snapshot Manifest
• Created in temporary dir to guarantee snapshot creation
atomicity
• Includes
• Snapshot Metadata
• Table Metadata/Schema (.tableinfo)
• References to original HFiles
• Master-only file system operation
HBaseCon 2013 6/13/201324
HFile Life Cycle
• Splits and Compactions remove hfiles
• What happens to references to these files?
./.hbase-snapshots
Table
F11 F21 F31
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201325
HFile Life Cycle
• Splits and Compactions remove hfiles
• What happens to references to these files?
./.hbase-snapshots
Table
F11 F21 F31
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201326
HFile Life Cycle
• Splits and Compactions remove hfiles
• What happens to references to these files?
./.hbase-snapshots
Table
F11 F21
F31
+32
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
No more
Hfile??
HBaseCon 2013 6/13/201327
HFile Archiver
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
Table files
F31
HBaseCon 2013 6/13/201328
• We archive old HFiles from compactions (HBASE-5547)
HFile Archiver
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
F31
+32
Table files
F31
HBaseCon 2013 6/13/201329
• We archive old HFiles from compactions (HBASE-5547)
• Files stored in hbase/.archive
HFile Archiver
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
F31
+32
Table files
F31
• We archive old HFiles from compactions (HBASE-5547)
• Files stored in hbase/.archive
• HFileCleaner ensures HFiles’ data remains available
HBaseCon 2013 6/13/201330
Restore Snapshot Internals
Restore Operations
• Restore table
• Rollback table to specific state
• Clone from snapshot (Restore As)
• Create new read-write table from snapshot
• There can be multiple replicas of a snapshot
• Export snapshot
• Send snapshot and all its data to another cluster
HBaseCon 2013 6/13/201332
Clone: Creating table from a Snapshot
• Convert snapshot manifest info into a Table.
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
Table files
Clone
R1 R2 R3
F31
HBaseCon 2013 6/13/201333
Clone: Creating table from a Snapshot
• Convert snapshot manifest info into a Table.
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
Table files
Clone
R1 R2 R3
F31
HBaseCon 2013 6/13/201334
Clone: Creating table from a Snapshot
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
F31
+32
Table files
F31
Clone
R1 R2 R3
• Convert snapshot manifest info into a Table.
• HFileLinks (HBASE-6610) to mimic unix open file descriptor semantics
HBaseCon 2013 6/13/201335
Restore: Rollback to an old state
• Rollback the existing table to snapshot state
• Restores original schema if altered
• Snapshots current table, just in case
• Minimal overhead
• Smarter delete table & clone snapshot
• Handles creating/deleting regions
• Restore META
HBaseCon 2013 6/13/201336
Restore illustrated
./.hbase-snapshots
./.archive
TableSnapshot manifest
R1 R2 R3
Table files
F31
Table
F11 F21
R1 R2 R3
F31
+32 F41
R4
• Rollback “Table” to the “TableSnapshot” state
HBaseCon 2013 6/13/201337
Restore illustrated
./.hbase-snapshots
./.archive
TableSnapshot manifest
R1 R2 R3
Table files
F31
Table
F11 F21
R1 R2 R3
F31
+32 F41
R4
• Region “R4” is not present in the snapshot
• “R4” will be removed from “Table”, files moved to “.archive”
HBaseCon 2013 6/13/201338
Restore illustrated
./.hbase-snapshots
./.archive
TableSnapshot manifest
R1 R2 R3
Table files
F31
Table
F11 F21
R1 R2 R3
F31
+32
F41
• New files not present in the snapshots are moved to the archive
HBaseCon 2013 6/13/201339
Restore illustrated
./.hbase-snapshots
./.archive
TableSnapshot manifest
R1 R2 R3
Table files
F31
Table
F11 F21
R1 R2 R3
F41F3+
32
• New files not present in the snapshots are moved to the archive
• HFileLinks are created to point to old files.
HBaseCon 2013 6/13/201340
Restore failures
• The table to restore is disabled
• META and HDFS operations may fail (network issue, server down, …)
• hbck can’t repair an incomplete restore...
• Restore again!
HBaseCon 2013 6/13/201341
Export Snapshot
• Copy a full snapshot to another cluster
• All required HFiles, and Metadata
• Lots of options
• Fancy dist-cp
• Must resolve HFileLinks
• Faster than CopyTable or table export+import!
• Minimal impact on running cluster
HBaseCon 2013 6/13/201342
Online Snapshots
Online snapshots
• Take a snapshot without making the table
unavailable
• No need to disable the table
• Continue accepting reads and writes from
clients
• Challenges
• Coordinating Region Servers
• Data is in memory
• Consistency
HBaseCon 2013 6/13/201344
Offline vs Online Snapshots
Offline Online
mastermaster
RS1 RS2 RS3 RS4
verify
Snapshot
region
subprocedure
Write
manifest
per region
verify
Write
manifest
per region
6/13/2013HBaseCon 201345
Online Snapshots
• Each Region can have data in memstore and hlog, not yet Hfile
• Snapshot is missing in memory data!
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
Table files
F31
mem mem mem
HBaseCon 2013 6/13/201346
Online Snapshots
• Flush so that all in memory data written in an Hfile
• Then add to snapshot manifest
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
Table files
F31
F13 F23 F33
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201347
Online Snapshots
• Flush so that all in memory data in an Hfile
• Then add to snapshot manifest
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
Table files
F31
F13 F23 F33
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201348
Consistency
• Offline Snapshots
• Fully consistent snapshot
• Online Flush Snapshot
• “CopyTable” level consistency with a much smaller window.
• Time bounded by slowest region server and region flush
HBaseCon 2013 6/13/201349
Online Snapshots and Causal consistency
• Causal consistency would only allow A, AB, or neither A nor B.
• B and Not A is currently possible
Table
F11 F21
R1 R2 R3
F31
TableSnapshot manifest
R1 R2 R3
Master RS1 RS2 Client
mem mem
Flush SS
F13
HBaseCon 2013 6/13/201350
Online Snapshots and Causal consistency
• Causal consistency would only allow A, AB, or neither A nor B.
• B and Not A is currently possible
Table
F11 F21
R1 R2 R3
F31
TableSnapshot manifest
R1 R2 R3
Master RS1 RS2 Client
mem mem
Put A …
… then
Put B
F13
mem
HBaseCon 2013 6/13/201351
Online Snapshots and Causal consistency
• Causal consistency would allow A, AB, or neither A nor B.
• B and Not A is possible with Flush Snapshots
Table
F11 F21
R1 R2 R3
F31
TableSnapshot manifest
R1 R2 R3
Master RS1 RS2 Client
mem
F23
F13
Flush SS
Put B is
in but
Put A is
not!
F33
HBaseCon 2013 6/13/201352
Online snapshot attempts can fail
• If involved RS’s fail, the snapshots attempt will fail.
• Needs a way to prevent other table metadata operations
• Table Metadata Locks (0.95+)
• Avoid many snapshot failures conflicts(Ex: Online schema, splits)
• Failed attempt will report errors -- user must retry.
• o.a.h.hbase.snapshot.HBaseSnapshotException
• o.a.h.hbase.snapshot.CorruptedSnapshotException
HBaseCon 2013 6/13/201353
Development Notes
How we collaborated, built, and tested
Table Snapshots Development
• Developed in a Branch off of trunk
merged and in 0.95 and trunk.
• Feature is too big to include as a single
patch
• Does not destabilize trunk
• Does not slow time-based release
trains
• Later Backported to 0.94.6.1
src
branch
Reintegrate
into trunk
sync
HBaseCon 2013 6/13/201355
System testing with Jenkins
• Concurrently load data while taking snapshots
• Inject compactions, Kill RS’s, Meta RS, Master
• Create snapshot clones of the snapshots
• Inject Compactions, Kill RS’s, META Rs, Master
HBaseCon 2013 6/13/201356
Future Work:
• Alternative semantics and implementations
• Log Roll Snapshot (HBASE-7291)
• Store logs and replay on restore
• Faster for snapshot, slower and more complicated for restore.
• Timestamp Snapshot (HBASE-6866)
• All updates before ts in snapshot, all after not in snapshot
• Longer pause before snapshot taken
• Globally-Consistent Snapshot (HBASE-6867)
• global write lock for all regions nodes until snapshot complete.
• Expensive
• Repair tools
• Manual repairs necessary (hbck does not support yet)
HBaseCon 2013 6/13/201357
Conclusions
Feature Summary by Version
Apache 0.92.x
Apache <0.94.6.1
Apache 0.94.6.1+ Apache 0.95.0
Apache 0.96.0
Copy Table Copy Table Copy Table
Import / Export Import / Export Import / Export
Offline snapshots Offline snapshots
Flush Online
Snapshot
Flush Online
Snapshot
Table Locks
HBaseCon 2013 6/13/201359
Key Contributors
• Jesse Yates (Salesforce.com)
• HFileArchiver, Offline Snapshot, first draft online
• Matteo Bertozzi (Cloudera)
• HFileLink, Restore, clone, Testing, 0.94 backport
• Jonathan Hsieh (Cloudera)
• Online Snapshots revamp, Testing, Branch Sheppard
• Ted Yu (HortonWorks)
• Reviews
• Enis Soztutar (HortonWorks)
• Table Locks on Snapshots
HBaseCon 2013 6/13/201360
Thanks! Questions?
Matteo Bertozzi
@th30z
matteo.bertozzi@cloudera.com
Jonathan Hsieh
@jmhsieh
jon@cloudera.com
Jesse Yates
@jesse_yates
jesse.k.yates@gmail.com
HBaseCon 2013 6/13/201361

Mais conteúdo relacionado

Mais procurados

Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxVinay Shukla
 
Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsDataWorks Summit/Hadoop Summit
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive DataWorks Summit
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars GeorgeJAX London
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseCloudera, Inc.
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowDataWorks Summit
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security ArchitectureOwen O'Malley
 
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Giuseppe Paterno'
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon
 
Millions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersMillions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersDataWorks Summit
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebookragho
 
Impala presentation
Impala presentationImpala presentation
Impala presentationtrihug
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Cloudera, Inc.
 

Mais procurados (20)

HBase Low Latency
HBase Low LatencyHBase Low Latency
HBase Low Latency
 
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DMUpgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM
 
Hadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache KnoxHadoop Security Today & Tomorrow with Apache Knox
Hadoop Security Today & Tomorrow with Apache Knox
 
What's New in Apache Hive
What's New in Apache HiveWhat's New in Apache Hive
What's New in Apache Hive
 
Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and Improvements
 
What's new in apache hive
What's new in apache hive What's new in apache hive
What's new in apache hive
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
HBase Advanced - Lars George
HBase Advanced - Lars GeorgeHBase Advanced - Lars George
HBase Advanced - Lars George
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
Hadoop Security Today and Tomorrow
Hadoop Security Today and TomorrowHadoop Security Today and Tomorrow
Hadoop Security Today and Tomorrow
 
Hadoop Security Architecture
Hadoop Security ArchitectureHadoop Security Architecture
Hadoop Security Architecture
 
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2
 
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBaseHBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
HBaseCon 2015: Taming GC Pauses for Large Java Heap in HBase
 
Millions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size MattersMillions of Regions in HBase: Size Matters
Millions of Regions in HBase: Size Matters
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Impala presentation
Impala presentationImpala presentation
Impala presentation
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 

Destaque

HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestCloudera, Inc.
 
HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseCloudera, Inc.
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache HiveHBaseCon
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the BasicsHBaseCon
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshotsenissoz
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsMichael Stack
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...Yahoo Developer Network
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
 
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storagehive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata StorageDataWorks Summit/Hadoop Summit
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Sematext Group, Inc.
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Hortonworks
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...Cloudera, Inc.
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBaseCon
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics Cloudera, Inc.
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.
 
HBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterCloudera, Inc.
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...Cloudera, Inc.
 

Destaque (20)

HBase Snapshots
HBase SnapshotsHBase Snapshots
HBase Snapshots
 
HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at Pinterest
 
HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBase
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache Hive
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
 
Mar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBaseMar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBase
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storagehive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
HBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart Meter
 
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
HBaseCon 2012 | Content Addressable Storages for Fun and Profit - Berk Demir,...
 

Semelhante a HBaseCon 2013: Apache HBase Table Snapshots

Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0enissoz
 
Meet HBase 2.0
Meet HBase 2.0Meet HBase 2.0
Meet HBase 2.0enissoz
 
[Altibase] 13 backup and recovery
[Altibase] 13 backup and recovery[Altibase] 13 backup and recovery
[Altibase] 13 backup and recoveryaltistory
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 ReleaseNick Dimiduk
 
HBase Backups
HBase BackupsHBase Backups
HBase BackupsHBaseCon
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBaseCon
 
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...Michael Stack
 
Hbase Backups: Backups in the Enterprise
Hbase Backups: Backups in the EnterpriseHbase Backups: Backups in the Enterprise
Hbase Backups: Backups in the EnterpriseSalesforce Engineering
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかToshihiro Suzuki
 
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxUKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxMarco Gralike
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...Michael Stack
 
Less17 flashback tb3
Less17 flashback tb3Less17 flashback tb3
Less17 flashback tb3Imran Ali
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkMichael Stack
 
Oracle 12c New Features_RMAN_slides
Oracle 12c New Features_RMAN_slidesOracle 12c New Features_RMAN_slides
Oracle 12c New Features_RMAN_slidesSaiful
 
Take your database source code and data under control
Take your database source code and data under controlTake your database source code and data under control
Take your database source code and data under controlMarcin Przepiórowski
 
RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)Gustavo Rene Antunez
 

Semelhante a HBaseCon 2013: Apache HBase Table Snapshots (20)

Meet Apache HBase - 2.0
Meet Apache HBase - 2.0Meet Apache HBase - 2.0
Meet Apache HBase - 2.0
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Meet HBase 2.0
Meet HBase 2.0Meet HBase 2.0
Meet HBase 2.0
 
[Altibase] 13 backup and recovery
[Altibase] 13 backup and recovery[Altibase] 13 backup and recovery
[Altibase] 13 backup and recovery
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
 
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
 
Les 12 fl_db
Les 12 fl_dbLes 12 fl_db
Les 12 fl_db
 
Hbase Backups: Backups in the Enterprise
Hbase Backups: Backups in the EnterpriseHbase Backups: Backups in the Enterprise
Hbase Backups: Backups in the Enterprise
 
Les 11 fl2
Les 11 fl2Les 11 fl2
Les 11 fl2
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
 
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxUKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
 
Less17 flashback tb3
Less17 flashback tb3Less17 flashback tb3
Less17 flashback tb3
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
 
Oracle 12c New Features_RMAN_slides
Oracle 12c New Features_RMAN_slidesOracle 12c New Features_RMAN_slides
Oracle 12c New Features_RMAN_slides
 
Take your database source code and data under control
Take your database source code and data under controlTake your database source code and data under control
Take your database source code and data under control
 
RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)
 

Mais de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mais de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 

Último (20)

Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

HBaseCon 2013: Apache HBase Table Snapshots

  • 1. Apache HBase Table Snapshots Matteo Bertozzi | Cloudera | Software Engineer / HBase Committer Jonathan Hsieh | Cloudera | Software Engineer / HBase Committer Jesse Yates | Salesforce.com | Software Engineer / HBase Committer June 13, 2013 HBaseCon 2013
  • 2. Outline • Intro and Use Cases • Usage Instructions • Internals • Snapshot Layout • Snapshot Restoration • Online Snapshots • Conclusion
  • 3. HBase Table Snapshots Snapshot is a collection of metadata required to reconstitute the data near a particular point in time HBaseCon 2013 6/13/20133
  • 4. HBase Table Snapshots • An inexpensive way to freeze state of a table • A mechanism that helps backup data to in the cluster or to a remote cluster • Recover from user error • Bootstrap Replication HBaseCon 2013 6/13/20134
  • 5. Old: HBase-Supported Batch Backups • Export / Dist CP / Import • 3 batch MR jobs • Several extra copies of data • High latency (hours) • Impacts existing low-latency workloads • Copy Table • 1 MR Job • Single copy of data • Incremental table copies • High Latency (hours) • Impacts existing workloads Export MR Job Import MR Job Dist CP MR Job Copy Table MR Job HBaseCon 2013 6/13/20135
  • 6. Upcoming: HDFS Snapshots (or DistCP backup) • Take an hdfs snapshot of all the files in the underlying HBase’s data directory. • Hfiles, hlogs, and other metadata. • Snapshots all tables in Hbase • Cannot Clone tables • “Restore As” • Targeted for Hadoop 2.1 / Hadoop 3.0 DistCP HLog Append Flush Compact Restart Recover HBaseCon 2013 6/13/20136
  • 7. New: HBase Snapshot-based Backups • Snapshot, then Export • 1 MR Job • Single copy of data • Little impact on low-latency workloads • Export is like distcp directly from hfds • No incremental snapshot copy HBaseCon 2013 Export Snapshot 6/13/20137
  • 8. Export • Like distcp for a snapshot manifest • Copy data files without going through HBase’s “front door” Export HBaseCon 2013 6/13/20138
  • 9. Recover from User Error • How do we recover from user error? Recovery Time time User Error: drop ‘table’ Service is restored, major data loss Service is down! Panic! Black magic! HBaseCon 2013 6/13/20139
  • 10. Recovering from User Mistakes: Table Snapshots • Snapshot the state of a table at a certain moment in time • Restore it or Clone it later, creating a new read write table • Export it to another cluster with minimal impact on HBase time User Error: drop ‘table’ Service restored, Minor data loss. Carry on. Periodic snapshot Service is down! Keep calm! restore Periodic snapshot HBaseCon 2013 6/13/201310
  • 11. Usage What an Admin needs to know
  • 12. Configuration • Simple hbase-site.xml configuration <property> <name>hbase.snapshot.enabled</name> <value>true</value> </property> • Enabled by default in 0.95+ • Requires user to enable in 0.94.6.1+. HBaseCon 2013 6/13/201312
  • 13. Usage: Shell Commands • snapshot ‘table’, ‘snapshot’ • Table can be offline or online • list_snapshot [<regex>] • clone_snapshot ‘snapshot’, ‘dsttable’ • restore_snapshot ‘snapshot’ • delete_snapshot ‘snapshot’ HBaseCon 2013 6/13/201313
  • 14. Usage: Web UI HBaseCon 2013 6/13/201314
  • 15. Usage: Web UI HBaseCon 2013 6/13/201315
  • 16. Export: Usage • Copy “MySnapshot” to a remote HDFS • $ hbase class org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs:///srv2:8082/hbase -mappers 16 • With permission change on the copy • $ hbase class org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs:///srv2:8082/hbase -chuser MyUser -chgroup MyGroup -chmod 700 -mappers 16 HBaseCon 2013 6/13/201316
  • 17. Debugging and Info • Dump a snapshot manifest • Writes to standard out • Usage • $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot • $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot -files HBaseCon 2013 6/13/201317
  • 18. Metrics • Histograms of operation completion • snapshotTime • cloneTime • restoreTime • Includes ‘extended’ metrics • Std deviation • Min/max HBaseCon 2013 6/13/201318
  • 20. Internals • HBase Table HDFS Layout • Snapshot HDFS layout • Offline Snapshots • Restore and Clone Snapshot • Online Snapshots HBaseCon 2013 6/13/201320
  • 21. Primer: HBase Table Layout in HDFS • HRegions map directly to a directory structure with table name, encoded region name, column family and hfiles. • In HDFS: /hbase/Table/<enc R1>/cf/<hfile f11> /hbase/Table/<enc R1>/cf/<hfile f12> /hbase/Table/<enc R2>/cf/<hfile f21> /hbase/Table/<enc R2>/cf/<hfile f22> /hbase/Table/<enc R3>/cf/<hfile f31> /hbase/Table/<enc R3>/cf/<hfile f32> Table F11 F21 F31 R1 R2 R3 6/13/2013HBaseCon 201321
  • 22. Table Snapshots in the File System • A Snapshot manifest contains references to files in the original table. ./.hbase-snapshots Table F11 F21 F31 R1 R2 R3 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201322
  • 23. Table Snapshots in the File System • A Snapshot manifest contains references to files in the original table. • Each snapshot is stored in the hbase/.hbase-snapshots dir. ./.hbase-snapshots Table F11 F21 F31 R1 R2 R3 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201323
  • 24. Offline Snapshots • Disable table, then create Snapshot Manifest • Created in temporary dir to guarantee snapshot creation atomicity • Includes • Snapshot Metadata • Table Metadata/Schema (.tableinfo) • References to original HFiles • Master-only file system operation HBaseCon 2013 6/13/201324
  • 25. HFile Life Cycle • Splits and Compactions remove hfiles • What happens to references to these files? ./.hbase-snapshots Table F11 F21 F31 R1 R2 R3 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201325
  • 26. HFile Life Cycle • Splits and Compactions remove hfiles • What happens to references to these files? ./.hbase-snapshots Table F11 F21 F31 R1 R2 R3 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201326
  • 27. HFile Life Cycle • Splits and Compactions remove hfiles • What happens to references to these files? ./.hbase-snapshots Table F11 F21 F31 +32 R1 R2 R3 TableSnapshot manifest R1 R2 R3 No more Hfile?? HBaseCon 2013 6/13/201327
  • 28. HFile Archiver ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 Table files F31 HBaseCon 2013 6/13/201328 • We archive old HFiles from compactions (HBASE-5547)
  • 29. HFile Archiver ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 F31 +32 Table files F31 HBaseCon 2013 6/13/201329 • We archive old HFiles from compactions (HBASE-5547) • Files stored in hbase/.archive
  • 30. HFile Archiver ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 F31 +32 Table files F31 • We archive old HFiles from compactions (HBASE-5547) • Files stored in hbase/.archive • HFileCleaner ensures HFiles’ data remains available HBaseCon 2013 6/13/201330
  • 32. Restore Operations • Restore table • Rollback table to specific state • Clone from snapshot (Restore As) • Create new read-write table from snapshot • There can be multiple replicas of a snapshot • Export snapshot • Send snapshot and all its data to another cluster HBaseCon 2013 6/13/201332
  • 33. Clone: Creating table from a Snapshot • Convert snapshot manifest info into a Table. ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 Table files Clone R1 R2 R3 F31 HBaseCon 2013 6/13/201333
  • 34. Clone: Creating table from a Snapshot • Convert snapshot manifest info into a Table. ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 Table files Clone R1 R2 R3 F31 HBaseCon 2013 6/13/201334
  • 35. Clone: Creating table from a Snapshot ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 F31 +32 Table files F31 Clone R1 R2 R3 • Convert snapshot manifest info into a Table. • HFileLinks (HBASE-6610) to mimic unix open file descriptor semantics HBaseCon 2013 6/13/201335
  • 36. Restore: Rollback to an old state • Rollback the existing table to snapshot state • Restores original schema if altered • Snapshots current table, just in case • Minimal overhead • Smarter delete table & clone snapshot • Handles creating/deleting regions • Restore META HBaseCon 2013 6/13/201336
  • 37. Restore illustrated ./.hbase-snapshots ./.archive TableSnapshot manifest R1 R2 R3 Table files F31 Table F11 F21 R1 R2 R3 F31 +32 F41 R4 • Rollback “Table” to the “TableSnapshot” state HBaseCon 2013 6/13/201337
  • 38. Restore illustrated ./.hbase-snapshots ./.archive TableSnapshot manifest R1 R2 R3 Table files F31 Table F11 F21 R1 R2 R3 F31 +32 F41 R4 • Region “R4” is not present in the snapshot • “R4” will be removed from “Table”, files moved to “.archive” HBaseCon 2013 6/13/201338
  • 39. Restore illustrated ./.hbase-snapshots ./.archive TableSnapshot manifest R1 R2 R3 Table files F31 Table F11 F21 R1 R2 R3 F31 +32 F41 • New files not present in the snapshots are moved to the archive HBaseCon 2013 6/13/201339
  • 40. Restore illustrated ./.hbase-snapshots ./.archive TableSnapshot manifest R1 R2 R3 Table files F31 Table F11 F21 R1 R2 R3 F41F3+ 32 • New files not present in the snapshots are moved to the archive • HFileLinks are created to point to old files. HBaseCon 2013 6/13/201340
  • 41. Restore failures • The table to restore is disabled • META and HDFS operations may fail (network issue, server down, …) • hbck can’t repair an incomplete restore... • Restore again! HBaseCon 2013 6/13/201341
  • 42. Export Snapshot • Copy a full snapshot to another cluster • All required HFiles, and Metadata • Lots of options • Fancy dist-cp • Must resolve HFileLinks • Faster than CopyTable or table export+import! • Minimal impact on running cluster HBaseCon 2013 6/13/201342
  • 44. Online snapshots • Take a snapshot without making the table unavailable • No need to disable the table • Continue accepting reads and writes from clients • Challenges • Coordinating Region Servers • Data is in memory • Consistency HBaseCon 2013 6/13/201344
  • 45. Offline vs Online Snapshots Offline Online mastermaster RS1 RS2 RS3 RS4 verify Snapshot region subprocedure Write manifest per region verify Write manifest per region 6/13/2013HBaseCon 201345
  • 46. Online Snapshots • Each Region can have data in memstore and hlog, not yet Hfile • Snapshot is missing in memory data! ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 Table files F31 mem mem mem HBaseCon 2013 6/13/201346
  • 47. Online Snapshots • Flush so that all in memory data written in an Hfile • Then add to snapshot manifest ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 Table files F31 F13 F23 F33 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201347
  • 48. Online Snapshots • Flush so that all in memory data in an Hfile • Then add to snapshot manifest ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 Table files F31 F13 F23 F33 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201348
  • 49. Consistency • Offline Snapshots • Fully consistent snapshot • Online Flush Snapshot • “CopyTable” level consistency with a much smaller window. • Time bounded by slowest region server and region flush HBaseCon 2013 6/13/201349
  • 50. Online Snapshots and Causal consistency • Causal consistency would only allow A, AB, or neither A nor B. • B and Not A is currently possible Table F11 F21 R1 R2 R3 F31 TableSnapshot manifest R1 R2 R3 Master RS1 RS2 Client mem mem Flush SS F13 HBaseCon 2013 6/13/201350
  • 51. Online Snapshots and Causal consistency • Causal consistency would only allow A, AB, or neither A nor B. • B and Not A is currently possible Table F11 F21 R1 R2 R3 F31 TableSnapshot manifest R1 R2 R3 Master RS1 RS2 Client mem mem Put A … … then Put B F13 mem HBaseCon 2013 6/13/201351
  • 52. Online Snapshots and Causal consistency • Causal consistency would allow A, AB, or neither A nor B. • B and Not A is possible with Flush Snapshots Table F11 F21 R1 R2 R3 F31 TableSnapshot manifest R1 R2 R3 Master RS1 RS2 Client mem F23 F13 Flush SS Put B is in but Put A is not! F33 HBaseCon 2013 6/13/201352
  • 53. Online snapshot attempts can fail • If involved RS’s fail, the snapshots attempt will fail. • Needs a way to prevent other table metadata operations • Table Metadata Locks (0.95+) • Avoid many snapshot failures conflicts(Ex: Online schema, splits) • Failed attempt will report errors -- user must retry. • o.a.h.hbase.snapshot.HBaseSnapshotException • o.a.h.hbase.snapshot.CorruptedSnapshotException HBaseCon 2013 6/13/201353
  • 54. Development Notes How we collaborated, built, and tested
  • 55. Table Snapshots Development • Developed in a Branch off of trunk merged and in 0.95 and trunk. • Feature is too big to include as a single patch • Does not destabilize trunk • Does not slow time-based release trains • Later Backported to 0.94.6.1 src branch Reintegrate into trunk sync HBaseCon 2013 6/13/201355
  • 56. System testing with Jenkins • Concurrently load data while taking snapshots • Inject compactions, Kill RS’s, Meta RS, Master • Create snapshot clones of the snapshots • Inject Compactions, Kill RS’s, META Rs, Master HBaseCon 2013 6/13/201356
  • 57. Future Work: • Alternative semantics and implementations • Log Roll Snapshot (HBASE-7291) • Store logs and replay on restore • Faster for snapshot, slower and more complicated for restore. • Timestamp Snapshot (HBASE-6866) • All updates before ts in snapshot, all after not in snapshot • Longer pause before snapshot taken • Globally-Consistent Snapshot (HBASE-6867) • global write lock for all regions nodes until snapshot complete. • Expensive • Repair tools • Manual repairs necessary (hbck does not support yet) HBaseCon 2013 6/13/201357
  • 59. Feature Summary by Version Apache 0.92.x Apache <0.94.6.1 Apache 0.94.6.1+ Apache 0.95.0 Apache 0.96.0 Copy Table Copy Table Copy Table Import / Export Import / Export Import / Export Offline snapshots Offline snapshots Flush Online Snapshot Flush Online Snapshot Table Locks HBaseCon 2013 6/13/201359
  • 60. Key Contributors • Jesse Yates (Salesforce.com) • HFileArchiver, Offline Snapshot, first draft online • Matteo Bertozzi (Cloudera) • HFileLink, Restore, clone, Testing, 0.94 backport • Jonathan Hsieh (Cloudera) • Online Snapshots revamp, Testing, Branch Sheppard • Ted Yu (HortonWorks) • Reviews • Enis Soztutar (HortonWorks) • Table Locks on Snapshots HBaseCon 2013 6/13/201360
  • 61. Thanks! Questions? Matteo Bertozzi @th30z matteo.bertozzi@cloudera.com Jonathan Hsieh @jmhsieh jon@cloudera.com Jesse Yates @jesse_yates jesse.k.yates@gmail.com HBaseCon 2013 6/13/201361

Notas do Editor

  1. Talk about everything! Don’t glaze over internals
  2. Why does it cause extra latency? What “crushes” the cluster?
  3. Causes move expensive backups. Have a bunch of ‘write optimized files’ – HLogs and have to convert them to ‘read optimized files’ – HFIles. This isn’t a cheap process.
  4. Don’t over sell! Just say what it is.
  5. Add a quick summary of what just talked about BEFORE handoff!!!