SlideShare uma empresa Scribd logo
1 de 17
Baixar para ler offline
A High Availability story for HDFS
AvatarNode
Dhruba Borthakur & Dmytro Molkov
dhruba@apache.org & dms@facebook.com
Presented at The Hadoop User Group Meeting,
Sept 29, 2010
How infrequently does the NameNode (NN) stop?
  Hadoop Software Bugs
–  Two directories in fs.name.dir, but when a write to first
directory failed, the NN ignored the second one (once)
–  Upgrade from 0.17 to 0.18 caused data corruption
(once)
  Configuration errors
–  Fsimage partition ran out of space (once)
–  Network Load Anomalies (about 10 times)
  Maintenance:
–  Deploy new patches (once every month)
What does the SecondaryNameNode do?
  Periodically merges Transaction logs
  Requires the same amount of memory as NN
  Why is it separate from NN?
–  Avoids fine-grain locking of NN data structures
–  Avoids implementing copy-on-write for NN data
structures
  Renamed as CheckpointNode (CN) in 0.21 release.
Shortcomings of the SecondaryNameNode?
  Does not have a copy of the latest transaction log
  Periodic and is not continuous
–  Configured to run every hour
  If the NN dies, the SecondaryNameNode does not take
over the responsibilities of the NN
BackupNode (BN)
  NN streams transaction log to BackupNode
  BackupNode applies log to in-memory and disk image
  BN always commit to disk before success to NN
  If BN restarts, it has to catch up with NN
  Available in HDFS 0.21 release
Limitations of BackupNode (BN)
  Maximum of one BackupNode per NN
–  Support only two-machine failure
  NN does not forward block reports to BackupNode
  Time to restart from 12 GB image, 70M files + 100 M
blocks
–  3 – 5 minutes to read the image from disk
–  20 min to process block reports
–  BN will still take 25+ minutes to failover!
Overlapping Clusters for HA
  “Always available for write” model
  Two logical clusters each with their own NN
  Each physical machine runs two instances of DataNode
  Two DataNode instances share the same physical storage
device
  Application has logic to failover writes from one HDFS
cluster to another
  More details at http://hadoopblog.blogspot.com/2009/06/hdfs-
scribe-integration.html
HDFS+Zookeeper
  HDFS can store transaction logs in Zookeeper/Bookeeper
–  http://issues.apache.org/jira/browse/HDFS-234
  Transaction log need not be stored in NFS filer
  A new NN will still have to process block reports
–  Not good for HA yet, because NN failover will take 30 minutes
Our use case for High Availability
  Failover should occur in less than a minute
  Failovers are needed only for new software upgrades
Challenges
  DataNodes send block location
information to only one
NameNode
  NameNode needs block locations
in memory to serve clients
  The in-memory metadata for 100
million files could be 60 GB,
huge!
DataNodes
Primary
NameNode
Client
Block location
message “yes, I
have blockid 123”
Client retrieves
block location from
NameNode
Introduction for AvatarNode
  Active-Standby Pair
–  Coordinated via zookeeper
–  Failover in few seconds
–  Wrapper over NameNode
  Active AvatarNode
–  Writes transaction log to filer
  Standby AvatarNode
–  Reads transactions from filer
–  Latest metadata in memory
http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html
NFS
Filer
Active
AvatarNode
(NameNode)
Client
Standby
AvatarNode
(NameNode)
Block
location
messages
Client retrieves
block location from
Primary or Standby
write
transaction
read
transaction
Block
location
messages
DataNodes
ZooKeeper integration for Clients
  DistributedAvatarFileSystem:
–  Connects to ZooKeeper to figure out who the Primary node is.
There is a znode in ZooKeeper that has the current address of the
primary. Clients read it on creation and during failover.
–  Is aware of the failover state and pauses until it is over. If the
znode is empty the cluster is failing over to the new Primary, just
wait for that to finish.
–  Handles failures in calls to the NameNode. If the call failed with
network exception – checks for the failover in progress and retries
the call after the new Primary is up.
Four steps to failover
  Wipe ZooKeeper entry. Clients will know the failover is in
progress. (0 seconds)
  Stop the primary namenode. Last bits of data will be
flushed to Transaction Log and it will die. (Seconds)
  Switch Standby to Primary. It will consume the rest of the
Transaction log and get out of safemode ready to serve
traffic. (Seconds)
  Update the entry in ZooKeeper. All the clients waiting for
failover will pick up the new connection. (0 seconds)
  After: Start the first node in the Standby Mode (Takes a
while, but the cluster is up and running)
Why add ZooKeeper to the mix
  Provides a clean way to execute failovers in the application
layer.
  A centralized control of all the clients. Gives us the ability
to Pause clients until the failover is done.
  A good stepping stone for future improvements needed to
perform automatic failover:
–  Nodes voting on who will be the primary
–  DataNodes knowing who has the authority to delete blocks
ZooKeeper is an option
  Can be implemented using IP failover based on the existing
infrastructure.
  The clients will not know if the failover is done and it is
safe to make the call again.
  IP failover works well in tightly coupled system (both nodes
in one rack) so the Single Point of Failure is still there (rack
switch)
  There is no need to run a dedicated ZooKeeper cluster.
It is not all about failover
  The Standby node has a lot of CPU that is not used
  The Standby node has a full (but delayed a bit) picture of
Blocks and the Namesystem.
  Send all reads that can deal with stale data to Standby.
  Have pluggable services run as a part of Standby node and
use the metadata of the filesystem directly from the
shared memory instead of querying the namenode all the
time.
  My Hadoop Blog:
–  http://hadoopblog.blogspot.com/
–  http://www.facebook.com/hadoopfs

Mais conteúdo relacionado

Mais procurados

Tier 2 net app baseline design standard revised nov 2011
Tier 2 net app baseline design standard   revised nov 2011Tier 2 net app baseline design standard   revised nov 2011
Tier 2 net app baseline design standard revised nov 2011Accenture
 
High Availability in 37 Easy Steps
High Availability in 37 Easy StepsHigh Availability in 37 Easy Steps
High Availability in 37 Easy StepsTim Serong
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)Kirill Tsym
 
Linux System Monitoring
Linux System Monitoring Linux System Monitoring
Linux System Monitoring PriyaTeli
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Brendan Gregg
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudBrendan Gregg
 
Understand and optimize Linux I/O
Understand and optimize Linux I/OUnderstand and optimize Linux I/O
Understand and optimize Linux I/OAndrea Righi
 
Think_your_Postgres_backups_and_recovery_are_safe_lets_talk.pptx
Think_your_Postgres_backups_and_recovery_are_safe_lets_talk.pptxThink_your_Postgres_backups_and_recovery_are_safe_lets_talk.pptx
Think_your_Postgres_backups_and_recovery_are_safe_lets_talk.pptxPayal Singh
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingDibyendu Bhattacharya
 
Interview questions
Interview questionsInterview questions
Interview questionsxavier john
 
LISA2010 visualizations
LISA2010 visualizationsLISA2010 visualizations
LISA2010 visualizationsBrendan Gregg
 
Lisa12 methodologies
Lisa12 methodologiesLisa12 methodologies
Lisa12 methodologiesBrendan Gregg
 
제3회난공불락 오픈소스 인프라세미나 - lustre
제3회난공불락 오픈소스 인프라세미나 - lustre제3회난공불락 오픈소스 인프라세미나 - lustre
제3회난공불락 오픈소스 인프라세미나 - lustreTommy Lee
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Amazon Web Services
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF SuperpowersBrendan Gregg
 
Testing real-time Linux. What to test and how
Testing real-time Linux. What to test and how Testing real-time Linux. What to test and how
Testing real-time Linux. What to test and how Chirag Jog
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to LinuxBrendan Gregg
 
Practice and challenges from building IaaS
Practice and challenges from building IaaSPractice and challenges from building IaaS
Practice and challenges from building IaaSShawn Zhu
 
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensOpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensMatthew Ahrens
 
Everything You Thought You Already Knew About Orchestration
Everything You Thought You Already Knew About OrchestrationEverything You Thought You Already Knew About Orchestration
Everything You Thought You Already Knew About OrchestrationLaura Frank Tacho
 

Mais procurados (20)

Tier 2 net app baseline design standard revised nov 2011
Tier 2 net app baseline design standard   revised nov 2011Tier 2 net app baseline design standard   revised nov 2011
Tier 2 net app baseline design standard revised nov 2011
 
High Availability in 37 Easy Steps
High Availability in 37 Easy StepsHigh Availability in 37 Easy Steps
High Availability in 37 Easy Steps
 
FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)FD.io Vector Packet Processing (VPP)
FD.io Vector Packet Processing (VPP)
 
Linux System Monitoring
Linux System Monitoring Linux System Monitoring
Linux System Monitoring
 
Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016Broken Linux Performance Tools 2016
Broken Linux Performance Tools 2016
 
Performance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloudPerformance Analysis: new tools and concepts from the cloud
Performance Analysis: new tools and concepts from the cloud
 
Understand and optimize Linux I/O
Understand and optimize Linux I/OUnderstand and optimize Linux I/O
Understand and optimize Linux I/O
 
Think_your_Postgres_backups_and_recovery_are_safe_lets_talk.pptx
Think_your_Postgres_backups_and_recovery_are_safe_lets_talk.pptxThink_your_Postgres_backups_and_recovery_are_safe_lets_talk.pptx
Think_your_Postgres_backups_and_recovery_are_safe_lets_talk.pptx
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
Interview questions
Interview questionsInterview questions
Interview questions
 
LISA2010 visualizations
LISA2010 visualizationsLISA2010 visualizations
LISA2010 visualizations
 
Lisa12 methodologies
Lisa12 methodologiesLisa12 methodologies
Lisa12 methodologies
 
제3회난공불락 오픈소스 인프라세미나 - lustre
제3회난공불락 오픈소스 인프라세미나 - lustre제3회난공불락 오픈소스 인프라세미나 - lustre
제3회난공불락 오픈소스 인프라세미나 - lustre
 
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
Your Linux AMI: Optimization and Performance (CPN302) | AWS re:Invent 2013
 
Linux BPF Superpowers
Linux BPF SuperpowersLinux BPF Superpowers
Linux BPF Superpowers
 
Testing real-time Linux. What to test and how
Testing real-time Linux. What to test and how Testing real-time Linux. What to test and how
Testing real-time Linux. What to test and how
 
From DTrace to Linux
From DTrace to LinuxFrom DTrace to Linux
From DTrace to Linux
 
Practice and challenges from building IaaS
Practice and challenges from building IaaSPractice and challenges from building IaaS
Practice and challenges from building IaaS
 
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt AhrensOpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
OpenZFS novel algorithms: snapshots, space allocation, RAID-Z - Matt Ahrens
 
Everything You Thought You Already Knew About Orchestration
Everything You Thought You Already Knew About OrchestrationEverything You Thought You Already Knew About Orchestration
Everything You Thought You Already Knew About Orchestration
 

Destaque

Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsHadoop User Group
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21Hadoop User Group
 
2 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-212 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-21Hadoop User Group
 
Twitter Protobufs And Hadoop Hug 021709
Twitter Protobufs And Hadoop   Hug 021709Twitter Protobufs And Hadoop   Hug 021709
Twitter Protobufs And Hadoop Hug 021709Hadoop User Group
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Hadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Hadoop User Group
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 

Destaque (14)

Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Cascalog internal dsl_preso
Cascalog internal dsl_presoCascalog internal dsl_preso
Cascalog internal dsl_preso
 
Hdfs high availability
Hdfs high availabilityHdfs high availability
Hdfs high availability
 
Karmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-toolsKarmasphere hadoop-productivity-tools
Karmasphere hadoop-productivity-tools
 
1 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-211 content optimization-hug-2010-07-21
1 content optimization-hug-2010-07-21
 
Mumak
MumakMumak
Mumak
 
File Context
File ContextFile Context
File Context
 
2 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-212 hadoop@e bay-hug-2010-07-21
2 hadoop@e bay-hug-2010-07-21
 
Twitter Protobufs And Hadoop Hug 021709
Twitter Protobufs And Hadoop   Hug 021709Twitter Protobufs And Hadoop   Hug 021709
Twitter Protobufs And Hadoop Hug 021709
 
Cloudera Desktop
Cloudera DesktopCloudera Desktop
Cloudera Desktop
 
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
Yahoo! Hadoop User Group - May 2010 Meetup - Apache Hadoop Release Plans for ...
 
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
Yahoo! Hadoop User Group - May Meetup - Extraordinarily rapid and robust data...
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 

Semelhante a Hdfs high availability

Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Mandakini Kumari
 
Zookeeper Introduce
Zookeeper IntroduceZookeeper Introduce
Zookeeper Introducejhao niu
 
Hw09 Production Deep Dive With High Availability
Hw09   Production Deep Dive With High AvailabilityHw09   Production Deep Dive With High Availability
Hw09 Production Deep Dive With High AvailabilityCloudera, Inc.
 
Apache Hadoop- Hadoop Basics.pptx
Apache Hadoop- Hadoop Basics.pptxApache Hadoop- Hadoop Basics.pptx
Apache Hadoop- Hadoop Basics.pptxMiraj Godha
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveUlf Wendel
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hivesrikanthhadoop
 
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster RedundancyPilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster RedundancyStuart Pook
 
LOAD BALANCING OF APPLICATIONS USING XEN HYPERVISOR
LOAD BALANCING OF APPLICATIONS  USING XEN HYPERVISORLOAD BALANCING OF APPLICATIONS  USING XEN HYPERVISOR
LOAD BALANCING OF APPLICATIONS USING XEN HYPERVISORVanika Kapoor
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Chris Nauroth
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualizationFranck Pachot
 
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...DataWorks Summit
 
WebCamp 2016: DevOps. Николай Дойков: Опыт создания клауда для потокового вид...
WebCamp 2016: DevOps. Николай Дойков: Опыт создания клауда для потокового вид...WebCamp 2016: DevOps. Николай Дойков: Опыт создания клауда для потокового вид...
WebCamp 2016: DevOps. Николай Дойков: Опыт создания клауда для потокового вид...WebCamp
 

Semelhante a Hdfs high availability (20)

.ppt
.ppt.ppt
.ppt
 
Hadoop
HadoopHadoop
Hadoop
 
NetApp against ransomware
NetApp against ransomwareNetApp against ransomware
NetApp against ransomware
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
 
Zookeeper Introduce
Zookeeper IntroduceZookeeper Introduce
Zookeeper Introduce
 
Hw09 Production Deep Dive With High Availability
Hw09   Production Deep Dive With High AvailabilityHw09   Production Deep Dive With High Availability
Hw09 Production Deep Dive With High Availability
 
Apache Hadoop- Hadoop Basics.pptx
Apache Hadoop- Hadoop Basics.pptxApache Hadoop- Hadoop Basics.pptx
Apache Hadoop- Hadoop Basics.pptx
 
Unit 1
Unit 1Unit 1
Unit 1
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspective
 
Network Testing ques
Network Testing quesNetwork Testing ques
Network Testing ques
 
Apache hadoop and hive
Apache hadoop and hiveApache hadoop and hive
Apache hadoop and hive
 
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster RedundancyPilot Hadoop Towards 2500 Nodes and Cluster Redundancy
Pilot Hadoop Towards 2500 Nodes and Cluster Redundancy
 
LOAD BALANCING OF APPLICATIONS USING XEN HYPERVISOR
LOAD BALANCING OF APPLICATIONS  USING XEN HYPERVISORLOAD BALANCING OF APPLICATIONS  USING XEN HYPERVISOR
LOAD BALANCING OF APPLICATIONS USING XEN HYPERVISOR
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Hadoop admin
Hadoop adminHadoop admin
Hadoop admin
 
Hdfs questions answers
Hdfs questions answersHdfs questions answers
Hdfs questions answers
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualization
 
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...
NonStop Hadoop - Applying the PaxosFamily of Protocols to make Critical Hadoo...
 
WebCamp 2016: DevOps. Николай Дойков: Опыт создания клауда для потокового вид...
WebCamp 2016: DevOps. Николай Дойков: Опыт создания клауда для потокового вид...WebCamp 2016: DevOps. Николай Дойков: Опыт создания клауда для потокового вид...
WebCamp 2016: DevOps. Николай Дойков: Опыт создания клауда для потокового вид...
 

Mais de Hadoop User Group

HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practicesHadoop User Group
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010Hadoop User Group
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduceHadoop User Group
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupHadoop User Group
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation HadoopHadoop User Group
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Hadoop Record Reader In Python
Hadoop Record Reader In PythonHadoop Record Reader In Python
Hadoop Record Reader In PythonHadoop User Group
 
Karmasphere Studio for Hadoop
Karmasphere Studio for HadoopKarmasphere Studio for Hadoop
Karmasphere Studio for HadoopHadoop User Group
 

Mais de Hadoop User Group (18)

Common crawlpresentation
Common crawlpresentationCommon crawlpresentation
Common crawlpresentation
 
Pig at Linkedin
Pig at LinkedinPig at Linkedin
Pig at Linkedin
 
HUG August 2010: Best practices
HUG August 2010: Best practicesHUG August 2010: Best practices
HUG August 2010: Best practices
 
3 avro hug-2010-07-21
3 avro hug-2010-07-213 avro hug-2010-07-21
3 avro hug-2010-07-21
 
1 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit20101 hadoop security_in_details_hadoop_summit2010
1 hadoop security_in_details_hadoop_summit2010
 
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReducePublic Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
Public Terabyte Dataset Project: Web crawling with Amazon Elastic MapReduce
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
Yahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user groupYahoo! Mail antispam - Bay area Hadoop user group
Yahoo! Mail antispam - Bay area Hadoop user group
 
Flightcaster Presentation Hadoop
Flightcaster  Presentation  HadoopFlightcaster  Presentation  Hadoop
Flightcaster Presentation Hadoop
 
Map Reduce Online
Map Reduce OnlineMap Reduce Online
Map Reduce Online
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Security Preview
Hadoop Security PreviewHadoop Security Preview
Hadoop Security Preview
 
Hadoop Release Plan Feb17
Hadoop Release Plan Feb17Hadoop Release Plan Feb17
Hadoop Release Plan Feb17
 
Ordered Record Collection
Ordered Record CollectionOrdered Record Collection
Ordered Record Collection
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Searching At Scale
Searching At ScaleSearching At Scale
Searching At Scale
 
Hadoop Record Reader In Python
Hadoop Record Reader In PythonHadoop Record Reader In Python
Hadoop Record Reader In Python
 
Karmasphere Studio for Hadoop
Karmasphere Studio for HadoopKarmasphere Studio for Hadoop
Karmasphere Studio for Hadoop
 

Último

How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfChris Hunter
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIShubhangi Sonawane
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docxPoojaSen20
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Shubhangi Sonawane
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 

Último (20)

How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
psychiatric nursing HISTORY COLLECTION .docx
psychiatric  nursing HISTORY  COLLECTION  .docxpsychiatric  nursing HISTORY  COLLECTION  .docx
psychiatric nursing HISTORY COLLECTION .docx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
Ecological Succession. ( ECOSYSTEM, B. Pharmacy, 1st Year, Sem-II, Environmen...
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 

Hdfs high availability

  • 1. A High Availability story for HDFS AvatarNode Dhruba Borthakur & Dmytro Molkov dhruba@apache.org & dms@facebook.com Presented at The Hadoop User Group Meeting, Sept 29, 2010
  • 2. How infrequently does the NameNode (NN) stop?   Hadoop Software Bugs –  Two directories in fs.name.dir, but when a write to first directory failed, the NN ignored the second one (once) –  Upgrade from 0.17 to 0.18 caused data corruption (once)   Configuration errors –  Fsimage partition ran out of space (once) –  Network Load Anomalies (about 10 times)   Maintenance: –  Deploy new patches (once every month)
  • 3. What does the SecondaryNameNode do?   Periodically merges Transaction logs   Requires the same amount of memory as NN   Why is it separate from NN? –  Avoids fine-grain locking of NN data structures –  Avoids implementing copy-on-write for NN data structures   Renamed as CheckpointNode (CN) in 0.21 release.
  • 4. Shortcomings of the SecondaryNameNode?   Does not have a copy of the latest transaction log   Periodic and is not continuous –  Configured to run every hour   If the NN dies, the SecondaryNameNode does not take over the responsibilities of the NN
  • 5. BackupNode (BN)   NN streams transaction log to BackupNode   BackupNode applies log to in-memory and disk image   BN always commit to disk before success to NN   If BN restarts, it has to catch up with NN   Available in HDFS 0.21 release
  • 6. Limitations of BackupNode (BN)   Maximum of one BackupNode per NN –  Support only two-machine failure   NN does not forward block reports to BackupNode   Time to restart from 12 GB image, 70M files + 100 M blocks –  3 – 5 minutes to read the image from disk –  20 min to process block reports –  BN will still take 25+ minutes to failover!
  • 7. Overlapping Clusters for HA   “Always available for write” model   Two logical clusters each with their own NN   Each physical machine runs two instances of DataNode   Two DataNode instances share the same physical storage device   Application has logic to failover writes from one HDFS cluster to another   More details at http://hadoopblog.blogspot.com/2009/06/hdfs- scribe-integration.html
  • 8. HDFS+Zookeeper   HDFS can store transaction logs in Zookeeper/Bookeeper –  http://issues.apache.org/jira/browse/HDFS-234   Transaction log need not be stored in NFS filer   A new NN will still have to process block reports –  Not good for HA yet, because NN failover will take 30 minutes
  • 9. Our use case for High Availability   Failover should occur in less than a minute   Failovers are needed only for new software upgrades
  • 10. Challenges   DataNodes send block location information to only one NameNode   NameNode needs block locations in memory to serve clients   The in-memory metadata for 100 million files could be 60 GB, huge! DataNodes Primary NameNode Client Block location message “yes, I have blockid 123” Client retrieves block location from NameNode
  • 11. Introduction for AvatarNode   Active-Standby Pair –  Coordinated via zookeeper –  Failover in few seconds –  Wrapper over NameNode   Active AvatarNode –  Writes transaction log to filer   Standby AvatarNode –  Reads transactions from filer –  Latest metadata in memory http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html NFS Filer Active AvatarNode (NameNode) Client Standby AvatarNode (NameNode) Block location messages Client retrieves block location from Primary or Standby write transaction read transaction Block location messages DataNodes
  • 12. ZooKeeper integration for Clients   DistributedAvatarFileSystem: –  Connects to ZooKeeper to figure out who the Primary node is. There is a znode in ZooKeeper that has the current address of the primary. Clients read it on creation and during failover. –  Is aware of the failover state and pauses until it is over. If the znode is empty the cluster is failing over to the new Primary, just wait for that to finish. –  Handles failures in calls to the NameNode. If the call failed with network exception – checks for the failover in progress and retries the call after the new Primary is up.
  • 13. Four steps to failover   Wipe ZooKeeper entry. Clients will know the failover is in progress. (0 seconds)   Stop the primary namenode. Last bits of data will be flushed to Transaction Log and it will die. (Seconds)   Switch Standby to Primary. It will consume the rest of the Transaction log and get out of safemode ready to serve traffic. (Seconds)   Update the entry in ZooKeeper. All the clients waiting for failover will pick up the new connection. (0 seconds)   After: Start the first node in the Standby Mode (Takes a while, but the cluster is up and running)
  • 14. Why add ZooKeeper to the mix   Provides a clean way to execute failovers in the application layer.   A centralized control of all the clients. Gives us the ability to Pause clients until the failover is done.   A good stepping stone for future improvements needed to perform automatic failover: –  Nodes voting on who will be the primary –  DataNodes knowing who has the authority to delete blocks
  • 15. ZooKeeper is an option   Can be implemented using IP failover based on the existing infrastructure.   The clients will not know if the failover is done and it is safe to make the call again.   IP failover works well in tightly coupled system (both nodes in one rack) so the Single Point of Failure is still there (rack switch)   There is no need to run a dedicated ZooKeeper cluster.
  • 16. It is not all about failover   The Standby node has a lot of CPU that is not used   The Standby node has a full (but delayed a bit) picture of Blocks and the Namesystem.   Send all reads that can deal with stale data to Standby.   Have pluggable services run as a part of Standby node and use the metadata of the filesystem directly from the shared memory instead of querying the namenode all the time.
  • 17.   My Hadoop Blog: –  http://hadoopblog.blogspot.com/ –  http://www.facebook.com/hadoopfs