SlideShare uma empresa Scribd logo
1 de 18
NameNode HA Suresh Srinivas - Hortonworks Aaron T. Myers - Cloudera
Overview Part 1 – Suresh Srinivas (Hortonworks) HDFS Availability and Data Integrity – what is the record? NN HA Design Part 2 – Aaron T. Myers (Cloudera) NN HA Design continued Client-NN Connection failover Operations and Admin of HA Future Work 2
Current HDFS Availability & Data Integrity Simple design, storage fault tolerance Storage: Rely in OS’s file system rather than use raw disk Storage Fault Tolerance: multiple replicas, active monitoring Single NameNode Master Persistent state:  multiple copies  + checkpoints Restart on failure How well did it work? Lost 19 out of 329 Million blocks on 10 clusters with 20K nodes in 2009  7-9’s of reliability Fixed in 20 and 21. 18 month Study: 22 failures on 25 clusters - 0.58 failures per year per cluster Only 8 would have benefitted from HA failover!! (0.23 failures per cluster year) NN is very robust and can take a lot of abuse NN is resilient against overload caused by misbehaving apps 3
 HA NameNode Active work has started on HA NameNode (Failover) HA NameNode Detailed design and sub tasks in HDFS-1623 HA: Related work Backup NN (0.21) Avatar NN (Facebook) HA NN prototype using Linux HA (Yahoo!) HA NN prototype with Backup NN and block report replicator (eBay) HA is the highest priority 4
Approach and Terminology Initial goal is Active-Standby With Federation each namespace volume has a NameNode Single active NN for any namespace volume Terminology Active NN –  actively serves the read/write operations from the clients Standby NN  - waits, becomes active when Active dies or is unhealthy Could serve read operations Standby’s State may be cold, warm or hot  Cold : Standby has zero state (e.g. started after the Active is declared dead. Warm: Standby has partial state: has loaded fsImage& editLogsbut has not received any block reports Hot Standby: Standby has all most of the Active’s state and start immediately 5
High Level Use Cases Supported failures Single hardware failure Double hardware failure not supported Some software failures Same software failure affects both active and standby 6 Planned downtime Upgrades Config changes Main reason for downtime Unplanned downtime Hardware failure Server unresponsive Software failures Occurs infrequently
Use Cases Deployment models Single NN configuration; no failover Active and Standby with manual failover Standby could be cold/warm/hot Addresses downtime during upgrades – main cause of unavailability Active and Standby with automatic failover Hot standby Addresses downtime during upgrades and other failures See HDFS-1623 for detailed use cases 7
Design Failover control outside NN Parallel Block reports to Active and Standby (Hot failover) Shared or non-shared NN state Fencing of shared resources/data Datanodes Shared NN state (if any) Client failover IP Failover Smart clients (e.g configuration, or ZooKeeper for coordination) 8
Failover Control Outside NN HA Daemon outside NameNode Daemon manages resources All resources modeled uniformly Resources – OS, HW, Network etc. NameNode is just another resource Heartbeat with other nodes Quorum based leader election Zookeeper for coordination and Quorum Fencing during split brain Prevents data corruption Quorum Service Heartbeat Leader Election HA Daemon Resources Resources Resources Actions start, stop,  failover, monitor, … Fencing/ STONITH Shared Resources
NN HA with Shared Storage and ZooKeeper ZK ZK ZK Heartbeat Heartbeat FailoverController Standby FailoverController Active Cmds Monitor Health  of NN. OS, HW Monitor Health  of NN. OS, HW NN Active NN Standby Shared NN state with single writer (fencing) Block Reports to Active & Standby DN fencing: Update cmds from one DN DN DN
HA Design Details 11
Client Failover Design Smart clients Users use one logical URI, client selects correct NN to connect to Implementing two options out of the box Client Knows of multiple NNs  Use a coordination service (ZooKeeper) Common things between these Which operations are idempotent, therefore safe to retry on a failover Failover/retry strategies Some differences Expected time for client failover Ease of administration 12
Ops/Admin: Shared Storage To share NN state, need shared storage Needs to be HA itself to avoid just shifting SPOF BookKeeper, etc will likely take care of this in the future Many come with IP fencing options Recommended mount options: tcp,soft,intr,timeo=60,retrans=10 Not all edits directories are created equal Used to be all edits dirs were just a pool of redundant dirs Can now configure some edits directories to be required Can now configure number of tolerated failures You want at least 2 for durability, 1 remote for HA 13
Ops/Admin: NN fencing Client failover does not solve this problem Out of the box RPC to active NN to tell it to go to standby (graceful failover) SSH to active NN and `kill -9’ NN Pluggable options Many filers have protocols for IP-based fencing options Many PDUs have protocols for IP-based plug-pulling (STONITH) Nuke the node from orbit. It’s the only way to be sure. Configure extra options if available to you Will be tried in order during a failover event Escalate the aggressiveness of the method Fencing is critical for correctness of NN metadata 14
Ops/Admin: Monitoring New NN metrics Size of pending DN message queues Seconds since the standby NN last read from shared edit log DN block report lag All measurements of standby NN lag – monitor/alert on all of these Monitor shared storage solution Volumes fill up, disks go bad, etc Should configure paranoid edit log retention policy (default is 2) Canary-based monitoring of HDFS a good idea Pinging both NNs not sufficient 15
Ops/Admin: Hardware Active/Standby NNs should be on separate racks Shared storage system should be on separate rack Active/Standby NNs should have close to the same hardware Same amount of RAM – need to store the same things Same # of processors - need to serve same number of clients All the same recommendations still apply for NN ECC memory, 48GB Several separate disks for NN metadata directories Redundant disks for OS drives, probably RAID 5 or mirroring Redundant power 16
Future Work Other options to share NN metadata BookKeeper Multiple, potentially non-HA filers Entirely different metadata system More advanced client failover/load shedding Serve stale reads from the standby NN Speculative RPC Non-RPC clients (IP failover, DNS failover, proxy, etc.) Even Higher HA Multiple standby NNs 17
QA Detailed design (HDFS-1623) Community effort HDFS-1971, 1972, 1973, 1974, 1975, 2005, 2064, 1073 18

Mais conteúdo relacionado

Mais procurados

Tsm7.1 seminar Stavanger
Tsm7.1 seminar StavangerTsm7.1 seminar Stavanger
Tsm7.1 seminar StavangerSolv AS
 
Presentation on backup and recoveryyyyyyyyyyyyy
Presentation on backup and recoveryyyyyyyyyyyyyPresentation on backup and recoveryyyyyyyyyyyyy
Presentation on backup and recoveryyyyyyyyyyyyyTehmina Gulfam
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.finalHortonworks
 
AITP July 2012 Presentation - Disaster Recovery - Business + Technology
AITP July 2012 Presentation - Disaster Recovery - Business + TechnologyAITP July 2012 Presentation - Disaster Recovery - Business + Technology
AITP July 2012 Presentation - Disaster Recovery - Business + TechnologyAndrew Miller
 
SVC / Storwize: cache partition analysis (BVQ howto)
SVC / Storwize: cache partition analysis  (BVQ howto)   SVC / Storwize: cache partition analysis  (BVQ howto)
SVC / Storwize: cache partition analysis (BVQ howto) Michael Pirker
 
Database backup & recovery
Database backup & recoveryDatabase backup & recovery
Database backup & recoveryMustafa Khan
 
Disaster Recovery & Data Backup Strategies
Disaster Recovery & Data Backup StrategiesDisaster Recovery & Data Backup Strategies
Disaster Recovery & Data Backup StrategiesSpiceworks
 
Designing large scale distributed systems
Designing large scale distributed systemsDesigning large scale distributed systems
Designing large scale distributed systemsAshwani Priyedarshi
 
Backup and recovery
Backup and recoveryBackup and recovery
Backup and recoverydhawal mehta
 
Backup & restore in windows
Backup & restore in windowsBackup & restore in windows
Backup & restore in windowsJab Vtl
 
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...Michael Pirker
 
Flashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drivesFlashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drivesPratik Bhat
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rulesOleg Tsal-Tsalko
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery PlanCreating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery PlanRishu Mehra
 
EMC Data Domain Retention Lock Software: Detailed Review
EMC Data Domain Retention Lock Software: Detailed ReviewEMC Data Domain Retention Lock Software: Detailed Review
EMC Data Domain Retention Lock Software: Detailed ReviewEMC
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureCloudera, Inc.
 

Mais procurados (20)

Tsm7.1 seminar Stavanger
Tsm7.1 seminar StavangerTsm7.1 seminar Stavanger
Tsm7.1 seminar Stavanger
 
Presentation on backup and recoveryyyyyyyyyyyyy
Presentation on backup and recoveryyyyyyyyyyyyyPresentation on backup and recoveryyyyyyyyyyyyy
Presentation on backup and recoveryyyyyyyyyyyyy
 
Nn ha hadoop world.final
Nn ha hadoop world.finalNn ha hadoop world.final
Nn ha hadoop world.final
 
Backup
BackupBackup
Backup
 
Hadoop availability
Hadoop availabilityHadoop availability
Hadoop availability
 
AITP July 2012 Presentation - Disaster Recovery - Business + Technology
AITP July 2012 Presentation - Disaster Recovery - Business + TechnologyAITP July 2012 Presentation - Disaster Recovery - Business + Technology
AITP July 2012 Presentation - Disaster Recovery - Business + Technology
 
SVC / Storwize: cache partition analysis (BVQ howto)
SVC / Storwize: cache partition analysis  (BVQ howto)   SVC / Storwize: cache partition analysis  (BVQ howto)
SVC / Storwize: cache partition analysis (BVQ howto)
 
Fb Sales Enbl 1 4
Fb Sales Enbl 1 4Fb Sales Enbl 1 4
Fb Sales Enbl 1 4
 
Database backup & recovery
Database backup & recoveryDatabase backup & recovery
Database backup & recovery
 
Backup Exec 21
Backup Exec 21Backup Exec 21
Backup Exec 21
 
Disaster Recovery & Data Backup Strategies
Disaster Recovery & Data Backup StrategiesDisaster Recovery & Data Backup Strategies
Disaster Recovery & Data Backup Strategies
 
Designing large scale distributed systems
Designing large scale distributed systemsDesigning large scale distributed systems
Designing large scale distributed systems
 
Backup and recovery
Backup and recoveryBackup and recovery
Backup and recovery
 
Backup & restore in windows
Backup & restore in windowsBackup & restore in windows
Backup & restore in windows
 
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...
Analyze a SVC, STORWIZE metro/ global mirror performance problem-v58-20150818...
 
Flashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drivesFlashy prefetching for high performance flash drives
Flashy prefetching for high performance flash drives
 
Distributed systems and scalability rules
Distributed systems and scalability rulesDistributed systems and scalability rules
Distributed systems and scalability rules
 
Creating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery PlanCreating And Implementing A Data Disaster Recovery Plan
Creating And Implementing A Data Disaster Recovery Plan
 
EMC Data Domain Retention Lock Software: Detailed Review
EMC Data Domain Retention Lock Software: Detailed ReviewEMC Data Domain Retention Lock Software: Detailed Review
EMC Data Domain Retention Lock Software: Detailed Review
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and Future
 

Destaque

Sociales esquemas
Sociales esquemasSociales esquemas
Sociales esquemasalbertito57
 
Playing with the moon
Playing with the moonPlaying with the moon
Playing with the moonLion Roars
 
La educación virtual en el periodismo - Tarea foro 6
La educación virtual en el periodismo - Tarea foro 6La educación virtual en el periodismo - Tarea foro 6
La educación virtual en el periodismo - Tarea foro 6Electrosur
 
Kongregate - Maximizing Player Retention and Monetization in Free-to-Play Gam...
Kongregate - Maximizing Player Retention and Monetization in Free-to-Play Gam...Kongregate - Maximizing Player Retention and Monetization in Free-to-Play Gam...
Kongregate - Maximizing Player Retention and Monetization in Free-to-Play Gam...David Piao Chiu
 
The age of orchestration: from Docker basics to cluster management
The age of orchestration: from Docker basics to cluster managementThe age of orchestration: from Docker basics to cluster management
The age of orchestration: from Docker basics to cluster managementNicola Paolucci
 
Progression Search
Progression SearchProgression Search
Progression SearchJacelyn Tan
 
Career page photo slide show
Career page photo slide showCareer page photo slide show
Career page photo slide showAlissa Turpin
 
Características de los videojuegos
Características de los videojuegosCaracterísticas de los videojuegos
Características de los videojuegostraderotate1
 
Introction to docker swarm
Introction to docker swarmIntroction to docker swarm
Introction to docker swarmHsi-Kai Wang
 
Esurance Careers Slideshow
Esurance Careers SlideshowEsurance Careers Slideshow
Esurance Careers SlideshowEsurance
 
Randomforestで高次元の変数重要度を見る #japanr LT
 Randomforestで高次元の変数重要度を見る #japanr LT Randomforestで高次元の変数重要度を見る #japanr LT
Randomforestで高次元の変数重要度を見る #japanr LTAkifumi Eguchi
 
Linkedin us regional page intro slides edited_160902_final
Linkedin us regional page intro slides edited_160902_finalLinkedin us regional page intro slides edited_160902_final
Linkedin us regional page intro slides edited_160902_finalPaul Hussey
 
サイボウズの開発を支えるKAIZEN文化
サイボウズの開発を支えるKAIZEN文化サイボウズの開発を支えるKAIZEN文化
サイボウズの開発を支えるKAIZEN文化Teppei Sato
 
Launch Festival 2016 - Push Notifications -- You're Doing it Wrong
Launch Festival 2016 - Push Notifications -- You're Doing it WrongLaunch Festival 2016 - Push Notifications -- You're Doing it Wrong
Launch Festival 2016 - Push Notifications -- You're Doing it WrongRichard Sgro
 

Destaque (18)

Sociales esquemas
Sociales esquemasSociales esquemas
Sociales esquemas
 
Playing with the moon
Playing with the moonPlaying with the moon
Playing with the moon
 
La educación virtual en el periodismo - Tarea foro 6
La educación virtual en el periodismo - Tarea foro 6La educación virtual en el periodismo - Tarea foro 6
La educación virtual en el periodismo - Tarea foro 6
 
Kongregate - Maximizing Player Retention and Monetization in Free-to-Play Gam...
Kongregate - Maximizing Player Retention and Monetization in Free-to-Play Gam...Kongregate - Maximizing Player Retention and Monetization in Free-to-Play Gam...
Kongregate - Maximizing Player Retention and Monetization in Free-to-Play Gam...
 
Bascules
BasculesBascules
Bascules
 
Meet the Outwarians
Meet the OutwariansMeet the Outwarians
Meet the Outwarians
 
The age of orchestration: from Docker basics to cluster management
The age of orchestration: from Docker basics to cluster managementThe age of orchestration: from Docker basics to cluster management
The age of orchestration: from Docker basics to cluster management
 
Progression Search
Progression SearchProgression Search
Progression Search
 
Career page photo slide show
Career page photo slide showCareer page photo slide show
Career page photo slide show
 
Características de los videojuegos
Características de los videojuegosCaracterísticas de los videojuegos
Características de los videojuegos
 
Minneapolis VAST/HQ
Minneapolis VAST/HQMinneapolis VAST/HQ
Minneapolis VAST/HQ
 
Introction to docker swarm
Introction to docker swarmIntroction to docker swarm
Introction to docker swarm
 
Esurance Careers Slideshow
Esurance Careers SlideshowEsurance Careers Slideshow
Esurance Careers Slideshow
 
Randomforestで高次元の変数重要度を見る #japanr LT
 Randomforestで高次元の変数重要度を見る #japanr LT Randomforestで高次元の変数重要度を見る #japanr LT
Randomforestで高次元の変数重要度を見る #japanr LT
 
Linkedin us regional page intro slides edited_160902_final
Linkedin us regional page intro slides edited_160902_finalLinkedin us regional page intro slides edited_160902_final
Linkedin us regional page intro slides edited_160902_final
 
サイボウズの開発を支えるKAIZEN文化
サイボウズの開発を支えるKAIZEN文化サイボウズの開発を支えるKAIZEN文化
サイボウズの開発を支えるKAIZEN文化
 
Aioug big data and hadoop
Aioug  big data and hadoopAioug  big data and hadoop
Aioug big data and hadoop
 
Launch Festival 2016 - Push Notifications -- You're Doing it Wrong
Launch Festival 2016 - Push Notifications -- You're Doing it WrongLaunch Festival 2016 - Push Notifications -- You're Doing it Wrong
Launch Festival 2016 - Push Notifications -- You're Doing it Wrong
 

Semelhante a Hadoop World 2011: HDFS Name Node High Availablity - Aaron Myers, Cloudera & Sanjay Radia, Hortonworks

HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High AvailabilityHortonworks
 
Hadoop Summit 2012 | HDFS High Availability
Hadoop Summit 2012 | HDFS High AvailabilityHadoop Summit 2012 | HDFS High Availability
Hadoop Summit 2012 | HDFS High AvailabilityCloudera, Inc.
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateDataWorks Summit
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecasesudhakara st
 
2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.ppt2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.pptnadirpervez2
 
Apache ignite as in-memory computing platform
Apache ignite as in-memory computing platformApache ignite as in-memory computing platform
Apache ignite as in-memory computing platformSurinder Mehra
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Chris Nauroth
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldDataWorks Summit
 
Domino server and application performance in the real world
Domino server and application performance in the real worldDomino server and application performance in the real world
Domino server and application performance in the real worlddominion
 
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringHadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringErik Krogen
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Chris Nauroth
 
Management and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - SlidesManagement and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - SlidesSeveralnines
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and FutureDataWorks Summit
 
Oracle presentations RAC dataguard active database
Oracle presentations RAC dataguard active databaseOracle presentations RAC dataguard active database
Oracle presentations RAC dataguard active databasemabessisindu
 

Semelhante a Hadoop World 2011: HDFS Name Node High Availablity - Aaron Myers, Cloudera & Sanjay Radia, Hortonworks (20)

HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High Availability
 
Hadoop Summit 2012 | HDFS High Availability
Hadoop Summit 2012 | HDFS High AvailabilityHadoop Summit 2012 | HDFS High Availability
Hadoop Summit 2012 | HDFS High Availability
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Sinfonia
Sinfonia Sinfonia
Sinfonia
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.ppt2007-05-23 Cecchet_PGCon2007.ppt
2007-05-23 Cecchet_PGCon2007.ppt
 
Apache ignite as in-memory computing platform
Apache ignite as in-memory computing platformApache ignite as in-memory computing platform
Apache ignite as in-memory computing platform
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5Hadoop operations-2015-hadoop-summit-san-jose-v5
Hadoop operations-2015-hadoop-summit-san-jose-v5
 
Hadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the FieldHadoop Operations - Best Practices from the Field
Hadoop Operations - Best Practices from the Field
 
Domino server and application performance in the real world
Domino server and application performance in the real worldDomino server and application performance in the real world
Domino server and application performance in the real world
 
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage TieringHadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
Hadoop Meetup Jan 2019 - Router-Based Federation and Storage Tiering
 
Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5Hadoop operations-2014-strata-new-york-v5
Hadoop operations-2014-strata-new-york-v5
 
Management and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - SlidesManagement and Automation of MongoDB Clusters - Slides
Management and Automation of MongoDB Clusters - Slides
 
HDFS- What is New and Future
HDFS- What is New and FutureHDFS- What is New and Future
HDFS- What is New and Future
 
dgintro (1).ppt
dgintro (1).pptdgintro (1).ppt
dgintro (1).ppt
 
Oracle presentations RAC dataguard active database
Oracle presentations RAC dataguard active databaseOracle presentations RAC dataguard active database
Oracle presentations RAC dataguard active database
 
Kudu austin oct 2015.pptx
Kudu austin oct 2015.pptxKudu austin oct 2015.pptx
Kudu austin oct 2015.pptx
 
Hadoop 3 in a Nutshell
Hadoop 3 in a NutshellHadoop 3 in a Nutshell
Hadoop 3 in a Nutshell
 
NetApp against ransomware
NetApp against ransomwareNetApp against ransomware
NetApp against ransomware
 

Mais de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mais de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 

Último (20)

Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 

Hadoop World 2011: HDFS Name Node High Availablity - Aaron Myers, Cloudera & Sanjay Radia, Hortonworks

  • 1. NameNode HA Suresh Srinivas - Hortonworks Aaron T. Myers - Cloudera
  • 2. Overview Part 1 – Suresh Srinivas (Hortonworks) HDFS Availability and Data Integrity – what is the record? NN HA Design Part 2 – Aaron T. Myers (Cloudera) NN HA Design continued Client-NN Connection failover Operations and Admin of HA Future Work 2
  • 3. Current HDFS Availability & Data Integrity Simple design, storage fault tolerance Storage: Rely in OS’s file system rather than use raw disk Storage Fault Tolerance: multiple replicas, active monitoring Single NameNode Master Persistent state: multiple copies + checkpoints Restart on failure How well did it work? Lost 19 out of 329 Million blocks on 10 clusters with 20K nodes in 2009 7-9’s of reliability Fixed in 20 and 21. 18 month Study: 22 failures on 25 clusters - 0.58 failures per year per cluster Only 8 would have benefitted from HA failover!! (0.23 failures per cluster year) NN is very robust and can take a lot of abuse NN is resilient against overload caused by misbehaving apps 3
  • 4. HA NameNode Active work has started on HA NameNode (Failover) HA NameNode Detailed design and sub tasks in HDFS-1623 HA: Related work Backup NN (0.21) Avatar NN (Facebook) HA NN prototype using Linux HA (Yahoo!) HA NN prototype with Backup NN and block report replicator (eBay) HA is the highest priority 4
  • 5. Approach and Terminology Initial goal is Active-Standby With Federation each namespace volume has a NameNode Single active NN for any namespace volume Terminology Active NN – actively serves the read/write operations from the clients Standby NN - waits, becomes active when Active dies or is unhealthy Could serve read operations Standby’s State may be cold, warm or hot Cold : Standby has zero state (e.g. started after the Active is declared dead. Warm: Standby has partial state: has loaded fsImage& editLogsbut has not received any block reports Hot Standby: Standby has all most of the Active’s state and start immediately 5
  • 6. High Level Use Cases Supported failures Single hardware failure Double hardware failure not supported Some software failures Same software failure affects both active and standby 6 Planned downtime Upgrades Config changes Main reason for downtime Unplanned downtime Hardware failure Server unresponsive Software failures Occurs infrequently
  • 7. Use Cases Deployment models Single NN configuration; no failover Active and Standby with manual failover Standby could be cold/warm/hot Addresses downtime during upgrades – main cause of unavailability Active and Standby with automatic failover Hot standby Addresses downtime during upgrades and other failures See HDFS-1623 for detailed use cases 7
  • 8. Design Failover control outside NN Parallel Block reports to Active and Standby (Hot failover) Shared or non-shared NN state Fencing of shared resources/data Datanodes Shared NN state (if any) Client failover IP Failover Smart clients (e.g configuration, or ZooKeeper for coordination) 8
  • 9. Failover Control Outside NN HA Daemon outside NameNode Daemon manages resources All resources modeled uniformly Resources – OS, HW, Network etc. NameNode is just another resource Heartbeat with other nodes Quorum based leader election Zookeeper for coordination and Quorum Fencing during split brain Prevents data corruption Quorum Service Heartbeat Leader Election HA Daemon Resources Resources Resources Actions start, stop, failover, monitor, … Fencing/ STONITH Shared Resources
  • 10. NN HA with Shared Storage and ZooKeeper ZK ZK ZK Heartbeat Heartbeat FailoverController Standby FailoverController Active Cmds Monitor Health of NN. OS, HW Monitor Health of NN. OS, HW NN Active NN Standby Shared NN state with single writer (fencing) Block Reports to Active & Standby DN fencing: Update cmds from one DN DN DN
  • 12. Client Failover Design Smart clients Users use one logical URI, client selects correct NN to connect to Implementing two options out of the box Client Knows of multiple NNs Use a coordination service (ZooKeeper) Common things between these Which operations are idempotent, therefore safe to retry on a failover Failover/retry strategies Some differences Expected time for client failover Ease of administration 12
  • 13. Ops/Admin: Shared Storage To share NN state, need shared storage Needs to be HA itself to avoid just shifting SPOF BookKeeper, etc will likely take care of this in the future Many come with IP fencing options Recommended mount options: tcp,soft,intr,timeo=60,retrans=10 Not all edits directories are created equal Used to be all edits dirs were just a pool of redundant dirs Can now configure some edits directories to be required Can now configure number of tolerated failures You want at least 2 for durability, 1 remote for HA 13
  • 14. Ops/Admin: NN fencing Client failover does not solve this problem Out of the box RPC to active NN to tell it to go to standby (graceful failover) SSH to active NN and `kill -9’ NN Pluggable options Many filers have protocols for IP-based fencing options Many PDUs have protocols for IP-based plug-pulling (STONITH) Nuke the node from orbit. It’s the only way to be sure. Configure extra options if available to you Will be tried in order during a failover event Escalate the aggressiveness of the method Fencing is critical for correctness of NN metadata 14
  • 15. Ops/Admin: Monitoring New NN metrics Size of pending DN message queues Seconds since the standby NN last read from shared edit log DN block report lag All measurements of standby NN lag – monitor/alert on all of these Monitor shared storage solution Volumes fill up, disks go bad, etc Should configure paranoid edit log retention policy (default is 2) Canary-based monitoring of HDFS a good idea Pinging both NNs not sufficient 15
  • 16. Ops/Admin: Hardware Active/Standby NNs should be on separate racks Shared storage system should be on separate rack Active/Standby NNs should have close to the same hardware Same amount of RAM – need to store the same things Same # of processors - need to serve same number of clients All the same recommendations still apply for NN ECC memory, 48GB Several separate disks for NN metadata directories Redundant disks for OS drives, probably RAID 5 or mirroring Redundant power 16
  • 17. Future Work Other options to share NN metadata BookKeeper Multiple, potentially non-HA filers Entirely different metadata system More advanced client failover/load shedding Serve stale reads from the standby NN Speculative RPC Non-RPC clients (IP failover, DNS failover, proxy, etc.) Even Higher HA Multiple standby NNs 17
  • 18. QA Detailed design (HDFS-1623) Community effort HDFS-1971, 1972, 1973, 1974, 1975, 2005, 2064, 1073 18

Notas do Editor

  1. Data – can I read what I wrote, is the service availableWhen I asked one of the original authors of of GFS if there were any decisions they would revist – random writersSimplicity is keyRaw disk – fs take time to stabilize – we can take advantage of ext4, xfs or zfs