Hadoop Cluster HA: Experience Sharing
Etu 韓祖棻
jerryhan@etusolution.com
Who am I
韓祖棻 (Jerry), Technical Manager at Etu
• Database Management
• Windows/Linux Application Developer
• Web Developer
• Developer of Etu
jerryhan@etusolution.com

Agenda
• Background
• Facebook Namenode High Availability
• Hadoop 1.0 Namenode High Availability
• Hortonworks High Availability
• Cloudera High Availability
• Etu Appliance High Availability
• Conclusion

Background

The Hadoop Ecosystem
[Diagram: the Hadoop ecosystem stack. Components shown: ETL Tools, BI Reporting, RDBMS, Zookeeper, Pig, HiveQL, Mahout, MapReduce, Hive Meta Store, HBase, Avro (Serialization), and HDFS (Hadoop Distributed File System). Two groups are labeled Data Processing Layer and Data Store.]

HDFS Architecture (Master/Slave)
• An HDFS cluster consists of a single Namenode.
• The Namenode was a single point of failure (SPOF) in an HDFS cluster.
[Diagram: clients send metadata ops to the Namenode, which keeps the metadata (Name, replicas.., e.g. /var/disk/data, 1..). Clients read from and write to blocks on Datanodes in Rack1 and Rack2. The Namenode sends block ops to the Datanodes, and blocks are replicated between Datanodes.]

Facebook Namenode High Availability

AvatarNode

Hadoop 1.0 Namenode High Availability

Backup Namenode Approach

• Use case 3f:
– Active running, Standby down for maintenance. Active dies and cannot start. Standby is started and takes over as active.

Hortonworks High Availability

HDP's Full-Stack HA Architecture

HA for HDFS NameNode Using VMware

Do not use the NameNode VM for running any other master daemon.

HA for Hadoop Using RHEL (v5.x, v6.x)

Cloudera High Availability

Shared Storage Using NFS (After CDH 4.0)
[Diagram: three Zookeeper (ZK) nodes exchange heartbeats with two FailoverControllers, one Active and one Standby. Each FailoverController monitors the health of its NN, OS, and HW. The Active NN and the Standby NN share NN state on shared storage with a single writer (fencing); the shared storage is marked as a SPOF. Three DNs are shown.]
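As a rough illustration of what this architecture implies in configuration terms, the following Python sketch assembles the HA-related properties and renders them in Hadoop's XML layout. The property names are those documented for Apache Hadoop 2.x HDFS HA; the exact names in a given CDH4 release may differ. The nameservice ID, host names, port, and NFS mount path are hypothetical placeholders, not values from the talk.

```python
from xml.sax.saxutils import escape


def nfs_ha_properties(nameservice, namenodes, shared_edits_dir, zk_quorum):
    """Build hdfs-site and core-site properties for NFS-based NameNode HA.

    namenodes maps a NameNode ID to its host name (hypothetical values).
    """
    hdfs_site = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Directory on the shared NFS mount where the active NN writes edits.
        "dfs.namenode.shared.edits.dir": shared_edits_dir,
        # Fencing keeps the shared edits directory single-writer.
        "dfs.ha.fencing.methods": "sshfence",
        # Lets the Zookeeper-based FailoverController trigger failover.
        "dfs.ha.automatic-failover.enabled": "true",
    }
    for nn_id, host in namenodes.items():
        key = f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"
        hdfs_site[key] = f"{host}:8020"  # port is an assumed example value
    core_site = {"ha.zookeeper.quorum": zk_quorum}
    return hdfs_site, core_site


def to_hadoop_xml(properties):
    """Render a property dict in the <configuration> XML layout Hadoop reads."""
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


hdfs_site, core_site = nfs_ha_properties(
    nameservice="mycluster",
    namenodes={"nn1": "master1.example.com", "nn2": "master2.example.com"},
    shared_edits_dir="file:///mnt/nfs/namenode-edits",
    zk_quorum="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181",
)
print(to_hadoop_xml(hdfs_site))
print(to_hadoop_xml(core_site))
```

The sketch leaves out client-side failover settings and the SSH key configuration that `sshfence` needs, so treat it as a starting point rather than a complete configuration.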
Quorum-based Storage (After CDH 4.1)
[Diagram: three Zookeeper (ZK) nodes exchange heartbeats with two FailoverControllers, one Active and one Standby. Each FailoverController monitors the health of its NN, OS, and HW. The Active NN and the Standby NN each use the Quorum Journal Manager (QJM) to reach a set of Journal Nodes (JN). Three DNs are shown.]
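The idea behind quorum-based storage can be sketched in a few lines of Python: an edit counts as written once a majority of Journal Nodes accept it, and each writer carries an epoch number so that a Journal Node can reject a writer that has been superseded. This is a simplified in-memory model under those assumptions. The `JournalNode` class and `call_quorum` helper are hypothetical names for illustration, not Hadoop's classes or RPC interface.

```python
class JournalNode:
    """In-memory stand-in for a Journal Node (hypothetical, not Hadoop's class)."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.promised_epoch = 0
        self.edits = []

    def new_epoch(self, epoch):
        if not self.reachable:
            raise ConnectionError(self.name)
        if epoch <= self.promised_epoch:
            raise PermissionError("epoch is not newer than the promised one")
        self.promised_epoch = epoch

    def journal(self, epoch, txid, payload):
        if not self.reachable:
            raise ConnectionError(self.name)
        if epoch < self.promised_epoch:
            raise PermissionError("writer has been fenced by a newer epoch")
        self.edits.append((txid, payload))


def majority(n):
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def call_quorum(nodes, method, *args):
    """Invoke method on every node; succeed only if a majority accepts."""
    acks = 0
    for node in nodes:
        try:
            getattr(node, method)(*args)
            acks += 1
        except (ConnectionError, PermissionError):
            pass
    return acks >= majority(len(nodes))


nodes = [JournalNode("jn1"), JournalNode("jn2"), JournalNode("jn3", reachable=False)]

# The active NN claims epoch 1 and writes with only 2 of 3 JNs reachable.
epoch1_ok = call_quorum(nodes, "new_epoch", 1)
first_write_ok = call_quorum(nodes, "journal", 1, 100, "mkdir /a")

# The standby takes over with epoch 2; the old writer is rejected afterwards.
epoch2_ok = call_quorum(nodes, "new_epoch", 2)
stale_write_ok = call_quorum(nodes, "journal", 1, 101, "mkdir /b")
new_write_ok = call_quorum(nodes, "journal", 2, 101, "mkdir /b")

print(first_write_ok, stale_write_ok, new_write_ok)
```

With three Journal Nodes the majority is two, so the loss of a single node does not block writes. That is the property that removes the single shared-storage dependency of the NFS design.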
Etu Appliance High Availability

Summarize previous solutions
Auto
Failover
X
X

Namenode
Namenode

External
Storage
○
○

VMware (*1)

○

Namenode

○

RHEL (*2)

○

System-wide

○

Shared Storage

○

Namenode(*3)

Optional

Quorum-based Storage

○

Namenode (*3)

Optional

Solution
Facebook
Apache Hadoop 1.0

Avatar Node
Backup Namenode

HA Type

Hortonworks

Cloudera
(Apache Hadoop 2.X)

*1. 2 ESX Servers + SAN Arch. (vSphere HA Cluster)
*2. RHEL Cluster HA and Power Fencing Device
*3. Implementing the Fencing Method for System-wide HA.

Two Roles
[Diagram: two Master nodes and three Workers.]

Services on Master and Workers

                            Master                Worker
Hadoop Ecosystem Services   Name Node             Data Node
                            Job Tracker           Task Tracker
                            HBase Master          Region Server
                            Zookeeper (Leader)    Zookeeper
                            Hive
System Services             MySQL/PostgreSQL      Syslog
                            Kerberos
                            NTP Server
                            Syslog

HA Architecture (Active/Standby)
[Diagram: an Active Etu Master and a Standby Etu Master, linked by heartbeats and a Synchronized File System. Cluster-ware handles Big-Data Services failover, with Service Disablement shown on the Active side and Service Enablement on the Standby side. Orchestration Services run over a fully redundant network to three Etu Workers.]

HA based on CDH4.0.1
[Diagram: three Zookeeper (ZK) nodes exchange heartbeats with two FailoverControllers, one Active and one Standby. Each FailoverController monitors the health of its NN, OS, and HW. The Active NN and the Standby NN are connected through a Synchronized File System. Three DNs are shown.]

Data Synchronization
• Hadoop ecosystem
– Configurations are stored in Zookeeper
– Hive metadata is stored in PostgreSQL

• PostgreSQL
– Using PostgreSQL Replication

• User data
• System configurations or data
– PostgreSQL, Kerberos, NTP server, Syslog

Requirements
[Diagram: an Active Master, a Standby Master, and two Workers, with a three-node Zookeeper cluster made up of one ZK Leader and two other ZK nodes.]
• HDFS service is running on the Active Master.
• Zookeeper cluster is ready.
• Standby Master is ready to activate the High Availability service.

Failover Scenario
[Diagram: an Active Master, a Standby Master, and three Workers.]
• Active Namenode service failure
• Active Namenode JVM failure
• Active ZKFC service failure
• Etu Active Master OS failure
• Etu Active Master machine power failure
• Failure of NICs on the Etu Active Master machine
• Network failure for the Etu Active Master machine
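Most of the failures listed above reach the standby side either as missed heartbeats (OS, power, NIC, or network failure) or as a failed health check (Namenode service, JVM, or ZKFC failure). The following is a minimal sketch of that decision, assuming a hypothetical 5-second heartbeat timeout and caller-supplied timestamps. It illustrates the general technique and is not Etu's or ZKFC's implementation.

```python
class HeartbeatMonitor:
    """Declares a peer failed when no heartbeat arrives within timeout_s."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s
        self.last_beat = now

    def beat(self, now):
        """Record a heartbeat received at time now (seconds)."""
        self.last_beat = now

    def peer_failed(self, now):
        """True once the silence since the last heartbeat exceeds the timeout."""
        return (now - self.last_beat) > self.timeout_s


def should_fail_over(monitor, namenode_healthy, now):
    """Fail over if the active master is silent or its Namenode is unhealthy."""
    return monitor.peer_failed(now) or not namenode_healthy


monitor = HeartbeatMonitor(timeout_s=5.0, now=0.0)
monitor.beat(now=3.0)

# Heartbeats are recent and the Namenode is healthy.
healthy_case = should_fail_over(monitor, namenode_healthy=True, now=6.0)
# Heartbeats are recent, but the Namenode service or JVM has failed.
service_failure_case = should_fail_over(monitor, namenode_healthy=False, now=6.0)
# The machine has gone silent (OS, power, NIC, or network failure).
silent_case = should_fail_over(monitor, namenode_healthy=True, now=9.5)

print(healthy_case, service_failure_case, silent_case)
```

Passing the clock in as an argument keeps the sketch deterministic. A real monitor would read a monotonic clock and would usually require several consecutive misses before acting.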

Design Details – Enabling HA
[Diagram: the Active Master runs Namenode, JT, HMaster, …, an FC, and system services (Kerberos, NTP, …). The Standby Master runs Namenode, an FC, and system services (Kerberos, NTP, Syslog, …). Edit logs are shared between the two Masters, and DB Replication links them.]
1. Stopping services dependent on HDFS (JobTracker, HMaster, …).
2. Stopping Namenode and Datanode services.
3. Configuring HDFS and FC service.
4. Creating Synchronized File System.
5. Initializing Synchronized File System for shared edit logs.
6. Starting Active FC service.
7. Initializing Standby Master.
8. Starting Standby FC service.
9. Synchronizing system configurations and data.
10. Starting Active Namenode and Datanode services.
11. Starting Standby Namenode and Datanode services.
12. Checking services status.
13. Starting services dependent on HDFS (JobTracker, HMaster, …).
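The enabling sequence above is strictly ordered: later steps assume that earlier ones succeeded. One way to express that is a small runner that executes the steps in order and stops at the first failure. The sketch below assumes a caller-supplied `execute` function that performs one step and returns `True` on success; `run_steps` and `dry_run` are hypothetical names, and the step labels paraphrase the list above.

```python
ENABLE_HA_STEPS = [
    "stop services dependent on HDFS",
    "stop Namenode and Datanode services",
    "configure HDFS and FC service",
    "create Synchronized File System",
    "initialize Synchronized File System for shared edit logs",
    "start Active FC service",
    "initialize Standby Master",
    "start Standby FC service",
    "synchronize system configurations and data",
    "start Active Namenode and Datanode services",
    "start Standby Namenode and Datanode services",
    "check services status",
    "start services dependent on HDFS",
]


def run_steps(steps, execute):
    """Run steps in order and stop at the first one that fails.

    Returns (completed_steps, failed_step), where failed_step is None
    when every step succeeded.
    """
    completed = []
    for step in steps:
        if not execute(step):
            return completed, step
        completed.append(step)
    return completed, None


def dry_run(step):
    """Stand-in executor that only logs the step; a real one would act."""
    print(f"[dry-run] {step}")
    return True


completed, failed = run_steps(ENABLE_HA_STEPS, dry_run)
print(f"{len(completed)} steps completed, failed step: {failed}")
```

Returning the list of completed steps makes it possible to report exactly where activation stopped, which matters when the process is triggered by a single click as in the demo.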

Design Details - Failover
[Diagram: the Standby Master fences the failed Active Master and then becomes the new Active Master, running Namenode, JT, HMaster, … and system services (Kerberos, NTP, Syslog, …). Edit logs, FC services, and DB Replication are shown between the two Masters.]
1. Fencing Active Master from Standby Master
   a. Stopping network service.
   b. Stopping Hadoop related services.
   c. Stopping system services.
   d. Configuring network environment.
   e. Removing default services.
2. Stopping Standby FC service.
3. Stopping Standby Namenode service.
4. Removing Synchronized File System.
5. Removing DB Replication.

7. Transition Standby Master to Active Master.
   a. Stopping network service.
   b. Stopping system services.
   c. Configuring network environment.
   d. Configuring host information.
   e. Configuring system services.
   f. Starting network service.
   g. Starting system services.
8. Configuring Hadoop related services.
9. Starting Namenode and Datanode services.
10. Starting Hadoop related services.
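The key ordering constraint in the failover steps above is that the old Active Master is fenced before the Standby Master takes over; otherwise two masters could act as active at once. A minimal sketch of that guard follows, assuming a hypothetical `fence` callable that returns `True` only when the old active is confirmed isolated. The `Master` class, the role names, and `fail_over` are illustrative and not taken from the Etu implementation; the VM names echo the demo slides.

```python
class FencingError(RuntimeError):
    """Raised when the old active master could not be confirmed fenced."""


class Master:
    def __init__(self, name, role):
        self.name = name
        self.role = role  # "active", "standby", or "fenced"


def fail_over(old_active, standby, fence):
    """Promote standby only after the old active is confirmed fenced."""
    if standby.role != "standby":
        raise ValueError(f"{standby.name} is not a standby master")
    if not fence(old_active):
        # Refuse to promote: two active masters would risk split-brain.
        raise FencingError(f"could not fence {old_active.name}")
    old_active.role = "fenced"
    standby.role = "active"
    return standby


vm002 = Master("VM002", "active")
vm007 = Master("VM007", "standby")

new_active = fail_over(vm002, vm007, fence=lambda master: True)
print(new_active.name, new_active.role, vm002.role)
```

If fencing fails, the function raises instead of promoting, so the standby keeps its role and an operator can intervene.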

Use case: Active Namenode maintenance
[Diagram: an Active Master, a Standby Master, and three Workers.]
• Stop NN
• Restart NN

Use case: Standby Master failure
[Diagram: an Active Master, a Standby Master, and three Workers.]
• OS failure
• Power failure
• Failure of NICs
• Network failure

Use case: Cluster power failure
[Diagram: an Active Master, a Standby Master, and three Workers.]

Use case: Cluster network failure
[Diagram: an Active Master, a Standby Master, and three Workers.]

Demo – Non-HA (VM002): Activating HA with One-Click

Demo – Activating (VM002 --- VM007)

Demo – Activating Done (VM002 – VM007)

Demo – Failover (VM002 –> VM007)

Demo – Failover Done (VM007)

Conclusion
• Leverages a Synchronized File System to share Namenode edit logs and system data between Masters.
• Implements an improved fencing method to handle failover.
• Provides system-wide high availability, not only for the Hadoop Name Node service.


[OSDC 2013] Hadoop Cluster HA: Experience Sharing

  • 1. Hadoop Cluster HA 的經驗分享 Etu 韓祖棻 jerryhan@etusolution.com
  • 2. Who am I 韓祖棻 Jerry – Etu 技術經理 • Database Management • Windows/Linux Application Developer • Web Developer • Developer of Etu jerryhan@etusolution.com 2
  • 3. Agenda • Background • • • • • • Facebook Namenode High Availability Hadoop 1.0 Namenode High Availability Hortonworks High Availability Cloudera High Availability Etu Appliance High Availability Conclusion 3
  • 5. The Hadoop Ecosystem ETL Tools BI Reporting RDBMS Zookeeper Pig HiveQL Mahout MapReduce Data Store Hive Meta Store HBase Avro (Serialization) Data Processing Layer HDFS ( Hadoop Distributed File System) 5
  • 6. HDFS Architecture (Master/Slave) Metadata ops Namenode HDFS cluster consists of a single Namenode. Client Metadata ops Read Datanodes Metadata(Name, replicas..) (/var/disk/data, 1.. Block ops Datanodes The Namenode was a sing point of failure (SPOF) B inreplication an HDFS Cluster. Blocks Write Rack1 Rack2 Client 6
  • 9. Hadoop 1.0 Namenode High Availability 9
  • 10. Backup Namenode Approach • Use case 3f: – Active running, Standby down for maintenance. Active dies and cannot start. Standby is started and takes over as active. 10
  • 12. HDPs Full-Stack HA Architecture 12
  • 13. HA for HDFS NameNode Using VMware Do not use the NameNode VM for running any other master daemon. 13
  • 14. HA for Hadoop Using RHEL (v5.x, v6.x) 14
  • 16. Shared Storage Using NFS (After CDH 4.0) ZK ZK ZK Heartbeat Heartbeat FailoverController Active FailoverController Standby SPOF Monitor Health of NN. OS, HW Monitor Health of NN. OS, HW NN Active Shared NN state with single writer (fencing) DN DN NN Standby DN 16
  • 17. Quorum-based Storage (After CDH 4.1) ZK ZK ZK Heartbeat Heartbeat FailoverController Standby FailoverController Active Journal Nodes Monitor Health of NN. OS, HW JN NN Active QJM DN JN JN QJM NN Standby DN DN JN Monitor Health of NN. OS, HW JN JN 17
  • 19. Summarize previous solutions Auto Failover X X Namenode Namenode External Storage ○ ○ Vmware (*1) ○ Namenode ○ RHEL (*2) ○ System-wide ○ Shared Storage ○ Namenode(*3) Optional Quorum-based Storage ○ Namenode (*3) Optional Solution Facebook Apache Hadoop 1.0 Avatar Node Backup Namenode HA Type Hortonworks Cloudera (Apache Hadoop 2.X) 1. 2 ESX Servers + SAN Arch. (vSphere HA Cluster) 2. RHEL Cluster HA and Power Fencing Device 3. Implementing the Fencing Method for System-wide HA. 19
  • 21. Services on Master and Workers Master Hadoop Ecosystem Services System Services Worker Name Node Job Tracker HBase Master Zookeeper (Leader) Hive Data Node Task Tracker Region Server Zookeeper MySQL/PostgreSQL Kerberos NTP Server Syslog Syslog 21
  • 22. HA Architecture (Active/Standby)
      [Diagram labels: Active Etu Master, Standby Etu Master, Service Disablement, Service Enablement, Cluster-ware, Big-Data Services Failover, Synchronised File System, Heartbeat, Orchestration Services (fully redundant network), Etu Worker ×3]
  • 23. HA based on CDH4.0.1
      [Diagram labels: ZK ×3, Heartbeat, FailoverController (Active), FailoverController (Standby), Monitor Health of NN, OS, HW, Synchronized File System, NN Active, NN Standby, DN ×3]
  • 24. Data Synchronization
      • Hadoop ecosystem
          – Configurations are stored in Zookeeper
          – Hive metadata is stored in PostgreSQL
      • PostgreSQL
          – Using PostgreSQL Replication
      • User data
      • System configurations or data
          – PostgreSQL, Kerberos, NTP server, Syslog
  • 25. Requirements
      – HDFS service is running on the Active Master
      – Zookeeper cluster is ready
      – Standby Master is ready to activate the High Availability service
      [Diagram labels: Active Master (ZK Leader), Standby Master (ZK), Worker (ZK), Worker]
  • 26. Failover Scenario
      – Active Namenode service failure
      – Active Namenode JVM failure
      – Active ZKFC service failure
      – Etu Active Master OS failure
      – Etu Active Master machine power failure
      – Failure of NIC cards on the Etu Active Master machine
      – Network failure for the Etu Active Master machine
      [Diagram labels: Active Master, Standby Master, Worker ×3]
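From the Standby Master's point of view, each failure listed above shows up the same way: health checks or heartbeats from the Active Master stop arriving. The following is a minimal sketch of that detection idea, assuming a fixed check interval and a hypothetical threshold of three consecutive misses. Neither the class nor the threshold comes from the slides or from the ZKFC implementation.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Counts consecutive missed heartbeats from the Active Master."""

    max_missed: int = 3  # hypothetical threshold, not taken from the slides
    missed: int = 0

    def observe(self, heartbeat_received: bool) -> bool:
        """Record one check interval; return True when failover should start."""
        if heartbeat_received:
            self.missed = 0  # any successful heartbeat resets the counter
        else:
            self.missed += 1
        return self.missed >= self.max_missed


if __name__ == "__main__":
    monitor = HeartbeatMonitor()
    history = [True, False, False, True, False, False, False]
    print([monitor.observe(ok) for ok in history])
```

Requiring several consecutive misses trades detection speed for fewer false failovers caused by a single dropped heartbeat.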
  • 27. Design Details – Enabling HA
      [Diagram labels: Active Master, Standby Master, Namenode, JT, HMaster, …, Kerberos, NTP, Syslog, …, FC, DB Replication, edit logs]
      1. Stopping services dependent on HDFS (JobTracker, HMaster, …).
      2. Stopping Namenode and Datanode services.
      3. Configuring HDFS and FC service.
      4. Creating Synchronized File System.
      5. Initializing Synchronized File System for shared edit logs.
      6. Starting Active FC service.
      7. Initializing Standby Master.
      8. Starting Standby FC service.
      9. Synchronizing system configurations and data.
      10. Starting Active Namenode and Datanode services.
      11. Starting Standby Namenode and Datanode services.
      12. Checking services status.
      13. Starting services dependent on HDFS (JobTracker, HMaster, …).
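The enabling procedure is strictly ordered: later steps assume earlier ones succeeded. A minimal sketch of a runner that executes named steps in order and stops at the first failure is shown here. The `run_steps` helper and the stub actions are hypothetical and stand in for the real service operations, which the slides do not describe.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_steps(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order; return (completed step names, failed step name or None)."""
    completed: List[str] = []
    for name, action in steps:
        if not action():
            # Stop immediately so later steps never run on a half-configured cluster.
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    # Stub actions: each lambda stands in for a real operation and always succeeds.
    steps: List[Step] = [
        ("Stopping services dependent on HDFS", lambda: True),
        ("Stopping Namenode and Datanode services", lambda: True),
        ("Configuring HDFS and FC service", lambda: True),
        ("Creating Synchronized File System", lambda: True),
    ]
    completed, failed = run_steps(steps)
    print(len(completed), failed)
```

Returning the list of completed steps makes it possible to report exactly where a one-click activation stopped.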
  • 28. Design Details – Failover Fencing
      [Diagram labels: Active Master, Standby Master, Namenode, JT, HMaster, …, Kerberos, NTP, Syslog, …, FC, DB Replication, edit logs]
      1. Fencing Active Master from Standby Master
          a. Stopping network service.
          b. Stopping Hadoop related services.
          c. Stopping system services.
          d. Configuring network environment.
          e. Removing default services.
      2. Stopping Standby FC service.
      3. Stopping Standby Namenode service.
      4. Removing Synchronized File System.
      5. Removing DB Replication.
      7. Transition Standby Master to Active Master.
          a. Stopping network service.
          b. Stopping system services.
          c. Configuring network environment.
          d. Configuring host information.
          e. Configuring system services.
          f. Starting network service.
          g. Starting system services.
      8. Configuring Hadoop related services.
      9. Starting Namenode and Datanode services.
      10. Starting Hadoop related services.
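The failover procedure fences the old Active Master first and only then transitions the Standby Master. That ordering is what prevents two masters from writing at once. A minimal sketch of the guard follows, with hypothetical callables standing in for the real fencing and transition steps:

```python
from typing import Callable


class FencingError(Exception):
    """Raised when the old Active Master could not be fenced."""


def fail_over(fence_old_active: Callable[[], bool],
              promote_standby: Callable[[], None]) -> str:
    """Promote the standby only after fencing the old active has succeeded."""
    if not fence_old_active():
        # Never promote while the old active might still be serving or writing.
        raise FencingError("old Active Master not fenced; standby not promoted")
    promote_standby()
    return "standby promoted"


if __name__ == "__main__":
    state = {"active": "master-1"}  # hypothetical host names

    def promote() -> None:
        state["active"] = "master-2"

    print(fail_over(lambda: True, promote), state["active"])
```

If fencing fails, the sketch raises instead of promoting, leaving the decision to an operator rather than risking a split-brain cluster.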
  • 29. Use case: Active Namenode maintenance
      – Stop NN
      – Restart NN
      [Diagram labels: Active Master, Standby Master, Worker ×3]
  • 30. Use case: Standby Master failure
      – OS failure
      – Power failure
      – Failure of NICs
      – Network failure
      [Diagram labels: Active Master, Standby Master, Worker ×3]
  • 31. Use case: Cluster power failure
      [Diagram labels: Active Master, Standby Master, Worker ×3]
  • 32. Use case: Cluster network failure
      [Diagram labels: Active Master, Standby Master, Worker ×3]
  • 33. Demo – Non-HA (VM002): Activating HA with One-Click
  • 34. Demo – Activating (VM002 --- VM007)
  • 35. Demo – Activating Done (VM002 – VM007)
  • 36. Demo – Failover (VM002 –> VM007)
  • 38. Conclusion
      • Leveraging a Synchronized File System to share Namenode edit logs and system data between Masters.
      • Implementing an improved fencing method to handle failover.
      • Providing system-wide high availability, not only for the Hadoop Name Node service.
  • 39. Reference
      • Hadoop 1.0.4 Documentation
          – http://hadoop.apache.org/docs/stable/index.html
          – https://issues.apache.org/jira/secure/attachment/12480489/NameNode%20HA_v2_1.pdf
      • Hadoop 2.0.3-alpha Documentation
          – http://hadoop.apache.org/docs/r2.0.3-alpha/index.html
      • Hadoop AvatarNode High Availability
          – http://hadoopblog.blogspot.tw/2010/02/hadoop-namenodehigh-availability.html
      • Hortonworks Data Platform
          – http://hortonworks.com/products/hortonworksdataplatform/
          – http://www.vmware.com/files/pdf/Apache-Hadoop-VMwareHA-solution.pdf
  • 40. Reference
      • CDH4.2.0 Documentation
          – http://www.cloudera.com/content/support/en/documentation/cdh4-documentation/cdh4-documentation-v4-latest.html