2. Outline
• Overview of Apache Hadoop
• Fault-tolerant features in:
• HDFS layer
• MapReduce layer
• Summary
• Questions
3. Apache Hadoop
• Apache Hadoop's MapReduce and HDFS components were originally derived from:
• Google File System (GFS) [1] - 2003
• Google's MapReduce [2] - 2004
• Data is broken into splits that are processed on different machines.
• An industry-wide standard for processing Big Data.
[1] http://research.google.com/archive/gfs.html
[2] http://research.google.com/archive/mapreduce.html
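The split-and-process model can be sketched in a few lines. This is a toy word-count in plain Python, not Hadoop code; the `splits` data and function names are made up for illustration, and the loop stands in for what Hadoop runs in parallel on different machines:

```python
from collections import defaultdict

# Hypothetical input already divided into "splits"; in Hadoop each
# split would be processed on a different machine.
splits = [
    "the quick brown fox",
    "the lazy dog",
]

def map_phase(split):
    # Emit (word, 1) pairs, as in the classic word-count example.
    return [(word, 1) for word in split.split()]

def reduce_phase(pairs):
    # Sum the counts per key, mimicking the shuffle/reduce step.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

intermediate = []
for split in splits:  # runs in parallel across nodes in real Hadoop
    intermediate.extend(map_phase(split))
result = reduce_phase(intermediate)
print(result["the"])  # "the" appears once in each split, so 2
```

The point of the sketch is that map calls are independent per split, which is exactly what lets Hadoop rerun one split's work on another machine after a failure.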
4. Fault Tolerance in Hadoop
• One of the primary reasons to use Hadoop is its high degree of fault tolerance.
• Individual nodes or network components may experience high rates of failure.
• Hadoop can still guide jobs to successful completion.
5. Overview of Hadoop
• The basic components of Hadoop are:
• MapReduce layer
• Job Tracker (master), which coordinates the execution of jobs;
• Task Trackers (slaves), which control the execution of map and reduce tasks on the machines that do the processing.
• HDFS layer, which stores files.
• Name Node (master), which manages the file system and keeps metadata for all the files and directories in the tree;
• Data Nodes (slaves), the workhorses of the file system. They store and retrieve blocks when told to (by clients or the name node) and report back to the name node periodically.
6. Overview of Hadoop contd.
Job Tracker - coordinates the execution of jobs
Task Tracker - controls the execution of map and reduce tasks on slave machines
Data Node - follows instructions from the name node; stores and retrieves data
Name Node - manages the file system, keeps metadata
9. Fault Tolerance in HDFS layer
• Hardware failure is the norm rather than the exception
• Detection of faults and quick, automatic recovery from them is a
core architectural goal of HDFS.
• Master Slave Architecture with NameNode (master) and DataNode
(slave)
• Common types of failures
• NameNode failures
• DataNode failures
10. Handling Data Node Failure
• Each DataNode sends a heartbeat message to the NameNode periodically.
• If the NameNode does not receive a heartbeat from a particular DataNode for 10 minutes, it considers that DataNode dead/out of service.
• The NameNode then initiates replication of the blocks that were hosted on the dead DataNode onto other DataNodes.
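The detect-and-re-replicate cycle above can be sketched as a toy model. This is not the NameNode's actual implementation; the class and its block-tracking dictionaries are invented for illustration, with only the 10-minute timeout taken from the slide:

```python
HEARTBEAT_TIMEOUT = 10 * 60  # seconds; 10 minutes per the HDFS default

class NameNodeSketch:
    """Toy model of dead-node detection and block re-replication."""

    def __init__(self):
        self.last_heartbeat = {}  # datanode -> last heartbeat timestamp
        self.blocks = {}          # datanode -> set of hosted block ids

    def heartbeat(self, node, now):
        self.last_heartbeat[node] = now

    def dead_nodes(self, now):
        # A node silent for longer than the timeout is considered dead.
        return [n for n, t in self.last_heartbeat.items()
                if now - t > HEARTBEAT_TIMEOUT]

    def replicate_from(self, dead, live):
        # Re-host the dead node's blocks on a surviving node.
        moved = self.blocks.pop(dead, set())
        self.blocks.setdefault(live, set()).update(moved)
        return moved

nn = NameNodeSketch()
nn.blocks = {"dn1": {"blk_1", "blk_2"}, "dn2": {"blk_3"}}
nn.heartbeat("dn1", now=0)
nn.heartbeat("dn2", now=500)
dead = nn.dead_nodes(now=700)            # dn1: 700 s silent > 600 s
moved = nn.replicate_from(dead[0], "dn2")
```

After the sweep, dn1's blocks are hosted on dn2, so the configured replication level can be restored without dn1 ever coming back.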
11. Handling Name Node Failure
• There is a single NameNode per cluster.
• Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster.
• If the NameNode became unavailable, the cluster as a whole was unavailable.
• The NameNode had to be restarted or brought up on a separate machine.
12. HDFS High Availability
• Provides the option of running two redundant NameNodes in the same cluster.
• Active/passive configuration with a hot standby.
• Fast failover to the standby NameNode if a machine crashes.
• Graceful administrator-initiated failover for planned maintenance.
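The active/passive arrangement can be illustrated with a toy model. The class, the shared `edit_log` list, and the `tail` method are all invented here; the real mechanism (a standby applying edits from a shared journal so it stays "hot") is only sketched, not reproduced:

```python
class HANameNodes:
    """Toy active/passive NameNode pair: the standby tails a shared
    edit log so it can take over quickly (a 'hot' standby)."""

    def __init__(self):
        self.edit_log = []                   # shared journal (simplified)
        self.applied = {"nn1": 0, "nn2": 0}  # edits applied per node
        self.active = "nn1"

    def write(self, edit):
        # Only the active NameNode writes namespace edits.
        self.edit_log.append(edit)
        self.applied[self.active] = len(self.edit_log)

    def tail(self, standby):
        # The standby continuously applies new edits to stay hot.
        self.applied[standby] = len(self.edit_log)

    def failover(self):
        # Crash-triggered or administrator-initiated: promote standby.
        new_active = "nn2" if self.active == "nn1" else "nn1"
        self.tail(new_active)  # catch up on any remaining edits
        self.active = new_active

ha = HANameNodes()
ha.write("mkdir /a")
ha.write("mkdir /b")
ha.tail("nn2")   # standby keeps up while nn1 is active
ha.failover()    # nn1 crashes; nn2 takes over with full state
```

Because the standby has already applied the edits, failover is fast; that is the difference between a hot standby and simply restarting a NameNode from disk.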
14. Classic MapReduce (v1)
• Job Tracker
• Manages cluster resources and job scheduling
• Task Tracker
• Per-node agent
• Manages tasks
• Jobs can fail due to:
• Task failure (while running a task)
• Task Tracker failure
• Job Tracker failure
15. Handling Task Failure
• A user-code bug in map/reduce
• throws a RuntimeException;
• the child JVM reports the failure back to the parent task tracker before it exits.
• Sudden exit of the child JVM
• a bug that causes the JVM itself to exit under conditions triggered by the map/reduce code.
• The task tracker marks the task attempt as failed and frees the slot for another task.
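A failed attempt is rescheduled rather than failing the whole job. The sketch below is a simplified stand-in for the framework's retry loop; the function names are invented, and the limit of 4 attempts reflects classic MapReduce's default `mapred.map.max.attempts`:

```python
MAX_ATTEMPTS = 4  # default mapred.map.max.attempts in classic MapReduce

def run_with_retries(task, max_attempts=MAX_ATTEMPTS):
    """Retry a failing task attempt, roughly as the framework does
    after a child JVM reports a failure (or exits unexpectedly)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt, task(attempt)
        except RuntimeError:
            # Attempt marked failed; the slot is freed and the task
            # is scheduled again (possibly on a different node).
            continue
    raise RuntimeError("task failed after %d attempts" % max_attempts)

# A hypothetical task that fails twice (transient bug) then succeeds.
def flaky(attempt):
    if attempt < 3:
        raise RuntimeError("user code bug")
    return "done"

attempt, result = run_with_retries(flaky)
```

Only when every attempt fails does the task, and hence the job, fail; transient failures are absorbed by rescheduling.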
16. Task Tracker Failure
• The task tracker stops sending heartbeats to the Job Tracker.
• The Job Tracker notices the failure:
• it hasn't received a heartbeat for 10 minutes;
• configurable via the mapred.tasktracker.expiry.interval property.
• The Job Tracker removes the task tracker from its pool.
• Map tasks are rerun even if they had run to completion:
• their intermediate output resides on the failed task tracker's local file system, which is not accessible to the reduce tasks.
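The expiry check itself is simple; a sketch, with the function name invented and the value taken from the property on the slide (the interval is specified in milliseconds):

```python
# mapred.tasktracker.expiry.interval is in milliseconds;
# 600000 ms (10 minutes) is the classic-MapReduce default.
conf = {"mapred.tasktracker.expiry.interval": 600000}

def expired(last_heartbeat_ms, now_ms, conf):
    """JobTracker-side check: has this TaskTracker been silent
    longer than the configured expiry interval?"""
    return now_ms - last_heartbeat_ms > conf[
        "mapred.tasktracker.expiry.interval"]

# A tracker last heard from 11 minutes ago is considered lost;
# its completed map tasks must be rerun, because their intermediate
# output lived on its now-inaccessible local disk.
lost = expired(0, 11 * 60 * 1000, conf)
alive = not expired(0, 5 * 60 * 1000, conf)
```

Note the contrast with HDFS: block data is replicated, but map output is not, which is why task-tracker loss forces reruns of even completed map tasks.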
17. Job Tracker Failure
• This is more serious than the other two failure modes.
• The Job Tracker is a single point of failure.
• In this case, all running jobs fail.
• After restarting the Job Tracker, all jobs that were running at the time of the failure need to be resubmitted.
• Good news: YARN has eliminated this.
• One of YARN's design goals is to eliminate the single point of failure in MapReduce.
18. YARN - Yet Another Resource Negotiator
• The next version of MapReduce, also called MapReduce 2.0 (MRv2).
• In 2010, a group at Yahoo! began to design the next generation of MapReduce.
19. YARN architecture
• Resource Manager
• Central agent: manages and allocates cluster resources
• Node Manager
• Per-node agent: manages and enforces node resource allocations
• Application Master
• Per-application agent: manages the application life cycle and task scheduling
20. YARN – Resource Manager Failure
• After a crash, a new Resource Manager instance needs to be brought up (by an administrator).
• It recovers from saved state.
• The state consists of:
• the node managers in the system
• the running applications
• This state is much more manageable than that of the Job Tracker:
• tasks are not part of the Resource Manager's state;
• they are handled by the Application Master.
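The small size of the RM's persisted state can be illustrated with a toy model; the class, store layout, and method names are invented, and only the two state categories from the slide (node managers and running applications) are persisted:

```python
class ResourceManagerSketch:
    """Toy RM recovery: only node managers and running applications
    go into the saved state; per-task state lives with each
    ApplicationMaster, not with the RM."""

    def __init__(self, store):
        self.store = store  # persistent state (a dict stands in here)

    def register_node(self, node_manager):
        self.store.setdefault("nodes", []).append(node_manager)

    def submit_app(self, app_id):
        self.store.setdefault("apps", []).append(app_id)

    @classmethod
    def recover(cls, store):
        # A freshly started RM rebuilds its view from the saved state;
        # note there is no per-task information to replay.
        return cls(store)

store = {}
rm = ResourceManagerSketch(store)
rm.register_node("nm1")
rm.submit_app("application_0001")
# ... the RM crashes; an administrator starts a new instance ...
rm2 = ResourceManagerSketch.recover(store)
```

Compare this with the classic Job Tracker, which would have had to track every task of every job; shrinking the recoverable state is what makes RM restart tractable.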
21. Summary
• Overview of Hadoop MapReduce
• Discussed fault-tolerant features of:
• HDFS
• MapReduce v1
• MapReduce v2 - YARN (briefly)
22. References
• Google MapReduce paper: "MapReduce: Simplified Data Processing on Large Clusters" (Jeffrey Dean and Sanjay Ghemawat, Google, Inc., 2004)
• Apache Hadoop website (http://hadoop.apache.org/)
• Yahoo! Developer Network tutorial on MapReduce (https://developer.yahoo.com/hadoop/tutorial/module4.html)
• "Hadoop: The Definitive Guide" (Tom White, O'Reilly)
• Cloudera blog (http://blog.cloudera.com/blog/2011/02/hadoop-availability/)
• Hortonworks YARN tutorial (http://hortonworks.com/hadoop/yarn/)
25. More fault-tolerant features provided in HDFS
• Rack awareness
• Takes a node's physical location into account when scheduling tasks and allocating storage.
• The NameNode tries to place replicas of a block on multiple racks for improved fault tolerance.
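Rack-aware placement can be sketched for the common case of three replicas. This is an illustration of the default policy as commonly described (first replica near the writer, second on a different rack, third on another node of that second rack), with invented function names and topology data, not HDFS's actual placement code:

```python
import random

def place_replicas(writer_node, topology, seed=0):
    """Sketch of rack-aware placement for 3 replicas: first on the
    writer's node, second on a node in a different rack, third on
    another node in that same remote rack."""
    rng = random.Random(seed)  # seeded for reproducibility in this demo
    rack_of = {n: r for r, nodes in topology.items() for n in nodes}
    first = writer_node
    remote_racks = [r for r in topology if r != rack_of[first]]
    second_rack = rng.choice(remote_racks)
    second = rng.choice(topology[second_rack])
    third = rng.choice([n for n in topology[second_rack] if n != second])
    return [first, second, third]

# Hypothetical two-rack topology.
topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
replicas = place_replicas("dn1", topology)
```

The result always spans two racks, so the loss of an entire rack (a switch failure, say) still leaves at least one live replica.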
• Upgrade and rollback
• After a software upgrade, it is possible to roll back to HDFS's state before the upgrade in case of unexpected problems.
• Secondary NameNode
• Performs periodic checkpoints of the namespace and helps keep the size of the file containing the log of HDFS modifications within limits at the NameNode.
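A checkpoint merges the modification log into the namespace image so the log can be truncated. The sketch below invents a minimal image/edit-log representation to show the idea; real fsimage and edit-log formats are far richer:

```python
def checkpoint(fsimage, edit_log):
    """Toy Secondary-NameNode checkpoint: apply logged edits to the
    namespace image, then return the new image and an empty log."""
    image = dict(fsimage)  # don't mutate the old image
    for op, path in edit_log:
        if op == "mkdir":
            image[path] = "dir"
        elif op == "delete":
            image.pop(path, None)
    return image, []  # merged image; edit log can now be truncated

# Hypothetical starting image and accumulated edits.
fsimage = {"/": "dir"}
edits = [("mkdir", "/a"), ("mkdir", "/b"), ("delete", "/a")]
new_image, new_edits = checkpoint(fsimage, edits)
```

Without periodic checkpoints the edit log grows without bound, and a NameNode restart would have to replay all of it; merging keeps both the log size and the restart time in check.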
Editor's Notes
Talk about the articles I read.
Apache Hadoop Web site
HDFS architecture guide
MapReduce original paper from Google
A task tracker that crashes or runs very slowly stops sending heartbeats, or sends them very infrequently. The Job Tracker then removes it from the pool of task trackers.
The author suggests this failure mode is unlikely because the probability of any particular machine failing is low; I can't agree with that, though.
Scalability bottlenecks arise with a very large number of nodes (> 4,000).
The solution: splitting the responsibilities of the Job Tracker into separate entities.
The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM).
Tasks are not part of the resource manager's state, since they are managed by the application master. The amount of state to be managed is therefore much more manageable than that of the Job Tracker.
When discussing Hadoop availability people often start with the NameNode since it is a single point of failure (SPOF) in HDFS, and most components in the Hadoop ecosystem (MapReduce, Apache HBase, Apache Pig, Apache Hive etc.) rely on HDFS directly, and are therefore limited by its availability. However, Hadoop availability is a larger, more general issue, so it’s helpful to establish some context before diving in.
Availability is the proportion of time a system is functioning [1], which is commonly referred to as “uptime” (vs downtime, when the system is not functioning).
Note that availability is a stricter requirement than fault tolerance - the ability for a system to perform as designed and degrade gracefully in the presence of failures. A system that requires an hour to restart (e.g. for a configuration change or software upgrade) but has no single point of failure is fault tolerant but not highly available (HA). Adding redundancy at all SPOFs is a common way to improve fault tolerance, which helps [2], but is just one part of, improving Hadoop availability. Note also that fault tolerance is distinct from durability: even though the NameNode is a SPOF, no single failure results in data loss, as copies of NameNode persistent state (the image and edit log) are replicated both within and across hosts.
Availability is also often conflated with reliability. Reliability in distributed systems is a more general issue than availability [3]. A truly reliable distributed system must be highly available, fault tolerant, secure, scalable, and perform predictably, etc. I’ll limit this post to Hadoop availability.