    High Availability for the HDFS NameNode
    Phase 2
    Aaron T. Myers and Todd Lipcon | Cloudera HDFS Team
    October 2012

Introductions / who we are
    • Software engineers on Cloudera’s HDFS engineering team
    • Committers/PMC Members for Apache Hadoop at ASF
    • Main developers on HDFS HA
          •   Responsible for ~80% of the code for all phases of HA
              development
    •   Have helped numerous customers set up and troubleshoot HA
        HDFS clusters this year


Outline
    • HDFS HA Phase 1
        • How did it work? What could it do?
        • What problems remained?
    • HDFS HA Phase 2: Automatic failover
    • HDFS HA Phase 2: Quorum Journal




HDFS HA Phase 1 Review
    HDFS-1623: completed March 2012




HDFS HA Development Phase 1
    • Completed March 2012 (HDFS-1623)
    • Introduced the StandbyNode, a hot backup for the HDFS
      NameNode.
    • Relied on shared storage to synchronize namespace state
        •   (e.g. a NAS filer appliance)
    • Allowed operators to manually trigger failover to the Standby
    • Sufficient for many HA use cases: avoided planned downtime
      for hardware and software upgrades, planned machine/OS
      maintenance, configuration changes, etc.
HDFS HA Architecture Phase 1
    • Parallel block reports sent to Active and Standby NameNodes
    • NameNode state shared by locating edit log on NAS over NFS
    • Fencing of shared resources/data
          •   Critical that only a single NameNode is Active at any point in time
    •   Client failover done via client configuration
          •   Each client is configured with the addresses of both NNs and tries
              each in turn to find the active one (see the example below)
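
    As a rough illustration, client failover in this release is driven purely
    by client-side configuration; the nameservice name "mycluster" and the
    hostnames below are examples, not required values:

        dfs.nameservices: mycluster
        dfs.ha.namenodes.mycluster: nn1,nn2
        dfs.namenode.rpc-address.mycluster.nn1: nn1.company.com:8020
        dfs.namenode.rpc-address.mycluster.nn2: nn2.company.com:8020
        dfs.client.failover.proxy.provider.mycluster:
            org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider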


HDFS HA Architecture Phase 1
    (architecture diagram)

Fencing and NFS
    •   Must avoid split-brain syndrome
          •   Both nodes think they are active and try to write to the same file:
              the metadata becomes corrupt and requires manual intervention
              before the cluster can be restarted
    •   Configure a fencing script (example below)
          •   Script must ensure that the prior active has stopped writing
          •   STONITH: shoot-the-other-node-in-the-head
          •   Storage fencing: e.g. using the NetApp ONTAP API to restrict filer access
    •   The fencing script must succeed for the failover to succeed
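
    As an example, fencing methods are listed in dfs.ha.fencing.methods and
    tried in order until one succeeds; the shell script path here is
    hypothetical:

        dfs.ha.fencing.methods:
            sshfence
            shell(/path/to/fence-the-filer.sh $target_host)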


Shortcomings of Phase 1
    •   Insufficient to protect against unplanned downtime
          • Manual failover only: requires an operator to step in quickly after
            a crash
          • Various studies indicated that unplanned failures account for a
            minority of downtime, but they are still important to address
    •   Requirement of a NAS device made deployment
        complex, expensive, and error-prone

    (we always knew this was just the first phase!)

HDFS HA Development Phase 2
     •   Multiple new features for high availability
           •   Automatic failover, based on Apache ZooKeeper
           •   Remove dependency on NAS (network-attached storage)

     •   Address new HA use cases
           •   Avoid unplanned downtime due to software or hardware faults
           •   Deploy in filer-less environments
           •   Completely stand-alone HA with no external hardware or software
               dependencies
                 •   no Linux-HA, filers, etc.

Automatic Failover Overview
     HDFS-3042: completed May 2012




Automatic Failover Goals
     •   Automatically detect failure of the Active NameNode
           •   Hardware, software, network, etc.
     •   Do not require operator intervention to initiate failover
           •   Once failure is detected, process completes automatically
     •   Support manually initiated failover as a first-class operation
           •   Operators can still trigger a failover without having to stop the
               Active (see the example after this list)
     •   Do not introduce a new SPOF
           •   All parts of auto-failover deployment must themselves be HA
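
     As an illustration (the NameNode IDs nn1 and nn2 are assumptions for
     this example), a manual failover is initiated through the HA admin tool,
     which coordinates with the ZKFCs when automatic failover is enabled:

         # ask nn2 to become active, transitioning nn1 to standby
         hdfs haadmin -failover nn1 nn2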

Automatic Failover Architecture
     • Automatic failover requires ZooKeeper
         • Not required for manual failover
     • ZK makes it easy to:
         • Detect failure of Active NameNode
         • Determine which NameNode should become the Active NN




Automatic Failover Architecture
     • Introduce new daemon in HDFS: ZooKeeper Failover Controller
     • In an auto failover deployment, run two ZKFCs
          • One per NameNode, on that NameNode machine
     • ZooKeeper Failover Controller (ZKFC) is responsible for:
          • Monitoring health of associated NameNode
          • Participating in leader election of NameNodes
          • Fencing the other NameNode if it wins election


Automatic Failover Architecture
    (architecture diagram)

ZooKeeper Failover Controller Details
     •   When a ZKFC is started, it:
          • Begins checking the health of its associated NN via RPC
          • As long as the associated NN is healthy, attempts to create
            an ephemeral znode in ZK
          • One of the two ZKFCs will succeed in creating the znode
            and transition its associated NN to the Active state
          • The other ZKFC transitions its associated NN to the Standby
            state and begins monitoring the ephemeral znode
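
      A minimal sketch of this election step in Java, assuming an already-
      connected ZooKeeper handle; the lock path mirrors the layout HDFS uses
      under /hadoop-ha, but the nameservice name "mycluster" is an example:

          import org.apache.zookeeper.*;

          public class ElectionSketch {
              static final String LOCK =
                  "/hadoop-ha/mycluster/ActiveStandbyElectorLock";

              // Returns true if this ZKFC won the election.
              static boolean tryBecomeActive(ZooKeeper zk, byte[] myInfo)
                      throws KeeperException, InterruptedException {
                  try {
                      // Ephemeral: the znode vanishes if our ZK session dies
                      zk.create(LOCK, myInfo, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                                CreateMode.EPHEMERAL);
                      return true;   // won: transition the local NN to Active
                  } catch (KeeperException.NodeExistsException e) {
                      zk.exists(LOCK, true);  // lost: watch the znode, go Standby
                      return false;
                  }
              }
          }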

What happens when…
     • … a NameNode process crashes?
          • The associated ZKFC notices the health failure of the NN and
            quits the active/standby election by removing its znode
     • … a whole NameNode machine crashes?
         • ZKFC process crashes with it and the ephemeral znode is
           deleted from ZK



What happens when…
     • … the two NameNodes are partitioned from each other?
         • Nothing happens: Only one will still have the znode
      • … ZooKeeper crashes (or is down for an upgrade)?
         • Nothing happens: active stays active




Fencing Still Required with ZKFC
     • Tempting to think ZooKeeper means no need for fencing
     • Consider the following scenario:
         • Two NameNodes: A and B, each with associated ZKFC
         • ZKFC A process crashes, ephemeral znode removed
         • NameNode A process is still running
         • ZKFC B notices znode removed
         • ZKFC B wants to transition NN B to Active, but without
           fencing NN A, both NNs would be active simultaneously
Auto-failover recap
     •   New daemon ZooKeeperFailoverController monitors the
         NameNodes
           • Automatically triggers failovers
           • No need for operator intervention




             Fencing and dependency on NFS storage still a pain


Removing the NAS dependency
     HDFS-3077: completed October 2012




Shared Storage in HDFS HA
     •   The Standby NameNode synchronizes the namespace by
         following the Active NameNode’s transaction log
            • Each operation (e.g. mkdir(/foo)) is written to the log by the Active
            • The StandbyNode periodically reads all new edits and applies
              them to its own metadata structures (see the sketch below)
     •   Reliable shared storage is required for correct operation
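
      A toy illustration of the tailing loop in Java; the Edit and SharedLog
      types and the one-second poll are inventions for this sketch (the real
      logic lives in the NameNode's EditLogTailer):

          import java.util.List;
          import java.util.concurrent.CopyOnWriteArrayList;

          public class TailingSketch {
              record Edit(long txId, String op) {}

              static class SharedLog {
                  final List<Edit> edits = new CopyOnWriteArrayList<>();
                  List<Edit> readSince(long txId) {
                      return edits.stream()
                                  .filter(e -> e.txId() > txId).toList();
                  }
              }

              public static void main(String[] args) throws Exception {
                  SharedLog log = new SharedLog();
                  log.edits.add(new Edit(1, "mkdir /foo"));  // from the Active
                  long lastApplied = 0;
                  for (int i = 0; i < 3; i++) {        // Standby's poll loop
                      for (Edit e : log.readSince(lastApplied)) {
                          System.out.println("apply " + e); // update namespace
                          lastApplied = e.txId();
                      }
                      Thread.sleep(1000);
                  }
              }
          }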




Shared Storage in “Phase 1”
     • Operator configures a traditional shared storage device (e.g. SAN
       or NAS)
     • Mount the shared storage via NFS on both Active and Standby
       NNs
     • Active NN writes to a directory on NFS, while Standby reads it




Shortcomings of NFS-based approach
     •   Custom hardware
           •   Lots of our customers don’t have SAN/NAS available in their datacenter
           •   Costs money, time, and expertise
           •   Extra “stuff” to monitor outside HDFS
           •   We just moved the SPOF, didn’t eliminate it!
     •   Complicated
           •   Storage fencing, NFS mount options, multipath networking, etc.
           •   Organizationally complicated: dependencies on the storage ops team
     •   NFS issues
           •   Buggy client implementations, little control over timeout behavior, etc.
Primary Requirements for Improved Storage
     • No special hardware (PDUs, NAS)
     • No custom fencing configuration
          •   Too complicated == too easy to misconfigure
     •   No SPOFs
          • punting to filers isn’t a good option
          • need something inherently distributed




Secondary Requirements
     •   Configurable failure toleration
           •   Configure N nodes to tolerate (N-1)/2 failures (e.g. 3 nodes
               tolerate 1 failure; 5 tolerate 2)
     •   Making N bigger (within reasonable bounds) shouldn’t hurt
         performance. Implies:
           • Writes done in parallel, not pipelined
           • Writes should not wait on slowest replica
     •   Locate replicas on existing hardware investment (e.g. share with the
         JobTracker, NN, SBN)

Operational Requirements
     •   Should be operable by existing Hadoop admins. Implies:
           • Same metrics system (“hadoop metrics”)
           • Same configuration system (xml)
           • Same logging infrastructure (log4j)
           • Same security system (Kerberos-based)
     • Allow existing ops to easily deploy and manage the new feature
     • Allow existing Hadoop tools to monitor the feature
           •   (e.g. Cloudera Manager, Ganglia, etc.)

Our solution: QuorumJournalManager
     •   QuorumJournalManager (client)
           • Plugs into JournalManager abstraction in NN (instead of existing
             FileJournalManager)
           • Provides edit log storage abstraction
     •   JournalNode (server)
           • Standalone daemon running on an odd number of nodes
           • Provides actual storage of edit logs on local disks
           • Could run inside other daemons in the future


Architecture
    (architecture diagram)

Commit protocol
     • NameNode accumulates edits locally as they are logged
     • On logSync(), sends accumulated batch to all JNs via Hadoop
       RPC
     • Waits for success ACK from a majority of nodes
          • Majority commit means that a single lagging or crashed replica
            does not impact NN latency
          • Latency at the NN tracks the median latency across the JNs
            (see the sketch below)
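
     A minimal sketch of the majority-ACK wait in plain Java; the per-JN RPC
     is stubbed as a Callable, which is an assumption rather than the real
     QJM API:

         import java.io.IOException;
         import java.util.List;
         import java.util.concurrent.*;

         public class QuorumCommitSketch {
             // Send one batch of edits to every JN in parallel; return once
             // a majority has ACKed, leaving stragglers in flight.
             static void logSync(List<Callable<Boolean>> jnRpcs)
                     throws IOException, InterruptedException {
                 ExecutorService pool =
                     Executors.newFixedThreadPool(jnRpcs.size());
                 CompletionService<Boolean> acks =
                     new ExecutorCompletionService<>(pool);
                 jnRpcs.forEach(acks::submit);

                 int needed = jnRpcs.size() / 2 + 1;   // e.g. 2 of 3
                 int got = 0, done = 0;
                 while (done < jnRpcs.size() && got < needed) {
                     done++;
                     try {
                         if (Boolean.TRUE.equals(acks.take().get())) got++;
                     } catch (ExecutionException e) {
                         // a crashed or lagging JN simply isn't an ACK
                     }
                 }
                 pool.shutdown();
                 if (got < needed) throw new IOException("no quorum of ACKs");
             }
         }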



JN Fencing
     • How do we prevent split-brain?
     • Each instance of QJM is assigned a unique epoch number
         •   provides a strong ordering between client NNs
         •   Each IPC contains the client’s epoch
         •   JN remembers on disk the highest epoch it has seen
          •   Any request from an earlier epoch is rejected; any from a newer
              one is recorded on disk (see the sketch below)
         •   Distributed Systems folks may recognize this technique from
             Paxos and other literature
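
      A minimal JN-side sketch in Java; persistEpoch() is a hypothetical
      helper standing in for a durable write, not the actual JournalNode code:

          import java.io.IOException;

          public class EpochCheckSketch {
              private long promisedEpoch;  // highest epoch seen; reloaded
                                           // from disk at startup

              synchronized void checkRequest(long callerEpoch)
                      throws IOException {
                  if (callerEpoch < promisedEpoch) {
                      // A fenced-out (older) writer: reject, so it can never
                      // assemble a quorum again
                      throw new IOException("epoch " + callerEpoch
                          + " < promised epoch " + promisedEpoch);
                  }
                  if (callerEpoch > promisedEpoch) {
                      promisedEpoch = callerEpoch;
                      persistEpoch(promisedEpoch); // must hit disk before ACK
                  }
              }

              private void persistEpoch(long epoch) throws IOException {
                  // hypothetical: write the epoch locally and fsync
              }
          }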

Fencing with epochs
     • Fencing is now implicit
     • The act of becoming active causes any earlier active NN to be
       fenced out
           •   Since a quorum of nodes has accepted the new active, any other
               IPC by an earlier epoch number can’t get quorum
     •   Eliminates confusing and error-prone custom fencing
         configuration


Segment recovery
     • In normal operation, a minority of JNs may be out of sync
     • After a crash, all JNs may have different numbers of txns (last batch
       may or may not have arrived at each)
            •   e.g. JN1 was down, and JN2 crashed right before the NN wrote txid 150:
                 • JN1: has no edits
                 • JN2: has edits 101-149
                 • JN3: has edits 101-150
     •   Before becoming active, we need to come to consensus on this last
         batch: was it committed or not?
           •   Use the well-known Paxos algorithm to solve consensus

Other implementation features
     •   Hadoop Metrics
           • lag, percentile latencies, etc from perspective of JN, NN
           • metrics for queued txns, % of time each JN fell behind, etc, to
             help suss out a slow JN before it causes problems
     •   Security
           •   full Kerberos and SSL support: edits can be optionally encrypted
               in-flight, and all access is mutually authenticated



Testing
     •   Randomized fault test
           • Runs all communications in a single thread with deterministic
             order and fault injections based on a seed
           • Caught a number of really subtle bugs along the way
           • Run as an MR job: 5000 fault tests in parallel
           • Multiple CPU-years of stress testing: found 2 bugs in Jetty!
      •   Cluster testing: 100-node clusters running MR, HBase, Hive, etc.
           •   Commit latency in practice: within same range as local disks
               (better than one of two local disks, worse than the other one)

Deployment and Configuration
     •   Most customers run 3 JNs (tolerates 1 failure)
           •   1 on the NN, 1 on the SBN, 1 on the JobTracker/ResourceManager
           •   Optionally run 2 more (e.g. on bastion/gateway nodes) to tolerate 2
               failures
     •   Configuration:
           •   dfs.namenode.shared.edits.dir:
               qjournal://nn1.company.com:8485;nn2.company.com:8485;jt.company.com:8485/my-journal
           •   dfs.journalnode.edits.dir:
               /data/1/hadoop/journalnode/
           •   dfs.ha.fencing.methods:
               shell(/bin/true)    (custom fencing not required!)
Status
     • Merged into Hadoop development trunk in early October
     • Available in CDH4.1
     • Deployed at several customer/community sites with good
       success so far
         •   Planned rollout to 20+ production HBase clusters within the
             month




Conclusion




HA Phase 2 Improvements
     • Run an active NameNode and a hot Standby NameNode
     • Automatic, seamless failover using Apache ZooKeeper
     • Shared metadata stored on QuorumJournalManager: a fully
       distributed, redundant, low-latency journaling system

     •   All improvements available now in HDFS trunk and CDH4.1


Backup Slides




Why not BookKeeper?
     •   Pipelined commit instead of quorum commit
           •   Unpredictable latency
     • Research project
     • Not “Hadoopy”
           •   Their own IPC system, no security, different configuration, no
               metrics
     •   External
           •   Feels like “two systems” to ops/deployment instead of just one
     •   Nevertheless: it’s pluggable and BK is an additional option.
Epoch number assignment
     •   On startup:
           •   NN -> JN: getEpochInfo()
                 •   JN: respond with current promised epoch
           • NN: set epoch = max(promisedEpoch) + 1
           • NN -> JN: newEpoch(epoch)
                 •   JN: if it is still higher than promisedEpoch, remember it and
                     ACK, otherwise NACK
           •   If NN receives ACK from a quorum of nodes, then it has uniquely
               claimed that epoch
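
      A minimal NN-side sketch in Java; the Journal interface is a
      hypothetical stub mirroring the getEpochInfo()/newEpoch() RPCs named
      above, with timeouts and partial failures omitted:

          import java.io.IOException;
          import java.util.List;

          public class EpochAssignmentSketch {
              interface Journal {
                  long getEpochInfo() throws IOException;      // promised epoch
                  boolean newEpoch(long e) throws IOException; // ACK or NACK
              }

              static long claimEpoch(List<Journal> jns) throws IOException {
                  long max = 0;
                  for (Journal jn : jns) {
                      max = Math.max(max, jn.getEpochInfo());
                  }
                  long epoch = max + 1;        // propose something strictly newer

                  int acks = 0;
                  for (Journal jn : jns) {
                      if (jn.newEpoch(epoch)) acks++;  // JN persists, then ACKs
                  }
                  if (acks <= jns.size() / 2) {
                      throw new IOException("could not claim epoch " + epoch);
                  }
                  return epoch;                // uniquely claimed by a quorum
              }
          }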

Mais conteúdo relacionado

Mais procurados

Tuning DB2 in a Solaris Environment
Tuning DB2 in a Solaris EnvironmentTuning DB2 in a Solaris Environment
Tuning DB2 in a Solaris EnvironmentJignesh Shah
 
Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)Joshua Drake
 
Deep Dive into RDS PostgreSQL Universe
Deep Dive into RDS PostgreSQL UniverseDeep Dive into RDS PostgreSQL Universe
Deep Dive into RDS PostgreSQL UniverseJignesh Shah
 
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...Patrick McGarry
 
Revisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS SchedulerRevisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS SchedulerYongseok Oh
 
ttec infortrend ds
ttec infortrend dsttec infortrend ds
ttec infortrend dsTTEC
 
VMworld 2013: Operating and Architecting a vSphere Metro Storage Cluster base...
VMworld 2013: Operating and Architecting a vSphere Metro Storage Cluster base...VMworld 2013: Operating and Architecting a vSphere Metro Storage Cluster base...
VMworld 2013: Operating and Architecting a vSphere Metro Storage Cluster base...VMworld
 
Hadoop World 2011: HDFS Name Node High Availablity - Aaron Myers, Cloudera & ...
Hadoop World 2011: HDFS Name Node High Availablity - Aaron Myers, Cloudera & ...Hadoop World 2011: HDFS Name Node High Availablity - Aaron Myers, Cloudera & ...
Hadoop World 2011: HDFS Name Node High Availablity - Aaron Myers, Cloudera & ...Cloudera, Inc.
 
Vancouver bug enterprise storage and zfs
Vancouver bug   enterprise storage and zfsVancouver bug   enterprise storage and zfs
Vancouver bug enterprise storage and zfsRami Jebara
 
Running without a ZFS system pool
Running without a ZFS system poolRunning without a ZFS system pool
Running without a ZFS system poolBill Pijewski
 
INF7827 DRS Best Practices
INF7827 DRS Best PracticesINF7827 DRS Best Practices
INF7827 DRS Best PracticesBrian Graf
 
Veeam Webinar - Case study: building bi-directional DR
Veeam Webinar - Case study: building bi-directional DRVeeam Webinar - Case study: building bi-directional DR
Veeam Webinar - Case study: building bi-directional DRJoep Piscaer
 
Practice and challenges from building IaaS
Practice and challenges from building IaaSPractice and challenges from building IaaS
Practice and challenges from building IaaSShawn Zhu
 
Migrating Novell GroupWise to Linux
Migrating Novell GroupWise to LinuxMigrating Novell GroupWise to Linux
Migrating Novell GroupWise to LinuxNovell
 
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...Netgear Italia
 
Scaling Out Tier Based Applications
Scaling Out Tier Based ApplicationsScaling Out Tier Based Applications
Scaling Out Tier Based ApplicationsYury Kaliaha
 
24 Hours of PASS, Summit Preview Session: Virtual SQL Server CPUs
24 Hours of PASS, Summit Preview Session: Virtual SQL Server CPUs24 Hours of PASS, Summit Preview Session: Virtual SQL Server CPUs
24 Hours of PASS, Summit Preview Session: Virtual SQL Server CPUsDavid Klee
 

Mais procurados (20)

Tuning DB2 in a Solaris Environment
Tuning DB2 in a Solaris EnvironmentTuning DB2 in a Solaris Environment
Tuning DB2 in a Solaris Environment
 
Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)Dumb Simple PostgreSQL Performance (NYCPUG)
Dumb Simple PostgreSQL Performance (NYCPUG)
 
Deep Dive into RDS PostgreSQL Universe
Deep Dive into RDS PostgreSQL UniverseDeep Dive into RDS PostgreSQL Universe
Deep Dive into RDS PostgreSQL Universe
 
Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
 
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
Using Recently Published Ceph Reference Architectures to Select Your Ceph Con...
 
Revisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS SchedulerRevisiting CephFS MDS and mClock QoS Scheduler
Revisiting CephFS MDS and mClock QoS Scheduler
 
ttec infortrend ds
ttec infortrend dsttec infortrend ds
ttec infortrend ds
 
VMworld 2013: Operating and Architecting a vSphere Metro Storage Cluster base...
VMworld 2013: Operating and Architecting a vSphere Metro Storage Cluster base...VMworld 2013: Operating and Architecting a vSphere Metro Storage Cluster base...
VMworld 2013: Operating and Architecting a vSphere Metro Storage Cluster base...
 
Hadoop World 2011: HDFS Name Node High Availablity - Aaron Myers, Cloudera & ...
Hadoop World 2011: HDFS Name Node High Availablity - Aaron Myers, Cloudera & ...Hadoop World 2011: HDFS Name Node High Availablity - Aaron Myers, Cloudera & ...
Hadoop World 2011: HDFS Name Node High Availablity - Aaron Myers, Cloudera & ...
 
Vancouver bug enterprise storage and zfs
Vancouver bug   enterprise storage and zfsVancouver bug   enterprise storage and zfs
Vancouver bug enterprise storage and zfs
 
Running without a ZFS system pool
Running without a ZFS system poolRunning without a ZFS system pool
Running without a ZFS system pool
 
Hadoop availability
Hadoop availabilityHadoop availability
Hadoop availability
 
INF7827 DRS Best Practices
INF7827 DRS Best PracticesINF7827 DRS Best Practices
INF7827 DRS Best Practices
 
5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance5 Steps to PostgreSQL Performance
5 Steps to PostgreSQL Performance
 
Veeam Webinar - Case study: building bi-directional DR
Veeam Webinar - Case study: building bi-directional DRVeeam Webinar - Case study: building bi-directional DR
Veeam Webinar - Case study: building bi-directional DR
 
Practice and challenges from building IaaS
Practice and challenges from building IaaSPractice and challenges from building IaaS
Practice and challenges from building IaaS
 
Migrating Novell GroupWise to Linux
Migrating Novell GroupWise to LinuxMigrating Novell GroupWise to Linux
Migrating Novell GroupWise to Linux
 
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...
Webinar NETGEAR - Storagecraft e Netgear: soluzioni per il backup e il disast...
 
Scaling Out Tier Based Applications
Scaling Out Tier Based ApplicationsScaling Out Tier Based Applications
Scaling Out Tier Based Applications
 
24 Hours of PASS, Summit Preview Session: Virtual SQL Server CPUs
24 Hours of PASS, Summit Preview Session: Virtual SQL Server CPUs24 Hours of PASS, Summit Preview Session: Virtual SQL Server CPUs
24 Hours of PASS, Summit Preview Session: Virtual SQL Server CPUs
 

Destaque

Map Reduce data types and formats
Map Reduce data types and formatsMap Reduce data types and formats
Map Reduce data types and formatsVigen Sahakyan
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introductionacogoluegnes
 
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHortonworks
 
Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Rohit Agrawal
 
HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High AvailabilityHortonworks
 
Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHanborq Inc.
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseHortonworks
 

Destaque (9)

Map Reduce data types and formats
Map Reduce data types and formatsMap Reduce data types and formats
Map Reduce data types and formats
 
Hadoop introduction
Hadoop introductionHadoop introduction
Hadoop introduction
 
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
 
Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3Hadoop MapReduce framework - Module 3
Hadoop MapReduce framework - Module 3
 
HDFS Namenode High Availability
HDFS Namenode High AvailabilityHDFS Namenode High Availability
HDFS Namenode High Availability
 
Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari Hortonworks Data In Motion Series Part 3 - HDF Ambari
Hortonworks Data In Motion Series Part 3 - HDF Ambari
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Hadoop HDFS Detailed Introduction
Hadoop HDFS Detailed IntroductionHadoop HDFS Detailed Introduction
Hadoop HDFS Detailed Introduction
 
Enabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical EnterpriseEnabling the Real Time Analytical Enterprise
Enabling the Real Time Analytical Enterprise
 

Semelhante a Strata + Hadoop World 2012: High Availability for the HDFS NameNode Phase 2

Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Kathleen Ting
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineDataWorks Summit
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfhik_lhz
 
The State of HBase Replication
The State of HBase ReplicationThe State of HBase Replication
The State of HBase ReplicationHBaseCon
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...
Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...
Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...Novell
 
7 Ways to Optimize Hudson in Production
7 Ways to Optimize Hudson in Production7 Ways to Optimize Hudson in Production
7 Ways to Optimize Hudson in ProductionCloudBees
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedCloudera, Inc.
 
SAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego CloudSAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego Cloudaidanshribman
 
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseStrata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseCloudera, Inc.
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3DataWorks Summit
 
ActiveMQ Performance Tuning
ActiveMQ Performance TuningActiveMQ Performance Tuning
ActiveMQ Performance TuningChristian Posta
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuiteEDB
 
Introduction to failover clustering with sql server
Introduction to failover clustering with sql serverIntroduction to failover clustering with sql server
Introduction to failover clustering with sql serverEduardo Castro
 
Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)jmhsieh
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Mandakini Kumari
 
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in HadoopKudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoopjdcryans
 
Webinar: The Future of Hadoop
Webinar: The Future of HadoopWebinar: The Future of Hadoop
Webinar: The Future of HadoopCloudera, Inc.
 

Semelhante a Strata + Hadoop World 2012: High Availability for the HDFS NameNode Phase 2 (20)

Flume and HBase
Flume and HBase Flume and HBase
Flume and HBase
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
 
Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)Hadoop Operations for Production Systems (Strata NYC)
Hadoop Operations for Production Systems (Strata NYC)
 
Operate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmineOperate your hadoop cluster like a high eff goldmine
Operate your hadoop cluster like a high eff goldmine
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmf
 
The State of HBase Replication
The State of HBase ReplicationThe State of HBase Replication
The State of HBase Replication
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...
Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...
Upgrading from NetWare to Novell Open Enterprise Server on Linux: The Novell ...
 
7 Ways to Optimize Hudson in Production
7 Ways to Optimize Hudson in Production7 Ways to Optimize Hudson in Production
7 Ways to Optimize Hudson in Production
 
Risk Management for Data: Secured and Governed
Risk Management for Data: Secured and GovernedRisk Management for Data: Secured and Governed
Risk Management for Data: Secured and Governed
 
SAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego CloudSAP Virtualization Week 2012 - The Lego Cloud
SAP Virtualization Week 2012 - The Lego Cloud
 
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the EnterpriseStrata + Hadoop World 2012: Apache HBase Features for the Enterprise
Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
 
ActiveMQ Performance Tuning
ActiveMQ Performance TuningActiveMQ Performance Tuning
ActiveMQ Performance Tuning
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster Suite
 
Introduction to failover clustering with sql server
Introduction to failover clustering with sql serverIntroduction to failover clustering with sql server
Introduction to failover clustering with sql server
 
Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)Apache hbase for the enterprise (Strata+Hadoop World 2012)
Apache hbase for the enterprise (Strata+Hadoop World 2012)
 
Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04Big data with hadoop Setup on Ubuntu 12.04
Big data with hadoop Setup on Ubuntu 12.04
 
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in HadoopKudu: Resolving Transactional and Analytic Trade-offs in Hadoop
Kudu: Resolving Transactional and Analytic Trade-offs in Hadoop
 
Webinar: The Future of Hadoop
Webinar: The Future of HadoopWebinar: The Future of Hadoop
Webinar: The Future of Hadoop
 

Mais de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mais de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 

Último (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 

Strata + Hadoop World 2012: High Availability for the HDFS NameNode Phase 2

  • 1. DO NOT USE PUBLICLY High Availability for the HDFS NameNode PRIOR TO 10/23/12 Phase 2 Headline Goes Here Aaron T. Myers and Todd Lipcon | Cloudera HDFS Team Speaker Name or Subhead Goes Here October 2012 1
  • 2. Introductions / who we are • Software engineers on Cloudera’s HDFS engineering team • Committers/PMC Members for Apache Hadoop at ASF • Main developers on HDFS HA • Responsible for ~80% of the code for all phases of HA development • Have helped numerous customers setup and troubleshoot HA HDFS clusters this year ©2012 Cloudera, Inc. All Rights 2 Reserved.
  • 3. Outline • HDFS HA Phase 1 • How did it work? What could it do? • What problems remained? • HDFS HA Phase 2: Automatic failover • HDFS HA Phase 2: Quorum Journal ©2012 Cloudera, Inc. All Rights 3 Reserved.
  • 4. HDFS HA Phase 1 Review HDFS-1623: completed March 2012 4
  • 5. HDFS HA Development Phase 1 • Completed March 2012 (HDFS-1623) • Introduced the StandbyNode, a hot backup for the HDFS NameNode. • Relied on shared storage to synchronize namespace state • (e.g. a NAS filer appliance) • Allowed operators to manually trigger failover to the Standby • Sufficient for many HA use cases: avoided planned downtime for hardware and software upgrades, planned machine/OS maintenance, configuration changes, etc. ©2012 Cloudera, Inc. All Rights 5 Reserved.
  • 6. HDFS HA Architecture Phase 1 • Parallel block reports sent to Active and Standby NameNodes • NameNode state shared by locating edit log on NAS over NFS • Fencing of shared resources/data • Critical that only a single NameNode is Active at any point in time • Client failover done via client configuration • Each client configured with the address of both NNs: try both to find active ©2012 Cloudera, Inc. All Rights 6 Reserved.
  • 7. HDFS HA Architecture Phase 1 ©2012 Cloudera, Inc. All Rights 7 Reserved.
  • 8. Fencing and NFS • Must avoid split-brain syndrome • Both nodes think they are active and try to write to the same file. Your metadata becomes corrupt and requires manual intervention to restart • Configure a fencing script • Script must ensure that prior active has stopped writing • STONITH: shoot-the-other-node-in-the-head • Storage fencing: e.g using NetApp ONTAP API to restrict filer access • Fencing script must succeed to have a successful failover ©2012 Cloudera, Inc. All Rights 8 Reserved.
  • 9. Shortcomings of Phase 1 • Insufficient to protect against unplanned downtime • Manual failover only: requires an operator to step in quickly after a crash • Various studies indicated this was the minority of downtime, but still important to address • Requirement of a NAS device made deployment complex, expensive, and error-prone (we always knew this was just the first phase!) ©2012 Cloudera, Inc. All Rights 9 Reserved.
  • 10. HDFS HA Development Phase 2
    • Multiple new features for high availability:
      • Automatic failover, based on Apache ZooKeeper
      • Remove dependency on NAS (network-attached storage)
    • Address new HA use cases:
      • Avoid unplanned downtime due to software or hardware faults
      • Deploy in filer-less environments
    • Completely stand-alone HA, with no external hardware or software dependencies
      • No Linux-HA, filers, etc.
  • 11. Automatic Failover Overview
    HDFS-3042: completed May 2012
  • 12. Automatic Failover Goals
    • Automatically detect failure of the Active NameNode
      • Hardware, software, network, etc.
    • Do not require operator intervention to initiate failover
      • Once a failure is detected, the process completes automatically
    • Support manually initiated failover as a first-class operation
      • Operators can still trigger failover without having to stop the Active
    • Do not introduce a new SPOF
      • All parts of an auto-failover deployment must themselves be HA
  • 13. Automatic Failover Architecture
    • Automatic failover requires ZooKeeper
      • Not required for manual failover
    • ZK makes it easy to:
      • Detect failure of the Active NameNode
      • Determine which NameNode should become the Active NN
  • 14. Automatic Failover Architecture
    • Introduces a new daemon in HDFS: the ZooKeeper Failover Controller (ZKFC)
    • In an auto-failover deployment, run two ZKFCs
      • One per NameNode, on that NameNode's machine
    • The ZKFC is responsible for:
      • Monitoring the health of its associated NameNode
      • Participating in leader election among the NameNodes
      • Fencing the other NameNode if it wins the election
  • 15. Automatic Failover Architecture (architecture diagram)
  • 16. ZooKeeper Failover Controller Details
    • When a ZKFC is started, it:
      • Begins checking the health of its associated NN via RPC
      • As long as the associated NN is healthy, attempts to create an ephemeral znode in ZK (see the sketch below)
    • One of the two ZKFCs will succeed in creating the znode, and it transitions its associated NN to the Active state
    • The other ZKFC transitions its associated NN to the Standby state and begins monitoring the ephemeral znode
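The election the ZKFC relies on is ZooKeeper's standard ephemeral-znode create-or-watch pattern. Below is a minimal sketch of that pattern using the Apache ZooKeeper Java client; the znode path and class names are illustrative, not the ones the real ZKFC uses.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ElectionSketch implements Watcher {
        // Hypothetical lock path; the real ZKFC manages its own znode layout.
        private static final String LOCK_ZNODE = "/ha-election-sketch";
        private final ZooKeeper zk;

        public ElectionSketch(String zkConnectString) throws Exception {
            zk = new ZooKeeper(zkConnectString, 5000, this);
        }

        /** Try to win the election by creating an ephemeral znode. */
        public boolean tryBecomeActive(byte[] myId) throws Exception {
            try {
                zk.create(LOCK_ZNODE, myId, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                          CreateMode.EPHEMERAL);
                return true;   // we won: transition our NN to Active
            } catch (KeeperException.NodeExistsException e) {
                zk.exists(LOCK_ZNODE, this);  // we lost: watch the winner's znode
                return false;  // transition our NN to Standby
            }
        }

        @Override
        public void process(WatchedEvent event) {
            // A NodeDeleted event on the lock znode means the previous active's
            // ZK session ended; a real controller would re-run tryBecomeActive().
        }
    }

Because the znode is ephemeral, ZooKeeper deletes it automatically when the owning session ends, which is exactly the crash behavior described on the next slide.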
  • 17. What happens when…
    • … a NameNode process crashes?
      • The associated ZKFC notices the NN's health failure and withdraws from the active/standby election by removing its znode
    • … a whole NameNode machine crashes?
      • The ZKFC process crashes with it, and ZooKeeper deletes the ephemeral znode when the session expires
  • 18. What happens when…
    • … the two NameNodes are partitioned from each other?
      • Nothing happens: only one of them still holds the znode
    • … ZooKeeper crashes (or is down for an upgrade)?
      • Nothing happens: the active NN stays active
  • 19. Fencing Still Required with ZKFC
    • It is tempting to think that ZooKeeper removes the need for fencing
    • Consider the following scenario:
      • Two NameNodes, A and B, each with an associated ZKFC
      • The ZKFC A process crashes, so its ephemeral znode is removed
      • The NameNode A process, however, is still running
      • ZKFC B notices that the znode was removed
      • ZKFC B wants to transition NN B to Active, but without fencing NN A, both NNs would be active simultaneously
  • 20. Auto-failover recap
    • A new daemon, the ZooKeeperFailoverController, monitors the NameNodes
    • Automatically triggers failovers; no need for operator intervention
    • But: fencing and the dependency on NFS storage are still a pain
  • 21. Removing the NAS dependency
    HDFS-3077: completed October 2012
  • 22. Shared Storage in HDFS HA
    • The Standby NameNode synchronizes its namespace by following the Active NameNode's transaction log
      • Each operation (e.g. mkdir(/foo)) is written to the log by the Active
      • The Standby periodically reads all new edits and applies them to its own metadata structures (see the sketch below)
    • Reliable shared storage is required for correct operation
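As a rough sketch of the standby's tailing loop (all type names here are hypothetical stand-ins, not HDFS's actual edit-log classes):

    import java.util.List;

    class StandbyTailerSketch {
        /** Stand-in for one logged namespace operation, e.g. mkdir(/foo). */
        interface EditLogOp { void applyTo(Object namespaceState); }

        /** Stand-in for the shared edit log the Active writes to. */
        interface SharedEditLog { List<EditLogOp> readEditsSince(long txId); }

        /** One polling pass: replay every new transaction, return new position. */
        static long tailOnce(SharedEditLog log, Object ns, long lastAppliedTxId) {
            for (EditLogOp op : log.readEditsSince(lastAppliedTxId)) {
                op.applyTo(ns);     // mutate the standby's in-memory metadata
                lastAppliedTxId++;  // advance our position in the log
            }
            return lastAppliedTxId; // the standby sleeps, then polls again from here
        }
    }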
  • 23. Shared Storage in “Phase 1”
    • The operator configures a traditional shared storage device (e.g. a SAN or NAS)
    • The shared storage is mounted via NFS on both the Active and Standby NNs
    • The Active NN writes to a directory on NFS, while the Standby reads from it
  • 24. Shortcomings of the NFS-based approach
    • Custom hardware
      • Many of our customers don't have SAN/NAS available in their datacenter
      • Costs money, time, and expertise
    • Extra “stuff” to monitor outside HDFS
      • We just moved the SPOF; we didn't eliminate it!
    • Complicated
      • Storage fencing, NFS mount options, multipath networking, etc.
      • Organizationally complicated: dependencies on the storage ops team
    • NFS issues
      • Buggy client implementations, little control over timeout behavior, etc.
  • 25. Primary Requirements for Improved Storage
    • No special hardware (PDUs, NAS)
    • No custom fencing configuration
      • Too complicated == too easy to misconfigure
    • No SPOFs
      • Punting to filers isn't a good option
      • Need something inherently distributed
  • 26. Secondary Requirements
    • Configurable failure toleration
      • Configure N nodes to tolerate floor((N-1)/2) failures (e.g. 3 nodes tolerate 1 failure; 5 nodes tolerate 2)
    • Making N bigger (within reasonable bounds) shouldn't hurt performance. Implies:
      • Writes done in parallel, not pipelined
      • Writes should not wait on the slowest replica
    • Locate replicas on existing hardware investment (e.g. share with the JobTracker, NN, SBN)
  • 27. Operational Requirements
    • Should be operable by existing Hadoop admins. Implies:
      • Same metrics system (“hadoop metrics”)
      • Same configuration system (XML)
      • Same logging infrastructure (log4j)
      • Same security system (Kerberos-based)
    • Allow existing ops to easily deploy and manage the new feature
    • Allow existing Hadoop tools to monitor the feature
      • (e.g. Cloudera Manager, Ganglia, etc.)
  • 28. Our solution: QuorumJournalManager
    • QuorumJournalManager (client)
      • Plugs into the JournalManager abstraction in the NN (instead of the existing FileJournalManager)
      • Provides the edit log storage abstraction
    • JournalNode (server)
      • Standalone daemon running on an odd number of nodes
      • Provides the actual storage of edit logs on local disks
      • Could run inside other daemons in the future
  • 29. Architecture (architecture diagram)
  • 30. Commit protocol
    • The NameNode accumulates edits locally as they are logged
    • On logSync(), it sends the accumulated batch to all JNs via Hadoop RPC
    • It then waits for a success ACK from a majority of the nodes (see the sketch below)
      • Majority commit means that a single lagging or crashed replica does not impact NN latency
      • Latency @ NN = median(latency @ JNs): with 3 JNs, waiting for 2 ACKs means the NN sees the middle JN's latency
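A simplified sketch of the majority wait (interface and names are hypothetical; the real QJM issues asynchronous Hadoop RPCs and handles per-JN errors rather than letting one failed call abort the sync):

    import java.util.List;
    import java.util.concurrent.CompletionService;
    import java.util.concurrent.ExecutorCompletionService;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    class QuorumCommitSketch {
        /** Stand-in for the journal(batch) RPC to one JournalNode. */
        interface JournalChannel { void journal(byte[] batch) throws Exception; }

        /** Send one edit batch to every JN; return once a majority has ACKed. */
        static void logSync(List<JournalChannel> jns, byte[] batch) throws Exception {
            int majority = jns.size() / 2 + 1;
            ExecutorService pool = Executors.newFixedThreadPool(jns.size());
            CompletionService<Void> acks = new ExecutorCompletionService<>(pool);
            for (JournalChannel jn : jns) {
                // Parallel, not pipelined: every JN receives the batch at once.
                acks.submit(() -> { jn.journal(batch); return null; });
            }
            for (int i = 0; i < majority; i++) {
                // Each take() yields one completed call; the slowest minority
                // of JNs never holds up the NameNode.
                acks.take().get();
            }
            pool.shutdown(); // stragglers may still be in flight; we don't wait
        }
    }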
  • 31. JN Fencing
    • How do we prevent split-brain?
    • Each instance of QJM is assigned a unique epoch number
      • Provides a strong ordering between client NNs
    • Each IPC contains the client's epoch
    • Each JN remembers on disk the highest epoch it has seen
      • Any request from an earlier epoch is rejected; any request from a newer one is recorded on disk (see the sketch below)
    • Distributed-systems folks may recognize this technique from Paxos and other literature
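The JN-side check is small; here is a sketch under the description above (persisting the promised epoch to disk is elided here but required in practice):

    class JournalNodeEpochSketch {
        private long promisedEpoch;  // the real JN persists this on disk

        /** Every IPC carries the caller's epoch; stale writers are rejected. */
        synchronized void checkEpoch(long callerEpoch) {
            if (callerEpoch < promisedEpoch) {
                // A fenced-out (older) NameNode: refuse the request.
                throw new IllegalStateException("epoch " + callerEpoch
                        + " < promised epoch " + promisedEpoch);
            }
            if (callerEpoch > promisedEpoch) {
                promisedEpoch = callerEpoch;  // record the newer writer
            }
        }
    }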
  • 32. Fencing with epochs
    • Fencing is now implicit
      • The act of becoming active causes any earlier active NN to be fenced out
      • Since a quorum of nodes has accepted the new active, any IPC carrying an earlier epoch number can't get a quorum
    • Eliminates confusing and error-prone custom fencing configuration
  • 33. Segment recovery
    • In normal operation, a minority of JNs may be out of sync
    • After a crash, the JNs may have different numbers of txns (the last batch may or may not have arrived at each)
    • e.g. JN1 was down, and JN2 crashed right before the NN wrote txid 150:
      • JN1: has no edits
      • JN2: has edits 101-149
      • JN3: has edits 101-150
    • Before becoming active, we need to come to consensus on this last batch: was it committed or not? (see the sketch below)
    • We use the well-known Paxos algorithm to solve this consensus problem
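A deliberately oversimplified sketch of the decision the recovering NN must drive. The real implementation is a full Paxos round with promises and accepted values persisted on the JNs; this only shows why, in the example above, the longest segment reported by a quorum wins:

    import java.util.List;

    class SegmentRecoverySketch {
        /** What one JN reports about its last, possibly-unfinished log segment. */
        record SegmentState(String jn, long firstTxId, long lastTxId) { }

        /** Among a quorum of responses, adopt the longest segment, then
         *  (not shown) re-replicate it to the other JNs before going active. */
        static SegmentState chooseRecoveryValue(List<SegmentState> quorum) {
            SegmentState best = null;
            for (SegmentState s : quorum) {
                if (best == null || s.lastTxId() > best.lastTxId()) {
                    best = s;
                }
            }
            return best;  // e.g. JN3's 101-150 beats JN2's 101-149
        }
    }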
  • 34. Other implementation features
    • Hadoop Metrics
      • Lag, percentile latencies, etc., from the perspective of both the JN and the NN
      • Metrics for queued txns, % of time each JN fell behind, etc., to help suss out a slow JN before it causes problems
    • Security
      • Full Kerberos and SSL support: edits can optionally be encrypted in flight, and all access is mutually authenticated
  • 36. Testing
    • Randomized fault test
      • Runs all communications in a single thread, with deterministic ordering and fault injections based on a seed
      • Caught a number of really subtle bugs along the way
      • Run as an MR job: 5000 fault tests in parallel
      • Multiple CPU-years of stress testing: found 2 bugs in Jetty!
    • Cluster testing: 100 nodes, with MR, HBase, Hive, etc.
    • Commit latency in practice: within the same range as local disks (better than one of two local disks, worse than the other)
  • 37. Deployment and Configuration
    • Most customers run 3 JNs (tolerates 1 failure)
      • 1 on the NN, 1 on the SBN, 1 on the JobTracker/ResourceManager
      • Optionally run 2 more (e.g. on bastion/gateway nodes) to tolerate 2 failures
    • Configuration (see the hdfs-site.xml sketch below):
      • dfs.namenode.shared.edits.dir: qjournal://nn1.company.com:8485;nn2.company.com:8485;jt.company.com:8485/my-journal
      • dfs.journalnode.edits.dir: /data/1/hadoop/journalnode/
      • dfs.ha.fencing.methods: shell(/bin/true) (custom fencing is not required!)
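Expressed as hdfs-site.xml properties, that configuration looks like the sketch below. The hostnames are the slide's own examples; note that the qjournal URI separates its hosts with semicolons, so the value is not mistaken for a comma-separated list of directories:

    <!-- hdfs-site.xml sketch; hostnames are illustrative examples -->
    <property>
      <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://nn1.company.com:8485;nn2.company.com:8485;jt.company.com:8485/my-journal</value>
    </property>
    <property>
      <name>dfs.journalnode.edits.dir</name>
      <value>/data/1/hadoop/journalnode/</value>
    </property>
    <property>
      <name>dfs.ha.fencing.methods</name>
      <value>shell(/bin/true)</value>
    </property>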
  • 38. Status
    • Merged into the Hadoop development trunk in early October
    • Available in CDH4.1
    • Deployed at several customer/community sites with good success so far
    • Planned rollout to 20+ production HBase clusters within the month
  • 40. HA Phase 2 Improvements
    • Run an active NameNode and a hot Standby NameNode
    • Automatically trigger seamless failover using Apache ZooKeeper
    • Store shared metadata in QuorumJournalManager: a fully distributed, redundant, low-latency journaling system
    • All improvements are available now in HDFS trunk and CDH4.1
  • 43. Why not BookKeeper?
    • Pipelined commit instead of quorum commit
      • Unpredictable latency
    • A research project
    • Not “Hadoopy”
      • Its own IPC system, no security, different configuration, no metrics
    • External
      • Feels like “two systems” to ops/deployment instead of just one
    • Nevertheless: the journal is pluggable, and BK remains an additional option
  • 44. Epoch number assignment
    • On startup:
      • NN -> JN: getEpochInfo()
      • JN: responds with its current promised epoch
      • NN: sets epoch = max(promised epochs seen) + 1
      • NN -> JN: newEpoch(epoch)
      • JN: if the proposed epoch is still higher than its promised epoch, remembers it and ACKs; otherwise NACKs
    • If the NN receives ACKs from a quorum of nodes, it has uniquely claimed that epoch (see the sketch below)
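A sequential sketch of that exchange (the interface is a hypothetical stand-in; the real QJM issues these RPCs to all JNs in parallel and persists the result):

    import java.util.List;

    class EpochAssignmentSketch {
        /** Stand-in for the two RPCs each JournalNode exposes. */
        interface JnRpc {
            long getPromisedEpoch();        // getEpochInfo()
            boolean newEpoch(long epoch);   // true = ACK, false = NACK
        }

        /** Claim an epoch; succeeds only with ACKs from a majority of JNs. */
        static long claimEpoch(List<JnRpc> jns) {
            long maxPromised = 0;
            for (JnRpc jn : jns) {
                maxPromised = Math.max(maxPromised, jn.getPromisedEpoch());
            }
            long myEpoch = maxPromised + 1;  // strictly newer than anything seen
            int acks = 0;
            for (JnRpc jn : jns) {
                if (jn.newEpoch(myEpoch)) {
                    acks++;
                }
            }
            if (acks <= jns.size() / 2) {
                throw new IllegalStateException("failed to claim epoch " + myEpoch);
            }
            return myEpoch;  // no other NN can ever claim this same epoch
        }
    }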