SlideShare a Scribd company logo
1 of 13
Download to read offline
Hadoop Virtualization Extensions

Junping Du


Sr.MTS, VMware, Inc




                                   © 2009 VMware Inc. All rights reserved
Project HVE (Hadoop Virtualization Extensions)


 Refine Hadoop for running on virtualized infrastructure
    • Enable multiple-layer network topology
    • Enable resource sharing
    • Enable compute/data node separation without losing locality


 Patches are contributed back to Apache Hadoop Community
    • http://www.vmware.com/hadoop
    • Umbrella JIRA: HADOOP-8468
    • Sub JIRAs: HADOOP-8469, HADOOP-8470, HADOOP-8817, HDFS-3495,
                HDFS-3498, HDFS-3461, MAPREDUCE-4660, YARN-18, etc.




2
Current Network Topology

                                /



                D1                             D1
                                                               • D = data center
                                                               • R = rack
      R1              R2                 R3          R4
                                                               • H = host

 H1   H2   H3    H4   H5   H6       H7   H8   H9 H10 H11 H12



However, you have more choices on
 virtualized infrastructure


            • C = compute node
                (TaskTracker)
            • D = data node

 3
High Level View on HVE changes




4
Additional network topology layer to aware virtuliazation

                                                                        • D = data center
                                                                        • R = rack
                                                                        • NG = node group
                                              /                         • HG = node


                       D1                                          D2



           R1                     R2                   R3                     R4



     NG1         NG2        NG3         NG4       NG5        NG6        NG7        NG8



N1   N2     N3     N4       N5     N6     N7      N8        N9   N10    N11    N12    N13


5
“Virtualization Aware” Replica Placement Policy


                                           Updated Policies:
                                           • No replicas are placed on the
                                             same node or nodes under
                                             the same node group
                                           • 1st replica is on the local
                                             node or one of nodes under
                                             the same node group of the
                                             writer
                                           • 2nd replica is on a remote
                                             rack of the 1st replica
                                           • 3rd replica is on the same
                                             rack as the 2nd replica
                                           • Remaining replicas are
                                             placed randomly across rack
                                             to meet minimum restriction.



6
“Virtualization Aware” Replica Choosing Policy


                                           Distances for data locality:
                                           • Node local (0)
                                           • Node group local (2)
                                           • Rack local (4)
                                           • Off rack (6)




7
“Virtualization Aware” Balancer Policy


                                • Balancer policies contains two levels
                                  choosing policy
                                   - choosing node pairs of source and
                                  target, in sequence of: local node group,
                                  local rack, off rack
                                   - choosing blocks to move within node
                                  pair, a replica block is not a good
                                  candidate if another replica is on the
                                  target node or on the same node group
                                  of the target node




8
“Virtualization Aware” Task Scheduling Policy


                                          Get task split for TaskTracker or
                                           NodeManager in following
                                           sequences:
                                          • Node local
                                          • Node group local
                                          • Rack local
                                          • Off rack


                                          It works well with
                                          • FifoScheduler
                                          • FairScheduler
                                          • Capacity scheduler




9
HVE Effects on Reliability and Performance




10
Summary

 Hadoop Virtualization Extensions
 • Network Topology with additional layer
 • Replica placement/removal/choosing policies extension
 • Balancer policy extension
 • Task Scheduling policy extension
 HVE effect
 • Reliability – multiple DN VMs per host
 • Performance – DN/CN separation case




11
References

 Hadoop at VMware
 • www.vmware.com/hadoop
 Project Serengeti
 • projectserengeti.org
 Umbrella JIRA for HVE
 • https://issues.apache.org/jira/browse/HADOOP-8468                   Serengeti
 Hadoop on vSphere
 • Talks @ Hadoop World, Hadoop Summit
 • White Papers
 Spring for Apache Hadoop
 • http://blog.springsource.org/2012/02/29/introducing-spring-hadoop




12
Q&A

     Thank you!




13

More Related Content

What's hot

Building High Availability Clusters with SUSE Linux Enterprise High Availabil...
Building High Availability Clusters with SUSE Linux Enterprise High Availabil...Building High Availability Clusters with SUSE Linux Enterprise High Availabil...
Building High Availability Clusters with SUSE Linux Enterprise High Availabil...Novell
 
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksHadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksCloudera, Inc.
 
DASH7 Webinar: Working With Open Tag For Mode 2
DASH7 Webinar:  Working With Open Tag For Mode 2DASH7 Webinar:  Working With Open Tag For Mode 2
DASH7 Webinar: Working With Open Tag For Mode 2Haystack Technologies
 

What's hot (6)

Hadoop 1.x vs 2
Hadoop 1.x vs 2Hadoop 1.x vs 2
Hadoop 1.x vs 2
 
Building High Availability Clusters with SUSE Linux Enterprise High Availabil...
Building High Availability Clusters with SUSE Linux Enterprise High Availabil...Building High Availability Clusters with SUSE Linux Enterprise High Availabil...
Building High Availability Clusters with SUSE Linux Enterprise High Availabil...
 
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksHadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
 
PD_Tcl_Examples
PD_Tcl_ExamplesPD_Tcl_Examples
PD_Tcl_Examples
 
DASH7 Webinar: Working With Open Tag For Mode 2
DASH7 Webinar:  Working With Open Tag For Mode 2DASH7 Webinar:  Working With Open Tag For Mode 2
DASH7 Webinar: Working With Open Tag For Mode 2
 
Status of HDF-EOS, Related Software and Tools
Status of HDF-EOS, Related Software and ToolsStatus of HDF-EOS, Related Software and Tools
Status of HDF-EOS, Related Software and Tools
 

Similar to Hadoop virtualization extensions hadoop world meetup

End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationCeph Community
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureCloudera, Inc.
 
Cassandra tech talk
Cassandra tech talkCassandra tech talk
Cassandra tech talkSatish Mehta
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaTed Dunning
 
Hadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudHadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudCloudera, Inc.
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stackMurali Reddy
 
Hadoop hbase mapreduce
Hadoop hbase mapreduceHadoop hbase mapreduce
Hadoop hbase mapreduceFARUK BERKSÖZ
 
Hadoop distributed computing framework for big data
Hadoop distributed computing framework for big dataHadoop distributed computing framework for big data
Hadoop distributed computing framework for big dataCyanny LIANG
 
NoSQL overview #phptostart turin 11.07.2011
NoSQL overview #phptostart turin 11.07.2011NoSQL overview #phptostart turin 11.07.2011
NoSQL overview #phptostart turin 11.07.2011David Funaro
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionBenoit Perroud
 
GR8Conf 2011: Neo4j Plugin
GR8Conf 2011: Neo4j PluginGR8Conf 2011: Neo4j Plugin
GR8Conf 2011: Neo4j PluginGR8Conf
 
Hadoop: A Hands-on Introduction
Hadoop: A Hands-on IntroductionHadoop: A Hands-on Introduction
Hadoop: A Hands-on IntroductionClaudio Martella
 
Conole vilnius 3_nov
Conole vilnius 3_novConole vilnius 3_nov
Conole vilnius 3_novgrainne
 

Similar to Hadoop virtualization extensions hadoop world meetup (20)

End of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph ReplicationEnd of RAID as we know it with Ceph Replication
End of RAID as we know it with Ceph Replication
 
My sql tutorial-oscon-2012
My sql tutorial-oscon-2012My sql tutorial-oscon-2012
My sql tutorial-oscon-2012
 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and Future
 
Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
 
Cassandra tech talk
Cassandra tech talkCassandra tech talk
Cassandra tech talk
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with Katta
 
Hadoop on VMware
Hadoop on VMwareHadoop on VMware
Hadoop on VMware
 
Hadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in CloudHadoop World 2011: Hadoop as a Service in Cloud
Hadoop World 2011: Hadoop as a Service in Cloud
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stack
 
Hadoop hbase mapreduce
Hadoop hbase mapreduceHadoop hbase mapreduce
Hadoop hbase mapreduce
 
HBase with MapR
HBase with MapRHBase with MapR
HBase with MapR
 
Hadoop distributed computing framework for big data
Hadoop distributed computing framework for big dataHadoop distributed computing framework for big data
Hadoop distributed computing framework for big data
 
Introduction to Apache Drill
Introduction to Apache DrillIntroduction to Apache Drill
Introduction to Apache Drill
 
NoSQL overview #phptostart turin 11.07.2011
NoSQL overview #phptostart turin 11.07.2011NoSQL overview #phptostart turin 11.07.2011
NoSQL overview #phptostart turin 11.07.2011
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment Evolution
 
GR8Conf 2011: Neo4j Plugin
GR8Conf 2011: Neo4j PluginGR8Conf 2011: Neo4j Plugin
GR8Conf 2011: Neo4j Plugin
 
Grails and Neo4j
Grails and Neo4jGrails and Neo4j
Grails and Neo4j
 
Hadoop: A Hands-on Introduction
Hadoop: A Hands-on IntroductionHadoop: A Hands-on Introduction
Hadoop: A Hands-on Introduction
 
MyCloud for $100k
MyCloud for $100kMyCloud for $100k
MyCloud for $100k
 
Conole vilnius 3_nov
Conole vilnius 3_novConole vilnius 3_nov
Conole vilnius 3_nov
 

Hadoop virtualization extensions hadoop world meetup

  • 1. Hadoop Virtualization Extensions Junping Du Sr.MTS, VMware, Inc © 2009 VMware Inc. All rights reserved
  • 2. Project HVE (Hadoop Virtualization Extensions)  Refine Hadoop for running on virtualized infrastructure • Enable multiple-layer network topology • Enable resource sharing • Enable compute/data node separation without losing locality  Patches are contributed back to Apache Hadoop Community • http://www.vmware.com/hadoop • Umbrella JIRA: HADOOP-8468 • Sub JIRAs: HADOOP-8469, HADOOP-8470, HADOOP-8817, HDFS-3495, HDFS-3498, HDFS-3461, MAPREDUCE-4660, YARN-18, etc. 2
  • 3. Current Network Topology / D1 D1 • D = data center • R = rack R1 R2 R3 R4 • H = host H1 H2 H3 H4 H5 H6 H7 H8 H9 H10 H11 H12 However, you have more choices on virtualized infrastructure • C = compute node (TaskTracker) • D = data node 3
  • 4. High Level View on HVE changes 4
  • 5. Additional network topology layer to aware virtuliazation • D = data center • R = rack • NG = node group / • HG = node D1 D2 R1 R2 R3 R4 NG1 NG2 NG3 NG4 NG5 NG6 NG7 NG8 N1 N2 N3 N4 N5 N6 N7 N8 N9 N10 N11 N12 N13 5
  • 6. “Virtualization Aware” Replica Placement Policy Updated Policies: • No replicas are placed on the same node or nodes under the same node group • 1st replica is on the local node or one of nodes under the same node group of the writer • 2nd replica is on a remote rack of the 1st replica • 3rd replica is on the same rack as the 2nd replica • Remaining replicas are placed randomly across rack to meet minimum restriction. 6
  • 7. “Virtualization Aware” Replica Choosing Policy Distances for data locality: • Node local (0) • Node group local (2) • Rack local (4) • Off rack (6) 7
  • 8. “Virtualization Aware” Balancer Policy • Balancer policies contains two levels choosing policy - choosing node pairs of source and target, in sequence of: local node group, local rack, off rack - choosing blocks to move within node pair, a replica block is not a good candidate if another replica is on the target node or on the same node group of the target node 8
  • 9. “Virtualization Aware” Task Scheduling Policy Get task split for TaskTracker or NodeManager in following sequences: • Node local • Node group local • Rack local • Off rack It works well with • FifoScheduler • FairScheduler • Capacity scheduler 9
  • 10. HVE Effects on Reliability and Performance 10
  • 11. Summary  Hadoop Virtualization Extensions • Network Topology with additional layer • Replica placement/removal/choosing policies extension • Balancer policy extension • Task Scheduling policy extension  HVE effect • Reliability – multiple DN VMs per host • Performance – DN/CN separation case 11
  • 12. References  Hadoop at VMware • www.vmware.com/hadoop  Project Serengeti • projectserengeti.org  Umbrella JIRA for HVE • https://issues.apache.org/jira/browse/HADOOP-8468 Serengeti  Hadoop on vSphere • Talks @ Hadoop World, Hadoop Summit • White Papers  Spring for Apache Hadoop • http://blog.springsource.org/2012/02/29/introducing-spring-hadoop 12
  • 13. Q&A Thank you! 13