HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
02/28/11
Xiao Qin, Department of Computer Science and Software Engineering, Auburn University
http://www.eng.auburn.edu/~xqin
Slides 2-20 are adapted from notes by Subbarao Kambhampati (ASU), Dan Weld (U. Washington), and Jeff Dean and Sanjay Ghemawat (Google, Inc.)
Motivation
Map/Reduce
Distributed Grep [Figure: very big data -> split data (x4) -> grep (x4) -> matches (x4) -> cat -> all matches]
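The grep pipeline on this slide can be sketched in a few lines of Python. This is a minimal single-process illustration, not Hadoop code: the function names `map_grep` and `reduce_cat` and the sample input are hypothetical, and each inner list stands in for one data split.

```python
import re


def map_grep(pattern, split_lines):
    """Map step: keep only the lines of one split that match the pattern."""
    regex = re.compile(pattern)
    return [line for line in split_lines if regex.search(line)]


def reduce_cat(match_lists):
    """Reduce step: concatenate the per-split matches, like `cat`."""
    return [line for matches in match_lists for line in matches]


# Hypothetical input: four splits of a larger log file.
splits = [
    ["error: disk full", "ok"],
    ["ok", "warning"],
    ["error: timeout"],
    [],
]
all_matches = reduce_cat(map_grep(r"error", split) for split in splits)
print(all_matches)
```

In a real cluster each call to `map_grep` would run on the node that holds the split, which is why the placement of splits matters for performance.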
Distributed Word Count [Figure: very big data -> split data (x4) -> count (x4) -> per-split count (x4) -> merge -> merged count]
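A word count in the same style might look like the sketch below. It assumes the usual formulation in which the map step emits a `(word, 1)` pair per word and the reduce step sums the pairs for each word; the helper names and the two sample documents are hypothetical, and the grouping that a framework performs between the two steps is simulated with a dictionary.

```python
from collections import defaultdict


def map_words(doc_name, text):
    """Map step: emit (word, 1) for every word in one document."""
    for word in text.split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all counts emitted for one word."""
    return word, sum(counts)


def run_word_count(docs):
    """Simulate the shuffle: group map output by key, then reduce each group."""
    groups = defaultdict(list)
    for name, text in docs.items():
        for word, one in map_words(name, text):
            groups[word].append(one)
    return dict(reduce_counts(word, counts) for word, counts in groups.items())


# Hypothetical input documents.
docs = {"doc1": "a rose is a rose", "doc2": "a daisy"}
word_counts = run_word_count(docs)
print(word_counts)
```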
Map Reduce [Figure: very big data -> MAP -> partitioning function -> REDUCE -> result]
Map in Lisp (Scheme) [Annotations: unary operator, binary operator]
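The two annotations on this slide correspond to the two higher-order functions: map applies a unary operator to every element of a list, and reduce folds a list with a binary operator. A rough Python equivalent, using a hypothetical `square` function and sample list, is:

```python
from functools import reduce
import operator


def square(x):
    """Unary operator: takes a single argument."""
    return x * x


numbers = [1, 2, 3, 4]

# map applies the unary operator to each element independently.
squares = list(map(square, numbers))

# reduce combines the elements pairwise with a binary operator (here, addition).
total = reduce(operator.add, squares)

print(squares, total)
```

Because each application of the unary operator is independent of the others, the map step can be spread across machines without coordination.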
Map/Reduce à la Google
Count words in docs
Count, Illustrated
Grep
Reverse Web-Link Graph
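As a sketch of the idea named in this slide's title, following the formulation in the MapReduce paper by Dean and Ghemawat: the map step emits a `(target, source)` pair for every link found in a source page, and the reduce step collects all sources that point to a given target. The function names and the tiny link graph below are hypothetical.

```python
from collections import defaultdict


def map_links(source_url, target_urls):
    """Map step: for each outgoing link, emit (target, source)."""
    for target in target_urls:
        yield target, source_url


def reduce_sources(target, sources):
    """Reduce step: gather every page that links to the target."""
    return target, sorted(sources)


def reverse_link_graph(pages):
    """Group map output by target URL, then reduce each group."""
    groups = defaultdict(list)
    for source, targets in pages.items():
        for target, src in map_links(source, targets):
            groups[target].append(src)
    return dict(reduce_sources(t, s) for t, s in groups.items())


# Hypothetical forward link graph: page -> pages it links to.
pages = {"a.html": ["b.html", "c.html"], "b.html": ["c.html"]}
reversed_graph = reverse_link_graph(pages)
print(reversed_graph)
```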
Model is Widely Applicable: MapReduce Programs in Google Source Tree
Example uses:
  distributed grep
  distributed sort
  web link-graph reversal
  term-vector / host
  web access log stats
  inverted index construction
  document clustering
  machine learning
  statistical machine translation
  ...
Implementation Overview
Execution
Job Processing [Figure: JobTracker; TaskTracker 0 through TaskTracker 5; job "grep"]
Execution
Parallel Execution
Task Granularity & Pipelining
MapReduce outside Google
            Master      Slave
MapReduce   jobtracker  tasktracker
DFS         namenode    datanode
Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters
Download software at: http://www.eng.auburn.edu/~xqin/software/hdfs-hc
This HDFS-HC tool was described in our paper "Improving MapReduce Performance via Data Placement in Heterogeneous Hadoop Clusters" by J. Xie, S. Yin, X.-J. Ruan, Z.-Y. Ding, Y. Tian, J. Majors, and X. Qin, published in Proc. 19th Int'l Heterogeneity in Computing Workshop, Atlanta, Georgia, April 2010.
Hadoop Overview (J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. OSDI '04, pages 137-150, 2004)
One-time setup
Hadoop Distributed File System (http://lucene.apache.org/hadoop)
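Motivational Example [Figure: timeline in minutes. Node A (fast) processes 1 task/min; Node B (slow) is 2x slower; Node C (slowest) is 3x slower.]
The Native Strategy [Figure: timeline in minutes for Nodes A, B, and C, showing loading, transferring, and processing phases. Task counts shown: 3 tasks, 2 tasks, 6 tasks.]
Our Solution: Reducing data transfer time [Figure: timeline in minutes for Nodes A', B', and C', compared with Node A, showing loading, transferring, and processing phases. Task counts shown: 3 tasks, 2 tasks, 6 tasks.]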
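The effect of node speed on job completion time can be illustrated with a small calculation. The sketch below uses the speeds from the motivational example (Node A needs 1 minute per task, Node B 2 minutes, Node C 3 minutes) and the 11 tasks shown in the figures. It only counts processing time and ignores the loading and transferring phases, and both task assignments are hypothetical examples rather than the placements drawn on the slides.

```python
def processing_minutes(tasks_per_node, minutes_per_task):
    """Minutes each node spends processing its locally assigned tasks."""
    return {
        node: tasks_per_node[node] * minutes_per_task[node]
        for node in tasks_per_node
    }


def makespan(tasks_per_node, minutes_per_task):
    """The job finishes when the slowest node finishes."""
    return max(processing_minutes(tasks_per_node, minutes_per_task).values())


# Speeds from the motivational example: B is 2x slower, C is 3x slower than A.
minutes_per_task = {"A": 1, "B": 2, "C": 3}

# Hypothetical assignments of the same 11 tasks.
near_even = {"A": 4, "B": 4, "C": 3}    # ignores node speed
speed_aware = {"A": 6, "B": 3, "C": 2}  # proportional to node speed

print(makespan(near_even, minutes_per_task))
print(makespan(speed_aware, minutes_per_task))
```

With the speed-aware assignment all three nodes finish at the same time, so no node has to fetch data from another node to stay busy.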
Preliminary Results Impact of data placement on performance of grep
Challenges
Measure Computing Ratios [Figure: timeline for Nodes A, B, and C. Node A processes 1 task/min; Node B is 2x slower; Node C is 3x slower.]
Steps to Measure Computing Ratios
1. Run the application on each node with the same size of data, and collect each node's response time individually.
2. Set the ratio of the node with the shortest response time to 1, and set the ratios of the other nodes accordingly.
3. Calculate the least common multiple of these ratios.
4. Compute the portion of each node (the least common multiple divided by the node's ratio).

Node    Response time (s)  Ratio  # of file fragments  Speed
Node A  10                 1      6                    Fastest
Node B  20                 2      3                    Average
Node C  30                 3      2                    Slowest
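The four steps above can be sketched as follows. This is an illustrative reading of the table, not the HDFS-HC implementation: the function names are hypothetical, and the sketch assumes the ratios come out as whole numbers, as they do in the table.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def computing_ratios(response_times):
    """Step 2: the fastest node gets ratio 1; others are scaled accordingly."""
    fastest = min(response_times.values())
    return {node: Fraction(t, fastest) for node, t in response_times.items()}


def fragments_per_node(response_times):
    """Steps 3 and 4: portion of each node = LCM of ratios / node's ratio."""
    ratios = computing_ratios(response_times)
    if any(r.denominator != 1 for r in ratios.values()):
        raise ValueError("this sketch assumes whole-number ratios")
    common = reduce(lcm, (r.numerator for r in ratios.values()))
    return {node: common // r.numerator for node, r in ratios.items()}


# Step 1 (measured response times in seconds, taken from the table).
response_times = {"A": 10, "B": 20, "C": 30}
print(computing_ratios(response_times))
print(fragments_per_node(response_times))
```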
Initial Data Distribution [Figure: the namenode distributes the fragments of File1 (1-9, a-c) among datanodes A, B, and C in the portion 3:2:1.]
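A split of the twelve fragments in the portion 3:2:1 could look like the sketch below. Two things are assumptions made only for illustration: that the portion is listed in the order A, B, C, and that each node receives a contiguous run of fragments. The function name `split_by_portion` is hypothetical.

```python
def split_by_portion(fragments, portions):
    """Assign fragments to nodes so that the counts follow the given portion."""
    total = sum(portions.values())
    if len(fragments) % total != 0:
        raise ValueError("fragment count must be a multiple of the portion sum")
    unit = len(fragments) // total
    placement = {}
    start = 0
    for node, share in portions.items():
        end = start + share * unit
        placement[node] = fragments[start:end]
        start = end
    return placement


# Fragment labels as they appear in the figure: 1-9, then a-c.
fragments = list("123456789abc")

# Assumed mapping of the 3:2:1 portion to nodes A, B, and C.
placement = split_by_portion(fragments, {"A": 3, "B": 2, "C": 1})
for node, owned in placement.items():
    print(node, owned)
```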
Data Redistribution [Figure: namenode; fragments 1-9 and a-c on datanodes A, B, and C; lists L1 and L2; steps 1-4; portion 3:2:1.]
Sharing Files among Multiple Applications
Experimental Environment: five nodes in a heterogeneous Hadoop cluster

Node    CPU model         CPU (Hz)  L1 cache (KB)
Node A  Intel Core 2 Duo  2*1G=2G   204
Node B  Intel Celeron     2.8G      256
Node C  Intel Pentium 3   1.2G      256
Node D  Intel Pentium 3   1.2G      256
Node E  Intel Pentium 3   1.2G      256
Grep and WordCount
Computing ratios for two applications: computing ratios of the five nodes with respect to the Grep and WordCount applications

Computing node  Ratio for Grep  Ratio for WordCount
Node A          1               1
Node B          2               2
Node C          3.3             5
Node D          3.3             5
Node E          3.3             5
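Since a node's portion is inversely proportional to its computing ratio, the ratios in the table translate into a different share of the input data for each application. The sketch below works this out with exact fractions; the function name `data_shares` is hypothetical, and the result is an idealized share rather than a measured placement.

```python
from fractions import Fraction


def data_shares(ratios):
    """Share of the input per node, proportional to 1 / computing ratio."""
    weights = {node: 1 / Fraction(r) for node, r in ratios.items()}
    total = sum(weights.values())
    return {node: w / total for node, w in weights.items()}


# Ratios copied from the table (strings keep 3.3 exact as 33/10).
grep_ratios = {"A": "1", "B": "2", "C": "3.3", "D": "3.3", "E": "3.3"}
wordcount_ratios = {"A": "1", "B": "2", "C": "5", "D": "5", "E": "5"}

grep_shares = data_shares(grep_ratios)
wordcount_shares = data_shares(wordcount_ratios)

for node in grep_ratios:
    print(node, float(grep_shares[node]), float(wordcount_shares[node]))
```

Under this reading, Node A would hold a larger share of the data for WordCount than for Grep, which is one way to see why the placement is application dependent.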
Response time of Grep and WordCount on each node [Figure annotations: application dependence; data size independence]
Six Data Placement Decisions
Impact of data placement on performance of Grep
Impact of data placement on performance of WordCount
Conclusion
Future Work
Fellowship Program, Samuel Ginn College of Engineering at Auburn University
http://www.eng.auburn.edu/programs/grad-school/fellowship-program/
http://www.eng.auburn.edu
Download the presentation slides: http://www.slideshare.net/xqin74 (or search Google for: slideshare Xiao Qin)
Questions





HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters

  • 1. HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters 02/28/11 Xiao Qin Department of Computer Science and Software Engineering Auburn University http://www.eng.auburn.edu/~xqin [email_address] Slides 2-20 are adapted from notes by Subbarao Kambhampati (ASU), Dan Weld (U. Washington), Jeff Dean, Sanjay Ghemawat, (Google, Inc.)
  • 2.
  • 3.
  • 4. Distributed Grep Very big data Split data Split data Split data Split data grep grep grep grep matches matches matches matches cat All matches
  • 5. Distributed Word Count Very big data Split data Split data Split data Split data count count count count count count count count merge merged count
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. Model is Widely Applicable MapReduce Programs In Google Source Tree Example uses: distributed grep   distributed sort   web link-graph reversal term-vector / host web access log stats inverted index construction document clustering machine learning statistical machine translation ... ... ...
  • 14.
  • 15.
  • 16.
  • 19.
  • 20.
  • 21. Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters Download Software at: http://www.eng.auburn.edu/~xqin/software/hdfs-hc This HDFS-HC tool was described in our paper -  Improving MapReduce Performance via Data Placement in Heterogeneous Hadoop Clusters  - by J. Xie, S. Yin, X.-J. Ruan, Z.-Y. Ding, Y. Tian, J. Majors, and X. Qin, published in Proc. 19th Int'l Heterogeneity in Computing Workshop, Atlanta, Georgia, April 2010.
  • 22. Hadoop Overview (J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI ’04, pages 137–150, 2004)
  • 23.
  • 24. Hadoop Distributed File System (http://lucene.apache.org/hadoop)
  • 25. Motivational Example [Figure: timeline, axis: Time (min); nodes: Node A (fast), Node B (slow), Node C (slowest); annotations: 2x slower, 3x slower, 1 task/min]
  • 26. The Native Strategy [Figure: timeline, axis: Time (min); nodes: Node A, Node B, Node C; annotations: 3 tasks, 2 tasks, 6 tasks; legend: Loading, Transferring, Processing]
  • 27. Our Solution: Reducing data transfer time [Figure: timeline, axis: Time (min); nodes: Node A’, Node B’, Node C’, Node A; annotations: 3 tasks, 2 tasks, 6 tasks; legend: Loading, Transferring, Processing]
  • 28. Preliminary Results Impact of data placement on performance of grep
  • 29.
  • 30.
  • 31. Steps to Measure Computing Ratios
        1. Run the application on each node with the same size of data, and collect each node's response time individually.
        2. Set the ratio of the node with the shortest response time to 1, and set the ratios of the other nodes accordingly.
        3. Calculate the least common multiple of these ratios.
        4. Determine each node's portion of the file fragments.

        Node   | Response time (s) | Ratio | # of File Fragments | Speed
        Node A | 10                | 1     | 6                   | Fastest
        Node B | 20                | 2     | 3                   | Average
        Node C | 30                | 3     | 2                   | Slowest
  • 32.
  • 33.
  • 34.
  • 35. Experimental Environment: Five nodes in a heterogeneous Hadoop cluster

        Node   | CPU Model        | CPU (Hz)  | L1 Cache (KB)
        Node A | Intel Core 2 Duo | 2*1G=2G   | 204
        Node B | Intel Celeron    | 2.8G      | 256
        Node C | Intel Pentium 3  | 1.2G      | 256
        Node D | Intel Pentium 3  | 1.2G      | 256
        Node E | Intel Pentium 3  | 1.2G      | 256
  • 36.
  • 37. Computing ratios for two applications: computing ratios of the five nodes with respect to the Grep and WordCount applications

        Computing Node | Ratios for Grep | Ratios for WordCount
        Node A         | 1               | 1
        Node B         | 2               | 2
        Node C         | 3.3             | 5
        Node D         | 3.3             | 5
        Node E         | 3.3             | 5
  • 38. Response time of Grep and WordCount on each node: application dependence, data size independence
  • 39. Six Data Placement Decisions
  • 40. Impact of data placement on performance of Grep
  • 41. Impact of data placement on performance of WordCount
  • 42.
  • 43.
  • 44.
  • 49. Download the presentation slides http://www.slideshare.net/xqin74 Google: slideshare Xiao Qin
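
The four steps on slide 31 can be sketched in a few lines of Python. This is only an illustration of the arithmetic behind the table on that slide, not the HDFS-HC implementation: the function names `computing_ratios` and `fragments_per_round` are hypothetical, and the sketch assumes the ratios come out as whole numbers, as they do in the slide's example.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def computing_ratios(response_times):
    """Step 2: the fastest node gets ratio 1, the others are scaled to it."""
    fastest = min(response_times.values())
    return {node: Fraction(t, fastest) for node, t in response_times.items()}


def fragments_per_round(ratios):
    """Steps 3 and 4: LCM of the ratios, then each node's share of fragments.

    Assumption: every ratio is a whole number (true for the slide's example).
    """
    if any(r.denominator != 1 for r in ratios.values()):
        raise ValueError("this sketch only handles whole-number ratios")
    common = reduce(lcm, (r.numerator for r in ratios.values()))
    return {node: common // r.numerator for node, r in ratios.items()}


# Step 1 (measured response times, in seconds) is taken from the slide's table.
response_times = {"Node A": 10, "Node B": 20, "Node C": 30}
ratios = computing_ratios(response_times)
fragments = fragments_per_round(ratios)
```

With the slide's measurements, the least common multiple of the ratios 1, 2 and 3 is 6, so the nodes receive 6, 3 and 2 fragments per round, matching the "# of File Fragments" column.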

Editor's Notes

  1. Cite here: What is Hadoop? http://www.cloudera.com/what-is-hadoop/ Hadoop is an open-source project administered by the Apache Software Foundation. Hadoop’s contributors work for some of the world’s biggest technology companies. That diverse, motivated community has produced a genuinely innovative platform for consolidating, combining and understanding data. Technically, Hadoop consists of two key services: reliable data storage using the Hadoop Distributed File System (HDFS) and high-performance parallel data processing using a technique called MapReduce. Hadoop runs on a collection of commodity, shared-nothing servers. You can add or remove servers in a Hadoop cluster at will; the system detects and compensates for hardware or system problems on any server. Hadoop, in other words, is self-healing. It can deliver data, and can run large-scale, high-performance processing jobs, in spite of system changes or failures.
  2. Slow down; add an animation of the speculative task helping. Note: Mention that we tested this on a heterogeneous Hadoop cluster.
  3. Copy page 7 to here. Note: Please add a legend to this diagram, e.g., black bars represent ******, red bars represent *****. Show data movement (migration).
  4. Real results for the aforementioned motivational example.
  5. Note: Explain the definition of computing ratio.
  6. Note: Name the over-utilized node list and the under-utilized node list on the diagram. a->A, b->B
  7. The heterogeneity measurement of a cluster depends on the data-intensive application. If multiple MapReduce applications must process the same input file, the data placement mechanism may need to distribute the input file’s fragments in several ways, one for each MapReduce application. Where multiple applications are similar in terms of data processing speed, one data placement decision may fit the needs of all the applications.
  8. Note: improve the resolution of the figures
  9. Title: Application dependence; independence of data size. Note 1: Improve the quality of the figures (larger figures and larger legends). Note 2:
  10. number
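
Note 7 and the ratio table on slide 37 suggest that the same input file may need a different fragment distribution for each application. The sketch below illustrates one way to turn non-integer computing ratios (such as 3.3 for Grep) into whole fragment counts. It is an assumption for illustration only: the slides do not say how HDFS-HC handles fractional ratios, and the function name `split_fragments`, the proportional-to-1/ratio rule, and the largest-remainder rounding are all hypothetical choices.

```python
from fractions import Fraction
from math import floor


def split_fragments(ratios, total_fragments):
    """Give each node a share of fragments proportional to 1 / computing ratio.

    Assumption: a node with ratio r processes data r times slower than the
    fastest node, so its fair share is proportional to 1 / r. Leftover
    fragments after rounding down go to the largest fractional remainders.
    """
    # Fraction(str(r)) keeps decimal ratios such as 3.3 exact (33/10).
    weights = {node: 1 / Fraction(str(r)) for node, r in ratios.items()}
    total_weight = sum(weights.values())
    exact = {n: total_fragments * w / total_weight for n, w in weights.items()}
    counts = {n: floor(x) for n, x in exact.items()}
    leftover = total_fragments - sum(counts.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - counts[n], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts


# Computing ratios taken from the table on slide 37.
grep_ratios = {"A": 1, "B": 2, "C": 3.3, "D": 3.3, "E": 3.3}
wordcount_ratios = {"A": 1, "B": 2, "C": 5, "D": 5, "E": 5}

# The fragment totals (53 and 21) are arbitrary values chosen for the example.
grep_plan = split_fragments(grep_ratios, 53)
wordcount_plan = split_fragments(wordcount_ratios, 21)
```

The two plans differ even though the cluster is the same, which is the application dependence the note describes: under this rule the slowest nodes hold a smaller share of the file for WordCount than for Grep.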