SlideShare uma empresa Scribd logo
1 de 21
Architecture and Performance of Runtime Environments for Data Intensive Scalable Computing SC09 Doctoral Symposium,  Portland, 11/18/2009 Student: Jaliya Ekanayake Advisor: Prof. Geoffrey Fox Community Grids Laboratory,  Digital Science Center Pervasive Technology Institute Indiana University
Cloud Runtimes for Data/Compute Intensive Applications Cloud Runtimes MapReduce  Dryad/DryadLINQ Sector/Sphere  Moving Computation to  Data Simple communication topologies MapReduce Directed Acyclic Graphs (DAG)s Distributed File Systems Fault Tolerance ,[object Object]
Represented as filter pipelines
Parallelizable filters,[object Object]
Applications using Hadoop and DryadLINQ (2) PhyloD [1]project from Microsoft Research Derive associations between HLA alleles and HIV codons and between codons themselves DryadLINQ  implementation [1] Microsoft Computational Biology Web Tools, http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/
Applications using Hadoop and DryadLINQ (3) 125 million distances 4 hours & 46 minutes Calculate  Pairwise Distances (Smith Waterman Gotoh) Calculate pairwise distances for a collection of genes (used for clustering, MDS) Fine grained tasks in MPI Coarse grained tasks in DryadLINQ Performed on 768 cores (Tempest Cluster)
Applications using Hadoop and DryadLINQ (4) ,[object Object]
K-Means Clustering
Matrix Multiplication
Multi-Dimensional Scaling (MDS),[object Object]
Applications & Different Interconnection Patterns Input map iterations Input Input map map Output Pij reduce reduce MPI Domain of MapReduce and Iterative Extensions
i-MapReduce ,[object Object]
Distinction on static data and variable data (data flow vs. δ flow)
Cacheable map/reduce tasks (long running tasks)
Combine operation
Support fast intermediate data transfersStatic data Configure() Iterate User Program δ flow Map(Key, Value)   Reduce (Key, List<Value>)  Close() Combine (Key, List<Value>) Different synchronization and intercommunication mechanisms used by the parallel runtimes
i-MapReduceProgramming Model runMapReduce()   Iterations Worker Nodes configureMaps() Local Disk configureReduce() Cacheable map/reduce tasks while(condition){ Can send <Key,Value> pairs directly Map() Reduce() Combine() operation Communications/data transfers via the pub-sub broker network updateCondition() Two configuration options : Using local disks (only for maps) Using pub-sub bus  } //end while close() User program’s process space
i-MapReduceArchitecture Pub/Sub Broker Network Map Worker M Worker Nodes Reduce Worker D MR Driver User Program D R M M M M MRDeamon D R R R R Data Read/Write File System Communication Data Split ,[object Object]
Eliminates file based communication
Cacheable map/reduce tasks
Static data remains in memory

Mais conteúdo relacionado

Mais procurados

Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationGeoffrey Fox
 
Plenzogan technology
Plenzogan technologyPlenzogan technology
Plenzogan technologyplenzogan
 
Efficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in CloudsEfficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in CloudsIJERA Editor
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsZvi Avraham
 
Energy-aware VM Allocation on An Opportunistic Cloud Infrastructure
Energy-aware VM Allocation on An Opportunistic Cloud InfrastructureEnergy-aware VM Allocation on An Opportunistic Cloud Infrastructure
Energy-aware VM Allocation on An Opportunistic Cloud InfrastructureMario Jose Villamizar Cano
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...EUDAT
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computingVajira Thambawita
 
IRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET Journal
 
Parallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPParallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPIJSRED
 
Image transmission in wireless sensor networks
Image transmission in wireless sensor networksImage transmission in wireless sensor networks
Image transmission in wireless sensor networkseSAT Publishing House
 
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...Papitha Velumani
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationDevansh16
 
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...CSCJournals
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Derryck Lamptey, MPhil, CISSP
 
Load Rebalancing for Distributed Hash Tables in Cloud Computing
Load Rebalancing for Distributed Hash Tables in Cloud ComputingLoad Rebalancing for Distributed Hash Tables in Cloud Computing
Load Rebalancing for Distributed Hash Tables in Cloud Computingiosrjce
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitJinwon Lee
 

Mais procurados (20)

Parallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel applicationParallel Computing 2007: Bring your own parallel application
Parallel Computing 2007: Bring your own parallel application
 
Plenzogan technology
Plenzogan technologyPlenzogan technology
Plenzogan technology
 
Efficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in CloudsEfficient load rebalancing for distributed file system in Clouds
Efficient load rebalancing for distributed file system in Clouds
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
Energy-aware VM Allocation on An Opportunistic Cloud Infrastructure
Energy-aware VM Allocation on An Opportunistic Cloud InfrastructureEnergy-aware VM Allocation on An Opportunistic Cloud Infrastructure
Energy-aware VM Allocation on An Opportunistic Cloud Infrastructure
 
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
High Performance & High Throughput Computing - EUDAT Summer School (Giuseppe ...
 
Lecture 1 introduction to parallel and distributed computing
Lecture 1   introduction to parallel and distributed computingLecture 1   introduction to parallel and distributed computing
Lecture 1 introduction to parallel and distributed computing
 
IRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CLIRJET- Latin Square Computation of Order-3 using Open CL
IRJET- Latin Square Computation of Order-3 using Open CL
 
Chap3 slides
Chap3 slidesChap3 slides
Chap3 slides
 
Parallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MPParallelization of Graceful Labeling Using Open MP
Parallelization of Graceful Labeling Using Open MP
 
Image transmission in wireless sensor networks
Image transmission in wireless sensor networksImage transmission in wireless sensor networks
Image transmission in wireless sensor networks
 
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
QoS-Aware Data Replication for Data-Intensive Applications in Cloud Computing...
 
FrackingPaper
FrackingPaperFrackingPaper
FrackingPaper
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localization
 
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
Run-Time Adaptive Processor Allocation of Self-Configurable Intel IXP2400 Net...
 
Lecture 05 - Chapter 3 - Models of parallel computers and interconnections
Lecture 05 - Chapter 3 - Models of parallel computers and  interconnectionsLecture 05 - Chapter 3 - Models of parallel computers and  interconnections
Lecture 05 - Chapter 3 - Models of parallel computers and interconnections
 
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
Achieving Portability and Efficiency in a HPC Code Using Standard Message-pas...
 
Load Rebalancing for Distributed Hash Tables in Cloud Computing
Load Rebalancing for Distributed Hash Tables in Cloud ComputingLoad Rebalancing for Distributed Hash Tables in Cloud Computing
Load Rebalancing for Distributed Hash Tables in Cloud Computing
 
Lecture 04 chapter 2 - Parallel Programming Platforms
Lecture 04  chapter 2 - Parallel Programming PlatformsLecture 04  chapter 2 - Parallel Programming Platforms
Lecture 04 chapter 2 - Parallel Programming Platforms
 
In datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unitIn datacenter performance analysis of a tensor processing unit
In datacenter performance analysis of a tensor processing unit
 

Semelhante a Architecture and Performance of Runtime Environments for Data Intensive Scalable Computing

Qiu bosc2010
Qiu bosc2010Qiu bosc2010
Qiu bosc2010BOSC 2010
 
Slide 1
Slide 1Slide 1
Slide 1butest
 
Slide 1
Slide 1Slide 1
Slide 1butest
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersIan Foster
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Yahoo Developer Network
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009Ian Foster
 
Dissertation defense
Dissertation defenseDissertation defense
Dissertation defensemarek_pomocka
 
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...LEGATO project
 
2023comp90024_Spartan.pdf
2023comp90024_Spartan.pdf2023comp90024_Spartan.pdf
2023comp90024_Spartan.pdfLevLafayette1
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningRafael Ferreira da Silva
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009Ian Foster
 
grid mining
grid mininggrid mining
grid miningARNOLD
 
Enhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniqueEnhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniquejournalBEEI
 
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)Thilina Gunarathne
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22marpierc
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
 
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Robert Grossman
 
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONMAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONijcsit
 

Semelhante a Architecture and Performance of Runtime Environments for Data Intensive Scalable Computing (20)

Qiu bosc2010
Qiu bosc2010Qiu bosc2010
Qiu bosc2010
 
Slide 1
Slide 1Slide 1
Slide 1
 
Slide 1
Slide 1Slide 1
Slide 1
 
Many Task Applications for Grids and Supercomputers
Many Task Applications for Grids and SupercomputersMany Task Applications for Grids and Supercomputers
Many Task Applications for Grids and Supercomputers
 
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
Apache Hadoop India Summit 2011 Keynote talk "Programming Abstractions for Sm...
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
Dissertation defense
Dissertation defenseDissertation defense
Dissertation defense
 
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
PADAL19: Runtime-Assisted Locality Abstraction Using Elastic Places and Virtu...
 
3rd presentation
3rd presentation3rd presentation
3rd presentation
 
2023comp90024_Spartan.pdf
2023comp90024_Spartan.pdf2023comp90024_Spartan.pdf
2023comp90024_Spartan.pdf
 
The Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource ProvisioningThe Interplay of Workflow Execution and Resource Provisioning
The Interplay of Workflow Execution and Resource Provisioning
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 
grid mining
grid mininggrid mining
grid mining
 
Enhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniqueEnhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce Technique
 
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)
Map Reduce in the Clouds (http://salsahpc.indiana.edu/mapreduceroles4azure/)
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
 
TransPAC3/ACE Measurement & PerfSONAR Update
TransPAC3/ACE Measurement & PerfSONAR UpdateTransPAC3/ACE Measurement & PerfSONAR Update
TransPAC3/ACE Measurement & PerfSONAR Update
 
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
Lessons Learned from a Year's Worth of Benchmarking Large Data Clouds (Robert...
 
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISONMAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
MAP-REDUCE IMPLEMENTATIONS: SURVEY AND PERFORMANCE COMPARISON
 

Último

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Último (20)

Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 

Architecture and Performance of Runtime Environments for Data Intensive Scalable Computing

  • 1. Architecture and Performance of Runtime Environments for Data Intensive Scalable Computing SC09 Doctoral Symposium, Portland, 11/18/2009 Student: Jaliya Ekanayake Advisor: Prof. Geoffrey Fox Community Grids Laboratory, Digital Science Center Pervasive Technology Institute Indiana University
  • 2.
  • 4.
  • 5. Applications using Hadoop and DryadLINQ (2) PhyloD [1]project from Microsoft Research Derive associations between HLA alleles and HIV codons and between codons themselves DryadLINQ implementation [1] Microsoft Computational Biology Web Tools, http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/
  • 6. Applications using Hadoop and DryadLINQ (3) 125 million distances 4 hours & 46 minutes Calculate Pairwise Distances (Smith Waterman Gotoh) Calculate pairwise distances for a collection of genes (used for clustering, MDS) Fine grained tasks in MPI Coarse grained tasks in DryadLINQ Performed on 768 cores (Tempest Cluster)
  • 7.
  • 10.
  • 11. Applications & Different Interconnection Patterns Input map iterations Input Input map map Output Pij reduce reduce MPI Domain of MapReduce and Iterative Extensions
  • 12.
  • 13. Distinction on static data and variable data (data flow vs. δ flow)
  • 14. Cacheable map/reduce tasks (long running tasks)
  • 16. Support fast intermediate data transfersStatic data Configure() Iterate User Program δ flow Map(Key, Value) Reduce (Key, List<Value>) Close() Combine (Key, List<Value>) Different synchronization and intercommunication mechanisms used by the parallel runtimes
  • 17. i-MapReduceProgramming Model runMapReduce() Iterations Worker Nodes configureMaps() Local Disk configureReduce() Cacheable map/reduce tasks while(condition){ Can send <Key,Value> pairs directly Map() Reduce() Combine() operation Communications/data transfers via the pub-sub broker network updateCondition() Two configuration options : Using local disks (only for maps) Using pub-sub bus } //end while close() User program’s process space
  • 18.
  • 19. Eliminates file based communication
  • 22. User Program is the composer of MapReduce computations
  • 23. Extends the MapReduce model to iterative computations
  • 25. Assume that static data fits in to distributed memory12/6/2009 Jaliya Ekanayake 11
  • 26. Applications – Pleasingly Parallel CAP3- Expressed Sequence Tagging Input files (FASTA) CAP3 CAP3 High Energy Physics (HEP) Data Analysis Output files
  • 27. Applications - Iterative Performance of K-Means Clustering Parallel Overhead of Matrix multiplication
  • 28. Current Research Virtualization Overhead Applications more susceptible to latencies (higher communication/computation ratio) => higher overheads under virtualization Hadoop shows 15% performance degradation on a private cloud Latency effect on i-MapReduceis lower compared to MPI due to the coarse grained tasks? Fault Tolerance for i-MapReduce Replicated data Saving state after n iterations
  • 29. Related Work General MapReduce References: Google MapReduce Apache Hadoop Microsoft DryadLINQ Pregel : Large-scale graph computing at Google Sector/Sphere All-Pairs SAGA: MapReduce Disco
  • 30. Contributions Programming model for iterative MapReduce computations i-MapReduceimplementation MapReduce algorithms/implementations for a series of scientific applications Applicability of cloud runtimes to different classes of data/compute intensive applications Comparison of cloud runtimes with MPI Virtualization overhead of HPC Applications and Cloud Runtimes
  • 31. Publications Jaliya Ekanayake, (Advisor: Geoffrey Fox) Architecture and Performance of Runtime Environments for Data Intensive Scalable Computing, Accepted for the Doctoral Showcase, SuperComputing2009. Xiaohong Qiu, Jaliya Ekanayake, Scott Beason, Thilina Gunarathne, Geoffrey Fox, Roger Barga, Dennis Gannon, Cloud Technologies for Bioinformatics Applications, Accepted for publication in 2nd ACM Workshop on Many-Task Computing on Grids and Supercomputers, SuperComputing2009. Jaliya Ekanayake, Atilla Soner Balkir, Thilina Gunarathne, Geoffrey Fox, Christophe Poulain, Nelson Araujo, Roger Barga, DryadLINQ for Scientific Analyses, Accepted for publication in Fifth IEEE International Conference on e-Science (eScience2009), Oxford, UK. Jaliya Ekanayake and Geoffrey Fox, High Performance Parallel Computing with Clouds and Cloud Technologies, First International Conference on Cloud Computing (CloudComp2009), Munich, Germany. – An extended version of this paper goes to a book chapter. Geoffrey Fox, Seung-Hee Bae, Jaliya Ekanayake, Xiaohong Qiu, and Huapeng Yuan, Parallel Data Mining from Multicore to Cloudy Grids, High Performance Computing and Grids workshop, 2008. – An extended version of this paper goes to a book chapter. Jaliya Ekanayake, Shrideep Pallickara, Geoffrey Fox, MapReduce for Data Intensive Scientific Analyses, Fourth IEEE International Conference on eScience, 2008, pp.277-284. Jaliya Ekanayake, Shrideep Pallickara, and Geoffrey Fox, A collaborative framework for scientific data analysis and visualization, Collaborative Technologies and Systems(CTS08), 2008, pp. 339-346. Shrideep Pallickara, Jaliya Ekanayake and Geoffrey Fox, A Scalable Approach for the Secure and Authorized Tracking of the Availability of Entities in Distributed Systems, 21st IEEE International Parallel & Distributed Processing Symposium (IPDPS 2007).
  • 32. Acknowledgements My Ph.D. Committee: Prof. Geoffrey Fox Prof. Andrew Lumsdaine Prof. Dennis Gannon Prof. David Leake SALSA Team @ IU Especially: Judy Qiu, Scott Beason, Thilina Gunarathne, Hui Li Microsoft Research Roger Barge Christophe Poulain
  • 34. Parallel Runtimes – DryadLINQ vs. Hadoop
  • 35. Cluster Configurations DryadLINQ Hadoop / MPI DryadLINQ / MPI

Notas do Editor

  1. Currently uses NaradaBrokering, but it is easily extensible to use any other pub/sub message infrastructure such as Apache ActiveMQ.