Apache HAMA: An Introduction to
Bulk Synchronization Parallel on Hadoop           (Ver 0.1)



         Hyunsik Choi <hyunsik.choi@gmail.com>,
        Edward J. Yoon <edwardyoon@apache.org>,


            Apache Hama Team
Table Of Contents

•   An Overview of the BSP Model
•   Why the BSP on Hadoop?
•   A Comparison to M/R Programming Model
•   What Applications are Appropriate for BSP?
•   The BSP Implementation on Hadoop
•   Some Examples on BSP
•   What's Next?
An Overview of the BSP Model

BSP (Bulk Synchronous Parallel) is a parallel
computational model in which a program runs as a series of supersteps.

A superstep consists of three ordered stages:
    • Concurrent Computation: Computation on locally stored data
    • Communication: Send and receive messages in a point-to-point manner
    • Barrier Synchronization: Wait for and synchronize all processors at the end of
      the superstep

A BSP system is composed of a number of networked computers with
both local memory and disk.

More detailed information
    • http://en.wikipedia.org/wiki/Bulk_synchronous_parallel
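
The three stages of a superstep can be sketched on a single machine. The example below is only an illustration, not Hama code: threads stand in for peers, in-memory queues stand in for point-to-point messages, and `java.util.concurrent.CyclicBarrier` stands in for the barrier. The class name `SuperstepSketch` and the method `run` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CyclicBarrier;

public class SuperstepSketch {
    /** Runs one superstep on every peer; returns the sum each peer reads from its inbox. */
    public static int[] run(int[] localValues) throws Exception {
        final int n = localValues.length;
        // One inbox per peer; an assumption standing in for network messaging.
        List<ConcurrentLinkedQueue<Integer>> inboxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            inboxes.add(new ConcurrentLinkedQueue<>());
        }
        CyclicBarrier barrier = new CyclicBarrier(n);
        int[] result = new int[n];
        Thread[] peers = new Thread[n];
        for (int p = 0; p < n; p++) {
            final int id = p;
            peers[p] = new Thread(() -> {
                try {
                    // 1. Concurrent computation on locally stored data.
                    int squared = localValues[id] * localValues[id];
                    // 2. Communication: send the local result to every peer's inbox.
                    for (int dest = 0; dest < n; dest++) {
                        inboxes.get(dest).add(squared);
                    }
                    // 3. Barrier synchronization: wait until all peers have sent.
                    barrier.await();
                    // After the barrier, every message of this superstep is visible.
                    int sum = 0;
                    for (int message : inboxes.get(id)) {
                        sum += message;
                    }
                    result[id] = sum;
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            peers[p].start();
        }
        for (Thread t : peers) {
            t.join();
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(run(new int[] {1, 2, 3}))); // prints [14, 14, 14]
    }
}
```

Because no peer reads its inbox before the barrier, every peer sees all three squares (1 + 4 + 9) regardless of thread scheduling.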
A Vertical Structure of Superstep
Why the BSP on Hadoop?

We want to use a Hadoop cluster for more complicated
computations.
 • However, complex mathematical and graph operations are not easy
   to program in MapReduce.
   o   e.g., matrix multiplication, PageRank, breadth-first search, etc.
 • Preserving data locality is an important factor in
   distributed/parallel environments.
   o   Data locality is also important in MapReduce for efficient processing.
 • However, MapReduce frameworks do not preserve data
   locality across consecutive operations because of their inherent design.
A Comparison to M/R Programming model

Map/Reduce performs filtering and/or transformations while
each mapper reads the input data split by InputFormat.
 • Simple API
   o   Map and Reduce
 • Automatic parallelization and distribution
 • To derive final results, input data generally has to go
   through many MapReduce passes or MapReduce iterations.
 • Partitioner is too simple
   o MR does not provide a way to preserve data locality, either in the transition
     from Map to Reduce or between MapReduce iterations.
   o This incurs not only intensive communication cost but also unnecessary
     processing cost.
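
One way to read the partitioner point is sketched below. `hashPartition` uses the same arithmetic as Hadoop's default `HashPartitioner` (hash code masked to non-negative, modulo the number of partitions). `rangePartition` is a hypothetical locality-aware alternative added only for comparison; it is not part of Hadoop or Hama, and the class name `PartitionSketch` is likewise an assumption.

```java
public class PartitionSketch {
    /** Same arithmetic as Hadoop's default HashPartitioner. */
    public static int hashPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Hypothetical alternative: contiguous ID ranges stay in the same partition. */
    public static int rangePartition(int vertexId, int numVertices, int numPartitions) {
        return (int) ((long) vertexId * numPartitions / numVertices);
    }

    public static void main(String[] args) {
        int numVertices = 8;
        int numPartitions = 4;
        for (int id = 0; id < numVertices; id++) {
            System.out.println("vertex " + id
                + " hash=" + hashPartition(id, numPartitions)
                + " range=" + rangePartition(id, numVertices, numPartitions));
        }
    }
}
```

With 8 vertices and 4 partitions, the neighbouring IDs 4 and 5 hash to partitions 0 and 1 but fall into the same range partition 2. If neighbouring IDs tend to be related, hashing scatters related records across the cluster on every pass.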
A Comparison to M/R Programming model

BSP is tailored towards processing data with locality.
 • BSP enables each peer to communicate only the necessary
   data to other peers while preserving data locality.
 • Simple API and easy synchronization
 • Like MapReduce, it allows programs to be automatically
   parallelized and distributed across cluster nodes.
What Applications are Appropriate for BSP?

• BSP shows its strengths in applications with the
  following characteristics:
   o Data locality is important
   o Data with complicated relations is processed
   o Many iterations and recursions are used
The BSP Implementation on Hadoop

• Written in Java
• The BSP package is now available in the Hama repository.
  o   The implementation is available for Hadoop versions greater than 0.20.x
  o   Allows new applications to be developed
• Hadoop RPC is used for BSP peers to communicate with each
  other.
• Barrier synchronization is supported by ZooKeeper.
Serialized Printing of Hello World (shows synchronization)

// BSP starts with 10 threads.
// Each thread has a shuffled ID number from 0 to 9.

for (int i = 0; i < numberOfPeer; i++) {
    // Only the peer whose ID matches the current superstep prints.
    if (myId == i) {
        System.out.println("Hello BSP world from " + i + " of " + numberOfPeer);
    }

    // Every peer waits here, so the greetings appear in ID order.
    peer.sync();
}

// BSP end
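
The fragment above depends on the Hama peer object. A self-contained approximation, assuming threads in place of BSP peers and `CyclicBarrier.await()` in place of `peer.sync()`, might look like the following. The class `HelloBspSketch` and its `run` method are hypothetical names, and the lines are collected into a list so that the ordering can be checked.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CyclicBarrier;

public class HelloBspSketch {
    /** Returns the greetings in the order in which the peers emitted them. */
    public static List<String> run(int numberOfPeer) throws InterruptedException {
        List<String> output = Collections.synchronizedList(new ArrayList<>());
        CyclicBarrier barrier = new CyclicBarrier(numberOfPeer);
        List<Thread> peers = new ArrayList<>();
        for (int p = 0; p < numberOfPeer; p++) {
            final int myId = p;
            Thread t = new Thread(() -> {
                try {
                    for (int i = 0; i < numberOfPeer; i++) {
                        // Only the peer whose ID matches the superstep index speaks.
                        if (myId == i) {
                            output.add("Hello BSP world from " + i + " of " + numberOfPeer);
                        }
                        // Stand-in for peer.sync(): nobody proceeds until all arrive.
                        barrier.await();
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            peers.add(t);
            t.start();
        }
        for (Thread t : peers) {
            t.join();
        }
        return output;
    }

    public static void main(String[] args) throws InterruptedException {
        for (String line : run(10)) {
            System.out.println(line);
        }
    }
}
```

Peer i writes before the barrier of superstep i and peer i + 1 writes only after it, so the greetings come out in ID order no matter how the threads are scheduled.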
Breadth-first Search on Graph

public void expand(Map<Vertex, Integer> input, Map<Vertex, Integer> nextQueue) {
    for (Map.Entry<Vertex, Integer> e : input.entrySet()) {
        Vertex vertex = e.getKey();
        if (vertex.isVisited()) { // skip vertices that have already been visited.
            continue;
        }
        vertex.visit();

        // Put vertices adjacent to the current vertex into nextQueue with an incremented distance.
        for (Integer i : vertex.getAdjacent()) {
            if (needToVisit(i, vertex, e.getValue())) {
                nextQueue.put(getVertexByKey(i), e.getValue() + 1);
            }
        }
    }
}
Breadth-first Search on Graph

public void process() {
    currentQueue = new HashMap<Vertex, Integer>();
    nextQueue = new HashMap<Vertex, Integer>(); // same type that expand() expects.
    currentQueue.put(startVertex, 0); // initially put the start vertex into the current queue.

    while (true) {
        expand(currentQueue, nextQueue);

        // The peer.sync method can determine whether a given vertex resides
        // on the local disk according to its vertex ID.
        peer.sync(nextQueue); // Synchronize nextQueue with other peers.
                              // At this step, the peer communicates the vertex IDs of
                              // vertices that reside on remote peers.
        if (peer.state == TERMINATE) {
            break;
        }
        // Convert nextQueue, whose entries are identified by vertex ID, into a
        // currentQueue that contains vertices.
        currentQueue = convertQueue(nextQueue);
        nextQueue = new HashMap<Vertex, Integer>();
    }
}
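
The two-queue pattern above can be tried without a cluster. The sketch below is a single-process, level-synchronous BFS in which each pass of the outer loop plays the role of one superstep. It is not the Hama implementation: the class `LevelBfsSketch`, the method `distances`, and the adjacency-map representation are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LevelBfsSketch {
    /** Returns the hop distance from start to every reachable vertex ID. */
    public static Map<Integer, Integer> distances(Map<Integer, List<Integer>> adjacency, int start) {
        Map<Integer, Integer> distance = new HashMap<>();
        List<Integer> currentQueue = new ArrayList<>();
        currentQueue.add(start);
        distance.put(start, 0);
        int level = 0;
        while (!currentQueue.isEmpty()) {
            List<Integer> nextQueue = new ArrayList<>();
            // Expand: visit the neighbours of every vertex in the current frontier.
            for (int v : currentQueue) {
                for (int w : adjacency.getOrDefault(v, Collections.emptyList())) {
                    if (!distance.containsKey(w)) {
                        distance.put(w, level + 1);
                        nextQueue.add(w);
                    }
                }
            }
            // Superstep boundary: in BSP, this is where peers would exchange
            // remote vertex IDs and wait at the barrier.
            currentQueue = nextQueue;
            level++;
        }
        return distance;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(0, Arrays.asList(1, 2));
        graph.put(1, Arrays.asList(3));
        graph.put(2, Arrays.asList(3));
        graph.put(3, Collections.emptyList());
        System.out.println(distances(graph, 0));
    }
}
```

The loop ends when a level produces an empty next queue, which corresponds to the TERMINATE state checked after `peer.sync` in the fragment above.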
What's Next?

• Novel matrix computation algorithms using BSP
• Angrapa
  o   A graph computation framework based on BSP
  o   An API built around familiar graph features
• Improved fault tolerance for BSP
  o   Currently, BSP has poor fault tolerance mechanisms.
• Support for collective and compressed communication
  mechanisms for high performance computing
Who We Are
• Apache Hama Project Team
  o (http://incubator.apache.org/hama)
• Mailing List
  o   hama-user@incubator.apache.org
  o   hama-dev@incubator.apache.org
