• Automatic parallelization & distribution
• Fault-tolerant
• Provides status and monitoring tools
• Clean abstraction for programmers

• Google
     • MapReduce
     • PageRank, crawler, Google Maps
• Hadoop
     • Map function, reduce function
• Qizmt
     • C#: Map function, reduce function
• etc
     • C++, C#, Java, Haskell


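map f lst: ('a -> 'b) -> ('a list) -> ('b list)
     Applies f to each element of lst. In MapReduce the results are <key, value> pairs.

reduce (= fold, accumulate, compress, inject)

fold f x0 lst: ('a * 'b -> 'b) -> 'b -> ('a list) -> 'b
     Folds lst into an accumulator, starting from x0. In MapReduce, reduce combines the values that share a key.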
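The two signatures above can be illustrated with a minimal C++ sketch. The names `MapList` and `Fold` are hypothetical helpers written for this illustration, not part of any MapReduce library; they only mirror the functional `map` and `fold` on a `std::vector`.

```cpp
#include <string>
#include <vector>

// map f lst : ('a -> 'b) -> 'a list -> 'b list
// Applies f to every element and returns the results in the same order.
// MapList is a hypothetical helper name used only for this sketch.
template <typename B, typename A, typename F>
std::vector<B> MapList(F f, const std::vector<A>& lst) {
  std::vector<B> out;
  out.reserve(lst.size());
  for (const A& x : lst) out.push_back(f(x));
  return out;
}

// fold f x0 lst : ('a * 'b -> 'b) -> 'b -> 'a list -> 'b
// Threads an accumulator through the list, starting from x0.
// Fold is a hypothetical helper name used only for this sketch.
template <typename A, typename B, typename F>
B Fold(F f, B x0, const std::vector<A>& lst) {
  B acc = x0;
  for (const A& x : lst) acc = f(x, acc);
  return acc;
}
```

For example, `MapList<int>` with a lambda returning each word's length turns a list of words into a list of lengths, and `Fold` with an addition lambda and `x0 = 0` sums them.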
MAPREDUCE

• Input format
• Input location
• Map function
• Reduce function
• Output format
• Output location

[Figure 2-1. Parts of a MapReduce job (page 28, Chapter 2, "The Basics of a MapReduce Job")
 Provided by User: Job Configuration, Input Format, Input Locations, Map Function, Number of Reduce Tasks, Reduce Function, Output Key Type, Output Value Type, Output Format, Output Location
 Provided by Hadoop Framework: Input Splitting & [...], Start of Individual Map Tasks, Shuffle, Partition/Sort per Map Output, Merge Sort for Map Outputs for Each Reduce Task, Start of Individual Reduce Tasks, Collection of Final Output]

The user is responsible for handling the job setup, specifying the input [...]
[Figure: logical flow and physical flow of a job. Stages: Input, Map, Shuffle, Reduce, Output. map() emits (key,val) pairs; reduce() is called per key.]

PROGRAM                  Map function               Reduce function
Distributed Grep         matched lines              pass
Reverse Web link graph   <target, source>           <target, list(src)>
URL access count         <URL, 1>                   <URL, total count>
Term-Vector per Host     <hostname, term-vector>    <hostname, all-term-vector>
Inverted Index           <word, doc id>             <word, list(doc id)>
Distributed Sort         <key, value>               pass
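As an illustration of the Inverted Index row, the following is a minimal single-process C++ sketch. It is not Hadoop or Google MapReduce code: `MapDocument` and `BuildInvertedIndex` are hypothetical names, documents are plain strings split on whitespace, and the shuffle is simulated with an in-memory `std::map`.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using DocId = int;

// Map: emit one <word, doc id> pair for every word in a document.
std::vector<std::pair<std::string, DocId>> MapDocument(DocId id,
                                                       const std::string& text) {
  std::vector<std::pair<std::string, DocId>> out;
  std::istringstream in(text);
  std::string word;
  while (in >> word) out.emplace_back(word, id);
  return out;
}

// Shuffle + reduce: group the pairs by word, then produce
// <word, list(doc id)> with each doc id listed once, in ascending order.
std::map<std::string, std::vector<DocId>> BuildInvertedIndex(
    const std::vector<std::pair<DocId, std::string>>& docs) {
  std::map<std::string, std::set<DocId>> grouped;
  for (const auto& doc : docs)
    for (const auto& kv : MapDocument(doc.first, doc.second))
      grouped[kv.first].insert(kv.second);

  std::map<std::string, std::vector<DocId>> index;
  for (const auto& entry : grouped)
    index[entry.first].assign(entry.second.begin(), entry.second.end());
  return index;
}
```

A real job would run `MapDocument` on many machines and let the framework do the grouping; the sketch only shows the shape of the key/value pairs on each side.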
[...] - HADOOP

• Master
      • Name node
      • Job tracker
• Slave (= Worker)
      • Data node
      • Task tracker

[Book excerpt, cropped on the slide: "The Basics of Multimachine Clusters"]

Enable Job Control Options on the Web Interfaces
Both the JobTracker and the NameNode provide a web interface for monitoring and control. By default, the JobTracker provides web service on [...] the NameNode provides web service on [...]. If the [...] parameter is set to [...], the JobTracker web interface will add [...] and Change Job Priority options to the per-job detail page. The default location [...]tional options is the bottom-left corner of the page (so you usually need to scr[...] page to see them).

A Sample Cluster Configuration
In this section, we will walk through a simple configuration of a six-node Had[...] cluster will be composed of six machines: [...]. The JobTracker and NameNode will reside on the machine [...] NameNode will be placed on [...]. The DataNodes and TaskTrackers will b[...] the same machines, and the nodes will be named [...] through [...]. Fi[...] this setup.

[Figure 3-2. A simple six-node cluster. Master: NameNode (http://master:50070/) and JobTracker (http://master:50030/). Slave01 onward: Datanode.]
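MAP REDUCE [...] - GOOGLE
1. The input files are split into pieces of 16MB ~ 64MB.
2. One copy of the program is the Master and the rest are Workers. The Master hands out the work (map tasks, reduce tasks): the master picks idle workers and assigns each one a task.
3. A worker that is assigned a Map task reads its input split and passes it to the map function, which produces intermediate key/value pairs.
4. The buffered pairs are written to local disk periodically, partitioned for the Reduce tasks. The locations of the pairs are reported to the master. The master forwards them from the map workers to the reduce workers.
5. When a reduce worker is told these locations by the master, it uses RPC to read the map workers' buffered data (intermediate key/value pairs). It then sorts the data by intermediate key, using an external sort if the data is too large for memory.
6. The reduce worker iterates over the sorted data and passes each key with its values to the reduce function. The reduce output is appended to a final output file.
7. When all map tasks and reduce tasks are finished, the master wakes up the user program, and the MapReduce call returns.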
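The hand-off between map workers and reduce workers in steps 4 and 5 can be sketched in C++ as follows. This is an in-memory illustration under stated assumptions, not the Google implementation: `PartitionFor`, `Partition`, and `SortByKey` are hypothetical names, the partitioning rule is the common hash(key) mod R scheme, and `std::stable_sort` stands in for the external sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// Map side: pick the reduce task for a key as hash(key) mod R.
// Equal keys always land in the same region.
std::size_t PartitionFor(const std::string& key, std::size_t num_reduce_tasks) {
  return std::hash<std::string>{}(key) % num_reduce_tasks;
}

// Map side: split the buffered intermediate pairs into R regions,
// one region per reduce task.
std::vector<std::vector<KeyValue>> Partition(const std::vector<KeyValue>& pairs,
                                             std::size_t num_reduce_tasks) {
  std::vector<std::vector<KeyValue>> regions(num_reduce_tasks);
  for (const KeyValue& kv : pairs)
    regions[PartitionFor(kv.first, num_reduce_tasks)].push_back(kv);
  return regions;
}

// Reduce side: sort one region by key so that all values of a key
// are adjacent before reduce is called.
void SortByKey(std::vector<KeyValue>& region) {
  std::stable_sort(region.begin(), region.end(),
                   [](const KeyValue& a, const KeyValue& b) {
                     return a.first < b.first;
                   });
}
```

Because the region is a pure function of the key, every occurrence of a key reaches the same reduce worker no matter which map worker emitted it.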

• [...] (DFS)
     • Google MapReduce - Bigtable
     • Hadoop - HBase
     • Hypertable (commercial)
Google MapReduce example - Word count

A Word Frequency (paper appendix, to appear in OSDI 2004, p. 13)

This section contains a program that counts the number of occurrences of each unique word in a set of input files specified on the command line.

    #include "mapreduce/mapreduce.h"

    // User's map function
    class WordCounter : public Mapper {
     public:
      virtual void Map(const MapInput& input) {
        const string& text = input.value();
        const int n = text.size();
        for (int i = 0; i < n; ) {
          // Skip past leading whitespace
          while ((i < n) && isspace(text[i]))
            i++;

          // Find word end
          int start = i;
          while ((i < n) && !isspace(text[i]))
            i++;

          if (start < i)
            Emit(text.substr(start,i-start),"1");
        }
      }
    };
    REGISTER_MAPPER(WordCounter);

    // User's reduce function
    class Adder : public Reducer {
      virtual void Reduce(ReduceInput* input) {
        // Iterate over all entries with the
        // same key and add the values
        int64 value = 0;
        while (!input->done()) {
          value += StringToInt(input->value());
          input->NextValue();
        }

        // Emit sum for input->key()
        Emit(IntToString(value));
      }
    };
    REGISTER_REDUCER(Adder);

    int main(int argc, char** argv) {
      ParseCommandLineFlags(argc, argv);

      MapReduceSpecification spec;

      // Store list of input files into "spec"
      for (int i = 1; i < argc; i++) {
        MapReduceInput* input = spec.add_input();
        input->set_format("text");
        input->set_filepattern(argv[i]);
        input->set_mapper_class("WordCounter");
      }

      // Specify the output files:
      //     /gfs/test/freq-00000-of-00100
      //     /gfs/test/freq-00001-of-00100
      //     ...
      MapReduceOutput* out = spec.output();
      out->set_filebase("/gfs/test/freq");
      out->set_num_tasks(100);
      out->set_format("text");
      out->set_reducer_class("Adder");

      // Optional: do partial sums within map
      // tasks to save network bandwidth
      out->set_combiner_class("Adder");

      // Tuning parameters: use at most 2000
      // machines and 100 MB of memory per task
      spec.set_machines(2000);
      spec.set_map_megabytes(100);
      spec.set_reduce_megabytes(100);

      // Now run it
      MapReduceResult result;
      if (!MapReduce(spec, &result)) abort();

      // Done: 'result' structure contains info
      // about counters, time taken, number of
      // machines used, etc.

      return 0;
    }
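The "partial sums within map tasks" comment in the word-count program refers to a combiner. The sketch below shows the idea in plain, single-process C++; it does not use the `mapreduce/mapreduce.h` API from the example. `MapWithCombiner` and `ReduceCounts` are hypothetical names, and each input string plays the role of one map task's split.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Counts = std::map<std::string, long long>;

// One map task with a combiner: instead of emitting <word, "1"> for
// every occurrence, add up the counts locally and emit a single
// <word, partial sum> per distinct word in this split.
Counts MapWithCombiner(const std::string& split) {
  Counts partial;
  std::istringstream in(split);
  std::string word;
  while (in >> word) ++partial[word];
  return partial;
}

// Reduce: add the partial sums that arrive from all map tasks.
Counts ReduceCounts(const std::vector<Counts>& partials) {
  Counts total;
  for (const Counts& partial : partials)
    for (const auto& entry : partial)
      total[entry.first] += entry.second;
  return total;
}
```

With the split "the cat the dog", the map task sends three pairs instead of four, and the saving grows with how often words repeat within a split. The same function can serve as combiner and reducer here because addition is associative and commutative.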
Qizmt - MapReduce framework on Windows
  • C# [...] mapreducer job
  • Built-in IDE/Debugger
  • [...] mapreducer job [...]
  • Delta-only exchange option for Mapreduce jobs
  • [...]
  • Easily add machines to a cluster to increase processing power and capacity
  • CAC (Cluster Assembly Cache) for exposing .Net DLLs to mapreduce jobs
  • [...] Job
       ◦ Mapreduce - [...]
       ◦ Remote - [...]
       ◦ Local - For orchestrating a pipeline of Mapreducer and Remote jobs
       ◦ Sorted - Shuffle [...] Key [...]
       ◦ Grouped - [...]
       ◦ Hashsorted - [...] core [...] hashtable [...], Key [...]

[Figure: Qizmt job flow. Stages: Input, Map, Shuffle, Reduce, Output. map() emits (key,val) pairs; the shuffle is Sorted, Grouped, or Hashsorted; reduce() is called per key.]

• Hadoop
     • [...] C++ map, reduce [...]
     • But, cygwin [...]
• Qizmt
     • [...]
     • Master [...]
     • [...] IDE [...]
     • [...]
     • [...]
Map reduce

Mais de Hyosung Jeon

windows via c++ Ch 5. Job
windows via c++ Ch 5. Jobwindows via c++ Ch 5. Job
windows via c++ Ch 5. JobHyosung Jeon
9장 도메인 주도 설계
9장 도메인 주도 설계9장 도메인 주도 설계
9장 도메인 주도 설계Hyosung Jeon
Mongo db 복제(Replication)
Mongo db 복제(Replication)Mongo db 복제(Replication)
Mongo db 복제(Replication)Hyosung Jeon
xUnitTestPattern/chapter12Hyosung Jeon
목적이 부여된 에이전트 행동
목적이 부여된 에이전트 행동목적이 부여된 에이전트 행동
목적이 부여된 에이전트 행동Hyosung Jeon

Mais de Hyosung Jeon (7)

Nodejs express
Nodejs expressNodejs express
Nodejs express
windows via c++ Ch 5. Job
windows via c++ Ch 5. Jobwindows via c++ Ch 5. Job
windows via c++ Ch 5. Job
9장 도메인 주도 설계
9장 도메인 주도 설계9장 도메인 주도 설계
9장 도메인 주도 설계
Mongo db 복제(Replication)
Mongo db 복제(Replication)Mongo db 복제(Replication)
Mongo db 복제(Replication)
목적이 부여된 에이전트 행동
목적이 부여된 에이전트 행동목적이 부여된 에이전트 행동
목적이 부여된 에이전트 행동


Map reduce

  • 2.
  • 3.
      • Automatic parallelization & distribution
      • Fault-tolerant
      • Provides status and monitoring tools
      • Clean abstraction for programmers
  • 4. MAP REDUCE
      • Google
          • Map reduce
          • Page rank, crawler, Google Maps
      • Hadoop
          • Map function, reduce function
      • Qizmt
          • C#: Map function, reduce function
      • etc
          • C++, C#, Java, Haskell
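  • 5. MAP
      map f lst: ('a -> 'b) -> ('a list) -> ('b list)
      • Applies f to every element of lst and returns the list of results.
      • In MapReduce, the map function emits <key, value> pairs.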
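The signature on this slide can be sketched in C++ as follows. This is only an illustration: `map_list` and `tag_words` are hypothetical names that do not come from the deck or from any MapReduce library, and the sketch assumes a C++14 compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// map f lst : ('a -> 'b) -> ('a list) -> ('b list)
// Hypothetical helper: applies f to every element and collects the results.
template <typename A, typename F>
auto map_list(F f, const std::vector<A>& lst) {
    using B = typename std::decay<decltype(f(std::declval<const A&>()))>::type;
    std::vector<B> out;
    out.reserve(lst.size());
    std::transform(lst.begin(), lst.end(), std::back_inserter(out), f);
    assert(out.size() == lst.size());  // map never adds or drops elements
    return out;
}

// MapReduce flavour: turn each word into a <key, value> pair <word, 1>.
std::vector<std::pair<std::string, int>> tag_words(
        const std::vector<std::string>& words) {
    return map_list(
        [](const std::string& w) { return std::make_pair(w, 1); }, words);
}
```

Calling `tag_words` on the words "to", "be", "to" yields three pairs, one per input word, each with the value 1. Nothing is combined at this stage; that is left to reduce.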
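  • 6. REDUCE (= fold, accumulate, compress, inject)
      fold f x0 lst: ('a * 'b -> 'b) -> 'b -> ('a list) -> 'b
      • Applies f to each element of lst together with an accumulator that starts at x0.
      • In MapReduce, reduce combines the values that share the same key.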
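A minimal C++ sketch of the fold signature above, under the assumption that f takes the element first and the accumulator second, as in the slide's type. `fold` and `sum_counts` are hypothetical names chosen for this example.

```cpp
#include <string>
#include <utility>
#include <vector>

// fold f x0 lst : ('a * 'b -> 'b) -> 'b -> ('a list) -> 'b
// Hypothetical helper: threads an accumulator through the list, left to right.
template <typename A, typename B, typename F>
B fold(F f, B x0, const std::vector<A>& lst) {
    B acc = std::move(x0);
    for (const A& x : lst) {
        acc = f(x, std::move(acc));  // f(element, accumulator) -> new accumulator
    }
    return acc;
}

// MapReduce flavour: the reduce step of a word count adds up
// all the values that were emitted for one key.
int sum_counts(const std::vector<int>& counts) {
    return fold([](int value, int acc) { return acc + value; }, 0, counts);
}
```

With an empty list, `fold` simply returns `x0`, which is why the initial accumulator for a sum is 0.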
  • 7. MAPREDUCE job: what is needed?
      • Job configuration
      • Input format
      • Input location
      • Map function
      • Reduce function
      • Output format
      • Output location
      Figure 2-1. Parts of a MapReduce job (p. 28, Chapter 2, The Basics of a MapReduce Job)
        Provided by User: Job Configuration; Input Format; Input Locations; Map Function; Number of Reduce Tasks; Reduce Function; Output Key Type; Output Value Type; Output Format; Output Location
        Provided by Hadoop Framework: Input Splitting & Distribution; Start of Individual Map Tasks; Shuffle, Partition/Sort per Map Output; Merge Sort for Map Outputs for Each Reduce Task; Start of Individual Reduce Tasks; Collection of Final Output
      The user is responsible for handling the job setup, specifying the input ...
  • 8. MAP REDUCE: Logical Flow
      Input → Map → Shuffle → Reduce → Output
      1. The output of map() is grouped by key and passed to Reduce.
      2. map() and reduce() both work on (key, val) pairs.
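The logical flow can be imitated in a single process. The sketch below is an assumption-laden toy, not how Hadoop or Google's system is implemented: all function names are hypothetical, everything lives in memory, and word count is used as the job.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, int>;

// Map: one input record (a line of text) -> list of (word, 1) pairs.
std::vector<KeyValue> map_record(const std::string& line) {
    std::vector<KeyValue> out;
    std::istringstream words(line);
    std::string word;
    while (words >> word) out.emplace_back(word, 1);
    return out;
}

// Shuffle: group all intermediate values by key (std::map keeps keys sorted).
std::map<std::string, std::vector<int>> shuffle_by_key(
        const std::vector<KeyValue>& pairs) {
    std::map<std::string, std::vector<int>> groups;
    for (const KeyValue& kv : pairs) groups[kv.first].push_back(kv.second);
    return groups;
}

// Reduce: one key plus all of its values -> a single output value.
int reduce_values(const std::vector<int>& values) {
    assert(!values.empty());  // shuffle never creates an empty group
    int total = 0;
    for (int v : values) total += v;
    return total;
}

// Input -> Map -> Shuffle -> Reduce -> Output
std::map<std::string, int> run_job(const std::vector<std::string>& input) {
    std::vector<KeyValue> intermediate;
    for (const std::string& record : input) {
        std::vector<KeyValue> pairs = map_record(record);
        intermediate.insert(intermediate.end(), pairs.begin(), pairs.end());
    }
    std::map<std::string, int> output;
    const auto groups = shuffle_by_key(intermediate);
    for (const auto& group : groups) {
        output[group.first] = reduce_values(group.second);
    }
    return output;
}
```

The user only writes `map_record` and `reduce_values`; the grouping in the middle is the part a framework would provide.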
  • 9. MAP REDUCE Physical Flow
  • 10. MAP REDUCE Physical Flow Job
  • 11.
      PROGRAM                 | Map function             | Reduce function
      Distributed Grep        | matched lines            | pass
      Reverse Web link graph  | <target, source>         | <target, list(src)>
      URL access count        | <URL, 1>                 | <URL, total count>
      Term-Vector per Host    | <hostname, term-vector>  | <hostname, all-term-vector>
      Inverted Index          | <word, doc id>           | <word, list(doc id)>
      Distributed Sort        | <key,value>              | pass
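As one worked instance of the table, here is a small in-memory sketch of the Inverted Index row. The `Document` type and all function names are hypothetical, and sorting plus de-duplicating the document ids in the reduce step is a choice made for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input record: a document id plus its text.
struct Document {
    int id;
    std::string text;
};

// Map function: emit <word, doc id> for every word in the document.
std::vector<std::pair<std::string, int>> map_document(const Document& doc) {
    std::vector<std::pair<std::string, int>> out;
    std::istringstream words(doc.text);
    std::string word;
    while (words >> word) out.emplace_back(word, doc.id);
    return out;
}

// Reduce function: all doc ids seen for one word -> list(doc id),
// here sorted and with duplicates removed.
std::vector<int> reduce_postings(const std::vector<int>& doc_ids) {
    assert(!doc_ids.empty());  // a word only exists if some document emitted it
    std::set<int> unique(doc_ids.begin(), doc_ids.end());
    return std::vector<int>(unique.begin(), unique.end());
}

// Glue standing in for the framework: group by word, then reduce each group.
std::map<std::string, std::vector<int>> build_index(
        const std::vector<Document>& docs) {
    std::map<std::string, std::vector<int>> grouped;
    for (const Document& doc : docs) {
        for (const auto& kv : map_document(doc)) {
            grouped[kv.first].push_back(kv.second);
        }
    }
    for (auto& entry : grouped) entry.second = reduce_postings(entry.second);
    return grouped;
}
```

The other rows follow the same shape: only the pair emitted by the map function and the way the reduce function combines values change.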
  • 12.
  • 13. CLUSTER - HADOOP
      • Master
          • Name node
          • Job tracker
      • Slave (= Worker)
          • Data node
          • Task tracker
      Figure 3-2. A simple six-node cluster (p. 80, Chapter 3, The Basics of Multimachine Clusters)
        Master: NameNode (http://master:50070/), JobTracker (http://master:50030/)
        Slave01 to Slave05: DataNode and TaskTracker on each
      Both the JobTracker and the NameNode provide a web interface. The JobTracker and NameNode reside on the master machine; the DataNodes and TaskTrackers run on the slave machines.
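  • 14. MAP REDUCE - GOOGLE
      1. The input files are split into pieces of 16 MB to 64 MB, and copies of the program are started on the machines of the cluster.
      2. One copy is the master; the others are workers that receive work from the master (map tasks and reduce tasks). The master assigns tasks to idle workers.
      3. A worker that is assigned a map task reads its input split, runs the map function on it, and buffers the resulting intermediate key/value pairs.
      4. The buffered pairs are written to local disk, partitioned for the reduce tasks. The locations of the pairs are reported to the master, and the master forwards these locations from the map workers to the reduce workers.
      5. When a reduce worker is notified by the master, it uses RPC to read the buffered data (intermediate key/value pairs) from the map workers, then sorts the data by intermediate key. If the data is too large for memory, an external sort is used.
      6. The reduce worker iterates over the sorted data and passes each key with its values to the reduce function. The output of reduce is appended to the output file (one per reduce partition).
      7. When all map and reduce tasks are finished, control returns to the user program from the MapReduce call.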
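Steps 1 and 4 involve two small calculations that can be sketched directly: how many splits an input produces, and which reduce task an intermediate key belongs to. The sketch assumes the common `hash(key) mod R` partitioning rule and uses `std::hash`, which is a stand-in and not the hash function of any real framework. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using KeyValue = std::pair<std::string, std::string>;

// Step 1: number of input splits (and therefore map tasks) for a given
// input size, rounding up so that a final partial split is still counted.
std::uint64_t num_splits(std::uint64_t input_bytes, std::uint64_t split_bytes) {
    assert(split_bytes > 0);
    return (input_bytes + split_bytes - 1) / split_bytes;
}

// Step 4: choose the reduce task for a key, assuming hash(key) mod R.
std::size_t partition_for(const std::string& key, std::size_t num_reduce_tasks) {
    assert(num_reduce_tasks > 0);
    return std::hash<std::string>{}(key) % num_reduce_tasks;
}

// Step 4: divide one map worker's buffered pairs into R regions, one per
// reduce task, so each reduce worker later fetches only its own region.
std::vector<std::vector<KeyValue>> partition_pairs(
        const std::vector<KeyValue>& pairs, std::size_t num_reduce_tasks) {
    std::vector<std::vector<KeyValue>> regions(num_reduce_tasks);
    for (const KeyValue& kv : pairs) {
        regions[partition_for(kv.first, num_reduce_tasks)].push_back(kv);
    }
    return regions;
}
```

Because the partition depends only on the key, every pair with the same key lands in the same region on every map worker, which is what lets a single reduce worker see all values for that key.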
  • 15.
      • Distributed file system (DFS)
      • Google Map reduce - Bigtable
      • Hadoop - HBase
      • Hypertable (commercial)
  • 16. EXAMPLE SOURCE CODE: Google MapReduce example, Word count
  • 17. EXAMPLE - WORDCOUNT
      A  Word Frequency (from the Google MapReduce paper, "To appear in OSDI 2004")
      This section contains a program that counts the number of occurrences of each unique word in a set of input files specified on the command line.

      #include "mapreduce/mapreduce.h"

      // User's map function
      class WordCounter : public Mapper {
       public:
        virtual void Map(const MapInput& input) {
          const string& text = input.value();
          const int n = text.size();
          for (int i = 0; i < n; ) {
            // Skip past leading whitespace
            while ((i < n) && isspace(text[i]))
              i++;

            // Find word end
            int start = i;
            while ((i < n) && !isspace(text[i]))
              i++;

            if (start < i)
              Emit(text.substr(start,i-start),"1");
          }
        }
      };
      REGISTER_MAPPER(WordCounter);

      // User's reduce function
      class Adder : public Reducer {
        virtual void Reduce(ReduceInput* input) {
          // Iterate over all entries with the
          // same key and add the values
          int64 value = 0;
          while (!input->done()) {
            value += StringToInt(input->value());
            input->NextValue();
          }

          // Emit sum for input->key()
          Emit(IntToString(value));
        }
      };
      REGISTER_REDUCER(Adder);

      int main(int argc, char** argv) {
        ParseCommandLineFlags(argc, argv);

        MapReduceSpecification spec;

        // Store list of input files into "spec"
        for (int i = 1; i < argc; i++) {
          MapReduceInput* input = spec.add_input();
          input->set_format("text");
          input->set_filepattern(argv[i]);
          input->set_mapper_class("WordCounter");
        }

        // Specify the output files:
        //    /gfs/test/freq-00000-of-00100
        //    /gfs/test/freq-00001-of-00100
        //    ...
        MapReduceOutput* out = spec.output();
        out->set_filebase("/gfs/test/freq");
        out->set_num_tasks(100);
        out->set_format("text");
        out->set_reducer_class("Adder");

        // Optional: do partial sums within map
        // tasks to save network bandwidth
        out->set_combiner_class("Adder");

        // Tuning parameters: use at most 2000
        // machines and 100 MB of memory per task
        spec.set_machines(2000);
        spec.set_map_megabytes(100);
        spec.set_reduce_megabytes(100);

        // Now run it
        MapReduceResult result;
        if (!MapReduce(spec, &result)) abort();

        // Done: 'result' structure contains info
        // about counters, time taken, number of
        // machines used, etc.

        return 0;
      }
  • 18. QIZMT Qizmt - Map reduce framework on Windows
  • 20. CORE MYSPACE QIZMT FEATURES
      • C# mapreducer job
      • Built-in IDE/Debugger
      • mapreducer job / / /
      • Delta-only exchange option for Mapreduce jobs
      • Easily add machines to a cluster to increase processing power and capacity
      • CAC (Cluster Assembly Cache) for exposing .Net DLLs to mapreduce jobs
      • Job
          ◦ Mapreduce
          ◦ Remote
          ◦ Local - For orchestrating a pipeline of Mapreducer and Remote jobs
      • Shuffle
          ◦ Sorted - Shuffle Key
          ◦ Grouped
          ◦ Hashsorted - core hashtable, Key
      Input → Map → Shuffle (Sorted / Grouped / Hashsorted) → Reduce → Output
  • 23.
  • 24.
      • Hadoop
          • C++ map, reduce
          • But, Cygwin
      • Qizmt
          • Master
          • IDE
  • 25. Q&A