My First MapReduce (はじめてのまっぷりでゅ〜す)
2011/7/10 @a_bicky
• Takeshi Arabiki
    ‣
    ‣ Twitter: @a_bicky
    ‣ Hatena: id:a_bicky
• R
• http://d.hatena.ne.jp/a_bicky/
• MapReduce
•           MapReduce
• MapReduce
•           MapReduce
•
•
MapReduce
MapReduce
• TB PB
    Facebook 20TB
• ” ”
  ” ”
    ‣
    ‣
    ‣ etc.
    ↑ MPI orz
  MapReduce
MapReduce
• Google

• map reduce
    >>> from functools import reduce  # needed on Python 3; reduce was a builtin in Python 2
    >>> list(map(lambda x: x ** 2, range(1, 6)))  # map
    [1, 4, 9, 16, 25]
    >>> reduce(lambda a, b: a + b, range(1, 6))  # reduce
    15

• OK

•
                              KVS
    MapReduce              Big Table
          Google File System
Hadoop

• Google
• MapReduce Hadoop MapReduce

                    Google                 Hadoop
    MapReduce       MapReduce              Hadoop MapReduce
    KVS             Big Table              HBase
    File system     Google File System     Hadoop Distributed File System (HDFS)
Hadoop MapReduce
                    JobClient → JobTracker
           (assign map task)      (assign reduce task)

    HDFS → mapper ─┐                   ┌─ reducer → HDFS
           mapper ─┼─── copy & sort ───┤
           mapper ─┘                   └─ reducer → HDFS

           Map phase     Shuffle phase     Reduce phase
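The three phases in the diagram can be imitated in a single process. The sketch below is a toy model under that assumption, not Hadoop itself. `run_mapreduce`, `wordcount_mapper` and `wordcount_reducer` are hypothetical names. There is no JobTracker, HDFS or parallelism, and a dictionary grouping step stands in for the Shuffle phase.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Toy single-process model of Map -> Shuffle -> Reduce.
# Real Hadoop runs each mapper and reducer as a separate task on cluster nodes.

def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Iterable[Tuple[str, int]]],
) -> List[Tuple[str, int]]:
    # Map phase: every input record yields zero or more (key, value) pairs.
    intermediate = [pair for record in records for pair in mapper(record)]

    # Shuffle phase: collect all values that share a key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # Reduce phase: call the reducer once per key, in sorted key order.
    output: List[Tuple[str, int]] = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


def wordcount_mapper(line: str) -> Iterator[Tuple[str, int]]:
    for word in line.split():
        yield word, 1


def wordcount_reducer(word: str, counts: List[int]) -> Iterator[Tuple[str, int]]:
    yield word, sum(counts)


if __name__ == "__main__":
    lines = ["the end of money is", "the end of love"]
    for word, count in run_mapreduce(lines, wordcount_mapper, wordcount_reducer):
        print(word, count)
```

The user supplies only the mapper and the reducer. Everything between them, the grouping and ordering by key, belongs to the framework.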
MapReduce
WordCount
    JobClient → JobTracker

    HDFS
      the end of money is
      the end of love
WordCount
    JobClient → JobTracker (assign map task, assign reduce task)

    HDFS
      the end of money is → mapper
      the end of love     → mapper
                                          reducer
WordCount
    JobClient → JobTracker

    HDFS
      the end of money is → mapper → the 1, end 1, of 1, money 1, is 1
      the end of love     → mapper → the 1, end 1, of 1, love 1
                                                                  reducer

    Map phase
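Here is a minimal sketch of what each mapper does in the Map phase above. It assumes words are separated by whitespace. `wordcount_map` and `splits` are illustrative names, and each element of `splits` stands in for the input of one mapper.

```python
from typing import Iterator, List, Tuple

def wordcount_map(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word in one input line."""
    for word in line.split():
        yield (word, 1)

# One mapper per input line, matching the two mappers on the slide.
splits = ["the end of money is", "the end of love"]
map_outputs: List[List[Tuple[str, int]]] = [list(wordcount_map(s)) for s in splits]

for i, pairs in enumerate(map_outputs):
    print(f"mapper {i}:", pairs)
```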
WordCount
    JobClient → JobTracker

    HDFS
      the end of money is → mapper → the 1, end 1, of 1, money 1, is 1
      the end of love     → mapper → the 1, end 1, of 1, love 1

    copy & sort → reducer:
      end 1, end 1, is 1, love 1, money 1, of 1, of 1, the 1, the 1

    Shuffle phase
WordCount
    JobClient → JobTracker

    HDFS
      the end of money is → mapper
      the end of love     → mapper

    reducer input, grouped by key:
      end    <1, 1>
      is     <1>
      love   <1>
      money  <1>
      of     <1, 1>
      the    <1, 1>
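The copy & sort step and the grouping above can be sketched with a sort followed by `itertools.groupby`. This is a single-process illustration. The hard-coded `map_outputs` reproduces the two mappers' output from the slides, and in Hadoop the copy step moves this data over the network.

```python
from itertools import groupby
from operator import itemgetter

# Map output of the two mappers from the previous slides.
map_outputs = [
    [("the", 1), ("end", 1), ("of", 1), ("money", 1), ("is", 1)],
    [("the", 1), ("end", 1), ("of", 1), ("love", 1)],
]

# Copy: the reducer collects the pairs from every mapper.
copied = [pair for output in map_outputs for pair in output]

# Sort by key so that equal keys become adjacent.
copied.sort(key=itemgetter(0))

# Group adjacent pairs that share a key into (key, [values]).
grouped = [
    (key, [value for _, value in pairs])
    for key, pairs in groupby(copied, key=itemgetter(0))
]

for key, values in grouped:
    print(key, values)
```

`groupby` only merges *adjacent* equal keys. That is why the sort has to come first, and it mirrors why the framework sorts before reducing.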
WordCount
    JobClient → JobTracker

    reducer:  end <1, 1>   → end 2
              is <1>       → is 1
              love <1>     → love 1
              money <1>    → money 1
              of <1, 1>    → of 2
              the <1, 1>   → the 2
    → HDFS

    Reduce phase
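For WordCount, the Reduce phase only sums the values collected for each key. A minimal sketch follows, with `wordcount_reduce` as a hypothetical name and the grouped input copied from the slide.

```python
from typing import List, Tuple

def wordcount_reduce(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum the counts collected for one word."""
    return key, sum(values)

grouped = [("end", [1, 1]), ("is", [1]), ("love", [1]),
           ("money", [1]), ("of", [1, 1]), ("the", [1, 1])]
result = [wordcount_reduce(key, values) for key, values in grouped]

for word, count in result:
    print(f"{word}\t{count}")
```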
MapReduce
※ Java
mapred.pl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    package MapReduce;

    # map: emit "<word>\t1" for every word in one input line
    sub map {
        my $text = shift;
        my @words = split ' ', $text;   # split on runs of whitespace
        foreach my $word (@words) {
            print $word, "\t", 1, "\n";
        }
    }

    # reduce: sum all values collected for one key
    sub reduce {
        my ($key, @values) = @_;
        my $cnt = 0;
        foreach my $value (@values) {
            $cnt += $value;
        }
        print $key, "\t", $cnt, "\n";
    }

    package main;   # a minimal MapReduce framework
    my $phase = shift;
    die "usage: $0 map|reduce\n" unless defined $phase;

    if ($phase eq 'map') {                # map phase
        while (my $line = <STDIN>) {
            chomp $line;
            MapReduce::map($line);        # pass each input line to map
        }
    } elsif ($phase eq 'reduce') {        # reduce phase: input is sorted by key
        my ($prev_key, @values);
        while (my $line = <STDIN>) {
            chomp $line;
            my ($key, $value) = split /\t/, $line;
            if (!defined $prev_key || $key eq $prev_key) {
                push @values, $value;
            } else {                      # key changed: reduce the finished group
                MapReduce::reduce($prev_key, @values);
                @values = ($value);
            }
            $prev_key = $key;
        }
        # reduce the last group
        MapReduce::reduce($prev_key, @values) if defined $prev_key;
    }
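Here is a rough Python port of mapred.pl, for readers who prefer Python. It keeps the same assumptions: one "key<TAB>value" record per line, and reduce input already sorted by key. The file name mapred.py and all function names are hypothetical.

The detail worth noticing is in `reduce_phase`. Because the input is sorted, a group is complete as soon as the key changes, so only one group is held in memory at a time.

```python
import sys
from typing import Iterable, Iterator, List, Optional

def map_phase(lines: Iterable[str]) -> Iterator[str]:
    """Counterpart of mapred.pl's map phase: one "word<TAB>1" record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reduce_phase(lines: Iterable[str]) -> Iterator[str]:
    """Counterpart of mapred.pl's reduce phase; input must be sorted by key."""
    prev_key: Optional[str] = None
    values: List[int] = []
    for line in lines:
        key, value = line.rstrip("\n").split("\t", 1)
        if prev_key is not None and key != prev_key:
            # The key changed, so the previous group is complete: reduce it.
            yield f"{prev_key}\t{sum(values)}"
            values = []
        values.append(int(value))
        prev_key = key
    if prev_key is not None:
        # Reduce the last group, which no later key change will flush.
        yield f"{prev_key}\t{sum(values)}"

def main(argv: List[str]) -> int:
    phases = {"map": map_phase, "reduce": reduce_phase}
    if len(argv) != 2 or argv[1] not in phases:
        print(f"usage: {argv[0]} map|reduce", file=sys.stderr)
        return 1
    for record in phases[argv[1]](sys.stdin):
        print(record)
    return 0

# Only act as a pipeline stage when a phase is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

It would be used in the same pipeline as the Perl script: `cat text.txt | python3 mapred.py map | sort | python3 mapred.py reduce`.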
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce
                     (mapper)                 (reducer)

text.txt
    the end of money is
    the end of love
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    sub map {
        my $text = shift;
        my @words = split ' ', $text;
        foreach my $word (@words) {
            print $word, "\t", 1, "\n";
        }
    }

    text.txt                    map output (mapper)
    the end of money is    →    the 1, end 1, of 1, money 1, is 1
    the end of love        →    the 1, end 1, of 1, love 1

    Map phase
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    map output (mapper)    after sort (copy & sort)
    the 1                  end 1
    end 1                  end 1
    of 1                   is 1
    money 1                love 1
    is 1                   money 1
    the 1                  of 1
    end 1                  of 1
    of 1                   the 1
    love 1                 the 1
                           → reducer

    Shuffle phase
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    sub reduce {
        my ($key, @values) = @_;
        my $cnt = 0;
        foreach my $value (@values) {
            $cnt += $value;
        }
        print $key, "\t", $cnt, "\n";
    }

    reduce input        reduce output (reducer)
    end    <1, 1>       end    2
    is     <1>          is     1
    love   <1>          love   1
    money  <1>          money  1
    of     <1, 1>       of     2
    the    <1, 1>       the    2

    Reduce phase
MapReduce
MapReduce
• Split
• Map
• Combine
• Shuffle
• Reduce
Split
• HDFS mapper

• HDFS 64MB 128MB

• mapper HDFS PC
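Assume one input split per HDFS block (a common default, not a guarantee) and one map task per split. The number of map tasks then follows from ceiling division by the 64MB or 128MB block size. The sketch ignores refinements such as an input format folding a small remainder into the last split. `num_splits` is a hypothetical helper.

```python
MiB = 1024 * 1024

def num_splits(file_size: int, block_size: int = 64 * MiB) -> int:
    """Number of input splits, assuming one split per HDFS block."""
    if file_size <= 0:
        return 0
    # Ceiling division: a partial last block still needs its own split.
    return (file_size + block_size - 1) // block_size

for size_mib in (200, 1024):
    for block_mib in (64, 128):
        print(f"{size_mib} MiB file, {block_mib} MiB blocks -> "
              f"{num_splits(size_mib * MiB, block_mib * MiB)} splits")
```

For example, a 1024 MiB file gives 16 splits with 64 MiB blocks and 8 with 128 MiB blocks. Larger blocks mean fewer, longer map tasks.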
Map
• map

• HDFS
Combine
• Map reducer
    WordCount Map

•

•
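A combiner pre-aggregates a mapper's output locally, before the shuffle, so less data is copied to the reducers. For WordCount it can apply the same summation as the reducer, because addition is associative and commutative. The sketch below assumes a single mapper processed both lines of text.txt. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def wordcount_map(line: str) -> List[Tuple[str, int]]:
    return [(word, 1) for word in line.split()]

def combine(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sum values per key on the mapper side, before the shuffle."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

# One mapper that processed both lines of text.txt.
map_output = wordcount_map("the end of money is") + wordcount_map("the end of love")
combined = combine(map_output)

print(len(map_output), "pairs before combine,", len(combined), "after")
print(combined)
```

Here the 9 map-output pairs shrink to 6. The saving grows with how often keys repeat within one mapper's input.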
Shuffle
     • Map Combine reducer
       reducer
       shuffle sort

    mapper, Map output:
        the 1, end 1, of 1, money 1, is 1, the 1, end 1, of 1, love 1

    partition by hash(key) % 2:
        hash(the) % 2 = 0, hash(end) % 2 = 0, hash(is) % 2 = 0
            → the 1, end 1, is 1, the 1, end 1
            → sort → end 1, end 1, is 1, the 1, the 1
            → copy → reducer (sort & merge with the runs copied from other mappers, e.g. fuga 1, hoge 1)
        hash(of) % 2 = 1, hash(money) % 2 = 1, hash(love) % 2 = 1
            → of 1, money 1, of 1, love 1
            → sort → love 1, money 1, of 1, of 1
            → copy → reducer (sort & merge)
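The partition, sort, and sort & merge steps above can be sketched as follows. The assumptions are:

- two reducers, as in the figure;
- `zlib.crc32` as the hash, because Python's built-in string `hash` is randomized per process;
- a hypothetical second mapper that produces the hoge and fuga keys.

Because the hash function differs from the figure's, the concrete partition numbers may not match it. What carries over is that every occurrence of a key lands on the same reducer, and each reducer receives its input in key order.

```python
import heapq
import zlib
from typing import Dict, List, Tuple

Pair = Tuple[str, int]

def partition(key: str, num_reducers: int) -> int:
    # Deterministic hash, so the same key always goes to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def map_side_shuffle(map_output: List[Pair], num_reducers: int) -> Dict[int, List[Pair]]:
    """Split one mapper's output into per-reducer partitions, each sorted by key."""
    parts: Dict[int, List[Pair]] = {r: [] for r in range(num_reducers)}
    for key, value in map_output:
        parts[partition(key, num_reducers)].append((key, value))
    for r in parts:
        parts[r].sort()
    return parts

def reduce_side_merge(sorted_runs: List[List[Pair]]) -> List[Pair]:
    """Merge the sorted runs copied from every mapper into one sorted stream."""
    return list(heapq.merge(*sorted_runs))

mapper_a = [("the", 1), ("end", 1), ("of", 1), ("money", 1), ("is", 1),
            ("the", 1), ("end", 1), ("of", 1), ("love", 1)]
mapper_b = [("hoge", 1), ("fuga", 1), ("the", 1)]  # hypothetical second mapper

R = 2
parts_a = map_side_shuffle(mapper_a, R)
parts_b = map_side_shuffle(mapper_b, R)
reducer_inputs = [reduce_side_merge([parts_a[r], parts_b[r]]) for r in range(R)]

for r, records in enumerate(reducer_inputs):
    print(f"reducer {r}:", records)
```

Hadoop's default partitioner is also hash based, but it hashes Java objects, so its assignments would differ again.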
Reduce
• shuffle reducer

• reduce

• HDFS
MapReduce
MapReduce
•

    ‣   Word Count
    ‣   Grep
    ‣                etc.

•
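Grep is a classic example. In the Dean and Ghemawat MapReduce paper listed in the references, distributed grep has map emit a line when it matches a pattern, and reduce is an identity function. Below is a single-process sketch of that idea. The function names are hypothetical.

```python
import re
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple

def grep_map(pattern: "re.Pattern[str]", line: str) -> Iterator[Tuple[str, str]]:
    """Emit the line itself (as the key) when it matches the pattern."""
    if pattern.search(line):
        yield (line, "")

def grep_reduce(key: str, values: List[str]) -> Iterator[str]:
    """Identity reduce: copy every matching line to the output."""
    for _ in values:
        yield key

def distributed_grep(lines: Iterable[str], regex: str) -> List[str]:
    pattern = re.compile(regex)
    pairs = sorted(p for line in lines for p in grep_map(pattern, line))
    out: List[str] = []
    for key, group in groupby(pairs, key=lambda p: p[0]):
        out.extend(grep_reduce(key, [v for _, v in group]))
    return out

print(distributed_grep(["the end of money is", "the end of love"], r"\blove\b"))
```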
MapReduce
• MapReduce
    mapper → reducer → mapper → reducer
    HDFS MapReduce

• WordCount
    MapReduce
    MapReduce
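When one job is not enough, the output of one MapReduce job, written to HDFS, becomes the input of the next. That gives the mapper → reducer → mapper → reducer chain above. The toy sketch below assumes an in-memory list stands in for HDFS. Job 1 is WordCount, and job 2 is a hypothetical follow-up that groups words by frequency.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Pair = Tuple[Any, Any]

def run_job(records: Iterable[Any],
            mapper: Callable[[Any], Iterator[Pair]],
            reducer: Callable[[Any, List[Any]], Iterator[Pair]]) -> List[Pair]:
    """One toy MapReduce job: map, group values by key, reduce in key order."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output: List[Pair] = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output

# Job 1: WordCount.
def count_map(line: str) -> Iterator[Pair]:
    for word in line.split():
        yield word, 1

def count_reduce(word: str, ones: List[int]) -> Iterator[Pair]:
    yield word, sum(ones)

# Job 2 (hypothetical follow-up): group the words by their count.
def invert_map(pair: Pair) -> Iterator[Pair]:
    word, count = pair
    yield count, word

def collect_reduce(count: int, words: List[str]) -> Iterator[Pair]:
    yield count, sorted(words)

lines = ["the end of money is", "the end of love"]
counts = run_job(lines, count_map, count_reduce)       # would be written to HDFS
by_freq = run_job(counts, invert_map, collect_reduce)  # reads job 1's output
print(by_freq)
```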
MapReduce: Hadoop Streaming

• Java map reduce
    Perl, Python, Ruby, JavaScript etc.

• Java MapReduce map
  Hadoop Streaming mapper
  map combine ” ”

    Hadoop Streaming WordCount map
          #!/usr/bin/env perl
          use strict;
          use warnings;

          while (my $line = <STDIN>) {
              # ' ' splits on whitespace and drops the trailing newline
              my @words = split ' ', $line;
              foreach my $word (@words) {
                  print $word . "\t" . 1 . "\n";
              }
          }
    http://hapyrus.com/
    cf. http://www.slideshare.net/fujibee/tokyo-webmining12-8349942
MapReduce: DSL

• Pig
  ‣ Yahoo! SQL ” ”
  ‣
  ‣ MapReduce
    http://pig.apache.org/
• Hive
  ‣ Facebook SQL
  ‣
  ‣ SQL Pig
    http://hive.apache.org/
• Cascading
  ‣ Pig Java API
  ‣ Java
    http://www.cascading.org/1.2/userguide/html/ch10.html
• MapReduce

•

•
• Java
• SlideShare
    • Map Reduce http://www.slideshare.net/doryokujin/map-reduce-8349406
    • Hadoop http://www.slideshare.net/pfi/hadoop-2525724
    • Hadoop for programmer http://www.slideshare.net/shiumachi/hadoop-for-programmer-5202246

• Web
    • MapReduce - naoya http://d.hatena.ne.jp/naoya/20080511/1210506301
    • Hadoop http://www.atmarkit.co.jp/fjava/index/index_hadoop_tm.html
    • Hadoop hBase 1/2 CodeZine http://codezine.jp/article/detail/2448
•
    • ( ), ( ), ( ), ( ), ( ), ( ), Hadoop, 2011
    • Tom White ( ), ( ), ( ), Hadoop, 2010
    • Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, 6th OSDI, 2004
