2011/7/10 @a_bicky
• Takeshi Arabiki
    ‣   Twitter: @a_bicky
    ‣   id:a_bicky
• R
• http://d.hatena.ne.jp/a_bicky/
• MapReduce
•           MapReduce
• MapReduce
•           MapReduce
•
•
MapReduce
MapReduce
•                                         TB PB

    Facebook   20TB

•                                 ”   ”
                      ”     ”


    ‣

    ‣
    ‣                      etc.

    ↑ MPI                                  orz

                          MapReduce
MapReduce
• Google

•                         map               reduce
    >>> from functools import reduce  # built in on Python 2, imported on Python 3
    >>> list(map(lambda x: x ** 2, range(1, 6)))      # map
    [1, 4, 9, 16, 25]
    >>> reduce(lambda a, b: a + b, range(1, 6))       # reduce
    15

•                                                                          OK

•

[Figure: Google's stack. MapReduce and Big Table (KVS) sit on top of Google File System]
Hadoop

• Google
•            MapReduce            Hadoop   MapReduce



    Google                  Hadoop
    MapReduce               Hadoop MapReduce
    Big Table (KVS)         HBase (KVS)
    Google File System      Hadoop Distributed File System (HDFS)
Hadoop MapReduce
[Figure: JobClient and JobTracker. JobTracker: assign map task, assign reduce task]

    HDFS  →  mapper (Map phase)  →  copy & sort (Shuffle phase)  →  reducer (Reduce phase)  →  HDFS
MapReduce
WordCount
[Figure: JobClient, JobTracker, and HDFS. The input text is stored in HDFS:]

    the end of money is
    the end of love
WordCount
[Figure: JobTracker: assign map task, assign reduce task. Each line of the input in HDFS goes to a mapper:]

    the end of money is  →  mapper
    the end of love      →  mapper
WordCount
[Figure: Map phase. Each mapper emits one (word, 1) pair per word:]

    the end of money is  →  mapper  →  the 1, end 1, of 1, money 1, is 1
    the end of love      →  mapper  →  the 1, end 1, of 1, love 1
WordCount
[Figure: Shuffle phase (copy & sort). The map output is copied to the reducer and sorted by key:]

    end 1, end 1, is 1, love 1, money 1, of 1, of 1, the 1, the 1
WordCount
[Figure: The sorted pairs are grouped by key before they reach the reducer:]

    end    <1, 1>
    is     <1>
    love   <1>
    money  <1>
    of     <1, 1>
    the    <1, 1>
WordCount
[Figure: Reduce phase. The reducer sums the values of each key and the result goes to HDFS:]

    end    <1, 1>  →  end    2
    is     <1>     →  is     1
    love   <1>     →  love   1
    money  <1>     →  money  1
    of     <1, 1>  →  of     2
    the    <1, 1>  →  the    2
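The WordCount walkthrough above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how Hadoop itself runs a job; the function names `map_words`, `shuffle`, and `reduce_counts` are hypothetical and chosen only for this example.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: sort the pairs by key and collect the values of each key."""
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return dict(groups)


def reduce_counts(key, values):
    """Reduce: sum the values collected for one key."""
    return key, sum(values)


# The two input lines used on the slides.
lines = ["the end of money is", "the end of love"]

mapped = [pair for line in lines for pair in map_words(line)]
grouped = shuffle(mapped)
counts = dict(reduce_counts(key, values) for key, values in grouped.items())
print(counts)
```

Here `grouped` corresponds to the `end <1, 1>` view on the slides and `counts` to the final reducer output.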
MapReduce
※                            Java
mapred.pl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    package MapReduce;

    # map: emit "<word>\t1" for every word in one line of text
    sub map {
      my $text = shift;
      my @words = split /\s/, $text;
      foreach my $word (@words) {
        print $word, "\t", 1, "\n";
      }
    }

    # reduce: sum all values that share one key
    sub reduce {
      my ($key, @values) = @_;
      my $cnt = 0;
      foreach my $value (@values) {
        $cnt += $value;
      }
      print $key, "\t", $cnt, "\n";
    }

    package main;   # stands in for the MapReduce framework
    my $phase = shift;
    if ($phase eq 'map') { # map phase
      while (my $line = <STDIN>) {
        chomp $line;
        MapReduce::map($line);   # call map once per input line
      }
    } elsif ($phase eq 'reduce') { # reduce phase
      my ($prev_key, @values);
      while (my $line = <STDIN>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line;
        # "defined" keeps a key such as "0" from being treated as missing
        if (!defined $prev_key || $key eq $prev_key) {
          push @values, $value;
        } else { # the key changed: reduce the previous group
          MapReduce::reduce($prev_key, @values);
          @values = ($value);
        }
        $prev_key = $key;
      }
      # reduce the last group (skipped when the input was empty)
      MapReduce::reduce($prev_key, @values) if defined $prev_key;
    }
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    text.txt:
      the end of money is
      the end of love

[Figure: text.txt → mapper → reducer]
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    Map phase (mapper): map is called on each line of text.txt.

      sub map {
        my $text = shift;
        my @words = split /\s/, $text;
        foreach my $word (@words) {
          print $word, "\t", 1, "\n";
        }
      }

    text.txt                 map output
    the end of money is      the    1
                             end    1
                             of     1
                             money  1
                             is     1
    the end of love          the    1
                             end    1
                             of     1
                             love   1
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    Shuffle phase (copy & sort): sort orders the map output by key.

    map output      after sort
    the    1        end    1
    end    1        end    1
    of     1        is     1
    money  1        love   1
    is     1        money  1
    the    1        of     1
    end    1        of     1
    of     1        the    1
    love   1        the    1
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    Reduce phase (reducer): reduce is called once per key.

      sub reduce {
        my ($key, @values) = @_;
        my $cnt = 0;
        foreach my $value (@values) {
          $cnt += $value;
        }
        print $key, "\t", $cnt, "\n";
      }

    reduce input      reduce output
    end    <1, 1>     end    2
    is     <1>        is     1
    love   <1>        love   1
    money  <1>        money  1
    of     <1, 1>     of     2
    the    <1, 1>     the    2
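For readers more at home in Python, the same `map | sort | reduce` pipeline can be sketched as below. It is a rough counterpart to mapred.pl, not a translation of it; the names `map_lines` and `reduce_sorted` are assumptions made for this example. Like the Perl driver, the reduce step relies on its input being sorted so that equal keys are adjacent.

```python
from itertools import groupby


def map_lines(lines):
    """Map step: yield one "word<TAB>1" record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_sorted(records):
    """Reduce step: sum each run of records that share a key.

    The records must already be sorted by key, which is the job
    the `sort` command does in the shell pipeline.
    """
    parsed = (record.split("\t") for record in records)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


text = ["the end of money is", "the end of love"]
shuffled = sorted(map_lines(text))      # stands in for `| sort |`
result = list(reduce_sorted(shuffled))
for record in result:
    print(record)
```

`groupby` only merges neighbouring records, so skipping the sort would split one key into several groups, which is exactly why the shuffle phase sorts before reducing.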
MapReduce
MapReduce
• Split
• Map
• Combine
• Shuffle
• Reduce
Split
• HDFS               mapper

• HDFS             64MB 128MB


• mapper                        HDFS
                                       PC
Map
•   map

•
          HDFS
Combine
• Map                     reducer

    WordCount   Map



•

•
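The combine step sits between Map and Shuffle and can pre-aggregate one mapper's output so that fewer records travel to the reducer. A minimal sketch for WordCount follows; the function names and the sample input line are hypothetical and not taken from the slides.

```python
from collections import Counter


def map_words(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combine: sum the counts per word within a single mapper's output."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


# Hypothetical input line with repeated words, so the effect is visible.
mapper_output = map_words("the end of the end")
combined = combine(mapper_output)

print(len(mapper_output), "records before combine")
print(len(combined), "records after combine:", combined)
```

This works for WordCount because addition gives the same total whether it is applied once in the reducer or in stages; a reduce function without that property cannot be reused as a combiner this way.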
Shuffle
     • Map                                      Combine                                               reducer

                           reducer
              shuffle                 sort

[Figure: Shuffle. The mapper's output is partitioned by hash(key) % 2, each partition is sorted, and the sorted partitions are copied to the reducers, which sort & merge them with output copied from other mappers (e.g. hoge 1, fuga 1).]

    map output: the 1, end 1, of 1, money 1, is 1, the 1, end 1, of 1, love 1

    partition 0: hash(the) % 2 = 0, hash(end) % 2 = 0, hash(is) % 2 = 0
        the 1, end 1, is 1, the 1, end 1  → sort →  end 1, end 1, is 1, the 1, the 1  → copy →  reducer

    partition 1: hash(of) % 2 = 1, hash(money) % 2 = 1, hash(love) % 2 = 1
        of 1, money 1, of 1, love 1  → sort →  love 1, money 1, of 1, of 1  → copy →  reducer
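The `hash(key) % 2` partitioning shown above can be sketched as follows. This is an illustration under stated assumptions: `zlib.crc32` is used only as a deterministic stand-in hash (Hadoop uses its own, and Python's built-in `hash` for strings changes between runs), so the partition each word lands in may differ from the slide. The function names are hypothetical.

```python
import zlib


def partition(key, num_reducers):
    """Pick a reducer for a key: hash(key) % number of reducers."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_for_reducers(pairs, num_reducers):
    """Split map output into one sorted list per reducer."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return [sorted(part) for part in partitions]


pairs = [(word, 1) for word in "the end of money is the end of love".split()]
partitions = shuffle_for_reducers(pairs, 2)

for index, part in enumerate(partitions):
    print("reducer", index, part)
```

Whatever hash is used, the property that matters is that every occurrence of a key goes to the same reducer, so each reducer sees the complete value list for its keys.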
Reduce
• shuffle               reducer


•             reduce

•                       HDFS
MapReduce
MapReduce
•

    ‣   Word Count
    ‣   Grep
    ‣                etc.

•
MapReduce
•   MapReduce
       mapper → reducer → mapper → reducer
        HDFS                          MapReduce

•   WordCount
                        MapReduce
     MapReduce
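One way to read the mapper → reducer → mapper → reducer chain is as two jobs, where the second job takes the output of the first as its input. The sketch below runs both in memory. The second job (ordering the WordCount result by count) is a hypothetical example chosen for illustration, and `run_job` and the mapper and reducer names are assumptions, not part of Hadoop.

```python
from itertools import groupby


def run_job(records, mapper, reducer):
    """Run one map → sort/group → reduce pass entirely in memory."""
    mapped = sorted(pair for record in records for pair in mapper(record))
    output = []
    for key, group in groupby(mapped, key=lambda kv: kv[0]):
        output.extend(reducer(key, [value for _, value in group]))
    return output


def count_mapper(line):
    return [(word, 1) for word in line.split()]


def count_reducer(word, ones):
    return [(word, sum(ones))]


def swap_mapper(record):
    # Make the count the key so the sort step orders records by count.
    word, count = record
    return [(count, word)]


def swap_reducer(count, words):
    return [(count, word) for word in words]


lines = ["the end of money is", "the end of love"]
word_counts = run_job(lines, count_mapper, count_reducer)     # job 1
by_count = run_job(word_counts, swap_mapper, swap_reducer)    # job 2
print(by_count)
```

In a real cluster the intermediate `word_counts` would be written to HDFS by the first job and read back by the second, rather than passed as a Python list.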
MapReduce: Hadoop Streaming

•           Java               map            reduce

    Perl, Python, Ruby, JavaScript etc.

• Java MapReduce                                       map
               Hadoop Streaming               mapper

                                               map           combine
      ”                ”

     Hadoop Streaming             WordCount    map
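          #!/usr/bin/env perl
          use strict;
          use warnings;

          # Hadoop Streaming mapper: read lines from STDIN, emit "<word>\t1"
          while (my $line = <STDIN>) {
              chomp $line;   # keep the newline from sticking to the last word
              my @words = split / /, $line;
              foreach my $word (@words) {
                  print $word . "\t" . 1 . "\n";
              }
          }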
          http://hapyrus.com/
          cf. http://www.slideshare.net/fujibee/tokyo-webmining12-8349942
MapReduce: DSL

• Pig
  ‣     Yahoo!              SQL            ”                  ”
  ‣
                                                                           http://pig.apache.org/

  ‣            MapReduce
• Hive
  ‣     Facebook                 SQL
  ‣                                                                        http://hive.apache.org/

  ‣                        SQL                                       Pig

• Cascading
  ‣     Pig                 Java                        API
  ‣     Java



                                 http://www.cascading.org/1.2/userguide/html/ch10.html
• MapReduce

•

•
• Java
•       SlideShare
    •      Map Reduce  http://www.slideshare.net/doryokujin/map-reduce-8349406
    •      Hadoop  http://www.slideshare.net/pfi/hadoop-2525724
    •      Hadoop for programmer  http://www.slideshare.net/shiumachi/hadoop-for-programmer-5202246


•       Web
    •      MapReduce - naoya  http://d.hatena.ne.jp/naoya/20080511/1210506301
    •      Hadoop  http://www.atmarkit.co.jp/fjava/index/index_hadoop_tm.html
    •      Hadoop hBase 1/2 CodeZine  http://codezine.jp/article/detail/2448
•
    •                 ( ),           (   ),              ( ),           (   ),             (     ),
                (     ), Hadoop               ,       , 2011
    •      Tom White, Hadoop, 2010
    •      Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, 6th OSDI, 2004

My First MapReduce (はじめてのまっぷりでゅ〜す)
