Introduction to MapReduce





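MapReduce
•   A programming model introduced by Google
•   The names come from the map and reduce operations of functional programming
    •   Map: apply a function to every element
        –     [1,2,3,4] – (*2) → [2,4,6,8]
    •   Reduce: combine all elements into one value
        –     [1,2,3,4] – (sum) → 10

–   Overall strategy: Divide and Conquer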
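
The two list examples above can be tried directly with Java streams. This is only a sketch of the functional-programming idea behind the names, not of the distributed framework; the class and method names (`MapReduceBasics`, `doubleAll`, `sum`) are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MapReduceBasics {
    // map: apply the same function to every element independently.
    static List<Integer> doubleAll(List<Integer> xs) {
        return xs.stream().map(x -> x * 2).collect(Collectors.toList());
    }

    // reduce: fold all elements into a single value.
    static int sum(List<Integer> xs) {
        return xs.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4);
        System.out.println(doubleAll(input)); // prints [2, 4, 6, 8]
        System.out.println(sum(input));       // prints 10
    }
}
```

Because each element is mapped independently, the map step can be split across machines, which is what makes the model attractive for large inputs.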




                                                                  Copyright 2009 - Trend Micro Inc.
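MapReduce
•   MapReduce is Google's programming model in which the user supplies two functions, Map and Reduce
    – Map: processes an input key/value pair and produces a set of
      intermediate key/value pairs
    – Reduce: merges all intermediate values that share the same
      intermediate key and produces the output key/value pairs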
•                MapReduce
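
To make the flow from input pairs to intermediate pairs to output concrete, here is a minimal single-process sketch. It assumes everything fits in memory, and all names (`LocalMapReduce`, `run`, `wordCount`) are hypothetical; a real framework distributes the same three steps across machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

class LocalMapReduce {
    // Runs map on every input, groups intermediate values by key, then runs reduce per key.
    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs,
            Function<I, List<Map.Entry<K, V>>> mapFn,
            BiFunction<K, List<V>, R> reduceFn) {
        // Map phase plus grouping: collect every intermediate value under its key.
        Map<K, List<V>> grouped = new TreeMap<>();
        for (I input : inputs) {
            for (Map.Entry<K, V> kv : mapFn.apply(input)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Reduce phase: one call per distinct intermediate key.
        Map<K, R> output = new TreeMap<>();
        for (Map.Entry<K, List<V>> e : grouped.entrySet()) {
            output.put(e.getKey(), reduceFn.apply(e.getKey(), e.getValue()));
        }
        return output;
    }

    // Word count expressed through the two user-supplied functions.
    static Map<String, Integer> wordCount(List<String> lines) {
        return LocalMapReduce.<String, String, Integer, Integer>run(
            lines,
            line -> {
                List<Map.Entry<String, Integer>> out = new ArrayList<>();
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        out.add(new SimpleEntry<String, Integer>(word, 1)); // emit (word, 1)
                    }
                }
                return out;
            },
            (word, ones) -> ones.stream().mapToInt(Integer::intValue).sum());
    }
}
```

Calling `LocalMapReduce.wordCount` on a list of lines returns a sorted map from each word to its number of occurrences.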




•   Known applications of MapReduce:

    http://www.dbms2.com/2008/08/26/known-applications-of-mapreduce/




MapReduce
•   Function signatures
    – map:    (K1, V1) → list(K2, V2)
    – reduce: (K2, list(V2)) → list(K3, V3)

•   Example: grep
    – Map: (offset, line) → [(match, 1)]
    – Reduce: (match, [1, 1, ...]) → [(match, n)]
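
The grep signatures above can be sketched in plain Java as follows. This is an in-memory illustration only; the class `LocalGrep`, its method names, and the way the offset is computed are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LocalGrep {
    // Map: (offset, line) -> [(match, 1)]. Each returned string stands for one (match, 1) pair.
    static List<String> map(long offset, String line, Pattern pattern) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(line);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // Reduce: (match, [1, 1, ...]) -> (match, n), where n is the sum of the ones.
    static int reduce(String match, List<Integer> ones) {
        int n = 0;
        for (int one : ones) {
            n += one;
        }
        return n;
    }

    // Drives both phases in one process; offset is the character position of each line.
    static Map<String, Integer> grep(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, List<Integer>> grouped = new TreeMap<>();
        long offset = 0;
        for (String line : lines) {
            for (String match : map(offset, line, pattern)) {
                grouped.computeIfAbsent(match, k -> new ArrayList<>()).add(1);
            }
            offset += line.length() + 1; // +1 for the newline that separated the lines
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```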

• MapReduce                :




Word Count




MapReduce

•   Distributed Grep
    –   Finds the lines that match a given pattern

•   Distributed Sort

•   Count of URL Access Frequency
    –   Processes Web access logs and counts how often each URL is requested
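
As a sketch of the URL access count, the map step emits (URL, 1) for each log line and the reduce step adds the ones per URL. The log layout used here ("<client> <url>" separated by whitespace) and the class name `UrlAccessCount` are assumptions for illustration, not something the slides specify.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class UrlAccessCount {
    // Map step: extract the URL from one access-log line.
    // Assumed (hypothetical) line layout: "<client> <url>".
    static String urlOf(String logLine) {
        return logLine.trim().split("\\s+")[1];
    }

    // Grouping by URL and counting plays the role of the reduce step.
    static Map<String, Long> count(List<String> logLines) {
        return logLines.stream()
            .collect(Collectors.groupingBy(UrlAccessCount::urlOf, TreeMap::new, Collectors.counting()));
    }
}
```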




MapReduce




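Hadoop MapReduce

•   Apache Hadoop is an open-source implementation of Google's MapReduce
   –   Implements the MapReduce programming model
   –   Written in Java
   –   Includes the Hadoop Distributed File System (HDFS)
•   Yahoo! is a major contributor
•   Google, Yahoo!, IBM, and Amazon are among the companies working with Hadoop
•   Trend Micro also uses Hadoop MapReduce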




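Hadoop MapReduce

 •     The Map/Reduce framework consists of
         – JobTracker
         – TaskTracker
 •     JobTracker
         – Accepts submitted Jobs
         – Splits each Job into tasks, schedules them on the TaskTrackers, and monitors the Job
 •     TaskTrackers
        •  Execute the tasks of a Job as directed by the JobTracker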




Hadoop MapReduce
  class MyJob {

      class Map {        // Map implementation
      }
      class Reduce {     // Reduce implementation
      }

      main() {
          // configure the job
          JobConf conf = new JobConf(MyJob.class);
          conf.setInputPath(…);
          conf.setOutputPath(…);
          conf.setMapperClass(Map.class);
          conf.setReducerClass(Reduce.class);
          // submit the job
          JobClient.runJob(conf);
      }
  }
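•   Example: computing the time distribution of feedback logs
    –   The log files are stored in HDFS and processed with MapReduce
•   Sample input: comma-separated records, one per line, with the GUID in the second field

         1, 123, 131231231, VSAPI, open file
         2, 456, 123123123, VSAPI, connect internet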




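Map
•   Implement the Mapper interface and its map() method
•   Map : (K1, V1) → list(K2, V2)


map( WritableComparable, Writable,
  OutputCollector, Reporter)


•   The framework calls map() once for every input key/value pair
•   Emit output through the OutputCollector's collect() method

OutputCollector.collect( WritableComparable, Writable )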




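Map

import java.io.IOException;
import java.util.Calendar;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

class MapClass extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text hour = new Text();

    // Emits (hour, 1) for every log record.
    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        String[] token = line.split(",");
        // Field 1 is the GUID in the sample records, so the timestamp is
        // assumed to be field 2; trim() drops the space after the comma.
        String timestamp = token[2].trim();
        Calendar c = Calendar.getInstance();
        c.setTimeInMillis(Long.parseLong(timestamp));
        // HOUR_OF_DAY is 0-23; Calendar.HOUR would fold AM and PM together.
        Integer h = c.get(Calendar.HOUR_OF_DAY);
        hour.set(h.toString());
        output.collect(hour, one);
    }
}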
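
The hour-bucketing logic of the mapper can be checked without a cluster. The following is a plain-Java sketch under stated assumptions: the timestamp is in epoch milliseconds, the time zone is passed in explicitly so the result is reproducible, and the position of the timestamp field is a parameter because the sample header does not label every column. `HourBucket` and its methods are hypothetical names.

```java
import java.util.Calendar;
import java.util.TimeZone;

class HourBucket {
    // Returns the hour of day (0-23) for an epoch-millisecond timestamp in the given zone.
    // Calendar.HOUR would return the 12-hour clock value (0-11) instead.
    static int hourOfDay(long epochMillis, TimeZone zone) {
        Calendar c = Calendar.getInstance(zone);
        c.setTimeInMillis(epochMillis);
        return c.get(Calendar.HOUR_OF_DAY);
    }

    // Parses one comma-separated record and returns the hour bucket of the chosen field.
    static int hourOfRecord(String line, int timestampField, TimeZone zone) {
        String[] token = line.split(",");
        // trim() removes the space that follows each comma in the sample records.
        return hourOfDay(Long.parseLong(token[timestampField].trim()), zone);
    }
}
```

For example, `HourBucket.hourOfRecord("1, 123, 131231231, VSAPI, open file", 2, TimeZone.getTimeZone("UTC"))` reads the third field as the timestamp.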
Reduce
•   Implement the Reducer interface and its reduce() method
•   Reduce : (K2, list(V2)) → list(K3, V3)

     reduce (WritableComparable, Iterator,
              OutputCollector, Reporter)



•   Emit output through the OutputCollector's collect() method

   OutputCollector.collect( WritableComparable, Writable )




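Reduce

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

class ReduceClass extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable sumValue = new IntWritable();

    // Sums the counts emitted for one key and writes (key, total).
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        sumValue.set(sum);
        output.collect(key, sumValue);
    }
}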



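•   JobConf describes the job to the framework
    – Sets the Mapper, Reducer, InputFormat, OutputFormat, Combiner, and Partitioner
    – Sets the number of map and reduce tasks


•   JobClient submits the job described by a JobConf


    JobClient.runJob(conf);                  // submit and wait for completion
    new JobClient(conf).submitJob(conf);     // submit and return immediately
    conf.setJobEndNotificationURI(uri);      // JobConf method: notify this URI when the job ends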



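Main Function

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class MyJob {
    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(MyJob.class);
        conf.setJobName("Calculate feedback log time distribution");
        // set input and output paths
        conf.setInputPath(new Path(args[0]));
        conf.setOutputPath(new Path(args[1]));
        // set map and reduce
        conf.setOutputKeyClass(Text.class);            // output key: the hour
        conf.setOutputValueClass(IntWritable.class);   // output value: the count
        conf.setMapperClass(MapClass.class);
        conf.setCombinerClass(ReduceClass.class);      // the summing reducer doubles as combiner
        conf.setReducerClass(ReduceClass.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // run
        JobClient.runJob(conf);
    }
}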
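
The driver registers a combiner with setCombinerClass(). A summing reducer is a common choice for this because addition gives the same total whether the ones are added in one pass or pre-summed per mapper first. The sketch below illustrates that property in plain Java; `CombinerSketch` and its methods are hypothetical names, and the nested lists stand in for the values each mapper emitted for a single key.

```java
import java.util.ArrayList;
import java.util.List;

class CombinerSketch {
    // The reduce logic: add up all values for one key.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // No combiner: the reducer receives every value from every mapper.
    static int withoutCombiner(List<List<Integer>> perMapperValues) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> values : perMapperValues) {
            all.addAll(values);
        }
        return sum(all);
    }

    // With a combiner: each mapper's values are pre-summed locally,
    // so the reducer receives one partial sum per mapper.
    static int withCombiner(List<List<Integer>> perMapperValues) {
        List<Integer> partialSums = new ArrayList<>();
        for (List<Integer> values : perMapperValues) {
            partialSums.add(sum(values));
        }
        return sum(partialSums);
    }
}
```

Both methods return the same total; the combiner variant simply moves fewer values between the map and reduce phases.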


1.   Compile
     –   javac -classpath hadoop-*-core.jar -d MyJava MyJob.java
2.   Package
     –   jar -cvf MyJob.jar -C MyJava .
3.   Run
     –   bin/hadoop jar MyJob.jar MyJob input/ output/




• bin/hadoop jar MyJob.jar MyJob input/ output/




Web Console
http://172.16.203.132:50030/




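Hadoop MapReduce
 •   How many Mappers?
         – The number of Mappers is driven by the size of the input: Hadoop splits the
           input and runs one Mapper per split
         – JobConf's setNumMapTasks(int) only gives Hadoop a hint for the number of
           Mappers; Hadoop makes the final decision


 •   How many Reducers?
         – Set through JobConf with JobConf.setNumReduceTasks(int)
         – The number of Reducers may be set to zero, in which case no Reducer runs
           and the MapReduce job consists of the Map phase only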
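
Once the number of reduce tasks is fixed, each intermediate key has to be routed to exactly one of them. The sketch below uses the hash-and-remainder formula of Hadoop's default HashPartitioner; the class and method names here (`HashPartitionSketch`, `partitionFor`) are made up, and the code is a standalone illustration rather than the Hadoop class itself.

```java
class HashPartitionSketch {
    // Masks off the sign bit so the result is never negative,
    // then takes the remainder to pick a reduce task in [0, numReduceTasks).
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```

Equal keys always land in the same partition, which is what guarantees that a single reduce() call sees all values for a key.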




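Non-Java Interface
• Hadoop Pipes
  –   A C++ API for MapReduce
  –   The C++ code communicates with the Java framework
• Hadoop Streaming
  –   Lets any executable or script be used to write a MapReduce job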




• Google MapReduce paper
  – http://labs.google.com/papers/mapreduce.html
• Google MapReduce course material
  – http://code.google.com/edu/submissions/mapreduce/listing.html
• Google MapReduce mini-lecture
  – http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html
• Hadoop
  – http://hadoop.apache.org/core/




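• Eclipse plug-in for developing MapReduce programs (from IBM)
    –   Lets you work with Hadoop from within Eclipse
    –   Available at:
            • http://code.google.com/edu/parallel/tools/hadoopvm/hadoop-eclipse-plugin.jar
• Hadoop virtual machine image (provided by Google)
    –   A VMware image with Hadoop already set up
    –   Available at:
            • http://code.google.com/edu/parallel/tools/hadoopvm/index.html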


