SlideShare uma empresa Scribd logo
1 de 22
Baixar para ler offline
MapReduce
@shot6
Cloudera
                   Avro	
           Sqoop	
  
Desktop	
  



 Pig	
        Hive	
          HBase	
           Chukwa	
  


 Map                                       Zoo
                         HDFS	
  
Reduce	
                                  Keeper	
  

                   Core	
  
Cloudera
                   Avro	
           Sqoop	
  
Desktop	
  



 Pig	
        Hive	
          HBase	
           Chukwa	
  


 Map                                       Zoo
                         HDFS	
  
Reduce	
                                  Keeper	
  

                   Core	
  
•                 MapReduce

     –    Mapper/Reducer
• 
MapReduce                      	
•              WordCount
• 
• 
     – Mapper/Reducer       Job   ⾏行行
     – InputFormat/OutputFormat         ⽅方
     – HDFS(FileSystem)
     –     Writable     ⽅方
WordCount	
•  Hadoop          Hello World
•                   API
   (org.apache.hadoop.mapreduce)
•  API
Grep	
•  grep
  – grepJob/sortJob 2
        ⾏行行
  – JobConf/Mapper/Reducer            ⽅方
  – Mapper RegexMapper     ⾏行行   <Text,
    Long> SequenceFileFormat
  – sortJob
  –                                ⼒力力
  – 
Grep
                  -	
•  JobConf
•  Mapper
•  Reducer
o.a.hadoop.mapred.JobConf	
• 
     –           mapred-default.xml
     –    conf/mapred-site.xml
     – XML    ⾝身
       DOM
     – ⾃自        ⽬目    ⼿手
     –  ⼦子
       •  JobConf child = new JobConf(   Conf,   jar
                                 );
mapred-site.xml	
<configuration>
<!–                 -->
<property>
 <key>mapred.job.tracker</key>
 <value>your-site:9001</value>
</property>
</configuration>
o.a.hadoop.mapred.Mapper	
•  Mapper
•  InputSplit    Mapper
•  MapTask/MapRunner
•  map(KEY, VALUE, COLLECTOR,
   REPORTER)
     – KEY:Map      VALUE:Map
     – COLLECTOR:
     – REPORTER:                     API
•         MapReduceBase
o.a.hadoop.mapred.MapTask	
•  Map
•  initiazlize              (Task Reducer    )
  –                                     ⽣生
  –               (o.a.h.mapred.TaskStatus.State)
       •  RUNNING, SUCCEEDED, FAILED, UNASSIGNED,
          KILLED, COMMIT_PENDING, FAILED_UNCLEAN,
          KILLED_UNCLEAN
  – OutputCommiter ⽣生
       •  Task        ⼒力力            ⾏行行
       •                                         ⼒力力
          – mapred.work.output.dir
o.a.h.mapred.MapTask cont	
•  run        runOldMapper
•  JobClient
   InputSplit
•  RecordReader
o.a.h.mapred.MapTask cont2	
•  Reduce
  –              spill                   (*            )
       •  $mapred.local.dir/taskTracker/jobcache/$
          {taskid}/output/spill${spillNumber}.out
  – Reducer
                 ⼒力力
       •  Combiner        min.num.spills.for.combine
                          combiner
  –              RecordWriter                 ⼒力力
•  MapRunner
o.a.h.mapred.MapRunner	
•  MapRunnable
  – mapred.map.runner.class
  – Hadoop
    PipeMapRunner
  –               Map
    MultiThreadedMapRunner
o.a.h.mapred.MapRunner
                cont	
•  run(RecordReader, OutputCollector,
   Reporter)
     – RecordReader: InputFormat Split
         Reader(InputFormat/RecordReader
                           )
• 
     – RecordReader
     – 
                ⾝身
     – 
MapTask	
      MapRunner	
              Mapper	
         Record            Output
                                                         Reader	
          Collector	
       Input
      Split⽣生 	
  
                                                          	
                                                                                   	

                                                                             Spill
              & run	
                            createKey()                SpillThread
                                                 createValue()	
                    	
  

                                                 next(key, value)	

            EOF         	
     Map(key, value,
                                                                           Spill
                               outputCollector, reporter)
m(_ _)m
•  Mapper
     – JobConf
     – Mapper/MapRunner/MapTask
• 
     – Reducer
       •  Reducer   ⾏行行
       •  Reducer                 ⾏行行
     – InputFormat/RecordReader
o.a.h.mapred.Reducer	
•  Reducer
•  InputSplit      Mapper
•  ReduceTask/ReduceRunner
•  reduce(KEY, Iterator<VALUE>,
   COLLECTOR, REPORTER)
     – KEY:    Iterator<VALUE>:
     – COLLECTOR:
     – REPORTER:                       API
•         MapReduceBase
o.a.h.mapred.ReduceTask	
•  SHUFFLE
•  ReduceTask.ReduceCopier
  – fetchOutputs(            Merger.MergeQueue)
    •  Map                            x   mapred.reduce.parallel.copies

         – MapOutputCopier
    •  Map
          ⾏行行 LocalFSMerger
    •                  ⾏行行 InMemFSMergeThread
    •  GetMapEventsThread
         – Map
         – <     , MapOutputLocation(taskId, host, httpUrl)>
    •    ⼀一 TaskTracker                                         ⼯工
o.a.h.mapred.ReduceTask	
•  run(RecordReader, OutputCollector,
   Reporter)
•  SORT
  – Memory, disk                        ⽣生
    •  RowKeyValueItetator
  – Reducer ⽣生
  – RecordWriter ⽣生
  – ReduceValuesIterator       ⾏行行

Mais conteúdo relacionado

Mais procurados

Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebookragho
 
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018Piotr Wikiel
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat SheetHortonworks
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Rupak Roy
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkThoughtWorks
 
Hadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドHadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドTatsuya Sasaki
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functionsRupak Roy
 
Infrastructure as Code with Terraform
Infrastructure as Code with TerraformInfrastructure as Code with Terraform
Infrastructure as Code with TerraformMario IC
 
Lua: the world's most infuriating language
Lua: the world's most infuriating languageLua: the world's most infuriating language
Lua: the world's most infuriating languagejgrahamc
 
HBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User GroupHBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User Groupgethue
 
Build your own_map_by_yourself
Build your own_map_by_yourselfBuild your own_map_by_yourself
Build your own_map_by_yourselfMarc Huang
 
REST Active Resource - 7º Encontro do GURU Sorocaba
REST Active Resource - 7º Encontro do GURU SorocabaREST Active Resource - 7º Encontro do GURU Sorocaba
REST Active Resource - 7º Encontro do GURU SorocabaLucas Renan
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamZheng Shao
 

Mais procurados (20)

Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export
 
Hive commands
Hive commandsHive commands
Hive commands
 
Sql cheat sheet
Sql cheat sheetSql cheat sheet
Sql cheat sheet
 
Shark - Lab Assignment
Shark - Lab AssignmentShark - Lab Assignment
Shark - Lab Assignment
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software Framework
 
Hadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドHadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッド
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functions
 
Infrastructure as Code with Terraform
Infrastructure as Code with TerraformInfrastructure as Code with Terraform
Infrastructure as Code with Terraform
 
Lua: the world's most infuriating language
Lua: the world's most infuriating languageLua: the world's most infuriating language
Lua: the world's most infuriating language
 
HBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User GroupHBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User Group
 
Build your own_map_by_yourself
Build your own_map_by_yourselfBuild your own_map_by_yourself
Build your own_map_by_yourself
 
REST Active Resource - 7º Encontro do GURU Sorocaba
REST Active Resource - 7º Encontro do GURU SorocabaREST Active Resource - 7º Encontro do GURU Sorocaba
REST Active Resource - 7º Encontro do GURU Sorocaba
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive Team
 
Using spaces (Drupal)
Using spaces (Drupal)Using spaces (Drupal)
Using spaces (Drupal)
 
Advanced Sqoop
Advanced Sqoop Advanced Sqoop
Advanced Sqoop
 
What's New In JDK 10
What's New In JDK 10What's New In JDK 10
What's New In JDK 10
 

Semelhante a サンプルから見るMapReduceコード

Hadoop MapReduce Streaming and Pipes
Hadoop MapReduce  Streaming and PipesHadoop MapReduce  Streaming and Pipes
Hadoop MapReduce Streaming and PipesHanborq Inc.
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxHARIKRISHNANU13
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomynzhang
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on HadoopCarol McDonald
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionSubhas Kumar Ghosh
 
Large Scale Data Processing & Storage
Large Scale Data Processing & StorageLarge Scale Data Processing & Storage
Large Scale Data Processing & StorageIlayaraja P
 
Elephant in the cloud
Elephant in the cloudElephant in the cloud
Elephant in the cloudrhatr
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...IndicThreads
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystemAndrew Brust
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
Hadoop M/R Pig Hive
Hadoop M/R Pig HiveHadoop M/R Pig Hive
Hadoop M/R Pig Hivezahid-mian
 

Semelhante a サンプルから見るMapReduceコード (20)

Hadoop MapReduce Streaming and Pipes
Hadoop MapReduce  Streaming and PipesHadoop MapReduce  Streaming and Pipes
Hadoop MapReduce Streaming and Pipes
 
Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3
 
mapreduce ppt.ppt
mapreduce ppt.pptmapreduce ppt.ppt
mapreduce ppt.ppt
 
L3.fa14.ppt
L3.fa14.pptL3.fa14.ppt
L3.fa14.ppt
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptx
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomy
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on Hadoop
 
Hadoop london
Hadoop londonHadoop london
Hadoop london
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
 
Large Scale Data Processing & Storage
Large Scale Data Processing & StorageLarge Scale Data Processing & Storage
Large Scale Data Processing & Storage
 
Elephant in the cloud
Elephant in the cloudElephant in the cloud
Elephant in the cloud
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystem
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Hadoop M/R Pig Hive
Hadoop M/R Pig HiveHadoop M/R Pig Hive
Hadoop M/R Pig Hive
 

Mais de Shinpei Ohtani

AWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API GatewayAWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API GatewayShinpei Ohtani
 
ECS for Docker Meetup #4
ECS for Docker Meetup #4ECS for Docker Meetup #4
ECS for Docker Meetup #4Shinpei Ohtani
 
JVM的な何か@JVM Operation Casual Talk
JVM的な何か@JVM Operation Casual TalkJVM的な何か@JVM Operation Casual Talk
JVM的な何か@JVM Operation Casual TalkShinpei Ohtani
 
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来Shinpei Ohtani
 
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 Fall
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 FallAmazon Elastic MapReduce@Hadoop Conference Japan 2011 Fall
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 FallShinpei Ohtani
 
プログラマブルクラウドの薦め
プログラマブルクラウドの薦めプログラマブルクラウドの薦め
プログラマブルクラウドの薦めShinpei Ohtani
 
Hadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダHadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダShinpei Ohtani
 
サンプルから見るMap reduceコード
サンプルから見るMap reduceコードサンプルから見るMap reduceコード
サンプルから見るMap reduceコードShinpei Ohtani
 
Hadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダHadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダShinpei Ohtani
 
Struts2を始めよう!
Struts2を始めよう!Struts2を始めよう!
Struts2を始めよう!Shinpei Ohtani
 

Mais de Shinpei Ohtani (17)

Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 
AWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API GatewayAWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API Gateway
 
ECS for Docker Meetup #4
ECS for Docker Meetup #4ECS for Docker Meetup #4
ECS for Docker Meetup #4
 
JVM的な何か@JVM Operation Casual Talk
JVM的な何か@JVM Operation Casual TalkJVM的な何か@JVM Operation Casual Talk
JVM的な何か@JVM Operation Casual Talk
 
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来
 
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 Fall
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 FallAmazon Elastic MapReduce@Hadoop Conference Japan 2011 Fall
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 Fall
 
プログラマブルクラウドの薦め
プログラマブルクラウドの薦めプログラマブルクラウドの薦め
プログラマブルクラウドの薦め
 
Hadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダHadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダ
 
サンプルから見るMap reduceコード
サンプルから見るMap reduceコードサンプルから見るMap reduceコード
サンプルから見るMap reduceコード
 
Hadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダHadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダ
 
はやわかりHadoop
はやわかりHadoopはやわかりHadoop
はやわかりHadoop
 
T2 Web Framework
T2 Web FrameworkT2 Web Framework
T2 Web Framework
 
T2 Hacks
T2 HacksT2 Hacks
T2 Hacks
 
T2 webframework
T2 webframeworkT2 webframework
T2 webframework
 
Struts2を始めよう!
Struts2を始めよう!Struts2を始めよう!
Struts2を始めよう!
 
Struts2 in a nutshell
Struts2 in a nutshellStruts2 in a nutshell
Struts2 in a nutshell
 
ASP.NET MVC 1.0
ASP.NET MVC 1.0ASP.NET MVC 1.0
ASP.NET MVC 1.0
 

Último

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 

Último (20)

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 

サンプルから見るMapReduceコード

  • 2. Cloudera Avro   Sqoop   Desktop   Pig   Hive   HBase   Chukwa   Map Zoo HDFS   Reduce   Keeper   Core  
  • 3. Cloudera Avro   Sqoop   Desktop   Pig   Hive   HBase   Chukwa   Map Zoo HDFS   Reduce   Keeper   Core  
  • 4. •  MapReduce –  Mapper/Reducer • 
  • 5. MapReduce •  WordCount •  •  – Mapper/Reducer Job ⾏行行 – InputFormat/OutputFormat ⽅方 – HDFS(FileSystem) –  Writable ⽅方
  • 6. WordCount •  Hadoop Hello World •  API (org.apache.hadoop.mapreduce) •  API
  • 7. Grep •  grep – grepJob/sortJob 2 ⾏行行 – JobConf/Mapper/Reducer ⽅方 – Mapper RegexMapper ⾏行行 <Text, Long> SequenceFileFormat – sortJob –  ⼒力力 – 
  • 8. Grep - •  JobConf •  Mapper •  Reducer
  • 9. o.a.hadoop.mapred.JobConf •  –  mapred-default.xml –  conf/mapred-site.xml – XML ⾝身 DOM – ⾃自 ⽬目 ⼿手 –  ⼦子 •  JobConf child = new JobConf( Conf, jar );
  • 10. mapred-site.xml <configuration> <!– --> <property> <key>mapred.job.tracker</key> <value>your-site:9001</value> </property> </configuration>
  • 11. o.a.hadoop.mapred.Mapper •  Mapper •  InputSplit Mapper •  MapTask/MapRunner •  map(KEY, VALUE, COLLECTOR, REPORTER) – KEY:Map VALUE:Map – COLLECTOR: – REPORTER: API •  MapReduceBase
  • 12. o.a.hadoop.mapred.MapTask •  Map •  initiazlize (Task Reducer ) –  ⽣生 –  (o.a.h.mapred.TaskStatus.State) •  RUNNING, SUCCEEDED, FAILED, UNASSIGNED, KILLED, COMMIT_PENDING, FAILED_UNCLEAN, KILLED_UNCLEAN – OutputCommiter ⽣生 •  Task ⼒力力 ⾏行行 •  ⼒力力 – mapred.work.output.dir
  • 13. o.a.h.mapred.MapTask cont •  run runOldMapper •  JobClient InputSplit •  RecordReader
  • 14. o.a.h.mapred.MapTask cont2 •  Reduce –  spill (* ) •  $mapred.local.dir/taskTracker/jobcache/$ {taskid}/output/spill${spillNumber}.out – Reducer ⼒力力 •  Combiner min.num.spills.for.combine combiner –  RecordWriter ⼒力力 •  MapRunner
  • 15. o.a.h.mapred.MapRunner •  MapRunnable – mapred.map.runner.class – Hadoop PipeMapRunner –  Map MultiThreadedMapRunner
  • 16. o.a.h.mapred.MapRunner cont •  run(RecordReader, OutputCollector, Reporter) – RecordReader: InputFormat Split Reader(InputFormat/RecordReader ) •  – RecordReader –  ⾝身 – 
  • 17. MapTask MapRunner Mapper Record Output Reader Collector Input Split⽣生   Spill & run createKey() SpillThread createValue()   next(key, value) EOF   Map(key, value, Spill outputCollector, reporter)
  • 19. •  Mapper – JobConf – Mapper/MapRunner/MapTask •  – Reducer •  Reducer ⾏行行 •  Reducer ⾏行行 – InputFormat/RecordReader
  • 20. o.a.h.mapred.Reducer •  Reducer •  InputSplit Mapper •  ReduceTask/ReduceRunner •  reduce(KEY, Iterator<VALUE>, COLLECTOR, REPORTER) – KEY: Iterator<VALUE>: – COLLECTOR: – REPORTER: API •  MapReduceBase
  • 21. o.a.h.mapred.ReduceTask •  SHUFFLE •  ReduceTask.ReduceCopier – fetchOutputs( Merger.MergeQueue) •  Map x mapred.reduce.parallel.copies – MapOutputCopier •  Map ⾏行行 LocalFSMerger •  ⾏行行 InMemFSMergeThread •  GetMapEventsThread – Map – < , MapOutputLocation(taskId, host, httpUrl)> •  ⼀一 TaskTracker ⼯工
  • 22. o.a.h.mapred.ReduceTask •  run(RecordReader, OutputCollector, Reporter) •  SORT – Memory, disk ⽣生 •  RowKeyValueItetator – Reducer ⽣生 – RecordWriter ⽣生 – ReduceValuesIterator ⾏行行