Hadoop Streaming
IT Merchant Development Team
Tae Young Lee (이태영)
2014.11.11
5th study session
Developing MapReduce in Python
• MR jobs can be developed in whatever language the developer or team already knows
• Libraries provided by that language can be used
• Data is exchanged over standard I/O, so performance is lower than Java MR
• But what if development productivity is the guarantee you care about?
Hadoop Streaming can use any language that supports standard input/output (stdio).
※ Two components must be defined:
1. An executable Mapper file implementing the Map function
2. An executable Reducer file implementing the Reduce function
Hadoop Streaming
MapReduce
1. MAP's role: read the input data from standard input
2. MAP's role: write Key, Value pairs to standard output
3. REDUCER's role: read MAP's <Key, Value> output from standard input
4. REDUCER's role: write Key, Value pairs to standard output
Flow (diagram in the original deck): data input (file read, pipe, streaming, ...) → Python Map processing → PIPE → Python Reduce processing → MR result output
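Because both stages speak plain standard I/O, the whole flow can be rehearsed locally with a shell pipe before going anywhere near the cluster. A minimal smoke test (the input.txt name is made up here; the same pipeline pattern appears for real in the reducer example later):
[hadoop@big01 ~]$ cat input.txt | ./mapper.py | sort -k 1 | ./reducer.py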
Installing Python
Standard-I/O data Mapper example
1. Download 2.7.8 from the python site and unpack it
2. In the account's home directory, create a python symlink pointing at the Python directory
3. ./configure
4. make
Typing the python command is tedious too, so abbreviate it to py
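A plausible rendering of step 2 and the py shortcut as commands; the exact paths are assumptions inferred from the shebang used later (/home/hadoop/python/py):
[hadoop@big01 ~]$ ln -s ~/Python-2.7.8 ~/python
[hadoop@big01 ~]$ cd ~/python && ln -s python py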
print 'Hello BC'
hello.py
Typing py every time is tedious.
Make the Python script itself an executable file!
#!/home/hadoop/python/py
print 'Hello BC'
hello.py
[hadoop@big01 ~]$ chmod 755 hello.py
[hadoop@big01 ~]$ ./hello.py
Hello BC
[hadoop@big01 ~]$ py hello.py
Hello BC
Running Python
Running the Hello BC example
#! (shebang, "sha-bang")
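As an aside (not in the deck), a more portable shebang resolves the interpreter through env instead of hard-coding /home/hadoop/python/py:
#!/usr/bin/env python
print 'Hello BC'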
#!/home/hadoop/python/py
import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print '{0}\t{1}'.format(word, 1)
[hadoop@big01 python]$ echo "bc bc bc card bc card it" | ./mapper.py
bc	1
bc	1
bc	1
card	1
bc	1
card	1
it	1
mapper.py
Python MAP
Standard-I/O Mapper run example
[hadoop@big01 python]$ echo "bc bc bc card bc card it" | ./mapper.py
bc	1
bc	1
bc	1
card	1
bc	1
card	1
it	1
[hadoop@big01 python]$ echo "bc bc bc card bc card it" | ./mapper.py | sort -k 1
bc	1
bc	1
bc	1
bc	1
card	1
card	1
it	1
Sorted on the first field; locally, sort stands in for the shuffle/sort phase Hadoop runs between map and reduce, grouping identical keys together for the reducer.
Python MAP
Sorting the Mapper output
#!/home/hadoop/python/py
import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    count = int(count)
    if current_word == word:
        current_count += count    # same as the current word: accumulate the count
    else:
        if current_word:          # if the current word is not None
            print '{0}\t{1}'.format(current_word, current_count)    # print the M/R result
        current_count = count
        current_word = word       # set the new current word

if current_word == word:          # handles the last line
    print '{0}\t{1}'.format(current_word, current_count)
reducer.py
Python REDUCE
Standard-I/O Reducer example
[hadoop@big01 python]$ echo "bc bc bc card bc card it" | ./mapper.py | sort -k 1 | ./reducer.py
bc	4
card	2
it	1
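For comparison, a hypothetical alternative reducer (not from the deck) built on itertools.groupby; it leans on the same guarantee the hand-rolled version does, namely that input arrives sorted by key:
#!/home/hadoop/python/py
# Hypothetical alternative to the deck's reducer.py, using itertools.groupby.
# Input must already be sorted by key (Hadoop's shuffle/sort guarantees this).
import sys
from itertools import groupby

def parse(stream):
    # Turn "word\tcount" lines into (word, int_count) pairs.
    for line in stream:
        word, count = line.rstrip('\n').split('\t', 1)
        yield word, int(count)

for word, pairs in groupby(parse(sys.stdin), key=lambda kv: kv[0]):
    print '{0}\t{1}'.format(word, sum(count for _, count in pairs))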
Python ♥ Hadoop
Hadoop Streaming
Conditions
1. In Hadoop Streaming, the mapper/reducer must be given as directly executable commands.
[OK] hadoop jar hadoop-streaming*.jar -mapper map.py -reducer reduce.py ...
[NO] hadoop jar hadoop-streaming*.jar -mapper python map.py -reducer python reduce.py ...
2. Set the directory PATH so the Python scripts are reachable from anywhere.
If you don't, the job dies like this (see the sketch after the trace for the usual fix):
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:222)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "mapper.py": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:209)
... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:187)
at java.lang.ProcessImpl.start(ProcessImpl.java:134)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
... 24 more
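In practice both conditions boil down to: give each script a shebang, mark it executable, and ship it to the task nodes (the -file shipping is shown for real in the WordCount run below). A sketch in the same vein as the deck's commands:
[hadoop@big01 ~]$ chmod 755 mapper.py reducer.py
[hadoop@big01 ~]$ head -1 mapper.py
#!/home/hadoop/python/py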
Python ♥ Hadoop
Hadoop Streaming
hadoop jar hadoop-streaming-2.5.1.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /usr/bin/wc
$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-2.5.1.jar
Location of the Hadoop Streaming jar in Hadoop 2.x
hadoop command [genericOptions] [streamingOptions]
Python ♥ Hadoop
Hadoop Streaming command options (streamingOptions)
Parameter | Optional/Required | Description
-input directoryname or filename | Required | Input location for mapper
-output directoryname | Required | Output location for reducer
-mapper executable or JavaClassName | Required | Mapper executable
-reducer executable or JavaClassName | Required | Reducer executable
-file filename | Optional | Make the mapper, reducer, or combiner executable available locally on the compute nodes
-inputformat JavaClassName | Optional | Class you supply should return key/value pairs of Text class. If not specified, TextInputFormat is used as the default
-outputformat JavaClassName | Optional | Class you supply should take key/value pairs of Text class. If not specified, TextOutputFormat is used as the default
-partitioner JavaClassName | Optional | Class that determines which reduce a key is sent to
-combiner streamingCommand or JavaClassName | Optional | Combiner executable for map output
-cmdenv name=value | Optional | Pass environment variable to streaming commands
-inputreader | Optional | For backwards-compatibility: specifies a record reader class (instead of an input format class)
-verbose | Optional | Verbose output
-lazyOutput | Optional | Create output lazily. For example, if the output format is based on FileOutputFormat, the output file is created only on the first call to output.collect (or Context.write)
-numReduceTasks | Optional | Specify the number of reducers
-mapdebug | Optional | Script to call when map task fails
-reducedebug | Optional | Script to call when reduce task fails
hadoop command [genericOptions] [streamingOptions]
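Since word-count aggregation is associative, the reducer can in principle double as the -combiner to shrink map output before the shuffle. A hypothetical variation on the WordCount job shown below (the wc_alice_comb output directory is made up):
hadoop jar hadoop-streaming-2.5.1.jar \
    -input alice -output wc_alice_comb \
    -mapper mapper.py -reducer reducer.py -combiner reducer.py \
    -file mapper.py -file reducer.py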
Python ♥ Hadoop
Hadoop Streaming generic options
Parameter | Optional/Required | Description
-conf configuration_file | Optional | Specify an application configuration file
-D property=value | Optional | Use value for given property
-fs host:port or local | Optional | Specify a namenode
-files | Optional | Specify comma-separated files to be copied to the Map/Reduce cluster
-libjars | Optional | Specify comma-separated jar files to include in the classpath
-archives | Optional | Specify comma-separated archives to be unarchived on the compute machines
hadoop command [genericOptions] [streamingOptions]
Usage example
hadoop jar hadoop-streaming-2.5.1.jar \
    -D mapreduce.job.reduces=2 \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /usr/bin/wc
Python ♥ Hadoop
Running Hadoop Streaming: WordCount
[hadoop@big01 ~]$ hadoop jar hadoop-streaming-2.5.1.jar \
    -input alice -output wc_alice \
    -mapper mapper.py -reducer reducer.py \
    -file mapper.py -file reducer.py
packageJobJar: [mapper.py, reducer.py, /tmp/hadoop-hadoop/hadoop-unjar2252553335408523254/] [] /tmp/streamjob911479792088347698.jar tmpDir=null
14/11/11 23:51:41 INFO client.RMProxy: Connecting to ResourceManager at big01/192.168.56.101:8040
14/11/11 23:51:41 INFO client.RMProxy: Connecting to ResourceManager at big01/192.168.56.101:8040
14/11/11 23:51:43 INFO mapred.FileInputFormat: Total input paths to process : 1
14/11/11 23:51:43 INFO mapreduce.JobSubmitter: number of splits:2
14/11/11 23:51:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1416242552451_0009
14/11/11 23:51:44 INFO impl.YarnClientImpl: Submitted application application_1416242552451_0009
14/11/11 23:51:44 INFO mapreduce.Job: The url to track the job: http://big01:8088/proxy/application_1416242552451_0009/
14/11/11 23:51:44 INFO mapreduce.Job: Running job: job_1416242552451_0009
14/11/11 23:51:53 INFO mapreduce.Job: Job job_1416242552451_0009 running in uber mode : false
14/11/11 23:51:53 INFO mapreduce.Job: map 0% reduce 0%
14/11/11 23:52:05 INFO mapreduce.Job: map 100% reduce 0%
14/11/11 23:52:13 INFO mapreduce.Job: map 100% reduce 100%
14/11/11 23:52:13 INFO mapreduce.Job: Job job_1416242552451_0009 completed successfully
14/11/11 23:52:13 INFO mapreduce.Job: Counters: 49
File System Counters
…
Python ♥ Hadoop
Checking the Hadoop Streaming results (this slide was a screenshot of the job output in the original deck)
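The job writes its results to HDFS under the -output directory; one standard way to peek at them (hdfs dfs -cat is stock Hadoop 2.x, the wc_alice path comes from the job above):
[hadoop@big01 ~]$ hdfs dfs -cat wc_alice/part-00000 | tail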
Python ♥ Hadoop
Checking the Hadoop Streaming results
Opening part-00000 shows:
…
you'd	8
you'll	4
you're	15
you've	5
you,	25
you,'	6
you--all	1
you--are	1
you.	1
you.'	1
you:	1
you?	2
you?'	7
young	5
your	62
yours	1
yours."'	1
yourself	5
yourself!'	1
yourself,	1
yourself,'	1
yourself.'	2
youth,	3
youth,'	3
zigzag,	1
Python ♥ Hadoop
Hadoop Streaming example: refining WordCount
#!/home/hadoop/python/py
import sys
import re

for line in sys.stdin:
    line = line.strip()
    line = re.sub('[=.#/?:$\'!,"}]', '', line)   # regular expression: strip special characters
    words = line.split()
    for word in words:
        print '{0}\t{1}'.format(word, 1)
mapper.py, revised
[hadoop@big01 ~]$ hadoop jar hadoop-streaming-2.5.1.jar \
    -input alice -output wc_alice2 \
    -mapper mapper.py -reducer reducer.py \
    -file mapper.py -file reducer.py
packageJobJar: [mapper.py, reducer.py, /tmp/hadoop-hadoop/hadoop-unjar2252553335408523254/] [] /tmp/streamjob911479792088347698.jar tmpDir=null
14/11/11 23:51:41 INFO client.RMProxy: Connecting to ResourceManager at big01/192.168.56.101:8040
14/11/11 23:51:41 INFO client.RMProxy: Connecting to ResourceManager at big01/192.168.56.101:8040
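The character class above only strips the punctuation it explicitly lists, which is why tokens like you--all and you) survive in the next slide's output. A hypothetical, more aggressive variant (not in the deck) trims leading/trailing punctuation and folds case instead:
#!/home/hadoop/python/py
# Hypothetical mapper variant: trim surrounding punctuation, lowercase the word.
import string
import sys

for line in sys.stdin:
    for word in line.strip().split():
        word = word.strip(string.punctuation).lower()
        if word:
            print '{0}\t{1}'.format(word, 1)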
Python ♥ Hadoop
Checking the Hadoop Streaming results
Opening wc_alice2's part-00000 shows:
…
ye;	1
year	2
years	2
yelled	1
yelp	1
yer	4
yesterday	3
yet	18
yet--Oh	1
yet--and	1
yet--its	1
you	357
you)	1
you--all	1
you--are	1
youd	8
youll	4
young	5
your	62
youre	15
yours	2
yourself	10
youth	6
youve	5
zigzag	1
The end.
