8. Introduction to the Vim Text Editor
• Run `vi filename` to enter normal (command) mode
• Press i to enter insert mode and start editing text
• Press [ESC] to return to normal mode
• Press : to enter command-line mode, then save the file (w) and quit (q) the vi environment
http://linux.vbird.org/linux_basic/0310vi.php#vi
10. Hadoop System Architecture
• Master/slave architecture
– NameNode, DataNode
– ResourceManager, NodeManager
[Diagram: the master node runs NN (NameNode) and RM (ResourceManager); slave1 and slave2 each run DN (DataNode) and NM (NodeManager)]
17. HDFS Command-Line Operations
• Basic commands
– hadoop fs -ls file_in_hdfs
– hadoop fs -lsr dir_in_hdfs
– hadoop fs -rm file_in_hdfs
– hadoop fs -rmr dir_in_hdfs
– hadoop fs -mkdir dir_in_hdfs
– hadoop fs -cat file_in_hdfs
– hadoop fs -get file_in_hdfs file_in_local
– hadoop fs -put file_in_local file_in_hdfs
20. Run It Automatically, Without Writing a Program
• Just define a config file
# vim example
agent.sources = source1
agent.channels = channel1
agent.sinks = sink1
agent.sources.source1.type = spooldir
agent.sources.source1.channels = channel1
agent.sources.source1.spoolDir = /home/hadoop/flumedata
agent.sources.source1.fileHeader = false
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.channel = channel1
agent.sinks.sink1.hdfs.path = hdfs://master:9000/user/hadoop
agent.sinks.sink1.hdfs.fileType = DataStream
agent.sinks.sink1.hdfs.writeFormat = TEXT
agent.sinks.sink1.hdfs.rollSize = 0
agent.sinks.sink1.hdfs.rollCount = 0
agent.sinks.sink1.hdfs.idleTimeout = 0
agent.channels.channel1.type = memory
agent.channels.channel1.capacity = 100
# cd ~/flume/conf
# flume-ng agent -n agent -c . -f ./example
…
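As a rough mental model of the topology this config defines — a spooling-directory source feeding a bounded in-memory channel that drains into a single sink — the data flow can be imitated in a few lines of Python. This is a sketch only (Flume itself is Java and adds transactions, batching, and rolling); the function and path names here are invented for illustration:

```python
import os
import queue

def spool_to_sink(spool_dir, sink_path, capacity=100):
    """Toy imitation of the agent above:
    spooldir source -> memory channel (bounded queue) -> file sink."""
    channel = queue.Queue(maxsize=capacity)
    # Source: read every file in the spool directory, one event per line.
    for name in sorted(os.listdir(spool_dir)):
        with open(os.path.join(spool_dir, name)) as f:
            for line in f:
                channel.put(line.rstrip("\n"))
    # Sink: drain the channel into a single output file.
    with open(sink_path, "a") as sink:
        while not channel.empty():
            sink.write(channel.get() + "\n")
```

The bounded queue plays the role of `agent.channels.channel1.capacity = 100`: if sources outrun the sink, the channel fills up and backpressure results.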
22. 1. The RM performs global resource allocation
2. Each NM periodically reports its current resource usage
3. Each job has its own AppMaster that controls that job
4. Resource management is thus separated from job control
5. YARN is a general-purpose resource management system,
so multiple frameworks can run on top of YARN
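The division of labor above can be sketched as a toy scheduler. This is purely illustrative (real YARN negotiates containers asynchronously over RPC, with queues and locality); the class and method names are invented:

```python
class ResourceManager:
    """Toy global scheduler: tracks free memory per NodeManager
    and grants container requests from per-job AppMasters."""
    def __init__(self, nodes):
        self.free = dict(nodes)  # node name -> free memory (MB)

    def report(self, node, free_mb):
        # NodeManagers periodically report their current free resources.
        self.free[node] = free_mb

    def allocate(self, mem_mb):
        # Grant the request on the first node with enough free memory.
        for node, free in self.free.items():
            if free >= mem_mb:
                self.free[node] -= mem_mb
                return node
        return None  # cluster is full; the AppMaster must wait

class AppMaster:
    """One per job: asks the RM for containers for its own tasks."""
    def __init__(self, rm):
        self.rm = rm
        self.containers = []

    def run_tasks(self, n, mem_mb):
        for _ in range(n):
            node = self.rm.allocate(mem_mb)
            if node is not None:
                self.containers.append(node)
        return self.containers
```

Because the RM only hands out resources and each AppMaster drives its own job, a MapReduce AppMaster and, say, a Spark AppMaster can share the same cluster — which is point 5 above.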
24. Step by Step
# vim wordcount.data
aaa
bbb
ccc
ddd
bbb
ccc
ddd
eee
# hadoop fs -mkdir mr.wordcount
# hadoop fs -put wordcount.data mr.wordcount
# hadoop fs -ls mr.wordcount
# hadoop jar MR-sample.jar org.nchc.train.mr.wordcount.WordCount mr.wordcount/wordcount.data output
...omit...
File Input Format Counters
  Bytes Read=32
File Output Format Counters
  Bytes Written=30
# hadoop fs -cat output/part-r-00000
aaa 1
bbb 2
ccc 2
ddd 2
eee 1
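The same word count can be traced in plain Python to see what the MapReduce job is doing: map emits (word, 1) pairs, the shuffle groups pairs by key, and reduce sums each group. This is a sketch of the programming model, not the Java code inside MR-sample.jar:

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

data = ["aaa", "bbb", "ccc", "ddd", "bbb", "ccc", "ddd", "eee"]
counts = reduce_phase(shuffle(map_phase(data)))
```

Running this reproduces the contents of output/part-r-00000 above.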
26. Hands-On: Clustering the Data

ID   Chinese (國文)   Math (數學)
1         0              10
2        10               0
3        10              10
4        20              10
5        10              20
6        20              20
7        50              60
8        60              50
9        60              60
10       90              90
28. Step by Step
# vi clustering.data
0 10
10 0
10 10
20 10
10 20
20 20
50 60
60 50
60 60
90 90
# hadoop fs -mkdir testdata
# hadoop fs -put clustering.data testdata
# hadoop fs -ls -R testdata
-rw-r--r--  3 root hdfs  288374 2014-02-05 21:53 testdata/clustering.data
# mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job -t1 3 -t2 2 -i testdata -o output
...omit...
14/09/08 01:31:07 INFO clustering.ClusterDumper: Wrote 3 clusters
14/09/08 01:31:07 INFO driver.MahoutDriver: Program took 104405 ms (Minutes: 1.7400833333333334)
# mahout clusterdump --input output/clusters-0-final --pointsDir output/clusteredPoints
C-0{n=1 c=[9.000, 9.000] r=[]}
  Weight : [props - optional]:  Point:
  1.0: [9.000, 9.000]
C-1{n=2 c=[5.833, 5.583] r=[0.167, 0.083]}
  Weight : [props - optional]:  Point:
  1.0: [5.000, 6.000]
  1.0: [6.000, 5.000]
  1.0: [6.000, 6.000]
C-2{n=4 c=[1.313, 1.333] r=[0.345, 0.527]}
  Weight : [props - optional]:  Point:
  1.0: [1:1.000]
  1.0: [0:1.000]
  1.0: [1.000, 1.000]
  1.0: [2.000, 1.000]
  1.0: [1.000, 2.000]
  1.0: [2.000, 2.000]
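Canopy clustering itself is simple enough to sketch in Python. The toy version below runs a single greedy canopy pass on the ten raw points, with thresholds scaled up to match their magnitude; Mahout's job works on its own vector encoding and refines the canopies afterwards, so its three reported clusters will not match this pass one-for-one:

```python
import math

def canopy(points, t1, t2):
    """Greedy canopy clustering: the first unprocessed point becomes a
    canopy center; all points within t1 join that canopy (a point may
    join several), and points within t2 of the center are removed from
    further consideration as future centers."""
    assert t1 > t2
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining[0]
        members = [p for p in points if math.dist(center, p) < t1]
        canopies.append((center, members))
        remaining = [p for p in remaining if math.dist(center, p) >= t2]
    return canopies

data = [(0, 10), (10, 0), (10, 10), (20, 10), (10, 20),
        (20, 20), (50, 60), (60, 50), (60, 60), (90, 90)]
canopies = canopy(data, t1=30, t2=20)
```

The cheap single pass splits the data into coarse groups; in the full Mahout pipeline these canopies seed a more expensive clustering step.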
34. Recommendation System Principles

Known ratings (User 3 and User 4 have not rated book-c):

          book-a  book-b  book-c
User 1      5       4       5
User 2      4       5       4
User 3      5       4
User 4      1       2
User 5      2       1       1

Predicted ratings for the missing cells:

          book-a  book-b  book-c
User 1      5       4       5
User 2      4       5       4
User 3      5       4      4~5
User 4      1       2      1~2
User 5      2       1       1
35. Step by Step
# vi recom.data
1,1,5
1,2,4
1,3,5
2,1,4
2,2,5
2,3,4
3,1,5
3,2,4
4,1,1
4,2,2
5,1,2
5,2,1
5,3,1
# hadoop fs -mkdir testdata
# hadoop fs -put recom.data testdata
# hadoop fs -ls -R testdata
-rw-r--r--  3 root hdfs  288374 2014-02-05 21:53 testdata/recom.data
# mahout recommenditembased -s SIMILARITY_EUCLIDEAN_DISTANCE -i testdata -o output
...omit…
File Input Format Counters
  Bytes Read=287
File Output Format Counters
  Bytes Written=32
14/09/04 05:46:56 INFO driver.MahoutDriver: Program took 434965 ms (Minutes: 7.249416666666667)
# hadoop fs -cat output/part-r-00000
3  [3:4.4787264]
4  [3:1.5212735]
36. Analysis Results

          book-a  book-b  book-c
User 1      5       4       5
User 2      4       5       4
User 3      5       4      4~5
User 4      1       2      1~2
User 5      2       1       1

# hadoop fs -cat output/part-r-00000
3  [3:4.4787264]
4  [3:1.5212735]

1. We predict that User 4 will not like book-c, so we do not recommend book-c to User 4
2. We predict that User 3 will like book-c, so we recommend book-c to User 3
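The predictions can be reproduced in miniature. The sketch below implements plain item-based prediction with a 1/(1 + distance) Euclidean similarity; this is the general idea behind `recommenditembased` but not Mahout's exact formula, so it lands in the predicted ranges (4~5 and 1~2) rather than on Mahout's exact scores:

```python
import math

# recom.data as (user, item, rating) triples.
ratings = [(1, 1, 5), (1, 2, 4), (1, 3, 5), (2, 1, 4), (2, 2, 5),
           (2, 3, 4), (3, 1, 5), (3, 2, 4), (4, 1, 1), (4, 2, 2),
           (5, 1, 2), (5, 2, 1), (5, 3, 1)]

by_user = {}
for u, i, r in ratings:
    by_user.setdefault(u, {})[i] = r

def similarity(i, j):
    # Euclidean similarity over users who rated both items i and j.
    co = [(prefs[i], prefs[j]) for prefs in by_user.values()
          if i in prefs and j in prefs]
    dist = math.sqrt(sum((a - b) ** 2 for a, b in co))
    return 1.0 / (1.0 + dist)

def predict(user, item):
    # Similarity-weighted average of the user's own ratings.
    prefs = by_user[user]
    num = sum(similarity(item, j) * r for j, r in prefs.items())
    den = sum(similarity(item, j) for j in prefs)
    return num / den
```

Because User 3 rates books much like Users 1 and 2 (who love book-c), the prediction for (User 3, book-c) comes out high, while User 4's pattern matches User 5's and comes out low.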
37. Try It!

Users' ratings of books, book1 … book9 (blank cells mean the user has not rated that book; the original slide's column alignment is not fully recoverable, so each row lists that user's ratings in book order):

User1: 3 2 1 5 5 1 3 1
User2: 2 3 1 3 5 4 3
User3: 1 2 3 3 2 1
User4: 2 1 2 1 1 2
User5: 3 3 1 3 2 2 3 3 2
User6: 1 3 2 2 1
User7: 4 4 1 5 1 3 3 4
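To run this exercise, the table must first be flattened into the same userID,itemID,rating CSV format that recom.data used, skipping the unrated cells. A small sketch (the two-user dictionary here is hypothetical sample data, not the table above):

```python
def to_mahout_csv(ratings):
    """Flatten {user_id: {book_id: rating}} into 'user,item,rating'
    lines, in the format recommenditembased expects; absent (unrated)
    cells simply produce no line."""
    lines = []
    for user in sorted(ratings):
        for book in sorted(ratings[user]):
            lines.append(f"{user},{book},{ratings[user][book]}")
    return "\n".join(lines)

# Hypothetical two-user excerpt, for illustration only.
sample = {1: {1: 3, 2: 2, 4: 5}, 2: {1: 2, 3: 1}}
csv_text = to_mahout_csv(sample)
```

Write the result to a file, `hadoop fs -put` it into testdata, and rerun the mahout command from slide 35.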
38. Summary
• Virtual machine skill +1
• Linux skill +1
• HDFS skill +1
• Flume skill +1
• MapReduce skill +1
• Mahout clustering skill +1
• Mahout recommendation skill +1