18. Amazon S3, Glacier
Amazon Kinesis
AWS IoT
Amazon DynamoDB
Amazon RDS Aurora
AWS Lambda
KCL Apps
Amazon EMR
Amazon Redshift
Amazon Machine Learning
Amazon QuickSight
AWS Data Pipeline
20. Amazon S3
Designed for 99.999999999% durability
HTTP/HTTPS
21. Amazon S3
[Diagram: Amazon S3 at the center, surrounded by other AWS services. Recoverable labels: Redshift, EC2, RDS, EBS, Storage Gateway, Elastic Transcoder, Glacier, Data Pipeline]
29. Amazon Kinesis
Data Sources (x3)
AWS Endpoint
Amazon Kinesis: Shard 1, Shard 2, ... Shard N
Availability Zone (x3)
App.1 [Aggregate & De-Duplicate]
App.2 [Sliding Window Analysis]
S3
Redshift
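The shards in the diagram can be made concrete with a small sketch. Kinesis maps each record's partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that integer. The Python below is only an illustration under stated assumptions: the helper names (`shard_ranges`, `hash_key`, `shard_for`) are hypothetical, and the even split of the hash space is an assumption, since real streams can have uneven ranges after shard splits and merges.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest is a 128-bit integer


def shard_ranges(n_shards):
    """Split the hash key space into n contiguous ranges (assumed even split)."""
    step = HASH_SPACE // n_shards
    ranges = []
    for i in range(n_shards):
        start = i * step
        # The last range absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == n_shards - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, ranges):
    """Return the 1-based shard number whose range contains the key's hash."""
    value = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= value <= end:
            return index + 1
    raise ValueError("hash value outside every shard range")


if __name__ == "__main__":
    ranges = shard_ranges(4)
    for key in ("device-1", "device-2", "device-3"):  # hypothetical keys
        print(key, "-> Shard", shard_for(key, ranges))
```

Because the mapping is deterministic, all records with the same partition key land on the same shard, which is what lets a consumer such as App.1 de-duplicate per key.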
30. Amazon Kinesis Firehose
Amazon S3, Amazon Redshift
42. • SQL
• RDB
• ETL
(RDBMS)
orderid  name    price
1        Book    100
2        Pen     50
…
n        Eraser  70
(Redshift)
orderid  name    price
1        Book    100
2        Pen     50
…
n        Eraser  70
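Amazon Redshift stores table data column by column, whereas a traditional RDBMS stores it row by row. The sketch below assumes the two tables above are meant to contrast those layouts. It is plain Python rather than anything Redshift-specific, and the third `orderid` value is a stand-in for the row the slide labels `n`.

```python
# Row-oriented layout: one record per order, all fields stored together.
# The third orderid (3) is an illustrative stand-in for the slide's "n".
rows = [
    {"orderid": 1, "name": "Book", "price": 100},
    {"orderid": 2, "name": "Pen", "price": 50},
    {"orderid": 3, "name": "Eraser", "price": 70},
]


def to_columns(records):
    """Pivot row records into a column-oriented layout: one list per column."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def total_price_row_store(records):
    """Aggregate over rows: every record is visited to pick out one field."""
    return sum(record["price"] for record in records)


def total_price_column_store(columns):
    """Aggregate over columns: only the 'price' column is read."""
    return sum(columns["price"])


if __name__ == "__main__":
    columns = to_columns(rows)
    print(columns["price"])
    print(total_price_row_store(rows), total_price_column_store(columns))
```

Both functions return the same total. The difference is how much data each has to touch, which is why analytic queries that aggregate a few columns over many rows tend to favor the columnar layout.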
49. Amazon Elastic MapReduce (EMR)
Hadoop/Spark
• Big Data
Amazon EMR
RI, Spot
51. S3
• HDFS / S3
• INPUT/OUTPUT
• S3 to S3
hadoop jar YOUR_JAR.jar \
  --src s3://YOUR_BUCKET/logs/ \
  --dest s3://YOUR_BUCKET/output/
• S3 to HDFS
hadoop jar YOUR_JAR.jar \
  --src s3://YOUR_BUCKET/logs/ \
  --dest hdfs:///output/
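The two invocations above differ only in the destination URI. As a minimal sketch of driving the same job from a script, the Python below builds the argument list for either case. Note that `--src` and `--dest` are options of the slide's own `YOUR_JAR.jar`, not of Hadoop itself. The function name and the scheme check are assumptions added for illustration.

```python
import shlex
from urllib.parse import urlparse

# Schemes that appear on the slide; anything else is rejected in this sketch.
ALLOWED_SCHEMES = {"s3", "hdfs"}


def hadoop_jar_command(jar, src, dest):
    """Build the argv list for `hadoop jar <jar> --src <src> --dest <dest>`."""
    for uri in (src, dest):
        if urlparse(uri).scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"unsupported URI scheme in {uri!r}")
    return ["hadoop", "jar", jar, "--src", src, "--dest", dest]


# The two cases from the slide: S3 to S3 and S3 to HDFS.
s3_to_s3 = hadoop_jar_command(
    "YOUR_JAR.jar", "s3://YOUR_BUCKET/logs/", "s3://YOUR_BUCKET/output/"
)
s3_to_hdfs = hadoop_jar_command(
    "YOUR_JAR.jar", "s3://YOUR_BUCKET/logs/", "hdfs:///output/"
)

if __name__ == "__main__":
    print(shlex.join(s3_to_s3))
    print(shlex.join(s3_to_hdfs))
```

On a cluster node the list could be handed to `subprocess.run(s3_to_hdfs, check=True)`. Passing a list rather than a shell string avoids quoting problems in bucket or path names.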
52. Amazon Elastic MapReduce
• Hadoop
• mapreduce, hive, pig, streaming
• hive SQL
(TRANSFORM, UDF, UDAF)
Amazon Redshift
• RDB
• SQL
• BI
53. SQL on Big Data
SQL Clients/BI Tools -> JDBC/ODBC -> Leader Node -> Compute Nodes
Nodes: 128GB RAM, 16TB disk, 16 cores
Interconnect: 10 GigE (HPC)
COPY / UNLOAD: S3 / EMR / DynamoDB / SSH
SQL Clients/BI Tools -> JDBC/ODBC -> Master Node -> Core/Task Nodes
READ / WRITE: EMRFS
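The COPY and UNLOAD labels in the diagram refer to Redshift's bulk load and bulk export statements. The sketch below only renders the SQL text for the S3 case, as a rough illustration. The table name, bucket paths, and IAM role ARN are hypothetical placeholders, and the string formatting is not safe for untrusted input.

```python
def copy_statement(table, s3_path, iam_role):
    """Render a Redshift COPY that loads CSV files from S3 into a table."""
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV;"
    )


def unload_statement(query, s3_prefix, iam_role):
    """Render a Redshift UNLOAD that writes a query result to S3.

    UNLOAD takes the query as a quoted string, so single quotes inside
    the query are doubled.
    """
    escaped = query.replace("'", "''")
    return f"UNLOAD ('{escaped}') TO '{s3_prefix}' IAM_ROLE '{iam_role}';"


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/YOUR_ROLE"  # placeholder ARN
    print(copy_statement("orders", "s3://YOUR_BUCKET/logs/", role))
    print(
        unload_statement(
            "SELECT * FROM orders WHERE name = 'Book'",
            "s3://YOUR_BUCKET/output/orders_",
            role,
        )
    )
```

The statements would be sent through any of the SQL clients in the diagram over JDBC/ODBC. The load and unload work itself is done in parallel by the compute nodes.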
54. SQL on EMR
Application: Hue, Zeppelin
Query Engine: Hive, Spark SQL, Presto
JDBC/ODBC, HiveMetastore
YARN: MapReduce, Tez, Spark
Storage: HDFS, EMRFS
SELECT COUNT(*)
55. [Architecture diagram. Recoverable labels: Amazon S3, Elastic MapReduce, Amazon Redshift (analytics), BI]
61. 4 +)+
Standard Edition:
• ) n 1 n ) r
• ( - 6 FC=79 n )( 6 fl
Enterprise Edition:
• F L OL O 9OT T i n
n i i58
• , n )0 n )
r
• ( 0 6 FC=79 n )( 6 fl
x n “
63. Amazon RDS
[Diagram: AZ a, AZ b]
64. Amazon Aurora
• RDBMS
• MySQL 5.6
• 64TB
71. Big Data on AWS
n
• -( n i n i i
n o
• n k
•
• 6TR 8L L 3
72. Data Science
Amazon Redshift
ETL processing
Amazon Kinesis
Amazon EMR
http://www.slideshare.net/AmazonWebServices/bdt306-how-hearst-publishing-manages-clickstream-analytics-with-aws