R belgium 20121116-awson-cloud-beamer
1. Getting started on Amazon cloud
Some concrete applications using Hadoop
About RBelgium
R on Amazon cloud
Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer)
2012
Jean-Baptiste Poullet (RBelgium Founder and Co-Organizer) R on Amazon cloud
Outline
1 Getting started on Amazon cloud
2 Some concrete applications using Hadoop
3 About RBelgium
Basics on AWS
Register for an AWS account with EC2 and S3 (http://aws.amazon.com/)
Account Number, Access Key ID, Secret Access Key, X.509 Certificate
S3, EC2, EMR, ...
Lost, or want more information?
http://aws.amazon.com/documentation/gettingstarted/
http://www.bucketexplorer.com/documentation/amazon-s3--what-is-my-aws-access-and-secret-key.html
http://www.yusufhm.info/content/adding-x509-certificate-aws-iam-user-api-command-line-tools-0
...
Why AWS?
Simple to use: just start up an instance from an AMI
Elastic: Auto-scaling groups (RAM, CPU) + Load balancing (I/O) + Elastic IPs
On demand: anytime, what you want (limited to 20 EC2 instances unless you request more); normal, spot, reserved and EBS-optimized instances (see http://aws.amazon.com/ec2/)
Which AMI(s)? (1/2)
Bioconductor on Amazon cloud: http://bioconductor.org/help/bioconductor-cloud-ami/
MPI cluster on Amazon:
Example
    library(Rmpi)
    mpi.spawn.Rslaves()    # start the R worker processes
    # apply the function to 1..universe size, spread over the workers
    mpi.parLapply(1:mpi.universe.size(), function(x) x + 1)
    mpi.close.Rslaves()    # shut the workers down
    mpi.quit()             # finalize MPI and quit R
Listing 1: "Rmpi" on EC2
Which AMI(s)? (2/2)
Parallel cluster on Amazon:
Example
    library(parallel)
    # one worker per EC2 instance, addressed by private IP
    cl <- makePSOCKcluster(c("10.68.155.30", "10.68.155.45", "10.68.155.65"))
    # run myfunc on every worker; further arguments are passed the same way
    clusterCall(cl, myfunc, arg1, arg2)
    stopCluster(cl)
Listing 2: "parallel" on EC2
Hadoop cluster on Amazon with RHadoop: https://github.com/RevolutionAnalytics/RHadoop/tree/master/rmr2/pkg/tools
Storm cluster on Amazon: https://github.com/nathanmarz/storm-deploy
SAP Hana (http://aws.amazon.com/sap/), Oracle R Enterprise (Hadoop for batch + NoSQL for real-time), etc.
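The "parallel" listing can be tried without any EC2 instances. The following is a minimal sketch, assuming two local worker processes stand in for the three private IP addresses; add_offset is a hypothetical stand-in for myfunc and is not part of the slides.

```r
library(parallel)

# Assumption: two local worker processes replace the EC2 hosts.
cl <- makePSOCKcluster(2)

# Hypothetical stand-in for myfunc(arg1, arg2): add an offset to a value.
add_offset <- function(x, offset) x + offset

# clusterCall runs the same call once on every worker.
same_on_all <- clusterCall(cl, add_offset, 1, 10)

# parLapply splits the input across the workers and keeps the input order.
squares <- parLapply(cl, 1:4, function(x) x^2)

stopCluster(cl)
```

Replacing the worker count with a character vector of host names gives the multi-machine form shown in Listing 2, provided the master can reach the workers over SSH.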
Using rmr2 in Hadoop framework (1/4)
Toy case
Xβ = y (least-squares solution via the normal equations)
solve(t(X) %*% X, t(X) %*% y)
Example
    library(rmr2)
    # X is written to the distributed file system; y stays in local memory
    X = to.dfs(matrix(rnorm(2000), ncol = 10))
    y = as.matrix(rnorm(200))
Listing 6: initializing variables
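Using rmr2 in Hadoop framework (2/4)
Example
    # reducerFunction is not defined on the slides; a reducer that sums the
    # partial products emitted by the mappers is assumed here.
    reducerFunction = function(k, v) keyval(1, list(Reduce("+", v)))
    tXX =
      values(
        from.dfs(
          mapreduce(
            input = X,
            map = function(k, Xi) keyval(1, list(t(Xi) %*% Xi)),
            reduce = reducerFunction,
            combine = TRUE)))[[1]]
Listing 7: "rmr2" matrix multiplication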
Using rmr2 in Hadoop framework (3/4)
Example
    # y is used whole inside the map function, so t(Xi) %*% y is only
    # conformable when X reaches the mapper as a single chunk.
    tXy =
      values(
        from.dfs(
          mapreduce(
            input = X,
            map = function(k, Xi)
              keyval(1, list(t(Xi) %*% y)),
            combine = TRUE)))[[1]]
    solve(tXX, tXy)
Listing 8: "rmr2" solving
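The map/reduce split in Listings 7 and 8 rests on the fact that t(X) %*% X equals the sum of t(Xi) %*% Xi over row blocks Xi of X, and likewise for t(X) %*% y when y is cut into the same row blocks. A minimal base-R sketch of that idea follows, without Hadoop or rmr2; the four equal chunks are an assumption standing in for HDFS splits.

```r
set.seed(1)
X <- matrix(rnorm(2000), ncol = 10)   # same shape as in Listing 6
y <- as.matrix(rnorm(200))

# Assumption: 4 row chunks of 50 rows each stand in for HDFS splits.
chunks <- split(seq_len(nrow(X)), rep(1:4, each = 50))

# "Map": one partial product per chunk; y is subset with the same rows.
partial_XtX <- lapply(chunks, function(i) {
  Xi <- X[i, , drop = FALSE]
  t(Xi) %*% Xi
})
partial_Xty <- lapply(chunks, function(i) {
  t(X[i, , drop = FALSE]) %*% y[i, , drop = FALSE]
})

# "Reduce": sum the partial products.
tXX <- Reduce(`+`, partial_XtX)
tXy <- Reduce(`+`, partial_Xty)

beta_chunked <- solve(tXX, tXy)
beta_direct  <- solve(t(X) %*% X, t(X) %*% y)
```

Both solutions agree up to floating-point error, which is why only the small 10 x 10 and 10 x 1 products need to travel back to the master.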
How to debug (4/4)
Debugging
rmr.str(varName)
R on EMR with segue package
Example
    library(segue)
    setCredentials("accessKey", "secretAccessKey")
    myCluster <- createCluster(numInstances = 1,
                               masterInstanceType = "m1.small",
                               slaveInstanceType = "m1.small",
                               location = "us-east-1a")
    # apply myfunc to each element of dataList on the EMR cluster
    ResultList <- emrlapply(myCluster, dataList, myfunc)
    stopCluster(myCluster)
Listing 9: R on EMR with "segue"
R on EMR using the API command (1/3)
Upload the numberList.txt file (integers from 1 to 100, one integer per line) and the R scripts "mapper.r" and "reducer.r" to your S3 bucket.
Run the following command in bash:
Example
    ./elastic-mapreduce --create --stream \
      --input s3://yourbucket/numberList.txt \
      --mapper s3://yourbucket/mapper.r \
      --reducer s3://yourbucket/reducer.r \
      --output s3://emroutr1vv/myresults \
      --name EMRexampleR1 --num-instances 1
Listing 10: Running R on EMR
R on EMR using the API command (2/3)
Example
    #! /usr/bin/env Rscript
    # strip leading and trailing spaces
    trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
    con <- file("stdin", open = "r")
    # read stdin one line at a time and emit "<number><tab>" per line
    while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
      line <- trimWhiteSpace(line)
      cat(as.numeric(line), "\t", "\n", sep = "")
    }
    close(con)
Listing 11: Running simple R scripts on EMR - mapper script
R on EMR using the API command (3/3)
Example
    #! /usr/bin/env Rscript
    # strip leading and trailing spaces
    trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)
    con <- file("stdin", open = "r")
    x <- c()
    # collect every number emitted by the mapper
    while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
      x <- c(x, as.numeric(trimWhiteSpace(line)))
    }
    close(con)
    cat(mean(x))
Listing 12: Running simple R scripts on EMR - reducer script
How to debug (4/4)
Debugging
First debug your R code locally from the command line:
    cat input.txt | R CMD BATCH --slave --no-timing mapper.r out.txt;
    cat out.txt | R CMD BATCH --slave --no-timing reducer.r 2>&1
Listing 13: Debugging R code before using EMR
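The same dry run can be sketched inside a single R session, which makes the intermediate data easy to inspect. This is only an illustration: run_mapper and run_reducer are hypothetical helpers that mirror the logic of mapper.r and reducer.r as functions over character vectors instead of stdin.

```r
# Same trimming helper as in the mapper and reducer scripts.
trimWhiteSpace <- function(line) gsub("(^ +)|( +$)", "", line)

# Hypothetical helper: what mapper.r writes, one "<number><tab>" per input line.
run_mapper <- function(lines) {
  paste0(as.numeric(trimWhiteSpace(lines)), "\t")
}

# Hypothetical helper: what reducer.r computes from the mapper output.
run_reducer <- function(lines) {
  mean(as.numeric(sub("\t$", "", lines)))
}

input  <- as.character(1:100)   # contents of numberList.txt
mapped <- run_mapper(input)
result <- run_reducer(mapped)
print(result)
```

For the integers 1 to 100 the reducer should report their mean, 50.5, so any other value points to a parsing problem in one of the two steps.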
Tips with EMR
Be careful with s3 and s3n: use one or the other, but not both. For more information about the differences between s3 and s3n, see http://stackoverflow.com/questions/10569455/difference-between-amazon-s3-and-s3n-in-hadoop (accessed on Nov 6, 2012).
The first line of the script must correctly name the interpreter (such as #! /usr/bin/env Rscript for R or #!/usr/bin/python for Python). This is not necessary for a file that is only called by another one (e.g. when an R script calls an R function from another file, the function file does not need to start with #! /usr/bin/env Rscript).
The output directory must NOT exist before you launch your EMR job; otherwise the job will always FAIL. Use s3://yourProjects/project1 instead of s3://project1.
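Two of these tips can be checked before a job is submitted. The sketch below is an illustration only: has_rscript_shebang and same_s3_scheme are hypothetical helpers, not part of any AWS tool, and the temporary files stand in for real mapper and reducer scripts.

```r
# Hypothetical helper: TRUE if the first line of a script is an Rscript shebang.
has_rscript_shebang <- function(path) {
  first <- readLines(path, n = 1, warn = FALSE)
  length(first) == 1 &&
    grepl("^#! ?/usr/bin/env Rscript[[:space:]]*$", first)
}

# Hypothetical helper: TRUE if every URI uses the same scheme (all s3 or all s3n).
same_s3_scheme <- function(uris) {
  schemes <- sub("://.*$", "", uris)
  all(schemes %in% c("s3", "s3n")) && length(unique(schemes)) == 1
}

# Temporary files standing in for a correct and an incorrect script.
good <- tempfile(fileext = ".r")
writeLines(c("#! /usr/bin/env Rscript", "cat(1)"), good)
bad <- tempfile(fileext = ".r")
writeLines("cat(1)", bad)

shebang_ok  <- has_rscript_shebang(good)
shebang_bad <- has_rscript_shebang(bad)
mixed <- same_s3_scheme(c("s3://yourbucket/mapper.r",
                          "s3n://yourbucket/reducer.r"))
```

Here shebang_ok is TRUE, shebang_bad is FALSE, and mixed is FALSE because the two URIs combine s3 and s3n.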
Projects in RBelgium
http://www.heritagehealthprize.com/c/hhp
Text mining using real "text" data extracted from the database systems of a project partner
RBelgium members (1/3)
RBelgium members (2/3)
Example
    mygroup <- "RBelgium"
    # libraries for communicating with the meetup API
    library(RJSONIO)
    library(RCurl)
    # library for plotting
    library(ggplot2)
    # get member data from meetup.com (mykey must hold your Meetup API key)
    domain.url <- paste("https://api.meetup.com/2/members?key=", mykey,
                        "&sign=true&group_urlname=", mygroup, sep = "")
    domain.get <- getURL(domain.url)
    domain.data <- fromJSON(domain.get)
    # displaying names
    print(unlist(lapply(domain.data$results, function(x) x$name)))
RBelgium members (3/3)
Example
    # plotting graph
    joins <- unlist(lapply(domain.data$results, function(x) x$joined))
    orderedJoins <- joins[order(joins)]
    # join times are in milliseconds since 1970-01-01
    lab <- as.POSIXct(orderedJoins / 1000, origin = "1970-01-01")
    df <- data.frame(
      x = lab,
      y = 1:length(domain.data$results)
    )
    png("memberJoined.png")
    # print() makes sure the plot is drawn when the code is run via source()
    print(ggplot(df) +
      geom_point(aes(x = x, y = y)) +
      xlab("Date") +
      ylab("#members"))
    dev.off()
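The data preparation behind this plot can be checked without the Meetup API or ggplot2. The following is a minimal sketch that uses made-up join timestamps, not real member data, and fixes the time zone to UTC so the result does not depend on the local machine.

```r
# Made-up join timestamps in milliseconds since 1970-01-01 UTC.
joined_ms <- c(1353024000000, 1325376000000, 1341100800000)

# Sort the join times, then convert milliseconds to seconds for as.POSIXct.
ordered   <- sort(joined_ms)
joined_at <- as.POSIXct(ordered / 1000, origin = "1970-01-01", tz = "UTC")

# Cumulative member count: the i-th join brings the group to i members.
df <- data.frame(date = joined_at, members = seq_along(ordered))
print(df)
```

Plotting members against date then gives the growth curve of the group, as in the ggplot2 listing.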
RBelgium on internet
Website: http://www.meetup.com/RBelgium/ (68 members)
Website: http://www.rbelgium.be
Twitter: twitter.com/rbelgium (5 followers)
LinkedIn: http://www.linkedin.com/groups/RBelgium-4223869?gid=4223869&trk=hb_side_g (7 members)
Google group: http://groups.google.com/group/rbelgium, rbelgium@googlegroups.com
Questions?