Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/19tYIEk.
Karim Chine introduces Elastic-R, demonstrating some of its applications in bioinformatics and finance. Filmed at qconnewyork.com.
Karim Chine has been designing and building large-scale distributed software for over a decade. Born in Tunisia, he moved to Paris in 1997, graduated from Ecole Polytechnique and Telecom ParisTech, and worked for several years in the R&D departments of Schlumberger and IBM. Karim is part of the European Commission experts group for e-Infrastructure and cloud computing projects and is the author of Elastic-R.
2. InfoQ.com: News & Community Site
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/elastic-r-data
3. Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
4. Outline
Introduction
Fragmentation and friction in the Data Science arena
The understated potential of Cloud scriptability
Elastic-R: Towards a universal platform for Data Science
Elastic-R: Design and Technologies Overview
Elastic-R: The scriptability toolset
Demo
Conclusion
5. Introduction
“The next information revolution is well underway. But it is
not happening where information scientists, information
executives, and the information industry in general are
looking for it. It is not a revolution in technology, machinery,
techniques, software, or speed. It is a revolution in
concepts.”
Peter Drucker. Forbes ASAP, August 24, 1998
6. Introduction
Science and the 4th Paradigm
The e-Research Dancing Bears
Experimental Science, Theoretical Science, Computational Science, e-Science / Data-intensive Science
The townspeople gather to
see the wondrous sight as the
massive, lumbering beast
shambles and shuffles from
paw to paw. The bear is really
a terrible dancer, and the
wonder isn't that the bear
dances well but that the bear
dances at all.
7. Fragmentation and friction in the Data Science Arena
www.scilab.org
http://root.cern.ch
www.sagemath.org
www.sas.com
office.microsoft.com
www.mathworks.com
www.scipy.org
www.spss.com
www.wolfram.com
www.python.org
R: open-source (GPL) software
environment for statistical computing
and graphics
Lingua franca of data analysis.
Repositories of contributed R
packages related to a variety of
problem domains in life sciences,
social sciences, finance, econometrics,
chemometrics, etc. are growing at an
exponential rate.
R is Super Glue
8. The understated potential of Cloud scriptability
3D printers are becoming commonplace:
Creating three-dimensional solid objects of virtually any
shape from a digital model.
Scripting the physical world
Sharing physical reality on
Facebook is now as easy as
sharing your holiday pictures
9. Elastic-R: Towards a universal platform for Data Science
Computational Components
R packages, wrapped C, C++, Fortran code, Python modules, Matlab toolkits…
Open source or commercial
Computational Resources
Clusters, grids, private or public clouds
Free or pay-per-use
Computational GUIs
HTML5 and Desktop Workbench
Built-in views / Plugins / Collaborative views
Open source or commercial
Computational Scripts
R / Python / Matlab / Groovy
Computational APIs
Java / SOAP / REST, Stateless and stateful
Computational Storage
Local, NFS, FTP, Amazon S3, EBS
Generated Computational Web Services
Stateful or stateless, mapping of R objects/functions
Elastic-R
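The difference between the stateless and stateful services listed on this slide can be illustrated with a small sketch. This is not Elastic-R code: `stateless_mean` and `StatefulSession` are hypothetical names invented for illustration, and the engine is simulated in-process. A stateless call carries all of its inputs with it, whereas a stateful session keeps variables on the engine between calls.

```python
from statistics import mean


def stateless_mean(values):
    """Stateless call: every request carries all of its inputs."""
    return mean(values)


class StatefulSession:
    """Stateful session (hypothetical): variables persist on a simulated
    engine between calls, so later calls can refer to them by name."""

    def __init__(self):
        self._workspace = {}

    def assign(self, name, value):
        # Store a value in the session workspace under the given name.
        self._workspace[name] = value

    def call(self, func, *names):
        # Arguments are looked up by name in the session workspace.
        return func(*(self._workspace[n] for n in names))


session = StatefulSession()
session.assign("x", [1, 2, 3, 4])

stateless_result = stateless_mean([1, 2, 3, 4])
stateful_result = session.call(mean, "x")
print(stateless_result, stateful_result)
```

Both calls compute the same value; the difference is where the data lives between requests.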
14. Elastic-R: Design and Technologies Overview
Elastic-R Amazon Machine Images
- Elastic-R AMI 1: R 2.10, BioC 2.5
- Elastic-R AMI 2: R 2.9, BioC 2.3
- Elastic-R AMI 3: R 2.8, BioC 2.0
Amazon Elastic Block Stores
- Elastic-R EBS 1: Data Set XXX
- Elastic-R EBS 2: Data Set YYY
- Elastic-R EBS 3: Data Set ZZZ
- Elastic-R EBS 4: Data Set VVV
Elastic-R.org
- Elastic-R AMI 2 (R 2.9, BioC 2.3) + Elastic-R EBS 4 (Data Set VVV)
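The pairing shown on this slide, one machine image combined with one data volume, can be sketched as plain data. This is only an illustrative model under stated assumptions: the class names `MachineImage`, `DataVolume`, and `Engine` and the `launch` helper are hypothetical and do not come from Elastic-R or AWS. The version numbers and data set labels are the ones listed on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineImage:
    """A software environment: one R version plus one Bioconductor version."""
    name: str
    r_version: str
    bioc_version: str


@dataclass(frozen=True)
class DataVolume:
    """A block store holding one data set."""
    name: str
    data_set: str


@dataclass(frozen=True)
class Engine:
    """Hypothetical pairing of an image with a data volume."""
    image: MachineImage
    volume: DataVolume

    def describe(self):
        return (
            f"{self.image.name} (R {self.image.r_version}, "
            f"BioC {self.image.bioc_version}) + "
            f"{self.volume.name} ({self.volume.data_set})"
        )


# Catalogues taken from the slide.
IMAGES = {
    "AMI 1": MachineImage("AMI 1", "2.10", "2.5"),
    "AMI 2": MachineImage("AMI 2", "2.9", "2.3"),
    "AMI 3": MachineImage("AMI 3", "2.8", "2.0"),
}
VOLUMES = {
    "EBS 1": DataVolume("EBS 1", "Data Set XXX"),
    "EBS 2": DataVolume("EBS 2", "Data Set YYY"),
    "EBS 3": DataVolume("EBS 3", "Data Set ZZZ"),
    "EBS 4": DataVolume("EBS 4", "Data Set VVV"),
}


def launch(image_key, volume_key):
    """Pick an environment and a data set independently (illustrative helper)."""
    return Engine(IMAGES[image_key], VOLUMES[volume_key])


engine = launch("AMI 2", "EBS 4")
print(engine.describe())
```

Keeping the software environment and the data as separate, independently chosen pieces is what lets the same data set be analysed under different R and Bioconductor versions.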
15. Modes of access
Individuals with AWS accounts
Standard AMIs (Amazon Machine Images): pay-per-use, paid to Amazon.
For academic use and trial purposes
Paid AMI: pay-per-use software model. For business users
Individuals without AWS accounts
Trial tokens, purchased tokens, tokens granted by other users.
Resources (data science engines) shared by other users
Individual subscriptions
Companies/Educational & Research Institutions
Dedicated platform and AMIs on an Amazon VPC (Virtual Private
Cloud). Paid via subscription
19. Elastic-R: The scriptability toolset
API: SOAP and RESTful Web Services
Data Analysis Engines control
Data Analysis Engines Management (Life cycle, etc.)
Virtual appliances/artifacts management
Platform administration
SDKs: Java, R, Microsoft Office (VBA)
Command line: elasticR package
HTML5 Workbench
Elastic-R artifacts management interface
Engines control interface
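Engine life-cycle management, mentioned in the API list above, can be pictured as a small state machine. The sketch below is an assumption made for illustration: the state names, the allowed transitions, and the `EngineLifecycle` class are hypothetical and are not taken from the Elastic-R API.

```python
from enum import Enum


class EngineState(Enum):
    # Hypothetical states; the real platform may use different ones.
    REQUESTED = "requested"
    RUNNING = "running"
    STOPPED = "stopped"
    TERMINATED = "terminated"


# Allowed transitions (chosen for illustration only).
TRANSITIONS = {
    EngineState.REQUESTED: {EngineState.RUNNING, EngineState.TERMINATED},
    EngineState.RUNNING: {EngineState.STOPPED, EngineState.TERMINATED},
    EngineState.STOPPED: {EngineState.RUNNING, EngineState.TERMINATED},
    EngineState.TERMINATED: set(),
}


class EngineLifecycle:
    """Tracks one engine's state and rejects transitions that are not allowed."""

    def __init__(self):
        self.state = EngineState.REQUESTED
        self.history = [self.state]

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value} to {new_state.value}"
            )
        self.state = new_state
        self.history.append(new_state)


lifecycle = EngineLifecycle()
lifecycle.move_to(EngineState.RUNNING)
lifecycle.move_to(EngineState.STOPPED)
lifecycle.move_to(EngineState.TERMINATED)
print([s.value for s in lifecycle.history])
```

Recording the history alongside the current state gives a simple audit trail of what happened to each engine.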
20. Demo
Register on the Elastic-R academic and trial portal
(www.elastic-r.org)
Create Data Science Engines using trial tokens
Work with R, Python and Scientific Spreadsheets in the
browser
Share Data Science Engines and collaborate
Connect to the remote Data Science Engine from within
a local R session, push and pull data, execute
commands and show impact on spreadsheets and
dashboards
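The last demo step (push and pull data, execute commands, and see the effect on spreadsheets and dashboards) can be sketched with an in-process stand-in. This is not the elasticR package or any Elastic-R API: `RemoteEngineStandIn` and its methods are hypothetical names, and a real engine would sit behind a network interface rather than in the same process.

```python
class RemoteEngineStandIn:
    """In-process stand-in for a remote engine (hypothetical, for illustration)."""

    def __init__(self):
        self._variables = {}
        self._listeners = []

    def subscribe(self, callback):
        # Views such as spreadsheets or dashboards register here.
        self._listeners.append(callback)

    def push(self, name, value):
        # Send a local value to the engine and notify every view.
        self._variables[name] = value
        self._notify(name)

    def pull(self, name):
        # Fetch a value back from the engine.
        return self._variables[name]

    def execute(self, target, func, *arg_names):
        # Run a computation on engine-side variables and store the result.
        args = [self._variables[n] for n in arg_names]
        self.push(target, func(*args))

    def _notify(self, name):
        for callback in self._listeners:
            callback(name, self._variables[name])


# A dictionary plays the role of a spreadsheet bound to engine variables.
spreadsheet_cells = {}
engine = RemoteEngineStandIn()
engine.subscribe(lambda name, value: spreadsheet_cells.__setitem__(name, value))

engine.push("prices", [10.0, 12.0, 11.0])
engine.execute("total", sum, "prices")
total = engine.pull("total")
print(total, spreadsheet_cells)
```

Every change made through the engine reaches the subscribed view, which mirrors how a command run from a local session would show up in a shared spreadsheet or dashboard.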
21. Conclusion
Elastic-R unlocks the potential of the cloud for Data
Scientists and analytical application developers and
dramatically improves their productivity
The Data Science IT Factory chain, from resource
acquisition to service and application design and
publication, becomes:
Fully scriptable in R
under the direct control of the Data Scientist and the developer
Fully traceable and reproducible
Elastic-R can seamlessly extend any existing application
or service and can be used as a collaboration backend
Openness, security and reliability are among the key
capabilities delivered
22. What to do Next
Register on Elastic-R and try the HTML5 Workbench and
the collaboration capabilities
Download the R package elasticR and use it to access
the cloud from local R sessions
Download the Java SDK and create your first analytical
application using AWS and advanced tools for
programming with data.
Get in touch with me to explore potential partnerships
and collaborations