Hortonworks and Revolution Analytics have teamed up to bring the predictive analytics power of R to Hortonworks Data Platform.
Hadoop, a disruptive data processing framework, has had a large impact on today's data ecosystems. Enabling business users to translate existing skills to Hadoop is necessary to encourage adoption and to let businesses get value out of their Hadoop investment quickly. R, a prolific and rapidly growing data analysis language, now has a place in the Hadoop ecosystem.
This presentation covers:
- Trends and business drivers for Hadoop
- How Hortonworks and Revolution Analytics play a role in the modern data architecture
- How you can run R natively in Hortonworks Data Platform to easily move your R-powered analytics to Hadoop
Presentation replay at:
http://www.revolutionanalytics.com/news-events/free-webinars/2013/modern-data-architecture-revolution-hortonworks/
17. Revolution Confidential
What is the Open Source R Project?
The R Language:
Object-Oriented Language for Stats, Math and Data Science
Comprehensive data visualization and statistical modeling capabilities
The R Community:
2M+ Users with the Skill to Tackle Big Data Statistical and Numerical Analysis and Machine Learning Projects
New graduates with data skills learn R
The R Ecosystem:
5000+ Freely Available Algorithms in CRAN
Specialized methods for finance, economics, genomics, linguistics, and every data-driven domain
18. R is open source and drives analytic innovation but has some limitations for Enterprises
Bigger data sizes: Memory Bound / Big Data
Speed of analysis: Single Threaded / Scale out, parallel processing, high speed
Production support: Community Support / Commercial production support
Innovation and scale: Innovative, 5000+ packages, exponential growth / Combines with open source R packages where needed
19. Revolution R Enterprise
Enterprise-Ready
Revolution R Enterprise is the only commercial big data analytics platform based on the open source R statistical computing language
Cross-Platform
Big Data Analytics
High Performance Analytics
Easier Build & Deploy
20. Modern Data Architecture
Extract and Analyze
Ad-hoc Data Distillation
Exploratory Data Analysis / Data Visualization
Model Development
[Architecture diagram; labels only]
Source data: sensor logs, clickstream, flat files, unstructured, sentiment, customer, inventory; DBs, JMS queues, files
Load: SQOOP, FLUME, WebHDFS, NFS, HTTP, STREAM
Platform: AMBARI, MAPREDUCE, YARN, HDFS, REST
Data refinement: HIVE, PIG, CUSTOM
Structure: HCATALOG (metadata services)
Interactive: HIVE Server2
Query/Visualization/Reporting/Analytical Tools and Apps; load via SQOOP/Hive and Web HDFS; data sources: CSV, databases
Analytical tool: rHadoop
21. The Data Scientist’s Big Data Toolkit
Statistical Tests
Machine Learning
Simulation
Descriptive Statistics
Data Visualization
R Data Step
Predictive Models
Sampling
24. Modern Data Architecture with RRE7
In-Hadoop Predictive Analytics
Production Data Distillation (e.g. Semantic Analysis)
Production Model Processing / Re-Estimation
Production Model Scoring
[Architecture diagram; labels only]
Source data: sensor logs, clickstream, flat files, unstructured, sentiment, customer, inventory; DBs, JMS queues, files
Load: SQOOP, FLUME, WebHDFS, NFS, HTTP, STREAM
Platform: AMBARI, MAPREDUCE, YARN, HDFS, REST
Data refinement: HIVE, PIG, CUSTOM; distilled data files
Structure: HCATALOG (metadata services)
Interactive: HIVE Server2
Query/Visualization/Reporting/Analytical Tools and Apps; load via SQOOP/Hive and Web HDFS; data sources: CSV, databases
Analytical tool: Revolution R Enterprise
25. Hadoop As An R Engine
Use Revolution R Enterprise PEMAs (parallel external memory algorithms) in Hadoop
No need to change existing R code
Simple R programming
No need to “Think In MapReduce”
Eliminate data movement to slash latencies
Use Hadoop nodes as parallel R computation engines
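The idea behind running an algorithm across Hadoop nodes without "thinking in MapReduce" can be sketched generically: each worker summarizes one block of data, and the small summaries are combined into the final answer, so the full data set never has to sit in one machine's memory. The Python sketch below is illustrative only and is not Revolution R Enterprise code; the function names (partial_stats, combine, finalize, chunked_mean_variance) are hypothetical, and a thread pool stands in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partial_stats(chunk):
    """Per-block step: summarize one block of values without seeing the others."""
    n = len(chunk)
    total = sum(chunk)
    total_sq = sum(x * x for x in chunk)
    return (n, total, total_sq)


def combine(a, b):
    """Merge two block summaries; the order of merging does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def finalize(stats):
    """Turn the merged summary into the mean and the sample variance."""
    n, total, total_sq = stats
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, variance


def chunked_mean_variance(chunks, workers=4):
    """Summarize blocks in parallel (threads stand in for nodes), then combine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunks))
    return finalize(reduce(combine, partials))


# Toy data split into four blocks, as if stored on four nodes.
data = list(range(1, 101))
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]
mean, variance = chunked_mean_variance(chunks)
print(mean, variance)
```

Only the three-number summaries travel between workers, which is the sense in which data movement is avoided. The sum-of-squares formula is kept for brevity; a production implementation would likely use a numerically more stable update.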
Remember that CRAN is a new term to IT professionals and to anyone who hasn’t learned much about R. Spend some time on it. The acronym stands for Comprehensive R Archive Network: a single repository of R algorithms, test data, and evaluations. Used by nearly all R programmers.