In-Database Analytics Deep Dive with Teradata and Revolution R
Mario Inchiosa, Chief Scientist, Revolution Analytics
Tim Miller, Partner Integration Lab, Teradata
• Introduction
• Revolution R Enterprise
• Case Study – Global Internet Marketplace
• Under the Hood
• Summary & Questions
Agenda
• What data storage/management software do you use?
> Hadoop
> Teradata
> LSF Clusters/Grids
> Servers
Please choose all that apply
Poll Question #1
• Most powerful statistical programming language
– Flexible, extensible and comprehensive for productivity
• Most widely used data analysis software
– Used by 2M+ data scientists, statisticians and analysts
• Create beautiful and unique data visualizations
– As seen in the New York Times, Twitter and Flowing Data
• Thriving open-source community
– Leading edge of analytics research
• Fills the talent gap
– New graduates prefer R
What is R?
R is Hot
bit.ly/r-is-hot
WHITE PAPER
Exploding growth and demand for R
• R is the highest-paid IT skill
> Dice.com, Jan 2014
• R is the most-used data science language after SQL
> O’Reilly, Jan 2014
• R is used by 70% of data miners
> Rexer, Sep 2013
• R ranks #15 among all programming languages
> RedMonk, Jan 2014
• R is growing faster than any other data science language
> KDnuggets, Aug 2013
• More than 2 million users worldwide
R Usage Growth
Rexer Data Miner Survey, 2007-2013
70% of data miners report using R
R is the first choice of more
data miners than any other
software
Source: www.rexeranalytics.com
Server Based vs. In-Database Architectures
[Figure: two panels, "Desktop and Server Analytic Architecture" (Analyst → SQL Request → Sample Data → Results) and "In-Database Analytic Architecture" (Results), each showing the same credit-risk decision tree (Income>$40K, Debt<10% of Income, Debt=0% → Good / Bad Credit Risks); callout: "Exponential Performance Improvement"]
Why Is Teradata Different?
• R is distributed across nodes or servers
• Each instance runs independently of the other nodes/servers
> Great for row-independent processing such as model scoring
> However, for analytic functions that require all the data, such as model building…
– The onus is on the R programmer to understand data parallelism
Challenges Running R in Parallel
Data across all nodes: 1 1 1 1 2 9 1 7 9 3 9 9
Node level calculation: midpoint of the per-node medians 1 2 7 9 = 4.5 < Wrong
System level calculation: midpoint of the sorted data 1 1 1 1 1 2 3 7 9 9 9 9 = 2.5 < Right
Example: Median (Midpoint)
Node Level
1. Find the median on each node
2. Consolidate the node medians and find their midpoint
3. Produce the wrong answer
System Level
1. Sort all the data
2. Take the midpoint
3. Produce the right answer
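A few lines of R reproduce the median example. The four-node layout (three values per node) is our assumption, chosen because it yields the per-node medians 1 2 7 9 shown on the slide; the variable names are illustrative only.

```r
# Hypothetical layout: 12 values spread over 4 nodes, 3 values per node
nodes <- list(c(1, 1, 1), c(1, 2, 9), c(1, 7, 9), c(3, 9, 9))

# Node level: median on each node, then the midpoint of those medians
node_medians <- sapply(nodes, median)
node_level <- median(node_medians)

# System level: gather all values, sort them, take the midpoint
system_level <- median(sort(unlist(nodes)))

cat("node medians:", node_medians, "\n")
cat("node level:", node_level, " system level:", system_level, "\n")
```

The node-level route gives 4.5 and the system-level route gives 2.5: unlike a sum or a count, a median of medians is not the median of the whole, so partial results cannot simply be combined.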
R Operations on Data
R operates on independent rows
> Score models for a given observation
> Parsing a text field
> Log(x)
R operates on independent
partitions
> Fit a model to a partition such as region,
time, product or store
R operates on the entire data set
> Global sales average
> Regression on all customers
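The three patterns above can be illustrated in base R on a tiny made-up sales table (the `sales` data, its columns, and the two regions are hypothetical). Row-level work can run anywhere; per-partition work maps naturally onto grouping by a key such as region; whole-data work needs either every row or partial results that can be combined.

```r
# Hypothetical data: two regions with an exact linear revenue pattern each
sales <- data.frame(region = rep(c("East", "West"), each = 5), units = 1:10)
sales$revenue <- ifelse(sales$region == "East", 2 * sales$units + 1, 3 * sales$units)

# 1. Independent rows: each value depends only on its own row
log_units <- log(sales$units)

# 2. Independent partitions: one model per region, fitted separately
by_region <- lapply(split(sales, sales$region),
                    function(d) coef(lm(revenue ~ units, data = d)))

# 3. Entire data set: these need all rows (or combinable partial results)
global_mean <- mean(sales$revenue)
global_fit <- lm(revenue ~ units, data = sales)

print(by_region)
print(global_mean)
```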
• What statistical programming tools do you use?
> R/RRE
> SAS
> SPSS
> Statistica
> KXEN
Please choose all that apply
Poll Question #2
Who is Revolution Analytics?
Revolution Analytics
OUR COMPANY
The leading provider
of advanced
analytics software
and services
based on open source
R, since 2007
OUR SOFTWARE
The only Big Data, Big
Analytics software
platform based on the
data science language R
SOME KUDOS
Visionary
Gartner Magic Quadrant
for Advanced Analytics
Platforms, 2014
Finance • Insurance • Healthcare & Pharma • Digital Economy • Analytics Service Providers • Manufacturing & High Tech
Revolution R Enterprise is the only big data, big analytics platform based on open source R, the de facto statistical computing language for modern analytics
• High Performance, Scalable Analytics
• Portable Across Enterprise Platforms
• Easier to Build & Deploy Analytics
Dimension: Open source R → Revolution R Enterprise → Benefit
• Big Data: In-memory bound → Hybrid memory & disk scalability → Operates on bigger volumes & factors
• Speed of Analysis: Single threaded → Parallel threading → Shrinks analysis time
• Enterprise Readiness: Community support → Commercial support → Delivers full-service production support
• Analytic Breadth & Depth: 5000+ innovative analytic packages → Leverage open source packages plus Big Data ready packages → Supercharges R
• Commercial Viability: Risk of deploying open source → Commercial license → Eliminates risk with open source
R: Open Source that Drives Innovation, but… It Has Some Limitations for Enterprises
The Big Data Big Analytics Platform
Introducing Revolution R Enterprise (RRE)
DistributedR • DevelopR • DeployR • ScaleR • ConnectR
• Big Data Big Analytics Ready
> Enterprise readiness
> High performance analytics
> Multi-platform architecture
> Data source integration
> Development tools
> Deployment tools
The Platform Step by Step:
R Capabilities
R+CRAN
• Open source R interpreter
• Updated to R 3.1.1
• Freely available R algorithms
• Algorithms callable by RevoR
• Embeddable in R scripts
• 100% Compatible with existing
R scripts, functions and
packages
RevoR
• Based on open source R
• Adds high-performance math
Available On:
• Teradata Database
• Hortonworks Hadoop
• Cloudera Hadoop
• MapR Hadoop
• IBM Platform LSF Linux
• Microsoft HPC Clusters
• Windows & Linux Servers
• Windows & Linux Workstations
DeployR
• Web services software development kit for integrating analytics via Java, JavaScript, or .NET APIs
• Integrates R into application infrastructures
Capabilities:
• Invokes R Scripts from
web services calls
• RESTful interface for
easy integration
• Works with web & mobile apps,
leading BI & Visualization tools and
business rules engines
DevelopR
• Integrated development
environment for R
• Visual ‘step-into’ debugger
• Based on Visual Studio Isolated
Shell
Available on:
• Windows
The Platform Step by Step:
Tools & Deployment
DevelopR - Integrated Development Environment
• Script with type-ahead and code snippets
• Solutions window for organizing code and data
• Packages installed and loaded
• Objects loaded in the R environment
• Object details
• Sophisticated debugging with breakpoints, variable values, etc.
DeployR - Integration with 3rd Party Software
• Seamless
– Bring the power of R to any web enabled application
• Simple
– Leverage common APIs including JS, Java, .NET
• Scalable
– Robustly scale user and compute workloads
• Secure
– Manage enterprise security with LDAP & SSO
[Figure: an R / statistical modeling expert and a deployment expert connect through DeployR to data analysis, business intelligence, mobile web apps, and cloud / SaaS applications]
The Platform Step by Step:
Parallelization & Data Sourcing
ConnectR
• High-speed & direct connectors
Available for:
• High-performance XDF
• SAS, SPSS, delimited & fixed format
text data files
• Hadoop HDFS (text & XDF)
• Teradata Database
• ODBC
ScaleR
• Ready-to-Use high-performance
big data big analytics
• Fully-parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical
tests
• Correlation & covariance matrices
• Predictive Models – linear, logistic,
GLM
• Machine learning
• Monte Carlo simulation
• Tools for distributing customized
algorithms across nodes
DistributedR
• Distributed computing framework
• Delivers portability across platforms
Available on:
• Teradata Database
• Hortonworks / Cloudera / MapR
• Windows Servers / HPC Clusters
• IBM Platform LSF Linux Clusters
• Red Hat Linux Servers
• SuSE Linux Servers
Revolution R Enterprise ScaleR:
High Performance Big Data Analytics
Data Prep, Distillation & Descriptive Analytics
R Data Step
• Data import – Delimited, Fixed, SAS, SPSS, ODBC
• Variable creation & transformation using any R functions and packages
• Recode variables
• Factor variables
• Missing value handling
• Sort
• Merge
• Split
• Aggregate by category (means, sums)
Descriptive Statistics
• Min / Max
• Mean
• Median (approx.)
• Quantiles (approx.)
• Standard Deviation
• Variance
• Correlation
• Covariance
• Sum of Squares (cross product matrix)
• Pairwise Cross tabs
• Risk Ratio & Odds Ratio
• Cross-Tabulation of Data
• Marginal Summaries of Cross Tabulations
Statistical Tests
• Chi Square Test
• Kendall Rank Correlation
• Fisher’s Exact Test
• Student’s t-Test
Sampling
• Subsample (observations & variables)
• Random Sampling
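One plausible reason the median and quantiles are listed as approximate: an exact median needs a global sort, while a histogram can be built one chunk at a time and merged by adding counts. The sketch below assumes the data range [`lo`, `hi`] is known in advance (a first pass computing min/max could supply it) and uses a fixed number of bins; `approx_median` is a hypothetical helper illustrating the idea, not ScaleR's documented algorithm.

```r
# Approximate median from per-chunk histogram counts (hypothetical helper)
approx_median <- function(chunks, lo, hi, nbins = 1000) {
  breaks <- seq(lo, hi, length.out = nbins + 1)
  counts <- numeric(nbins)
  for (chunk in chunks) {
    # Bin index 1..nbins for every value; all.inside keeps hi in the last bin
    idx <- findInterval(chunk, breaks, all.inside = TRUE)
    # Counts from different chunks (or nodes) combine by simple addition
    counts <- counts + tabulate(idx, nbins = nbins)
  }
  # First bin where the running count reaches half of all observations
  b <- which(cumsum(counts) >= sum(counts) / 2)[1]
  (breaks[b] + breaks[b + 1]) / 2   # midpoint of that bin
}

set.seed(3)
x <- runif(1e5)
chunks <- split(x, rep(1:10, each = 1e4))
approx_median(chunks, lo = 0, hi = 1)
median(x)
```

The error is bounded by roughly one bin width, and memory use depends on `nbins`, not on the number of rows.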
Revolution R Enterprise ScaleR (continued)
Predictive Models
• Covariance/Correlation/Sum of Squares/Cross-product Matrix
• Multiple Linear Regression
• Logistic Regression
• Generalized Linear Models (GLM)
– All exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie
– Standard link functions including: cauchit, identity, log, logit, probit
– User-defined distributions & link functions
• Classification & Regression Trees and Forests
• Gradient Boosted Trees
• Residuals for all models
Variable Selection
• Stepwise Regression: Linear, Logistic, GLM
Cluster Analysis
• K-Means
Classification & Regression
• Decision Trees
• Decision Forests
• Gradient Boosted Trees
Data Visualization
• Histogram
• ROC Curves (actual data and predicted values)
• Lorenz Curve
• Line and Scatter Plots
• Tree Visualization
Simulation and HPC
• Monte Carlo
• Run open source R functions and packages across cores and nodes
Deployment
• Prediction (scoring)
• PMML Export
Statistical Modeling • Machine Learning
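The "Simulation and HPC" entries describe running ordinary R functions as independent tasks across cores or nodes. A minimal sketch of that pattern, using a Monte Carlo estimate of π: each task is self-contained and results are combined once at the end. The task function and seeds are hypothetical, and `lapply` stands in for a parallel backend; `parallel::parLapply` with a cluster object could run the same function across cores.

```r
# One independent Monte Carlo task: fraction of random points inside the unit circle
mc_pi_task <- function(seed, n = 1e5) {
  set.seed(seed)                 # each task gets its own reproducible stream
  x <- runif(n)
  y <- runif(n)
  4 * mean(x^2 + y^2 <= 1)
}

seeds <- 1:4                              # hypothetical: one task per core or node
estimates <- lapply(seeds, mc_pi_task)    # tasks never talk to each other
pi_hat <- mean(unlist(estimates))         # single combine step at the end
cat(sprintf("estimate: %.4f\n", pi_hat))
```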
DistributedR • ScaleR • ConnectR • DeployR
Write Once…Deploy Anywhere.
DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE
• In the Cloud: Amazon AWS
• Workstations & Servers: Windows, Linux
• Clustered Systems: IBM Platform LSF, Microsoft HPC
• Hadoop: Hortonworks, Cloudera, MapR
• EDW: Teradata Database
• Challenge: Model and score 250M
customers
• Server-based workflow was
taking 3 days
• Move calculation in-database to
drastically reduce runtime,
process twice as many
customers, and increase lift
Case Study - Global Internet Marketplace
• Binomial Logistic Regression
> 50+ independent variables, including categorical variables encoded as indicator variables
> Train from small sample (many thousands) – not a problem in and
of itself
> Scoring across entire corpus (many hundred millions) – slightly
more challenging
Existing Open Source R model
• Same Binomial Logistic Regression
> 50+ independent variables, including categorical variables encoded as indicator variables
> Train from large sample (many millions) – more accurately
captures user patterns and increases lift
> Scoring across entire corpus (many hundred millions) – completes
in minutes
Revolution R Enterprise model
By moving the compute to the data
RRE Used to Optimize the Current Process
Before vs. After: reduced a 3-day process to 10 minutes
[Figure: scaling study, time vs. number of rows, for server-based (not in-DB) and in-DB runs]
NOTE:
• Teradata environment
> 4-node 1700 appliance
• RRE environment
> version 7.2
> R 3.0.2
Benchmarking the Optimized Process
• Before
# Open source R: fit with glm(), score with predict()
trainit <- glm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxit = iters)
fits <- predict(trainit, newdata = test.data, type = 'response')
• After
# RRE: rxGlm() and rxPredict(); rxPredict takes the new data via 'data', not 'newdata'
trainit <- rxGlm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxIterations = iters)
fits <- rxPredict(trainit, data = test.data, type = 'response')
Recode Open Source R to Revolution R Enterprise
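The case-study scoring step is row-independent, so the corpus can be scored partition by partition and the pieces simply placed back together. The sketch below shows that with base R `glm()`/`predict()` on simulated data; the data, the 500-row training sample, and the four partitions are assumptions for illustration, not the customer's setup.

```r
set.seed(7)
n <- 5000
# Simulated "corpus" with a binary outcome (hypothetical data)
corpus <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
corpus$y <- rbinom(n, 1, plogis(-1 + 0.8 * corpus$x1 + 0.5 * corpus$x2))

# Train on a small sample, as in the original open source workflow
training.data <- corpus[sample(n, 500), ]
trainit <- glm(y ~ x1 + x2, data = training.data, family = binomial, maxit = 25)

# Score the whole corpus one partition at a time (4 hypothetical partitions)
partition <- rep_len(1:4, n)
fits <- numeric(n)
for (k in unique(partition)) {
  rows <- which(partition == k)
  fits[rows] <- predict(trainit, newdata = corpus[rows, ], type = "response")
}
summary(fits)
```

Because no partition needs another partition's rows, the same loop body could run on every node at once; model building, by contrast, needs the combinable-partial-results approach.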
Optimization process
How RRE ScaleR Actually Works
[Figure: Open Source R vs. Revolution R Enterprise]
Computation (4-core laptop) | Open Source R | Revolution R | Speedup
Linear Algebra (1)
Matrix Multiply | 176 sec | 9.3 sec | 18x
Cholesky Factorization | 25.5 sec | 1.3 sec | 19x
Linear Discriminant Analysis | 189 sec | 74 sec | 3x
General R Benchmarks (2)
R Benchmarks (Matrix Functions) | 22 sec | 3.5 sec | 5x
R Benchmarks (Program Control) | 5.6 sec | 5.4 sec | Not appreciable
1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
2. http://r.research.att.com/benchmarks/
Customers report 3-50x performance improvements compared to Open Source R, without changing any code
RevoR - Performance Enhanced R
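To try a comparison like the linear-algebra rows on your own machine, a minimal timing sketch follows. The matrix size is our choice and does not match the benchmark's; timings depend mainly on which BLAS/LAPACK library R is linked against, which is presumably what RevoR's high-performance math changes.

```r
set.seed(1)
n <- 1000                                  # our choice, not the benchmark's size
m <- matrix(rnorm(n * n), nrow = n)

# Matrix multiply (dispatched to the linked BLAS)
t_mult <- system.time(p <- m %*% m)[["elapsed"]]

# Cholesky factorization of a symmetric positive definite matrix (LAPACK)
s <- crossprod(m) + diag(n)
t_chol <- system.time(r <- chol(s))[["elapsed"]]

cat(sprintf("matrix multiply: %.2f s, Cholesky: %.2f s\n", t_mult, t_chol))
```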
Revolution R Enterprise:
Across Cores and Nodes
Scalable and Parallelized
• Anatomy of a PEMA: 1) Initialize, 2) Process Chunk,
3) Aggregate, 4) Finalize
• Process a chunk of data at a time, giving linear scalability
• Process an unlimited number of rows of data in a fixed amount of
RAM
• Independent of the “compute context” (number of cores,
computers, distributed computing platform), giving portability
across these dimensions
• Independent of where the data is coming from, giving portability
with respect to data sources
“Parallel External Memory Algorithms”
Scalability and Portability of PEMAs
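The four PEMA steps can be sketched for ordinary least squares, where each chunk adds its contribution to the cross-product matrices X'X and X'y. This is a base-R illustration under our own assumptions (numeric predictors, chunks as data frames, workers simulated sequentially); `pema_lm` is a hypothetical name and not ScaleR's internal code, but it shows why the answer does not depend on how chunks are spread over workers.

```r
pema_lm <- function(chunks, formula, n_workers = 2) {
  # 1) Initialize: empty accumulators for X'X, X'y and the row count
  init <- function() list(xtx = 0, xty = 0, n = 0)

  # 2) Process chunk: add one chunk's contribution; only that chunk is in memory
  process_chunk <- function(state, chunk) {
    mf <- model.frame(formula, chunk)
    X <- model.matrix(formula, mf)
    y <- model.response(mf)
    state$xtx <- state$xtx + crossprod(X)
    state$xty <- state$xty + crossprod(X, y)
    state$n <- state$n + nrow(X)
    state
  }

  # 3) Aggregate: merge partial results coming from different workers
  combine <- function(a, b) {
    list(xtx = a$xtx + b$xtx, xty = a$xty + b$xty, n = a$n + b$n)
  }

  # 4) Finalize: solve the normal equations (X'X) b = X'y
  finalize <- function(state) {
    beta <- drop(solve(state$xtx, state$xty))
    names(beta) <- colnames(state$xtx)
    beta
  }

  # Simulated workers: chunks dealt round-robin, each worker folds its own chunks
  worker_id <- rep_len(seq_len(n_workers), length(chunks))
  partials <- lapply(seq_len(n_workers), function(w) {
    Reduce(process_chunk, chunks[worker_id == w], init())
  })
  finalize(Reduce(combine, partials))
}

set.seed(42)
df <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))
df$y <- 1 + 2 * df$x1 - 3 * df$x2 + rnorm(1000)
chunks <- split(df, rep(1:10, each = 100))
pema_lm(chunks, y ~ x1 + x2, n_workers = 3)
```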
• Efficient computational algorithms
• Efficient memory management – minimize data copying and
data conversion
• Heavy use of C++ templates to generate optimized code
• Efficient data file format; fast access by row and column
• Models are pre-analyzed to detect and remove duplicate
computations and points of failure (singularities)
• Handle categorical variables efficiently
ScaleR Performance
Speed and Scalability Comparison
• Unique PEMAs: Parallel, external-
memory algorithms
• High-performance, scalable
replacements for R/SAS analytic
functions
• Parallel/distributed processing
eliminates CPU bottleneck
• Data streaming eliminates
memory size limitations
• Works with in-memory and disk-
based architectures
In-Database Billion-Row Logistic Regression
• 114 seconds on Teradata 2650 (6 nodes, 72 cores), including time to read data
• Scales linearly with number of rows
• Scales linearly with number of nodes: 3x faster than on a 2-node Teradata system
Allstate compares SAS, Hadoop, and R for Big-Data Insurance Models
Approach | Platform | Time to fit
SAS | 16-core Sun Server | 5 hours
rmr/MapReduce | 10-node, 80-core Hadoop Cluster | > 10 hours
R | 250 GB Server | Impossible (> 3 days)
Revolution R Enterprise | In-Teradata on 6-node 2650 | 3.3 minutes
Generalized linear model, 150 million observations, 70 degrees of freedom
http://blog.revolutionanalytics.com/2012/10/allstate-big-data-glm.html
• At what stage are you in your in-database analytics
deployment project?
> Still researching tools and methods
> Evaluating/Selecting data storage/management platform
> Evaluating/Selecting analytics programming tools
> Launched the project/working on it now
> We’re done and looking for another one!
Please select one answer
Poll Question #3
• Revolution R Enterprise has a new “data source”, RxTeradata (ODBC and TPT)
# Change the data source if necessary
tdConn <- "DRIVER=…; IP=…; DATABASE=…; UID=…; PWD=…"
teradataDS <- RxTeradata(table = "…", connectionString = tdConn, …)
• Revolution R Enterprise has a new “compute context”, RxInTeradata
# Change the "compute context" so that later rx* calls run inside Teradata
tdCompute <- RxInTeradata(connectionString = ..., shareDir = ..., remoteShareDir = ...,
                          revoPath = ..., wait = ..., consoleOutput = ...)
rxSetComputeContext(tdCompute)
• Sample code for R Logistic Regression
# Specify model formula and parameters
rxLogit(ArrDelay > 15 ~ Origin + Year + Month + DayOfWeek + UniqueCarrier
        + F(CRSDepTime), data = teradataDS)
RRE End-User’s Perspective
• Table User Defined Functions (UDFs) allow users to place a function in the FROM clause of a SELECT statement
• Table Operators extend the existing table UDF capability:
> Table Operators are object oriented
– Inputs and outputs can be arbitrary rather than “fixed”, as table UDFs require
> Table Operators have a simpler row iterator interface
– The interface simply produces output rows, a more natural application development interface than table UDFs offer
> Table Operators operate on a stream of rows
– Rows are buffered for high performance, eliminating row-at-a-time processing
> Table Operators support PARTITION BY and ORDER BY
– Allows the development of MapReduce-style operators in-database
Table Operators – Teradata 14.10+
RRE Architecture in Teradata 14.10+
[Figure: in Teradata 14.10+, a Master Process in the PE layer coordinates Worker Processes in the AMP layer, one per data partition, exchanging requests and responses over the Message Passing Layer]
* All communication is done via binary BLOBs
1. RRE commands are sent to a “Master Process”, an External Stored Procedure (XSP) in the Parsing Engine that provides parallel coordination.
2. RRE analytics are split into “Worker Process” tasks that run in a Table Operator (TO) on every AMP.
a. HPA (high-performance analytics) functions iterate over the data; the XSP analyzes and manages the intermediate results.
b. HPC (high-performance computing) functions do not iterate; each AMP returns its final results to the XSP.
3. The XSP assembles the final combined results and returns them to the user.
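Step 2a describes iterative (HPA) analytics: workers make a pass over their data, a coordinator combines the partial results and decides whether another pass is needed. A base-R sketch of that loop for logistic regression via iteratively reweighted least squares follows; `pema_logit`, the chunk list, and the convergence rule are our assumptions, and the real XSP/Table Operator machinery is not modeled.

```r
pema_logit <- function(chunks, formula, max_iter = 25, tol = 1e-10) {
  p <- ncol(model.matrix(formula, chunks[[1]]))
  beta <- rep(0, p)                       # coordinator's starting coefficients
  for (iter in seq_len(max_iter)) {
    # Worker pass: every chunk adds its X'WX and X'Wz at the current beta
    xtwx <- 0
    xtwz <- 0
    for (chunk in chunks) {
      mf <- model.frame(formula, chunk)
      X <- model.matrix(formula, mf)
      y <- model.response(mf)
      eta <- drop(X %*% beta)
      mu <- plogis(eta)                   # fitted probabilities
      w <- mu * (1 - mu)                  # IRLS weights
      z <- eta + (y - mu) / w             # working response
      xtwx <- xtwx + crossprod(X, X * w)
      xtwz <- xtwz + crossprod(X, w * z)
    }
    # Coordinator step: combine, update, and test for convergence
    beta_new <- drop(solve(xtwx, xtwz))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  names(beta) <- colnames(xtwx)
  beta
}

set.seed(1)
n <- 2000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x1 - 0.5 * d$x2))
chunks <- split(d, rep(1:8, length.out = n))
pema_logit(chunks, y ~ x1 + x2)
```

Each iteration needs one full pass over the data, which is why iterative analytics keep the coordinator in the loop, whereas the non-iterating case only needs a single combine at the end.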
tdConnect <- RxTeradata(<data, connection string, …>)
tdCompute <- RxInTeradata(<data, server arguments, …>)
** PUT-based Installer
• High-performance, scalable, portable, fully-featured
algorithms
• Integration with R ecosystem
• Compatibility with Big Data ecosystem
Summary
PARTNERS Mobile App
InfoHub Kiosks
teradata-partners.com
WE LOVE FEEDBACK
Questions
Rate this Session
Questions?
Resources for you (available on RevolutionAnalytics.com):
• White Paper: Teradata and Revolution Analytics: For the Big Data
Era, An Analytics Revolution
• Webinar: Big Data Analytics with Teradata and Revolution Analytics
Thank You!
www.RevolutionAnalytics.com www.Teradata.com

Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Rui Quintino
 
Big data analytics with R tool.pptx
Big data analytics with R tool.pptxBig data analytics with R tool.pptx
Big data analytics with R tool.pptxsalutiontechnology
 
Advanced analytics with R and SQL
Advanced analytics with R and SQLAdvanced analytics with R and SQL
Advanced analytics with R and SQLMSDEVMTL
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...Mark Rittman
 
An R primer for SQL folks
An R primer for SQL folksAn R primer for SQL folks
An R primer for SQL folksThomas Hütter
 

Similar to In-Database Analytics Deep Dive with Teradata and Revolution (20)

microsoft r server for distributed computing
microsoft r server for distributed computingmicrosoft r server for distributed computing
microsoft r server for distributed computing
 
Introduction to basic statistics
Introduction to basic statisticsIntroduction to basic statistics
Introduction to basic statistics
 
Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013Revolution R Enterprise - Portland R User Group, November 2013
Revolution R Enterprise - Portland R User Group, November 2013
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & R
 
Decision trees in hadoop
Decision trees in hadoopDecision trees in hadoop
Decision trees in hadoop
 
eRum2016 -RevoScaleR - Performance and Scalability R
eRum2016 -RevoScaleR - Performance and Scalability ReRum2016 -RevoScaleR - Performance and Scalability R
eRum2016 -RevoScaleR - Performance and Scalability R
 
Michal Marušan: Scalable R
Michal Marušan: Scalable RMichal Marušan: Scalable R
Michal Marušan: Scalable R
 
DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL Server
 
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
Big Data Day LA 2016/ Big Data Track - Apply R in Enterprise Applications, Lo...
 
Analytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using RAnalytics Beyond RAM Capacity using R
Analytics Beyond RAM Capacity using R
 
Analytics with R in SQL Server 2016
Analytics with R in SQL Server 2016Analytics with R in SQL Server 2016
Analytics with R in SQL Server 2016
 
Big Data Analytics with R
Big Data Analytics with RBig Data Analytics with R
Big Data Analytics with R
 
Where Should You Deliver Database Services From?
Where Should You Deliver Database Services From?Where Should You Deliver Database Services From?
Where Should You Deliver Database Services From?
 
Building a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with RBuilding a Scalable Data Science Platform with R
Building a Scalable Data Science Platform with R
 
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
 
Big data analytics with R tool.pptx
Big data analytics with R tool.pptxBig data analytics with R tool.pptx
Big data analytics with R tool.pptx
 
Advanced analytics with R and SQL
Advanced analytics with R and SQLAdvanced analytics with R and SQL
Advanced analytics with R and SQL
 
LSESU a Taste of R Language Workshop
LSESU a Taste of R Language WorkshopLSESU a Taste of R Language Workshop
LSESU a Taste of R Language Workshop
 
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
From lots of reports (with some data Analysis) 
to Massive Data Analysis (Wit...
 
An R primer for SQL folks
An R primer for SQL folksAn R primer for SQL folks
An R primer for SQL folks
 

More from Revolution Analytics

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudRevolution Analytics
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureRevolution Analytics
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudRevolution Analytics
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondRevolution Analytics
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with RRevolution Analytics
 
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15Revolution Analytics
 

More from Revolution Analytics (12)

Speeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the CloudSpeeding up R with Parallel Programming in the Cloud
Speeding up R with Parallel Programming in the Cloud
 
Migrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to AzureMigrating Existing Open Source Machine Learning to Azure
Migrating Existing Open Source Machine Learning to Azure
 
R in Minecraft
R in Minecraft R in Minecraft
R in Minecraft
 
The case for R for AI developers
The case for R for AI developersThe case for R for AI developers
The case for R for AI developers
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
The R Ecosystem
The R EcosystemThe R Ecosystem
The R Ecosystem
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
 
Building a scalable data science platform with R
Building a scalable data science platform with RBuilding a scalable data science platform with R
Building a scalable data science platform with R
 
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
Revolution R Enterprise 7.4 - Presentation by Bill Jacobs 11Jun15
 
R and Data Science
R and Data ScienceR and Data Science
R and Data Science
 

Recently uploaded

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 

In-Database Analytics Deep Dive with Teradata and Revolution

  • 1. Mario Inchiosa Chief Scientist, Revolution Analytics In-Database Analytics Deep Dive with Teradata and Revolution R Tim Miller Partner Integration Lab, Teradata
  • 2. • Introduction • Revolution R Enterprise • Case Study – Global Internet Marketplace • Under the Hood • Summary & Questions Agenda
  • 3. • What data storage/management software do you use? > Hadoop > Teradata > LSF Clusters/Grids > Servers Please choose all that apply Poll Question #1
  • 4. What is R? • Most powerful statistical programming language – Flexible, extensible and comprehensive for productivity • Most widely used data analysis software – Used by 2M+ data scientists, statisticians and analysts • Create beautiful and unique data visualizations – As seen in New York Times, Twitter and Flowing Data • Thriving open-source community – Leading edge of analytics research • Fills the talent gap – New graduates prefer R [White paper: "R is Hot", bit.ly/r-is-hot]
  • 5. Exploding growth and demand for R • R is the highest paid IT skill > Dice.com, Jan 2014 • R is the most-used data science language after SQL > O’Reilly, Jan 2014 • R is used by 70% of data miners > Rexer, Sep 2013 • R is #15 of all programming languages > RedMonk, Jan 2014 • R is growing faster than any other data science language > KDnuggets, Aug 2013 • More than 2 million users worldwide [Chart: R Usage Growth, Rexer Data Miner Survey, 2007-2013. 70% of data miners report using R, and R is the first choice of more data miners than any other software. Source: www.rexeranalytics.com]
  • 6. Server Based vs. In-Database Architectures: Why Is Teradata Different? [Figure: Desktop and Server Analytic Architecture (Analyst, SQL Request, Sample Data, Results) vs. In-Database Analytic Architecture (Results). Each is illustrated with a credit-risk decision tree that splits on Income>$40K, Debt<10% of Income and Debt=0% into Good and Bad Credit Risks.] Exponential Performance Improvement
  • 7. Challenges Running R in Parallel • R is distributed across nodes or servers • Each instance runs independently of the other nodes/servers > Great for row-independent processing such as model scoring > However, for analytic functions requiring all the data, such as model building… – Onus is on the R programmer to understand data parallelism • Example: Median (Midpoint) of the values 1 1 1 1 2 9 1 7 9 3 9 9, stored three per node on four nodes > Node level: 1. Find the median per node (1, 2, 7, 9) 2. Consolidate and find the midpoint of the results: (2 + 7) / 2 = 4.5 3. Produce the wrong answer > System level: 1. Sort all the data (1 1 1 1 1 2 3 7 9 9 9 9) 2. Take the midpoint: (2 + 3) / 2 = 2.5 3. Produce the right answer
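The median example can be reproduced in a few lines of base R. This is a minimal sketch. It assumes the twelve values are grouped three per node as `{1 1 1}`, `{1 2 9}`, `{1 7 9}` and `{3 9 9}`, which is the grouping consistent with the per-node medians 1, 2, 7 and 9 on the slide. The `nodes` list is a hypothetical stand-in for data living on separate servers.

```r
# Twelve values spread over four hypothetical nodes (three rows per node)
nodes <- list(c(1, 1, 1), c(1, 2, 9), c(1, 7, 9), c(3, 9, 9))

# Node level: take the median on each node, then the median of those results
node_medians <- vapply(nodes, median, numeric(1))
median_of_medians <- median(node_medians)

# System level: gather every row first, then take the median once
system_median <- median(unlist(nodes))

cat("Per-node medians:", node_medians, "\n")
cat("Median of node medians (wrong):", median_of_medians, "\n")
cat("System-level median (right):", system_median, "\n")
```

The median of medians (4.5) differs from the true median (2.5) because a median cannot be rebuilt from per-node medians alone. A mean, by contrast, can be combined exactly from per-node sums and counts.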
  • 8. R Operations on Data • R operates on independent rows > Score models for a given observation > Parse a text field > Log(x) • R operates on independent partitions > Fit a model to a partition such as region, time, product or store • R operates on the entire data set > Global sales average > Regression on all customers [Figure: R Client, R Client, R Client]
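The three patterns on this slide can be illustrated locally in base R. This is a sketch on hypothetical simulated data. The `sales` table and its `region`, `price` and `units` columns are invented for illustration, and nothing here runs in-database. In-database, only the first two patterns can be run node by node without coordination. The third pattern needs all rows.

```r
# Hypothetical sales rows tagged with a region
set.seed(42)
sales <- data.frame(
  region = rep(c("East", "West"), each = 50),
  price  = runif(100, 1, 10)
)
sales$units <- 20 - 1.5 * sales$price + rnorm(100)

# 1. Independent rows: each output value depends only on its own input row
sales$log_price <- log(sales$price)

# 2. Independent partitions: fit one model per region, with no cross-talk
by_region <- split(sales, sales$region)
region_models <- lapply(by_region, function(d) lm(units ~ price, data = d))

# 3. Entire data set: a global statistic needs every row
global_avg_units <- mean(sales$units)

print(sapply(region_models, coef))
print(global_avg_units)
```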
  • 9. • What statistical programming tools do you use? > R/RRE > SAS > SPSS > Statistica > KXEN Please choose all that apply Poll Question #2
  • 10. Who is Revolution Analytics? Revolution Analytics
  • 11. OUR COMPANY The leading provider of advanced analytics software and services based on open source R, since 2007 OUR SOFTWARE The only Big Data, Big Analytics software platform based on the data science language R SOME KUDOS Visionary Gartner Magic Quadrant for Advanced Analytics Platforms, 2014
  • 12. Finance Insurance Healthcare & Pharma Digital Economy Analytics Service Providers Manufacturing & High Tech
  • 13. Revolution R Enterprise is the only big data big analytics platform based on open source R, the de facto statistical computing language for modern analytics • High Performance, Scalable Analytics • Portable Across Enterprise Platforms • Easier to Build & Deploy Analytics
  • 14. R: Open Source that Drives Innovation, but… It Has Some Limitations for Enterprises
    Area | Open source R | Revolution R Enterprise | Benefit
    Big Data | In-memory bound | Hybrid memory & disk scalability | Operates on bigger volumes & factors
    Speed of Analysis | Single threaded | Parallel threading | Shrinks analysis time
    Enterprise Readiness | Community support | Commercial support | Delivers full service production support
    Analytic Breadth & Depth | 5000+ innovative analytic packages | Leverage open source packages plus Big Data ready packages | Supercharges R
    Commercial Viability | Risk of deployment of open source | Commercial license | Eliminate risk with open source
  • 15. The Big Data Big Analytics Platform Introducing Revolution R Enterprise (RRE) DistributedR DevelopR DeployR ScaleR ConnectR • Big Data Big Analytics Ready > Enterprise readiness > High performance analytics > Multi-platform architecture > Data source integration > Development tools > Deployment tools
  • 16. The Platform Step by Step: R Capabilities R+CRAN • Open source R interpreter • UPDATED R 3.1.1 • Freely-available R algorithms • Algorithms callable by RevoR • Embeddable in R scripts • 100% Compatible with existing R scripts, functions and packages RevoR • Based on open source R • Adds high-performance math Available On: • Teradata Database • Hortonworks Hadoop • Cloudera Hadoop • MapR Hadoop • IBM Platform LSF Linux • Microsoft HPC Clusters • Windows & Linux Servers • Windows & Linux Workstations
  • 17. The Platform Step by Step: Tools & Deployment • DeployR > Web services software development kit for integrating analytics via Java, JavaScript or .NET APIs > Integrates R into application infrastructures > Capabilities: – Invokes R scripts from web services calls – RESTful interface for easy integration – Works with web & mobile apps, leading BI & visualization tools and business rules engines • DevelopR > Integrated development environment for R > Visual ‘step-into’ debugger > Based on Visual Studio Isolated Shell > Available on: Windows
  • 18. DevelopR - Integrated Development Environment • Script with type-ahead and code snippets • Solutions window for organizing code and data • Packages installed and loaded • Objects loaded in the R environment • Object details • Sophisticated debugging with breakpoints, variable values, etc.
  • 19. DeployR - Integration with 3rd Party Software • Seamless – Bring the power of R to any web enabled application • Simple – Leverage common APIs including JS, Java, .NET • Scalable – Robustly scale user and compute workloads • Secure – Manage enterprise security with LDAP & SSO Data Analysis Business Intelligence Mobile Web Apps Cloud / SaaS R / Statistical Modeling Expert DeployR Deployment Expert
  • 20. The Platform Step by Step: Parallelization & Data Sourcing ConnectR • High-speed & direct connectors Available for: • High-performance XDF • SAS, SPSS, delimited & fixed format text data files • Hadoop HDFS (text & XDF) • Teradata Database • ODBC ScaleR • Ready-to-Use high-performance big data big analytics • Fully-parallelized analytics • Data prep & data distillation • Descriptive statistics & statistical tests • Correlation & covariance matrices • Predictive Models – linear, logistic, GLM • Machine learning • Monte Carlo simulation • Tools for distributing customized algorithms across nodes DistributedR • Distributed computing framework • Delivers portability across platforms Available on: • Teradata Database • Hortonworks / Cloudera / MapR • Windows Servers / HPC Clusters • IBM Platform LSF Linux Clusters • Red Hat Linux Servers • SuSE Linux Servers
  • 21. Revolution R Enterprise ScaleR: High Performance Big Data Analytics Data Prep, Distillation & Descriptive Analytics R Data Step Descriptive Statistics Statistical Tests Sampling • Data import – Delimited, Fixed, SAS, SPSS, ODBC • Variable creation & transformation using any R functions and packages • Recode variables • Factor variables • Missing value handling • Sort • Merge • Split • Aggregate by category (means, sums) • Min / Max • Mean • Median (approx.) • Quantiles (approx.) • Standard Deviation • Variance • Correlation • Covariance • Sum of Squares (cross product matrix) • Pairwise Cross tabs • Risk Ratio & Odds Ratio • Cross-Tabulation of Data • Marginal Summaries of Cross Tabulations • Chi Square Test • Kendall Rank Correlation • Fisher’s Exact Test • Student’s t-Test • Subsample (observations & variables) • Random Sampling
  • 22. Revolution R Enterprise ScaleR (continued) Predictive Models • Covariance/Correlation/Sum of Squares/Cross-product Matrix • Multiple Linear Regression • Logistic Regression • Generalized Linear Models (GLM) - All exponential family distributions: binomial, Gaussian, inverse Gaussian, Poisson, Tweedie. Standard link functions including: cauchit, identity, log, logit, probit. - User defined distributions & link functions. • Classification & Regression Trees and Forests • Gradient Boosted Trees • Residuals for all models • Histogram • ROC Curves (actual data and predicted values) • Lorenz Curve • Line and Scatter Plots • Tree Visualization Data Visualization Variable Selection • Stepwise Regression • Linear • Logistic • GLM • Monte Carlo • Run open source R functions and packages across cores and nodes Cluster Analysis • K-Means Classification & Regression • Decision Trees • Decision Forests • Gradient Boosted Trees • Prediction (scoring) • PMML Export Simulation and HPC Deployment Statistical Modeling Machine Learning
  • 23. DistributedR ScaleR ConnectR DeployR Write Once…Deploy Anywhere. DESIGNED FOR SCALE, PORTABILITY & PERFORMANCE In the Cloud Amazon AWS Workstations & Servers Windows Linux Clustered Systems IBM Platform LSF Microsoft HPC Hadoop Hortonworks, Cloudera, MapR EDW Teradata Database
  • 24. • Challenge: Model and score 250M customers • Server-based workflow was taking 3 days • Move calculation in-database to drastically reduce runtime, process twice as many customers, and increase lift Case Study - Global Internet Marketplace
  • 25. • Binomial Logistic Regression > 50+ Independent variables including categorical with indicator variables > Train from small sample (many thousands) – not a problem in and of itself > Scoring across entire corpus (many hundred millions) – slightly more challenging Existing Open Source R model
  • 26. • Same Binomial Logistic Regression > 50+ Independent variables including categorical with indicator variables > Train from large sample (many millions) – more accurately captures user patterns and increases lift > Scoring across entire corpus (many hundred millions) – completes in minutes Revolution R Enterprise model
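Scoring is row-independent, so a model trained on a sample can score the full corpus one chunk at a time without changing the result. This is a minimal base-R sketch of that idea. The simulated `corpus`, its columns, the 2,000-row training sample and the 2,500-row chunk size are all hypothetical. It uses open source `glm`/`predict` rather than RRE's `rxGlm`/`rxPredict`.

```r
set.seed(1)
n <- 10000

# Hypothetical corpus with numeric and categorical predictors
corpus <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  segment = factor(sample(c("a", "b", "c"), n, replace = TRUE))
)
eta <- -0.5 + 1.2 * corpus$x1 - 0.8 * corpus$x2 + ifelse(corpus$segment == "b", 0.7, 0)
corpus$y <- rbinom(n, 1, plogis(eta))

# Train a binomial logistic regression on a random sample of rows
train_rows <- sample(n, 2000)
model <- glm(y ~ x1 + x2 + segment, data = corpus[train_rows, ], family = binomial)

# Score every row, one 2,500-row chunk at a time (chunks keep the original row order)
chunk_id <- ceiling(seq_len(n) / 2500)
scores <- unlist(
  lapply(split(corpus, chunk_id),
         function(chunk) predict(model, newdata = chunk, type = "response")),
  use.names = FALSE
)
```

Because each predicted probability depends only on its own row, the chunks could just as well be scored on different nodes.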
  • 27. RRE Used to Optimize the Current Process • By moving the compute to the data • Before vs. After: reduced a 3-day process to 10 minutes
  • 28. Benchmarking the Optimized Process [Figure: Scaling study, time vs. number of rows, Server-based (Not In-DB) vs. In-DB] NOTE: • Teradata Environment > 4 node, 1700 Appliance • RRE Environment > version 7.2, > R 3.0.2
  • 29. Optimization process: Recode Open Source R to Revolution R Enterprise
    • Before
      trainit <- glm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxit = iters)
      fits <- predict(trainit, newdata = test.data, type = 'response')
    • After
      trainit <- rxGlm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxIterations = iters)
      # rxPredict takes the scoring data through its 'data' argument, not 'newdata'
      fits <- rxPredict(trainit, data = test.data, type = 'response')
  • 30. Revolution R Enterprise: How RRE ScaleR Actually Works
  • 31. Revolution R Enterprise: RevoR - Performance Enhanced R
    Computation (4-core laptop) | Open Source R | Revolution R | Speedup
    Linear Algebra (1)
      Matrix Multiply | 176 sec | 9.3 sec | 18x
      Cholesky Factorization | 25.5 sec | 1.3 sec | 19x
      Linear Discriminant Analysis | 189 sec | 74 sec | 3x
    General R Benchmarks (2)
      R Benchmarks (Matrix Functions) | 22 sec | 3.5 sec | 5x
      R Benchmarks (Program Control) | 5.6 sec | 5.4 sec | Not appreciable
    Customers report 3-50x performance improvements compared to Open Source R, without changing any code.
    1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php
    2. http://r.research.att.com/benchmarks/
  • 32. Across Cores and Nodes Scalable and Parallelized
  • 33. Scalability and Portability of PEMAs ("Parallel External Memory Algorithms") • Anatomy of a PEMA: 1) Initialize, 2) Process Chunk, 3) Aggregate, 4) Finalize • Process a chunk of data at a time, giving linear scalability • Process an unlimited number of rows of data in a fixed amount of RAM • Independent of the “compute context” (number of cores, computers, distributed computing platform), giving portability across these dimensions • Independent of where the data is coming from, giving portability with respect to data sources
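The four PEMA phases can be sketched in plain R for ordinary least squares. Each chunk contributes its cross-product matrix X'X and vector X'y, and these add up across chunks. This is an illustrative sketch, not RevoScaleR's internal code. The `pema_*` function names are hypothetical, and the built-in `mtcars` data is split into 8-row chunks to stand in for blocks read from disk or from separate nodes.

```r
# Hypothetical PEMA-style linear regression
pema_initialize <- function(p) {
  list(xtx = matrix(0, p, p), xty = numeric(p), n = 0)
}

# Process one chunk: only this chunk needs to be in memory
pema_process_chunk <- function(chunk, formula) {
  mf <- model.frame(formula, chunk)
  X <- model.matrix(formula, mf)
  y <- model.response(mf)
  list(xtx = crossprod(X), xty = drop(crossprod(X, y)), n = nrow(X))
}

# Aggregate two partial results; addition is associative, so order does not matter
pema_aggregate <- function(a, b) {
  list(xtx = a$xtx + b$xtx, xty = a$xty + b$xty, n = a$n + b$n)
}

# Finalize: solve the normal equations (X'X) beta = X'y
pema_finalize <- function(state) {
  drop(solve(state$xtx, state$xty))
}

f <- mpg ~ wt + hp
chunks <- split(mtcars, ceiling(seq_len(nrow(mtcars)) / 8))
partials <- lapply(chunks, pema_process_chunk, formula = f)
state <- Reduce(pema_aggregate, partials, pema_initialize(3))
beta <- pema_finalize(state)
print(beta)
```

Memory use depends on the chunk size and the number of coefficients, not on the number of rows. The aggregate step does not care where each chunk was processed, which is the source of the portability described on the slide. The coefficients match a single in-memory `lm` fit on the same data.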
  • 34. • Efficient computational algorithms • Efficient memory management – minimize data copying and data conversion • Heavy use of C++ templates; optimal code • Efficient data file format; fast access by row and column • Models are pre-analyzed to detect and remove duplicate computations and points of failure (singularities) • Handle categorical variables efficiently ScaleR Performance
  • 35. Speed and Scalability Comparison • Unique PEMAs: Parallel, external-memory algorithms • High-performance, scalable replacements for R/SAS analytic functions • Parallel/distributed processing eliminates CPU bottleneck • Data streaming eliminates memory size limitations • Works with in-memory and disk-based architectures
  • 36. In-Database Billion-Row Logistic Regression
    • 114 seconds on a Teradata 2650 (6 nodes, 72 cores), including the time to read the data
    • Scales linearly with the number of rows
    • Scales linearly with the number of nodes: 3x faster than on a 2-node Teradata system
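The figures on this slide can be put in perspective with a little arithmetic. The sketch reuses only the slide's numbers (10^9 rows, 114 seconds, 72 cores, 6 versus 2 nodes); the derived quantities are estimates, not additional benchmark results, and the variable names are illustrative.

```r
# Figures taken from the slide.
rows <- 1e9          # rows in the logistic regression
secs_6node <- 114    # seconds on the 6-node, 72-core Teradata 2650
cores <- 72
nodes <- 6

# Derived estimates (not measurements).
throughput <- rows / secs_6node        # rows processed per second, whole system
per_core <- throughput / cores         # rows per second per core
implied_2node_secs <- secs_6node * 3   # from "3x faster than on 2 nodes"
linear_prediction <- nodes / 2         # speedup expected if time scales linearly with nodes

round(throughput)      # about 8.77 million rows per second
round(per_core)        # about 122 thousand rows per second per core
implied_2node_secs     # 342
linear_prediction      # 3
```

The reported 3x speedup matches the 6/2 = 3 ratio that perfectly linear scaling in the number of nodes would predict.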
  • 37. Allstate compares SAS, Hadoop, and R for Big-Data Insurance Models
    Approach                   Platform                          Time to fit
    SAS                        16-core Sun Server                5 hours
    rmr/MapReduce              10-node 80-core Hadoop Cluster    > 10 hours
    R                          250 GB Server                     Impossible (> 3 days)
    Revolution R Enterprise    In-Teradata on 6-node 2650        3.3 minutes
    Generalized linear model, 150 million observations, 70 degrees of freedom
    http://blog.revolutionanalytics.com/2012/10/allstate-big-data-glm.html
  • 38. Poll Question #3
    • At what stage are you in your in-database analytics deployment project?
      > Still researching tools and methods
      > Evaluating/Selecting data storage/management platform
      > Evaluating/Selecting analytics programming tools
      > Launched the project/working on it now
      > We’re done and looking for another one!
    Please select one answer
  • 39. RRE End-User’s Perspective
    • Revolution R Enterprise has a new "data source", RxTeradata (ODBC and TPT):
        # Change the data source if necessary
        tdConn <- "DRIVER=…; IP=…; DATABASE=…; UID=…; PWD=…"
        teradataDS <- RxTeradata(table="…", connectionString=tdConn, …)
    • Revolution R Enterprise has a new "compute context", RxInTeradata:
        # Change the "compute context"
        tdCompute <- RxInTeradata(connectionString=..., shareDir=..., remoteShareDir=..., revoPath=..., wait=..., consoleOutput=...)
        rxSetComputeContext(tdCompute)
    • Sample code for R logistic regression:
        # Specify model formula and parameters
        rxLogit(ArrDelay>15 ~ Origin + Year + Month + DayOfWeek + UniqueCarrier + F(CRSDepTime), data=teradataDS)
  • 40. Table Operators – Teradata 14.10+
    • Table User Defined Functions (UDFs) allow users to place a function in the FROM clause of a SELECT statement
    • Table Operators extend the existing table UDF capability:
      > Table Operators are object-oriented
        – Inputs and outputs can be arbitrary, rather than "fixed" as table UDFs require
      > Table Operators have a simpler row-iterator interface
        – The interface simply produces output rows, a more natural application development interface than table UDFs
      > Table Operators operate on a stream of rows
        – Rows are buffered for high performance, eliminating row-at-a-time processing
      > Table Operators support PARTITION BY and ORDER BY
        – This allows the development of MapReduce-style operators in-database
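The PARTITION BY / ORDER BY contract can be mimicked in R to show what an operator sees: each invocation receives the rows of one partition, already sorted, and emits whatever output rows it chooses. This is an analogy only; real table operators are built with Teradata's own UDF programming interfaces, which are not shown here, and apply_table_operator and running_total_op are hypothetical names.

```r
# Hypothetical operator: receives one partition's rows, sorted by the
# ORDER BY column, and returns output rows whose shape it chooses itself
# (here, one row per input row carrying a running total).
running_total_op <- function(rows) {
  data.frame(key = rows$key, ts = rows$ts, running = cumsum(rows$amount))
}

# Hypothetical driver emulating "PARTITION BY <partition_by> ORDER BY <order_by>":
# split the input by key (the MapReduce-style shuffle), sort each partition,
# apply the operator, and union the results.
apply_table_operator <- function(df, op, partition_by, order_by) {
  parts <- split(df, df[[partition_by]])
  out <- lapply(parts, function(p) op(p[order(p[[order_by]]), , drop = FALSE]))
  result <- do.call(rbind, out)
  rownames(result) <- NULL
  result
}

# Usage: running totals per key, in timestamp order.
sales <- data.frame(key = c("a", "b", "a", "b", "a"),
                    ts = c(3, 1, 1, 2, 2),
                    amount = c(10, 5, 1, 7, 4))
res <- apply_table_operator(sales, running_total_op, "key", "ts")
res   # running totals: 1, 5, 15 for key "a"; 5, 12 for key "b"
```

Because the operator only ever sees one partition, partitions can be processed independently and in parallel, which is what makes the map/reduce style possible.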
  • 41. RRE Architecture in Teradata 14.10+
    [Diagram: a Master Process in the PE layer exchanges requests and responses over the Message Passing Layer with Worker Processes in the AMP layer, each working on its own data partition. All communication is done by binary BLOBs.]
    1. RRE commands are sent to a "Master Process", an External Stored Procedure (XSP) in the Parsing Engine (PE) that provides parallel coordination.
    2. RRE analytics are split into "Worker Process" tasks that run in a Table Operator (TO) on every AMP.
       a. HPA (high-performance analytics) functions iterate over the data; intermediate results are analyzed and managed by the XSP.
       b. HPC (high-performance computing) tasks do not iterate; final results from each AMP are returned to the XSP.
    3. Final combined results are assembled by the XSP and returned to the user.
        tdConnect <- RxTeradata(<data, connection string, …>)
        tdCompute <- RxInTeradata(<data, server arguments, …>)
    ** PUT-based installer
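The iterative pattern in step 2a, where workers make a pass over their data and the master combines the intermediate results and decides whether to iterate again, can be sketched for logistic regression with iteratively reweighted least squares (IRLS). Assumptions: each element of partitions stands in for one AMP's data, worker_pass and master_logit are hypothetical names, and the plain R loop stands in for the message passing between the XSP and the table operators. This is not RRE's rxLogit implementation.

```r
# Hypothetical worker task: one pass over a local data partition, returning
# the partial sums X'WX and X'Wz for the coefficient vector sent by the master.
worker_pass <- function(X, y, beta) {
  eta <- drop(X %*% beta)
  mu <- plogis(eta)                  # fitted probabilities
  w <- mu * (1 - mu)                 # IRLS weights
  z <- eta + (y - mu) / w            # working response
  list(xtwx = crossprod(X, X * w),   # X'WX (row i of X scaled by w[i])
       xtwz = drop(crossprod(X, w * z)))
}

# Hypothetical master: broadcasts beta, sums the workers' partial results,
# solves for the new beta, and repeats until the coefficients stop changing.
master_logit <- function(partitions, p, tol = 1e-10, max_iter = 25) {
  beta <- numeric(p)
  for (iter in seq_len(max_iter)) {
    parts <- lapply(partitions, function(d) worker_pass(d$X, d$y, beta))
    xtwx <- Reduce(`+`, lapply(parts, `[[`, "xtwx"))
    xtwz <- Reduce(`+`, lapply(parts, `[[`, "xtwz"))
    beta_new <- solve(xtwx, xtwz)
    if (max(abs(beta_new - beta)) < tol) {
      return(list(coef = beta_new, iter = iter))
    }
    beta <- beta_new
  }
  list(coef = beta, iter = max_iter)
}

# Usage: simulated data spread over four hypothetical "AMP" partitions.
set.seed(42)
n <- 4000
X <- cbind(1, rnorm(n), rnorm(n))
y <- rbinom(n, 1, plogis(drop(X %*% c(-0.5, 1, -2))))
amp <- rep(1:4, length.out = n)
partitions <- lapply(split(seq_len(n), amp),
                     function(i) list(X = X[i, , drop = FALSE], y = y[i]))
fit <- master_logit(partitions, p = ncol(X))
fit$coef
```

Each iteration costs one pass over every partition, but only a p × p matrix and a length-p vector travel back to the master, so message size does not depend on the number of rows. A non-iterative (HPC-style) task would correspond to a single call to the worker function with results returned once.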
  • 42. Summary
    • High-performance, scalable, portable, fully-featured algorithms
    • Integration with the R ecosystem
    • Compatibility with the Big Data ecosystem
  • 43. Questions?
    PARTNERS Mobile App | InfoHub Kiosks | teradata-partners.com
    We love feedback: please rate this session.
    Resources for you (available on RevolutionAnalytics.com):
    • White Paper: Teradata and Revolution Analytics: For the Big Data Era, An Analytics Revolution
    • Webinar: Big Data Analytics with Teradata and Revolution Analytics
  • 44. Thank You!
    www.RevolutionAnalytics.com
    www.Teradata.com