1. Advanced Analytics with R and SQL
Stéphane Fréchette
Data Platform Solution Architect
Twitter: @sfrechette
3.
• 1999: SQL Server enables data mining using SSAS
• 2004: Computers work on users' behalf, filtering junk email
• 2005: Microsoft search engine built with machine learning
• 2008: Bing Maps ships with an ML traffic-prediction service
• 2010: Microsoft Kinect can watch users' gestures
• 2012: Successful, real-time, speech-to-speech translation
• 2014: Microsoft launches Azure Machine Learning
• 2015: Microsoft launches R Server for scalable, enterprise-grade analytics
• 2016: SQL Server 2016 supports advanced analytics in-database using R
I believe over the next decade computing will become even more ubiquitous and
intelligence will become ambient. This will be made possible by an ever-growing network of
connected devices, incredible computing capacity from the cloud, insights from big data, and
intelligence from machine learning.
Machine learning is pervasive throughout Microsoft products.
8. R Usage Growth
Rexer Data Miner Survey, 2007-2015
Language Popularity
IEEE Spectrum Top Programming Languages, 2015
• 76% of analytic professionals report using R
• 36% select R as their primary tool
9. History of R
• R is an open source (GNU) version of the S language, developed by John Chambers et al. at Bell Labs in the 1970s and 1980s
• R was initially written in the early 1990s by Robert Gentleman and Ross Ihaka, then of the Statistics Department at the University of Auckland
• R is administered and controlled by the R Foundation
• Microsoft is a founding member and Platinum sponsor of the R Consortium
R Reference Card from CRAN
10. Open source “lingua franca” for analytics, computing, and modeling
CRAN Task View by Barry Rowlingson: http://www.maths.lancs.ac.uk/~rowlings/R/TaskViews/
More packages on GitHub and in the Bioconductor project
11. Works With Open Source R
Enterprise Scale & Performance
– Scales from workstations to large clusters
– Scales to large data sizes
– Growing portfolio of parallelized algorithms
Secure, Scalable R Deployment/Operationalization
Write once, deploy anywhere across multiple platforms
– RDBMS: SQL Server & Teradata
– Windows, Linux: Red Hat & SUSE
– Hadoop: Hortonworks, Cloudera, MapR
– Cloud: Azure VMs, Azure HDInsight
R Tools for Visual Studio IDE
Microsoft R Open, Microsoft R Server, DeployR, RTVS
12. Microsoft R Server
• Microsoft R Server for Red Hat Linux
• Microsoft R Server for SUSE Linux
• Microsoft R Server for Teradata DB
• Microsoft R Server for Hadoop on Red Hat
13. Microsoft R Open, Microsoft R Server, DeployR, RTVS
ConnectR
• High-speed & direct
connectors
Available for:
• High-performance XDF
• SAS, SPSS, delimited & fixed
format text data files
• Hadoop HDFS (text & XDF)
• Teradata Database & Aster
• EDWs and ADWs
• ODBC
ScaleR
• Ready-to-use, high-performance big data, big analytics
• Fully-parallelized analytics
• Data prep & data distillation
• Descriptive statistics & statistical tests
• Range of predictive functions
• User tools for distributing customized R algorithms
across nodes
• Wide data sets supported – thousands of variables
DistributedR
• Distributed computing framework
• Delivers cross-platform portability
R+CRAN
• Open source R interpreter
• R 3.1.2
• Huge range of freely available R algorithms
• Algorithms callable by RevoR
• Embeddable in R scripts
• 100% compatible with existing R scripts, functions and packages
Microsoft R Open
• Based on open source R
• High-performance math library
to speed up linear algebra
functions
• Checkpoint package to easily
share R code and replicate
results using specific R
package versions
DeployR
• RESTful APIs for easy
integration from Java,
JavaScript, .NET
• Enterprise authentication &
security
• Horizontal scaling
R Tools for Visual Studio
• State-of-the-art IDE for R in Visual Studio
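The Microsoft R Open entry above mentions a high-performance math library that speeds up linear algebra functions. As a hedged illustration using only base R (no Microsoft-specific packages assumed), the sketch below exercises the kinds of calls such a library accelerates: matrix products through `crossprod()` and a linear solve through `solve()`, which R normally hands to the underlying BLAS/LAPACK library. The same script runs unchanged on CRAN R or Microsoft R Open; only the speed should differ. The data is simulated.

```r
# Simulated regression problem: y = X %*% beta + noise
set.seed(42)
n <- 2000
p <- 50
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
beta <- rnorm(p)
y <- X %*% beta + rnorm(n, sd = 0.1)

# Normal equations: (X'X) b = X'y
# crossprod(X) computes t(X) %*% X and crossprod(X, y) computes t(X) %*% y;
# these, like solve(), are normally delegated to BLAS/LAPACK routines.
XtX <- crossprod(X)
Xty <- crossprod(X, y)
b_hat <- solve(XtX, Xty)

# Time the matrix product; run the script on different R builds to compare
timing <- system.time(for (i in 1:10) crossprod(X))
print(timing)
```

Because the acceleration happens below the R level, no code changes are needed to benefit from it.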
14. Demo: Intro to R
What is R?
R Language Resources
R Tools for Visual Studio (RTVS)
RTVS Video
Microsoft R Server
R Sample Programs
16. Three key ingredients to build an intelligent application:
• Ingest: relevant data available in real time
• Query: all relevant data available in real time
• Analytics: all relevant data available for analytics in real time
Prepare, Model, Operationalize
19. Working from my R IDE on my workstation, I can execute an R script that runs in-database, and get the
results back.
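Diagram: the script goes from the R IDE on the Data Scientist Workstation (Microsoft R Open, Microsoft R Server) to SQL Server 2016, executes in-database through the Advanced Analytics Extensions (Microsoft R Open, Microsoft R Server), and the results come back to the workstation.
library(RevoScaleR)  # ships with Microsoft R Server / SQL Server R Services
sqlConnString <- "Driver=SQL Server;Server=myServer;Database=myDB;Trusted_Connection=True"  # hypothetical server and database
sqlCompute <- RxInSqlServer(connectionString = sqlConnString)
rxSetComputeContext(sqlCompute)  # subsequent rx* calls run inside SQL Server
trainData <- RxSqlServerData(table = "dbo.TrainData", connectionString = sqlConnString)  # hypothetical table
linModObj <- rxLinMod(y ~ x1 + x2, data = trainData)  # hypothetical columns y, x1, x2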
20. I can call a T-SQL system stored procedure from my application and have it trigger R script execution in-database. Results (predictions, plots, etc.) are then returned to my application.
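Diagram: the Application calls a system stored procedure on SQL Server 2016; the stored procedure contains R code and executes in-database through the Advanced Analytics Extensions (Microsoft R Open, Microsoft R Server); results (scores, plots) are returned to the application.
-- Requires: EXEC sp_configure 'external scripts enabled', 1; RECONFIGURE;
EXEC sp_execute_external_script
    @language = N'R'
  , @script = N'OutputDataSet <- InputDataSet;'  -- R code goes here
  , @input_data_1 = N'SELECT 1 AS col;'
WITH RESULT SETS ((col INT));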
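Inside `sp_execute_external_script`, SQL Server passes the result of the `@input_data_1` query to the R script as a data frame named `InputDataSet` and returns whatever data frame the script assigns to `OutputDataSet` (these are the default names). A minimal sketch, assuming you want to test the R body locally before pasting it into `@script`: a hand-built data frame stands in for `InputDataSet`. The column names and values are hypothetical.

```r
# Local stand-in for the data frame SQL Server would pass in from @input_data_1.
# Columns and values are hypothetical, for illustration only.
InputDataSet <- data.frame(
  speed = c(4, 7, 8, 9, 10, 12),
  dist  = c(2, 4, 16, 10, 18, 24)
)

# ---- Begin R body that would go inside @script ----
model <- lm(dist ~ speed, data = InputDataSet)
OutputDataSet <- data.frame(
  speed          = InputDataSet$speed,
  predicted_dist = as.numeric(predict(model, newdata = InputDataSet))
)
# ---- End R body ----

print(OutputDataSet)
```

Once the body behaves as expected locally, it can be embedded in `@script`, with a `WITH RESULT SETS` clause that describes the two output columns.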
22. Bring compute to data with in-database analytics
Operationalize R scripts and models
• The SQL Server 2016 extensibility mechanism allows secure execution of R scripts on SQL Server
• Use familiar T-SQL stored procedures to invoke R scripts from your application
• Embed the returned predictions and plots in your application
Enterprise performance and scale
• Use SQL Server's in-memory querying and columnstore indexes
• Leverage RevoScaleR support for large datasets and parallel algorithms with SQL Server 2016 Enterprise Edition
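One pattern for operationalizing a model with SQL Server R Services is to train once, serialize the model object to raw bytes, and store those bytes in a table (for example, a `varbinary(max)` column); a scoring stored procedure later deserializes the model and calls `predict()`. The sketch below shows only the R side of that round trip with base R's `serialize()` and `unserialize()`. The storage table is assumed and not shown, the built-in `mtcars` data stands in for a SQL Server table, and the new rows are hypothetical.

```r
# 1. Train: logistic model of transmission type (am) on the built-in mtcars data
model <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# 2. Serialize to a raw vector; these bytes are what a varbinary(max)
#    column would hold (the INSERT into SQL Server is not shown)
model_bytes <- serialize(model, connection = NULL)

# 3. Later, in a scoring script: restore the model from the stored bytes
restored <- unserialize(model_bytes)

# 4. Score new rows (hypothetical cars): probability of a manual transmission
new_cars <- data.frame(hp = c(110, 245), wt = c(2.6, 3.8))
scores <- predict(restored, newdata = new_cars, type = "response")
print(scores)
```

Keeping training and scoring separate this way lets the application call a lightweight scoring procedure without retraining on every request.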
24. Microsoft R Server
Big-data analytics and distributed computing on Linux,
Hadoop and Teradata
SQL Server 2016
R Services
Big-data analytics integrated with the SQL Server database
Visual Studio
R Tools for Visual Studio: integrated development
environment for R
R Sample Programs
GitHub repository of data and samples for learning the capabilities
of open source R and Microsoft R Server
SQL Server 2016
Learn about the full suite of capabilities in the latest version
of SQL Server