The Polyglot
Data Scientist
Adventures with R, Python, and SQL
Audience Survey
• How many here have used:
– SQL?
– Python?
– R?
• What job titles do people have?
What We Won’t Cover
• Theories behind data science and machine learning
• Deep dive into Python
• Deep dive into R
• Deep dive into SQL Server
Azure Support
There is a data science VM available on
Azure. It won’t be covered in this
presentation.
See https://docs.microsoft.com/en-us/sql/advanced-analytics/getting-started-with-machine-learning-services for details.
What We Will Cover
• The Problem with Being a Polyglot
• What SQL Server + R or SQL Server + Python Solves
• A Glance at these in Action
Not a Microsoft sales person…
• Microsoft MVP in
Visual Studio
• Been into exploring
data most of my life
• Been in tech over 20
years
• Practitioner and
hobbyist, not
researcher
Sample Problem: Sensor Data
• Domain: House of Sadukie
• Problem: Temperature data is
stored miserably
• Goal: Display data in a
visualization that makes sense
Current Outcome – via MySQL & R
Polyglot
Knowing or using several languages
SQL Server
Data Scientist
A person employed to analyze and
interpret complex digital data, such as
the usage statistics of a website,
especially in order to assist a business
in its decision-making
Multi-Faceted Data Science
• Various categories:
– Statistics – modeling, sampling, clustering, reduction
– Mathematics – NSA, astronomers, military
– Data engineering – database/memory/file optimization, Hadoop, data flows
– Machine learning and algorithms
– Business – ROI optimization, decision sciences
– Software engineering – primarily polyglots in production code
– Visualization
– Spatial
Source: https://www.datasciencecentral.com/profiles/blogs/six-categories-of-data-scientists
The Problem with Being a Polyglot
• Understanding strengths and weaknesses of the languages
• Knowing which language is appropriate for what situation
Multiple tools…
multiple solutions…
how many
programs do I
have to use?!?
And wouldn’t it be
awesome if I could
use one tool to do
most of the work?
What R and Python Have to Offer
for SQL
• Libraries specialized to handle data science domain problems
including:
– Visualization
– Data exploration
– Statistical and Mathematical Analysis
– Trending
– Regression
• Libraries + Data right from the source = quicker exploratory analysis
• Python and R are great at working from one large table and branching off in
different directions
– Which can inspire additional analyses
Sample Problem: Sensor Data
• Number of rows: 400k+
• 1 Table
• Questions to look into:
– What are temperature trends over
time?
– When are sensors going offline?
– What temperatures look spot on?
– What sensors are wavering in reads
and showing inconsistencies?
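Two of the questions above (offline sensors and wavering reads) can be sketched in a few lines. This is a minimal sketch in plain Python, not the code used in the talk: the sample readings, the 30-minute gap threshold, and the 2-degree spread threshold are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from statistics import pstdev

# Hypothetical readings: (sensor_id, timestamp, temperature in degrees F).
READINGS = [
    ("attic", datetime(2018, 1, 1, 0, 0), 61.0),
    ("attic", datetime(2018, 1, 1, 0, 10), 61.2),
    ("attic", datetime(2018, 1, 1, 1, 30), 60.9),  # long silence before this read
    ("porch", datetime(2018, 1, 1, 0, 0), 40.0),
    ("porch", datetime(2018, 1, 1, 0, 10), 55.0),
    ("porch", datetime(2018, 1, 1, 0, 20), 38.0),
]


def group_by_sensor(readings):
    """Return {sensor_id: [(timestamp, temp), ...]} with each series sorted by time."""
    grouped = {}
    for sensor, ts, temp in readings:
        grouped.setdefault(sensor, []).append((ts, temp))
    for series in grouped.values():
        series.sort()
    return grouped


def offline_gaps(series, max_gap=timedelta(minutes=30)):
    """Return (last_seen, next_seen) pairs where consecutive reads exceed max_gap."""
    return [(a[0], b[0]) for a, b in zip(series, series[1:]) if b[0] - a[0] > max_gap]


def is_wavering(series, max_stdev=2.0):
    """Flag a sensor whose reads spread more than max_stdev degrees (assumed threshold)."""
    return pstdev([temp for _, temp in series]) > max_stdev


grouped = group_by_sensor(READINGS)
for sensor, series in grouped.items():
    print(sensor, "gaps:", offline_gaps(series), "wavering:", is_wavering(series))
```

With 400k+ rows the same grouping would more likely be done in SQL or with a data frame library; the logic stays the same.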
Bringing the Computation
to the Data
Advanced Analytics
in
SQL Server 2016/2017
• SQL Server 2016
– SQL Server R Services (In-Database)
• SQL Server 2017
– SQL Server Machine Learning Services (In-Database)
– Python support added alongside R
Sample Problem: Sensor Data
• Possible Strategy:
– Use SQL to gather the data into a
dataset with as much data to observe
as possible.
– Use Python or R to manipulate the
data results and allow for easy analysis
and substantial predictions based on
observations.
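The two-step strategy above can be sketched as follows. This is a minimal sketch under stated assumptions: sqlite3 stands in for SQL Server so it runs anywhere, and the table layout (sensor_id, read_at, temp_f) is hypothetical rather than the schema from the talk.

```python
import sqlite3
from statistics import mean

# sqlite3 is a stand-in for SQL Server; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sensors (sensor_id TEXT, read_at TEXT, temp_f REAL)")
conn.executemany(
    "INSERT INTO Sensors VALUES (?, ?, ?)",
    [
        ("attic", "2018-01-01 00:00", 61.0),
        ("attic", "2018-01-01 12:00", 65.0),
        ("attic", "2018-01-02 00:00", 59.0),
        ("porch", "2018-01-01 00:00", 40.0),
        ("porch", "2018-01-02 00:00", 36.0),
    ],
)

# Step 1 (SQL): gather one result set with everything worth observing.
rows = conn.execute(
    "SELECT sensor_id, substr(read_at, 1, 10) AS day, temp_f "
    "FROM Sensors ORDER BY sensor_id, read_at"
).fetchall()

# Step 2 (Python): reshape the result set and analyze it.
by_sensor_day = {}
for sensor_id, day, temp_f in rows:
    by_sensor_day.setdefault((sensor_id, day), []).append(temp_f)

daily_mean = {key: mean(temps) for key, temps in by_sensor_day.items()}
for (sensor_id, day), value in sorted(daily_mean.items()):
    print(sensor_id, day, value)
```

SQL does what it is good at (selecting and ordering), and Python picks up from a single wide result set.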
Not Just Windows!
R Server for Windows
R Server for Linux
- CentOS
- RHEL
- Ubuntu
- SUSE
R Server for Hadoop – cluster in the cloud
R Server for Teradata – not as Machine Learning
Server
SQL Server as our Base
R and/or Python on Top
Additional pieces provided by Microsoft:
Microsoft Machine Learning Services, RevoScaleR, revoscalepy
Microsoft
Machine Learning
Services
Machine Learning Services in SQL
Server
• Allows integration of other languages in SQL Server
– SQL Server 2016 can work with R
– SQL Server 2017 introduces Python support
• Scalable in that you can develop and test on a single machine
and then deploy to distributed or parallel processing platforms.
Platforms include:
– SQL Server on Windows
– Hadoop
– Spark
SQL Server Machine Learning
Services (In-Database)
• SQL Server R Services (In-Database) started in SQL Server 2016
• With SQL Server 2017, SQL Server Machine Learning Services (In-
Database) allows us to use R and Python within SQL Server
• Do not need to open IDE and SQL tools to accomplish the work –
no context switching needed!
• Can call libraries from Python or R to process data right within
SQL
Python vs R?
• SQL Server 2016? R
• SQL Server 2017? R and/or Python
• What are you familiar with?
• Look at tutorials – what makes sense?
• What features do you need and how are they supported by
Microsoft ML?
Python Support
• CPython 3.5
• revoscalepy – Python equivalents of RevoScaleR
• Remote compute contexts
• Also supports familiar libraries such as:
– scikit-learn
– TensorFlow
– Caffe
– Theano/Keras
R Code in SQL
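-- R script: SqlData arrives as a data frame; SensorData is handed back to SQL Server
DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
SensorData <- SqlData;
print(summary(SensorData))';
-- Query whose result set is passed to the R script
DECLARE @sqlscript NVARCHAR(MAX);
SET @sqlscript = N'
SELECT * FROM Sensors;';
EXEC sp_execute_external_script
@language = N'R',
@script = @rscript,
@input_data_1 = @sqlscript,
@input_data_1_name = N'SqlData', -- name of the input data frame inside R
@output_data_1_name = N'SensorData'; -- data frame returned as the result set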
Python Code in SQL
EXECUTE sp_execute_external_script
@language = N'Python',
@script = N'
# InputDataSet is the pandas DataFrame built from @input_data_1
summary = InputDataSet.describe()
print(summary.transpose())
',
@input_data_1 = N'SELECT * FROM Sensors';
GO
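Because the script and the query are passed as T-SQL string literals, any single quote inside them has to be doubled. A minimal sketch of a hypothetical helper that assembles the call text is shown here; the helper names and the `location` column are assumptions for illustration, while `@language`, `@script`, and `@input_data_1` are the parameters used on the slide.

```python
import textwrap


def tsql_literal(text):
    """Quote text as a T-SQL Unicode string literal, doubling embedded single quotes."""
    return "N'" + text.replace("'", "''") + "'"


def build_external_script_call(script, query):
    """Return T-SQL text that runs `script` over the rows returned by `query`."""
    # Dedent so the embedded Python starts at column 0, as Python requires.
    body = textwrap.dedent(script).strip() + "\n"
    return (
        "EXECUTE sp_execute_external_script\n"
        "    @language = N'Python',\n"
        f"    @script = {tsql_literal(body)},\n"
        f"    @input_data_1 = {tsql_literal(query)};"
    )


statement = build_external_script_call(
    """
    summary = InputDataSet.describe()
    print(summary.transpose())
    """,
    # Hypothetical filter column, used only to show quote doubling.
    "SELECT * FROM Sensors WHERE location = 'attic'",
)
print(statement)
```

This only builds text; sending it to the server is left to whatever SQL client is already in use.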
RevoScaleR and
RevoScalePy
What is RevoScaleR?
• A library written in R that includes functions for importing,
transforming, and analyzing data
• Scalable, portable, and easily distributable
• Things it can do include:
– Descriptive statistics
– Generalized linear models
– Logistic Regression
– Classification trees
– Decision forest
• Multithreaded and multinode
Running RevoScaleR
• Part of the Machine Learning Server and Microsoft R products
• Can use any R IDE to write scripts that use RevoScaleR
• Needs to be run on a computer with the interpreter and libraries
• Two modalities:
– Locally
– Remote compute context, which shifts execution to the server:
Windows server, Hadoop, or Spark
Prediction
• Linear models
• Logistic regression models
• Generalized linear models
• Covariance and correlation
• Decision forest
• K-means clustering
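To make the first item in the list concrete, here is a minimal sketch of a linear model fitted by ordinary least squares. It is plain Python for illustration only, not RevoScaleR or revoscalepy, and the hourly temperature values are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: temperature rising one degree per hour.
hours = [0, 1, 2, 3, 4]
temps = [60.0, 61.0, 62.0, 63.0, 64.0]

intercept, slope = fit_line(hours, temps)
predicted_at_6 = intercept + slope * 6
print(intercept, slope, predicted_at_6)
```

The scalable versions exist for the case where the data no longer fits comfortably in memory on one machine; the underlying idea is the same.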
Understanding Data with
RevoScaleR
Typical Workflow with RevoScaleR
• Move Data
– Import / Export
• Tidy Data
– Clean
– Manipulate
– Transform
• Present Data
– Visualize
• Make Decisions
– Analyze
– Learn
– Predict
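The four workflow stages can be sketched end to end on a tiny dataset. This is a plain Python sketch, not RevoScaleR: the CSV layout, the -999 sentinel for a bad read, and the 45-degree threshold are all assumptions for illustration.

```python
import csv
import io

# Hypothetical raw export: one blank read and one -999 sentinel value.
RAW = """sensor_id,temp_f
attic,61.5
attic,
porch,-999
porch,40.0
"""


def move(text):
    """Move data: import CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def tidy(rows):
    """Tidy data: drop blank reads and the sentinel, convert temperatures to float."""
    cleaned = []
    for row in rows:
        raw = row["temp_f"].strip()
        if raw and float(raw) != -999:
            cleaned.append({"sensor_id": row["sensor_id"], "temp_f": float(raw)})
    return cleaned


def present(rows):
    """Present data: one text bar per reading, one '#' per 10 degrees."""
    return [f"{r['sensor_id']:<6}{'#' * int(r['temp_f'] // 10)}" for r in rows]


def decide(rows, too_cold=45.0):
    """Make decisions: list sensors with any reading below the threshold."""
    return sorted({r["sensor_id"] for r in rows if r["temp_f"] < too_cold})


rows = tidy(move(RAW))
print("\n".join(present(rows)))
print("check on:", decide(rows))
```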
Key Pieces for Analysis with
RevoScaleR
• Data Source
• Compute Context
• Analytic Function
Data Sources
• Comma-delimited text data
• SAS
• SPSS
• XDF
• ODBC
• Teradata
• SQL Server
Graphing
with
RevoScaleR
• rxHistogram
• rxLinePlot
• rxLorenz
• rxRocCurve
Descriptive Statistics
• rxQuantile
• rxSummary
• rxCrossTabs
• rxCube
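For a feel of what a cross-tabulation and a quantile summary produce, here is a minimal sketch in plain Python. It uses only the standard library, not the rx functions listed above, and the readings and the cold/mild/warm bands are hypothetical.

```python
from collections import Counter
from statistics import median, quantiles

# Hypothetical reads: (sensor_id, temperature in degrees F).
READS = [
    ("attic", 58.0), ("attic", 61.0), ("attic", 64.0), ("attic", 72.0),
    ("porch", 35.0), ("porch", 41.0), ("porch", 66.0),
]


def band(temp_f):
    """Bucket a temperature into a coarse, made-up comfort band."""
    if temp_f < 50:
        return "cold"
    return "mild" if temp_f < 70 else "warm"


# Cross-tabulation: number of reads in each (sensor, band) cell.
crosstab = Counter((sensor, band(temp)) for sensor, temp in READS)

# Quartile cut points across all reads.
temps = sorted(temp for _, temp in READS)
q1, q2, q3 = quantiles(temps, n=4)

for cell, count in sorted(crosstab.items()):
    print(cell, count)
print("quartiles:", q1, q2, q3)
```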
Two Use Cases for Remote
Compute Context
• Running R in T-SQL scripts or stored procedures
• Calling RevoScaleR in R from a SQL context
Visual Studio 2017: One IDE with
Common Tools
• Python Tools for Visual Studio
• R Tools for Visual Studio
• SQL Server capabilities within Visual Studio
Additional Support
Polyglot Data Scientist Presentation
Resources
• R Services in SQL Server 2016 (Channel 9)
• Built-in machine learning in Microsoft SQL Server 2017 with Python
(Build 2017)
• MicrosoftML 1.3.0: What’s new for machine learning in Microsoft
R Server (Channel 9)
• Using Visual Studio for Machine Learning (Build 2017)
• Performance patterns for machine learning services in SQL Server
(Microsoft Ignite 2017)
Learn More
Resources
• Kaggle: The Home of Data Science and Machine Learning
• DataCamp: Learn R, Python, and Data Science Online
• Difference between Machine Learning, Data Science, AI, Deep
Learning, and Statistics – Vincent Granville
• Python Tutorial from Mode Analytics
• Coursera
– Mastering Software Development in R Specialization
– Data Science Specialization
– Applied Data Science with Python Specialization
– Executive Data Science Specialization
Contact Me
• Twitter: @sadukie
• Blog: http://codinggeekette.com
• Email:
sarah@cletechconsulting.com
Sarah Dutkiewicz
Cleveland Tech Consulting, LLC
Owner

Mais conteúdo relacionado

Mais procurados

Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software DevelopmentAlexis Seigneurin
 
Extending Apache Spark APIs Without Going Near Spark Source or a Compiler wi...
 Extending Apache Spark APIs Without Going Near Spark Source or a Compiler wi... Extending Apache Spark APIs Without Going Near Spark Source or a Compiler wi...
Extending Apache Spark APIs Without Going Near Spark Source or a Compiler wi...Databricks
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
Why & Where Knoldus Uses Rust?
Why & Where Knoldus Uses Rust?Why & Where Knoldus Uses Rust?
Why & Where Knoldus Uses Rust?Knoldus Inc.
 
Insights Without Tradeoffs: Using Structured Streaming
Insights Without Tradeoffs: Using Structured StreamingInsights Without Tradeoffs: Using Structured Streaming
Insights Without Tradeoffs: Using Structured StreamingDatabricks
 
scrazzl - A technical overview
scrazzl - A technical overviewscrazzl - A technical overview
scrazzl - A technical overviewscrazzl
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithDatabricks
 
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...Databricks
 
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraStream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraDatabricks
 
10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise WideDatabricks
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataRobert Dempsey
 
What We Learned Building an R-Python Hybrid Predictive Analytics Pipeline
What We Learned Building an R-Python Hybrid Predictive Analytics PipelineWhat We Learned Building an R-Python Hybrid Predictive Analytics Pipeline
What We Learned Building an R-Python Hybrid Predictive Analytics PipelineWork-Bench
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about SparkGiivee The
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleDatabricks
 
Unlock cassandra data for application developers using graphQL
Unlock cassandra data for application developers using graphQLUnlock cassandra data for application developers using graphQL
Unlock cassandra data for application developers using graphQLCédrick Lunven
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiDatabricks
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingDatabricks
 
Rental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean DownesRental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean DownesDatabricks
 

Mais procurados (20)

Data Science meets Software Development
Data Science meets Software DevelopmentData Science meets Software Development
Data Science meets Software Development
 
Extending Apache Spark APIs Without Going Near Spark Source or a Compiler wi...
 Extending Apache Spark APIs Without Going Near Spark Source or a Compiler wi... Extending Apache Spark APIs Without Going Near Spark Source or a Compiler wi...
Extending Apache Spark APIs Without Going Near Spark Source or a Compiler wi...
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
Why & Where Knoldus Uses Rust?
Why & Where Knoldus Uses Rust?Why & Where Knoldus Uses Rust?
Why & Where Knoldus Uses Rust?
 
Insights Without Tradeoffs: Using Structured Streaming
Insights Without Tradeoffs: Using Structured StreamingInsights Without Tradeoffs: Using Structured Streaming
Insights Without Tradeoffs: Using Structured Streaming
 
Spark Worshop
Spark WorshopSpark Worshop
Spark Worshop
 
scrazzl - A technical overview
scrazzl - A technical overviewscrazzl - A technical overview
scrazzl - A technical overview
 
Getting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague GriffithGetting Ready to Use Redis with Apache Spark with Tague Griffith
Getting Ready to Use Redis with Apache Spark with Tague Griffith
 
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
From Python Scikit-learn to Scala Apache Spark—The Road to Uncovering Botnets...
 
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraStream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
 
10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide10 Things Learned Releasing Databricks Enterprise Wide
10 Things Learned Releasing Databricks Enterprise Wide
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of Data
 
What We Learned Building an R-Python Hybrid Predictive Analytics Pipeline
What We Learned Building an R-Python Hybrid Predictive Analytics PipelineWhat We Learned Building an R-Python Hybrid Predictive Analytics Pipeline
What We Learned Building an R-Python Hybrid Predictive Analytics Pipeline
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
 
Unlock cassandra data for application developers using graphQL
Unlock cassandra data for application developers using graphQLUnlock cassandra data for application developers using graphQL
Unlock cassandra data for application developers using graphQL
 
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath LakkundiApache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
Apache Spark At Apple with Sam Maclennan and Vishwanath Lakkundi
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
Rental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean DownesRental Cars and Industrialized Learning to Rank with Sean Downes
Rental Cars and Industrialized Learning to Rank with Sean Downes
 

Semelhante a The Polyglot Data Scientist - Exploring R, Python, and SQL Server

DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL ServerŁukasz Grala
 
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Rui Quintino
 
Advanced analytics with R and SQL
Advanced analytics with R and SQLAdvanced analytics with R and SQL
Advanced analytics with R and SQLMSDEVMTL
 
Moving advanced analytics to your sql server databases
Moving advanced analytics to your sql server databasesMoving advanced analytics to your sql server databases
Moving advanced analytics to your sql server databasesEnrico van de Laar
 
Predictive Analysis using Microsoft SQL Server R Services
Predictive Analysis using Microsoft SQL Server R ServicesPredictive Analysis using Microsoft SQL Server R Services
Predictive Analysis using Microsoft SQL Server R ServicesFisnik Doko
 
Analytics with R in SQL Server 2016
Analytics with R in SQL Server 2016Analytics with R in SQL Server 2016
Analytics with R in SQL Server 2016HARIHARAN R
 
Intro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkIntro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkAlex Zeltov
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & RŁukasz Grala
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksDataWorks Summit
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksMapR Technologies
 
No sql and sql - open analytics summit
No sql and sql - open analytics summitNo sql and sql - open analytics summit
No sql and sql - open analytics summitOpen Analytics
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle TechnologiesOleksii Movchaniuk
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017James Serra
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopAmanda Casari
 
Data processing with spark in r &amp; python
Data processing with spark in r &amp; pythonData processing with spark in r &amp; python
Data processing with spark in r &amp; pythonMaloy Manna, PMP®
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 

Semelhante a The Polyglot Data Scientist - Exploring R, Python, and SQL Server (20)

DataMass Summit - Machine Learning for Big Data in SQL Server
DataMass Summit - Machine Learning for Big Data  in SQL ServerDataMass Summit - Machine Learning for Big Data  in SQL Server
DataMass Summit - Machine Learning for Big Data in SQL Server
 
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
Microsoft Data Platform Airlift 2017 Rui Quintino Machine Learning with SQL S...
 
Advanced analytics with R and SQL
Advanced analytics with R and SQLAdvanced analytics with R and SQL
Advanced analytics with R and SQL
 
Moving advanced analytics to your sql server databases
Moving advanced analytics to your sql server databasesMoving advanced analytics to your sql server databases
Moving advanced analytics to your sql server databases
 
Michal Marušan: Scalable R
Michal Marušan: Scalable RMichal Marušan: Scalable R
Michal Marušan: Scalable R
 
Predictive Analysis using Microsoft SQL Server R Services
Predictive Analysis using Microsoft SQL Server R ServicesPredictive Analysis using Microsoft SQL Server R Services
Predictive Analysis using Microsoft SQL Server R Services
 
Analytics with R in SQL Server 2016
Analytics with R in SQL Server 2016Analytics with R in SQL Server 2016
Analytics with R in SQL Server 2016
 
Intro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkIntro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with spark
 
Ml2
Ml2Ml2
Ml2
 
20160317 - PAZUR - PowerBI & R
20160317  - PAZUR - PowerBI & R20160317  - PAZUR - PowerBI & R
20160317 - PAZUR - PowerBI & R
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise NetworksUsing Familiar BI Tools and Hadoop to Analyze Enterprise Networks
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks
 
No sql and sql - open analytics summit
No sql and sql - open analytics summitNo sql and sql - open analytics summit
No sql and sql - open analytics summit
 
Big Data & Oracle Technologies
Big Data & Oracle TechnologiesBig Data & Oracle Technologies
Big Data & Oracle Technologies
 
What’s new in SQL Server 2017
What’s new in SQL Server 2017What’s new in SQL Server 2017
What’s new in SQL Server 2017
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 
Data processing with spark in r &amp; python
Data processing with spark in r &amp; pythonData processing with spark in r &amp; python
Data processing with spark in r &amp; python
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
Big Data training
Big Data trainingBig Data training
Big Data training
 

Mais de Sarah Dutkiewicz

Passwordless Development using Azure Identity
Passwordless Development using Azure IdentityPasswordless Development using Azure Identity
Passwordless Development using Azure IdentitySarah Dutkiewicz
 
Predicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksPredicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksSarah Dutkiewicz
 
Azure DevOps for Developers
Azure DevOps for DevelopersAzure DevOps for Developers
Azure DevOps for DevelopersSarah Dutkiewicz
 
Azure DevOps for JavaScript Developers
Azure DevOps for JavaScript DevelopersAzure DevOps for JavaScript Developers
Azure DevOps for JavaScript DevelopersSarah Dutkiewicz
 
Azure DevOps for the Data Professional
Azure DevOps for the Data ProfessionalAzure DevOps for the Data Professional
Azure DevOps for the Data ProfessionalSarah Dutkiewicz
 
Noodling with Data in Jupyter Notebook
Noodling with Data in Jupyter NotebookNoodling with Data in Jupyter Notebook
Noodling with Data in Jupyter NotebookSarah Dutkiewicz
 
Becoming a Servant Leader, Leading from the Trenches
Becoming a Servant Leader, Leading from the TrenchesBecoming a Servant Leader, Leading from the Trenches
Becoming a Servant Leader, Leading from the TrenchesSarah Dutkiewicz
 
NEOISF - On Mentoring Future Techies
NEOISF - On Mentoring Future TechiesNEOISF - On Mentoring Future Techies
NEOISF - On Mentoring Future TechiesSarah Dutkiewicz
 
The importance of UX for Developers
The importance of UX for DevelopersThe importance of UX for Developers
The importance of UX for DevelopersSarah Dutkiewicz
 
The Impact of Women Trailblazers in Tech
The Impact of Women Trailblazers in TechThe Impact of Women Trailblazers in Tech
The Impact of Women Trailblazers in TechSarah Dutkiewicz
 
Unstoppable Course Final Presentation
Unstoppable Course Final PresentationUnstoppable Course Final Presentation
Unstoppable Course Final PresentationSarah Dutkiewicz
 
Even More Tools for the Developer's UX Toolbelt
Even More Tools for the Developer's UX ToolbeltEven More Tools for the Developer's UX Toolbelt
Even More Tools for the Developer's UX ToolbeltSarah Dutkiewicz
 
History of Women in Tech - Trivia
History of Women in Tech - TriviaHistory of Women in Tech - Trivia
History of Women in Tech - TriviaSarah Dutkiewicz
 
The UX Toolbelt for Developers
The UX Toolbelt for DevelopersThe UX Toolbelt for Developers
The UX Toolbelt for DevelopersSarah Dutkiewicz
 
World Usability Day 2014 - UX Toolbelt for Developers
World Usability Day 2014 - UX Toolbelt for DevelopersWorld Usability Day 2014 - UX Toolbelt for Developers
World Usability Day 2014 - UX Toolbelt for DevelopersSarah Dutkiewicz
 
The UX Toolbelt for Developers
The UX Toolbelt for DevelopersThe UX Toolbelt for Developers
The UX Toolbelt for DevelopersSarah Dutkiewicz
 
The Case for the UX Developer
The Case for the UX DeveloperThe Case for the UX Developer
The Case for the UX DeveloperSarah Dutkiewicz
 

Mais de Sarah Dutkiewicz (20)

Passwordless Development using Azure Identity
Passwordless Development using Azure IdentityPasswordless Development using Azure Identity
Passwordless Development using Azure Identity
 
Predicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksPredicting Flights with Azure Databricks
Predicting Flights with Azure Databricks
 
Azure DevOps for Developers
Azure DevOps for DevelopersAzure DevOps for Developers
Azure DevOps for Developers
 
Azure DevOps for JavaScript Developers
Azure DevOps for JavaScript DevelopersAzure DevOps for JavaScript Developers
Azure DevOps for JavaScript Developers
 
Azure DevOps for the Data Professional
Azure DevOps for the Data ProfessionalAzure DevOps for the Data Professional
Azure DevOps for the Data Professional
 
Noodling with Data in Jupyter Notebook
Noodling with Data in Jupyter NotebookNoodling with Data in Jupyter Notebook
Noodling with Data in Jupyter Notebook
 
Pairing and mobbing
Pairing and mobbingPairing and mobbing
Pairing and mobbing
 
Becoming a Servant Leader, Leading from the Trenches
Becoming a Servant Leader, Leading from the TrenchesBecoming a Servant Leader, Leading from the Trenches
Becoming a Servant Leader, Leading from the Trenches
 
NEOISF - On Mentoring Future Techies
NEOISF - On Mentoring Future TechiesNEOISF - On Mentoring Future Techies
NEOISF - On Mentoring Future Techies
 
Becoming a Servant Leader
Becoming a Servant LeaderBecoming a Servant Leader
Becoming a Servant Leader
 
The importance of UX for Developers
The importance of UX for DevelopersThe importance of UX for Developers
The importance of UX for Developers
 
The Impact of Women Trailblazers in Tech
The Impact of Women Trailblazers in TechThe Impact of Women Trailblazers in Tech
The Impact of Women Trailblazers in Tech
 
Unstoppable Course Final Presentation
Unstoppable Course Final PresentationUnstoppable Course Final Presentation
Unstoppable Course Final Presentation
 
Even More Tools for the Developer's UX Toolbelt
Even More Tools for the Developer's UX ToolbeltEven More Tools for the Developer's UX Toolbelt
Even More Tools for the Developer's UX Toolbelt
 
History of Women in Tech
History of Women in TechHistory of Women in Tech
History of Women in Tech
 
History of Women in Tech - Trivia
History of Women in Tech - TriviaHistory of Women in Tech - Trivia
History of Women in Tech - Trivia
 
The UX Toolbelt for Developers
The UX Toolbelt for DevelopersThe UX Toolbelt for Developers
The UX Toolbelt for Developers
 
World Usability Day 2014 - UX Toolbelt for Developers
World Usability Day 2014 - UX Toolbelt for DevelopersWorld Usability Day 2014 - UX Toolbelt for Developers
World Usability Day 2014 - UX Toolbelt for Developers
 
The UX Toolbelt for Developers
The UX Toolbelt for DevelopersThe UX Toolbelt for Developers
The UX Toolbelt for Developers
 
The Case for the UX Developer
The Case for the UX DeveloperThe Case for the UX Developer
The Case for the UX Developer
 

The Polyglot Data Scientist - Exploring R, Python, and SQL Server

  • 1. The Polyglot Data Scientist Adventures with R, Python, and SQL
  • 2. Audience Survey • How many here have used: – SQL? – Python? – R? • What job titles do people have?
  • 3. What We Won’t Cover • Theories behind data science and machine learning • Deep dive into Python • Deep dive into R • Deep dive into SQL Server
  • 4. Azure Support • There is a data science VM available on Azure. It won’t be covered in this presentation. See https://docs.microsoft.com/en-us/sql/advanced-analytics/getting-started-with-machine-learning-services for details.
  • 5. What We Will Cover • The Problem with Being a Polyglot • What SQL Server + R or SQL Server + Python Solves • A Glance at these in Action
  • 6. Not a Microsoft sales person… • Microsoft MVP in Visual Studio • Been into exploring data most of my life • Been in tech over 20 years • Practitioner and hobbyist, not researcher
  • 7. Sample Problem: Sensor Data • Domain: House of Sadukie • Problem: Temperature data is stored miserably • Goal: Display data in a visualization that makes sense
  • 8. Current Outcome – via MySQL & R
  • 9. Polyglot • Knowing or using several languages
  • 11. Data Scientist • A person employed to analyze and interpret complex digital data, such as the usage statistics of a website, especially in order to assist a business in its decision-making
  • 12. Multi-Faceted Data Science • Various categories: – Statistics – modeling, sampling, clustering, reduction – Mathematics – NSA, astronomers, military – Data engineering – database/memory/file optimization, Hadoop, data flows – Machine learning and algorithms – Business – ROI optimization, decision sciences – Software engineering – primarily polyglots in production code – Visualization – Spatial • Source: https://www.datasciencecentral.com/profiles/blogs/six-categories-of-data-scientists
  • 13. The Problem with Being a Polyglot • Understanding strengths and weaknesses of the languages • Knowing which language is appropriate for what situation
  • 14. Multiple tools… multiple solutions… how many programs do I have to use?!? And wouldn’t it be awesome if I could use one tool to do most of the work?
  • 15. What R and Python Have to Offer for SQL • Libraries specialized to handle data science domain problems including: – Visualization – Data exploration – Statistical and mathematical analysis – Trending – Regression • Libraries + data right from the source = quicker exploratory analysis • Python and R are great for starting from one large table and branching in different directions, which can inspire additional analyses
  • 16. Sample Problem: Sensor Data • Number of rows: 400k+ • 1 Table • Questions to look into: – What are temperature trends over time? – When are sensors going offline? – What temperatures look spot on? – What sensors are wavering in reads and showing inconsistencies?
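Two of the questions on this slide (when sensors go offline, and which sensors waver in their reads) can be sketched in a few lines of Python. This is a minimal illustration rather than the presenter's analysis: the sample readings, the 30-minute gap threshold, the standard-deviation cutoff, and every function name are assumptions made for the example.

```python
from datetime import datetime, timedelta
from statistics import pstdev

# Hypothetical readings: (sensor_id, timestamp, temperature).
readings = [
    ("attic", datetime(2018, 1, 1, 0, 0), 61.0),
    ("attic", datetime(2018, 1, 1, 0, 10), 61.2),
    ("attic", datetime(2018, 1, 1, 1, 30), 60.9),  # 80-minute gap before this read
    ("porch", datetime(2018, 1, 1, 0, 0), 40.0),
    ("porch", datetime(2018, 1, 1, 0, 10), 55.0),
    ("porch", datetime(2018, 1, 1, 0, 20), 38.0),
]


def group_by_sensor(rows):
    """Group (timestamp, temperature) pairs by sensor, ordered by time."""
    grouped = {}
    for sensor_id, ts, temp in sorted(rows, key=lambda r: (r[0], r[1])):
        grouped.setdefault(sensor_id, []).append((ts, temp))
    return grouped


def offline_gaps(rows, max_gap=timedelta(minutes=30)):
    """Return (sensor_id, gap_start, gap_end) where consecutive reads are too far apart."""
    gaps = []
    for sensor_id, series in group_by_sensor(rows).items():
        for (prev_ts, _), (next_ts, _) in zip(series, series[1:]):
            if next_ts - prev_ts > max_gap:
                gaps.append((sensor_id, prev_ts, next_ts))
    return gaps


def wavering_sensors(rows, max_stdev=5.0):
    """Return sensor ids whose temperature spread exceeds max_stdev."""
    return sorted(
        sensor_id
        for sensor_id, series in group_by_sensor(rows).items()
        if pstdev([temp for _, temp in series]) > max_stdev
    )


gaps = offline_gaps(readings)
wavering = wavering_sensors(readings)
print(gaps)
print(wavering)
```

Both thresholds would need tuning against the real 400k-row table; the point is only that each question maps to a small grouping-and-comparison step.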
  • 18. Advanced Analytics in SQL Server 2016/2017 • SQL Server 2016 – SQL Server R Services / Machine Learning Services • SQL Server 2017 – SQL Server R Services / Machine Learning Services – Python Support
  • 19. Sample Problem: Sensor Data • Possible Strategy: – Use SQL to gather the data into a dataset that holds as much data as possible to observe. – Use Python or R to manipulate the results, allowing for easy analysis and substantial predictions based on observations.
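The two-step strategy on this slide can be sketched as follows. Python's built-in sqlite3 module stands in for SQL Server so that the example runs anywhere; the table layout, column names, and sample rows are assumptions, not the presenter's schema.

```python
import sqlite3
from statistics import mean

# In-memory SQLite stands in for SQL Server; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sensors (SensorId TEXT, ReadAt TEXT, Temperature REAL)")
conn.executemany(
    "INSERT INTO Sensors VALUES (?, ?, ?)",
    [
        ("attic", "2018-01-01 08:00", 60.0),
        ("attic", "2018-01-01 20:00", 62.0),
        ("attic", "2018-01-02 08:00", 64.0),
        ("attic", "2018-01-02 20:00", 66.0),
        ("attic", "2018-01-03 08:00", None),  # a failed read
    ],
)

# Step 1 (SQL): gather a wide dataset, dropping only rows that cannot be used.
rows = conn.execute(
    "SELECT SensorId, substr(ReadAt, 1, 10) AS ReadDate, Temperature "
    "FROM Sensors WHERE Temperature IS NOT NULL ORDER BY ReadAt"
).fetchall()
conn.close()

# Step 2 (Python): reshape the result set and compute a per-day average.
by_day = {}
for sensor_id, read_date, temperature in rows:
    by_day.setdefault((sensor_id, read_date), []).append(temperature)

daily_mean = {key: mean(values) for key, values in by_day.items()}
print(daily_mean)
```

The split keeps set-based filtering in SQL and leaves the exploratory reshaping to the scripting language, which is the division of labor the slide proposes.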
  • 20. Not Just Windows! • R Server for Windows • R Server for Linux – CentOS – RHEL – Ubuntu – SUSE • R Server for Hadoop – cluster in the cloud • R Server for Teradata – not as Machine Learning Server
  • 21. SQL Server as our Base • R and/or Python on Top • Additional pieces provided by MachineML: Microsoft Machine Learning Services, RevoScaleR, revoscalepy
  • 23. Machine Learning Services in SQL Server • Allows integration of other languages in SQL Server – SQL Server 2016 can work with R – SQL Server 2017 introduces Python support • Scalable in that you can develop and test on a single machine and then deploy to distributed or parallel processing platforms. Platforms include: – SQL Server on Windows – Hadoop – Spark
  • 24. SQL Server Machine Learning Services (In-Database) • SQL Server R Services (In-Database) started in SQL Server 2016 • With SQL Server 2017, SQL Server Machine Learning Services (In-Database) allows us to use R and Python within SQL Server • No need to open both an IDE and SQL tools to accomplish the work – no context switching needed! • Can call libraries from Python or R to process data right within SQL
  • 25. Python vs R? • SQL Server 2016? R • SQL Server 2017? R and/or Python • What are you familiar with? • Look at tutorials – what makes sense? • What features do you need and how are they supported by Microsoft ML?
  • 26. Python Support • CPython 3.5 • revoscalepy – Python equivalents of RevoScaleR • Remote compute contexts • Also supports familiar libraries such as: – scikit-learn – TensorFlow – Caffe – Theano/Keras
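  • 27. R Code in SQL
      -- R script: copy the input rows into SensorData and print summary statistics.
      DECLARE @rscript NVARCHAR(MAX);
      SET @rscript = N'
          SensorData <- SqlData;
          print(summary(SensorData))';
      -- Query whose result set is handed to the R script.
      DECLARE @sqlscript NVARCHAR(MAX);
      SET @sqlscript = N'SELECT * FROM Sensors;';
      -- SqlData names the input data frame; SensorData is returned as the result set.
      EXEC sp_execute_external_script
          @language = N'R',
          @script = @rscript,
          @input_data_1 = @sqlscript,
          @input_data_1_name = N'SqlData',
          @output_data_1_name = N'SensorData';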
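  • 28. Python Code in SQL
      -- InputDataSet is the default name for the query results (a pandas DataFrame).
      -- describe() is called on the DataFrame itself, so no pandas import is needed.
      EXECUTE sp_execute_external_script
          @language = N'Python',
          @script = N'
      summary = InputDataSet.describe()
      print(summary.transpose())
      ',
          @input_data_1 = N'SELECT * FROM Sensors';
      GO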
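Both code slides embed a script inside a T-SQL N'...' string literal, which means any single quote in the script or the query has to be doubled. The sketch below shows that escaping with a hypothetical helper, build_external_script_call; it only assembles text and never connects to a server. The Location column in the usage is likewise an assumption, and the embedded script is a variant of the one on slide 28.

```python
def build_external_script_call(language, script, input_query):
    """Return a T-SQL batch that runs `script` over the rows from `input_query`.

    Hypothetical helper for illustration: it builds text only.
    """
    if language not in ("R", "Python"):
        raise ValueError("language must be 'R' or 'Python'")

    def quote(text):
        # T-SQL escapes a single quote inside a string literal by doubling it.
        return "N'" + text.replace("'", "''") + "'"

    return (
        "EXEC sp_execute_external_script\n"
        f"    @language = {quote(language)},\n"
        f"    @script = {quote(script)},\n"
        f"    @input_data_1 = {quote(input_query)};"
    )


python_script = "summary = InputDataSet.describe()\nprint(summary.transpose())"
batch = build_external_script_call(
    "Python", python_script, "SELECT * FROM Sensors WHERE Location = 'attic'"
)
print(batch)
```

String assembly like this suits fixed, trusted queries only. For values that come from users, the procedure's @params argument is the safer route.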
  • 30. What is RevoScaleR? • A library written in R that includes functions for importing, transforming, and analyzing data • Scalable, portable, and easily distributable • Things it can do include: – Descriptive statistics – Generalized linear models – Logistic Regression – Classification trees – Decision forest • Multithreaded and multinode
  • 31. Running RevoScaleR • Part of the Machine Learning Server and Microsoft R products • Can use any R IDE to write scripts that use RevoScaleR • Needs to be run on a computer with the interpreter and libraries • Two modalities: – Locally – Remote compute context (shifts execution to the server: Windows server, Hadoop, or Spark)
  • 32. Prediction • Linear models • Logistic regression models • Generalized linear models • Covariance and correlation • Decision forest • K-means clustering
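As a concrete picture of the first item on this slide, here is a minimal ordinary least squares fit of temperature against time, written in plain Python. It is not RevoScaleR or revoscalepy code; the function name fit_line and the hourly sample values are assumptions chosen so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross-deviations between x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical readings: hours since the first read, and temperature.
hours = [0, 1, 2, 3, 4]
temps = [60.0, 61.0, 62.0, 63.0, 64.0]

intercept, slope = fit_line(hours, temps)
predicted_at_6 = intercept + slope * 6
print(intercept, slope, predicted_at_6)
```

The scalable libraries named in the deck fit the same kind of model, with the added ability to work over data that does not fit in memory.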
  • 34. Typical Workflow with RevoScaleR • Move Data: Import / Export • Tidy Data: Clean, Manipulate, Transform • Present Data: Visualize • Make Decisions: Analyze, Learn, Predict
  • 35. Key Pieces for Analysis with RevoScaleR • Data Source • Compute Context • Analytic Function
  • 36. Data Sources • Comma-delimited text data • SAS • SPSS • XDF • ODBC • Teradata • SQL Server
  • 38. Descriptive Statistics • rxQuantile • rxSummary • rxCrossTabs • rxCube
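For readers unfamiliar with what quantile and cross-tabulation summaries report, the sketch below computes both over a handful of hypothetical sensor rows using only the Python standard library. It does not call the rx functions listed on the slide, and the row layout and status values are assumptions.

```python
from collections import Counter
from statistics import quantiles

# Hypothetical rows: (sensor_id, status, temperature).
rows = [
    ("attic", "online", 61.0),
    ("attic", "online", 63.0),
    ("attic", "offline", 0.0),
    ("porch", "online", 40.0),
    ("porch", "online", 44.0),
]

# Cross-tabulation: count rows for each (sensor, status) pair.
crosstab = Counter((sensor, status) for sensor, status, _ in rows)

# Quartiles of the temperatures from online reads only.
online_temps = sorted(temp for _, status, temp in rows if status == "online")
q1, median, q3 = quantiles(online_temps, n=4, method="inclusive")

for (sensor, status), count in sorted(crosstab.items()):
    print(sensor, status, count)
print(q1, median, q3)
```

Filtering out the offline rows before taking quartiles matters here: a placeholder reading of 0.0 would otherwise drag the lower quartile down.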
  • 39. Two Use Cases for Remote Compute Context • Running R in T-SQL scripts or stored procedures • Calling RevoScaleR in R from a SQL context
  • 40. Visual Studio 2017: One IDE with Common Tools • Python Tools for Visual Studio • R Tools for Visual Studio • SQL Server capabilities within Visual Studio
  • 42. Polyglot Data Scientist Presentation Resources • R Services in SQL Server 2016 (Channel 9) • Built-in machine learning in Microsoft SQL Server 2017 with Python (Build 2017) • MicrosoftML 1.3.0: What’s new for machine learning in Microsoft R Server (Channel 9) • Using Visual Studio for Machine Learning (Build 2017) • Performance patterns for machine learning services in SQL Server (Microsoft Ignite 2017)
  • 44. Resources • Kaggle: The Home of Data Science and Machine Learning • DataCamp: Learn R, Python, and Data Science Online • Difference between Machine Learning, Data Science, AI, Deep Learning, and Statistics – Vincent Granville • Python Tutorial from Mode Analytics • Coursera – Mastering Software Development in R Specialization – Data Science Specialization – Applied Data Science with Python Specialization – Executive Data Science Specialization
  • 45. Contact Me • Twitter: @sadukie • Blog: http://codinggeekette.com • Email: sarah@cletechconsulting.com Sarah Dutkiewicz Cleveland Tech Consulting, LLC Owner