Models in Minutes not Months:
Data Science as Microservices
Sarah Aerni, PhD
saerni@salesforce.com
@itweetsarah
​Einstein Platform
LIVE DEMO
Agenda
​BUILDING AI APPS: Perspective Of A Data Scientist
• Journey to building your first model
• Barriers to production along the way
​DEPLOYING MODELS IN PRODUCTION: Built For Reuse
• Where engineering and applications meet AI
• DevOps in Data Science – monitoring, alerting and iterating
​AUTO MACHINE LEARNING: Machine Learning Pipelines as a Collection of Microservices
• Create reusable ML pipeline code for multiple applications and customers
• Data Scientists focus on exploration, validation and adding new apps and models
ENABLING DATA SCIENCE
A DATA SCIENTIST'S VIEW OF BUILDING MODELS
A data scientist’s view of the journey to building models
• Access and Explore Data
• Engineer Features and Build Models
• Interpret Model Results and Accuracy
[Figure: #Leads by Created Date]
Engineer Features
• Empty fields
• One-hot encoding (pivoting)
• Email domain of a user
• Business titles of a user
• Historical spend
• Email-Company Name Similarity
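The features listed above can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the talk: the record fields (`email`, `company`, `title`, `phone`, `historical_spend`), the title vocabulary and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def email_domain(email):
    """Return the lower-cased domain of an email address, or '' if missing or malformed."""
    if not email or "@" not in email:
        return ""
    return email.rsplit("@", 1)[1].lower()


def name_similarity(email, company):
    """Similarity in [0, 1] between the first label of the email domain and the company name."""
    label = email_domain(email).split(".")[0]
    cleaned = "".join(ch for ch in (company or "").lower() if ch.isalnum())
    if not label or not cleaned:
        return 0.0
    return SequenceMatcher(None, label, cleaned).ratio()


def one_hot(value, vocabulary):
    """One-hot encode a value against a fixed vocabulary; unseen values give all zeros."""
    return [1 if value == v else 0 for v in vocabulary]


def lead_features(lead, title_vocab):
    """Turn one raw lead record (a dict with hypothetical field names) into a flat feature vector."""
    features = [
        1 if not lead.get("phone") else 0,            # empty-field indicator
        float(lead.get("historical_spend") or 0.0),   # historical spend, 0 when missing
        name_similarity(lead.get("email"), lead.get("company")),
    ]
    features += one_hot(lead.get("title"), title_vocab)  # business title, one-hot encoded
    return features


lead = {"email": "ana@acme.com", "company": "Acme", "title": "CTO",
        "phone": None, "historical_spend": 1200}
print(lead_features(lead, ["CEO", "CTO", "VP Sales"]))  # [1, 1200.0, 1.0, 0, 1, 0]
```

Each feature is a small pure function, so the same helpers can be reused across leads from different customers.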
>>> import numpy as np
>>> from sklearn import svm
>>> pls = np.loadtxt("leadFeatures.data", delimiter=",")
>>> # Hold out a random 70% of the rows for testing; train on the remaining 30%
>>> testSet = np.random.choice(len(pls), int(len(pls) * .7), replace=False)
>>> isTrain = np.ones(len(pls), dtype=bool)
>>> isTrain[testSet] = False
>>> X, y = pls[isTrain, :-1], pls[isTrain, -1]
>>> clf = svm.SVC()
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None,
coef0=0.0,decision_function_shape=None, degree=3,
gamma='auto', kernel='rbf', max_iter=-1,
probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
>>> clf.score(pls[testSet, :-1], pls[testSet, -1])
0.88571428571428568
[Figure: #Leads by Geographies]
Fresh Data Input
Delivery of Predictions
Bringing a Model to Production Requires a Team
• Applications deliver predictions for customer consumption
• Predictions are produced by models running live in production
• Pipelines deliver the data for modeling and scoring at an appropriate latency
• Monitoring systems allow us to check the health of the models, data, pipelines and app
Source: Salesforce Customer Relationship Survey conducted 2014-2016 among 10,500+ customers randomly selected. Response sizes per question vary.
Data Engineers
• Provide data access and management capabilities for data scientists
• Set up and monitor data pipelines
• Improve performance of data processing pipelines
Front-End Developers
• Build customer-facing UI
• Application instrumentation and logging
Data Scientists
• Continue evaluating models
• Monitor for anomalies and degradation
• Iteratively improve models in production
Platform Engineers
• Machine resource management
• Alerting and monitoring
Product Managers
• Gather requirements & feedback
• Provide business context
Supporting a Model in Production is Complex
Only a small fraction of real-world ML systems is composed of ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.
D. Sculley, et al. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS). 2015
MODELS IN PRODUCTION
WHAT IT TAKES TO DEPLOY AN AI-POWERED APPLICATION
Supporting Models in Production is Mostly NOT AI
Only a small fraction of real-world ML systems is composed of ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex.
Adapted from D. Sculley, et al. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS). 2015
Configuration
Data Collection
Data Verification
Machine Resource Management
Monitoring
Serving Infrastructure
Feature Extraction
Analysis Tools
Process Management Tools
ML Code
How the Salesforce Einstein Platform Enables Data Scientists
​Deploy, monitor and iterate on models in one location
Data Sources …
[Diagram: adds Web UI and CLI]
[Diagram: adds Admin / Authentication Service]
[Diagram: adds Executor Service and Datastore Service]
Why Data Services are Critical
[Diagram: Data Connector, Data Hub, Object Store in Blob Storage, Catalog, Access Control, Data Registry, Applications, App, Data, Prov, …]
[Diagram: adds Data Puller and Data Pusher]
[Diagram: adds Exploration Tool]
[Diagram: adds Modeling / Scoring]
[Diagram: adds Data Preparator]
[Diagram: adds CI Service and Auxiliary Services]
[Diagram: adds Monitoring Tool and Monitoring Service]
Monitoring your AI’s health like any other app
Pipelines, Model Performance, Scores – Invest your time where it is needed!
Sample Dashboard on Simulated Data
• 105,874 Scores Written Per Hour (1 day moving avg)
• 0.86 Evaluation auROC
[Figure panels: Total Number of Scores Written Per Hour; Total Number of Scores Written Per Week; Model Performance at Evaluation; Distribution of Scores at Evaluation (Percent of Total Predictions in Bucket vs. Prediction Probability Bucket); Total Prediction Count]
[Diagram: adds Provisioning Service]
Why Data Services are Critical
[Diagram: the same data services (Data Connector, Data Hub, Object Store in Blob Storage, Catalog, Access Control, Data Registry), now with multiple Applications: App 1, App 2, App 3, each with its own Prov]
[Diagram: adds Scheduler, Model Management Service and Control System]
How the Salesforce Einstein Platform Enables Data Scientists
Deploy, monitor and iterate on models in one location
[Diagram: full platform with Web UI, CLI, Exploration Tool, Monitoring Tool, Admin / Authentication Service, Executor Service, Data Puller, Data Preparator, Modeling / Scoring, Data Pusher, Control System, Scheduler, Model Management Service, Auxiliary Services, CI Service, Monitoring Service, Datastore Service, Provisioning Service, Data Sources …]
• Microservice architecture
• Customizable model-evaluation & monitoring dashboards
• Scheduling and workflow management
• In-platform secured experimentation and exploration
• Data Scientists focus their efforts on modeling and evaluating results
Why Stop at Microservices for Supporting Your ML Code?
Why stop here?
Your ML code can also be just a
collection of microservices!
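One way to picture ML code as a collection of small services is a chain of independent stages that only share a record format. The sketch below is an assumption-laden illustration, not the Einstein Platform's implementation: the stage names echo the diagram labels (Data Puller, Data Preparator, Modeling / Scoring), but the function signatures, the record schema, the stage order and the toy scoring rule are all hypothetical, and plain in-process functions stand in for real services.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Stage = Callable[[List[Record]], List[Record]]


def data_puller(_: List[Record]) -> List[Record]:
    """Stand-in for pulling fresh rows from a data source (hard-coded here)."""
    return [{"spend": 120.0}, {"spend": 0.0}, {"spend": 980.0}]


def data_preparator(rows: List[Record]) -> List[Record]:
    """Add a derived feature; the stage depends only on the record schema."""
    return [{**r, "has_spend": 1.0 if r["spend"] > 0 else 0.0} for r in rows]


def scorer(rows: List[Record]) -> List[Record]:
    """Toy scoring rule standing in for a trained model."""
    return [{**r, "score": min(1.0, r["spend"] / 1000.0)} for r in rows]


def run_pipeline(stages: List[Stage], rows: List[Record] = None) -> List[Record]:
    """Run the stages in order, feeding each stage the previous stage's output."""
    rows = rows or []
    for stage in stages:
        rows = stage(rows)
    return rows


scored = run_pipeline([data_puller, data_preparator, scorer])
print([r["score"] for r in scored])
```

Because every stage has the same signature, a stage can be swapped or reused for another application without touching the rest of the chain.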
Auto Machine Learning
​Building reusable ML code
Leveraging Platform Services to Easily Deploy 1000s of Apps
Data Scientists on App #1
Data Scientists on App #2
Let’s Add a Third App
Data Scientists on App #3
How This Process Would Look in Salesforce
150,000 customers
LeadIQ App, Activity App, Predictive Journeys App, Opportunity App, App #5, App #6, App #7, App #8, App #9, App #10, App #11, App #12
Einstein’s New Approach to AI
Democratizing AI for Everyone
Classical Approach: Data Sampling, Feature Selection, Model Selection, Score Calibration, Integrate to Application
Artificial Intelligence
Einstein Auto-ML: Data already prepped, Models automatically built, Predictions delivered in context
AI for CRM: Discover, Predict, Recommend, Automate
AutoML for feature engineering
Repeatable Elements in Machine Learning Pipelines
Categorical Variables, Text Fields, Numerical Buckets

Categorical Variables
1 0 0
1 0 0
0 0 1
0 0 1
0 1 0
0 0 1
0 0 0
0 1 0

Text Fields
Word Count   Word Count (no stop words)   Is English   Sentiment
4            2                            1            1
6            3                            1            1
9            4                            0            0
6            4                            0            -1
7            3                            1            0
5            1                            1            0
7            3                            1            0

Numerical Buckets
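The three repeatable elements above (categorical variables, text fields, numerical buckets) can be illustrated with a small type-driven dispatcher. This is a minimal sketch under stated assumptions: the stop-word list, the bucket edges and the rule for telling text from categorical values are invented for the example, and the "Is English" and "Sentiment" columns are left out because they would need language and sentiment models.

```python
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and"}  # tiny illustrative list


def one_hot(values):
    """Categorical column -> one indicator column per distinct value (sorted order)."""
    levels = sorted(set(values))
    return [[1 if v == level else 0 for level in levels] for v in values]


def text_features(texts):
    """Text column -> [word count, word count without stop words] for each row."""
    rows = []
    for text in texts:
        words = text.lower().split()
        rows.append([len(words), sum(1 for w in words if w not in STOP_WORDS)])
    return rows


def bucketize(values, edges):
    """Numeric column -> index of the bucket each value falls into."""
    return [[sum(1 for e in edges if v >= e)] for v in values]


def auto_features(column, edges=(40, 55)):
    """Pick a transform from the values' Python type (a stand-in for real type inference)."""
    if isinstance(column[0], (int, float)):
        return bucketize(column, edges)
    if any(" " in v for v in column):   # assumed rule: values with spaces are free text
        return text_features(column)
    return one_hot(column)


print(auto_features([22, 30, 45, 23, 60]))    # [[0], [0], [1], [0], [2]]
print(auto_features(["M", "F", "F"]))         # [[0, 1], [1, 0], [1, 0]]
print(auto_features(["the deal is closed"]))  # [[4, 2]]
```

The point of the dispatcher is that the caller never names a transform: the same call works for any column, which is what makes the step reusable across applications.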
What Now? How autoML can choose your model
>>> import numpy as np
>>> from sklearn import svm
>>> clf = svm.SVC()
>>> pls = np.loadtxt("leadFeatures.data", delimiter=",")
>>> # Hold out a random 70% of the rows for testing; train on the remaining 30%
>>> testSet = np.random.choice(len(pls), int(len(pls) * .7), replace=False)
>>> isTrain = np.ones(len(pls), dtype=bool)
>>> isTrain[testSet] = False
>>> X, y = pls[isTrain, :-1], pls[isTrain, -1]
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None,
coef0=0.0,decision_function_shape=None, degree=3,
gamma='auto', kernel='rbf', max_iter=-1,
probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
>>> clf.score(pls[testSet, :-1], pls[testSet, -1])
0.88571428571428568
Should we try other model forms?
Features?
Kernels or hyperparameters?
Each use case will have its own model and features to use. We enable building separate models and features with one code base using OP.
A tournament of models!
CUSTOMER A
Customer ID     1   2   3   4   5
Age             22  30  45  23  60
Age Group       A   A   B   A   C
Gender          M   F   F   M   M
Valid Address   Y   N   N   Y   Y
MODEL GENERATION: Model 1, Model 2, Model 3, Model 4, …
MODEL TESTING:
• Model 1: 83% accuracy
• Model 2: 91% accuracy
• Model 3: 73% accuracy
• Model 4: 89% accuracy
• …
CUSTOMER B
Customer ID     1   2   3   4   5
Age             34  22  66  58  41
Age Group       B   A   C   C   B
Gender          M   M   F   M   F
Valid Address   Y   Y   N   N   Y
MODEL GENERATION: Model 1, Model 2, Model 3, Model 4, …
Model 1, Model 3, Model 4 (Customer B), Model 2 (Customer A)
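The tournament idea can be sketched as "evaluate every candidate on this customer's data, keep the best". The snippet below is a hedged illustration rather than the platform's actual selection logic: `run_tournament` and its arguments are hypothetical names, and the accuracy figures shown on the slide for Customer A are used as a lookup table in place of real training and evaluation.

```python
def run_tournament(candidates, evaluate):
    """Score every candidate model and return (winner_name, all_scores)."""
    scores = {name: evaluate(model) for name, model in candidates.items()}
    winner = max(scores, key=scores.get)
    return winner, scores


# Accuracies shown on the slide for Customer A, used here as a lookup table
customer_a_accuracy = {"Model 1": 0.83, "Model 2": 0.91, "Model 3": 0.73, "Model 4": 0.89}

# Placeholder "models": in a real run these would be fitted estimators
candidates = {name: name for name in customer_a_accuracy}

winner, scores = run_tournament(candidates, customer_a_accuracy.get)
print(winner)  # Model 2
```

Running the same function with another customer's evaluation callback can yield a different winner from the same set of candidates, which is how one code base serves many customers.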
Deploy Monitors, Monitor, Repeat!
Sample Dashboard on Simulated Data
• 134 Models in Production
• 215 Models Trained (curr. month)
• 98.51% Models with Above Chance Performance
• 35,573,664 Predictions Written Per Day (7 day avg)
• 8 Experiments Run this Week
Deploy Monitors, Monitor, Repeat!
Sample Dashboard on Simulated Data
• 134 Models in Production
• 215 → 216 Models Trained (curr. month)
• 98.51% → 99.25% Models with Above Chance Performance
• 35,573,664 Predictions Written Per Day (7 day avg)
• 12 Experiments Run this Week
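The dashboard metrics can be approximated with a few small helpers. This is a rough sketch, not the monitoring service from the talk: the function names, the pairwise auROC computation, the ten equal-width buckets and the 0.5 chance threshold are all assumptions chosen for illustration.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def score_buckets(scores, n_buckets=10):
    """Percent of predictions falling in each equal-width probability bucket."""
    counts = [0] * n_buckets
    for s in scores:
        counts[min(int(s * n_buckets), n_buckets - 1)] += 1
    return [100.0 * c / len(scores) for c in counts]


def share_above_chance(aurocs, chance=0.5):
    """Percent of models whose evaluation auROC beats random guessing."""
    return 100.0 * sum(a > chance for a in aurocs) / len(aurocs)


print(auroc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))     # 0.75
print(score_buckets([0.05, 0.15, 0.95, 1.0]))
print(share_above_chance([0.86, 0.5, 0.7, 0.45]))    # 50.0
```

As a sanity check on the arithmetic only: with 134 models, 132 above chance works out to 98.51% and 133 to 99.25%, which is consistent with the simulated dashboard figures, though the slides do not say how those figures were computed.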
Key Takeaways
• Deploying machine learning in production is hard
• Platforms are critical for enabling data scientist productivity
• Plan for multiple apps… always
• To rapidly identify areas for improvement and measure the efficacy of new approaches, provide:
  • Monitoring services
  • Experimentation frameworks
• Identify opportunities for reusability in all aspects, even your machine learning pipelines
• Help simplify the process of experimenting, deploying, and iterating
Models in Minutes using AutoML

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 

Models in Minutes using AutoML

  • 2. Models in Minutes not Months: Data Science as Microservices Sarah Aerni, PhD saerni@salesforce.com @itweetsarah ​Einstein Platform
  • 4. Agenda
      BUILDING AI APPS: Perspective Of A Data Scientist
        • Journey to building your first model
        • Barriers to production along the way
      DEPLOYING MODELS IN PRODUCTION: Built For Reuse
        • Where engineering and applications meet AI
        • DevOps in Data Science – monitoring, alerting and iterating
      AUTO MACHINE LEARNING: Machine Learning Pipelines as a Collection of Microservices
        • Create reusable ML pipeline code for multiple applications customers
        • Data Scientists focus on exploration, validation and adding new apps and models
  • 5. ENABLING DATA SCIENCE ​A DATA SCIENTISTS VIEW OF BUILDING MODELS
  • 6. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy A data scientist’s view of the journey to building models
  • 7. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy A data scientist’s view of the journey to building models
  • 8. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy #Leads Created Date
  • 9. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy Engineer Features Empty fields One-hot encoding (pivoting) Email domain of a user Business titles of a user Historical spend Email-Company Name Similarity
  • 10. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy
      >>> import numpy as np
      >>> from sklearn import svm
      >>> pls = np.loadtxt("leadFeatures.data", delimiter=",")
      >>> # Randomly pick 70% of the rows as the scoring set
      >>> testSet = np.random.choice(len(pls), int(len(pls)*.7), replace=False)
      >>> # NumPy has no R-style negative indexing, so select the remaining rows with a mask
      >>> trainMask = np.ones(len(pls), dtype=bool)
      >>> trainMask[testSet] = False
      >>> X, y = pls[trainMask,:-1], pls[trainMask,-1]
      >>> clf = svm.SVC()
      >>> clf.fit(X,y)
      SVC(C=1.0, cache_size=200, class_weight=None,
      coef0=0.0,decision_function_shape=None, degree=3,
      gamma='auto', kernel='rbf', max_iter=-1,
      tol=0.001, verbose=False)
      >>> clf.score(pls[testSet,:-1],pls[testSet,-1])
      0.88571428571428568
  • 11. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy Geographies #Leads
  • 12. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy
  • 13. Access and Explore Data Engineer Features and Build Models Interpret Model Results and Accuracy Fresh Data Input Delivery of Predictions
  • 14. Bringing a Model to Production Requires a Team. Applications deliver predictions for customer consumption. Predictions are produced by the models live in production. Pipelines deliver the data for modeling and scoring at an appropriate latency. Monitoring systems allow us to check the health of the models, data, pipelines and app. Source: Salesforce Customer Relationship Survey conducted 2014-2016 among 10,500+ customers randomly selected. Response sizes per question vary.
  • 15. Bringing a Model to Production Requires a Team
      Data Engineers
        • Provide data access and management capabilities for data scientists
        • Set up and monitor data pipelines
        • Improve performance of data processing pipelines
      Front-End Developers
        • Build customer-facing UI
        • Application instrumentation and logging
      Data Scientists
        • Continue evaluating models
        • Monitor for anomalies and degradation
        • Iteratively improve models in production
      Platform Engineers
        • Machine resource management
        • Alerting and monitoring
      Product Managers
        • Gather requirements & feedback
        • Provide business context
  • 16. Supporting a Model in Production is Complex. Only a small fraction of real-world ML systems is composed of ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex. D. Sculley, et al. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS). 2015
  • 17. MODELS IN PRODUCTION: WHAT IT TAKES TO DEPLOY AN AI-POWERED APPLICATION
  • 18. Supporting Models in Production is Mostly NOT AI. Only a small fraction of real-world ML systems is composed of ML code, as shown by the small black box in the middle. The required surrounding infrastructure is vast and complex. Adapted from D. Sculley, et al. Hidden technical debt in machine learning systems. In Neural Information Processing Systems (NIPS). 2015 Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code
  • 19. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists ​Deploy, monitor and iterate on models in one location Data Sources …
  • 20. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location Data Sources … Web UI CLI
  • 21. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location Admin / Authentication Service Data Sources … Web UI CLI
  • 22. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location Executor Service Datastore Service Admin / Authentication Service Data Sources … Web UI CLI
  • 23. Why Data Services are Critical … Data Connector Data Hub Object Store in Blob Storage Catalog Access Control Data Registry Applications App Data Prov
  • 24. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location Executor Service Data Puller Data Pusher Datastore Service Admin / Authentication Service Data Sources … Web UI CLI
  • 25. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location Executor Service Data Puller Data Pusher Datastore Service Admin / Authentication Service Data Sources … Web UI CLI Exploration Tool
  • 26. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location Executor Service Data Puller Data Pusher Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UI CLI Exploration Tool
  • 27. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location Executor Service Data Puller Data Preparator Data Pusher Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UI CLI Exploration Tool
  • 28. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location Executor Service Data Puller Data Preparator Data Pusher CI Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UI CLI Exploration Tool
  • 29. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Web UI CLI Exploration Tool
  • 30. Monitoring your AI’s health like any other app. Pipelines, Model Performance, Scores – Invest your time where it is needed! [Sample Dashboard on Simulated Data. Tiles: 105,874 Scores Written Per Hour (1 day moving avg); 0.86 Evaluation auROC. Panels: Model Performance at Evaluation; Distribution of Scores at Evaluation; Total Number of Scores Written Per Hour; Total Number of Scores Written Per Week. Axis labels: Percent of Total Predictions in Bucket; Prediction Probability Bucket; Total Prediction Count.]
  • 31. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Provisioning Service Web UI CLI Exploration Tool
  • 32. Why Data Services are Critical … Data Connector Data Hub Object Store in Blob Storage Catalog Access Control Data Registry Applications App Data Prov Applications App 1 App 2 App 3 Prov Prov Prov
  • 33. Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher Scheduler Model Management Service Control System CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Provisioning Service Web UI CLI Exploration Tool
  • 34. How the Salesforce Einstein Platform Enables Data Scientists: Deploy, monitor and iterate on models in one location. Executor Service Monitoring Tool Data Puller Data Preparator Data Pusher Scheduler Model Management Service Control System CI Service Monitoring Service Auxiliary Services Datastore Service Admin / Authentication Service Data Sources … Modeling / Scoring Provisioning Service Web UI CLI Exploration Tool. Microservice architecture; customizable model-evaluation & monitoring dashboards; scheduling and workflow management; in-platform secured experimentation and exploration. Data Scientists focus their efforts on modeling and evaluating results
  • 35. Why Stop at Microservices for Supporting Your ML Code? Why stop here? Your ML code can also be just a collection of microservices! Configuration Data Collection Data Verification Machine Resource Management Monitoring Serving Infrastructure Feature Extraction Analysis Tools Process Management Tools ML Code
  • 37. Leveraging Platform Services to Easily Deploy 1000s of Apps Data Scientists on App #1
  • 38. Leveraging Platform Services to Easily Deploy 1000s of Apps Data Scientists on App #1 Data Scientists on App #2
  • 39. Let’s Add a Third App Data Scientists on App #1 Data Scientists on App #2 Data Scientists on App #3
  • 40. How This Process Would Look in Salesforce 150,000 customers ​LeadIQ App ​Activity App ​Predictive Journeys App ​App #5 ​App #6 ​App #7 ​App #8 ​Opportunity App ​App #9 ​App #10 ​App #11 ​App #12 ​LeadIQ App ​Activity App ​Predictive Journeys App ​App #5 ​App #6 ​App #7 ​App #8 ​Opportunity App ​App #9 ​App #10 ​App #11 ​App #12 ​LeadIQ App ​Activity App ​Predictive Journeys App ​App #5 ​App #6 ​App #7 ​App #8 ​Opportunity App ​App #9 ​App #10 ​App #11 ​App #12
  • 41. Einstein’s New Approach to AI: Democratizing AI for Everyone. Classical Approach: Data Sampling, Feature Selection, Model Selection, Score Calibration, Integrate to Application. Artificial Intelligence. Einstein Auto-ML: Data already prepped, Models automatically built, Predictions delivered in context. AI for CRM: Discover, Predict, Recommend, Automate
  • 42. AutoML for feature engineering: Repeatable Elements in Machine Learning Pipelines. Numerical Buckets, Categorical Variables, Text Fields
  • 43. Categorical Variables ​AutoML for feature engineering Repeatable Elements in Machine Learning Pipelines 1 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 1 0
  • 44. Text Fields. AutoML for feature engineering: Repeatable Elements in Machine Learning Pipelines
      Word Count   Word Count (no stop words)   Is English   Sentiment
      4            2                            1            1
      6            3                            1            1
      9            4                            0            0
      6            4                            0            -1
      7            3                            1            0
      5            1                            1            0
      7            3                            1            0
  • 45. Numerical Buckets ​AutoML for feature engineering Repeatable Elements in Machine Learning Pipelines
  • 46. What Now? How autoML can choose your model
      >>> import numpy as np
      >>> from sklearn import svm
      >>> clf = svm.SVC()
      >>> pls = np.loadtxt("leadFeatures.data", delimiter=",")
      >>> # Randomly pick 70% of the rows as the scoring set
      >>> testSet = np.random.choice(len(pls), int(len(pls)*.7), replace=False)
      >>> # NumPy has no R-style negative indexing, so select the remaining rows with a mask
      >>> trainMask = np.ones(len(pls), dtype=bool)
      >>> trainMask[testSet] = False
      >>> X, y = pls[trainMask,:-1], pls[trainMask,-1]
      >>> clf.fit(X,y)
      SVC(C=1.0, cache_size=200, class_weight=None,
      coef0=0.0,decision_function_shape=None, degree=3,
      gamma='auto', kernel='rbf', max_iter=-1,
      probability=False, random_state=None, shrinking=True,
      tol=0.001, verbose=False)
      >>> clf.score(pls[testSet,:-1],pls[testSet,-1])
      0.88571428571428568
      Should we try other model forms? Features? Kernels or hyperparameters?
      Each use case will have its own model and features to use. We enable building separate models and features with 1 code base using OP
  • 47. A tournament of models!
      CUSTOMER A
      Customer ID   Age   Age Group   Gender   Valid Address
      1             22    A           M        Y
      2             30    A           F        N
      3             45    B           F        N
      4             23    A           M        Y
      5             60    C           M        Y
      MODEL GENERATION: Model 1, Model 2, Model 3, Model 4, …
      MODEL TESTING: Model 1 83% accuracy, Model 2 91% accuracy, Model 3 73% accuracy, Model 4 89% accuracy, …
  • 48. A tournament of models!
      CUSTOMER A
      Customer ID   Age   Age Group   Gender   Valid Address
      1             22    A           M        Y
      2             30    A           F        N
      3             45    B           F        N
      4             23    A           M        Y
      5             60    C           M        Y
      CUSTOMER B
      Customer ID   Age   Age Group   Gender   Valid Address
      1             34    B           M        Y
      2             22    A           M        Y
      3             66    C           F        N
      4             58    C           M        N
      5             41    B           F        Y
      MODEL GENERATION: Model 1, Model 2, Model 3, Model 4, …
      MODEL TESTING: Model 1 Model 3 Model 4 Customer B Model 2 Customer A
  • 49. Deploy Monitors, Monitor, Repeat! Sample Dashboard on Simulated Data ​134 Models in Production ​215 Models Trained (curr.month) ​98.51% Models with Above Chance Performance ​35,573,664 Predictions Written Per Day (7 day avg) ​8 Experiments Run this Week
  • 50. Deploy Monitors, Monitor, Repeat! Pipelines, Model Performance, Scores – Invest your time where it is needed! [Sample Dashboard on Simulated Data. Tiles: 105,874 Scores Written Per Hour (1 day moving avg); 0.86 Evaluation auROC. Panels: Model Performance at Evaluation; Distribution of Scores at Evaluation; Total Number of Scores Written Per Hour; Total Number of Scores Written Per Week. Axis labels: Percent of Total Predictions in Bucket; Prediction Probability Bucket; Total Prediction Count.]
  • 51. Deploy Monitors, Monitor, Repeat! Sample Dashboard on Simulated Data ​134 Models in Production ​215 Models Trained (curr.month) ​98.51% Models with Above Chance Performance ​216 Models Trained (curr.month) ​99.25% Models with Above Chance Performance ​35,573,664 Predictions Written Per Day (7 day avg) ​12 Experiments Run this Week
  • 52. Key Takeaways
      • Deploying machine learning in production is hard
      • Platforms are critical for enabling data scientist productivity
      • Plan for multiple apps… always
      • To rapidly identify areas for improvement and gauge the efficacy of new approaches, provide:
          • Monitoring services
          • Experimentation frameworks
      • Identify opportunities for reusability in all aspects, even your machine learning pipelines
      • Help simplify the process of experimenting, deploying, and iterating
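
Slides 42 to 48 describe reusable feature engineering (numerical buckets, one-hot encoding of categorical variables) feeding a "tournament of models" that is run separately for each customer. The following is a minimal sketch of that idea, assuming scikit-learn and pandas are available. It is not the Einstein platform's or OP's code: the helper names `build_preprocessor` and `run_tournament`, the candidate models, and the synthetic customer table are all illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer, OneHotEncoder
from sklearn.svm import SVC


def build_preprocessor(df):
    """Choose a transformer per column from its dtype (hypothetical helper)."""
    numeric = df.select_dtypes(include="number").columns.tolist()
    categorical = [c for c in df.columns if c not in numeric]
    return ColumnTransformer([
        # Numerical buckets: equal-width bins, then one indicator column per bin
        ("buckets", KBinsDiscretizer(n_bins=3, encode="onehot-dense",
                                     strategy="uniform"), numeric),
        # Categorical variables: one-hot encoding (pivoting)
        ("one_hot", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])


def run_tournament(df, y, candidates, cv=3):
    """Cross-validate every candidate on one customer's data; return the winner."""
    scores = {}
    for name, model in candidates.items():
        pipeline = Pipeline([("features", build_preprocessor(df)),
                             ("model", model)])
        scores[name] = cross_val_score(pipeline, df, y, cv=cv,
                                       scoring="accuracy").mean()
    winner = max(scores, key=scores.get)
    return winner, scores


# Synthetic stand-in for one customer's table (assumption: not real data)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "gender": rng.choice(["M", "F"], size=n),
    "valid_address": rng.choice(["Y", "N"], size=n),
})
# Made-up label, only so that the models have something to learn
y = ((df["age"] > 40) ^ (df["valid_address"] == "N")).astype(int).to_numpy()

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm_rbf": SVC(),
}
winner, scores = run_tournament(df, y, candidates)
print(winner, scores)
```

Calling `run_tournament` once per customer table reuses the same code base while letting each customer end up with a different winning model, which is the behaviour slide 48 illustrates with Customer A and Customer B.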