Model Monitoring at Scale with Apache Spark and Verta
Manasi Vartak, Ph.D.
Founder and CEO, Verta Inc
www.verta.ai | @DataCereal
About
https://github.com/VertaAI/modeldb
- Ph.D. thesis at MIT CSAIL on model management and diagnosis
- Created ModelDB: Open-source ML model management & versioning
- Released at Spark Summit 2017!
- ML @ Twitter, Google, Facebook
https://www.verta.ai/product
- End-to-end MLOps platform for ML model delivery, operations and monitoring
- Serving models for some of the top tech cos, finance, insurance, etc.
Agenda
▴ Why Model Monitoring?
▴ What is Model Monitoring?
▴ Generalized Framework for Model Monitoring
▴ Monitoring at scale with Apache Spark
▴ Wrap up
ML Models are used across all functions
AI-ML doesn’t always work as expected
...models used to predict delinquencies suddenly stopped working as the pandemic hit since data used to build them was simply no longer relevant.
-- Head of Consumer Banking, Top US Bank
https://www.globalbankingandfinance.com/a-framework-for-analytics-operational-risk-management/
What we are hearing in the field
Our ad-serving system saw a revenue loss of $20K in 10 minutes and we had no idea why that happened. We had to dig through all kinds of logs to piece together what had happened.
Our model results are used to make automated pricing decisions. So preempting bad model predictions can save us millions of dollars.
Engineers with minimal ML expertise consume these models, so they are black-boxes to them. We try our best to tell the product team when something is wrong with the model, but that’s really hard to do.
Head of DS, US Ad-tech Company
ML Team Manager, Silicon Valley Unicorn
Top-5 US E-Commerce Retailer
How do we solve these problems?
Enter Model Monitoring.
What is Model Monitoring?
Ensuring model results are consistently of high quality
▴ Know when models are failing
▴ Quickly find the root cause
▴ Close the loop by fast recovery
*We refer to all latency, throughput etc. as model service health
I. How can we know when a model fails?
[Figure: a model maps Input to Output; Ground-truth arrives later (30 days in the example), so feedback is not instantaneous. Comparisons shown: Input (train) vs. Input (test); Featurized Data (train) vs. Featurized Data (test); Output (train) vs. Output (test).]
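The train-vs.-test comparisons on this slide can be sketched with a simple distribution-distance check. The example below is a minimal illustration, not the method used in the talk: it assumes the Population Stability Index (PSI) as the distance, and the names `bin_fractions`, `psi`, and the 0.2 threshold are hypothetical choices made for this sketch.

```python
import bisect
import math
from typing import List, Sequence


def bin_fractions(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin; out-of-range values go to the first or last bin.

    Assumes `values` is non-empty.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        idx = min(max(idx, 0), len(counts) - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi(reference: Sequence[float], live: Sequence[float],
        n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant reference column
    edges = [lo + i * width for i in range(n_bins + 1)]
    # Floor empty bins at eps so the logarithm is always defined.
    p = [max(x, eps) for x in bin_fractions(reference, edges)]
    q = [max(x, eps) for x in bin_fractions(live, edges)]
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


# Made-up model outputs: training-time scores vs. shifted serving-time scores.
train_scores = [i / 100 for i in range(100)]
test_scores = [0.5 + i / 200 for i in range(100)]

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff, tune per model

print("no drift vs. itself:", psi(train_scores, train_scores))
print("drift flagged:", psi(train_scores, test_scores) > DRIFT_THRESHOLD)
```

The same function can be applied to raw inputs, featurized data, or outputs, which is what makes it usable before ground truth arrives.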
II. How can we find the root cause of model failures?
[Figure: pipeline graph in which databases (DB1 to DB4) feed ETL jobs (ETL1 to ETL8), which feed Model1, Model2, and Model3, producing Pred1, Pred2, and Pred3; Pred2 and Pred3 are marked "??".]
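One way to picture root-cause analysis over such a pipeline graph is to walk the lineage upstream from a bad prediction and keep only the alerting nodes that have no alerting ancestors. The sketch below assumes a hypothetical lineage map (`UPSTREAM`) and a set of nodes with active alerts; the exact edges of the slide's diagram are not recoverable, so the graph here is illustrative only.

```python
from typing import Dict, List, Set

# Hypothetical lineage: node -> the nodes it reads from.
UPSTREAM: Dict[str, List[str]] = {
    "ETL1": ["DB1", "DB2"],
    "ETL2": ["ETL1"],
    "Model1": ["ETL2"],
    "Pred1": ["Model1"],
    "ETL3": ["DB3", "Pred1"],  # assumed dependency of one model on another
    "Model2": ["ETL3"],
    "Pred2": ["Model2"],
}


def ancestors(node: str, upstream: Dict[str, List[str]]) -> Set[str]:
    """All nodes reachable by following upstream edges from `node`."""
    seen: Set[str] = set()
    stack = list(upstream.get(node, []))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(upstream.get(cur, []))
    return seen


def root_causes(target: str, upstream: Dict[str, List[str]],
                alerting: Set[str]) -> List[str]:
    """Alerting nodes in the target's lineage that have no alerting ancestor."""
    candidates = (ancestors(target, upstream) | {target}) & alerting
    return sorted(c for c in candidates
                  if not (ancestors(c, upstream) & alerting))


print(root_causes("Pred2", UPSTREAM, {"DB2", "ETL1", "Pred1", "Pred2"}))
```

With alerts on DB2, ETL1, Pred1, and Pred2, only DB2 is reported, because every other alert sits downstream of it. This is the "global view" that per-model monitoring cannot give.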
III. How can we close the loop for fast recovery?
▴ Know the problem before it happens so you can take action
○ E.g., Missing feature? Impute or fall back to a different model
○ E.g., Set alerts on upstream data so that defects do not propagate downstream
▴ Close the loop by integrating into the rest of the ML pipeline
○ Re-train model
○ Send data to labeling software
○ Fall back to previous version of the model
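The "impute or fall back" idea above can be sketched as a thin wrapper around prediction. This is an assumption-laden illustration: `predict_with_fallback`, the feature names, and the two stand-in models are hypothetical and not part of any library mentioned in the talk.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Features = Dict[str, Optional[float]]
Model = Callable[[Dict[str, Optional[float]]], float]


def predict_with_fallback(features: Features,
                          primary: Model,
                          fallback: Model,
                          required: Sequence[str],
                          defaults: Dict[str, float]) -> Tuple[float, str]:
    """Impute missing required features when a default exists;
    otherwise route the request to the fallback model."""
    filled = dict(features)
    status = "primary"
    for name in required:
        if filled.get(name) is None:
            if name not in defaults:
                # No safe imputation available: use the fallback model.
                return fallback(filled), "fallback"
            filled[name] = defaults[name]
            status = "imputed"
    return primary(filled), status


# Stand-in models for illustration only.
def primary_model(f: Features) -> float:
    return f["age"] * 2 + f["income"]


def previous_version(f: Features) -> float:
    return 0.0


print(predict_with_fallback({"age": 3.0, "income": 1.0}, primary_model,
                            previous_version, ["age", "income"], {"age": 10.0}))
print(predict_with_fallback({"age": None, "income": 1.0}, primary_model,
                            previous_version, ["age", "income"], {"age": 10.0}))
print(predict_with_fallback({"age": 3.0}, primary_model,
                            previous_version, ["age", "income"], {"age": 10.0}))
```

Returning the status alongside the prediction lets a monitoring system count how often imputation or fallback occurs, which is itself a useful signal.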
What’s the alternative?
[Figure: each model (M), with its input and output, writes logs that feed its own analysis pipeline and monitoring; the setup is repeated ×100.]
▴ Custom analysis pipelines for each model type (maintenance burden)
▴ Difficult to get a global view (vs. per-model view) required for root cause analysis
▴ Takes more than a quarter to get something basic set up
Challenges with ML Monitoring
▴ Measurement. Measuring quality in absence of ground-truth is challenging
▴ Customization. Quality metrics are unique to each model type and domain
▴ Pipeline Jungles. Convoluted model lineage and data pipelines make root cause analysis extremely hard
▴ Accessibility. For non-experts to consume models, monitoring must be easy to plug in and interpret
▴ Scale. Must scale to large datasets, large number of statistics, and to live+batch inference
Introducing a Generalized Framework for Model Monitoring
Goals
▴ Make it flexible
○ Monitor models running on any serving platform, any ML framework
○ Monitor data pipeline, batch and live models
▴ Make it customizable
○ Use out-of-the-box statistics, or
○ Define your own custom functions and statistical properties to monitor & visualize
▴ Close the loop
○ Automate the recovery and alert resolution process
How does it work?
[Figure: architecture overview. Data/model pipelines, models (batch, live), and ground truth feed the monitoring system (ingest data; ingest input, output; ingest ground-truth). Users configure profilers and alerts, get insights, visualize, debug, and get notified. The system takes automated actions for remediation: retrain, rollback, human loop.]
How does data ingest work?
[Figure: data sources (Data1 ... DataN) pass through profilers (Profiler1 ... ProfilerN), which emit summary samples plus metadata for summaries (Summary1 ... SummaryN).]
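The ingest flow on this slide (data, profilers, summary samples plus metadata) can be pictured with a small sketch. Everything named here (`SummarySample`, `profile`, the two profiler functions, the example date) is hypothetical and only illustrates the shape of the idea; it is not the Verta API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Column = Sequence[Optional[float]]
Profiler = Callable[[Column], Dict[str, float]]


@dataclass
class SummarySample:
    summary: str                # which summary this sample belongs to
    content: Dict[str, float]   # the statistics a profiler computed
    metadata: Dict[str, str]    # context for slicing and debugging later


def missing_rate_profiler(values: Column) -> Dict[str, float]:
    """Count rows and the fraction of missing values."""
    n = len(values)
    missing = sum(1 for v in values if v is None)
    return {"count": float(n), "missing_rate": missing / n if n else 0.0}


def mean_profiler(values: Column) -> Dict[str, float]:
    """Mean of the non-missing values."""
    present = [v for v in values if v is not None]
    return {"mean": sum(present) / len(present) if present else float("nan")}


def profile(data_name: str, column: str, values: Column,
            profilers: Dict[str, Profiler], batch_date: str) -> List[SummarySample]:
    """Run every profiler over one column and wrap the results with metadata."""
    return [
        SummarySample(
            summary=f"{data_name}.{column}.{name}",
            content=fn(values),
            metadata={"source": data_name, "batch_date": batch_date},
        )
        for name, fn in profilers.items()
    ]


samples = profile("Data1", "age", [1.0, None, 3.0],
                  {"missing": missing_rate_profiler, "mean": mean_profiler},
                  batch_date="2021-01-01")  # placeholder date
for s in samples:
    print(s.summary, s.content, s.metadata)
```

Storing small summaries instead of raw rows is what keeps this approach cheap at scale: each batch contributes a handful of numbers per profiler regardless of its size.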
But what about real-time?
summary.enable_live(profiler)
Demo: Monitoring Spark ML Pipelines
Demo Setup
▴ Batch prediction pipeline w/Spark
▴ New data arrives daily
[Figure: the same batch pipeline (DB1, DB2, ETL1, ETL2, Model, Pred) shown three times, once per daily run; one run's prediction is marked "??".]
Demo Setup
[Figure: Spark ML pipeline stages: CSV → StringIndexer (0), StringIndexer (1), StringIndexer (2) → VectorAssembler (3) → GBDT (4) → Pred.]
But what if?
▴ 3 interconnected pipelines w/model dependencies
▴ What happens when DB2 is broken?
▴ What happens when ETL4 is broken?
[Figure: three interconnected pipelines. Row 1: DB1, DB2, ETL1, ETL2, Model1, Pred1. Row 2: DB3, ETL3, ETL4, Model2, Pred2 (marked "??"). Row 3: DB4, ETL5, Model3, Pred3 (marked "??").]
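The two "what happens when ... is broken?" questions amount to a downstream-impact query over the pipeline graph. The sketch below answers them with a breadth-first traversal. The edge list is an assumption: the slide's exact connections are not recoverable, so the cross-pipeline links (Pred1 feeding ETL4, Pred2 feeding ETL5) are invented for illustration.

```python
from collections import deque
from typing import Dict, List

# Hypothetical edges: producer -> consumers.
DOWNSTREAM: Dict[str, List[str]] = {
    "DB1": ["ETL1"], "DB2": ["ETL1"],
    "ETL1": ["ETL2"], "ETL2": ["Model1"], "Model1": ["Pred1"],
    "Pred1": ["ETL4"],  # assumed model dependency
    "DB3": ["ETL3"], "ETL3": ["ETL4"], "ETL4": ["Model2"], "Model2": ["Pred2"],
    "Pred2": ["ETL5"],  # assumed model dependency
    "DB4": ["ETL5"], "ETL5": ["Model3"], "Model3": ["Pred3"],
}


def impacted(broken: str, downstream: Dict[str, List[str]]) -> List[str]:
    """Every node that directly or transitively consumes the broken node."""
    seen = set()
    queue = deque([broken])
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print("DB2 broken  ->", impacted("DB2", DOWNSTREAM))
print("ETL4 broken ->", impacted("ETL4", DOWNSTREAM))
```

Under these assumed edges, a broken DB2 reaches all three predictions, while a broken ETL4 affects only Pred2 and Pred3. An alert on an upstream node can therefore be used to warn the owners of every impacted model before their outputs degrade.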
Summary
▴ ML Models drive key user experiences and business decisions
▴ Model Monitoring ensures model results are consistently of high quality
▴ When done right, Model Monitoring can:
○ Save $20K in 10 mins
○ Identify failing models before social media does!
○ Safely democratize AI
Thank you.
Intrigued? Check out: https://monitoring.verta.ai
