For any organization whose core product or business depends on ML models (think Slack search, Twitter feed ranking, or Tesla Autopilot), ensuring that production ML models perform with high efficacy is crucial. In fact, according to the McKinsey report on model risk, defective models have led to revenue losses of hundreds of millions of dollars in the financial sector alone. Yet despite the significant harm defective models cause, tools to detect and remedy performance issues in production ML models are largely missing.
Based on our experience building ML debugging and robustness tools at MIT CSAIL and managing large-scale model inference services at Twitter, Nvidia, and now at Verta, we developed a generalized model monitoring framework that can monitor a wide variety of ML models, work unchanged in batch and real-time inference scenarios, and scale to millions of inference requests. In this talk, we focus on how this framework applies to monitoring ML inference workflows built on top of Apache Spark and Databricks. We describe how we can supplement the massively scalable data processing capabilities of these platforms with statistical processors to support the monitoring and debugging of ML models.
Learn how ML Monitoring is fundamentally different from application performance monitoring or data monitoring. Understand what model monitoring must achieve for batch and real-time model serving use cases. Then dig in with us as we focus on the batch prediction use case for model scoring and demonstrate how we can leverage the core Apache Spark engine to easily monitor model performance and identify errors in serving pipelines.
Model Monitoring at Scale with Apache Spark and Verta
1. Model Monitoring at Scale with Apache Spark and Verta
Manasi Vartak, Ph.D.
Founder and CEO, Verta Inc
www.verta.ai | @DataCereal
2. About
https://github.com/VertaAI/modeldb
- Ph.D. thesis at MIT CSAIL on model management and diagnosis
- Created ModelDB: open-source ML model management & versioning
- Released at Spark Summit 2017!
- ML @ Twitter, Google, Facebook
https://www.verta.ai/product
- End-to-end MLOps platform for ML model delivery, operations, and monitoring
- Serving models for some of the top tech companies, finance, insurance, etc.
3. Agenda
▴ Why Model Monitoring?
▴ What is Model Monitoring?
▴ Generalized Framework for Model Monitoring
▴ Monitoring at scale with Apache Spark
▴ Wrap up
6. AI-ML doesn’t always work as expected
“...models used to predict delinquencies suddenly stopped working as the pandemic hit, since the data used to build them was simply no longer relevant.”
-- Head of Consumer Banking, Top US Bank
https://www.globalbankingandfinance.com/a-framework-for-analytics-operational-risk-management/
7. What we are hearing in the field
“Our ad-serving system saw a revenue loss of $20K in 10 minutes and we had no idea why that happened. We had to dig through all kinds of logs to piece together what had happened.”
-- Head of DS, US Ad-tech Company
“Our model results are used to make automated pricing decisions. So preempting bad model predictions can save us millions of dollars.”
-- ML Team Manager, Silicon Valley Unicorn
“Engineers with minimal ML expertise consume these models, so they are black boxes to them. We try our best to tell the product team when something is wrong with the model, but that’s really hard to do.”
-- Top-5 US E-Commerce Retailer
8. How do we solve these problems?
Enter Model Monitoring.
10. Ensuring model results are consistently of high quality
▴ Know when models are failing
▴ Quickly find the root cause
▴ Close the loop with fast recovery
*We refer to latency, throughput, etc. collectively as model service health
11. I. How can we know when a model fails?
[Diagram: in production, a model maps inputs to outputs, but ground truth arrives only later (e.g., after 30 days) — feedback is not instantaneous.]
[Diagram: so instead, compare training-time vs. serving-time distributions at each stage: input (train) vs. input (test), featurized data (train) vs. featurized data (test), and output (train) vs. output (test).]
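One way to operationalize these train-vs.-serving comparisons, in the absence of timely ground truth, is a distribution-distance statistic such as the Population Stability Index (PSI). A minimal pure-Python sketch — the function name and the 0.2 rule of thumb are illustrative conventions, not Verta's API:

```python
import math
from collections import Counter

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two samples of a categorical
    (or pre-binned) feature. Higher means more drift; a common rule of
    thumb flags PSI > 0.2 for investigation."""
    e_counts, a_counts = Counter(expected), Counter(actual)
    score = 0.0
    for bin_ in set(e_counts) | set(a_counts):
        # Clamp at eps so bins absent from one sample don't divide by zero.
        e = max(e_counts[bin_] / len(expected), eps)
        a = max(a_counts[bin_] / len(actual), eps)
        score += (a - e) * math.log(a / e)
    return score

# Training sample vs. two live samples: one healthy, one shifted.
train = ["a"] * 50 + ["b"] * 50
live_ok = ["a"] * 50 + ["b"] * 50
live_shifted = ["a"] * 90 + ["b"] * 10
```

On identical distributions the score is ~0, while the shifted live sample scores well above the 0.2 alerting threshold.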
12. II. How can we find the root cause of model failures?
[Diagram: a pipeline jungle — databases DB1–DB4 feed chained ETL stages (ETL1–ETL8) into Model1–Model3, producing Pred1–Pred3; when a prediction looks wrong, the defect could originate at any upstream stage.]
13. III. How can we close the loop for fast recovery?
▴ Know the problem before it happens so you can take action
○ E.g., Missing feature? Impute or fall back to a different model
○ E.g., Set alerts on upstream data so that defects do not propagate downstream
▴ Close the loop by integrating into the rest of the ML pipeline
○ Re-train model
○ Send data to labeling software
○ Fall back to previous version of the model
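The "impute or fall back to a different model" idea above can be sketched as a guard around scoring. Everything here — the function names and the "more than half missing" threshold — is an illustrative policy, not a prescribed one:

```python
def score_with_fallback(row, model, fallback_model, train_defaults):
    """Score a request, imputing missing features from training-time
    defaults; if too many features are missing, the input is likely
    broken upstream, so fall back to a simpler model."""
    missing = [f for f in train_defaults if row.get(f) is None]
    if len(missing) > len(train_defaults) // 2:
        return fallback_model(row)          # upstream data likely broken
    patched = dict(row)
    for f in missing:
        patched[f] = train_defaults[f]      # impute from training stats
    return model(patched)
```

With one feature missing the row is imputed and scored normally; with most features missing the fallback model answers instead, so bad inputs never silently reach the primary model.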
15. Challenges with ML Monitoring
▴ Measurement. Measuring quality in absence of ground-truth is challenging
▴ Customization. Quality metrics are unique to each model type and domain
▴ Pipeline Jungles. Convoluted model lineage and data pipelines make root cause
analysis extremely hard
▴ Accessibility. For non-experts to consume models, monitoring must be easy to plug
in and interpret
▴ Scale. Must scale to large datasets, large number of statistics, and to live+batch
inference
17. Goals
▴ Make it flexible
○ Monitor models running on any serving platform, any ML framework
○ Monitor data pipeline, batch and live models
▴ Make it customizable
○ Use out-of-the-box statistics, or
○ Define your own custom functions and statistical properties to monitor & visualize
▴ Close the loop
○ Automate recovery and alert resolution process
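"Define your own custom functions and statistical properties" could look like a small profiler registry: built-in statistics ship with the framework, and users register their own with a decorator. A hypothetical sketch — the decorator and names are ours, not Verta's:

```python
PROFILERS = {}

def profiler(name):
    """Register a column-level statistic to run on every batch."""
    def register(fn):
        PROFILERS[name] = fn
        return fn
    return register

@profiler("missing_ratio")
def missing_ratio(values):
    return sum(v is None for v in values) / len(values)

@profiler("mean")
def mean(values):
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

def profile_batch(columns):
    """columns: {column_name: list_of_values} -> {(col, stat): value}."""
    return {(col, name): fn(vals)
            for col, vals in columns.items()
            for name, fn in PROFILERS.items()}
```

Adding a domain-specific statistic is then just another decorated function; every registered profiler runs over every monitored column in each batch.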
18. How does it work?
[Diagram: models (batch, live), data/model pipelines, and ground-truth sources ingest their inputs, outputs, data, and ground truth into the monitoring system; users configure profilers and alerts, get insights/visualize/debug, get notified, and take automated actions — remediation via retraining, rollback, or a human in the loop.]
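The configure-profilers-and-alerts → get-notified → take-automated-actions loop can be sketched as thresholded alert rules wired to remediation callbacks. Names and thresholds here are illustrative, not the platform's actual API:

```python
def evaluate_alerts(stats, rules):
    """stats: {(col, stat): value} from a batch profile.
    rules: list of (col, stat, threshold, action) tuples — fire the
    action (notify, retrain, rollback, ...) when the monitored
    statistic exceeds its threshold. Returns the fired alerts."""
    fired = []
    for col, stat, threshold, action in rules:
        value = stats.get((col, stat))
        if value is not None and value > threshold:
            action(col, stat, value)        # automated remediation hook
            fired.append((col, stat, value))
    return fired
```

An action can be anything callable: posting a notification, kicking off retraining, or rolling back to the previous model version.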
22. Demo Setup
▴ Batch prediction pipeline w/Spark
▴ New data arrives daily
[Diagram: each day, DB1 and DB2 feed ETL1 → ETL2 → Model → Pred; the same pipeline runs on each day's fresh data, until one day's predictions look anomalous.]
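For this daily batch pipeline, one simple monitor compares each day's prediction summary against a baseline built from prior healthy runs and flags the anomalous day. A pure-Python sketch — in practice the per-day summaries would come from Spark aggregations over each day's predictions, and the k-sigma threshold is illustrative:

```python
import statistics

def daily_drift_check(history, today, k=3.0):
    """history: per-day mean predictions from prior healthy runs.
    today: today's raw predictions. Flag today if its mean lands more
    than k standard deviations from the historical mean."""
    base_mean = statistics.fmean(history)
    base_std = statistics.stdev(history)
    today_mean = statistics.fmean(today)
    return abs(today_mean - base_mean) > k * base_std, today_mean
```

A normal day passes quietly; the day the predictions jump, the check fires before downstream consumers act on bad scores.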
24. But what if?
▴ 3 interconnected pipelines w/ model dependencies
▴ What happens when DB2 is broken?
▴ What happens when ETL4 is broken?
[Diagram: DB1/DB2 → ETL1 → ETL2 → Model1 → Pred1; DB3 → ETL3 → ETL4 → Model2 → Pred2 (anomalous); DB4 → ETL5 → Model3 → Pred3 (anomalous).]
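When a prediction looks anomalous, stage-level statistics along the lineage graph let you walk upstream to the first unhealthy node. A hypothetical sketch — the parents dict and boolean health flags stand in for real lineage metadata and profiler output:

```python
def first_unhealthy_upstream(node, parents, healthy):
    """parents: {node: [upstream nodes]}; healthy: {node: bool}.
    Walk from a failing prediction toward its sources and return the
    most upstream unhealthy stage(s) — the likely root cause."""
    roots = set()

    def walk(n):
        bad_parents = [p for p in parents.get(n, []) if not healthy[p]]
        if not bad_parents:
            roots.add(n)        # no unhealthy input: n is where it started
        for p in bad_parents:
            walk(p)

    if healthy[node]:
        return set()
    walk(node)
    return roots

# E.g., if DB2 is broken, every stage downstream of it looks unhealthy,
# but the walk from Pred2 bottoms out at DB2.
```

This is why monitoring only the final predictions is not enough: without per-stage statistics, a broken DB2 and a broken ETL4 look identical at the output.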
25. Summary
▴ ML Models drive key user experiences and business decisions
▴ Model Monitoring ensures model results are consistently of high quality
▴ When done right, Model Monitoring can:
○ Save $20K in 10 mins
○ Identify failing models before social media does!
○ Safely democratize AI