For any organization whose core product or business depends on ML models (think Slack search, Twitter feed ranking, or Tesla Autopilot), ensuring that production ML models perform with high efficacy is crucial. In fact, according to the McKinsey report on model risk, defective models have led to revenue losses of hundreds of millions of dollars in the financial sector alone. Yet despite the significant harm defective models cause, tools to detect and remedy performance issues in production ML models are largely missing.
Based on our experience building ML debugging and robustness tools at MIT CSAIL and managing large-scale model inference services at Twitter, Nvidia, and now at Verta, we developed a generalized model monitoring framework that can monitor a wide variety of ML models, work unchanged in batch and real-time inference scenarios, and scale to millions of inference requests. In this talk, we focus on how this framework applies to monitoring ML inference workflows built on top of Apache Spark and Databricks. We describe how we can supplement the massively scalable data processing capabilities of these platforms with statistical processors to support the monitoring and debugging of ML models.
Learn how ML Monitoring is fundamentally different from application performance monitoring or data monitoring. Understand what model monitoring must achieve for batch and real-time model serving use cases. Then dig in with us as we focus on the batch prediction use case for model scoring and demonstrate how we can leverage the core Apache Spark engine to easily monitor model performance and identify errors in serving pipelines.
Model Monitoring at Scale with Apache Spark and Verta
1. Model Monitoring at Scale with Apache Spark and Verta
Manasi Vartak, Ph.D.
Founder and CEO, Verta Inc
www.verta.ai | @DataCereal
2. About
https://github.com/VertaAI/modeldb
- Ph.D. thesis at MIT CSAIL on model management and diagnosis
- Created ModelDB: open-source ML model management & versioning
- Released at Spark Summit 2017!
- ML @ Twitter, Google, Facebook
https://www.verta.ai/product
- End-to-end MLOps platform for ML model delivery, operations, and monitoring
- Serving models for some of the top tech companies, finance, insurance, etc.
3. Agenda
▴ Why Model Monitoring?
▴ What is Model Monitoring?
▴ Generalized Framework for Model Monitoring
▴ Monitoring at scale with Apache Spark
▴ Wrap up
6. AI-ML doesn’t always work as expected
“...models used to predict delinquencies suddenly stopped working as the pandemic hit, since the data used to build them was simply no longer relevant.”
-- Head of Consumer Banking, Top US Bank
https://www.globalbankingandfinance.com/a-framework-for-analytics-operational-risk-management/
7. What we are hearing in the field
“Our ad-serving system saw a revenue loss of $20K in 10 minutes and we had no idea why that happened. We had to dig through all kinds of logs to piece together what had happened.”
-- Head of DS, US Ad-tech Company
“Our model results are used to make automated pricing decisions. So preempting bad model predictions can save us millions of dollars.”
-- ML Team Manager, Silicon Valley Unicorn
“Engineers with minimal ML expertise consume these models, so they are black boxes to them. We try our best to tell the product team when something is wrong with the model, but that’s really hard to do.”
-- Top-5 US E-Commerce Retailer
8. How do we solve these problems?
Enter Model Monitoring.
10. Ensuring model results are consistently of high quality
▴ Know when models are failing
▴ Quickly find the root cause
▴ Close the loop with fast recovery
*We refer to latency, throughput, etc. collectively as model service health
11. I. How can we know when a model fails?
[Diagram: in production, a model maps inputs to outputs, but ground truth arrives only later (e.g., after 30 days) — feedback is not instantaneous.]
[Diagram: so instead, compare training-time vs. serving-time distributions at each stage: input (train) vs. input (test), featurized data (train) vs. featurized data (test), and output (train) vs. output (test).]
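One way to operationalize these train-vs.-serving comparisons, in the absence of timely ground truth, is a distribution-distance statistic such as the Population Stability Index (PSI). A minimal pure-Python sketch — the function name and the 0.2 rule of thumb are illustrative conventions, not Verta's API:

```python
import math
from collections import Counter

def psi(expected, actual, eps=1e-6):
    """Population Stability Index between two samples of a categorical
    (or pre-binned) feature. Higher means more drift; a common rule of
    thumb flags PSI > 0.2 for investigation."""
    e_counts, a_counts = Counter(expected), Counter(actual)
    score = 0.0
    for bin_ in set(e_counts) | set(a_counts):
        # Clamp at eps so bins absent from one sample don't divide by zero.
        e = max(e_counts[bin_] / len(expected), eps)
        a = max(a_counts[bin_] / len(actual), eps)
        score += (a - e) * math.log(a / e)
    return score

# Training sample vs. two live samples: one healthy, one shifted.
train = ["a"] * 50 + ["b"] * 50
live_ok = ["a"] * 50 + ["b"] * 50
live_shifted = ["a"] * 90 + ["b"] * 10
```

On identical distributions the score is ~0, while the shifted live sample scores well above the 0.2 alerting threshold.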
12. II. How can we find the root cause of model failures?
[Diagram: a pipeline jungle — databases DB1–DB4 feed chained ETL stages (ETL1–ETL8) into Model1–Model3, producing Pred1–Pred3; when a prediction looks wrong, the defect could originate at any upstream stage.]
13. III. How can we close the loop for fast recovery?
▴ Know the problem before it happens so you can take action
○ E.g., Missing feature? Impute or fall back to a different model
○ E.g., Set alerts on upstream data so that defects do not propagate downstream
▴ Close the loop by integrating into the rest of the ML pipeline
○ Re-train model
○ Send data to labeling software
○ Fall back to previous version of the model
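The "impute or fall back to a different model" idea above can be sketched as a guard around scoring. Everything here — the function names and the "more than half missing" threshold — is an illustrative policy, not a prescribed one:

```python
def score_with_fallback(row, model, fallback_model, train_defaults):
    """Score a request, imputing missing features from training-time
    defaults; if too many features are missing, the input is likely
    broken upstream, so fall back to a simpler model."""
    missing = [f for f in train_defaults if row.get(f) is None]
    if len(missing) > len(train_defaults) // 2:
        return fallback_model(row)          # upstream data likely broken
    patched = dict(row)
    for f in missing:
        patched[f] = train_defaults[f]      # impute from training stats
    return model(patched)
```

With one feature missing the row is imputed and scored normally; with most features missing the fallback model answers instead, so bad inputs never silently reach the primary model.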
15. Challenges with ML Monitoring
▴ Measurement. Measuring quality in absence of ground-truth is challenging
▴ Customization. Quality metrics are unique to each model type and domain
▴ Pipeline Jungles. Convoluted model lineage and data pipelines make root cause
analysis extremely hard
▴ Accessibility. For non-experts to consume models, monitoring must be easy to plug
in and interpret
▴ Scale. Must scale to large datasets, large number of statistics, and to live+batch
inference
17. Goals
▴ Make it flexible
○ Monitor models running on any serving platform, any ML framework
○ Monitor data pipeline, batch and live models
▴ Make it customizable
○ Use out-of-the-box statistics, or
○ Define your own custom functions and statistical properties to monitor & visualize
▴ Close the loop
○ Automate recovery and alert resolution process
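"Define your own custom functions and statistical properties" could look like a small profiler registry: built-in statistics ship with the framework, and users register their own with a decorator. A hypothetical sketch — the decorator and names are ours, not Verta's:

```python
PROFILERS = {}

def profiler(name):
    """Register a column-level statistic to run on every batch."""
    def register(fn):
        PROFILERS[name] = fn
        return fn
    return register

@profiler("missing_ratio")
def missing_ratio(values):
    return sum(v is None for v in values) / len(values)

@profiler("mean")
def mean(values):
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

def profile_batch(columns):
    """columns: {column_name: list_of_values} -> {(col, stat): value}."""
    return {(col, name): fn(vals)
            for col, vals in columns.items()
            for name, fn in PROFILERS.items()}
```

Adding a domain-specific statistic is then just another decorated function; every registered profiler runs over every monitored column in each batch.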
18. How does it work?
[Diagram: models (batch, live), data/model pipelines, and ground-truth sources ingest their inputs, outputs, data, and ground truth into the monitoring system; users configure profilers and alerts, get insights/visualize/debug, get notified, and take automated actions — remediation via retraining, rollback, or a human in the loop.]
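The configure-profilers-and-alerts → get-notified → take-automated-actions loop can be sketched as thresholded alert rules wired to remediation callbacks. Names and thresholds here are illustrative, not the platform's actual API:

```python
def evaluate_alerts(stats, rules):
    """stats: {(col, stat): value} from a batch profile.
    rules: list of (col, stat, threshold, action) tuples — fire the
    action (notify, retrain, rollback, ...) when the monitored
    statistic exceeds its threshold. Returns the fired alerts."""
    fired = []
    for col, stat, threshold, action in rules:
        value = stats.get((col, stat))
        if value is not None and value > threshold:
            action(col, stat, value)        # automated remediation hook
            fired.append((col, stat, value))
    return fired
```

An action can be anything callable: posting a notification, kicking off retraining, or rolling back to the previous model version.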
22. Demo Setup
▴ Batch prediction pipeline w/Spark
▴ New data arrives daily
[Diagram: each day, DB1 and DB2 feed ETL1 → ETL2 → Model → Pred; the same pipeline runs on each day's fresh data, until one day's predictions look anomalous.]
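For this daily batch pipeline, one simple monitor compares each day's prediction summary against a baseline built from prior healthy runs and flags the anomalous day. A pure-Python sketch — in practice the per-day summaries would come from Spark aggregations over each day's predictions, and the k-sigma threshold is illustrative:

```python
import statistics

def daily_drift_check(history, today, k=3.0):
    """history: per-day mean predictions from prior healthy runs.
    today: today's raw predictions. Flag today if its mean lands more
    than k standard deviations from the historical mean."""
    base_mean = statistics.fmean(history)
    base_std = statistics.stdev(history)
    today_mean = statistics.fmean(today)
    return abs(today_mean - base_mean) > k * base_std, today_mean
```

A normal day passes quietly; the day the predictions jump, the check fires before downstream consumers act on bad scores.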
24. But what if?
▴ 3 interconnected pipelines w/ model dependencies
▴ What happens when DB2 is broken?
▴ What happens when ETL4 is broken?
[Diagram: DB1/DB2 → ETL1 → ETL2 → Model1 → Pred1; DB3 → ETL3 → ETL4 → Model2 → Pred2 (anomalous); DB4 → ETL5 → Model3 → Pred3 (anomalous).]
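When a prediction looks anomalous, stage-level statistics along the lineage graph let you walk upstream to the first unhealthy node. A hypothetical sketch — the parents dict and boolean health flags stand in for real lineage metadata and profiler output:

```python
def first_unhealthy_upstream(node, parents, healthy):
    """parents: {node: [upstream nodes]}; healthy: {node: bool}.
    Walk from a failing prediction toward its sources and return the
    most upstream unhealthy stage(s) — the likely root cause."""
    roots = set()

    def walk(n):
        bad_parents = [p for p in parents.get(n, []) if not healthy[p]]
        if not bad_parents:
            roots.add(n)        # no unhealthy input: n is where it started
        for p in bad_parents:
            walk(p)

    if healthy[node]:
        return set()
    walk(node)
    return roots

# E.g., if DB2 is broken, every stage downstream of it looks unhealthy,
# but the walk from Pred2 bottoms out at DB2.
```

This is why monitoring only the final predictions is not enough: without per-stage statistics, a broken DB2 and a broken ETL4 look identical at the output.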
25. Summary
▴ ML Models drive key user experiences and business decisions
▴ Model Monitoring ensures model results are consistently of high quality
▴ When done right, Model Monitoring can:
○ Save $20K in 10 mins
○ Identify failing models before social media does!
○ Safely democratize AI