Monitoring Models in Production
Keeping track of complex models in a complex world
Jannes Klaas
About me
International
Business @ RSM
Financial Economics
@ Oxford Saïd
Course developer
machine learning @
Turing Society
Author “Machine
Learning for
Finance” out in July
ML consultant non-
profits / impact
investors
Prev. Urban Planning
@ IHS Rotterdam &
Destroyer of my
Startup
The life and
times of an
ML
practitioner
“We send you the data,
you send us back a model,
then we take it from there”
– Consulting Clients
“Define an approach,
evaluate on common
benchmark and publish” –
Academia
Repeat after
me
It is not done after we ship
It is not done after we ship
It is not done after we ship
It is not done after we ship
It is not done after we ship
It is not done after we ship
It is not done after we ship
It is not done after we ship
Machine
learning 101
Estimate some function y = f(x)
using (x,y) pairs
Estimated function hopefully
represents the true relationship
between x and y
Model is function of data
Problems you encounter in
production
• The world changes, your training data might
no longer depict the real world
• Your model inputs might change
• There might be unintended bugs and side
effects in complex models
• Models influence the world they try to model
• Model decay: Your model usually becomes
worse over time
Are models
a liability
after
shipping?
No, the real world is the perfect
training environment
Datasets are only an
approximation of the real world
Active learning on real world
examples can greatly reduce
your data needs
Online learning
• Update model continuously as new data streams
in
• Good if you have continuous stream of ground
truth as well
• Needs more monitoring to ensure model does
not go off track
• Can be expensive for big models
• Might need separate training / inference
hardware
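A minimal sketch of the online-learning loop described above, assuming a logistic-regression model that takes one stochastic gradient step per incoming example. The class name `OnlineLogisticRegression`, the learning rate and the toy stream are illustrative assumptions, not something taken from the talk.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression updated one (x, y) pair at a time (hypothetical helper)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One SGD step on the log loss; call as soon as ground truth arrives."""
        error = self.predict_proba(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


# Toy stream: the label is 1 when the single feature is positive.
stream = [([1.0], 1), ([-1.0], 0), ([2.0], 1), ([-2.0], 0)] * 50
model = OnlineLogisticRegression(n_features=1)
for features, label in stream:
    model.update(features, label)
```

Because every update changes the deployed model, each step is also a chance for it to drift, which is why this setup needs more monitoring than a model that is retrained offline.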
Active
learning
Make predictions
Request labels for low confidence
examples
Train on those ‘special cases’
Production is an opportunity for
learning
Monitoring is part of training
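The three steps above can be sketched as uncertainty sampling: predict, then queue the examples whose top class probability falls below a threshold for labeling. The function name `select_for_labeling`, the threshold of 0.6 and the example ids are assumptions made for illustration.

```python
def select_for_labeling(predictions, confidence_threshold=0.6):
    """Return ids of examples whose top class probability is below the threshold.

    `predictions` maps an example id to a list of class probabilities.
    """
    low_confidence = [
        (max(probs), example_id)
        for example_id, probs in predictions.items()
        if max(probs) < confidence_threshold
    ]
    # Least confident first, so annotators see the hardest cases first.
    return [example_id for _, example_id in sorted(low_confidence)]


live_predictions = {
    "img_001": [0.97, 0.02, 0.01],
    "img_002": [0.40, 0.35, 0.25],
    "img_003": [0.55, 0.30, 0.15],
    "img_004": [0.80, 0.15, 0.05],
}
to_label = select_for_labeling(live_predictions)
```

The labeled "special cases" would then be added to the training set for the next training run.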
Model monitoring vs Ops monitoring
• Model monitoring tracks model behavior
• Inherently stochastic
• Can be driven by user behavior
• Almost certainly looking for unknown unknowns
• Few established guidelines on what to monitor
Monitoring inputs
More similar to ops monitoring as there can be obvious failures
• E.g. images arriving at model very small, very dark, high contrast, etc.
Monitor classic stats, compare to training data
• Means
• Standard deviations
• Correlations
• KL divergence between training & live data
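As a rough sketch of comparing classic statistics against training data, the check below flags a feature whose live mean is too many standard errors away from the training mean. The function name `input_drift_report`, the cut-off `max_z` and the brightness values are assumptions for illustration only.

```python
import statistics


def input_drift_report(train_values, live_values, max_z=3.0):
    """Compare the live mean of one input feature against training statistics."""
    train_mean = statistics.mean(train_values)
    train_std = statistics.stdev(train_values)
    live_mean = statistics.mean(live_values)
    # Standard error of the live mean if live data followed the training distribution.
    standard_error = train_std / len(live_values) ** 0.5
    z_score = (live_mean - train_mean) / standard_error
    return {
        "train_mean": train_mean,
        "live_mean": live_mean,
        "live_std": statistics.stdev(live_values),
        "z_score": z_score,
        "drifted": abs(z_score) > max_z,
    }


# Hypothetical mean pixel brightness per image (0 = black, 255 = white).
train_brightness = [120, 130, 125, 118, 135, 128, 122, 131]
live_brightness = [60, 72, 65, 58, 70, 66, 61, 69]  # much darker uploads
report = input_drift_report(train_brightness, live_brightness)
```

The same pattern applies to any scalar summary of an input, such as image size or contrast.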
Output
monitoring
Harder, people might just upload more
plane images one day
Monitoring prediction distribution
surprisingly helpful
Monitor confidence (highest
probability – lowest probability)
Compare against other model
predictions
Compare against ground truth
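Two of the output checks above can be sketched in a few lines: the confidence measure as defined on the slide (highest minus lowest class probability) and the distribution of predicted labels. Function names and sample values are assumptions for illustration.

```python
from collections import Counter


def confidence_spread(probs):
    """Confidence as on the slide: highest minus lowest class probability."""
    return max(probs) - min(probs)


def prediction_distribution(predicted_labels):
    """Share of each predicted label among all predictions."""
    counts = Counter(predicted_labels)
    total = len(predicted_labels)
    return {label: count / total for label, count in counts.items()}


batch_probs = [[0.7, 0.2, 0.1], [0.4, 0.35, 0.25]]
spreads = [confidence_spread(p) for p in batch_probs]
distribution = prediction_distribution(["pug", "pug", "husky", "pug"])
```

The resulting distribution can be compared with the label distribution seen in training, or with the output of a second model on the same inputs.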
Ground truth
• In absence of a ground truth signal, ground truth
needs to be established manually
• Can be done by data scientists themselves with good
UI design
• Yields extra insights: ‘Our model does worse when
Instagram filters are applied’ / ‘Many users take
sideways pictures’
• Prioritize low confidence predictions for active
learning
• Sample randomly for monitoring
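The last two bullets describe two separate labeling queues. A possible sketch, with hypothetical names and confidences: the least confident predictions go to active learning, while a uniform random sample goes to monitoring, since accuracy measured only on low-confidence cases would not be representative of the whole traffic.

```python
import random


def build_labeling_queues(confidences, n_active, n_monitoring, seed=0):
    """Split labeling effort into an active-learning queue and a monitoring queue.

    `confidences` maps a prediction id to the model's confidence for it.
    """
    # Active learning: the least confident predictions.
    by_confidence = sorted(confidences, key=confidences.get)
    active_queue = by_confidence[:n_active]
    # Monitoring: a uniform random sample of all predictions.
    rng = random.Random(seed)
    monitoring_queue = rng.sample(sorted(confidences), n_monitoring)
    return active_queue, monitoring_queue


confidences = {f"tx_{i}": c for i, c in enumerate(
    [0.99, 0.51, 0.87, 0.42, 0.95, 0.63, 0.78, 0.91])}
active, monitoring = build_labeling_queues(confidences, n_active=2, n_monitoring=3)
```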
Implementation Example: Prodigy
Alerting / Monitoring is a
UI/UX problem
• The terms might be very hard to explain or
interpret
• Who here would comfortably write down
the formula for KL Divergence and
explain what it means?
• Key metrics are different depending on use
case
• Non-data-scientists might have to make
decisions based on alerts
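For reference, the standard definition of the KL divergence between two discrete distributions over the same classes is given below. Taking P as the training distribution and Q as the live distribution is an assumption made here for illustration.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}
```

It is zero when the two distributions are identical, grows as they diverge, and is not symmetric in P and Q. With a base-2 logarithm the value is measured in bits.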
Alerting Example
[Figure: bar chart, training versus live distribution of dog breeds. Breeds: Husky, Chihuahua, Mastiff, Pug, Labrador, Poodle, Retriever, Terrier. Series: Train, Live. Value axis from 0 to 25.]
Alerting Example
• Detected $D_{KL}(\text{Train} \,\|\, \text{Live}) = 1.56394694$
which is out of bounds
• Detected model output distribution
significantly different from training data
• Detected an unexpected amount of
pictures classified as Pugs
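A small sketch of how the third, most readable alert could be generated from the same data as the first: find the class whose share changed most and say so in plain language. The function name `describe_largest_shift` and the class shares are assumptions, not the values behind the chart in the talk.

```python
def describe_largest_shift(train_share, live_share):
    """Name the class whose share changed most between training and live data."""
    label = max(train_share, key=lambda k: abs(live_share[k] - train_share[k]))
    direction = "more" if live_share[label] > train_share[label] else "fewer"
    return (
        f"Detected {direction} pictures classified as {label} than expected: "
        f"{live_share[label]:.0%} of live predictions versus "
        f"{train_share[label]:.0%} in training."
    )


# Hypothetical class shares; not the values from the chart in the talk.
train_share = {"Husky": 0.30, "Pug": 0.10, "Poodle": 0.60}
live_share = {"Husky": 0.25, "Pug": 0.40, "Poodle": 0.35}
message = describe_largest_shift(train_share, live_share)
```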
Model accountability
• Who made the decision
• Model versioning, all versions need to be retained
• On which grounds was the decision made
• All input data needs to be retained and must be linked to transaction ID
• Why was the decision made
• Use tools like LIME to interpret models
• Still a long way to interpretable deep models, but we are getting there
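One way to cover the "who" and "on which grounds" questions is to write a record per prediction that ties a transaction ID to the model version and a fingerprint of the input. The schema below is a hypothetical sketch; the field names and the version string are assumptions, and the raw input itself would still need to be retained separately, for example keyed by the hash.

```python
import hashlib
import json
import uuid
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TransactionRecord:
    """One prediction, with enough context to audit it later (hypothetical schema)."""
    transaction_id: str
    model_version: str   # who made the decision
    input_sha256: str    # on which grounds: fingerprint of the retained input
    prediction: str
    probability: float
    created_at: str


def record_transaction(model_version, input_bytes, prediction, probability):
    return TransactionRecord(
        transaction_id=str(uuid.uuid4()),
        model_version=model_version,
        input_sha256=hashlib.sha256(input_bytes).hexdigest(),
        prediction=prediction,
        probability=probability,
        created_at=datetime.now(timezone.utc).isoformat(),
    )


record = record_transaction("dog-breeds-v3", b"raw image bytes", "Pug", 0.82)
row = json.dumps(asdict(record))  # ready to write to the transaction store
```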
nth order effects
Societal
impact
Business
Metrics
(Revenue)
User
behavior
(e.g. CTR)
Model
metrics
(Accuracy)
Easy to monitor
Hard to monitor
Small impact
Large impact
Large impact effects…
• … are hard to monitor
• … are not what data scientists are trained for
• … only show with large scale deployment
• … are time delayed
• … are influenced by exogenous factors, too
Monitoring
high order
effects
Users are desperate to improve
your model, let them!
User input is a meta metric
showing how well your model
selection does
Implementation
example
Hosting monitoring system as
separate microservice
Using Flask to serve model
Flask service calls monitor
Alt. client can call monitor
A simple monitoring system with Flask
[Diagram: components are User, Keras + Flask (model server), SciKit + Flask (monitor), Data Scientist and a Transaction DB. Arrow labels: Image, Classification, Image + Classification, Alerts, Store transaction, Provide benchmark data.]
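Bare Bones Flask Serving
import flask

app = flask.Flask(__name__)

# `model`, `prepare_image` and `decode_predictions` are assumed to be defined
# elsewhere; the route name is an assumption.
@app.route("/predict", methods=["POST"])
def predict():
    data = {"success": False}
    image = flask.request.files["image"].read()
    image = prepare_image(image, target=(224, 224))
    preds = model.predict(image)
    results = decode_predictions(preds)
    data["predictions"] = []
    # Assumes each result is a (label, probability) pair, as on the slide.
    for (label, prob) in results[0]:
        r = {"label": label, "probability": float(prob)}
        data["predictions"].append(r)
    data["success"] = True
    return flask.jsonify(data)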
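Statistical monitoring with SciKit
import numpy as np
import scipy.stats

# pk and qk are assumed to be normalized NumPy arrays of class shares.
# With two arguments, entropy() returns the KL divergence, here in bits.
ent = scipy.stats.entropy(pk, qk, base=2)
if ent > threshold:
    # Class whose share differs most between the two distributions
    abs_diff = np.abs(pk - qk)
    worst_offender = lookup[np.argmax(abs_diff)]
    max_deviation = np.max(abs_diff)
    alert(model_id, ent, worst_offender, max_deviation)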
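The check above depends on names defined elsewhere (`pk`, `qk`, `lookup`, `threshold`, `alert`). A self-contained sketch of the same idea using only the standard library might look as follows. Treating the first argument as training counts and the second as live counts, the threshold of 0.1 bits and the counts themselves are assumptions made for illustration.

```python
import math


def kl_divergence_bits(pk, qk):
    """KL divergence D(pk || qk) in bits for two discrete distributions."""
    pk_total, qk_total = sum(pk), sum(qk)
    divergence = 0.0
    for p, q in zip(pk, qk):
        p, q = p / pk_total, q / qk_total  # normalize counts to probabilities
        if p > 0:
            if q == 0:
                return math.inf  # a class seen in pk never occurs in qk
            divergence += p * math.log2(p / q)
    return divergence


def check_distribution(train_counts, live_counts, labels, threshold=0.1):
    """Return an alert dict if live predictions diverge from training, else None."""
    ent = kl_divergence_bits(train_counts, live_counts)
    if ent <= threshold:
        return None
    train_total, live_total = sum(train_counts), sum(live_counts)
    abs_diff = [abs(t / train_total - l / live_total)
                for t, l in zip(train_counts, live_counts)]
    worst = max(range(len(labels)), key=abs_diff.__getitem__)
    return {"kl_bits": ent, "worst_offender": labels[worst],
            "max_deviation": abs_diff[worst]}


labels = ["Husky", "Pug", "Poodle"]
train_counts = [30, 10, 60]   # hypothetical counts
live_counts = [25, 40, 35]
alert = check_distribution(train_counts, live_counts, labels)
```

Returning the worst offender alongside the raw divergence gives the person receiving the alert something concrete to look at.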
Data science teams should own the whole process
Define
approach
Feature
Engineering
Train
model
Deploy
Monitor
Unsolved challenges
• Model versioning
• Dataset versioning
• Continuous Integration for data scientists
• Communication and understanding of model
metrics in the Org
• Managing higher order effects
Recommended reading
• Sculley et al. (2015) Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
• Breck et al. (2016) What’s your ML Test Score? A rubric for ML production systems https://ai.google/research/pubs/pub45742
• How Zendesk Serves TensorFlow Models in Production https://medium.com/zendesk-engineering/how-zendesk-serves-tensorflow-models-in-production-751ee22f0f4b
• Machine Learning for Finance ;) https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-finance

Monitoring Models in Production

  • 1. Monitoring Models in Production Keeping track of complex models in a complex world Jannes Klaas
  • 2. About me International Business @ RSM Financial Economics @ Oxford Saïd Course developer machine learning @ Turing Society Author “Machine Learning for Finance” out in July ML consultant non- profits / impact investors Prev. Urban Planning @ IHS Rotterdam & Destroyer of my Startup
  • 3. The life and times of an ML practitioner “We send you the data, you send us back a model, then we take it from there” – Consulting Clients “Define an approach, evaluate on common benchmark and publish” – Academia
  • 4. Repeat after me It is not done after we ship It is not done after we ship It is not done after we ship It is not done after we ship It is not done after we ship It is not done after we ship It is not done after we ship It is not done after we ship
  • 5. Machine learning 101 Estimate some function y = f(x) using (x,y) pairs Estimated function hopefully represents the true relationship between x and y Model is function of data
  • 6.
  • 7. Problems you encounter in production • The world changes, your training data might no longer depict the real world • Your model inputs might change • There might be unintended bugs and side effects in complex models • Models influence the world the try to model • Model decay: Your model usually becomes worse over time
  • 8. Are models a liability after shipping? No, the real world is the perfect training environment Datasets are only an approximation of the real world Active learning on real world examples can greatly reduce your data needs
  • 9. Online learning • Update model continuously as new data streams in • Good if you have continuous stream of ground truth as well • Needs more monitoring to ensure model does not go off track • Can be expensive for big models • Might need separate training / inference hardware
  • 10. Active learning Make predictions Request labels for low confidence examples Train on those ‘special cases’ Production is an opportunity for learning Monitoring is part of training
  • 11. Model monitoring vs Ops monitoring • Model monitoring models model behavior • Inherently stochastic • Can be driven by user behavior • Almost certainly looking for unknown unknowns • Few established guidelines on what to monitor
  • 12. Monitoring inputs •E.g. images arriving at model very small, very dark, high contrast, etc. More similar to ops monitoring as there can be obvious failures •Means •Standard deviations •Correlations •KL Divergence between Training & Live data Monitor classic stats, compare to training data
  • 13. Output monitoring Harder, people might just upload more plane images one day Monitoring prediction distribution surprisingly helpful Monitor confidence (highest probability – lowest probability) Compare against other model predictions Compare against ground truth
  • 14. Ground truth • In absence of a ground truth signal, ground truth needs to be established manually • Can be done by data scientists themselves with good UI design • Yields extra insights ‘Our model does worse when Instagram filters are applied’ / ‘Many users take sideways pictures’ • Prioritize low confidence predictions for active learning • Sample randomly for monitoring
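The last two bullets describe two different sampling strategies for manual labelling, which could be sketched as follows. `build_review_batches`, the transaction ids and the confidence values are hypothetical.

```python
import random


def build_review_batches(confidences, n_active, n_monitor, seed=0):
    """Split labelling effort into an active-learning batch and a monitoring batch.

    confidences maps a (hypothetical) transaction id to the model's confidence.
    """
    # Active learning: the n_active least confident predictions.
    ranked = sorted(confidences, key=confidences.get)
    active = ranked[:n_active]

    # Monitoring: a uniform random sample over all predictions.
    rng = random.Random(seed)
    ids = sorted(confidences)
    monitor = rng.sample(ids, min(n_monitor, len(ids)))
    return active, monitor


confidences = {"tx1": 0.97, "tx2": 0.41, "tx3": 0.88,
               "tx4": 0.35, "tx5": 0.63, "tx6": 0.92}
active, monitor = build_review_batches(confidences, n_active=2, n_monitor=3)
```

The two batches are kept separate on purpose: accuracy measured on the random batch is representative of live traffic, while the low confidence batch is skewed toward hard cases and would understate live performance.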
  • 16. Alerting / Monitoring is a UI/UX problem • The terms might be very hard to explain or interpret • Who here would comfortably write down the formula for KL Divergence and explain what it means? • Key metrics are different depending on use case • Non-data scientists might have to make decisions based on alerts
  • 17. Alerting Example [Bar chart: training versus live distribution of dog breeds. Categories: Husky, Chihuahua, Mastiff, Pug, Labrador, Poodle, Retriever, Terrier. Series: Train, Live.]
  • 18. Alerting Example • Detected $D_{KL}(\mathrm{Train} \,\|\, \mathrm{Live}) = 1.56394694$, which is out of bounds • Detected model output distribution significantly different from training data • Detected an unexpected amount of pictures classified as Pugs
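The three alert texts range from a raw statistic to a plain-language statement. One way to produce both ends from the same numbers is sketched below. `format_alert`, its parameters and the example shares (10% in training, 45% live) are illustrative assumptions, not values from the talk.

```python
def format_alert(kl_bits, threshold, worst_label, train_share, live_share):
    """Render one drift event for two audiences: technical and plain language."""
    technical = (
        f"Detected D_KL(Train || Live) = {kl_bits:.4f} bits, "
        f"above the threshold of {threshold:.2f}"
    )
    plain = (
        f"Detected an unexpected amount of pictures classified as {worst_label}: "
        f"{live_share:.0%} of live predictions versus {train_share:.0%} in training"
    )
    return {"technical": technical, "plain": plain}


# Hypothetical numbers for a drift event dominated by one class.
alert_texts = format_alert(
    kl_bits=1.56394694,
    threshold=1.0,
    worst_label="Pug",
    train_share=0.10,
    live_share=0.45,
)
```

Sending the plain version to product owners and keeping the technical version in the logs is one possible split; which metric counts as key still depends on the use case.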
  • 19. Model accountability • Who made the decision • Model versioning, all versions need to be retained • On which grounds was the decision made • All input data needs to be retained and must be linked to transaction ID • Why was the decision made • Use tools like LIME to interpret models • Still a long way to interpretable deep models, but we are getting there
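A minimal sketch of the record keeping this slide asks for: every decision is stored with a transaction ID, the model version that made it, and the retained input. The `Transaction` fields, the in-memory `store` dict and the version string are assumptions; a real system would presumably write to a durable transaction database.

```python
import hashlib
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Transaction:
    transaction_id: str   # links the decision to its inputs
    model_version: str    # who made the decision
    input_bytes: bytes    # on which grounds: the retained raw input
    input_sha256: str     # digest to detect later modification of the input
    prediction: str
    created_at: str


def record_transaction(store, model_version, input_bytes, prediction):
    """Create an immutable record of one model decision and save it in store."""
    tx = Transaction(
        transaction_id=str(uuid.uuid4()),
        model_version=model_version,
        input_bytes=input_bytes,
        input_sha256=hashlib.sha256(input_bytes).hexdigest(),
        prediction=prediction,
        created_at=datetime.now(timezone.utc).isoformat(),
    )
    store[tx.transaction_id] = tx
    return tx


# In-memory stand-in for a transaction database (hypothetical).
store = {}
tx = record_transaction(store, "v1.2.0", b"raw image bytes", "pug")
```

The "why" question is not covered here; the slide points to interpretation tools such as LIME for that part.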
  • 20. nth order effects • Model metrics (accuracy): easy to monitor, small impact • User behavior (e.g. CTR) • Business metrics (revenue) • Societal impact: hard to monitor, large impact
  • 21. Large impact effects… • … are hard to monitor • … are not what data scientists are trained for • … only show with large scale deployment • … are time delayed • … are influenced by exogenous factors, too
  • 22. Monitoring high order effects Users are desperate to improve your model, let them! User input is a meta metric showing how well your model selection does
  • 23. Implementation example • Host the monitoring system as a separate microservice • Use Flask to serve the model • The Flask service calls the monitor • Alternatively, the client can call the monitor
  • 24. A simple monitoring system with Flask [Architecture diagram. Components: User, Keras + Flask, SciKit + Flask, Transaction DB, Data Scientist. Flow labels: Image, Classification, Image + Classification, Store transaction, Provide benchmark data, Alerts.]
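  • 25. Bare Bones Flask Serving
        # Body of a Flask route handler. `data`, `model`, `prepare_image` and
        # `decode_predictions` are assumed to be defined elsewhere in the service.
        image = flask.request.files["image"].read()
        image = prepare_image(image, target=(224, 224))
        preds = model.predict(image)
        results = decode_predictions(preds)
        data["predictions"] = []
        # Assumes decode_predictions yields (label, probability) pairs; Keras's own
        # decode_predictions yields (class_id, label, probability) triples instead.
        for (label, prob) in results[0]:
            r = {"label": label, "probability": float(prob)}
            data["predictions"].append(r)
        data["success"] = True
        return flask.jsonify(data)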
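  • 26. Statistical monitoring with SciKit
        import numpy as np
        import scipy.stats
        # pk and qk are NumPy arrays of class shares, assumed here to be the
        # training (pk) and live (qk) distributions; entropy(pk, qk) is D_KL(pk || qk).
        ent = scipy.stats.entropy(pk, qk, base=2)
        if ent > threshold:
            abs_diff = np.abs(pk - qk)
            # Class whose share deviates most between the two distributions
            worst_offender = lookup[np.argmax(abs_diff)]
            max_deviation = np.max(abs_diff)
            alert(model_id, ent, worst_offender, max_deviation)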
  • 27. Data science teams should own the whole process: Define approach → Feature engineering → Train model → Deploy → Monitor
  • 28. Unsolved challenges • Model versioning • Dataset versioning • Continuous Integration for data scientists • Communication and understanding of model metrics in the Org • Managing higher order effects
  • 29. Recommended reading • Sculley et al. (2015) Hidden Technical Debt in Machine Learning Systems https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf • Breck et al. (2016) What’s your ML Test Score? A rubric for ML production systems https://ai.google/research/pubs/pub45742 • How Zendesk Serves TensorFlow Models in Production https://medium.com/zendesk-engineering/how-zendesk-serves-tensorflow-models-in-production-751ee22f0f4b • Machine Learning for Finance ;) https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-finance