SlideShare a Scribd company logo
1 of 38
Download to read offline
Cloud-Native MLOps
Framework
Data Fest 2021
Artem Koval
Big Data and Machine Learning Practice Lead at ClearScale
About Speaker
● Hey all!
● Name: Artem Koval
● Position: Big Data and Machine Learning
Practice Lead
● Company: ClearScale
Agenda
● What is modern MLOps
● Why the shift towards Human-Centered AI
● Fairness, Explainability, Model Monitoring
● Human Augmented AI
● How much MLOps do you need in your organization
● The future
What is MLOps?
● https://en.wikipedia.org/wiki/MLOps
● https://ml-ops.org/
● A process of deploying ML models in CI/CD manner into production,
establishing model monitoring, explainability, fairness, and providing tools for
human intervention
CRISP-DM
● https://en.wikipedia.org
/wiki/Cross-industry_st
andard_process_for_d
ata_mining
● Too generic
Why a Framework
● A need for the end-to-end solution, from the data ingestion to the model
monitoring, data labeling, algorithms explainability
An elegant weapon for a more civilized age (c)
● Your father’s ML Pipeline
ML has Technical Debt?
● Hidden Debt in Machine Learning Systems
The House of MLOps
Human-Centered AI
● https://hai.stanford.edu/
● https://plato.stanford.edu/entries/ethics-ai/
● https://ethical.institute/
● Humans must control AI end-to-end solutions
Cloud-Native MLOps
Modern MLOps Framework Drivers
● Not only CI/CD and ML code anymore
● Fairness and Explainability
● Observability (Monitoring)
● Scalability (Training and Inference)
● Data Labeling
● A/B Testing, Acceptance Testing
● Human Review
● Legacy Migration
● Multi-tenant Multi-model
Fairness
● https://github.com/slundberg/shap
● Regulatory requirements
● Business trust
Explainability
● https://github.com/Trusted-AI/AIF360
● No bias in data, no bias in inference (gender, racial, religious, ageism etc.)
● Fairness and Explainability by Design as a Process
Monitor Data Quality
● Monitors ML models in production and notifies when data quality issue arise
● Enable data capture (inference input & output, historical data)
● Create a baseline (https://github.com/awslabs/deequ)
● Define and schedule data quality monitoring jobs
● View data quality metrics/violations
● Integrate data quality monitoring with a Notification Service
● Interpret the results of a monitoring job
● Visualize results
Data Quality Violations/Metrics
● data_type_check
● completeness_check
● baseline_drift_check
● missing_column_check
● extra_column_check
● categorical_values_check
● Max, Min, Sum, SampleCount, Average, Distribution, StdDev, Mean
● ...
Monitor Model Quality
● Monitors the performance of a model by comparing the live predictions with
the actual ground truth labels
● Enable Data Capture
● Create a baseline
● Define and schedule model quality monitoring jobs
● Ingest ground truth labels that model monitor merges with captured prediction
data from real-time/batch inference endpoints
● Integrate model quality monitoring with a Notification Service
● Interpret the results of a monitoring job
● Visualize the results
Model Quality Metrics
● Regression: mae, mse, rmse, r2, ...
● Binary classification: confusion matrix, recall, precision, accuracy,
recall_best_constant_classifier, precision_best_constant_classifier,
accuracy_best_constant_classifier, true_positive_rate, …
● Multiclass classification: weighted_recall, weighted_f1,
weighted_f2_best_constant_classifier, ...
Monitor Bias Drift
● https://github.com/aws/amazon-sagemaker-clarify
● https://github.com/anodot/MLWatcher
● Training data differs from the live inference data
● Pre-training/post-training/common
Bias Metrics
● Class Imbalance (CI)
● Difference in Positive Proportions in Labels (DPL)
● Kullback-Liebler Divergence (KL)
● Jensen-Shannon Divergence (JS)
● Total Variation Distance (TVD)
● Kolmogorov-Smirnov Distance (KS)
● Conditional Demographic Disparity in Labels (CDDL)
● Difference in Conditional Outcomes (DCO)
● Difference in Label Rates (DLR)
● ...
Monitor Feature Attribution Drift
● https://github.com/slundberg/shap
● A drift in the distribution of live data for models in production can result in a
corresponding drift in the feature attribution values
Feature Attribution Drift Monitoring Methods
● LIME
● Shapley sampling values
● DeepLIFT
● QII
● Layer-wise relevance propagation
● Shapley regression values
● Tree interpreter
Human Augmented AI Drivers
● Need human oversight to ensure accuracy with sensitive data (healthcare,
finance)
● Implement human review of ML predictions
● Integrate human oversight with any application
● Flexibility to work with inside and outside reviewers
● Easy instructions for reviewers
● Workflows to simplify the human review process
● Improve results with multiple reviews
Human Augmented AI
Human Augmented Ground Truth Labeling Drivers
● Improve data label accuracy
● Easy to use (automatic snapping, image denoising, pre-selecting object
contour, etc.)
● Reduce costs
● Distribute workload over varying workforce
Human Augmented Ground Truth Labeling
MLOps Levels
● Lightweight MLOps
● Cloud-Native Greenfield SMB MLOps
● Enterprise MLOps
● Human-Centered AI
Lightweight MLOps Capacity
● One-person data science shop
● Small number of models (1-3)
● ML system is a greenfield
● Need to run time-critical demo for a small audience
● Models are custom, lightweight, don’t require compute-intensive model
training/HPO
● Low traffic is expected
Lightweight MLOps Solution Blueprint
● Convert models with TensorFlow Lite (or other framework-specific strip-down)
● Write a simple API microservice (e.g., Flask)
● Deploy as is, with no containerization, as a web app calling ML layer
● Use CPU-based commodity cloud instances
● Minimal model monitoring to at least capture drift
● Data analysis, feature engineering, orchestration, CI/CD, acceptance/AB
testing might be omitted
● Bootstrapping is highly needed to organize the process (e.g. Metaflow)
Cloud-Native SMB MLOps Capacity
● Have engineering resource
● Custom proprietary algorithms
● ML system is a greenfield
● Model development requires advance comput for training/HPO and inference
● Multi-model, multi tenant setup is needed
Cloud-Native SMB MLOps Solution Blueprint
● Containerize models (Docker)
● Utilize framework/cloud vendor specific HPO approaches
● Use GPU-based commodity cloud instances when needed
● Use cloud vendor specific elastic inference approaches
● Abstract and isolate data analysis, feature engineering, model training and
other steps
● Orchestrate with Apache Airflow or similar technology agnostic tools
● Ensure multi-tenancy by logical isolation of ML Workflows
● Implement model monitoring at least partially (bias drift, model quality)
Enterprise MLOps Capacity
● Have legacy ML system with a lot of microservices, models, orchestration
flows
● Have highly custom proprietary libraries requiring complex make
● Have advanced tenant isolation requirements
● Have a lot of models (>10)
● Have advanced needs for a large data science team to collaborate
Enterprise MLOps Blueprint
● Serve dockerized models with Kubeflow in a Kubernetes cluster
● Use Kubeflow tenancy isolation
● Use KFServing to deploy multiple variants of multiple models
● Use Katib for HPO
● Use Prometheus + Grafana, ELK for the full model monitoring, consuming
metrics with the open-source empowered microservices (SHAP, etc.)
● Implement advanced production acceptance testing (e.g., Differential Testing,
Shadow Deployments, Integration Testing etc.)
● Built custom human augmented Review/Labeling tools
Human-Centered AI Blueprint
● Can be added at any size/project configuration
● Ideally should be incorporated as a process touch all steps (data analysis,
training, deployment, monitoring)
● Remember: the moment your model is deployed to production it’s already
obsolete. Build with the CI/CD and human operations review in mind
The future
● Privacy-Preserving Machine Learning (differential, compressive, etc.)
● Models interpretability (global, local, saliency mapping, semantic similarity
etc.)
● Model Monitoring in AutoML (AutoKeras/Keras Tuner + SHAP, etc.)
● Measuring human augmentation (uncertainty/diversity sampling, active
learning, quality control, annotation/augmentation quality metrics, etc.)
Thanks everyone!
Questions?
The End
You could reach me via mail@artemkoval.com or LinkedIn
May the MLOps be with you!
Extra Resources
● https://github.com/EthicalML/awesome-production-machine-learning
● https://aws.amazon.com/solutions/implementations/aws-mlops-framework/
● https://cloud.google.com/architecture/mlops-continuous-delivery-and-automati
on-pipelines-in-machine-learning
● https://azure.microsoft.com/en-us/services/machine-learning/mlops/
● https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
● https://github.com/visenger/awesome-mlops#mlops-books

More Related Content

Similar to Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf

Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentDatabricks
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learningRajesh Muppalla
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...DataScienceConferenc1
 
Deploying ML models in the enterprise
Deploying ML models in the enterpriseDeploying ML models in the enterprise
Deploying ML models in the enterprisedoppenhe
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Sotrender
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLMárton Kodok
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabszekeLabs Technologies
 
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)Senturus
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOpsRui Quintino
 
MLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxMLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxKnoldus Inc.
 
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...DataScienceConferenc1
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro sessionAvinash Patil
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorchgeetachauhan
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Tushar Katarki
 

Similar to Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf (20)

Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 
Continuous delivery for machine learning
Continuous delivery for machine learningContinuous delivery for machine learning
Continuous delivery for machine learning
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
 
Deploying ML models in the enterprise
Deploying ML models in the enterpriseDeploying ML models in the enterprise
Deploying ML models in the enterprise
 
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
Trenowanie i wdrażanie modeli uczenia maszynowego z wykorzystaniem Google Clo...
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
 
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)
Best Practices with OLAP Modeling with Cognos Transformer (Cognos 8)
 
“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps“Houston, we have a model...” Introduction to MLOps
“Houston, we have a model...” Introduction to MLOps
 
MLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptxMLops on Vertex AI Presentation (AI/ML).pptx
MLops on Vertex AI Presentation (AI/ML).pptx
 
Aws autopilot
Aws autopilotAws autopilot
Aws autopilot
 
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...
[DSC Europe 22] Engineers guide for shepherding models in to production - Mar...
 
Machine learning
Machine learningMachine learning
Machine learning
 
MLOps.pptx
MLOps.pptxMLOps.pptx
MLOps.pptx
 
DevOps Days Rockies MLOps
DevOps Days Rockies MLOpsDevOps Days Rockies MLOps
DevOps Days Rockies MLOps
 
Ml ops on AWS
Ml ops on AWSMl ops on AWS
Ml ops on AWS
 
Ml ops intro session
Ml ops   intro sessionMl ops   intro session
Ml ops intro session
 
Scaling AI in production using PyTorch
Scaling AI in production using PyTorchScaling AI in production using PyTorch
Scaling AI in production using PyTorch
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes
 

Recently uploaded

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Recently uploaded (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf

  • 1. Cloud-Native MLOps Framework Data Fest 2021 Artem Koval Big Data and Machine Learning Practice Lead at ClearScale
  • 2. About Speaker ● Hey all! ● Name: Artem Koval ● Position: Big Data and Machine Learning Practice Lead ● Company: ClearScale
  • 3. Agenda ● What is modern MLOps ● Why the shift towards Human-Centered AI ● Fairness, Explainability, Model Monitoring ● Human Augmented AI ● How much MLOps do you need in your organization ● The future
  • 4. What is MLOps? ● https://en.wikipedia.org/wiki/MLOps ● https://ml-ops.org/ ● A process of deploying ML models in CI/CD manner into production, establishing model monitoring, explainability, fairness, and providing tools for human intervention
  • 6. Why a Framework ● A need for the end-to-end solution, from the data ingestion to the model monitoring, data labeling, algorithms explainability
  • 7. An elegant weapon for a more civilized age (c) ● Your father’s ML Pipeline
  • 8. ML has Technical Debt? ● Hidden Debt in Machine Learning Systems
  • 9. The House of MLOps
  • 10. Human-Centered AI ● https://hai.stanford.edu/ ● https://plato.stanford.edu/entries/ethics-ai/ ● https://ethical.institute/ ● Humans must control AI end-to-end solutions
  • 12. Modern MLOps Framework Drivers ● Not only CI/CD and ML code anymore ● Fairness and Explainability ● Observability (Monitoring) ● Scalability (Training and Inference) ● Data Labeling ● A/B Testing, Acceptance Testing ● Human Review ● Legacy Migration ● Multi-tenant Multi-model
  • 14. Explainability ● https://github.com/Trusted-AI/AIF360 ● No bias in data, no bias in inference (gender, racial, religious, ageism etc.) ● Fairness and Explainability by Design as a Process
  • 15. Monitor Data Quality ● Monitors ML models in production and notifies when data quality issue arise ● Enable data capture (inference input & output, historical data) ● Create a baseline (https://github.com/awslabs/deequ) ● Define and schedule data quality monitoring jobs ● View data quality metrics/violations ● Integrate data quality monitoring with a Notification Service ● Interpret the results of a monitoring job ● Visualize results
  • 16. Data Quality Violations/Metrics ● data_type_check ● completeness_check ● baseline_drift_check ● missing_column_check ● extra_column_check ● categorical_values_check ● Max, Min, Sum, SampleCount, Average, Distribution, StdDev, Mean ● ...
  • 17. Monitor Model Quality ● Monitors the performance of a model by comparing the live predictions with the actual ground truth labels ● Enable Data Capture ● Create a baseline ● Define and schedule model quality monitoring jobs ● Ingest ground truth labels that model monitor merges with captured prediction data from real-time/batch inference endpoints ● Integrate model quality monitoring with a Notification Service ● Interpret the results of a monitoring job ● Visualize the results
  • 18. Model Quality Metrics ● Regression: mae, mse, rmse, r2, ... ● Binary classification: confusion matrix, recall, precision, accuracy, recall_best_constant_classifier, precision_best_constant_classifier, accuracy_best_constant_classifier, true_positive_rate, … ● Multiclass classification: weighted_recall, weighted_f1, weighted_f2_best_constant_classifier, ...
  • 19. Monitor Bias Drift ● https://github.com/aws/amazon-sagemaker-clarify ● https://github.com/anodot/MLWatcher ● Training data differs from the live inference data ● Pre-training/post-training/common
  • 20. Bias Metrics ● Class Imbalance (CI) ● Difference in Positive Proportions in Labels (DPL) ● Kullback-Liebler Divergence (KL) ● Jensen-Shannon Divergence (JS) ● Total Variation Distance (TVD) ● Kolmogorov-Smirnov Distance (KS) ● Conditional Demographic Disparity in Labels (CDDL) ● Difference in Conditional Outcomes (DCO) ● Difference in Label Rates (DLR) ● ...
  • 21. Monitor Feature Attribution Drift ● https://github.com/slundberg/shap ● A drift in the distribution of live data for models in production can result in a corresponding drift in the feature attribution values
  • 22. Feature Attribution Drift Monitoring Methods ● LIME ● Shapley sampling values ● DeepLIFT ● QII ● Layer-wise relevance propagation ● Shapley regression values ● Tree interpreter
  • 23. Human Augmented AI Drivers ● Need human oversight to ensure accuracy with sensitive data (healthcare, finance) ● Implement human review of ML predictions ● Integrate human oversight with any application ● Flexibility to work with inside and outside reviewers ● Easy instructions for reviewers ● Workflows to simplify the human review process ● Improve results with multiple reviews
  • 25. Human Augmented Ground Truth Labeling Drivers ● Improve data label accuracy ● Easy to use (automatic snapping, image denoising, pre-selecting object contour, etc.) ● Reduce costs ● Distribute workload over varying workforce
  • 26. Human Augmented Ground Truth Labeling
  • 27. MLOps Levels ● Lightweight MLOps ● Cloud-Native Greenfield SMB MLOps ● Enterprise MLOps ● Human-Centered AI
  • 28. Lightweight MLOps Capacity ● One-person data science shop ● Small number of models (1-3) ● ML system is a greenfield ● Need to run time-critical demo for a small audience ● Models are custom, lightweight, don’t require compute-intensive model training/HPO ● Low traffic is expected
  • 29. Lightweight MLOps Solution Blueprint ● Convert models with TensorFlow Lite (or other framework-specific strip-down) ● Write a simple API microservice (e.g., Flask) ● Deploy as is, with no containerization, as a web app calling ML layer ● Use CPU-based commodity cloud instances ● Minimal model monitoring to at least capture drift ● Data analysis, feature engineering, orchestration, CI/CD, acceptance/AB testing might be omitted ● Bootstrapping is highly needed to organize the process (e.g. Metaflow)
  • 30. Cloud-Native SMB MLOps Capacity ● Have engineering resource ● Custom proprietary algorithms ● ML system is a greenfield ● Model development requires advance comput for training/HPO and inference ● Multi-model, multi tenant setup is needed
  • 31. Cloud-Native SMB MLOps Solution Blueprint ● Containerize models (Docker) ● Utilize framework/cloud vendor specific HPO approaches ● Use GPU-based commodity cloud instances when needed ● Use cloud vendor specific elastic inference approaches ● Abstract and isolate data analysis, feature engineering, model training and other steps ● Orchestrate with Apache Airflow or similar technology agnostic tools ● Ensure multi-tenancy by logical isolation of ML Workflows ● Implement model monitoring at least partially (bias drift, model quality)
  • 32. Enterprise MLOps Capacity ● Have legacy ML system with a lot of microservices, models, orchestration flows ● Have highly custom proprietary libraries requiring complex make ● Have advanced tenant isolation requirements ● Have a lot of models (>10) ● Have advanced needs for a large data science team to collaborate
  • 33. Enterprise MLOps Blueprint ● Serve dockerized models with Kubeflow in a Kubernetes cluster ● Use Kubeflow tenancy isolation ● Use KFServing to deploy multiple variants of multiple models ● Use Katib for HPO ● Use Prometheus + Grafana, ELK for the full model monitoring, consuming metrics with the open-source empowered microservices (SHAP, etc.) ● Implement advanced production acceptance testing (e.g., Differential Testing, Shadow Deployments, Integration Testing etc.) ● Built custom human augmented Review/Labeling tools
  • 34. Human-Centered AI Blueprint ● Can be added at any size/project configuration ● Ideally should be incorporated as a process touch all steps (data analysis, training, deployment, monitoring) ● Remember: the moment your model is deployed to production it’s already obsolete. Build with the CI/CD and human operations review in mind
  • 35. The future ● Privacy-Preserving Machine Learning (differential, compressive, etc.) ● Models interpretability (global, local, saliency mapping, semantic similarity etc.) ● Model Monitoring in AutoML (AutoKeras/Keras Tuner + SHAP, etc.) ● Measuring human augmentation (uncertainty/diversity sampling, active learning, quality control, annotation/augmentation quality metrics, etc.)
  • 37. The End You could reach me via mail@artemkoval.com or LinkedIn May the MLOps be with you!
  • 38. Extra Resources ● https://github.com/EthicalML/awesome-production-machine-learning ● https://aws.amazon.com/solutions/implementations/aws-mlops-framework/ ● https://cloud.google.com/architecture/mlops-continuous-delivery-and-automati on-pipelines-in-machine-learning ● https://azure.microsoft.com/en-us/services/machine-learning/mlops/ ● https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html ● https://github.com/visenger/awesome-mlops#mlops-books