This document outlines a presentation on revamping machine learning pipelines with MLOps. The presentation covers the machine learning lifecycle and the challenges of ML productization. It gives examples of end-to-end ML platforms such as Uber's Michelangelo and Google's TFX. It discusses MLOps best practices and methodologies, including build, retrain and release pipelines. It introduces MLflow and includes demos of Airflow, TensorFlow model serving, TensorFlow.js, a TFX-based MLOps system on Google Cloud, and Azure MLOps.
2. Presented by
Sameer Mahajan
Principal Architect
Sameer Mahajan has 25 years of experience in the software industry. He has worked for companies like Microsoft and Symantec across areas like machine learning, storage, cloud, big data, networking and analytics in the United States and India.
Sameer holds 9 US patents and is an alumnus of IIT Bombay and Georgia Tech. He conducts hands-on workshops and seminars and also participates in panel discussions on upcoming technologies like machine learning and big data.
Sameer is one of the mentors for the Machine Learning Foundations course at Coursera.
3. Agenda
• Background
• ML Lifecycle
• Challenges with ML Productization
• Examples of end-to-end ML platforms
• MLOps Best Practices
• MLOps Methodologies
• Build, Retrain and Release Pipelines
• MLflow and demo
• Airflow demo
• Model Serving Pipeline
• TensorFlow Model Serving
• TensorFlow.js demo
• TFX-based MLOps system on Google Cloud
• Azure MLOps
• Conclusion
• Q & A
4. Background
• ML spend will reach $57.6 billion by 2021
• More and more ML systems are going into production
• Gartner 2019 Survey suggests that
  i. 59% have AI deployed today
  ii. The average number of deployed AI projects is expected to increase to 35 by 2022
• MLOps (Machine Learning Operations) aims to streamline the ML lifecycle
• MLOps started gaining traction in 2018
6. Process model - Option B
• World
  • Reality we are trying to model
  • Source of data
• Data Engineering
  • Data Capturing: ingest data from sensors, devices, databases
  • Data Preparation: cleanse and transform data; signal processing
  • Data Visualization: visual analytics to capture trends indicative of underlying model processes
• Machine Learning: train models that reflect the real-world phenomena
• Inference: use the models in real-world applications and processes for predictions, insights, etc.
7. Challenges
• Dealing with data, models and code
• Deployment and automation
• Collaboration: data engineers, data scientists, ML engineers, business analysts, operations
• Continuous Integration (CI), Continuous Deployment (CD), Continuous Training (CT)
• Reproducibility of results
  • Transformations
  • Hyperparameters
  • Initializers
  • Hardware
8. More Challenges
• Complex pipelines
  1. Ensemble
  2. Retraining
  3. Transfer learning
  4. Multiple prediction pipelines in parallel (Canary)
• Self-updating ML pipelines
• Governance: tracing a failed result back to data or code
• Scalability
9. Examples of end-to-end ML platforms
1. Uber's Michelangelo
2. Facebook's FBLearner
3. Google's TFX
4. Airbnb's Bighead
5. MLflow, introduced by Databricks and now open source
6. SageMaker
7. Azure
8. DataRobot
9. Polyaxon and Kubeflow
11. Best Practices
• Data pipeline: discoverable and accessible data (data lake, data mesh)
• Version control: GitHub, Data Version Control (DVC), MLflow Projects
• Data exploration: Jupyter, pandas, NumPy, seaborn
• ML: scikit-learn
• CI/CD: Jenkins
• Packaging: Docker
• Orchestrator: Airflow, Kubernetes
• Monitoring: ELK, Prometheus
12. Methodologies
1. Combination of DevOps (CI/CD), Software Engineering and ML
2. ML experiments are captured as runs
3. Each run captures all its steps, its data, parameters, hyperparameters, code, initializers, model evaluations, artifacts like trained models and business results after deployment
4. Packaging a model: container
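One way to picture "experiments captured as runs" is a small record written out for each run. The sketch below uses only the Python standard library; the `Run` class, its field names and every value are hypothetical, and a tracking tool would normally store this for you.

```python
import hashlib
import json
from dataclasses import dataclass, field, asdict

def fingerprint(rows):
    """Hash the training data so a run can be tied to the exact data it saw."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

@dataclass
class Run:
    code_version: str        # e.g. a commit hash (placeholder value below)
    data_fingerprint: str
    hyperparameters: dict
    initializer_seed: int
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)

    def to_json(self):
        """Serialize the whole run so it can be stored and queried later."""
        return json.dumps(asdict(self), sort_keys=True, indent=2)

rows = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.1}]
run = Run(
    code_version="abc1234",
    data_fingerprint=fingerprint(rows),
    hyperparameters={"learning_rate": 0.01, "epochs": 50},
    initializer_seed=42,
)
run.metrics["rmse"] = 0.12         # placeholder value, not a real result
run.artifacts.append("model.pkl")  # hypothetical artifact path
print(run.to_json())
```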
13. Closer look at some pipelines
1. Build pipeline
• Triggered on a schedule or when new code is checked in or new data becomes available
• Builds the code and runs unit tests
• Data tests: schema and distribution conformance
2. Retrain pipeline
• Triggered on a schedule or when new data becomes available
• Train, evaluate and register the model
3. Release pipeline
• Triggered every time a new artifact is available
• Package, test, deploy to production, start monitoring
15. MLflow
• MLflow Tracking: record and query experiments (code, data, config, and results)
• MLflow Projects: package data science code in a format to reproduce runs on any platform
• MLflow Models: deploy machine learning models in diverse serving environments
• MLflow Registry: store, annotate, discover, and manage models in a central repository
18. Model serving
Embedded model
1. Serialized pickle file
2. Language-agnostic exchange formats like PMML, PFA and ONNX
3. H2O exports a POJO in a JAR
Separate service
1. Cloud providers' tools and SDKs wrapping models
2. Kubeflow
3. MLflow Models
Published as data
1. Typically used in streaming / real-time scenarios
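The embedded-model option with a serialized pickle can be sketched as follows. `ThresholdModel` is a hypothetical stand-in for a trained model; in practice the object would come from an ML library and be written to a file rather than kept in memory.

```python
import pickle

class ThresholdModel:
    """Hypothetical stand-in for a trained model: predicts 1 above a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, values):
        return [1 if v > self.threshold else 0 for v in values]

# Training side: serialize the fitted model to bytes
# (pickle.dump would write the same payload to a file).
trained = ThresholdModel(threshold=0.5)
blob = pickle.dumps(trained)

# Application side: load the model once and call it in-process.
embedded = pickle.loads(blob)
print(embedded.predict([0.2, 0.7, 0.5]))
```

Two caveats apply to this approach: the class definition must be importable where the pickle is loaded, and pickles should only be loaded from trusted sources, since unpickling can execute arbitrary code.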
20. TensorFlow.js model serving demo
1. Open Google Chrome
2. Open chrome://apps/
3. Start the web server
4. RockPaperScissorsTensorflow.jsDemo (based on a Coursera assignment)
5. Open http://127.0.0.1:8887 in Chrome
6. Open developer tools
7. Demo retraining and predictions
24. Conclusion
• MLOps is an evolving field
• Applying lessons from other fields such as DevOps and software engineering
• Taking a holistic view
• Upcoming tools and practices
• Key to making ML productization successful