6. 6
Temperature check: who has...
● trained an ML model before?
● deployed an ML model for fun?
● deployed an ML model at work?
● an automated deployment pipeline for ML models?
8. 8
What today’s talk is about
Share principles and practices that can make it easier for teams to iteratively deploy better ML products
Share what to strive towards, and how to strive towards it
9. 9
Standing on the shoulders of giants
● @jezhumble
● @davefarley77
● @mat_kelcey
● @codingnirvana
● @kief
14. 15
Why deploy frequently?
● Iteratively improve our model (training with new {data, hyperparameters, features})
● Correct any biases
● Counter model decay
● If it’s hard, do it more often
16. 17
Why deploy safely?
● ML models affect decisions that impact lives… in real time
● Hippocratic oath for us: do no harm
● Safety enables us to iteratively improve ML products that better serve people
17. 18
Machine learning is only one part of the problem/solution
Source: Hidden Technical Debt in Machine Learning Systems (Google, 2015)
[Diagram: finding the right business problem to solve → collecting data / data engineering → training ML models → deploying and monitoring ML models (focus of this talk)]
18. 19
Goal of today’s talk
[Diagram: the :-( path is notebook / playground → PROD (maybe); the :-) path is Continuous Delivery: commit and push → Experiment / Develop → Test → Deploy → Monitor]
19. 4. So, how do we get there?
Challenges (and solutions from Continuous Delivery practices)
20. 21
Our story’s main characters
Mario the data scientist
Luigi the engineer
[Diagram: local → PROD]
21. Key concept: CI/CD Pipeline
[Pipeline diagram: Local env, push to Version control, triggers: Run unit tests → Train and evaluate model → Deploy candidate model to STAGING, then a manual trigger to Deploy model to PROD; stages read from the data / feature repository and write to the model repository, with feedback flowing back at every step]
Source: Continuous Delivery (Jez Humble, Dave Farley)
22. #1: Automated configuration management
Challenge
● Snowflake (dev) environments
● “Works on my machine!”
Solution
● Single-command setup
● Version control all dependencies and configuration
Benefits
● Enable experimentation by all teammates
● Production-like environment == discover potential deployment issues early on
[Pipeline so far: dev]
24. #2: Test pyramid
Challenge
● How can I ensure my changes haven’t broken anything?
● How can I enforce the “goodness” of our models?
Solution
● Testing strategy (a sample ML metrics test is sketched below)
● Test every method
Benefits
● Fast feedback
● Safety harness allows the team to boldly try new things / refactor
[Test pyramid, bottom to top: unit tests → narrow/broad integration tests → ML metrics tests → manual tests; all layers below manual tests are automated]
[Pipeline so far: dev]
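Where an ML metrics test can live in practice: a minimal pytest sketch, assuming hypothetical train_model and load_eval_data helpers and an illustrative accuracy floor.

# test_model_metrics.py - illustrative ML metrics test (hypothetical helpers)
from sklearn.metrics import accuracy_score
from model import train_model, load_eval_data  # hypothetical project modules

def test_model_beats_agreed_floor():
    X_eval, y_eval = load_eval_data()
    model = train_model()
    accuracy = accuracy_score(y_eval, model.predict(X_eval))
    # Fail the build if the candidate model falls below the agreed floor.
    assert accuracy >= 0.80, f"accuracy {accuracy:.3f} is below the 0.80 floor"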
26. #3: Continuous integration (CI) pipeline for automated testing
Challenge
● Everyone may not run tests; “goodness” checks are done manually
● We could deploy {bugs, errors, bad models} to production
Solution
● CI/CD pipeline: automates unit tests → train → test → deploy (to staging), e.g. via a single entry point as sketched below
● Every code change is tested (assuming tests exist)
● Source code as the only source of software/models
Benefits
● Fast feedback
[Pipeline so far: dev → VCS → unit tests → train & test]
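One way the CI stages can be wired together: a sketch of a single script that CI runs on every push; the file names and the train.py contract are assumptions.

# ci_checks.py - illustrative CI entry point: unit tests, then train & evaluate
import subprocess
import sys

def main():
    # Stage 1: unit tests (pytest assumed as the test runner).
    if subprocess.run(["pytest", "tests/"]).returncode != 0:
        sys.exit("unit tests failed")
    # Stage 2: train and evaluate; train.py is assumed to exit non-zero
    # when the ML metrics tests fail.
    if subprocess.run([sys.executable, "train.py", "--evaluate"]).returncode != 0:
        sys.exit("train & evaluate stage failed")

if __name__ == "__main__":
    main()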
28. #4: Artifact versioning
Challenge
● How can we revert to previous models?
● Retraining == time-consuming
● Manual renaming/redeployment of old models (if we still have them)
Solution
● Build your binaries once
● Tag each artifact with metadata (training data, hyperparameters, datetime); see the sketch below
Benefits
● Save on build times
● Confidence in the artifact increases down the pipeline
● Metadata enables reproducibility
[Pipeline so far: dev → VCS → unit tests → train & test → version artifact]
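What tagging an artifact with metadata can look like: a sketch using joblib plus a JSON sidecar; the fields, paths, and function name are illustrative.

# version_artifact.py - illustrative: persist model + metadata side by side
import json
from datetime import datetime, timezone
import joblib

def save_versioned_model(model, hyperparameters, training_data_version, out_dir="artifacts"):
    # Timestamp doubles as the artifact version (out_dir assumed to exist).
    version = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    joblib.dump(model, f"{out_dir}/model-{version}.joblib")
    metadata = {
        "version": version,
        "hyperparameters": hyperparameters,
        "training_data_version": training_data_version,
    }
    with open(f"{out_dir}/model-{version}.json", "w") as f:
        json.dump(metadata, f, indent=2)
    return version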
29. #5: Continuous delivery (CD) pipeline for automated deployment
Challenge
● Deployments are scary
● Manual deployments == potential for mistakes
Solution
● Automated deployments triggered by the pipeline
● Single-command deployment to staging/production
● Eliminate manual deployments
Benefits
● More rehearsal == more confidence
● Disaster recovery: (single-command) deployment of the last good model in production
[Pipeline so far: dev → VCS → unit tests → train & test → version artifact → deploy-staging]
30. 33
#5: CD pipeline for automated deployment (Demo)
# Deploy model (the actual model)
gcloud beta ml-engine versions create $VERSION_NAME \
  --model $MODEL_NAME \
  --origin $DEPLOYMENT_SOURCE \
  --runtime-version=1.5 \
  --framework $FRAMEWORK \
  --python-version=3.5
31. 34
#5: CD pipeline for automated deployment (Demo)
# Deploy to prod
gcloud ml-engine versions set-default $version_to_deploy_to_prod \
  --model=$MODEL_NAME
32. #6: Canary releases + monitoring
Challenge
● How can I know if I’m deploying a better / worse model?
● Deployment to production may not work as expected
Solution
● Request shadowing pattern (credit: @codingnirvana); see the sketch below
Benefits
● Confidence increases along the pipeline, backed by metrics
● Monitoring in production == important source of feedback
[Pipeline so far: dev → VCS → unit tests → train & test → version artifact → deploy-staging → deploy-canary-prod]
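A minimal sketch of the request shadowing pattern: the production model serves the response while the candidate model sees a copy of the same request; prod_model, candidate_model, and the logging sink are all assumptions.

# shadow.py - illustrative request shadowing: prod answers, candidate is compared later
import logging

logger = logging.getLogger("shadow")

def predict_with_shadow(prod_model, candidate_model, features):
    prod_prediction = prod_model.predict(features)  # this is what the caller gets
    try:
        candidate_prediction = candidate_model.predict(features)
        # Log both predictions so they can be compared against eventual labels.
        logger.info("prod=%s candidate=%s features=%s",
                    prod_prediction, candidate_prediction, features)
    except Exception:
        # Never let the shadow path break production serving.
        logger.exception("candidate model failed on shadowed request")
    return prod_prediction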
34. #7: Start simple (tracer bullet)
Challenge
● Complex models == longer time to develop / debug
● Getting all the “right” features == weeks / months
Solution
● Start with a simple model + simple features (see the baseline sketch below)
● Create a solid pipeline first
● But not simpler than what is required (and don’t take expensive shortcuts)
Benefits
● Discover integration issues/requirements sooner
● Demonstrate working software to stakeholders in less time
[Pipeline: dev]
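What “start simple” can look like in code: a scikit-learn baseline that is trivial to train and deploy, useful for proving out the pipeline end to end before investing in a complex model. A sketch; the data loading is assumed to happen elsewhere.

# baseline.py - illustrative tracer-bullet model: the simplest thing that exercises the pipeline
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression

def train_baseline(X_train, y_train, use_dummy=False):
    # A majority-class dummy gives the floor; logistic regression with a
    # handful of features is often a good first "real" tracer bullet.
    if use_dummy:
        model = DummyClassifier(strategy="most_frequent")
    else:
        model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model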
36. #8: Collect more and better data with every release
Challenge
● Data collection is hard
● Garbage in, garbage out
Solution
● Think about how you can collect labels (immediately or eventually) after serving predictions (credit: @mat_kelcey); a logging sketch follows below
● Create bug reports for clients
● Complete the data pipeline cycle
● Caution: watch for attempts to game your ML system
Benefits
● More and better data. Nuff said.
[Pipeline so far: dev → VCS → unit tests → train & test → version artifact → deploy-staging → deploy-canary-prod → deploy-prod]
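Closing the data loop starts with making every served prediction joinable with the label that arrives later: a sketch, with the store and record schema as assumptions.

# prediction_log.py - illustrative: log predictions with an ID so labels can be joined later
import json
import uuid
from datetime import datetime, timezone

def log_prediction(features, prediction, model_version, log_file="predictions.jsonl"):
    record = {
        "prediction_id": str(uuid.uuid4()),  # join key for the eventual label
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    with open(log_file, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record["prediction_id"]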
37. #9: Build cross-functional teams
Challenge
● How can we do all of the above?
Solution
● Build cross-functional teams (data scientist, data engineer, software engineer, UX, BA)
Benefits
● Fewer nails (because not everyone is a hammer)
● Improved empathy + fewer silos == productivity
[Pipeline so far: dev → VCS → unit tests → train & test → version artifact → deploy-staging → deploy-canary-prod → deploy-prod]
38. #10: Kaizen mindset
Challenge
● How can we do all of the above?
Solution
● Kaizen == 改善 == change for better
● Go through deployment health checklists as a team
Benefits
● Iteratively get to good
[Pipeline so far: dev → VCS → unit tests → train & test → version artifact → deploy-staging → deploy-canary-prod → deploy-prod]
39. 43
#10: Kaizen - Health checklists
❏ General software engineering practices
❏ Source control (e.g. git)
❏ Unit tests
❏ CI pipeline to run automated tests
❏ Automated deployments
❏ Data / feature-related tests
❏ Test all code that creates input features, both in training and serving
❏ ...
❏ Model-related tests
❏ Test against a simpler model as a baseline
❏ ...
Source: A rubric for ML production systems (Google, 2016)
40. 44
#10: Kaizen - Health checks
● How much calendar time to deploy a model from staging to production?
● How much calendar time to add a new feature to the production model?
● How comfortable does your team feel about iteratively deploying
models?
43. A generalizable approach for deploying ML models frequently and safely
[Pipeline diagram: Local env, push to Version control, triggers: Run unit tests → Train and evaluate model → Deploy candidate model to STAGING, then a manual trigger to Deploy model to PROD; stages read from the data / feature repository and write to the model repository, with feedback flowing back at every step]
Credit: Continuous Delivery (Jez Humble, Dave Farley)
44. 48
Solve the right problem
We don’t have a machine learning problem.
We have a {business, data, software delivery, ML, UX} problem.
45. 49
Solve the right problem
[Diagram: 01 Data collection → 02 Machine learning → 03 Deployment and monitoring (focus of today’s talk)]
46. 50
How to deploy models to prod {frequently, safely, repeatably, reliably}?
1. Automate configuration management
2. Think about your test pyramid
3. Set up a continuous integration (CI) pipeline
4. Version your artifacts (i.e. models)
5. Automate deployments
6. Try canary releases
7. Start simple (tracer bullet)
8. Collect more and better data with every release
9. Build cross-functional teams
10. Kaizen / continuous improvement
49. 53
Resources for further reading
● Visibility and monitoring for machine learning (12-min video)
● Using continuous delivery with machine learning models to tackle fraud
● What’s your ML Test Score? A rubric for ML production systems (Google)
● Rules of Machine Learning (Google)
● Continuous Delivery (Jez Humble, Dave Farley)
● Why you need to improve your training data and how to do it
50. Backup materials /
miscellaneous stuff
This section is for placing any slides / ideas that may eventually make it into the actual presentation
51. 55
Detailed outline
In the talk, we will show how we constructed our CI/CD/data pipelines, which consist of the following tasks:
- Data pipeline
- Get data
- Transform/preprocess data
- Write to feature “repository”
- Local/dev
- Flesh out dev workflow. How can devs experiment / train / debug models?
- ML
- Get a slice of data
- Train model
- Evaluate model
- Web service
- CI - build and test stage
- Train and evaluate model on more data
- CI - deploy stage
- If tests pass, automatically deploy/promote artifact to staging
- Artifact should contain metadata that can help devs decide whether this new model is better than the older model that’s in production (e.g. precision, accuracy, RMSE, training data-related metadata)
- Manual (one-click) deploy to production
- CI - Monitoring
- Monitor model's predicted values against real bitcoin values (and against existing model in production)
- Canary deployments / dark launches / request shadowing
- Kill switch / rollback: rollback to last known “good” model
52. 58
TODOs
● Goal of presentation
○ How to continuously, quickly and safely deploy machine learning models in production
○ Patterns for deploying ML models
■ Data to read → Model to train → Artifacts to promote → Deploy → Monitoring
● Format
○ Run each step manually
○ Talk about what each step is doing / trying to achieve
○ Demo the “CI” version (Just Push)
● Build demo app
● Collect learnings
53. 59
Sketch out deployment pipeline
● Simple version (train → validate → deploy)
● More complicated version (+ AB testing)
Principles first
Supported with tools and tech
54. 60
Target audience of the talk:
● ML enthusiasts; people who’ve been training/evaluating ML models in Jupyter notebooks but who cannot go beyond that because (i) they can’t get data or (ii) they’re not familiar with deploying web services
56. 62
Deployment checklist (link)
This checklist should result in scripts/procedures needed to reliably and repeatedly deploy the
application into the production environment
• The steps required to deploy the application for the first time
• How to smoke-test the application and any services it uses as part of the deployment process
• The steps required to back out the deployment should it go wrong
• The steps required to back up and restore the application’s state
• The steps required to upgrade the application without destroying the application’s state
• The steps to restart or redeploy the application should it fail
• The location of the logs and a description of the information they contain
• The methods of monitoring the application
• The steps to perform any data migrations that are necessary as part of the release
• An issue log of problems from previous deployments, and their solutions
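As an illustration of the smoke-test item above: a minimal sketch that checks a deployed prediction endpoint responds sensibly; the URL, payload, and response shape are assumptions.

# smoke_test.py - illustrative post-deployment smoke test for a prediction service
import sys
import requests

ENDPOINT = "https://example.com/predict"  # hypothetical prediction endpoint

def main():
    payload = {"features": [1.0, 2.0, 3.0]}  # a known-valid example input
    response = requests.post(ENDPOINT, json=payload, timeout=10)
    if response.status_code != 200 or "prediction" not in response.json():
        sys.exit(f"smoke test failed: {response.status_code} {response.text}")
    print("smoke test passed:", response.json())

if __name__ == "__main__":
    main()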
57. 63
Deployment checklist: II
• An asset and configuration management strategy.
• A description of the technology used for deployment. This should be agreed upon by both the operations and
development teams.
• A plan for implementing the deployment pipeline.
• An enumeration of the environments available for acceptance, capacity, integration, and user acceptance testing,
and the process by which builds will be moved through these environments.
• Requirements for monitoring the application, including any APIs or services the application should use to notify
the operations team of its state.
• Description of the integration with any external systems. At what stage and how are they tested as part of a
release? How do the operations personnel communicate with the provider in the event of a problem?
• Details of logging so that operations personnel can determine the application’s state and identify any error
conditions.
• The service-level agreements for the software, which will determine whether the application will require
techniques like failover and other high-availability strategies.
• How the initial deployment to production works.
58. ❏ General software engineering practices
❏ Source control (e.g. git)
❏ Unit tests
❏ CI/CD pipeline
❏ Run automated tests
❏ Automated, or one-step manual deployments
❏ An ability to conduct experiments comparing different system versions
❏ Data / feature-related tests
❏ Test that the distributions of each feature match your expectations
❏ Test that a model does not contain any features that have been manually
determined as unsuitable for use
❏ Test that your system maintains privacy controls across its entire data
pipeline
❏ Test all code that creates input features, both in training and serving
❏ Model-related tests
Model deployment readiness checklist (source)
60. ❏ ML Infrastructure tests
❏ Test the reproducibility of training
❏ Unit test model specification code
❏ Integration test the full ML pipeline
❏ Test model quality before attempting to serve it
❏ Test that a single example or training batch can be sent to the model, and changes to internal state can
be observed from training through to prediction
❏ Test models via a canary process before they enter production serving environments
❏ Test how quickly and safely a model can be rolled back to a previous serving version
❏ Monitoring tests
❏ Test for upstream instability in features, both in training and serving.
❏ Test that data invariants hold in training and serving inputs.
❏ Test that your training and serving features compute the same values (i.e. training-serving skew)
❏ Test for model staleness
❏ Test for NaNs or infinities appearing in your model during training or serving
❏ Test for dramatic or slow-leak regressions in training speed, serving latency, throughput, or RAM usage
❏ Test for regressions in prediction quality on served data
Model deployment readiness checklist (source)
I’m David and here’s Ramsey, and we’re going to share about how you can deploy ML models to production frequently and safely.
Note to self:
“A talk is more about telling a story around a topic, changing people’s perspective, inspiring them to try something else, and giving them the tools for that.”
Empathize with the audience. Don’t preach.
Note: use “we”, rather than “you”.
Got an idea (e.g. NLP sentiment analysis). Followed an ML tutorial.
Built a model.
Asked to deploy. (click) “You want me to… what?” Bombarded with questions. How do I deploy? How do I load new data? How do I call .predict() without hitting shift+enter? How do I vectorize user input strings before passing them to the model?
We’re stumped. We don’t know where to start. We give up.
Before we go on, we want to take a quick temperature check
Bear this question in mind throughout the talk
Most of these are not ideas that Ramsey and I thought of. They are practices that these smart folks have thought of, and that have been tried and tested at our clients.
We built a sample app
What it does
Why we chose this stack / data source
How you can use it
To make this tangible, we’ve had to pick a stack. But focus on the patterns, and not our implementation
We built a demo so that we can have code to illustrate some points, but we ran out of time.
So for the last few points, we’ll talk about concepts and how we would implement them.
Just read the title. Don’t talk too much here.
Use fraud detection as an example.
Share about tracer bullet idea here
In other programming languages / frameworks, when we build something, we can share a link on Twitter and the rest of the world can use it.
In ML, in my experience, people just share screenshots of the loss curve (insert picture) or some object detection bounding boxes (insert pictures).
This is the problem facing many of us today.
We have tons of ML tutorials for local environments / Jupyter notebooks, but very few / none about serving those models or the continuous delivery/evolution of these models.
Until something is in production, it creates value for no one except ourselves
Model decay (our model can get stale / dangerous)
Deploying frequently allows us to make iterative improvements to our model (training with new {data, hyperparameters, features})
Cars, phones, IKEA chairs go through multiple rounds of testing. Why should ML models be any different?
The irony is that ML has already started to impact all of our lives, but testing and safety are things that we rarely talk about in ML.
ML models affect decisions that impact lives… in real-time
Safety is essential
Goal of today’s talk (in pictures)
“OK, David, I’m sold on why this frequent and safe deployment thing is important. But what does it look like in practice?”
CI/CD pipeline - The main vehicle for everything we’re sharing today
It’s all about feedback
30 seconds - quick overview of this.
The model goes through different stages
Each of them solves a different problem, which we’ll talk about next
Generalizable approach: we can see it working for classifiers, regression models, deep learning models, NLP models, etc.
Snowflake
Every dataset is unique, non-reproducible, hand-cleaned with TLC
Challenge
Brittle glue code in ML
Unit tests
At lower levels, check edge cases, add more tests for all that
At higher levels, check happy path and integration
Skip if people get CI pipeline
Deployment
Provisioning
Configuration
Deploying your app
Tracer bullet
Deploying a simple thing is easier than a complex thing
Focus on deploying first. Focus on deployment pipeline. Don’t get distracted. We can come back to tuning models later
Benefits
Monitoring === important source of feedback
Find out when models are getting stale / dangerous
LIME - Local Interpretable Model-Agnostic Explanations
Caveat: monitoring ML metrics can be challenging because labels take time to arrive.
Training-serving skew: where the data seen at serving time differs in some way from the data used to train the model, leading to reduced prediction quality.
Talk about just the first bullet
Pyception (Anaconda 2018 video) - a battle between data scientists and software engineers
Generalizable approach: we can see it working for classifiers, regression models, deep learning models, NLP models, etc.
Data / feature-related tests
Test that the distributions of each feature match your expectations. One example might be to test that Feature A takes on values 1 to 5, or that the two most common values of Feature B are "Harry" and "Potter" and they account for 10% of all values. This test can fail due to real external changes, which may require changes in your model.
Test that a model does not contain any features that have been manually determined as unsuitable for use. A feature might be unsuitable when it’s been discovered to be unreliable, overly expensive, etc. Tests are needed to ensure that such features are not accidentally included (e.g. via copy-paste) into new models.
Test that your system maintains privacy controls across its entire data pipeline. While strict access control is typically maintained on raw data, ML systems often export and transform that data during training. Test to ensure that access control is appropriately restricted across the entire pipeline.
Test all code that creates input features, both in training and serving. It can be tempting to believe feature creation code is simple enough to not need unit tests, but this code is crucial for correct behavior and so its continued quality is vital.
Model-related tests
Test that every model specification undergoes a code review and is checked in to a repository
Test the relationship between offline proxy metrics and the actual impact metrics. For example, how does a one-percent improvement in accuracy or AUC translate into effects on metrics of user satisfaction, such as click through rates? This can be measured in a small scale A/B experiment using an intentionally degraded model.
Test the impact of each tunable hyperparameter. Methods such as a grid search [6] or a more sophisticated hyperparameter search strategy [7] not only improve predictive performance, but also can uncover hidden reliability issues. For example, it can be surprising to observe the impact of massive increases in data parallelism on model accuracy.
Test the effect of model staleness. If predictions are based on a model trained yesterday versus last week versus last year, what is the impact on the live metrics of interest? All models need to be updated eventually to account for changes in the external world; a careful assessment is important to guide such decisions.
Test against a simpler model as a baseline. Regularly testing against a very simple baseline model, such as a linear model with very few features, is an effective strategy both for confirming the functionality of the larger pipeline and for helping to assess the cost to benefit tradeoffs of more sophisticated techniques.
Test model quality on important data slices. Slicing a data set along certain dimensions of interest provides fine-grained understanding of model performance. For example, important slices might be users by country or movies by genre. Examining sliced data avoids having fine-grained performance issues masked by a global summary metric.
Test the model for implicit bias. This may be viewed as an extension of examining important data slices, and may reveal issues that can be root-caused and addressed. For example, implicit bias might be induced by a lack of sufficient diversity in the training data.
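To make the first of these data / feature tests concrete: a sketch of feature-distribution checks in pytest style, assuming a features DataFrame is provided by a fixture; the feature names and thresholds mirror the examples above.

# test_feature_distributions.py - illustrative feature distribution checks
import pandas as pd

def test_feature_a_takes_values_one_to_five(features: pd.DataFrame):
    # Feature A is expected to take on values 1 to 5.
    assert features["feature_a"].between(1, 5).all()

def test_feature_b_top_two_values_share(features: pd.DataFrame):
    # The two most common values of Feature B should account for >= 10% of rows.
    top_two_share = features["feature_b"].value_counts(normalize=True).head(2).sum()
    assert top_two_share >= 0.10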
ML Infrastructure tests
Test the reproducibility of training. Train two models on the same data, and observe any differences in aggregate metrics, sliced metrics, or example-by-example predictions. Large differences due to non-determinism can exacerbate debugging and troubleshooting.
Unit test model specification code. Although model specifications may seem like “configuration”, such files can have bugs and need to be tested. Useful assertions include testing that training results in decreased loss and that a model can restore from a checkpoint after a mid-training job crash.
Integration test the full ML pipeline. A good integration test runs all the way from original data sources, through feature creation, to training, and to serving. An integration test should run both continuously as well as with new releases of models or servers, in order to catch problems well before they reach production.
Test model quality before attempting to serve it. Useful tests include testing against data with known correct outputs and validating the aggregate quality, as well as comparing predictions to a previous version of the model.
Test that a single example or training batch can be sent to the model, and changes to internal state can be observed from training through to prediction. Observing internal state on small amounts of data is a useful debugging strategy for issues like numerical instability.
Test models via a canary process before they enter production serving environments. Modeling code can change more frequently than serving code, so there is a danger that an older serving system will not be able to serve a model trained from newer code. This includes testing that a model can be loaded into the production serving binaries and perform inference on production input data at all. It also includes a canary process, in which a new version is tested on a small trickle of live data.
Test how quickly and safely a model can be rolled back to a previous serving version. A model “roll back” procedure is useful in cases where upstream issues might result in unexpected changes to model quality. Being able to quickly revert to a previous known-good state is as crucial with ML models as with any other aspect of a serving system.
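A sketch of the training-reproducibility test described above: train twice on the same data with the same seed and compare example-by-example predictions; train_model and load_training_data are hypothetical helpers.

# test_reproducibility.py - illustrative training-reproducibility check
import numpy as np
from model import train_model, load_training_data  # hypothetical project modules

def test_training_is_reproducible():
    X, y = load_training_data()
    model_a = train_model(X, y, seed=42)
    model_b = train_model(X, y, seed=42)
    # Predictions should match (or be within a small tolerance).
    np.testing.assert_allclose(model_a.predict(X), model_b.predict(X), rtol=1e-6)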
Monitoring tests
Test for upstream instability in features, both in training and serving. Upstream instability can create problems both at training and serving (inference) time. Training time instability is especially problematic when models are updated or retrained frequently. Serving time instability can occur even when the models themselves remain static. As examples, what alert would fire if one datacenter stops sending data? What if an upstream signal provider did a major version upgrade?
Test that data invariants hold in training and serving inputs. For example, test if Feature A and Feature B should always have the same number of non-zero values in each example, or that Feature C is always in the range (0, 100) or that class distribution is about 10:1.
Test that your training and serving features compute the same values. The codepaths that actually generate input features may differ for training and inference time, due to tradeoffs for flexibility vs. efficiency and other concerns. This is sometimes called “training/serving skew” and requires careful monitoring to detect and avoid.
Test for model staleness. For models that continually update, this means monitoring staleness throughout the training pipeline, to be able to determine in the case of a stale model where the pipeline has stalled. For example, if a daily job stopped generating an important table, what alert would fire?
Test for NaNs or infinities appearing in your model during training or serving. Invalid numeric values can easily crop up in your learning model, and knowing that they have occurred can speed diagnosis of the problem.
Test for dramatic or slow-leak regressions in training speed, serving latency, throughput, or RAM usage. The computational performance (as opposed to predictive quality) of an ML system is often a key concern at scale, and should be monitored via specialized regression testing. Dramatic regressions and slow regressions over time may require different kinds of monitoring.
Test for regressions in prediction quality on served data. For many systems, monitoring for nonzero bias can be an effective canary for identifying real problems, though it may also result from changes in the world.
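Finally, a sketch in the same spirit for two of the monitoring tests above (data invariants and NaN/infinity detection) on a batch of serving inputs; the feature names and ranges are placeholders.

# monitor_invariants.py - illustrative data-invariant and NaN checks on serving inputs
import numpy as np
import pandas as pd

def check_serving_batch(batch: pd.DataFrame):
    alerts = []
    # Invariant: feature_c must stay within the range (0, 100).
    if not batch["feature_c"].between(0, 100, inclusive="neither").all():
        alerts.append("feature_c out of expected range (0, 100)")
    # Invariant: no NaNs or infinities anywhere in the numeric columns.
    numeric = batch.select_dtypes(include=[np.number])
    if not np.isfinite(numeric.to_numpy()).all():
        alerts.append("NaN or infinity detected in serving inputs")
    return alerts  # a non-empty list should fire an alert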