A workshop to demonstrate how we can apply agile and continuous delivery principles to continuously deliver value in machine learning and data science projects.
Code: https://github.com/davified/ci-workshop-app
7. TODAY’S PLAN
Share principles and practices that can make it easier for teams to iteratively deploy better ML products
Share what to strive towards, and how to strive towards it
8. ● Questions are welcome (esp. if we start speaking Greek)
● Use the stickies
○ Red: “I need help!”
○ Yellow: “You’re using too much jargon!”
● Parking lot
● Cross-talking
● Punctuality
SOME GROUND RULES
9. TODAY’S SCHEDULE
Time | Session
09.00am - 09.30am | Debugging setup (if anyone needs help)
09.30am - 10.30am | Intro to agile + continuous intelligence
10.45am - 11.00am | Learn enough Docker to be dangerous
11.00am - 12.30pm | Dojo: Hands-on exercise for continuous intelligence
12.30pm - 1.30pm | Lunch
1.30pm - 3.00pm | User experience
3.15pm - 4.45pm | Dojo: You can’t do continuous delivery without unit tests
5.00pm - 5.30pm | General discussion + Q&A
10. LEARNING CHECKLIST
❏ Sessions
❏ Environment management with Docker
❏ User experience / product thinking
❏ Continuous integration + Continuous delivery
❏ Test pyramid and unit testing
❏ General discussion
❏ Reuse data processing pipelines to reduce complexity and training-serving skew
❏ Explainability
❏ Model tracking and monitoring strategies (request shadowing)
❏ Monitor what we care about (metrics, business outcomes, fairness)
❏ Closing the data collection loop
❏ Cross-functional teams
❏ Kaizen health checklists
❏ etc.
12. ● What do we mean by agile?
● Why should we apply agile to machine learning?
● Pain points in machine learning
● How can agile + continuous delivery practices solve these pain points?
SESSION PLAN
15. AGILE VS. WATERFALL
[1] Royce, Winston. "Managing the Development of Large Software Systems", Proceedings of IEEE WESCON 26 (August): 1–9, 1970.
[2] Bell, Thomas E., and T. A. Thayer. "Software requirements: Are they really a problem?", Proceedings of the 2nd International Conference on Software Engineering. IEEE Computer Society Press, 1976.
17. A WATERFALL RELEASE
[Diagram] design → code → test → release; deployment issues, defects and product changes all surface at the end.
18. AGILE IN 1 MINUTE
Deliver value continuously through working software
Shorten feedback loops
Technical practices
19. AGILE MANIFESTO
… we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.
24. WE GOT 99 PROBLEMS AND MACHINE LEARNING AIN’T ONE
We don’t have a machine learning problem. We have a {UX, business, data, software delivery, ML} problem.
Source: Machine Learning: The High-Interest Credit Card of Technical Debt (Google, 2015)
26. THOUGHTWORKS’ APPROACH TO MACHINE LEARNING / ARTIFICIAL INTELLIGENCE
DATA STRATEGY: Uncovering data opportunities and guiding the vision for transforming organizations to become data-led.
DATA PLATFORM ENGINEERING: Ability to design and build data platforms, collecting, streaming and managing enterprise-wide data, ready for analysis.
DATA INSIGHTS: Gain insights from data to inform decision making, including descriptive and diagnostic analytics.
MACHINE INTELLIGENCE: Leveraging machine learning techniques to exhibit intelligent behavior, and take autonomous actions from data insights.
Underpinning all of the above: SOFTWARE EXCELLENCE AND PRODUCT THINKING
27. DELIVERING VALUE - A SLICE AT A TIME
Source: The AI hierarchy of needs
28. ML PRODUCT DEVELOPMENT SHOULD BE ITERATIVE, NOT BIG BANG
[Diagram] Idea → PoC → Make it simpler → Test in lab → Deploy to prod (dark launch or A/B) → Collect, evaluate → Model iteration → Repeat! Each iteration adds value, turning “uncertain how to add value” into “clear value add”.
30. Pain points between “Dan the data scientist” and an ML model in production:
● “Works on my machine!”
● No data / Data is everywhere but nowhere
● Data stitching (web scraping, API calls, CSV files)
● Not allowed to have production data on my laptop
● Untitled 17.ipynb
● Deployment / infra work is hard
● Hard to keep track of hyperparameters & metrics
● QA is hard
● Is my model doing OK? When should I retrain / re-release?
● Training-serving skew
● Interpretability
● “I want 100% accuracy”
● Harmful models in prod
● Users not using our product
● Reproducibility
31. How can agile + continuous delivery practices solve these problems?
32. WHAT PROBLEMS DOES CONTINUOUS DELIVERY SOLVE?
(The same pain points canvas as slide 30.)
33. CONTINUOUS DELIVERY PIPELINE
[Diagram] Local env → push → Source code repository → trigger → Run unit tests → Train and evaluate model → Deploy candidate model to STAGING → Deploy model to PROD, with feedback flowing back to the local env at every stage. The pipeline reads from a data / feature repository and publishes trained models to a model repository.
Source: Continuous Delivery (Jez Humble, Dave Farley)
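As a rough illustration of what the “train and evaluate” stage might execute, here is a minimal sketch in Python. The dataset, metric, threshold and file names are illustrative assumptions, not part of the workshop repo:

# evaluate_candidate.py - hypothetical commit-stage script: train a model,
# evaluate it, persist the candidate artifact, and fail the build on regression.
import json
import sys

import joblib
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

RMSE_THRESHOLD = 60.0  # illustrative baseline; in practice, load from config

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5

# Persist the candidate model and its metrics for downstream pipeline stages
joblib.dump(model, "model.joblib")
with open("metrics.json", "w") as f:
    json.dump({"rmse": rmse}, f)

# A non-zero exit code fails the stage, so the CI server never promotes
# a model that performs worse than the agreed baseline
if rmse > RMSE_THRESHOLD:
    print(f"RMSE {rmse:.2f} exceeds threshold {RMSE_THRESHOLD}; failing build")
    sys.exit(1)
print(f"RMSE {rmse:.2f} within threshold; candidate ready for staging")

The CI server (whichever tool you use) only needs to run this script on every push; the exit code is what gates promotion to staging.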
34. ANATOMY OF A PIPELINE
● Pipeline
○ Commit stage
■ build
■ run unit tests
■ train model
■ evaluate model → artifact
○ Deploy to staging
■ deploy
■ policy layer tests (see the sketch after this list)
■ fairness tests
■ adversarial tests
■ etc.
○ Deploy to prod
■ deploy
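To make the staging-stage checks concrete, here is a hypothetical policy-layer test in pytest style. The loan-approval scenario, model stub and function names are invented for illustration; they are not from the workshop repo:

# test_policy_layer.py - hypothetical test asserting that a hard business rule
# holds no matter what the model predicts ("don't leave critical things to
# probability").

class AlwaysApproveModel:
    """Worst-case stub: a model that approves every applicant."""
    def predict(self, rows):
        return [1 for _ in rows]

def approve_loan(model, applicant_features, applicant_age):
    # Policy layer: a deterministic rule wrapped around the model's prediction
    if applicant_age < 18:
        return False
    return bool(model.predict([applicant_features])[0])

def test_minors_are_never_approved_even_if_model_says_yes():
    model = AlwaysApproveModel()
    assert approve_loan(model, [0.5] * 10, applicant_age=17) is False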
37. CONTINUOUS DELIVERY CHECKLIST FOR ML PRODUCTS
Source control
● All code changes / model parameter changes are checked into source control
● Trunk-based development
● Code reuse
Configuration management
● Automated configuration management across environments (local, qa, prod)
Data processing
● Feature engineering is done through reusable data pipelines (see the sketch after this checklist)
● Data ingestion pipelines are mature, tested, reusable and automatable
● Regularly used features are precompiled into a feature ‘store’
● Collecting more and better training data from production, with every release (garbage in, garbage out problem)
● Create data turking systems for labelling new data (necessary for monitoring and re-training)
● Self-service data access
● Data access control
Training
● Automated infrastructure provisioning and configuration for model training
● Distributed training where necessary
Testing
● Test pyramid (unit tests, functional tests, policy layer tests, exploratory tests)
● Bias testing and fairness testing
● Adversarial testing
● Established baseline for evaluating model performance
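One hedged sketch of the “reusable data pipelines” point: packaging feature engineering together with the model (here using scikit-learn’s Pipeline; the dataset and estimator are illustrative) guarantees that training and serving run the exact same preprocessing code, removing one common source of training-serving skew:

# pipeline_example.py - feature engineering bundled with the model so the
# identical transformation runs at training time and at serving time.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Preprocessing + model ship as one artifact: no hand-copied scaling code
# in the serving app, hence nothing to drift out of sync.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)
joblib.dump(pipeline, "pipeline.joblib")

# Serving side: load the artifact and predict on raw, unscaled features.
served = joblib.load("pipeline.joblib")
print(served.predict(X[:3]))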
38. CONTINUOUS DELIVERY CHECKLIST FOR ML PRODUCTS
Artifact versioning
● All trained models are artifacted and versioned
● Tag artifacts with relevant metadata (e.g. training data, hyperparameters, datetime); see the sketch after this checklist
Deployment
● Set up continuous delivery pipeline
● Tracer bullet: start with a simple model + features
● Single-command deployments
● Disaster recovery: (single-command) deployment of last good model in production
● Frequent deployments to production
Policy layer
● Don’t leave critical things to probability (Use rules / heuristics instead)
Monitoring
● Understand model performance in production using canary releases
● Monitor business metrics
● Monitor ML metrics (e.g. RMSE) by running tests (i) on models, (ii) using the latest prod data
● Monitor anything that helps model interpretation (e.g. confusion matrices)
● Alerts and automated retraining of model candidates when/before performance begins to slip
Workflow
● Build cross-functional teams (UX, BA, DS, DE, DEV, etc)
● Iterative development lifecycle
Regular health checks
● How much calendar time to deploy a model from staging to production?
● How much calendar time to add a new feature to the production model?
● How comfortable does your team feel about iteratively deploying models?
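A small, library-free sketch of the artifact-versioning items above (file layout and metadata fields are illustrative assumptions, not a prescribed format):

# save_artifact.py - hypothetical model versioning: store the artifact next to
# the metadata needed to reproduce and audit it.
import hashlib
import json
from datetime import datetime, timezone

def save_with_metadata(model_bytes, hyperparameters, training_data_version):
    # Content-addressed version id, so identical models get identical ids
    version = hashlib.sha256(model_bytes).hexdigest()[:12]
    with open(f"model-{version}.bin", "wb") as f:
        f.write(model_bytes)
    metadata = {
        "version": version,
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "hyperparameters": hyperparameters,
        "training_data_version": training_data_version,
    }
    with open(f"model-{version}.json", "w") as f:
        json.dump(metadata, f, indent=2)
    return version

if __name__ == "__main__":
    # Example usage with dummy bytes and illustrative hyperparameters
    v = save_with_metadata(b"serialized-model", {"C": 1.0}, "2019-06-01-snapshot")
    print(f"Stored model version {v}")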
Editor’s Notes
Let me share ThoughtWorks’ approach to machine learning
We’re a software consultancy company that specialises in agile software delivery.
Check out our website. We have insights articles, podcasts, tech radar and more.
Voiceover:
We are not just implementers. We are creators, authors, bloggers and speakers who are constantly pushing the state of the art and championing the development of an ecosystem that brings competitive advantage to the enterprise.
We are proud to help shape the community by leveraging our learnings and experiences in delivering complex systems to create thought leadership.
We’ve also written over 80 books on topics ranging from programming languages to architecture, software engineering patterns and practices, continuous delivery, data management, analytics, experience design and building adaptive and responsive organisations. We wrote the book on complex delivery, literally.
Our Technology Radar is read by over 25,000 people on the day it’s released every quarter. In it, we review new commercial and open-source technologies on the horizon and discuss their applicability to the enterprise, based upon our own practical experience. We make recommendations for where you might gain competitive advantage by adopting some new technologies early, and perhaps avoid pitfalls by waiting to adopt others.
Notes:
Caveat: I’ve only been in ThoughtWorks for 2 years, so I haven’t been on many projects. But what I share is based on my experience seeing/being on data projects and, more importantly, on our collective experience as ThoughtWorks in this space. Think of it as what we know from projects around the globe - Australia, US, Singapore, etc.
Warn them that the workshop will be quite interactive
Go through schedule
What is a dojo: A school for training in Japanese arts of self-defense, such as judo and karate.
Warn them that the workshop will be quite interactive
Ask the audience: “what does it mean to be agile?”
1970: Long feedback loops – by the time we find problems, it’s too late to fix them.
“the implementation described above is risky and invites failure.”
Royce: "Figure 10 summarizes the five steps that I feel necessary to transform a risky development process into one that will provide the desired product. I would emphasize that each item costs some additional sum of money. If the relatively simpler process without the five complexities described here would work successfully, then of course the additional money is not well spent. In my experience, however, the simpler method has never worked on large software development efforts and the costs to recover far exceeded those required to finance the five-step process listed.”
“I believe in this concept, but the implementation described above is risky and invites failure.”
CALL OUT: What problems are likely with this approach?
All the problems come at the end
Agile is as the word suggests: the ability to be quick.
It’s not just rituals and ceremonies.
When users’ needs change, we should be able to change code and get it into production within days or weeks.
Agile prevents this by delivering value iteratively, rather than in big bang approaches.
Technical practices:
Automated testing, TDD, unit testing
Test pyramid
Infrastructure automation
DevOps
CI pipelines
Softer practices
Retrospectives
Feedback
Tools are important, but should never be at the expense of people.
2 types of documentation – how to use, how to maintain
Better to work together as partners
FEEDBACK LOOPS
Some of you might have seen this picture before. It’s from Google (2015). It highlights the problems that we have faced before.
All we wanted to do was to train some machine learning model and feel awesome, but then we encounter countless challenges (name some of them...)
Complexity is inevitable. That's why we need to build software that constantly manages and partitions complexity.
So that we can keep growing software that can evolve and adapt to changing requirements
This is what it feels like to be doing machine learning, most of the time. (let gif play)
We want to train a model, and we find out we need data first. To get data, we need to wait for access clearance.
All we wanted to do was to train some machine learning model and feel awesome, but then we encounter countless challenges (name some of them...)
Complexity is inevitable. That's why we need to build software that constantly manages and partitions complexity.
So that we can keep growing software that can evolve and adapt to changing requirements
We often find ourselves in a high-experimentation, low-engineering environment; while that’s good for experimentation, it’s bad for delivering high-quality software to users. The ML products we’ve delivered sit at the low-experimentation, high-engineering end of that spectrum. That allowed us to make code easy to change, maintain, etc.
(animate picture to disappear)
Stages of maturity / readiness for ML. Just as we can’t expect a baby to “just” walk, we can’t expect an organisation to just “do ML”.
ML is not a data scientists’ capability; it’s an organisational capability
There need to be prior capabilities, people, processes, infrastructure and tooling to enable this. And at ThoughtWorks, we believe that the best way is to find a problem to solve, and grow that capability organically, by building working software.
Data strategy - align intelligence to the goals, provide right org structure/incentives to get different teams to share data to unlock the value.
Go through the idea of thin slices
Underlying it all, to consider ethical consequences.
We often hear that organisations have many PoCs, but they are unsure how to derive value (HSBC 150 AI prototypes?)
With an approach to continuous intelligence, we can turn this problem on its head, because we can safely deploy experiments to production
So we start by finding the simplest thing that adds value and figure out how to deploy that - you might not have the infrastructure yet, but the thought experiment at least resolves some of the uncertainty of the pure PoC approach, and the assumption that production deployment must be costly
This thinking shows how you get on the ladder on the previous slide
So in a few minutes we will be deep diving into one of the practices that help us be agile - continuous delivery.
Such practices solve particular problems, and I think we can better appreciate the value of the practices if we know what kinds of problems / pain points it can solve
So i want to take a few minutes to do a short pain points canvassing.
(read the room, and consider doing an interactive canvassing exercise if the audience is ready)
After pain points canvas slide - so is there a way out of this pain?
3 things
Deployment pipelines
Data processing pipelines
Monitoring
Some of these problems that we face in the ML space are solved problems in the software engineering space.
Repeat the game 3x
Moral of the game: We deliver value faster if we
Continuously integrate/push small chunks of code
Reduce wait time between team members (data scientists, devs, QA, Ops, etc)
Pushing code is easy. But ensuring quality, and that the code delivers value for users, is not a given. For that, we need the CD pipeline.
I’m gonna talk about the key concepts in CD pipeline
Continuous delivery pipeline. It gives us fast feedback
30 seconds - quick overview of this.
The model goes through different stages
Each of them solves a different problem, which we’ll talk about next
Generalizable approach: we can see it working for classifiers, regression models, deep learning models, NLP models, etc.
Tracer bullet - build this pipeline as early as possible in your project, so that you can put models in production as soon as possible, and every subsequent deploy is as simple as a push and a click
Last point - as the artifact goes down this pipeline, our confidence in it increases
Share the litmus test of continuous delivery: when you can deploy from a beach
3 things
Deployment pipelines
Data processing pipelines
Monitoring