One of the hardest challenges data teams face today is selecting which tools to use in their workflow. Marketing messages are vague, and you continuously hear of new buzzwords you “just have to have in your stack”. There is a constant stream of new tools, open-source and proprietary alike, which makes buyer’s remorse especially common. I call it “MLOps Fatigue”.
This talk will not discuss a specific MLOps tool. Instead, it presents guidelines and mental models for thinking about the problems you and your team are facing, and for selecting the best tools for the task. We will review a few example problems, analyze them, and suggest open-source solutions for them. We will provide a mental framework that will help you tackle future problems you might face and extract the concrete value each tool provides.
What you’ll learn
You’ll learn which signals to watch for to notice you might have MLOps fatigue, how to define the challenge you’re facing, and which questions to ask in order to build a “decision tree” for selecting the best-suited tools for the task. We’ll also walk through a few examples of using this framework in practice on challenges involving data management and automating training/pipeline tasks.
2. What we’ll cover
1. First principles thinking and mental models – a brief introduction
2. MLOps buyer’s remorse
3. Assumptions about the problem
4. Assumptions about ideal solutions
5. Example 1: Data Versioning
6. Generalizing to a framework
– CONFIDENTIAL –
6. The ML world is changing
[Chart: percentage of teams by number of models in production (0, 1–10, 10–100, over 100), from a poll with over 2000 votes. Not just GAFAM: the next challenge is SCALING from 1 to 10 (or 10 to 100) models.]
7. MLOps Fatigue – Too many tools, manually synchronized
[Chart: 280 MLOps tools to choose from & integrate with.]
9. Minimal Assumptions about the PROBLEM
• You are not Google / Facebook
• MLOps is still in its early days
• The save-time / future-proof / production-ready tradeoff
10. Minimal Assumptions about the SOLUTION
• Problem vs. feature focus
• The hard part starts when the first model goes to production
• Data scientists != developers, and how this affects tooling
• Building on OSS makes sense for most cases
11. Example 1: Data Versioning
Step 1: Define the problem
“I want to version my data. My data is regularly changing, and I want to revert back to an older version for disaster recovery / governance.”
12. Step 1: Define the problem
• Revert in case of a bug
• Compare different versions
• Know which data is used where
• Add or modify data without breaking anything
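To make the “revert” and “compare versions” needs above concrete, here is a minimal sketch of file-level data versioning: snapshot a directory as a manifest of content hashes, then diff two manifests to see what changed. This is an illustration of the idea, not any real tool’s API; the function names are made up for the example.

```python
# Minimal sketch of data versioning via content hashing (illustrative only).
import hashlib
from pathlib import Path


def snapshot(data_dir):
    """Map each file's relative path to the SHA-256 of its contents."""
    manifest = {}
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(data_dir))] = digest
    return manifest


def diff(old, new):
    """Compare two manifests: which files were added, removed, or modified."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, modified
```

Storing each snapshot’s manifest (e.g. in git) is enough to detect exactly which files a revert would need to restore; real tools add storage and retrieval of the file contents on top of this idea.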
13. Step 1: Define the problem
• Do you actually suffer from “all of the above”?
• Prioritizing is important: separate the must-haves from the nice-to-haves
14. Example 1: Data Versioning
Step 2: Define the problem parameters
• The type of data you work with
• The type of data changes you expect
• What are the organizational constraints?
• Who am I?
15. Step 2: Define the problem parameters
• Flexibility to handle anything is tempting, but answering each question differently will lead to very different tooling, so being specific is important.
• Organizational constraints are especially critical, since they are often the largest limitation on which tools you can use. This also ties into modularity. E.g. does your org only work with Azure cloud tools?
• This can also be the step where we define a “user story” or workflow that includes this problem – e.g. are we going to version the DB directly, or just the outputs of our queries?
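As a sketch of the “version the query outputs, not the DB” user story, the snippet below runs a query and writes its result to a timestamped CSV snapshot. The query, table, and file layout are hypothetical placeholders, assuming a SQLite-style connection; the point is only to show what versioning a query output (rather than the database itself) could look like.

```python
# Illustrative "user story": snapshot the output of the query a model
# trains on, instead of versioning the database directly.
import csv
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def snapshot_query(conn, query, out_dir):
    """Run a query and write its result to a timestamped CSV snapshot."""
    cur = conn.execute(query)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_path = Path(out_dir) / f"training_set_{stamp}.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur.fetchall())
    return out_path
```

Each snapshot is an immutable file, so “revert” becomes “point training back at an older file”, which is a much smaller problem than versioning a live database.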
17. Step 3: Google the problem
• Specifically, budget a reasonable amount of time (at least 2–3 hours) to research existing solutions.
• Now that you’ve defined the problem, and not just features, search for that problem (as well as experimenting with the problem parameters). This will surface more tools, each prioritizing different problem aspects.
• Build out an info page so that other people in the org can review it and add input.
• You will probably learn that you were searching for the wrong keywords.
• Read blogs and forum posts, see what TERMS people are using, and search again.
• Ask friends, and use Reddit as a tool to discover keywords – describe your problem and people will recommend the tools and categories you need.
18. Step 3: Google the problem
• Reddit example
• Googling examples
• Example of a tool research output
• Recommended blogs
19. Example 1: Data Versioning
Step 4: Evaluate solutions
• Pre-technical evaluation
• Operating principles
• “Hello World”
• Kick the tires – mechanically
20. Step 4: Evaluate solutions
• Is there a hosted solution?
• How much does it cost?
• If I go for a hosted solution, how easy will it be to bring it in-house in the future, or customize it to my needs?
• How easy is it to get out of these tools if they prove less useful?
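One lightweight way to keep answers to questions like these comparable across tools is a weighted scoring matrix: weight must-have criteria heavily and nice-to-haves lightly, then rank candidates by weighted sum. The tool names, criteria, weights, and scores below are all hypothetical placeholders, not real evaluations.

```python
# Weighted scoring matrix for comparing candidate tools (hypothetical data).
# Must-have criteria get high weights, nice-to-haves get low weights.
WEIGHTS = {"hosted_option": 3, "cost": 3, "exit_path": 2, "customizable": 1}


def score(tool_scores, weights=WEIGHTS):
    """Weighted sum of per-criterion scores (each scored 0-5)."""
    return sum(weights[c] * s for c, s in tool_scores.items())


candidates = {
    "tool_a": {"hosted_option": 4, "cost": 2, "exit_path": 5, "customizable": 3},
    "tool_b": {"hosted_option": 5, "cost": 4, "exit_path": 2, "customizable": 2},
}

ranked = sorted(candidates, key=lambda t: score(candidates[t]), reverse=True)
```

The numbers matter less than the discipline: writing the weights down forces the must-have vs. nice-to-have prioritization from step 1 into the evaluation.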
21. Step 4: Evaluate solutions
• Comparing 2 data versioning tools from a “face value” perspective
• Looking at the operating principles of DVC
• Get started tutorial
• Try to add a dataset with 10K images
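“Kick the tires – mechanically” can be as simple as generating a synthetic dataset at a realistic scale and timing the operation you care about. The sketch below creates N dummy files and times a plain content-hash pass as a baseline; in a real evaluation you would point the candidate tool’s own add/commit step at the same directory. File names and sizes are arbitrary choices for the example.

```python
# Stress-test sketch: generate a synthetic "image" dataset and time a
# baseline hashing pass over it (stand-in for a tool's add/commit step).
import hashlib
import time
from pathlib import Path


def make_dataset(root, n_files=10_000, size=256):
    """Create n_files dummy files of `size` bytes each under root."""
    root = Path(root)
    for i in range(n_files):
        (root / f"img_{i:05d}.bin").write_bytes(bytes([i % 256]) * size)
    return root


def time_hash_pass(root):
    """Return seconds taken to hash every file in root once."""
    start = time.perf_counter()
    for path in Path(root).iterdir():
        hashlib.sha256(path.read_bytes()).hexdigest()
    return time.perf_counter() - start
```

Running the candidate tool against the same 10K-file directory, and comparing its wall-clock time to this baseline, quickly shows whether its per-file overhead will hurt at your scale.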
22. Example 1: Data Versioning
Step 5: Integrate
• Start simple – 1 project, 1 user
• Define criteria for success, or don’t
• Review and extrapolate
23. The 5 step process
1. Define the problem
2. Define the problem parameters
3. Google the problem
4. Evaluate solutions
5. Integrate