One of the hardest challenges data teams face today is selecting which tools to use in their workflow. Marketing messages are vague, and you continuously hear of new buzzwords you “just have to have in your stack”. There is a constant stream of new tools, open-source and proprietary alike, which makes buyer’s remorse especially common. I call it “MLOps Fatigue”.
This talk will not discuss a specific MLOps tool. Instead, it presents guidelines and mental models for thinking about the problems you and your team are facing, and for selecting the best tools for the task. We will review a few example problems, analyze them, and suggest open-source solutions for them. We will provide a mental framework that will help you tackle future problems you might face and extract the concrete value each tool provides.
What you’ll learn
You’ll learn which signals to watch for to notice you might have MLOps fatigue, how to define the challenge you’re facing, and which questions to ask in order to build a “decision tree” for selecting the best-suited tools for the task. We’ll also walk through a few examples of using this framework in practice on challenges involving data management and automating training/pipeline tasks.
2. What we’ll cover
1. First principles thinking and mental models – a brief introduction
2. MLOps buyer’s remorse
3. Assumptions about the problem
4. Assumptions about ideal solutions
5. Example 1: Data Versioning
6. Generalizing to a framework
– CONFIDENTIAL –
6. The ML world is changing
[Chart: percentage of teams by number of models in production (0, 1–10, 10–100, over 100), from a poll with over 2000 votes. Not just GAFAM: the next challenge is SCALING from 1 to 10 (or 10 to 100) models.]
7. MLOps Fatigue – Too many tools, manually synchronized
[Chart: 280 MLOps tools to choose from & integrate with.]
9. Minimal Assumptions about the PROBLEM
• You are not Google / Facebook
• MLOps is still in its early days
• The save-time / future-proof / production-ready tradeoff
10. Minimal Assumptions about the SOLUTION
• Problem vs. feature focus
• The hard part starts when the first model goes to production
• Data scientists != developers, and how this affects tooling
• Building on OSS makes sense for most cases
11. Example 1: Data Versioning
Step 1: Define the problem
“I want to version my data. My data is regularly changing, and I want to revert back to an older version for disaster recovery / governance.”
12. Step 1: Define the problem
• Revert in case of a bug
• Compare different versions
• Know which data is used where
• Add or modify data without breaking anything
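To make the “revert” and “compare versions” needs above concrete, here is a minimal sketch of file-level data versioning: snapshot a directory as a manifest of content hashes, then diff two manifests to see what changed. This is an illustration of the idea, not any real tool’s API; the function names are made up for the example.

```python
# Minimal sketch of data versioning via content hashing (illustrative only).
import hashlib
from pathlib import Path


def snapshot(data_dir):
    """Map each file's relative path to the SHA-256 of its contents."""
    manifest = {}
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(data_dir))] = digest
    return manifest


def diff(old, new):
    """Compare two manifests: which files were added, removed, or modified."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    modified = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return added, removed, modified
```

Storing each snapshot’s manifest (e.g. in git) is enough to detect exactly which files a revert would need to restore; real tools add storage and retrieval of the file contents on top of this idea.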
13. Step 1: Define the problem
• Do you actually suffer from “all of the above”?
• Prioritizing is important: separate the must-haves from the nice-to-haves
14. Example 1: Data Versioning
Step 2: Define the problem parameters
• The type of data you work with
• The type of data changes you expect
• What are the organizational constraints?
• Who am I?
15. Step 2: Define the problem parameters
• Flexibility to handle anything is tempting, but answering each question differently will lead to very different tooling, so being specific is important.
• Organizational constraints are especially critical, since they are often the largest limitation on which tools you can use. This also ties into modularity. E.g. does your org only work with Azure cloud tools?
• This can also be the step where we define a “user story” or workflow that includes this problem – e.g. are we going to version the DB directly, or just the outputs of our queries?
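As a sketch of the “version the query outputs, not the DB” user story, the snippet below runs a query and writes its result to a timestamped CSV snapshot. The query, table, and file layout are hypothetical placeholders, assuming a SQLite-style connection; the point is only to show what versioning a query output (rather than the database itself) could look like.

```python
# Illustrative "user story": snapshot the output of the query a model
# trains on, instead of versioning the database directly.
import csv
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def snapshot_query(conn, query, out_dir):
    """Run a query and write its result to a timestamped CSV snapshot."""
    cur = conn.execute(query)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_path = Path(out_dir) / f"training_set_{stamp}.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([col[0] for col in cur.description])  # header row
        writer.writerows(cur.fetchall())
    return out_path
```

Each snapshot is an immutable file, so “revert” becomes “point training back at an older file”, which is a much smaller problem than versioning a live database.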
17. Step 3: Google the problem
• Specifically, budget a reasonable amount of time (at least 2–3 hours) to research existing solutions.
• Now that you’ve defined the problem, and not just features, search for that problem (as well as experimenting with the problem parameters). This will surface more tools, each prioritizing different problem aspects.
• Build out an info page so that other people in the org can review it and add input.
• You will probably learn that you were searching for the wrong keywords.
• Read blogs and forum posts, see what TERMS people are using, and search again.
• Ask friends, and use Reddit as a tool to discover keywords – describe your problem and people will recommend the tools and categories you need.
18. Step 3: Google the problem
• Reddit example
• Googling examples
• Example of a tool research output
• Recommended blogs
19. Example 1: Data Versioning
Step 4: Evaluate solutions
• Pre-technical evaluation
• Operating principles
• “Hello World”
• Kick the tires – mechanically
20. Step 4: Evaluate solutions
• Is there a hosted solution?
• How much does it cost?
• If I go for a hosted solution, how easy will it be to bring it in-house in the future, or customize it to my needs?
• How easy is it to get out of these tools if they prove less useful?
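One lightweight way to keep answers to questions like these comparable across tools is a weighted scoring matrix: weight must-have criteria heavily and nice-to-haves lightly, then rank candidates by weighted sum. The tool names, criteria, weights, and scores below are all hypothetical placeholders, not real evaluations.

```python
# Weighted scoring matrix for comparing candidate tools (hypothetical data).
# Must-have criteria get high weights, nice-to-haves get low weights.
WEIGHTS = {"hosted_option": 3, "cost": 3, "exit_path": 2, "customizable": 1}


def score(tool_scores, weights=WEIGHTS):
    """Weighted sum of per-criterion scores (each scored 0-5)."""
    return sum(weights[c] * s for c, s in tool_scores.items())


candidates = {
    "tool_a": {"hosted_option": 4, "cost": 2, "exit_path": 5, "customizable": 3},
    "tool_b": {"hosted_option": 5, "cost": 4, "exit_path": 2, "customizable": 2},
}

ranked = sorted(candidates, key=lambda t: score(candidates[t]), reverse=True)
```

The numbers matter less than the discipline: writing the weights down forces the must-have vs. nice-to-have prioritization from step 1 into the evaluation.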
21. Step 4: Evaluate solutions
• Comparing 2 data versioning tools from a “face value” perspective
• Looking at the operating principles of DVC
• Get started tutorial
• Try to add a dataset with 10K images
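“Kick the tires – mechanically” can be as simple as generating a synthetic dataset at a realistic scale and timing the operation you care about. The sketch below creates N dummy files and times a plain content-hash pass as a baseline; in a real evaluation you would point the candidate tool’s own add/commit step at the same directory. File names and sizes are arbitrary choices for the example.

```python
# Stress-test sketch: generate a synthetic "image" dataset and time a
# baseline hashing pass over it (stand-in for a tool's add/commit step).
import hashlib
import time
from pathlib import Path


def make_dataset(root, n_files=10_000, size=256):
    """Create n_files dummy files of `size` bytes each under root."""
    root = Path(root)
    for i in range(n_files):
        (root / f"img_{i:05d}.bin").write_bytes(bytes([i % 256]) * size)
    return root


def time_hash_pass(root):
    """Return seconds taken to hash every file in root once."""
    start = time.perf_counter()
    for path in Path(root).iterdir():
        hashlib.sha256(path.read_bytes()).hexdigest()
    return time.perf_counter() - start
```

Running the candidate tool against the same 10K-file directory, and comparing its wall-clock time to this baseline, quickly shows whether its per-file overhead will hurt at your scale.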
22. Example 1: Data Versioning
Step 5: Integrate
• Start simple – 1 project, 1 user
• Define criteria for success, or don’t
• Review and extrapolate
23. The 5 step process
1. Define the problem
2. Define the problem parameters
3. Google the problem
4. Evaluate solutions
5. Integrate