Over the next few years, every company will need a strategy for using artificial intelligence and machine learning to stay relevant and outpace competitors. That means hiring talented data scientists as well as the DevOps and data engineers who can put their models into production. Today that combination of talent is hard to find, but a focus on retraining and productivity tools can increase a small team’s impact on business ROI by over 10x. In this technical talk, we discuss how enterprises can better prepare their employees to deploy artificial intelligence and machine learning into production by applying the same techniques used in software engineering to add provenance, reliability, and efficiency to these processes. Specifically, we describe the benefits of adding provenance, including reliable builds and deployments, A/B testing, continuous deployment, and automation, and show how they can decrease the time to business ROI by over 10x.
3. @AnandSampat +
Talk Outline
1. Rise of AI / ML in the Enterprise
2. Unique challenges of AI
3. Provenance, Reliability, and Efficiency
4. How Datmo bridges the gap
4. @AnandSampat +
Demand for Talent is Increasing
Today
Data Scientists: 48k
https://www.pwc.com/us/en/library/data-science-and-analytics.html
Tomorrow
Data Engineers: 558k
http://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world
8. @TheNickWalsh +
Am I a QoD?
11. @AnandSampat +
It’s time to talk about MLOps
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
12. @AnandSampat +
MLOps: The Elephant in the Room
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
13. @AnandSampat +
“ML systems have a special capacity for incurring
technical debt, because they have all of the
maintenance problems of traditional code plus an
additional set of ML-specific issues. This debt may be
difficult to detect because it exists at the system level.”
-- Google (Sculley et al., 2015)
14. @AnandSampat +
“Typical methods for paying down code level
technical debt are not sufficient to address
ML-specific technical debt at the system level.”
-- Google (Sculley et al., 2015)
21. @TheNickWalsh +
Provenance:
Model and Workflow
Reproducibility
22. @TheNickWalsh +
Problem: Model
reproduction is tough
- Configurations & Metrics
- Traditional SCM tools (like Git) do a
good job of tracking changes
between code snippets but
overlook machine learning
parameters and scoring metrics
- Dependencies
- Hardware Configuration
- GPU Setup/CUDA
- OS-level settings/programs
- How can you install packages
without a package manager?
23. @TheNickWalsh +
Solution: Tracking and
Containerization
- Track your configurations
and metrics in one place
- With containers, you can
write build files that enable
you to enumerate
everything required to
reproduce a given system
state
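As a minimal sketch of "configurations and metrics in one place", the snippet below appends each training run to a single JSON Lines file and reads it back. The function names (`record_run`, `load_runs`, `best_run`) and the file layout are illustrative assumptions, not part of any tool named in this talk.

```python
import json
import time
from pathlib import Path


def record_run(log_path, config, metrics):
    """Append one run (hyperparameters plus scores) to a JSON Lines file."""
    entry = {"timestamp": time.time(), "config": config, "metrics": metrics}
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def load_runs(log_path):
    """Read every recorded run back as a list of dicts."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(runs, metric):
    """Return the run with the highest value for the given metric."""
    return max(runs, key=lambda run: run["metrics"][metric])
```

Because every entry carries both the configuration and the resulting scores, the question "which parameters produced this number?" can be answered from one file instead of from memory.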
24. @TheNickWalsh +
Example 1: “Offline” Logging (bad)
25. @TheNickWalsh +
Example 2: Online Logging with Visualizable
Metrics (good)
Unfortunately, TensorBoard
is only available
for TensorFlow!
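A framework-independent illustration of online logging, under stated assumptions: the hypothetical `MetricLogger` below writes one CSV row per training step and flushes immediately, so any plotting tool can read the file while training is still running. It is a sketch of the idea, not a replacement for TensorBoard.

```python
import csv
from pathlib import Path


class MetricLogger:
    """Writes one CSV row per training step and flushes immediately."""

    def __init__(self, path, fields):
        self.fields = ["step"] + list(fields)
        self._file = Path(path).open("w", newline="", encoding="utf-8")
        self._writer = csv.DictWriter(self._file, fieldnames=self.fields)
        self._writer.writeheader()
        self._file.flush()

    def log(self, step, **values):
        """Record the metric values observed at a given step."""
        self._writer.writerow({"step": step, **values})
        self._file.flush()  # make the row visible to readers right away

    def close(self):
        self._file.close()
```

A training loop would call `log(step, loss=..., accuracy=...)` once per step and `close()` at the end.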
26. @TheNickWalsh +
Example 3: Docker and Dockerfiles
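One hedged way to make this concrete: the sketch below pins the packages installed in the current Python environment and renders Dockerfile text around them. The helper names, the `python:<version>-slim` base image tag, and the `train.py` entrypoint are assumptions for illustration; a real project would also need to handle GPU drivers and OS-level packages, which this does not cover.

```python
import platform
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            pins.add(f"{name}=={dist.version}")
    return sorted(pins)


def render_dockerfile(requirements, entrypoint="train.py"):
    """Return (dockerfile_text, requirements_text) for the running Python version."""
    major, minor, _ = platform.python_version_tuple()
    lines = [
        f"FROM python:{major}.{minor}-slim",  # assumed base image tag
        "WORKDIR /app",
        "COPY requirements.txt .",
        "RUN pip install --no-cache-dir -r requirements.txt",
        "COPY . .",
        f'CMD ["python", "{entrypoint}"]',  # hypothetical entrypoint script
    ]
    return "\n".join(lines) + "\n", "\n".join(requirements) + "\n"
```

Writing the two returned strings to `Dockerfile` and `requirements.txt` gives a build file that enumerates the Python-level dependencies explicitly.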
27. @TheNickWalsh +
Reliability:
Peace of Mind
28. @TheNickWalsh +
Problem: Builds and
deployments don’t account
for machine learning
- Traditional software build tools
overlook model scoring and
metrics and thus do not check
builds against these metrics
- Traditional software
deployment tools don’t take into
account the nuances of
machine learning models
29. @TheNickWalsh +
Solution:
Builds and Deployment
with machine learning
metrics
- Set scoring thresholds for
validation metrics of
models for builds
- Deploy your machine
learning models as microservices
which can be updated on
a different schedule from
the main application.
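A scoring threshold for builds can be sketched as a simple gate that a CI job runs after validation. The function name `check_build` and the threshold format are assumptions made for this example, not an existing tool's API.

```python
def check_build(metrics, thresholds):
    """Return a list of failures; an empty list means the build passes.

    thresholds maps a metric name to (kind, limit), where kind is
    "min" (metric must be >= limit) or "max" (metric must be <= limit).
    """
    failures = []
    for name, (kind, limit) in thresholds.items():
        if name not in metrics:
            failures.append(f"{name}: missing")
        elif kind == "min" and metrics[name] < limit:
            failures.append(f"{name}: {metrics[name]} < {limit}")
        elif kind == "max" and metrics[name] > limit:
            failures.append(f"{name}: {metrics[name]} > {limit}")
    return failures
```

A build script could exit with a non-zero status whenever the returned list is non-empty, so a model that regresses on its validation metrics never reaches deployment.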
30. @TheNickWalsh +
Efficiency:
Reduce the time to success
31. @TheNickWalsh +
Problem: Disjoint tools
slow down iteration
- Software tools are not built to
iterate on machine learning
algorithms
- Machine learning does not
follow the same build schedule
as your main application
32. @TheNickWalsh +
Solution: A/B testing,
continuous deployment, and
automation
- A/B testing models enables
quick performance
comparisons to identify the
best parameters
- Continuous deployment
ensures that deployed
models work as expected
- Automation enables
triggers to create actions
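A minimal sketch of A/B testing two deployed models, assuming requests carry a stable user identifier: each user is routed deterministically by hashing the identifier, and outcomes are tallied per variant. `assign_variant`, `ABResults`, and the salt value are hypothetical names for this example, and no statistical significance test is included.

```python
import hashlib
from collections import defaultdict


def assign_variant(user_id, variants, salt="experiment-1"):
    """Deterministically map a user to one variant via a stable hash."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


class ABResults:
    """Accumulates a success rate per variant."""

    def __init__(self):
        self.trials = defaultdict(int)
        self.successes = defaultdict(int)

    def record(self, variant, success):
        """Count one trial for the variant and whether it succeeded."""
        self.trials[variant] += 1
        self.successes[variant] += int(bool(success))

    def rates(self):
        """Return the observed success rate for each variant seen so far."""
        return {v: self.successes[v] / self.trials[v] for v in self.trials}
```

Hashing rather than random choice keeps a given user on the same model across requests, which makes the per-variant comparison easier to interpret.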
34. @AnandSampat +
What is Datmo?
Datmo is a unified platform for ML, AI, and Data
Science developers. Datmo’s free Community
Edition enables model version control, easy
environment handling, and reproducing results
through the power of snapshots. Datmo
Enterprise leverages snapshots to enable
reliable builds, quick deployments, efficient
A/B testing, and continuous delivery of analytics
workflows and models.
35. @TheNickWalsh +
Provenance: Datmo CE
36. @TheNickWalsh +
Datmo CE
- Snapshots - Model versions which combine code, files,
environments, configurations, and performance metrics
- Runnable Anywhere - The tool can be run on any server to
enable you to move your models freely between servers and
share them with colleagues
38. @AnandSampat +
Why are they important?
Git Commits: Code, Files*
Datmo Snapshots: Code, Files*, Environment, Configuration, Metrics
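To make the comparison above concrete, here is a hypothetical data structure for a snapshot that bundles a code commit with the items a Git commit does not capture. This is an illustrative sketch only and does not reflect Datmo's actual data model or API.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """A model version: code reference plus environment, config, and metrics."""

    code_commit: str  # e.g. a Git commit hash
    file_hashes: dict = field(default_factory=dict)  # path -> content hash
    environment: dict = field(default_factory=dict)  # e.g. image, packages
    config: dict = field(default_factory=dict)  # hyperparameters
    metrics: dict = field(default_factory=dict)  # validation scores

    def snapshot_id(self):
        """Derive a stable identifier from the snapshot's full contents."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()
```

Two snapshots with the same code but different hyperparameters get different identifiers, which is exactly the distinction a plain commit hash cannot express.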
39. @TheNickWalsh +
GUI to View Snapshots
40. @AnandSampat +
How will it help?
Datmo leverages containers to quickly
spin up perfectly reproducible
developer environments. It tracks this
environment, along with model
metadata inside of snapshots.
41. @TheNickWalsh +
Reliability: Datmo EE
42. @TheNickWalsh +
Datmo EE
(Builds and Deployment)
- Builds - Model versions captured as snapshots can be built by adding
validation tests that track your performance metrics
- Deployment - Models can be pushed as microservices so you can
update them on a different schedule from the rest of your main
application
43. @TheNickWalsh +
Deployment:
Containerization
44. @TheNickWalsh +
Efficiency: Datmo EE
45. @TheNickWalsh +
Datmo EE
(A/B Testing, Continuous Deployment, Automation)
- A/B Testing - enables you to deploy a few microservices in
parallel, which lets you compare algorithms
- Continuous Deployment - enables you to update your builds
with tests that ensure your validation metrics meet your threshold
- Automation - Create triggers and actions to retrain your models
with new data, update your models frequently, or ensure you are
always in the know when models aren’t working.
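The trigger-and-action idea can be sketched as a small rule registry: each rule pairs a condition on an incoming event with an action to run. The `Automation` class and the event format are assumptions for illustration and are not Datmo's interface.

```python
class Automation:
    """Runs registered actions whenever their trigger condition holds."""

    def __init__(self):
        self._rules = []

    def on(self, condition, action):
        """Register an action to run when condition(event) is true."""
        self._rules.append((condition, action))

    def handle(self, event):
        """Evaluate every rule against the event and return action results."""
        return [action(event) for condition, action in self._rules if condition(event)]
```

For example, registering `auto.on(lambda e: e["accuracy"] < 0.9, retrain)` with a hypothetical `retrain` function would start retraining whenever a monitoring event reports accuracy below 0.9.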
46. @AnandSampat +
Datmo CE + EE
Make ML Ops and workflows
manageable and simple, not
completely abstracted away.
Reduce the amount of glue code
so that people can have more
robust pipelines.
47. @AnandSampat +
Key Takeaways
1. AI applications are growing day by day. These
technologies require new capabilities.
2. Provenance, Reliability, and Efficiency are required
for any production system, and ML is no different.
3. Datmo CE and EE provide full provenance, reliability,
and efficiency through snapshots, which enable builds,
deployments, A/B testing, and continuous delivery.