@AnandSampat +
Provenance in Production-Grade
Machine Learning
Talk
Santa Clara Convention Center
Anand Sampat
CEO & Co-founder, Datmo
@AnandSampat
Talk Outline
1. Rise of AI / ML in the Enterprise
2. Unique challenges of AI
3. Provenance, Reliability, and Efficiency
4. How Datmo bridges the gap
Demand for Talent is Increasing
Today
Data Scientists: 48k
https://www.pwc.com/us/en/library/data-science-and-analytics.html
Tomorrow
Data Engineers: 558k
http://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world
Supply is Limited, but it’s growing
https://github.com
Talk Outline
1. Rise of AI / ML in the Enterprise
2. Unique challenges of AI
3. Provenance, Reliability, and Efficiency
4. How Datmo bridges the gap
QoDs == Quantitative Oriented Developers
Artificial Intelligence / Data Science / Machine Learning
@TheNickWalsh +
@AnandSampat +@TheNickWalsh
Am I a QoD?
https://blog.datmo.io/demystifying-the-ml-ai-and-data-science-development-ecosystem-part-1-build-76c6d4911d07
https://blog.datmo.io/demystifying-the-ml-ai-and-data-science-development-ecosystem-part-1-build-76c6d4911d07
+ Deployment!
+ Post-Deployment! (DevOps!)
It’s time to talk about MLOps
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
MLOps: The Elephant in the Room
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
“ML systems have a special capacity for incurring technical debt, because they have all of the maintenance problems of traditional code plus an additional set of ML-specific issues. This debt may be difficult to detect because it exists at the system level.”
Google (Sculley et al., 2015)
“Typical methods for paying down code-level technical debt are not sufficient to address ML-specific technical debt at the system level.”
Google (Sculley et al., 2015)
http://eng.uber.com/wp-content/uploads/2017/09/image8.png
Here’s where traditional tools fall short
https://eng.uber.com/michelangelo/
https://code.facebook.com/posts/1072626246134461/introducing-fblearner-flow-facebook-s-ai-backbone/
As for everyone else?
Talk Outline
1. Rise of AI / ML in the Enterprise
2. Unique challenges of AI
3. Provenance, Reliability, and Efficiency
4. How Datmo bridges the gap
Provenance:
Model and Workflow
Reproducibility
Problem: Model reproduction is tough
- Configurations & Metrics
- Traditional SCM tools (like Git) do a good job of tracking changes between code snippets but overlook machine learning parameters and scoring metrics
- Dependencies
- Hardware Configuration
- GPU Setup/CUDA
- OS-level settings/programs
- How can you install packages without a package manager?
Solution: Tracking and Containerization
- Track your configurations and metrics in one place
- With containers, you can write build files that enumerate everything required to reproduce a given system state
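A minimal sketch of "tracking configurations and metrics in one place", assuming a single SQLite file as the store. The table layout and the names `open_tracker`, `log_run`, and `best_run` are hypothetical illustrations, not Datmo's API; `code_version` would typically hold a Git commit hash.

```python
import json
import sqlite3


def open_tracker(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) one SQLite database that holds every run."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS runs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            code_version TEXT NOT NULL,  -- e.g. a git commit hash
            config TEXT NOT NULL,        -- JSON-encoded hyperparameters
            metrics TEXT NOT NULL        -- JSON-encoded evaluation results
        )
        """
    )
    return conn


def log_run(conn: sqlite3.Connection, code_version: str, config: dict, metrics: dict) -> int:
    """Store config and metrics in the same row; return the new run id."""
    cur = conn.execute(
        "INSERT INTO runs (code_version, config, metrics) VALUES (?, ?, ?)",
        (code_version, json.dumps(config, sort_keys=True), json.dumps(metrics, sort_keys=True)),
    )
    conn.commit()
    return cur.lastrowid


def best_run(conn: sqlite3.Connection, metric: str) -> dict:
    """Return the run with the highest value of `metric` (higher is better)."""
    rows = conn.execute("SELECT id, code_version, config, metrics FROM runs").fetchall()
    runs = [
        {"id": r[0], "code_version": r[1], "config": json.loads(r[2]), "metrics": json.loads(r[3])}
        for r in rows
    ]
    # Ignore runs that never recorded this metric.
    scored = [r for r in runs if metric in r["metrics"]]
    return max(scored, key=lambda r: r["metrics"][metric])
```

Because configuration and metrics live in the same row, the best result traces straight back to the hyperparameters and code version that produced it. `best_run` assumes higher is better and raises `ValueError` if no run recorded the metric.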
Example 1: “Offline” Logging (bad)
Example 2: Online Logging with Visualizable Metrics (good)
Unfortunately, TensorBoard is built primarily for TensorFlow!
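Online logging does not have to depend on one framework. A minimal, framework-agnostic sketch, assuming a JSON Lines file as the log format: each training step appends one line, so a dashboard or `tail -f` can follow progress while the run is still going. `JsonlMetricLogger` and `read_series` are hypothetical names.

```python
import json
import time
from pathlib import Path
from typing import Iterator


class JsonlMetricLogger:
    """Append one JSON object per logged step so metrics are visible while
    training is still running, whatever ML framework produced them."""

    def __init__(self, path: Path):
        self.path = path
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def log(self, step: int, **metrics: float) -> None:
        entry = {"step": step, "time": time.time(), **metrics}
        # Opening in append mode and closing after each line flushes the
        # update to disk, so a separate reader sees it right away.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")


def read_series(path: Path, metric: str) -> Iterator[tuple[int, float]]:
    """Yield (step, value) pairs for one metric, e.g. to plot a curve."""
    with path.open(encoding="utf-8") as f:
        for line in f:
            entry = json.loads(line)
            if metric in entry:
                yield entry["step"], entry[metric]
```

Usage: inside any training loop, call `logger.log(step, loss=loss, accuracy=acc)`; plotting code reads the same file with `read_series`. Requires Python 3.9+ for the `tuple[...]` annotation.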
Example 3: Docker and Dockerfiles
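Dockerfiles pin the OS, interpreter, and package versions a model needs. A rough Python-side complement, assuming the standard library's `importlib.metadata` and `platform` modules: record the same information at training time so it can be stored next to the model and diffed later. This does not replace a container image (it cannot capture system libraries or GPU drivers), and the function names are hypothetical.

```python
import platform
from importlib import metadata


def capture_environment() -> dict:
    """Collect interpreter, OS, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that lack a name
            packages[name.lower()] = dist.version
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.system(),
        "machine": platform.machine(),
        "packages": dict(sorted(packages.items())),
    }


def requirements_lines(env: dict) -> list[str]:
    """Render pinned `name==version` lines, e.g. for a requirements.txt."""
    return [f"{name}=={version}" for name, version in env["packages"].items()]


def diff_environments(old: dict, new: dict) -> dict:
    """Map each package whose version changed, appeared, or disappeared
    to its (old_version, new_version) pair; None means absent."""
    names = set(old["packages"]) | set(new["packages"])
    return {
        n: (old["packages"].get(n), new["packages"].get(n))
        for n in sorted(names)
        if old["packages"].get(n) != new["packages"].get(n)
    }
```

Diffing the environment saved with a model against the current one points directly at the dependency that drifted; the pinned lines can then feed the `pip install` step of a Dockerfile.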
Reliability:
Peace of Mind
Problem: Builds and deployments don’t account for machine learning
- Traditional software build tools overlook model scoring and metrics, and thus do not check builds against these metrics
- Traditional software deployment doesn’t take into account the nuances of machine learning models
Solution: Builds and deployment with machine learning metrics
- Set scoring thresholds on models’ validation metrics for builds
- Deploy your machine learning models as microservices, which can be updated on a different schedule from the main application
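A minimal sketch of a metric-threshold build gate, assuming the validation metrics are already computed and that every threshold is a minimum (a lower-is-better metric such as loss would need a maximum instead). The names `check_thresholds` and `build_gate` are hypothetical; a CI job could call `sys.exit(build_gate(metrics, thresholds))` so a model that underperforms fails the build.

```python
import sys


def check_thresholds(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return human-readable failures; an empty list means the build passes.

    A metric missing from `metrics` counts as a failure, so a broken
    evaluation step cannot silently pass the build.
    """
    failures = []
    for name, minimum in sorted(thresholds.items()):
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.4f} < required {minimum:.4f}")
    return failures


def build_gate(metrics: dict[str, float], thresholds: dict[str, float]) -> int:
    """CI-style entry point: report failures on stderr, return an exit code."""
    failures = check_thresholds(metrics, thresholds)
    for line in failures:
        print(f"FAIL {line}", file=sys.stderr)
    return 1 if failures else 0
```

Keeping the thresholds in version control next to the model code means a change to the bar is reviewed like any other change.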
Efficiency:
Reduce the time to success
Problem: Disjoint tools slow down iteration
- Software tools are not built to iterate on machine learning algorithms
- Machine learning does not follow the same build schedule as your main application
Solution: A/B testing, continuous deployment, and automation
- A/B testing models enables quick performance comparisons to identify the best parameters
- Continuous deployment, with tests on validation metrics, ensures that deployed models work as expected
- Automation lets triggers (such as new data arriving or a model underperforming) kick off actions such as retraining or alerts
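A minimal sketch of the traffic-splitting side of A/B testing a model, assuming users have stable string ids. Hashing the id with an experiment-specific salt keeps each user on the same variant, and `summarize` compares success rates per variant. The names and the salt value are hypothetical, and a real comparison would also need a significance test before declaring a winner.

```python
import hashlib
from collections import defaultdict


def assign_variant(user_id: str, split: float = 0.5, salt: str = "exp-1") -> str:
    """Deterministically route a user to model "A" or "B".

    Hashing salt + user_id gives a stable, roughly uniform bucket in [0, 1),
    so the same user always hits the same model for the life of the test.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Keep the top 53 bits so the float division is exact and stays below 1.0.
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return "A" if bucket < split else "B"


def summarize(outcomes: list[tuple[str, bool]]) -> dict[str, float]:
    """Success rate per variant from (variant, success) observations."""
    counts = defaultdict(lambda: [0, 0])  # variant -> [successes, total]
    for variant, success in outcomes:
        counts[variant][0] += int(success)
        counts[variant][1] += 1
    return {v: s / n for v, (s, n) in sorted(counts.items())}
```

Changing the salt reshuffles users for a new experiment, and `split` can start small (for example 0.1) to expose a candidate model to only a fraction of traffic.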
Talk Outline
1. Rise of AI / ML in the Enterprise
2. Unique challenges of AI
3. Provenance, Reliability, and Efficiency
4. How Datmo bridges the gap
What is Datmo?
Datmo is a unified platform for ML, AI, and Data Science developers. Datmo’s free Community Edition enables model version control, easy environment handling, and reproducing results through the power of snapshots. Datmo Enterprise leverages snapshots to enable reliable builds, quick deployments, efficient A/B testing, and continuous delivery of analytics workflows and models.
Provenance: Datmo CE
Datmo CE
- Snapshots - Model versions that combine code, files, environments, configurations, and performance metrics
- Runnable Anywhere - The tool runs on any server, so you can move your models freely between servers and share them with colleagues
What are Datmo Snapshots?
Code
Environment
Configuration
Files*
Metrics
Why are they important?
Git Commits: Code, Files*
Datmo Snapshots: Code, Files*, Environment, Configuration, Metrics
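To make the difference concrete, here is a hypothetical snapshot record, assuming each component can be serialized to JSON. It illustrates the idea and is not Datmo's actual snapshot format: the id is a hash over code version, environment, configuration, file hashes, and metrics, so changing any one of them (for example, only the Python version) produces a new snapshot, whereas a Git commit only sees changes to tracked files.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Snapshot:
    """Hypothetical record of everything needed to reproduce a model version."""
    code_commit: str   # source version, e.g. a git commit hash
    environment: dict  # interpreter, OS, pinned packages
    config: dict       # hyperparameters and settings
    file_hashes: dict  # path -> sha256 of data or weight files
    metrics: dict = field(default_factory=dict)  # evaluation results

    def snapshot_id(self) -> str:
        """Content hash: any change to any component yields a new id."""
        # sort_keys makes the JSON canonical, including nested dicts.
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def sha256_bytes(data: bytes) -> str:
    """Fingerprint a file's contents without storing the file itself."""
    return hashlib.sha256(data).hexdigest()
```

Storing file hashes rather than the files keeps the record small while still detecting a changed dataset or weights file.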
GUI to View Snapshots
How will it help?
Datmo leverages containers to quickly
spin up perfectly reproducible
developer environments. It tracks this
environment, along with model
metadata inside of snapshots.
Reliability: Datmo EE
Datmo EE
(Builds and Deployment)
- Builds - Model versions captured as snapshots can be built with validation tests that track your performance metrics
- Deployment - Models can be pushed as microservices, so you can update them on a different schedule from the rest of your main application
Deployment:
Containerization
Efficiency: Datmo EE
Datmo EE
(A/B Testing, Continuous Deployment, Automation)
- A/B Testing - Deploy a few microservices in parallel so you can compare algorithms
- Continuous Deployment - Update your builds with tests that ensure your validation metrics meet your thresholds
- Automation - Create triggers and actions to retrain your models with new data, update your models frequently, or stay informed when models aren’t working
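A minimal sketch of the trigger/action idea, assuming events arrive as plain dictionaries (for example, a nightly job reporting new data volume and current validation accuracy). The `Rule` and `Automation` classes, the 1,000-row and 0.90 accuracy thresholds, and the `retrain`/`alert` actions are hypothetical placeholders; real actions would launch a retraining job or send a notification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # trigger: inspects the latest event
    action: Callable[[dict], str]      # action: runs when the trigger fires


class Automation:
    """Evaluate every rule against each incoming event and run matching actions."""

    def __init__(self) -> None:
        self.rules: list[Rule] = []

    def add(self, rule: Rule) -> None:
        self.rules.append(rule)

    def handle(self, event: dict) -> list[str]:
        return [rule.action(event) for rule in self.rules if rule.condition(event)]


# Placeholder actions that only describe what they would do.
def retrain(event: dict) -> str:
    return f"retrain on {event['new_rows']} new rows"


def alert(event: dict) -> str:
    return f"alert: accuracy {event['accuracy']:.2f} below 0.90"


auto = Automation()
auto.add(Rule("new-data", lambda e: e.get("new_rows", 0) >= 1000, retrain))
auto.add(Rule("degraded", lambda e: e.get("accuracy", 1.0) < 0.90, alert))
```

Keeping triggers and actions as separate callables means the same "model degraded" trigger can later drive a rollback instead of an alert without touching the detection logic.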
Datmo CE + EE
Make MLOps and workflows manageable and simple, not completely abstracted away.
Reduce the amount of glue code so that people can build more robust pipelines.
Key Takeaways
1. AI applications are growing day by day, and these technologies require new capabilities.
2. Provenance, reliability, and efficiency are required for any production system; ML is no different.
3. Datmo CE and EE provide full provenance, reliability, and efficiency through snapshots, which enable builds, deployments, A/B testing, and continuous delivery.
Going Forward
2015 NIPS Paper from Google
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Learn More about Us at our Blog
https://blog.datmo.com/
Check out our Product Pages
https://datmo.com/enterprise
https://datmo.com/community
Full Slides Available at:
http://bit.ly/global-ai-conf-provenance
Thank You!

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...ranjana rawat
 

Último (20)

(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 

Provenance in Production-Grade Machine Learning

  • 1. @AnandSampat + Provenance in Production-Grade Machine Learning Talk Santa Clara Convention Center
  • 2. Anand Sampat, CEO & Co-founder, Datmo (@AnandSampat)
  • 3. Talk Outline: 1. Rise of AI / ML in the Enterprise 2. Unique challenges of AI 3. Provenance, Reliability, and Efficiency 4. How Datmo bridges the gap
  • 4. Demand for Talent is Increasing. Today: Data Scientists: 48k (https://www.pwc.com/us/en/library/data-science-and-analytics.html). Tomorrow: Data Engineers: 558k (http://www.mckinsey.com/business-functions/mckinsey-analytics/our-insights/the-age-of-analytics-competing-in-a-data-driven-world)
  • 5. Supply is Limited, but it’s growing (https://github.com)
  • 6. Talk Outline: 1. Rise of AI / ML in the Enterprise 2. Unique challenges of AI 3. Provenance, Reliability, and Efficiency 4. How Datmo bridges the gap
  • 7. QoDs == Quantitative Oriented Developers: Artificial Intelligence, Data Science, Machine Learning
  • 8. Am I a QoD?
  • 11. It’s time to talk about MLOps (https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf)
  • 12. MLOps: The Elephant in the Room (https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf)
  • 13. “ML systems have a special capacity for incurring technical debt, because they have all of the maintenance problems of traditional code plus an additional set of ML-specific issues. This debt may be difficult to detect because it exists at the system level.” (Google; Sculley et al., 2015)
  • 14. “Typical methods for paying down code level technical debt are not sufficient to address ML-specific technical debt at the system level.” (Google; Sculley et al., 2015)
  • 19. As for everyone else?
  • 20. Talk Outline: 1. Rise of AI / ML in the Enterprise 2. Unique challenges of AI 3. Provenance, Reliability, and Efficiency 4. How Datmo bridges the gap
  • 21. Provenance: Model and Workflow Reproducibility
  • 22. Problem: Model reproduction is tough. Configurations and metrics: traditional SCM tools (like Git) do a good job of tracking changes to code but overlook machine learning parameters and scoring metrics. Dependencies: hardware configuration (GPU setup/CUDA) and OS-level settings/programs (how can you install these without a package manager?).
  • 23. Solution: Tracking and containerization. Track your configurations and metrics in one place. With containers, you can write build files that enumerate everything required to reproduce a given system state.
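Slide 23's advice to keep configurations and metrics in one place can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Datmo's implementation: `record_run`, the `runs/` directory, and the JSON layout are hypothetical choices. The record hashes the training script so it pins the exact code, and stores the hyperparameters and scores that Git alone would miss.

```python
import hashlib
import json
import platform
import time
from pathlib import Path


def record_run(code_path, config, metrics, out_dir="runs"):
    """Write one JSON record tying code, config, environment, and metrics together.

    Hypothetical helper for illustration; not part of Datmo or any library.
    Returns the path of the record file.
    """
    record = {
        "timestamp": time.time(),
        # Hash of the training script pins the exact code version used.
        "code_sha256": hashlib.sha256(Path(code_path).read_bytes()).hexdigest(),
        "config": config,    # hyperparameters Git does not track on its own
        "metrics": metrics,  # scoring metrics Git does not track on its own
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
        },
    }
    # Short run id derived from the record contents (the timestamp keeps ids unique per run).
    run_id = hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()[:12]
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{run_id}.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path
```

A training script would call `record_run("train.py", config, metrics)` once at the end of each run; the resulting folder of records is then the single place to compare runs.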
  • 24. Example 1: “Offline” Logging (bad)
  • 25. Example 2: Online Logging with Visualizable Metrics (good). Unfortunately, TensorBoard is built around TensorFlow; other frameworks need an adapter such as tensorboardX.
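Slide 25 contrasts offline logs with metrics you can watch while training runs. One framework-agnostic way to get there, sketched under the assumption that a JSON Lines file is an acceptable log format, is to append one JSON object per step and let any plotting or dashboard tool read the series back. `MetricLogger` and `read_series` are hypothetical names.

```python
import json
import time
from pathlib import Path


class MetricLogger:
    """Hypothetical append-only JSON Lines logger; any framework's training loop can call it."""

    def __init__(self, path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def log(self, step, **metrics):
        entry = {"step": step, "time": time.time(), **metrics}
        # Reopening in append mode and closing per entry writes each line out immediately.
        with self.path.open("a") as f:
            f.write(json.dumps(entry) + "\n")


def read_series(path, name):
    """Return [(step, value), ...] for one metric, e.g. to plot a live curve."""
    series = []
    with Path(path).open() as f:
        for line in f:
            entry = json.loads(line)
            if name in entry:
                series.append((entry["step"], entry[name]))
    return series
```

Because every entry is on disk as soon as `log` returns, a separate process can plot the curve while training is still running, regardless of whether the model is written in TensorFlow, PyTorch, or scikit-learn.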
  • 26. Example 3: Docker and Dockerfiles
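Slides 23 and 26 rely on build files that enumerate everything needed to reproduce an environment. A small generator makes the idea concrete: it renders a Dockerfile from exact package pins so no dependency is left implicit. `render_dockerfile` is a hypothetical helper, and the base image and versions shown are placeholders; GPU/CUDA work would need a CUDA-enabled base image instead.

```python
import json


def render_dockerfile(base_image, pinned_packages, entrypoint):
    """Render a Dockerfile that lists every Python dependency at an exact version.

    Hypothetical helper. base_image is a placeholder choice (e.g. "python:3.10-slim");
    pinned_packages maps package name -> version; entrypoint is a command list.
    """
    lines = [f"FROM {base_image}"]
    if pinned_packages:
        # Sorted, exact pins keep the output deterministic; transitive deps still need a lock file.
        pins = " ".join(f"{name}=={version}" for name, version in sorted(pinned_packages.items()))
        lines.append(f"RUN pip install --no-cache-dir {pins}")
    lines += ["WORKDIR /app", "COPY . /app"]
    # A JSON array ("exec form") runs the command without wrapping it in a shell.
    lines.append(f"ENTRYPOINT {json.dumps(list(entrypoint))}")
    return "\n".join(lines) + "\n"
```

Calling `render_dockerfile("python:3.10-slim", {"numpy": "1.24.4"}, ["python", "train.py"])` and committing the output next to the code means anyone can rebuild the same image with `docker build`.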
  • 27. Reliability: Peace of Mind
  • 28. Problem: Builds and deployments don’t account for machine learning. Traditional software build tools overlook model scoring and metrics, so they do not check builds against these metrics. Traditional software deployment tools don’t take into account the nuances of machine learning models.
  • 29. Solution: Builds and deployment with machine learning metrics. Set scoring thresholds on models’ validation metrics and check every build against them. Deploy your machine learning models as microservices that can be updated on a different schedule from the main application.
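The build gate from slide 29 amounts to comparing validation metrics against thresholds and failing the build on any miss. A minimal sketch, assuming metrics arrive as a dict and each threshold states whether the metric must stay above (`"min"`) or below (`"max"`) a bound; `check_thresholds` is a hypothetical helper, not a Datmo API.

```python
def check_thresholds(metrics, thresholds):
    """Return a list of failure messages; an empty list means the build passes.

    thresholds maps metric name -> (direction, bound): "min" means the metric
    must be >= bound, "max" means it must be <= bound.
    """
    failures = []
    for name, (direction, bound) in thresholds.items():
        if direction not in ("min", "max"):
            failures.append(f"{name}: unknown direction {direction!r}")
        elif name not in metrics:
            failures.append(f"{name}: missing from validation metrics")
        elif direction == "min" and metrics[name] < bound:
            failures.append(f"{name}={metrics[name]} is below the minimum {bound}")
        elif direction == "max" and metrics[name] > bound:
            failures.append(f"{name}={metrics[name]} is above the maximum {bound}")
    return failures
```

In a CI step, a script would print the messages and call `sys.exit(1 if failures else 0)`; most CI systems treat a non-zero exit code as a failed build, so a model that regresses never reaches deployment. Treating a missing metric as a failure is a deliberate choice: a build that forgot to evaluate should not pass silently.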
  • 30. Efficiency: Reduce the time to success
  • 31. Problem: Disjoint tools slow down iteration. Software tools are not built to iterate on machine learning algorithms, and machine learning does not follow the same build schedule as your main application.
  • 32. Solution: A/B testing, continuous deployment, and automation. A/B testing models enables quick performance comparisons to identify the best parameters. Continuous deployment, with tests on validation metrics, ensures that deployed models work as expected. Automation lets triggers kick off actions, such as retraining on new data.
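For the A/B testing on slide 32, the core mechanics are a stable traffic split and per-variant bookkeeping. The sketch below uses one common approach, hashing a user id into [0, 1) so each user consistently sees the same model; `ABRouter` and the variant names are hypothetical. Declaring a winner from the recorded outcomes still calls for a significance test on top of these averages.

```python
import hashlib
from collections import defaultdict


class ABRouter:
    """Hypothetical router: deterministic traffic split plus per-variant outcome tracking."""

    def __init__(self, variants, weights):
        if len(variants) != len(weights) or abs(sum(weights) - 1.0) > 1e-9:
            raise ValueError("need one weight per variant, and weights must sum to 1")
        self.variants = list(variants)
        self.weights = list(weights)
        self.outcomes = defaultdict(list)

    def assign(self, user_id):
        """Map a user id to a variant; the same id always gets the same variant."""
        digest = hashlib.sha256(str(user_id).encode()).digest()
        u = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # 53 hash bits -> float in [0, 1)
        cumulative = 0.0
        for variant, weight in zip(self.variants, self.weights):
            cumulative += weight
            if u < cumulative:
                return variant
        return self.variants[-1]  # guard against weights summing to slightly under 1

    def record(self, variant, value):
        """Store one outcome (e.g. 1.0 for a correct prediction, 0.0 otherwise)."""
        self.outcomes[variant].append(value)

    def summary(self):
        """Mean outcome per variant that has at least one recorded value."""
        return {v: sum(vals) / len(vals) for v, vals in self.outcomes.items() if vals}
```

Hashing instead of random sampling keeps assignments sticky across requests without storing any per-user state, which matters when each variant runs as its own microservice.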
  • 33. Talk Outline: 1. Rise of AI / ML in the Enterprise 2. Unique challenges of AI 3. Provenance, Reliability, and Efficiency 4. How Datmo bridges the gap
  • 34. What is Datmo? Datmo is a unified platform for ML, AI, and Data Science developers. Datmo’s free Community Edition enables model version control, easy environment handling, and reproducible results through snapshots. Datmo Enterprise leverages snapshots to enable reliable builds, quick deployments, efficient A/B testing, and continuous delivery of analytics workflows and models.
  • 35. Provenance: Datmo CE
  • 36. Datmo CE. Snapshots: model versions that combine code, files, environments, configurations, and performance metrics. Runnable anywhere: the tool runs on any server, so you can move your models freely between servers and share them with colleagues.
  • 37. What are Datmo snapshots? A bundle of code, environment, configuration, files*, and metrics.
  • 38. Why are they important? Git commits capture code and files*; Datmo snapshots also capture environment, configuration, and metrics.
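The difference between a Git commit and a snapshot (slides 37 and 38) is easiest to see as a data structure. The `Snapshot` class below is a hypothetical illustration, not Datmo's actual format: it bundles the five components named on the slides and derives a content-addressed id, so snapshots with identical contents share an id, and `diff` reports which components changed, including ones a Git diff would not show.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, fields


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot record (not Datmo's actual format)."""

    code_commit: str   # e.g. a Git commit hash; Git already versions code
    files: dict        # path -> content hash for data/weights outside Git
    environment: str   # e.g. a Dockerfile's contents or an image digest
    config: dict       # hyperparameters used for this run
    metrics: dict = field(default_factory=dict)  # scores produced by this run

    def snapshot_id(self):
        """Content-addressed id: identical contents always yield the same id."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:12]


def diff(a, b):
    """Names of the snapshot components that differ between a and b."""
    return [f.name for f in fields(Snapshot) if getattr(a, f.name) != getattr(b, f.name)]
```

For example, two snapshots built from the same commit but trained with different learning rates would look identical to Git, while `diff` would return `["config", "metrics"]`.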
  • 39. GUI to View Snapshots
  • 40. How will it help? Datmo leverages containers to quickly spin up reproducible developer environments. It tracks each environment in snapshots, alongside model metadata.
  • 41. Reliability: Datmo EE
  • 42. Datmo EE (Builds and Deployment). Builds: model versions captured in snapshots can be built with validation tests that track your performance metrics. Deployment: models can be pushed as microservices, so you can update them on a different schedule from the rest of your main application.
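Slide 42's point about shipping models as microservices can be illustrated with Python's standard-library `http.server`. This is a minimal sketch rather than a production server: the `/predict` route, the JSON payload shape, `MODEL_VERSION`, and the stand-in linear `predict` function are all hypothetical. Because the model sits behind its own HTTP endpoint, it can be redeployed on its own schedule while the main application keeps calling the same URL.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL_VERSION = "snapshot-abc123"  # hypothetical id of the deployed model version


def predict(features):
    """Stand-in model (hypothetical): a fixed linear score over three features."""
    weights = [0.4, -0.2, 0.1]
    if len(features) != len(weights):
        raise ValueError("expected 3 features")
    return sum(w * float(x) for w, x in zip(weights, features))


class ModelHandler(BaseHTTPRequestHandler):
    """Serves POST /predict with a JSON body like {"features": [1.0, 0.0, 0.0]}."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404, "unknown route")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length))
            score = predict(body["features"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, 'expected a JSON body with a "features" list')
            return
        payload = json.dumps({"score": score, "model_version": MODEL_VERSION}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the example


def make_server(host="127.0.0.1", port=8000):
    """Create (but do not start) the HTTP server; port=0 picks a free port."""
    return HTTPServer((host, port), ModelHandler)
```

Run it with `make_server().serve_forever()`, then POST `{"features": [1, 0, 0]}` to `http://127.0.0.1:8000/predict`. Returning `model_version` with every response makes it easy to tell which model served a request once several versions run side by side.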
  • 43. Deployment: Containerization
  • 44. Efficiency: Datmo EE
  • 45. Datmo EE (A/B Testing, Continuous Deployment, Automation). A/B testing: deploy a few microservices in parallel so you can compare algorithms. Continuous deployment: update your builds with tests that ensure your validation metrics meet your thresholds. Automation: create triggers and actions to retrain your models with new data, update your models frequently, or make sure you always know when models aren’t working.
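Slide 45's triggers and actions can be modeled as rules: a condition on the current system state and an action to run when it holds. The rules below are hypothetical examples matching the slide (retrain on new data, alert when a model degrades); the thresholds are placeholders, and the actions just return strings where a real system would launch a job or send a notification.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Rule:
    """Hypothetical trigger/action pair."""

    name: str
    condition: Callable[[Dict], bool]  # trigger: inspects the current state
    action: Callable[[Dict], str]      # action: runs when the trigger fires


def run_rules(state, rules):
    """Evaluate every rule against the state; return (name, result) for each that fired."""
    return [(r.name, r.action(state)) for r in rules if r.condition(state)]


# Placeholder thresholds; real actions would start a job or send a notification.
RULES = [
    Rule(
        "retrain-on-new-data",
        condition=lambda s: s["new_rows"] >= 10_000,
        action=lambda s: f"start retraining on {s['new_rows']} new rows",
    ),
    Rule(
        "alert-on-degradation",
        condition=lambda s: s["live_accuracy"] < 0.85,
        action=lambda s: f"notify team: live accuracy {s['live_accuracy']:.2f}",
    ),
]
```

A scheduler (cron, or a hook fired when new data lands) would build the current `state` dict and call `run_rules(state, RULES)`; keeping rules as data makes it easy to add a new trigger without touching the loop.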
  • 46. Datmo CE + EE: make MLOps and workflows manageable and simple, not completely abstracted away. Reduce the amount of glue code so that people can build more robust pipelines.
  • 47. Key Takeaways: 1. AI applications are growing day by day, and these technologies require new capabilities. 2. Provenance, reliability, and efficiency are required for any production system, and ML is no different. 3. Datmo CE and EE provide full provenance, reliability, and efficiency through snapshots, which enable builds, deployments, A/B testing, and continuous delivery.
  • 49. 2015 NIPS paper from Google: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
  • 50. Learn more about us at our blog: https://blog.datmo.com/
  • 51. Check out our product pages: https://datmo.com/enterprise and https://datmo.com/community
  • 52. Full slides available at: http://bit.ly/global-ai-conf-provenance