This document summarizes a webinar on building machine learning platforms. It discusses how operating ML models is complex, requiring tasks like monitoring performance, handling data drift, and ensuring governance and security. It then outlines common components of ML platforms, including data management, model management, and code/deployment management. The webinar will demonstrate how different organizations handle these components and include demos from four companies. It will also cover Databricks' approach to providing an ML platform that integrates various tools and simplifies the full ML lifecycle from data preparation to deployment.
3. Even After Deploying, Operating ML Is Complex!
▪ Monitoring the performance of the model
▪ Data drift
▪ Governance and security
Many ML teams spend >50% of their time maintaining existing models.
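Data drift checks range from simple statistics to full distribution tests. As a minimal illustration (a toy sketch, not any particular platform's method), a mean-shift check against a baseline sample might look like:

```python
import statistics

def mean_shift_drift(baseline, current, threshold=3.0):
    """Toy drift check: flag drift when the current mean is more than
    `threshold` baseline standard errors away from the baseline mean."""
    mu = statistics.fmean(baseline)
    se = statistics.stdev(baseline) / len(baseline) ** 0.5
    return abs(statistics.fmean(current) - mu) > threshold * se

# Made-up feature values for illustration:
stable = mean_shift_drift([1.0, 1.1, 0.9, 1.0, 1.05], [1.02, 0.98, 1.0])   # → False
shifted = mean_shift_drift([1.0, 1.1, 0.9, 1.0, 1.05], [2.0, 2.1, 1.9])    # → True
```

A production system would typically use a proper two-sample test and monitor every input feature on a schedule, but the shape of the check is the same.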
4. Why Is ML Hard to Operationalize?
▪ Dependence on data
▪ Multiple, application-specific ways to evaluate performance
▪ Many teams and systems involved
[Diagram: Raw Data → Data Prep → Training → Deployment, spanning the Data Engineer, ML Engineer, and Application Developer roles]
5. Response: ML Platforms
Software platforms to manage ML applications, from development to production
Most companies that use ML at scale are building one
Tech examples: Facebook FBLearner, Google TFX, Uber Michelangelo
6. Common Components in an ML Platform
Data management, in development and at scoring time
▪ Data transformation, quality monitoring, data versioning
▪ Feature stores
Model management
▪ Packaging, review, quality assurance, versioning
Code and deployment management
▪ Reproducibility, deployment, monitoring, experimentation
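Of the components above, the feature store is the least self-explanatory. As an illustrative sketch (no real feature-store API; class and method names are made up), the core idea is a keyed lookup that serves the same feature values during training-set assembly and at scoring time:

```python
from datetime import datetime, timezone

class FeatureStore:
    """Toy in-memory feature store: one table keyed by entity and feature."""

    def __init__(self):
        self._table = {}  # (entity_id, feature_name) -> (value, written_at)

    def write(self, entity_id, feature_name, value):
        # Record when the value was written, so training can later
        # reconstruct what was known at a given point in time.
        self._table[(entity_id, feature_name)] = (
            value, datetime.now(timezone.utc)
        )

    def read(self, entity_id, feature_names):
        # The same read path serves both training and online scoring,
        # which is what prevents training/serving skew.
        return {f: self._table[(entity_id, f)][0] for f in feature_names}

fs = FeatureStore()
fs.write("user_42", "avg_purchase", 19.5)
features = fs.read("user_42", ["avg_purchase"])  # → {"avg_purchase": 19.5}
```

Real feature stores add offline/online storage tiers, point-in-time joins, and freshness monitoring on top of this basic contract.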
7. Our Approach at Databricks
Every team's requirements will be different, and will change over time
Provide a general platform that is easy to integrate with diverse tools
▪ MLflow: open source machine learning platform
▪ Delta Lake: transactional, versioned data lake storage
▪ Databricks: data science & ML workspace
8. In This Webinar
How we and other organizations handle the different components of a machine learning platform
Demos and experience from 4 different companies
9. End-to-End Data Science and Machine Learning on Databricks
Clemens Mewald
Director of Product Management, Databricks
10. End-to-End Data Science and ML on Databricks
[Overview diagram: the Data Science Workspace spans Prep Data → Build Model → Deploy/Monitor Model, on top of the End-to-End ML Lifecycle (MLflow), AutoML, Batch Scoring, Online Serving, and the ML Runtime and Environments, in an open, pluggable architecture]
11. End-to-End Data Science and ML on Databricks
[Overview diagram repeated as a section divider]
14. Components
▪ Tracking: record and query experiments (code, metrics, parameters, artifacts, models)
▪ Projects: packaging format for reproducible runs on any compute platform
▪ Models: general model format that standardizes deployment options
15. Components
▪ Tracking: record and query experiments (code, metrics, parameters, artifacts, models)
▪ Projects: packaging format for reproducible runs on any compute platform
▪ Models: general model format that standardizes deployment options
▪ Model Registry: centralized and collaborative model lifecycle management
18. Model Lifecycle
[Diagram: Tracking captures parameters, metrics, artifacts, models, and metadata; Models packages them in flavors (Flavor 1, Flavor 2, custom models); the Model Registry promotes versions v1, v2, v3 through Staging → Production → Archived, with data scientists and deployment engineers collaborating on the handoff]
19. Model Lifecycle
[Same diagram, extended with Serving targets: in-line code, containers, batch & stream scoring, cloud inference services, and OSS serving solutions]
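The Staging → Production → Archived flow above is essentially a small state machine over model versions. The sketch below is illustrative only (a toy class, not the MLflow Registry API), showing the key behavior: promoting a new version to Production archives the one it replaces:

```python
STAGES = ("None", "Staging", "Production", "Archived")

class ModelRegistry:
    """Toy registry: tracks versions and their lifecycle stage."""

    def __init__(self):
        self.versions = {}  # version number -> stage
        self._next = 1

    def register(self):
        v = self._next
        self._next += 1
        self.versions[v] = "None"
        return v

    def transition(self, version, stage):
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        # Promoting to Production archives the current Production version,
        # mirroring the v1 -> v2 -> v3 handoff in the diagram.
        if stage == "Production":
            for v, s in self.versions.items():
                if s == "Production":
                    self.versions[v] = "Archived"
        self.versions[version] = stage

reg = ModelRegistry()
v1 = reg.register()
reg.transition(v1, "Production")
v2 = reg.register()
reg.transition(v2, "Production")  # v1 is now "Archived"
```

The real MLflow Model Registry adds access control, comments, and webhooks around the same stage-transition idea.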
20. Auto-Logging
Auto-logging for ML frameworks: a single line of code logs parameters, metrics (as a time series), and artifacts (including the model).
mlflow.keras.autolog() # or: mlflow.tensorflow.autolog()
21. End-to-End Data Science and ML on Databricks
[Overview diagram repeated as a section divider]
22. Databricks Notebooks
Provide a collaborative environment for Unified Data Analytics
▪ Multi-Language: Scala, SQL, Python, and R, all in one notebook
▪ Collaborative: real-time co-editing and commenting
▪ Reproducible: auto-logged revision history and Git integration for version control
▪ Visualizations: built-in visualizations and support for the most popular visualization libraries (e.g. matplotlib, ggplot)
▪ Experiment Tracking: built-in tracking of data science and ML experiments, with metrics, parameters, artifacts, and more
▪ Enterprise Ready: enterprise-grade access controls, identity pass-through, and auditability
23. Databricks Notebooks for Collaborative Data Science
Data Engineers, Data Scientists, ML Engineers, and Data Analysts can all collaborate in one shared environment using modern collaboration patterns: co-presence / co-editing, versioning, and commenting.
24. Integration with Databricks Notebooks
● Runs sidebar integrated with MLflow Tracking
● Track runs, sort by metrics and parameters
● Linked to the revision history of the notebook
25. End-to-End Data Science and ML on Databricks
[Overview diagram repeated as a section divider]
26. Your Existing Data Lake for Data Science and ML
[Diagram: streaming, batch, and 3rd-party data marketplace sources are ingested (files, Azure Data Lake Storage, Amazon S3) into ingestion tables, then refined into a data catalog and a feature store that feed the ML Runtime]
● Schema-enforced, high-quality data
● Optimized performance
● Full data lineage / governance
● Reproducibility through time travel
27. Delta Lake for Data Science and ML
Ingest data and visualize data distribution
31. End-to-End Data Science and ML on Databricks
[Overview diagram repeated as a section divider]
33. Machine Learning Runtime
▪ Packages and optimizes the most common ML frameworks
▪ Built-in optimization for distributed deep learning
▪ Distribute and scale any single-machine ML code to 1,000s of machines
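"Scaling single-machine ML code" means mapping an unchanged training function over many workers, typically for hyperparameter search. In the hedged sketch below, a local thread pool stands in for the cluster, and `train` and its score formula are made-up placeholders for any single-machine training routine:

```python
from concurrent.futures import ThreadPoolExecutor

def train(alpha):
    # Stand-in for a real single-machine training routine;
    # the score formula is fabricated for illustration.
    return {"alpha": alpha, "score": 1.0 - (alpha - 0.3) ** 2}

# On a cluster, the same unchanged function would be mapped over
# remote workers; here a thread pool plays that role locally.
grid = [0.1, 0.2, 0.3, 0.4]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(train, grid))

best = max(results, key=lambda r: r["score"])  # best["alpha"] → 0.3
```

The point is the shape of the API: because `train` takes only its hyperparameters and returns a result, swapping the executor for a distributed one requires no change to the training code itself.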
34. Machine Learning Runtime
▪ Packages and optimizes the most common ML frameworks
▪ Built-in optimization for distributed deep learning
▪ Distribute and scale any single-machine ML code to 1,000s of machines
▪ Built-in AutoML and experiment tracking: AutoML and tracking / visualizations with MLflow
35. Machine Learning Runtime (Conda-based)
▪ Packages and optimizes the most common ML frameworks
▪ Built-in optimization for distributed deep learning
▪ Distribute and scale any single-machine ML code to 1,000s of machines
▪ Built-in AutoML and experiment tracking: AutoML and tracking / visualizations with MLflow
▪ Customized environments using Conda: a pre-configured machine learning environment, customizable via requirements.txt and conda.yaml
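As an illustration of the conda.yaml customization mentioned above, a minimal environment file might look like the following (the environment name, channel, and pinned versions are placeholders, not Databricks defaults):

```yaml
name: ml-custom-env
channels:
  - conda-forge
dependencies:
  - python=3.8
  - scikit-learn=0.24.1
  - pip
  - pip:
      - mlflow
```

Pinning versions this way is what makes the environment reproducible across the cluster and across reruns.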
37. End-to-End Data Science and ML on Databricks
[Overview diagram repeated as a section divider]
38. Model Deployment
[Diagram: Model Registry versions v1, v2, v3 move through Staging → Production → Archived, handed from data scientists to deployment engineers, and are served via in-line code, containers, batch & stream scoring, cloud inference services, and OSS serving solutions]
40. In Summary, Databricks Accelerates the Full ML Lifecycle
[Overview diagram repeated: Data Science Workspace, End-to-End ML Lifecycle (MLflow), AutoML, Batch Scoring, Online Serving, ML Runtime and Environments, open, pluggable architecture]