SlideShare uma empresa Scribd logo
1 de 51
Model Serving
27 May 2021
Andre Mesarovic
Sr. Resident Solutions Architect at
Databricks
Agenda
• MLflow model serving
• How to score models with MLflow
○ Offline scoring with Spark
○ Online scoring with MLflow model server
• Custom model deployment and scoring
• Databricks model server
Prediction overview
Offline Prediction
● High throughput
● Bulk predictions
● Predict with Spark/Databricks
● Batch Prediction
● Structured Streaming
Prediction
Online Prediction
● Low latency
● Individual predictions
● Real-time model serving
● MLflow scoring server
● MLflow deployment plugin
● Serving outside MLflow
● Databricks model serving
Vocabulary - Synonyms
Model Serving
● Model serving
● Model scoring
● Model prediction
● Model inference
● Model deployment
Prediction Types
● Offline == batch prediction
○ Spark batch prediction
○ Spark streaming prediction
● Online == real-time prediction
MLflow Model Serving Ecosystem
Offline Scoring
• Spark is ideally suited for offline scoring
• Distributed scoring with Spark
• Can use either Spark batch or structured streaming
• Use MLflow UDF (Python or SQL) to parallelize scoring for
single-node frameworks such as scikit-learn
• Create UDF model using Pyfunc flavor
• Load model from MLflow Model Registry
MLflow Offline Scoring Example
Score with native flavor
model = mlflow.sklearn.load_model(model_uri)
predictions = model.predict(data)
Score with Python UDF
model_uri = "models:/sklearn-wine/production"
Model URI
model = mlflow.pyfunc.load_model(model_uri)
predictions = model.predict(data)
Score with Pyfunc flavor
udf = mlflow.pyfunc.spark_udf(spark, model_uri)
predictions = data.withColumn("prediction", udf(*data.columns))
Score with SQL UDF
udf = mlflow.pyfunc.spark_udf(spark, model_uri)
spark.udf.register("predict", udf)
%sql select *, predict(*) as prediction from my_data
Online Scoring
• Score models outside of Spark
• Low-latency scoring one record at a time with immediate
feedback
• Variety of real-time deployment options:
○ REST API - web servers - self-managed or as containers in cloud providers
○ Embedded in browsers, .e.g TensorFlow Lite
○ Edge devices: mobile phones, sensors, routers, etc.
○ Hardware accelerators - GPU, NVIDIA, Intel
Options to serve real-time models
• Embed model in application code
• Model deployed as a service
• Model published as data
• Martin Fowler’s Continuous Delivery for Machine Learning
MLflow Model Server
• Cornerstone of different MLflow deployment targets
• Web server that exposes a standard REST API:
○ Input: CSV, JSON (pandas-split or pandas-records formats) or Tensor
(JSON)
○ Output: JSON list of predictions
• Implementation: Python Flask or Java Jetty server (only
MLeap)
• MLflow CLI builds a server with embedded model
MLflow Model Server deployment options
• Local web server
• SageMaker docker container
• Azure ML docker container
• Generic docker container
• Custom deployment plugin targets
○ TorchServe, RedisAi, Ray or Algorithmia
MLflow Model Server container types
Python Server
Model and its ML
framework libraries
embedded in Flask
server. Only for non-
Spark ML models.
Python Server + Spark
Flask server delegates
scoring to Spark
server. Only for Spark
ML models.
Java MLeap Server
Model and MLeap
runtime are embedded
in Jetty server. Only
for Spark ML models.
No Spark runtime.
MLflow Model Server container types
Launch Local MLflow Model Server
mlflow models serve --model-uri models:/sklearn-iris/production --port 5001
Launch server
Launch local server
Score with CSV format
curl http://localhost:5001/invocations 
-H "Content-Type:text/csv" 
-d '
sepal_length,sepal_width,petal_length,petal_width
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2'
Launch server
Request
[0, 0]
Launch server
Response
Score with JSON split-oriented format
curl http://localhost:5001/invocations 
-H "Content-Type:application/json" 
-d ' {
"columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
"data": [
[ 5.1, 3.5, 1.4, 0.2 ],
[ 4.9, 3.0, 1.4, 0.2 ] ] }'
Launch server
Request
[0, 0]
Launch server
Response
Score with JSON record-oriented format
curl http://localhost:5001/invocations 
-H "Content-Type:application/json; format=pandas-records" 
-d '[
{ "sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2 },
{ "sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4, "petal_width": 0.2 } ]'
Launch server
Request
[0, 0]
Launch server
Response
Score with JSON Tensor
● Tensor Input Now Supported in MLflow - Databricks blog
- 2021-03-18
● Request will be cast to Numpy arrays. This format is
specified using a Content-Type request header value of
application/json and the instances or inputs key in the
request body dictionary
● Content-type is application/json - 2 request formats will
be inferred
● See Deploy MLflow models - MLflow doc
● TFX Serving request format - TensorFlow doc
Score with JSON Tensor - numpy/tensor
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{
"instances": [
{"a": "s1", "b": 1, "c": [1, 2, 3]},
{"a": "s2", "b": 2, "c": [4, 5, 6]},
{"a": "s3", "b": 3, "c": [7, 8, 9]}
]
}'
Launch server
TensorFlow serving "instances" format
curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d
'{
"inputs": {
"a": ["s1", "s2", "s3"],
"b": [1, 2, 3]
}'
Launch server
TensorFlow serving "inputs" format
MLflow SageMaker and Azure ML containers
• Two types of containers are deployed to cloud providers
○ Python container - Flask web server process with embedded model
○ SparkML container - Flask web server process and Spark process
• SageMaker container
○ Most versatile container type
○ Can run in local mode on laptop as regular docker container
Python container
• Flask web server
• Flask server runs behind gunicorn to allow concurrent
requests
• Serialized model (e.g. sklearn or Keras/TF) is embedded inside
web server and wrapped by Pyfunc
• Server leverages Pyfunc flavor to make generic predictions
SparkML container
• Two processes:
○ Flask web server
○ Spark server - OSS Spark - no Databricks
• Web server delegates scoring to Spark server
MLflow SageMaker container deployment
• CLI:
○ mlflow sagemaker build-and-push-container
○ mlflow sagemaker deploy
○ mlflow sagemaker run-local
• API: mlflow.sagemaker API
• Deploy a model on Amazon SageMaker
mlflow sagemaker build-and-push-container --build --container my-container
mlflow sagemaker deploy --app-name wine-quality 
--model-uri models:/wine-quality/production 
--image-url my-image-url 
--region us-west-2 --mode replace
Launch server
Launch container on SageMaker
Launch container on SageMaker
mlflow sagemaker build-and-push-container --build --no-push 
--container my-container
mlflow sagemaker run-local 
--model-uri models:/wine-quality/production 
--port 5051 --image my-container
Launch server
Launch local container
Deploy MLflow SageMaker container
MLflow AzureML container deployment
• Deploy to Azure Kubernetes Service (AKS) or Azure Container
Instances (ACI)
• CLI - mlflow azureml build-image
• API - mlflow.azureml.build-image
• Deploy a python_function model on Microsoft Azure ML
model_image, azure_model = mlflow.azureml.build_image(
model_uri="models:/sklearn_wine",
workspace="my_workspace",
model_name="sklearn_wine",
image_name="sklearn_wine_image",
description="Sklearn Wine Quality",
synchronous=True)
Deploy MLflow Azure ML container
MLeap
• Non-Spark serialization format for Spark models
• Advantage is ability to score Spark model without overhead of
Spark
• Faster than scoring Spark ML models with Spark
• Problem is stability, maturity and lack of dedicated
commercial support
End-to-end ML Pipeline Example with MLflow
See mlflow-examples - e2e-ml-pipeline
MLflow Deployment Plugins
• MLflow Deployment Plugins - Deploy model to custom serving
platform
• Current deployment plugins:
○ mlflow-torchserve
○ mlflow-redisai
○ mlflow-algorithmia
○ mlflow-ray-serve
MLflow Deployment Plugin Examples
Ray
mlflow deployments create -t torchserve ---name DEPLOYMENT_NAME -m <model uri> 
-C 'MODEL_FILE=<model file path>' -C 'HANDLER=<handler file path>'
TorchServe
mlflow deployments create -t redisai --name <rediskey> -m <model uri> 
-C <config option>
RedisAI
mlflow deployments create -t ray-serve --name <deployment name> -m <model uri> 
-C num_replicas=<number of replicas>
Algorithmia
mlflow deployments create -t algorithmia --name mlflow_sklearn_demo 
-m mlruns/0/<run-id>/artifacts/model
• MLflow PyTorch integration released in MLflow 1.12.0 (Nov.
2020)
• Integrates with PyTorch Lightning, TorchScript and
TorchServe
• Features:
○ Autologging with PyTorch Lightning
○ Translate to TorchScript model format for Python-free optimized scoring
○ Deploy MLflow models to TorchServe
MLflow + PyTorch = 🧡
MLflow + PyTorch = 🧡
MLflow TorchServe Resources
• PyTorch and MLflow Integration Announcement - 2020-11-12
• MLflow 1.12 Features Extended PyTorch Integration - 2012-11-
13
• MLflow and PyTorch — Where Cutting Edge AI meets MLOps -
medium - pytorch - 2020-11-12
• https://github.com/mlflow/mlflow-torchserve
MLflow RedisAI Resources
• https://github.com/RedisAI/mlflow-redisai
• All you need is PyTorch, MLflow, RedisAI and a cup of mocha
latte - medium - 2020-08-28
• Taking Deep Learning to Production with MLflow & RedisAI -
video - 30:16 - Sherin Thomas (RedisAI) - 2020-08-30
• PyPI mlflow-redisai
MLflow Ray Resources
• github.com/ray-project/mlflow-ray-serve
○ mlflow_example.py - test_mlflow_plugin.py
• Ray & MLflow: Taking Distributed Machine Learning
Applications to Production blog
○ Databricks blog - 2021-02-03
○ Ray blog - 2021-01-13
• Using MLflow with Tune - Ray docs
• PyPI mlflow-ray-serve
MLflow Algorithmia Resources
• MLFlow Integrations - Development Center - Algorithmia
• https://github.com/algorithmiaio/mlflow-algorithmia
• Algorithmia and MLflow: Integrating open-source tooling with
enterprise MLOps - Algorithmia blog - 2021-04-15
• pip install mlflow-algorithmia
• MLflow Integration from Azure ML and Algorithmia - video - 1:03:12 -
DAIS-2021 -2021-04-22
• Algorithmia and MLflow: Integrating open-source tooling with
enterprise MLOps - video - 11:19 - 2021-04-01
MLflow and TensorFlow Serving Custom
Deployment
• Example how to deploy models to a custom deployment target
• MLflow deployment plugin has not yet been developed
• Deployment builds a standard TensorFlow Serving docker
container with a MLflow model from Model Registry
• https://github.com/amesar/mlflow-
tools/tree/master/mlflow_tools/tensorflow_serving
Keras/TensorFlow Model Formats
• SavedModel format
○ TensorFlow standard format is default in Keras TF 2.x
○ Supported by TensorFlow Serving
○ As of MLflow 1.15.0 SavedModel is the default MLflow
format
• HD5 format
○ Default in Keras TF 1.x but is legacy in Keras TF 2.x
and
Serving
Serving
Current Legacy pre-MLflow 1.15.0
TensorFlow Serving Example
docker run -t --rm --publish 8501 
--volume /opt/mlflow/mlruns/1/f48dafd70be044298f71488b0ae10df4/artifacts/tensorflow-
model:/models/keras_wine
--env MODEL_NAME=keras_wine 
tensorflow/serving
Launch server
Launch server in Docker container
curl -d '{"instances": [12.8, 0.03, 0.48, 0.98, 6.2, 29, 1.2, 0.4, 75 ] }' 
-X POST http://localhost:8501/v1/models/keras_wine:predict
Launch server
Request
{ "predictions": [[-0.70597136]] }
Launch server
Response
MLflow Keras/TensorFlow Run Models Example
MLflow Flavors - Managed Models
● Model flavors
○ tensorflow-model - standard SavedModel format
(saved_model.pb)
○ onnx-model
● Can be served as Pyfunc models
MLflow Unmanaged Models
● Models
○ tensorflow-model - Standard TF SavedModel format
○ tensorflow-lite-model
○ tensorflow.js model
● Load as raw artifacts and then serve manually
● Sample code: wine_train.py and wine_predict.py
ONNX - Open Neural Network Exchange
• Interoperable model format supported by:
○ MSFT, Facebook, AWS, Nvidia, Intel, etc.
• Train once, deploy anywhere (in theory) - depends on the
robustness of ML framework conversion
• Save MLflow model as ONNX flavor
• Deploy ONNX model in standard MLflow scoring server
• ONNX runtime is embedded in MLflow Python model server
Model Serving on Databricks
Model Serving
Turnkey serving
solution to expose
models as REST
endpoints
Reports
Applications
...
REST
Endpoint
Tracking
Record and query
experiments: code,
metrics, parameters,
artifacts, models
Models
General format
that standardizes
deployment options
Logged
Model
Model Registry
Centralized and
collaborative
model lifecycle
management
Staging Production Archived
Data Scientists Deployment Engineers
Current Databricks Model Serving - Public Preview
• HTTP endpoint publicly exposed using Databricks
authentication
• One node cluster is automatically provisioned
• Can select instance type
• Limited production capability - for light loads and testing
• Hosts all active versions of a model
• Can score from UI or via HTTP (curl, Python Requests)
Databricks Production-grade Model Serving
• Production-grade serving is on the product roadmap
• Spin up a multi-node cluster
• Model monitoring, model drift detection and metrics
• Log prediction requests and responses
• Low latency, high availability and scalable
• GPU support
Databricks Model Serving Launch
Databricks Model Scoring
Score with curl
curl 
https:/my_workspace.acme.com/my_model/Production/invocations 
-H "Content-Type:application/json" 
-d ' {
"columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
"data": [
[ 5.1, 3.5, 1.4, 0.2 ],
[ 4.9, 3.0, 1.4, 0.2 ] ] }'
Launch server
Request
[0, 0]
Launch server
Response
Databricks Model Serving Resources
• Blog posts
○ Quickly Deploy, Test, and Manage ML Models as REST Endpoints with
MLflow Model Serving on Databricks - 2020-11-02
○ Announcing MLflow Model Serving on Databricks - 2020-06-25
• Documentation
○ MLflow Model Serving on Databricks
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
Thank You
Happy model serving!

Mais conteúdo relacionado

Mais procurados

Going from three nines to four nines using Kafka | Tejas Chopra, Netflix
Going from three nines to four nines using Kafka | Tejas Chopra, NetflixGoing from three nines to four nines using Kafka | Tejas Chopra, Netflix
Going from three nines to four nines using Kafka | Tejas Chopra, NetflixHostedbyConfluent
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code GenerationDatabricks
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceDatabricks
 
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021StreamNative
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...StreamNative
 
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycleKyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycleLviv Startup Club
 
Predicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksPredicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksSarah Dutkiewicz
 
Apache Spark Acceleration Using Hardware Resources in the Cloud, Seamlessl wi...
Apache Spark Acceleration Using Hardware Resources in the Cloud, Seamlessl wi...Apache Spark Acceleration Using Hardware Resources in the Cloud, Seamlessl wi...
Apache Spark Acceleration Using Hardware Resources in the Cloud, Seamlessl wi...Databricks
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...HostedbyConfluent
 
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching ThemKey Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching ThemYahoo Developer Network
 
Drill architecture 20120913
Drill architecture 20120913Drill architecture 20120913
Drill architecture 20120913jasonfrantz
 
Whirlpools in the Stream with Jayesh Lalwani
 Whirlpools in the Stream with Jayesh Lalwani Whirlpools in the Stream with Jayesh Lalwani
Whirlpools in the Stream with Jayesh LalwaniDatabricks
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKData Con LA
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streamingdatamantra
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineDataWorks Summit
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceChin Huang
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computingTimothy Spann
 
DevOps Fest 2020. Даніель Яворович. Data pipelines: building an efficient ins...
DevOps Fest 2020. Даніель Яворович. Data pipelines: building an efficient ins...DevOps Fest 2020. Даніель Яворович. Data pipelines: building an efficient ins...
DevOps Fest 2020. Даніель Яворович. Data pipelines: building an efficient ins...DevOps_Fest
 

Mais procurados (20)

Going from three nines to four nines using Kafka | Tejas Chopra, Netflix
Going from three nines to four nines using Kafka | Tejas Chopra, NetflixGoing from three nines to four nines using Kafka | Tejas Chopra, Netflix
Going from three nines to four nines using Kafka | Tejas Chopra, Netflix
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Willump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML InferenceWillump: Optimizing Feature Computation in ML Inference
Willump: Optimizing Feature Computation in ML Inference
 
Graph Databases at Netflix
Graph Databases at NetflixGraph Databases at Netflix
Graph Databases at Netflix
 
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
 
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
Log System As Backbone – How We Built the World’s Most Advanced Vector Databa...
 
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycleKyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
Kyryl Truskovskyi: Kubeflow for end2end machine learning lifecycle
 
Predicting Flights with Azure Databricks
Predicting Flights with Azure DatabricksPredicting Flights with Azure Databricks
Predicting Flights with Azure Databricks
 
Apache Spark Acceleration Using Hardware Resources in the Cloud, Seamlessl wi...
Apache Spark Acceleration Using Hardware Resources in the Cloud, Seamlessl wi...Apache Spark Acceleration Using Hardware Resources in the Cloud, Seamlessl wi...
Apache Spark Acceleration Using Hardware Resources in the Cloud, Seamlessl wi...
 
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
Low-latency data applications with Kafka and Agg indexes | Tino Tereshko, Fir...
 
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching ThemKey Challenges in Cloud Computing and How Yahoo! is Approaching Them
Key Challenges in Cloud Computing and How Yahoo! is Approaching Them
 
Drill architecture 20120913
Drill architecture 20120913Drill architecture 20120913
Drill architecture 20120913
 
Whirlpools in the Stream with Jayesh Lalwani
 Whirlpools in the Stream with Jayesh Lalwani Whirlpools in the Stream with Jayesh Lalwani
Whirlpools in the Stream with Jayesh Lalwani
 
Real-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNKReal-time Streaming Pipelines with FLaNK
Real-time Streaming Pipelines with FLaNK
 
Presto
PrestoPresto
Presto
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
On-boarding with JanusGraph Performance
On-boarding with JanusGraph PerformanceOn-boarding with JanusGraph Performance
On-boarding with JanusGraph Performance
 
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
Pulsar summit asia 2021   apache pulsar with mqtt for edge computingPulsar summit asia 2021   apache pulsar with mqtt for edge computing
Pulsar summit asia 2021 apache pulsar with mqtt for edge computing
 
DevOps Fest 2020. Даніель Яворович. Data pipelines: building an efficient ins...
DevOps Fest 2020. Даніель Яворович. Data pipelines: building an efficient ins...DevOps Fest 2020. Даніель Яворович. Data pipelines: building an efficient ins...
DevOps Fest 2020. Даніель Яворович. Data pipelines: building an efficient ins...
 

Semelhante a MLflow Model Serving - DAIS 2021

MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model ServingDatabricks
 
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model ServingDAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model Servingamesar0
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with DatabricksLiangjun Jiang
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricksLiangjun Jiang
 
Productionizing Real-time Serving With MLflow
Productionizing Real-time Serving With MLflowProductionizing Real-time Serving With MLflow
Productionizing Real-time Serving With MLflowDatabricks
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
 
Cnam azure ze cloud resource manager
Cnam azure ze cloud  resource managerCnam azure ze cloud  resource manager
Cnam azure ze cloud resource managerAymeric Weinbach
 
Microservices Patterns and Anti-Patterns
Microservices Patterns and Anti-PatternsMicroservices Patterns and Anti-Patterns
Microservices Patterns and Anti-PatternsCorneil du Plessis
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Intro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkIntro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkAlex Zeltov
 
Azure Resource Manager templates: Improve deployment time and reusability
Azure Resource Manager templates: Improve deployment time and reusabilityAzure Resource Manager templates: Improve deployment time and reusability
Azure Resource Manager templates: Improve deployment time and reusabilityStephane Lapointe
 
Managing the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOpsManaging the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOpsFatih Baltacı
 
Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Uri Cohen
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APIshareddatamsft
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityWes McKinney
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesDatabricks
 
Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...
Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...
Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...Emerson Eduardo Rodrigues Von Staffen
 
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...Amazon Web Services
 
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAdvanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAnimesh Singh
 

Semelhante a MLflow Model Serving - DAIS 2021 (20)

MLflow Model Serving
MLflow Model ServingMLflow Model Serving
MLflow Model Serving
 
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model ServingDAIS Europe Nov. 2020 presentation on MLflow Model Serving
DAIS Europe Nov. 2020 presentation on MLflow Model Serving
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
Mlflow with databricks
Mlflow with databricksMlflow with databricks
Mlflow with databricks
 
Productionizing Real-time Serving With MLflow
Productionizing Real-time Serving With MLflowProductionizing Real-time Serving With MLflow
Productionizing Real-time Serving With MLflow
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
Cnam azure ze cloud resource manager
Cnam azure ze cloud  resource managerCnam azure ze cloud  resource manager
Cnam azure ze cloud resource manager
 
Microservices Patterns and Anti-Patterns
Microservices Patterns and Anti-PatternsMicroservices Patterns and Anti-Patterns
Microservices Patterns and Anti-Patterns
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Intro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with sparkIntro to big data analytics using microsoft machine learning server with spark
Intro to big data analytics using microsoft machine learning server with spark
 
Ml2
Ml2Ml2
Ml2
 
Azure Resource Manager templates: Improve deployment time and reusability
Azure Resource Manager templates: Improve deployment time and reusabilityAzure Resource Manager templates: Improve deployment time and reusability
Azure Resource Manager templates: Improve deployment time and reusability
 
Managing the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOpsManaging the Machine Learning Lifecycle with MLOps
Managing the Machine Learning Lifecycle with MLOps
 
Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014 Cloudify workshop at CCCEU 2014
Cloudify workshop at CCCEU 2014
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp API
 
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise NecessityApache Arrow: Open Source Standard Becomes an Enterprise Necessity
Apache Arrow: Open Source Standard Becomes an Enterprise Necessity
 
Running Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using KubernetesRunning Apache Spark Jobs Using Kubernetes
Running Apache Spark Jobs Using Kubernetes
 
Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...
Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...
Devops continuousintegration and deployment onaws puttingmoneybackintoyourmis...
 
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
DevOps, Continuous Integration and Deployment on AWS: Putting Money Back into...
 
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and IstioAdvanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
Advanced Model Inferencing leveraging Kubeflow Serving, KNative and Istio
 

Último

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Último (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

MLflow Model Serving - DAIS 2021

  • 1. Model Serving 27 May 2021 Andre Mesarovic Sr. Resident Solutions Architect at Databricks
  • 2. Agenda • MLflow model serving • How to score models with MLflow ○ Offline scoring with Spark ○ Online scoring with MLflow model server • Custom model deployment and scoring • Databricks model server
  • 3. Prediction overview Offline Prediction ● High throughput ● Bulk predictions ● Predict with Spark/Databricks ● Batch Prediction ● Structured Streaming Prediction Online Prediction ● Low latency ● Individual predictions ● Real-time model serving ● MLflow scoring server ● MLflow deployment plugin ● Serving outside MLflow ● Databricks model serving
  • 4. Vocabulary - Synonyms Model Serving ● Model serving ● Model scoring ● Model prediction ● Model inference ● Model deployment Prediction Types ● Offline == batch prediction ○ Spark batch prediction ○ Spark streaming prediction ● Online == real-time prediction
  • 6. Offline Scoring • Spark is ideally suited for offline scoring • Distributed scoring with Spark • Can use either Spark batch or structured streaming • Use MLflow UDF (Python or SQL) to parallelize scoring for single-node frameworks such as scikit-learn • Create UDF model using Pyfunc flavor • Load model from MLflow Model Registry
  • 7. MLflow Offline Scoring Example Score with native flavor model = mlflow.sklearn.load_model(model_uri) predictions = model.predict(data) Score with Python UDF model_uri = "models:/sklearn-wine/production" Model URI model = mlflow.pyfunc.load_model(model_uri) predictions = model.predict(data) Score with Pyfunc flavor udf = mlflow.pyfunc.spark_udf(spark, model_uri) predictions = data.withColumn("prediction", udf(*data.columns)) Score with SQL UDF udf = mlflow.pyfunc.spark_udf(spark, model_uri) spark.udf.register("predict", udf) %sql select *, predict(*) as prediction from my_data
  • 8. Online Scoring • Score models outside of Spark • Low-latency scoring one record at a time with immediate feedback • Variety of real-time deployment options: ○ REST API - web servers - self-managed or as containers in cloud providers ○ Embedded in browsers, .e.g TensorFlow Lite ○ Edge devices: mobile phones, sensors, routers, etc. ○ Hardware accelerators - GPU, NVIDIA, Intel
  • 9. Options to serve real-time models • Embed model in application code • Model deployed as a service • Model published as data • Martin Fowler’s Continuous Delivery for Machine Learning
  • 10. MLflow Model Server • Cornerstone of different MLflow deployment targets • Web server that exposes a standard REST API: ○ Input: CSV, JSON (pandas-split or pandas-records formats) or Tensor (JSON) ○ Output: JSON list of predictions • Implementation: Python Flask or Java Jetty server (only MLeap) • MLflow CLI builds a server with embedded model
  • 11. MLflow Model Server deployment options • Local web server • SageMaker docker container • Azure ML docker container • Generic docker container • Custom deployment plugin targets ○ TorchServe, RedisAi, Ray or Algorithmia
  • 12. MLflow Model Server container types Python Server Model and its ML framework libraries embedded in Flask server. Only for non- Spark ML models. Python Server + Spark Flask server delegates scoring to Spark server. Only for Spark ML models. Java MLeap Server Model and MLeap runtime are embedded in Jetty server. Only for Spark ML models. No Spark runtime.
  • 13. MLflow Model Server container types
  • 14. Launch Local MLflow Model Server mlflow models serve --model-uri models:/sklearn-iris/production --port 5001 Launch server Launch local server
  • 15. Score with CSV format curl http://localhost:5001/invocations -H "Content-Type:text/csv" -d ' sepal_length,sepal_width,petal_length,petal_width 5.1,3.5,1.4,0.2 4.9,3.0,1.4,0.2' Launch server Request [0, 0] Launch server Response
  • 16. Score with JSON split-oriented format curl http://localhost:5001/invocations -H "Content-Type:application/json" -d ' { "columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"], "data": [ [ 5.1, 3.5, 1.4, 0.2 ], [ 4.9, 3.0, 1.4, 0.2 ] ] }' Launch server Request [0, 0] Launch server Response
  • 17. Score with JSON record-oriented format curl http://localhost:5001/invocations -H "Content-Type:application/json; format=pandas-records" -d '[ { "sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2 }, { "sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4, "petal_width": 0.2 } ]' Launch server Request [0, 0] Launch server Response
  • 18. Score with JSON Tensor ● Tensor Input Now Supported in MLflow - Databricks blog - 2021-03-18 ● Request will be cast to Numpy arrays. This format is specified using a Content-Type request header value of application/json and the instances or inputs key in the request body dictionary ● Content-type is application/json - 2 request formats will be inferred ● See Deploy MLflow models - MLflow doc ● TFX Serving request format - TensorFlow doc
  • 19. Score with JSON Tensor - numpy/tensor curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{ "instances": [ {"a": "s1", "b": 1, "c": [1, 2, 3]}, {"a": "s2", "b": 2, "c": [4, 5, 6]}, {"a": "s3", "b": 3, "c": [7, 8, 9]} ] }' Launch server TensorFlow serving "instances" format curl http://127.0.0.1:5000/invocations -H 'Content-Type: application/json' -d '{ "inputs": { "a": ["s1", "s2", "s3"], "b": [1, 2, 3] }' Launch server TensorFlow serving "inputs" format
  • 20. MLflow SageMaker and Azure ML containers • Two types of containers are deployed to cloud providers ○ Python container - Flask web server process with embedded model ○ SparkML container - Flask web server process and Spark process • SageMaker container ○ Most versatile container type ○ Can run in local mode on laptop as regular docker container
  • 21. Python container • Flask web server • Flask server runs behind gunicorn to allow concurrent requests • Serialized model (e.g. sklearn or Keras/TF) is embedded inside web server and wrapped by Pyfunc • Server leverages Pyfunc flavor to make generic predictions
  • 22. SparkML container • Two processes: ○ Flask web server ○ Spark server - OSS Spark - no Databricks • Web server delegates scoring to Spark server
  • 23. MLflow SageMaker container deployment • CLI: ○ mlflow sagemaker build-and-push-container ○ mlflow sagemaker deploy ○ mlflow sagemaker run-local • API: mlflow.sagemaker API • Deploy a model on Amazon SageMaker
  • 24. mlflow sagemaker build-and-push-container --build --container my-container mlflow sagemaker deploy --app-name wine-quality --model-uri models:/wine-quality/production --image-url my-image-url --region us-west-2 --mode replace Launch server Launch container on SageMaker Launch container on SageMaker mlflow sagemaker build-and-push-container --build --no-push --container my-container mlflow sagemaker run-local --model-uri models:/wine-quality/production --port 5051 --image my-container Launch server Launch local container Deploy MLflow SageMaker container
  • 25. MLflow AzureML container deployment • Deploy to Azure Kubernetes Service (AKS) or Azure Container Instances (ACI) • CLI - mlflow azureml build-image • API - mlflow.azureml.build-image • Deploy a python_function model on Microsoft Azure ML
  • 26. model_image, azure_model = mlflow.azureml.build_image( model_uri="models:/sklearn_wine", workspace="my_workspace", model_name="sklearn_wine", image_name="sklearn_wine_image", description="Sklearn Wine Quality", synchronous=True) Deploy MLflow Azure ML container
  • 27. MLeap • Non-Spark serialization format for Spark models • Advantage is ability to score Spark model without overhead of Spark • Faster than scoring Spark ML models with Spark • Problem is stability, maturity and lack of dedicated commercial support
  • 28. End-to-end ML Pipeline Example with MLflow See mlflow-examples - e2e-ml-pipeline
  • 29. MLflow Deployment Plugins • MLflow Deployment Plugins - Deploy model to custom serving platform • Current deployment plugins: ○ mlflow-torchserve ○ mlflow-redisai ○ mlflow-algorithmia ○ mlflow-ray-serve
  • 30. MLflow Deployment Plugin Examples Ray mlflow deployments create -t torchserve ---name DEPLOYMENT_NAME -m <model uri> -C 'MODEL_FILE=<model file path>' -C 'HANDLER=<handler file path>' TorchServe mlflow deployments create -t redisai --name <rediskey> -m <model uri> -C <config option> RedisAI mlflow deployments create -t ray-serve --name <deployment name> -m <model uri> -C num_replicas=<number of replicas> Algorithmia mlflow deployments create -t algorithmia --name mlflow_sklearn_demo -m mlruns/0/<run-id>/artifacts/model
  • 31. • MLflow PyTorch integration released in MLflow 1.12.0 (Nov. 2020) • Integrates with PyTorch Lightning, TorchScript and TorchServe • Features: ○ Autologging with PyTorch Lightning ○ Translate to TorchScript model format for Python-free optimized scoring ○ Deploy MLflow models to TorchServe MLflow + PyTorch = 🧡
  • 33. MLflow TorchServe Resources • PyTorch and MLflow Integration Announcement - 2020-11-12 • MLflow 1.12 Features Extended PyTorch Integration - 2012-11- 13 • MLflow and PyTorch — Where Cutting Edge AI meets MLOps - medium - pytorch - 2020-11-12 • https://github.com/mlflow/mlflow-torchserve
  • 34. MLflow RedisAI Resources • https://github.com/RedisAI/mlflow-redisai • All you need is PyTorch, MLflow, RedisAI and a cup of mocha latte - medium - 2020-08-28 • Taking Deep Learning to Production with MLflow & RedisAI - video - 30:16 - Sherin Thomas (RedisAI) - 2020-08-30 • PyPI mlflow-redisai
  • 35. MLflow Ray Resources • github.com/ray-project/mlflow-ray-serve ○ mlflow_example.py - test_mlflow_plugin.py • Ray & MLflow: Taking Distributed Machine Learning Applications to Production blog ○ Databricks blog - 2021-02-03 ○ Ray blog - 2021-01-13 • Using MLflow with Tune - Ray docs • PyPI mlflow-ray-serve
  • 36. MLflow Algorithmia Resources • MLFlow Integrations - Development Center - Algorithmia • https://github.com/algorithmiaio/mlflow-algorithmia • Algorithmia and MLflow: Integrating open-source tooling with enterprise MLOps - Algorithmia blog - 2021-04-15 • pip install mlflow-algorithmia • MLflow Integration from Azure ML and Algorithmia - video - 1:03:12 - DAIS-2021 -2021-04-22 • Algorithmia and MLflow: Integrating open-source tooling with enterprise MLOps - video - 11:19 - 2021-04-01
  • 37. MLflow and TensorFlow Serving Custom Deployment • Example how to deploy models to a custom deployment target • MLflow deployment plugin has not yet been developed • Deployment builds a standard TensorFlow Serving docker container with a MLflow model from Model Registry • https://github.com/amesar/mlflow- tools/tree/master/mlflow_tools/tensorflow_serving
  • 38. Keras/TensorFlow Model Formats • SavedModel format ○ TensorFlow standard format is default in Keras TF 2.x ○ Supported by TensorFlow Serving ○ As of MLflow 1.15.0 SavedModel is the default MLflow format • HD5 format ○ Default in Keras TF 1.x but is legacy in Keras TF 2.x
  • 40. TensorFlow Serving Example docker run -t --rm --publish 8501 --volume /opt/mlflow/mlruns/1/f48dafd70be044298f71488b0ae10df4/artifacts/tensorflow- model:/models/keras_wine --env MODEL_NAME=keras_wine tensorflow/serving Launch server Launch server in Docker container curl -d '{"instances": [12.8, 0.03, 0.48, 0.98, 6.2, 29, 1.2, 0.4, 75 ] }' -X POST http://localhost:8501/v1/models/keras_wine:predict Launch server Request { "predictions": [[-0.70597136]] } Launch server Response
  • 41. MLflow Keras/TensorFlow Run Models Example MLflow Flavors - Managed Models ● Model flavors ○ tensorflow-model - standard SavedModel format (saved_model.pb) ○ onnx-model ● Can be served as Pyfunc models MLflow Unmanaged Models ● Models ○ tensorflow-model - Standard TF SavedModel format ○ tensorflow-lite-model ○ tensorflow.js model ● Load as raw artifacts and then serve manually ● Sample code: wine_train.py and wine_predict.py
  • 42. ONNX - Open Neural Network Exchange • Interoperable model format supported by: ○ MSFT, Facebook, AWS, Nvidia, Intel, etc. • Train once, deploy anywhere (in theory) - depends on the robustness of ML framework conversion • Save MLflow model as ONNX flavor • Deploy ONNX model in standard MLflow scoring server • ONNX runtime is embedded in MLflow Python model server
  • 43. Model Serving on Databricks Model Serving Turnkey serving solution to expose models as REST endpoints Reports Applications ... REST Endpoint Tracking Record and query experiments: code, metrics, parameters, artifacts, models Models General format that standardizes deployment options Logged Model Model Registry Centralized and collaborative model lifecycle management Staging Production Archived Data Scientists Deployment Engineers
  • 44. Current Databricks Model Serving - Public Preview • HTTP endpoint publicly exposed using Databricks authentication • One node cluster is automatically provisioned • Can select instance type • Limited production capability - for light loads and testing • Hosts all active versions of a model • Can score from UI or via HTTP (curl, Python Requests)
  • 45. Databricks Production-grade Model Serving • Production-grade serving is on the product roadmap • Spin up a multi-node cluster • Model monitoring, model drift detection and metrics • Log prediction requests and responses • Low latency, high availability and scalable • GPU support
  • 48. Score with curl curl https:/my_workspace.acme.com/my_model/Production/invocations -H "Content-Type:application/json" -d ' { "columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"], "data": [ [ 5.1, 3.5, 1.4, 0.2 ], [ 4.9, 3.0, 1.4, 0.2 ] ] }' Launch server Request [0, 0] Launch server Response
  • 49. Databricks Model Serving Resources • Blog posts ○ Quickly Deploy, Test, and Manage ML Models as REST Endpoints with MLflow Model Serving on Databricks - 2020-11-02 ○ Announcing MLflow Model Serving on Databricks - 2020-06-25 • Documentation ○ MLflow Model Serving on Databricks
  • 50. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.