Training and deploying ML models with Google Cloud Platform
In this presentation, Maciej presented some approaches, good practices, and Google Cloud components that we use at Sotrender to effectively train and deploy our machine learning models, which are used to analyze Social Media data. Maciej also discussed which aspects of DevOps we focus on when developing machine learning models (MLOps), and how these ideas can be easily implemented in your company or startup using Google Cloud Platform.
Presentation by Maciej Pieńkosz from Sotrender at Data Science Summit 2020
1. Training and deploying ML models with
Google Cloud Platform
Maciej Pieńkosz
Data Science Summit 2020
2. ML in Sotrender
• Sotrender – a platform for analyzing communication on your
Social Media
• Our models:
– Sentiment
– Hatespeech
– Topic modelling
– Keyphrase extractor
– NER (brands and products)
– Image Tagger
– Logo Detector
– ….
4. Modeling with AI Notebooks
1. AI Platform Notebooks is used for initial data exploration and modeling
2. It lets us quickly start working on a new problem without worrying about infrastructure
3. To start, we favor faster, simpler model architectures that can be easily built,
validated, iterated, and eventually deployed (usually on CPU)
4. Experiment tracking: MLflow
https://databricks.com/blog/2018/06/05/introducing-mlflow-an-open-source-machine-learning-platform.html
https://cloud.google.com/ai-platform-notebooks?hl=id
5. Structuring training code
• Notebooks disadvantages:
– You pay for the whole time the notebook is running
– Code quality is usually lower
– Hard to parametrize, unit test, and review
• After initial experimentation phase, we try to give more structure to
the model training code:
– Refactor codebase to Python packages and modules and move
to git repository (Gitlab)
– Add tests (more on it later)
– Wrap code into a Docker container
– Use dedicated AI Platform Training service to train in the cloud
https://www.jeremyjordan.me/ml-projects-guide/
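One way the "refactor and parametrize" step above can look in practice is a plain Python entry point with command-line arguments instead of notebook cells; the training logic here is a placeholder, and the flag names are illustrative:

```python
import argparse


def train(lr: float, epochs: int) -> dict:
    # Placeholder for the real training loop; returns a fake loss curve.
    history = [1.0 / (lr * (epoch + 1)) for epoch in range(epochs)]
    return {"final_loss": history[-1]}


def parse_args(argv=None):
    # Parametrized entry point: easy to test, review, and run in a container.
    parser = argparse.ArgumentParser(description="Train sentiment model")
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=3)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(train(args.lr, args.epochs))
```

Because `parse_args` accepts an explicit argument list, the entry point can be unit-tested without spawning a process.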
6. AI Platform Training with custom containers
• Advantages:
– Develop locally, train in the cloud
– We are billed only for the actual training time
– Broad hardware configuration options (e.g. GPU type)
– Job statuses and logs for historical runs are available in the dashboard
– Support for hyperparameter tuning almost out of the box
Training job Dockerfile
Cloud training script
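Submitting a custom-container training job is typically done with the `gcloud ai-platform jobs submit training` command. A small hypothetical helper assembling that invocation (the bucket, project, and region values are placeholders):

```python
from datetime import datetime


def build_submit_command(job_dir: str, image_uri: str, region: str = "europe-west1"):
    # Hypothetical helper: builds the gcloud CLI call for a custom-container
    # training job. Job names must be unique, so a timestamp is appended.
    job_name = "train_" + datetime.utcnow().strftime("%Y%m%d_%H%M%S")
    return [
        "gcloud", "ai-platform", "jobs", "submit", "training", job_name,
        "--region", region,
        "--master-image-uri", image_uri,
        "--job-dir", job_dir,
    ]
```

The resulting list can be passed to `subprocess.run` from a deployment script, keeping job submission reproducible and reviewable.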
7. Google Storage for Models and Datasets
• We use Google Storage as the primary store for models
and datasets
• One bucket per model
• We follow strict, unified bucket, directory and file
structure, same for every model
– Raw data
– Combined datasets, with predefined splits
– Model files
• Documentation in Knowledge Base (Confluence)
• Dedicated systems such as DVC or Quilt can also be used
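A strict, unified per-model layout like the one above can be captured in a small helper so every model's bucket is built the same way (the bucket naming scheme and directory names here are hypothetical):

```python
def model_paths(model_name: str, version: str) -> dict:
    # Hypothetical layout: one bucket per model, same structure everywhere.
    bucket = f"gs://models-{model_name}"
    return {
        "raw_data": f"{bucket}/raw_data/",
        "datasets": f"{bucket}/datasets/{version}/",   # combined, pre-split datasets
        "model_files": f"{bucket}/models/{version}/",  # trained model artifacts
    }
```

Centralizing the convention in code keeps training and serving jobs from drifting apart in where they read and write artifacts.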
8. Model deployment
• Your options:
– Online
– Batch (offline)
• Our approach is to deploy models as services
– Easy to integrate
– Easy to use by other teams
• We serve them as REST services with Flask (or, most
recently, FastAPI)
• We wrap them in Docker containers so they can be
easily deployed to the cloud and served with Cloud Run
https://mlinproduction.com/batch-inference-vs-online-inference/
Online inference
Batch inference
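A minimal sketch of such a Flask-based model service; the model itself is replaced by a trivial stub, and the route and payload shape are illustrative:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_sentiment(text: str) -> str:
    # Stub standing in for a real sentiment model (assumption for illustration).
    return "positive" if "good" in text.lower() else "negative"


@app.route("/predict", methods=["POST"])
def predict():
    # Online inference: one JSON request in, one prediction out.
    payload = request.get_json(force=True)
    return jsonify({"sentiment": predict_sentiment(payload["text"])})
```

Packaged in a Docker image, the same app can run locally for development and on Cloud Run in production.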
9. Cloud Deployment: Cloud Run
• We use Cloud Run to deploy our model services
• Cloud Build for delegating build process to GCP
• GCP has a dedicated service for serving models, AI
Platform Prediction, but we use Cloud Run
– It is more flexible for us; we can set up any
environment and add any dependencies
– AI Platform Prediction has limits regarding model
size
– We can add additional endpoints (e.g.
/explain) to services
Service Dockerfile
Cloud deployment script
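Deployment to Cloud Run is a single `gcloud run deploy` call. A hypothetical helper assembling it (service name, image, and region are placeholders):

```python
def build_deploy_command(service: str, image_uri: str, region: str = "europe-west1"):
    # Hypothetical helper: builds the gcloud CLI call that deploys a
    # container image as a managed Cloud Run service.
    return [
        "gcloud", "run", "deploy", service,
        "--image", image_uri,
        "--region", region,
        "--platform", "managed",
    ]
```

As with training jobs, keeping the invocation in a script makes deployments repeatable and easy to wire into CI/CD.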
11. Delivery pipeline automation (CI/CD)
• Configured for the models that we use on bigger, production scale
• Implemented in Gitlab CI/CD
Pipeline stages: push → download files → build image → run tests → run static analysis → push image to registry → code review → canary rollout → deploy
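In GitLab CI/CD, a pipeline like the one sketched above lives in a `.gitlab-ci.yml` file. A hypothetical fragment (job names, images, and scripts are illustrative only; `$CI_REGISTRY_IMAGE` and `$CI_COMMIT_SHA` are built-in GitLab CI variables):

```yaml
stages:
  - build
  - test
  - release
  - deploy

build-image:
  stage: build
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .

run-tests:
  stage: test
  script:
    - docker run $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA pytest

push-image:
  stage: release
  script:
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

canary-deploy:
  stage: deploy
  when: manual   # manual gate approximating the review / canary rollout step
  script:
    - gcloud run deploy model-service --image $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA --region europe-west1
```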
12. Monitoring
• System-level metrics:
– Resource consumption (RAM, CPU), healthchecks, status codes, latency, etc.
• Data-level metrics:
– Prediction distributions, input data distributions
– Model performance against real-time labels (collected automatically or manually)
https://mlinproduction.com/
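Comparing prediction distributions over time can be done with a simple drift check. A minimal pure-Python sketch (the distance threshold and label names are illustrative):

```python
from collections import Counter


def label_distribution(predictions):
    # Normalize raw prediction labels into a probability distribution.
    total = len(predictions)
    return {label: count / total for label, count in Counter(predictions).items()}


def total_variation(p, q):
    # Total variation distance between two label distributions:
    # 0.0 means identical, 1.0 means completely disjoint.
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(l, 0.0) - q.get(l, 0.0)) for l in labels)
```

A monitoring job might compute this between a reference window (e.g. last month) and the current window, and alert when the distance crosses a threshold.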
13. Streamlit
• https://www.streamlit.io/
• An easy tool for creating simple web data products directly in Python
• You can use it to create demos, share your work, showcase your model's behaviour, and debug
• Very intuitive; no web development skills required
https://towardsdatascience.com/coding-ml-tools-like-you-code-ml-models-ddba3357eace