Transformer-based pretrained language models such as BERT, XLNet, RoBERTa, and ALBERT have significantly advanced the state of the art in NLP and opened the door to solving practical business problems with high-performance transfer learning. However, operationalizing these models with production-quality continuous integration/delivery (CI/CD) end-to-end pipelines that cover the full machine learning life cycle of train, test, deploy, and serve, while managing the associated data and code repositories, is still a challenging task.
1. Continuous Delivery of Deep Transformer-Based NLP Models Using MLflow and AWS Sagemaker for Enterprise AI Scenarios
2. Continuous Delivery of Deep Transformer-based
NLP Models Using MLflow and AWS Sagemaker
for Enterprise AI Scenarios
Yong Liu
Principal Data Scientist
Outreach Corporation
Andrew Brooks
Senior Data Scientist
Outreach Corporation
3. Presentation Outline
➢ Introduction and Background
➢ Challenges in Enterprise AI Implementation
➢ Full Lifecycle ML Experience at Outreach
➢ Conclusion and Future Work
6. Sales Engagement Platform (SEP)
▪ SEP encodes and automates sales activities into workflows
▪ Enables reps to perform one-on-one personalized outreach at up to 10x scale
Example workflow: Day 1: Phone Call → Day 1: Email → Day 3: LinkedIn → Day 5: Phone → Day 5: Email
7. ML/NLP/AI Roles in Enterprise Sales Scenarios
▪ Continuous learning from data (emails, phone calls, engagement logs, etc.)
▪ Reasoning from knowledge to create a flywheel for the continual success of reps
10. Challenge 1: Dev-Prod Divide
▪ can’t test on “live” data
▪ can’t verify model invoked correctly
▪ can’t reproduce bugs or issues reported by users
▪ can’t reuse prod code for model development
Isolated prod environment
Source: Winderresearch
11. Challenge 2: Dev-Prod Differences
▪ training data != prod data
▪ production scoring requires logic not used during model training
When training & prod pipelines are, and need to be, different
Source: MoneyUser.com
12. Challenge 3: Arbitrary Uniqueness
when the “whole” is not greater than the “sum of its parts”
▪ deploying each model feels like a “special case”
▪ gates and deploy mechanisms are ad hoc
▪ pipeline maintenance is costly
Source: Rowperfect UK
13. Challenge 4: Provenance
▪ don’t know what exactly is running in prod
▪ inability to repro and debug model issues reported by customers
▪ model/pipeline changes = 😬
▪ undocumented model/code changes make metric drift hard to diagnose
▪ model experiments wasted
Provenance from models to source code and data.
Source: Slane Cartoons
15. A Use Case: Guided Engagement
powered by an intent classification model
▪ ML model predicts the intent of a prospect’s email reply and then recommends the right template to respond with
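The recommendation step described above can be sketched as a lookup from the predicted intent to a response template. The intent names and template identifiers below are illustrative assumptions, not Outreach's actual taxonomy:

```python
# Illustrative mapping from predicted intent to a response template
TEMPLATE_BY_INTENT = {
    "positive_reply": "schedule_meeting_template",
    "objection": "handle_objection_template",
    "unsubscribe": "opt_out_confirmation_template",
}

def recommend_template(predicted_intent, default="manual_review"):
    # Unknown intents fall back to manual review rather than guessing
    return TEMPLATE_BY_INTENT.get(predicted_intent, default)

print(recommend_template("positive_reply"))  # schedule_meeting_template
print(recommend_template("out_of_office"))   # manual_review
```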
18. Model Development and Offline Experimentation
Tracking experiments: an MLflow tracking server logs all offline experiments
19. Creating a transformer flavor model
A new MLflow model flavor (transformer) & TransformerClassifier (sklearn pipeline)
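The TransformerClassifier behind the custom flavor can be thought of as an sklearn-style estimator exposing fit/predict around a tokenizer and fine-tuned model. The following is a minimal pure-Python sketch of that interface; all names and internals are illustrative, not the actual Outreach implementation:

```python
class TransformerClassifier:
    """Sklearn-pipeline-compatible wrapper around a transformer model.

    `tokenizer` and `model` stand in for e.g. a Hugging Face tokenizer
    and a fine-tuned sequence-classification model; here they are plain
    callables so the sketch stays self-contained.
    """

    def __init__(self, tokenizer, model, labels):
        self.tokenizer = tokenizer
        self.model = model
        self.labels = labels  # class index -> intent label

    def fit(self, X, y=None):
        # Fine-tuning would happen here; sklearn pipelines expect
        # fit() to return self.
        return self

    def predict(self, X):
        encoded = [self.tokenizer(text) for text in X]
        scores = [self.model(tokens) for tokens in encoded]
        # argmax over class scores -> human-readable intent label
        return [self.labels[max(range(len(s)), key=s.__getitem__)]
                for s in scores]


# Toy stand-ins: the "model" scores class 1 higher when "yes" appears
clf = TransformerClassifier(
    tokenizer=str.split,
    model=lambda toks: [0.2, 0.8] if "yes" in toks else [0.9, 0.1],
    labels=["negative_reply", "positive_reply"],
)
print(clf.predict(["yes let us schedule a call", "not interested"]))
```

Because the wrapper follows the sklearn estimator interface, it can sit inside an sklearn pipeline and be logged and reloaded through the custom MLflow flavor.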
20. Saving and Loading Transformer Artifacts
An example of a fully saved and reloadable MLflow “Transformer”-flavor Model
Save:
mlflow_transformer.log_model(
    transformer_classifier=trained_model,
    artifact_path="transformer_classifier",
    conda_env=CONDA_ENV,
)
Load:
mlflow.pyfunc.load_model(model_uri)
21. Productionizing Code and Git Repos
▪ MLproject, conda.yaml
IDE dev environment, MLflow MLproject file, GitHub repo structure, flake8
22. Flexible Execution Mode
MLProject allows using code and execution environments either locally or remotely
(1) mlflow run ./ -e train
(2) mlflow run git+ssh://git@github.com/model-repo -e train --version 1.1
(3) mlflow run ./ -e train.py --backend databricks --backend-config gpu_cluster_type.json
(4) mlflow run git+ssh://git@github.com/model-repo -e train.py --version 1.1 --backend databricks --backend-config gpu_cluster_type.json
23. Models: trained, wrapped, private-wheeled
To support deployment-specific logic and environments, we create three progressively evolved models for final deployment in a host (Sagemaker):
1. Fine-tuned trained transformer classifier
2. Wrapped sklearn pipeline model, adding pre-score and post-score filters
3. Private-wheeled model: no need to access GitHub at deployment
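The wrapped model's deployment-specific logic can be sketched as a pre-score filter (screen out inputs the model should not score) plus a post-score filter (e.g. a confidence threshold with a safe fallback). The filter rules, threshold, and labels below are illustrative assumptions, not the actual production logic:

```python
class WrappedPipelineModel:
    """Wraps a trained classifier with deployment-only filter logic."""

    def __init__(self, classifier, confidence_threshold=0.7):
        self.classifier = classifier  # callable: text -> (label, confidence)
        self.confidence_threshold = confidence_threshold

    def _pre_score_filter(self, text):
        # Skip inputs the model was never trained to handle
        # (illustrative rule: empty strings and auto-generated replies)
        return bool(text.strip()) and "auto-reply" not in text.lower()

    def _post_score_filter(self, label, confidence):
        # Low-confidence predictions fall back to a safe default
        # so reps are not shown a wrong template
        return label if confidence >= self.confidence_threshold else "unsure"

    def predict(self, texts):
        results = []
        for text in texts:
            if not self._pre_score_filter(text):
                results.append("not_scored")
                continue
            label, confidence = self.classifier(text)
            results.append(self._post_score_filter(label, confidence))
        return results


# Toy classifier stand-in returning (label, confidence)
model = WrappedPipelineModel(
    classifier=lambda t: ("positive_reply", 0.9 if "yes" in t else 0.5)
)
print(model.predict(
    ["yes, send details", "maybe later", "Auto-Reply: out of office"]
))
```

Keeping this logic inside the wrapped model, rather than in the serving host, is what lets the same artifact behave identically in dev and prod.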
25. Continuous Delivery/Rollback through Concourse
Gated by two human gates: 1) start the full model deployment; 2) promote the model from staging to production; and one regression-test gate: accuracy must not be lower than the previous version
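The regression-test gate can be sketched as a simple comparison of the candidate model's held-out accuracy against the currently deployed version. The function name, the zero-tolerance default margin, and the example accuracies are illustrative:

```python
def regression_gate(candidate_accuracy, production_accuracy, tolerance=0.0):
    """Return True if the candidate model may be promoted.

    The gate fails the pipeline when the candidate scores lower than
    the model it would replace (minus an optional tolerance margin).
    """
    return candidate_accuracy >= production_accuracy - tolerance


# Candidate beats the deployed model -> pipeline proceeds to the human gate
print(regression_gate(candidate_accuracy=0.92, production_accuracy=0.90))
# Candidate regresses -> deployment is blocked
print(regression_gate(candidate_accuracy=0.88, production_accuracy=0.90))
```

Automating this check means a regressed model can never reach the human promotion gate by accident.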
29. Conclusions and Future Work
▪ We highlight four typical enterprise AI implementation challenges and how we solve them with MLflow, Sagemaker, and CI/CD tools
▪ Our intent classification model has been deployed in production and is in operation using this framework
▪ Next steps:
➢ Incorporate in-production model feedback into the annotation and model-development cycle
➢ Further improve the annotation pipeline for seamless human-in-the-loop active learning and model validation