Full Stack Deep Learning
Josh Tobin, Sergey Karayev, Pieter Abbeel
Testing & Deployment
Full Stack Deep Learning
[Diagram: the infrastructure & tooling landscape. Data (labeling, storage, database, versioning, processing), Development / Training & Evaluation (software engineering, frameworks, distributed training, resource management, experiment management, hyperparameter tuning), Deployment (CI/testing, web, interchange, hardware/mobile, monitoring), and "all-in-one" solutions. This module covers the Deployment column.]
Testing & Deployment - Overview
Full Stack Deep Learning
Module Overview
• Concepts

• Project structure

• ML Test Score: a rubric for production readiness

• Infrastructure/Tooling

• CI/Testing

• Docker in more depth

• Deploying to the web

• Monitoring

• Deploying to hardware/mobile
3
Full Stack Deep Learning
4
Training System
- Process raw data
- Run experiments
- Manage results

Prediction System
- Process input data
- Construct networks with trained weights
- Make predictions

Serving System
- Serve predictions
- Scale to demand

Training System Tests
- Test the full training pipeline, starting from raw data
- Runs in < 1 day (nightly)
- Catches upstream regressions

Validation Tests
- Test the prediction system on the validation set, starting from processed data
- Runs in < 1 hour (every code push)
- Catches model regressions

Functionality Tests
- Test the prediction system on a few important examples
- Runs in < 5 minutes (during development)
- Catches code regressions

Monitoring
- Alert to downtime, errors, and distribution shifts
- Catches service and data regressions

(The training and prediction systems operate on training/validation data; the serving system and monitoring operate on production data.)
Testing & Deployment - Project Structure
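To make the three test tiers concrete, here is a minimal, hypothetical pytest sketch of how they might be organized: functionality tests run constantly during development, validation tests run on every code push, and the full training test runs nightly. The stand-in `Predictor` class, the file paths, and the thresholds are illustrative assumptions, not part of the course codebase.

```python
import pytest

class Predictor:
    """Stand-in for the real prediction system; substitute your own."""
    def predict(self, image_path):
        return "hello world", 0.98
    def evaluate(self, processed_data_dir):
        return {"accuracy": 0.93}

def run_training_smoke_test(raw_data_dir):
    """Stand-in for the real training pipeline entry point."""
    return {"val_accuracy": 0.91}

@pytest.mark.functionality        # < 5 minutes: run constantly during development
def test_predict_important_example():
    prediction, confidence = Predictor().predict("tests/support/hello_world.png")
    assert prediction == "hello world"
    assert confidence > 0.5

@pytest.mark.validation           # < 1 hour: run on every code push
def test_validation_metrics_have_not_regressed():
    metrics = Predictor().evaluate("data/processed/val")
    assert metrics["accuracy"] >= 0.90     # threshold chosen for illustration

@pytest.mark.training             # < 1 day: run nightly, starting from raw data
def test_full_training_pipeline():
    metrics = run_training_smoke_test("data/raw")
    assert metrics["val_accuracy"] >= 0.85
```

CI could then select the right tier, e.g. `pytest -m functionality` on every push and `pytest -m training` in a nightly job (the custom markers would need to be registered in the pytest configuration to avoid warnings).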
Full Stack Deep Learning
5
The ML Test Score:
A Rubric for ML Production Readiness and Technical Debt Reduction
Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, D. Sculley
Google, Inc.
ebreck, cais, nielsene, msalib, dsculley@google.com
[Excerpt of the paper's abstract: creating reliable, production-level ML systems brings concerns not found in small toy examples or offline research experiments; testing and monitoring are key to production-readiness and to reducing technical debt, but specific tests are hard to formulate because a model's prediction behavior is difficult to specify a priori. The paper presents 28 specific tests and monitoring needs, focusing on what is specific to ML systems rather than on general software engineering best practices.]
IEEE Big Data 2017
An exhaustive framework/checklist from practitioners at Google
Testing & Deployment - ML Test Score
Full Stack Deep Learning
6
Follow-up to previous work from Google
Testing & Deployment - ML Test Score
Full Stack Deep Learning
7
[Figure from the paper, contrasting Traditional Software with Machine Learning Software: "ML Systems Require Extensive Testing and Monitoring. The key consideration is that unlike a manually coded system (left), ML system behavior is not easily specified in advance. This behavior depends on dynamic qualities of the data, and on various model configuration choices." The ML side shows Data feeding the Training System, Prediction System, and Serving System.]
Testing & Deployment - ML Test Score
Full Stack Deep Learning
Data Tests (Table I)
1. Feature expectations are captured in a schema.
2. All features are beneficial.
3. No feature's cost is too much.
4. Features adhere to meta-level requirements.
5. The data pipeline has appropriate privacy controls.
6. New features can be added quickly.
7. All input feature code is tested.

Model Tests (Table II)
1. Model specs are reviewed and submitted.
2. Offline and online metrics correlate.
3. All hyperparameters have been tuned.
4. The impact of model staleness is known.
5. A simpler model is not better.
6. Model quality is sufficient on important data slices.
7. The model is tested for considerations of inclusion.

ML Infrastructure Tests (Table III)
1. Training is reproducible.
2. Model specs are unit tested.
3. The ML pipeline is integration tested.
4. Model quality is validated before serving.
5. The model is debuggable.
6. Models are canaried before serving.
7. Serving models can be rolled back.

Monitoring Tests (Table IV)
1. Dependency changes result in notification.
2. Data invariants hold for inputs.
3. Training and serving are not skewed.
4. Models are not too stale.
5. Models are numerically stable.
6. Computing performance has not regressed.
7. Prediction quality has not regressed.
Testing & Deployment - ML Test Score
Full Stack Deep Learning
Scoring the Test
• 0 points: More of a research project than a productionized system.
• (0, 1]: Not totally untested, but it is worth considering the possibility of serious holes in reliability.
• (1, 2]: There's been a first pass at basic productionization, but additional investment may be needed.
• (2, 3]: Reasonably tested, but it's possible that more of those tests and procedures may be automated.
• (3, 5]: Strong levels of automated testing and monitoring, appropriate for mission-critical systems.
• > 5: Exceptional levels of automated testing and monitoring.
Table V: the ML Test Score is computed by taking the minimum score from each of the four test areas. Systems at different points in their development may reasonably aim for different points along this scale.
Testing & Deployment - ML Test Score
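As a small illustration of that scoring rule (my own sketch, not code from the paper): because the overall score is the minimum across the four areas, strength in one area cannot compensate for weakness in another.

```python
# Minimal sketch of the ML Test Score aggregation rule described above.
def ml_test_score(section_scores):
    """section_scores: points earned in each of the four test areas.
    The overall score is the minimum, so the weakest area determines the result."""
    assert set(section_scores) == {"data", "model", "infrastructure", "monitoring"}
    return min(section_scores.values())

# Example: strong data/model testing does not make up for weak monitoring.
print(ml_test_score({"data": 4.0, "model": 3.5, "infrastructure": 2.0, "monitoring": 0.5}))  # -> 0.5
```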
Full Stack Deep Learning
Results at Google
• 36 teams with ML systems in production at Google
[Figure 4 from the paper: average scores for the interviewed teams, showing the average score for each test across the 36 systems examined.]
Testing & Deployment - ML Test Score
Full Stack Deep Learning
Questions?
11
Full Stack Deep Learning
[Infrastructure & tooling landscape diagram again, now highlighting CI/Testing.]
Testing & Deployment - CI/Testing
Full Stack Deep Learning
Testing / CI
• Unit / Integration Tests
• Tests for individual module functionality and for the whole system

• Continuous Integration
• Tests are run every time new code is pushed to the repository, before the
updated model is deployed.

• SaaS for CI
• CircleCI, Travis, Jenkins, Buildkite

• Containerization (via Docker)
• A self-contained environment for running the tests
Testing & Deployment - CI/Testing
Full Stack Deep Learning
14
[Recap: the training / prediction / serving system diagram and the tests that cover each stage (see Project Structure above).]
Testing & Deployment - CI/Testing
Full Stack Deep Learning
15
[Recap: the ML Test Score checklists of data, model, ML infrastructure, and monitoring tests (see the tables above).]
Testing & Deployment - CI/Testing
Full Stack Deep Learning
CircleCI / Travis / etc
• SaaS for continuous integration

• Integrate with your repository: every push kicks off a cloud job

• Job can be defined as commands in a Docker container, and can store
results for later review

• No GPUs

• CircleCI has a free plan
16Testing & Deployment - CI/Testing
Full Stack Deep Learning
Jenkins / Buildkite
• Nice option for running CI on your own hardware, in the cloud, or mixed

• Good option for a scheduled training test (can use your GPUs)

• Very flexible
17
Full Stack Deep Learning
Docker
• Before proceeding, we have to understand Docker better.

• Docker vs VM

• Dockerfile and layer-based images

• DockerHub and the ecosystem

• Kubernetes and other orchestrators
18Testing & Deployment - Docker
Full Stack Deep Learning
19
https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b
No guest OS -> lightweight
Testing & Deployment - Docker
Full Stack Deep Learning
• Spin up a container for
every discrete task

• For example, a web app
might have four containers:

• Web server

• Database

• Job queue

• Worker
20
https://www.docker.com/what-container
Light weight -> Heavy use
Testing & Deployment - Docker
Full Stack Deep Learning
Dockerfile
Testing & Deployment - Docker
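The Dockerfile slide shows how an image is defined; as a hedged illustration of how that image might then be used programmatically, here is a sketch using the docker-py Python SDK to build the image and run a test suite inside a container. The image tag and the test command are assumptions, and a local Docker daemon is assumed to be running.

```python
import docker  # the docker-py SDK

client = docker.from_env()

# Build an image from the Dockerfile in the current directory (layers are cached).
image, build_logs = client.images.build(path=".", tag="fsdl-text-recognizer:latest")

# Run the (hypothetical) functionality tests inside a fresh, disposable container.
output = client.containers.run(image, "pytest -m functionality", remove=True)
print(output.decode())
```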
Full Stack Deep Learning
Strong Ecosystem
• Images are easy to
find, modify, and
contribute back to
DockerHub

• Private images
easy to store in
same place
22
https://docs.docker.com/engine/docker-overview
Testing & Deployment - Docker
Full Stack Deep Learning
23
https://www.docker.com/what-container#/package_software
Testing & Deployment - Docker
Full Stack Deep Learning
Container Orchestration
• Distributing containers onto underlying VMs
or bare metal

• Kubernetes is the open-source winner, but
cloud providers have good offerings, too.
24Testing & Deployment - Docker
Full Stack Deep Learning
Questions?
25
Full Stack Deep Learning
[Infrastructure & tooling landscape diagram again, now highlighting Web deployment.]
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Terminology
• Inference: making predictions (not training)

• Model Serving: ML-specific deployment, usually focused on optimizing
GPU usage
27Testing & Deployment - Web Deployment
Full Stack Deep Learning
Web Deployment
• REST API
• Serving predictions in response to canonically-formatted HTTP requests

• The web server is running and calling the prediction system

• Options
• Deploy code to VMs, scale by adding instances

• Deploy code as containers, scale via orchestration

• Deploy code as a “serverless function”

• Deploy via a model serving solution
Testing & Deployment - Web Deployment
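A minimal sketch of the REST API option, using a stand-in `Predictor` class in place of a real trained model; Flask is used here only as one common choice of web framework, and the route and payload fields are assumptions.

```python
from flask import Flask, jsonify, request

class Predictor:
    """Stand-in for the real prediction system; substitute your trained model."""
    def predict(self, image_url):
        return "hello world", 0.98

app = Flask(__name__)
predictor = Predictor()  # load weights once at startup, not once per request

@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()
    prediction, confidence = predictor.predict(payload["image_url"])
    return jsonify({"prediction": prediction, "confidence": confidence})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```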
Full Stack Deep Learning
29
[Recap: the training / prediction / serving system diagram and the tests that cover each stage (see Project Structure above).]
Testing & Deployment - Web Deployment
Full Stack Deep Learning
30
[Recap: the ML Test Score checklists of data, model, ML infrastructure, and monitoring tests (see the tables above).]
Testing & Deployment - Web Deployment
Full Stack Deep Learning
31
DEPLOY CODE TO YOUR OWN BARE METAL
Testing & Deployment - Web Deployment
Full Stack Deep Learning
32
DEPLOY CODE TO YOUR OWN BARE METAL
DEPLOY CODE TO CLOUD INSTANCES
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying code to cloud instances
• Each instance has to be provisioned with dependencies and app code

• Number of instances is managed manually or via auto-scaling

• Load balancer sends user traffic to the instances
33
Full Stack Deep Learning
Deploying code to cloud instances
• Cons:
• Provisioning can be brittle

• Paying for instances even when not using them (auto-scaling does help)
34Testing & Deployment - Web Deployment
Full Stack Deep Learning
35
DEPLOY DOCKER CONTAINERS TO

CLOUD INSTANCES
DEPLOY CODE TO YOUR OWN BARE METAL
DEPLOY CODE TO CLOUD INSTANCES
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying containers
• App code and dependencies are packaged into Docker containers

• Kubernetes or an alternative (e.g. AWS Fargate) orchestrates the containers (DB /
workers / web servers / etc.)
36Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying containers
• Cons:
• Still managing your own servers and paying for uptime, not compute-
time
37Testing & Deployment - Web Deployment
Full Stack Deep Learning
38
DEPLOY CODE TO YOUR OWN BARE METAL
DEPLOY CODE TO CLOUD INSTANCES
DEPLOY DOCKER CONTAINERS TO

CLOUD INSTANCES
DEPLOY SERVERLESS FUNCTIONS
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying code as serverless functions
• App code and dependencies are packaged into .zip files, with a single
entry point function

• AWS Lambda (or Google Cloud Functions, or Azure Functions) manages
everything else: instant scaling to 10,000+ requests per second, load
balancing, etc.

• Only pay for compute-time.
Testing & Deployment - Web Deployment
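A minimal sketch of the serverless option on AWS Lambda. The `(event, context)` handler signature is Lambda's convention; the stand-in `Predictor` and the request fields are illustrative assumptions.

```python
import json

class Predictor:
    """Stand-in for the real prediction system; substitute your trained model."""
    def predict(self, image_url):
        return "hello world", 0.98

# Loaded once per container ("cold start"), then reused across invocations.
predictor = Predictor()

def handler(event, context):
    body = json.loads(event["body"])
    prediction, confidence = predictor.predict(body["image_url"])
    return {
        "statusCode": 200,
        "body": json.dumps({"prediction": prediction, "confidence": confidence}),
    }
```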
Full Stack Deep Learning
40
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying code as serverless functions
• Cons:
• Entire deployment package has to fit within 500MB, <5 min execution,
<3GB memory (on AWS Lambda)

• Only CPU execution
41Testing & Deployment - Web Deployment
Full Stack Deep Learning
42
[Recap: the ML Test Score checklists (see the tables above); the slide that follows relates to the infrastructure tests on canarying and rollback.]
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Deploying code as serverless functions
• Canarying
• Easy to have two versions of Lambda functions in production, and start
sending low volume traffic to one.

• Rollback
• Easy to switch back to the previous version of the function.
43Testing & Deployment - Web Deployment
Full Stack Deep Learning
Model Serving
• Web deployment options specialized for machine learning models

• Tensorflow Serving (Google)

• Model Server for MXNet (Amazon)

• Clipper (Berkeley RISE Lab)

• SaaS solutions like Algorithmia

• The most important feature is batching requests for GPU inference
Testing & Deployment - Web Deployment
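To illustrate why request batching matters for GPU inference, here is a toy sketch (my own illustration, not how TF Serving or Clipper are actually implemented) of a worker that queues incoming requests briefly and runs them through the model as a single batch. The batch size and wait time are arbitrary.

```python
import queue
import threading
import time

import torch

request_queue = queue.Queue()   # callers put {"input": tensor, "result": queue.Queue()}
MAX_BATCH = 32
MAX_WAIT_S = 0.01               # trade a little latency for much higher GPU throughput

def batching_worker(model):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    while True:
        batch = [request_queue.get()]               # block until at least one request
        deadline = time.time() + MAX_WAIT_S
        while len(batch) < MAX_BATCH and time.time() < deadline:
            try:
                batch.append(request_queue.get(timeout=max(0.0, deadline - time.time())))
            except queue.Empty:
                break
        inputs = torch.stack([item["input"] for item in batch])
        with torch.no_grad():
            outputs = model(inputs.to(device)).cpu()   # one batched forward pass
        for item, output in zip(batch, outputs):
            item["result"].put(output)              # hand each prediction back to its caller

# Usage (assuming `model` is a loaded nn.Module):
# threading.Thread(target=batching_worker, args=(model,), daemon=True).start()
```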
Full Stack Deep Learning
Tensorflow Serving
• Able to serve Tensorflow predictions at high
throughput.

• Used by Google and their “all-in-one” machine
learning offerings

• Overkill unless you know you need to use GPU for
inference
45
[Paper excerpt: "Figure 2: TFS2 (hosted model serving) architecture" and section heading "2.2.1 Inter-Request Batching".]
Full Stack Deep Learning
MXNet Model Server
• Amazon’s answer to Tensorflow Serving

• Part of their SageMaker “all-in-one” machine learning offering.
46
Full Stack Deep Learning
Clipper
• Open-source model serving via
REST, using Docker.

• Nice-to-haves on top of the basics

• e.g. “state-of-the-art bandit and
ensemble methods to
intelligently select and combine
predictions”

• Probably too early to be actually
useful (unless you know why you
need it)
47
Full Stack Deep Learning
Algorithmia
• Some startups promise effortless model serving
48
Full Stack Deep Learning
Seldon
49
https://www.seldon.io/tech/products/core/
Testing & Deployment - Web Deployment
Full Stack Deep Learning
Takeaway
• If you are doing CPU inference, you can get away with scaling by launching
more servers, or going serverless.

• The dream is deploying Docker as easily as Lambda

• Next best thing is either Lambda or Docker, depending on your model’s
needs.

• If using GPU inference, things like TF Serving and Clipper become useful
through adaptive batching, etc.
50Testing & Deployment - Web Deployment
Full Stack Deep Learning
Questions?
51
Full Stack Deep Learning
[Infrastructure & tooling landscape diagram again, now highlighting Monitoring.]
Testing & Deployment - Monitoring
Full Stack Deep Learning
53
[Recap: the training / prediction / serving system diagram and the tests that cover each stage (see Project Structure above).]
Testing & Deployment - Monitoring
Full Stack Deep Learning
54
[Recap: the ML Test Score checklists (see the tables above), with the monitoring tests most relevant here.]
Testing & Deployment - Monitoring
Full Stack Deep Learning
Monitoring
• Alarms for when things go
wrong, and records for tuning
things.

• Cloud providers have decent
monitoring solutions.

• Anything that can be logged
can be monitored (e.g. data
skew)

• Will explore in lab
Testing & Deployment - Monitoring
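As a minimal sketch of "anything that can be logged can be monitored" (my own illustration; the logged fields, the threshold, and the use of a two-sample KS test are assumptions, not a prescribed method):

```python
import logging

import numpy as np
from scipy import stats

logger = logging.getLogger("prediction_service")

def log_prediction(input_array, prediction, confidence):
    """Log a per-request summary that a monitoring system can aggregate and alert on."""
    logger.info(
        "pred=%s conf=%.3f input_mean=%.3f input_std=%.3f",
        prediction, confidence, float(np.mean(input_array)), float(np.std(input_array)),
    )

def drift_alarm(recent_values, training_values, p_threshold=0.01):
    """Flag a possible distribution shift by comparing recent production inputs
    to the training data with a two-sample Kolmogorov-Smirnov test."""
    _, p_value = stats.ks_2samp(recent_values, training_values)
    return p_value < p_threshold  # True -> worth alerting on
```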
Full Stack Deep Learning
Data Distribution Monitoring
• Underserved need!
56
Domino Data Lab
Full Stack Deep Learning
Closing the Flywheel
• Important to monitor the business uses of the model, not just its own
statistics

• Important to be able to contribute failures back to the dataset
57Testing & Deployment - Monitoring
Full Stack Deep Learning
Closing the Flywheel
[Slides 58-61: a sequence of "Closing the Flywheel" diagrams.]
Testing & Deployment - Monitoring
Full Stack Deep Learning
Questions?
62
Full Stack Deep Learning
[Infrastructure & tooling landscape diagram again, now highlighting Hardware/Mobile.]
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Problems
• Embedded and mobile frameworks are less fully featured than full
PyTorch/Tensorflow

• Have to be careful with architecture

• Interchange format

• Embedded and mobile devices have little memory and slow/expensive
compute

• Have to reduce network size / quantize weights / distill knowledge
64Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Problems
[Recap of the Problems slide above; the next slides cover mobile/embedded frameworks.]
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Tensorflow Lite
• Not all models will work, but much better than "Tensorflow Mobile" used
to be
66Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
PyTorch Mobile
• Optional quantization

• TorchScript: static execution graph

• Libraries for Android and iOS
67
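A minimal sketch of those two pieces, dynamic quantization and a TorchScript export, using a tiny stand-in network purely for illustration (whether quantization helps, and which layers to quantize, depends on the real model):

```python
import torch
import torch.nn as nn

# Stand-in model; substitute your trained network.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

# Dynamic quantization of the linear layers (weights stored as int8).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# TorchScript: a static execution graph that the Android/iOS libraries can load.
scripted = torch.jit.script(quantized)
scripted.save("model_mobile.pt")
```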
Full Stack Deep Learning
68
https://heartbeat.fritz.ai/core-ml-vs-ml-kit-which-mobile-machine-learning-framework-is-right-for-you-e25c5d34c765
CoreML
- Released at Apple WWDC 2017
- Inference only
- https://coreml.store/
- Supposed to work with both CoreML and MLKit

ML Kit
- Announced at Google I/O 2018
- Either via API or on-device
- Offers pre-trained models, or can upload a Tensorflow Lite model
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Problems
[Recap of the Problems slide; next: the interchange format.]
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Interchange Format
• Open Neural Network Exchange (ONNX): open-source format for deep learning
models
• The dream is to mix different frameworks, such that frameworks that are
good for development (PyTorch) don’t also have to be good at inference
(Caffe2)
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Interchange Format
https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-onnx
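A minimal sketch of exporting a PyTorch model to ONNX so another runtime (e.g. ONNX Runtime) can load it; the stand-in network, input shape, and tensor names are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Stand-in model; substitute your trained network.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

dummy_input = torch.randn(1, 1, 28, 28)   # an example input with the right shape

torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["image"], output_names=["logits"])
```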
Full Stack Deep Learning
Problems
[Recap of the Problems slide; next: the memory/compute problem, addressed by reducing network size, quantizing weights, and distilling knowledge.]
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
NVIDIA for Embedded
• Todo
73Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Problems
[Recap of the Problems slide; next: being careful with architecture (MobileNets).]
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
MobileNets
[Figure: comparison of convolution blocks: Standard, 1x1, Inception, and MobileNet.]
https://medium.com/@yu4u/why-mobilenet-and-its-variants-e-g-shufflenet-are-fast-1c7048b9618d
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
76
https://medium.com/@yu4u/why-mobilenet-and-its-variants-e-g-shufflenet-are-fast-1c7048b9618d
Testing & Deployment - Hardware/Mobile
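A minimal sketch of the MobileNet building block referenced in the figures: a depthwise 3x3 convolution (groups equal to the channel count) followed by a pointwise 1x1 convolution, which is much cheaper than a standard 3x3 convolution over all channels. The channel sizes below are arbitrary.

```python
import torch
import torch.nn as nn

def depthwise_separable_conv(in_channels, out_channels, stride=1):
    """MobileNet-style block: depthwise 3x3 conv, then pointwise 1x1 conv."""
    return nn.Sequential(
        nn.Conv2d(in_channels, in_channels, kernel_size=3, stride=stride,
                  padding=1, groups=in_channels, bias=False),             # depthwise
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),  # pointwise
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )

block = depthwise_separable_conv(32, 64)
print(block(torch.randn(1, 32, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])
```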
Full Stack Deep Learning
Problems
[Recap of the Problems slide; next: knowledge distillation.]
Testing & Deployment - Hardware/Mobile
Full Stack Deep Learning
Knowledge distillation
• First, train a large "teacher" network on the data directly

• Next, train a smaller "student" network to match the "teacher" network's
prediction distribution
78
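A minimal sketch of the distillation objective: the student is trained to match the teacher's softened prediction distribution (temperature T), usually mixed with the ordinary cross-entropy on the true labels. The temperature and mixing weight below are illustrative choices, not prescribed values.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Soft-target loss against the teacher's softened distribution,
    mixed with the hard cross-entropy loss on the true labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                              # rescale so gradients are comparable across T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random logits/labels, just to show the shapes involved.
student = torch.randn(8, 10)
teacher = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student, teacher, labels))
```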
Full Stack Deep Learning
Recommended case study: DistilBERT
79
https://medium.com/huggingface/distilbert-8cf3380435b5
Full Stack Deep Learning
Questions?
80
Full Stack Deep Learning
Thank you!
81

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Último (20)

Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Full Stack Deep Learning Testing & Deployment

  • 1. Full Stack Deep Learning Josh Tobin, Sergey Karayev, Pieter Abbeel Testing & Deployment
  • 2. Full Stack Deep Learning “All-in-one” Deployment or or Development Training/Evaluation or Data or Experiment Management Labeling Interchange Storage Monitoring ? Hardware / Mobile Software Engineering Frameworks Database Hyperparameter TuningDistributed Training Web Versioning Processing CI / Testing ? Resource Management Testing & Deployment - Overview
  • 3. Full Stack Deep Learning Module Overview • Concepts • Project structure • ML Test Score: a rubric for production readiness • Infrastructure/Tooling • CI/Testing • Docker in more depth • Deploying to the web • Monitoring • Deploying to hardware/mobile 3
  • 4. Full Stack Deep Learning 4 Training System - Process raw data - Run experiments - Manage results Prediction System - Process input data - Construct networks with trained weights - Make predictions Serving System
 - Serve predictions - Scale to demand Training System Tests
 - Test full training pipeline - Start from raw data - Runs in < 1 day (nightly) - Catches upstream regressions Validation Tests
 - Test prediction system on validation set - Start from processed data - Runs in < 1 hour (every code push) - Catches model regressions Functionality Tests
 - Test prediction system on a few important examples - Runs in <5 minutes (during development) - Catches code regressions Monitoring
 - Alert to downtime, errors, distribution shifts - Catches service and data regressions Training/Validation Data Production Data Testing & Deployment - Project Structure
  • 5. Full Stack Deep Learning 5 "The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction", Eric Breck, Shanqing Cai, Eric Nielsen, Michael Salib, D. Sculley (Google, Inc.), IEEE Big Data 2017. An exhaustive framework/checklist from practitioners at Google. [Screenshot of the paper's first page; abstract truncated in the image.] Testing & Deployment - ML Test Score
  • 6. Full Stack Deep Learning 6 Follow-up to previous work
 from Google Testing & Deployment - ML Test Score
  • 7. Full Stack Deep Learning 7 [Figure from the paper: "ML Systems Require Extensive Testing and Monitoring." Unlike a manually coded system, an ML system's behavior is not easily specified in advance; it depends on dynamic qualities of the data and on model configuration. Panels contrast Traditional Software with Machine Learning Software: Training System, Prediction System, Serving System, Data.] Testing & Deployment - ML Test Score
  • 8. Full Stack Deep Learning 8
 Data Tests: 1. Feature expectations are captured in a schema. 2. All features are beneficial. 3. No feature's cost is too much. 4. Features adhere to meta-level requirements. 5. The data pipeline has appropriate privacy controls. 6. New features can be added quickly. 7. All input feature code is tested.
 Model Tests: 1. Model specs are reviewed and submitted. 2. Offline and online metrics correlate. 3. All hyperparameters have been tuned. 4. The impact of model staleness is known. 5. A simpler model is not better. 6. Model quality is sufficient on important data slices. 7. The model is tested for considerations of inclusion.
 ML Infrastructure Tests: 1. Training is reproducible. 2. Model specs are unit tested. 3. The ML pipeline is integration tested. 4. Model quality is validated before serving. 5. The model is debuggable. 6. Models are canaried before serving. 7. Serving models can be rolled back.
 Monitoring Tests: 1. Dependency changes result in notification. 2. Data invariants hold for inputs. 3. Training and serving are not skewed. 4. Models are not too stale. 5. Models are numerically stable. 6. Computing performance has not regressed. 7. Prediction quality has not regressed.
 Testing & Deployment - ML Test Score
  • 9. Full Stack Deep Learning Scoring the Test (Points: Description): 0: More of a research project than a productionized system. (0,1]: Not totally untested, but it is worth considering the possibility of serious holes in reliability. (1,2]: There's been a first pass at basic productionization, but additional investment may be needed. (2,3]: Reasonably tested, but it's possible that more of those tests and procedures may be automated. (3,5]: Strong levels of automated testing and monitoring, appropriate for mission-critical systems. > 5: Exceptional levels of automated testing and monitoring. (Table V: the final ML Test Score is computed by taking the minimum score from each of the four test areas; teams at different points in their development may reasonably aim for different scores.) Testing & Deployment - ML Test Score
  • 10. Full Stack Deep Learning Results at Google • 36 teams with ML systems in production at Google [Figure 4: average scores for interviewed teams; the graphs display the average score for each test across the 36 systems examined.] 10 Testing & Deployment - ML Test Score
  • 11. Full Stack Deep Learning Questions? 11
  • 12. Full Stack Deep Learning “All-in-one” Deployment or or Development Training/Evaluation or Data or Experiment Management Labeling Interchange Storage Monitoring ? Hardware / Mobile Software Engineering Frameworks Database Hyperparameter TuningDistributed Training Web Versioning Processing CI / Testing ? Resource Management Testing & Deployment - CI/Testing
  • 13. Full Stack Deep Learning Testing / CI • Unit / Integration Tests • Tests for individual module functionality and for the whole system • Continuous Integration • Tests are run every time new code is pushed to the repository, before updated model is deployed. • SaaS for CI • CircleCI, Travis, Jenkins, Buildkite • Containerization (via Docker) • A self-enclosed environment for running the tests 13Testing & Deployment - CI/Testing
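To make the functionality-test idea concrete, here is a minimal pytest sketch of the kind of check that would run on every push. The ParagraphTextRecognizer class, the support image, and the expected output are hypothetical placeholders for whatever your prediction system exposes.

```python
# test_functionality.py -- a minimal sketch of a fast "functionality test".
# The prediction class, support image, and expected strings are hypothetical.
from pathlib import Path

import pytest

from text_recognizer.paragraph_text_recognizer import ParagraphTextRecognizer  # hypothetical import

SUPPORT_DIR = Path(__file__).parent / "support"


@pytest.fixture(scope="module")
def model():
    # Load trained weights once per module so the suite stays fast (< 5 minutes).
    return ParagraphTextRecognizer()


def test_predict_on_known_example(model):
    text, confidence = model.predict(SUPPORT_DIR / "paragraph.png")
    assert "deep learning" in text.lower()   # catches gross code regressions
    assert confidence > 0.5                  # loose bound; this is not a model-quality test
```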
  • 14. Full Stack Deep Learning 14 Training System - Process raw data - Run experiments - Manage results Prediction System - Process input data - Construct networks with trained weights - Make predictions Serving System
 - Serve predictions - Scale to demand Training System Tests
 - Test full training pipeline - Start from raw data - Runs in < 1 day (nightly) - Catches upstream regressions Validation Tests
 - Test prediction system on validation set - Start from processed data - Runs in < 1 hour (every code push) - Catches model regressions Functionality Tests
 - Test prediction system on a few important examples - Runs in <5 minutes (during development) - Catches code regressions Monitoring
 - Alert to downtime, errors, distribution shifts - Catches service and data regressions Training/Validation Data Production Data Testing & Deployment - CI/Testing
  • 15. Full Stack Deep Learning 15 [Recap of the four ML Test Score checklists from slide 8: Data Tests, Model Tests, ML Infrastructure Tests, Monitoring Tests.] Testing & Deployment - CI/Testing
  • 16. Full Stack Deep Learning CircleCI / Travis / etc • SaaS for continuous integration • Integrate with your repository: every push kicks off a cloud job • Job can be defined as commands in a Docker container, and can store results for later review • No GPUs • CircleCI has a free plan 16Testing & Deployment - CI/Testing
  • 17. Full Stack Deep Learning Jenkins / Buildkite • Nice option for running CI on your own hardware, in the cloud, or mixed • Good option for a scheduled training test (can use your GPUs) • Very flexible 17
  • 18. Full Stack Deep Learning Docker • Before proceeding, we have to understand Docker better. • Docker vs VM • Dockerfile and layer-based images • DockerHub and the ecosystem • Kubernetes and other orchestrators 18Testing & Deployment - Docker
  • 19. Full Stack Deep Learning 19 https://medium.freecodecamp.org/a-beginner-friendly-introduction-to-containers-vms-and-docker-79a9e3e119b No OS -> Light weight Testing & Deployment - Docker
  • 20. Full Stack Deep Learning • Spin up a container for every discrete task • For example, a web app might have four containers: • Web server • Database • Job queue • Worker 20 https://www.docker.com/what-container Light weight -> Heavy use Testing & Deployment - Docker
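As a hedged illustration of the container-per-task idea, here is a minimal sketch using the Docker SDK for Python (the docker package); the image name and command are placeholders, not something from the lecture.

```python
# A minimal sketch of driving Docker from Python with the official SDK
# (pip install docker). Image and command are placeholder examples.
import docker

client = docker.from_env()  # talks to the local Docker daemon

# One container per discrete task, e.g. a short-lived worker or CI job.
container = client.containers.run(
    image="python:3.9-slim",
    command=["python", "-c", "print('task would run here')"],
    detach=True,  # return immediately; the container runs in the background
)
container.wait()                   # block until the task finishes
print(container.logs().decode())   # inspect the task's output
container.remove()
```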
  • 21. Full Stack Deep Learning Dockerfile 21Testing & Deployment - Docker
  • 22. Full Stack Deep Learning Strong Ecosystem • Images are easy to find, modify, and contribute back to DockerHub • Private images easy to store in same place 22 https://docs.docker.com/engine/docker-overview Testing & Deployment - Docker
  • 23. Full Stack Deep Learning 23 https://www.docker.com/what-container#/package_software Testing & Deployment - Docker
  • 24. Full Stack Deep Learning Container Orchestration • Distributing containers onto underlying VMs or bare metal • Kubernetes is the open-source winner, but cloud providers have good offerings, too. 24Testing & Deployment - Docker
  • 25. Full Stack Deep Learning Questions? 25
  • 26. Full Stack Deep Learning “All-in-one” Deployment or or Development Training/Evaluation or Data or Experiment Management Labeling Interchange Storage Monitoring ? Hardware / Mobile Software Engineering Frameworks Database Hyperparameter TuningDistributed Training Web Versioning Processing CI / Testing ? Resource Management Testing & Deployment - Web Deployment
  • 27. Full Stack Deep Learning Terminology • Inference: making predictions (not training) • Model Serving: ML-specific deployment, usually focused on optimizing GPU usage 27Testing & Deployment - Web Deployment
  • 28. Full Stack Deep Learning Web Deployment • REST API • Serving predictions in response to canonically-formatted HTTP requests • The web server is running and calling the prediction system • Options • Deploying code to VMs, scale by adding instances • Deploy code as containers, scale via orchestration • Deploy code as a “serverless function” • Deploy via a model serving solution 28Testing & Deployment - Web Deployment
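A minimal sketch of the REST-API option using Flask is shown below; the dummy load_model function and the request format are assumptions standing in for a real prediction system.

```python
# A minimal sketch: the web server calls the prediction system over HTTP.
from flask import Flask, jsonify, request

app = Flask(__name__)


def load_model():
    # Stand-in for "construct network with trained weights"; returns a dummy
    # callable so the sketch runs end to end without a real model.
    return lambda x: {"label": "cat", "score": 0.9}


model = load_model()  # load once at startup, not per request


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json()   # canonically-formatted HTTP request body
    return jsonify(model(payload["input"]))


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```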
  • 29. Full Stack Deep Learning 29 Training System - Process raw data - Run experiments - Manage results Prediction System - Process input data - Construct networks with trained weights - Make predictions Serving System
 - Serve predictions - Scale to demand Training System Tests
 - Test full training pipeline - Start from raw data - Runs in < 1 day (nightly) - Catches upstream regressions Validation Tests
 - Test prediction system on validation set - Start from processed data - Runs in < 1 hour (every code push) - Catches model regressions Functionality Tests
 - Test prediction system on a few important examples - Runs in <5 minutes (during development) - Catches code regressions Monitoring
 - Alert to downtime, errors, distribution shifts - Catches service and data regressions Training/Validation Data Production Data Testing & Deployment - Web Deployment
  • 30. Full Stack Deep Learning 30 [Recap of the four ML Test Score checklists from slide 8: Data Tests, Model Tests, ML Infrastructure Tests, Monitoring Tests.] Testing & Deployment - Web Deployment
  • 31. Full Stack Deep Learning 31 DEPLOY CODE TO YOUR OWN BARE METAL Testing & Deployment - Web Deployment
  • 32. Full Stack Deep Learning 32 DEPLOY CODE TO YOUR OWN BARE METAL DEPLOY CODE TO CLOUD INSTANCES Testing & Deployment - Web Deployment
  • 33. Full Stack Deep Learning Deploying code to cloud instances • Each instance has to be provisioned with dependencies and app code • Number of instances is managed manually or via auto-scaling • Load balancer sends user traffic to the instances 33
  • 34. Full Stack Deep Learning Deploying code to cloud instances • Cons: • Provisioning can be brittle • Paying for instances even when not using them (auto-scaling does help) 34Testing & Deployment - Web Deployment
  • 35. Full Stack Deep Learning 35 DEPLOY DOCKER CONTAINERS TO
 CLOUD INSTANCES DEPLOY CODE TO YOUR OWN BARE METAL DEPLOY CODE TO CLOUD INSTANCES Testing & Deployment - Web Deployment
  • 36. Full Stack Deep Learning Deploying containers • App code and dependencies are packaged into Docker containers • Kubernetes or an alternative (e.g. AWS Fargate) orchestrates the containers (DB / workers / web servers / etc.) 36Testing & Deployment - Web Deployment
  • 37. Full Stack Deep Learning Deploying containers • Cons: • Still managing your own servers and paying for uptime, not compute-time 37Testing & Deployment - Web Deployment
  • 38. Full Stack Deep Learning 38 DEPLOY CODE TO YOUR OWN BARE METAL DEPLOY CODE TO CLOUD INSTANCES DEPLOY DOCKER CONTAINERS TO
 CLOUD INSTANCES DEPLOY SERVERLESS FUNCTIONS Testing & Deployment - Web Deployment
  • 39. Full Stack Deep Learning Deploying code as serverless functions • App code and dependencies are packaged into .zip files, with a single entry point function • AWS Lambda (or Google Cloud Functions, or Azure Functions) manages everything else: instant scaling to 10,000+ requests per second, load balancing, etc. • Only pay for compute-time. 39Testing & Deployment - Web Deployment
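A minimal sketch of a prediction packaged as an AWS Lambda handler, assuming invocation through something like API Gateway; the stand-in model and input format are placeholders.

```python
# handler.py -- a minimal sketch of a prediction served as an AWS Lambda function.
import json

# Loaded at import time so warm invocations reuse the model across requests.
MODEL = lambda x: {"label": "cat", "score": 0.9}  # stand-in for a real model with trained weights


def handler(event, context):
    body = json.loads(event["body"])       # assumes invocation via API Gateway
    prediction = MODEL(body["input"])
    return {"statusCode": 200, "body": json.dumps(prediction)}
```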
  • 40. Full Stack Deep Learning 40 Testing & Deployment - Web Deployment
  • 41. Full Stack Deep Learning Deploying code as serverless functions • Cons: • Entire deployment package has to fit within 500MB, <5 min execution, <3GB memory (on AWS Lambda) • Only CPU execution 41Testing & Deployment - Web Deployment
  • 42. Full Stack Deep Learning 42 [Recap of the four ML Test Score checklists from slide 8: Data Tests, Model Tests, ML Infrastructure Tests, Monitoring Tests.] Testing & Deployment - Web Deployment
  • 43. Full Stack Deep Learning Deploying code as serverless functions • Canarying • Easy to have two versions of Lambda functions in production, and start sending low volume traffic to one. • Rollback • Easy to switch back to the previous version of the function. 43Testing & Deployment - Web Deployment
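A hedged sketch of what canarying and rollback can look like with Lambda weighted aliases via boto3; the function name, alias, version numbers, and traffic split are placeholders.

```python
# Canary a new Lambda version with a weighted alias, then roll back.
import boto3

lam = boto3.client("lambda")

# Send ~10% of traffic on the "live" alias to version 5, keep 90% on version 4.
lam.update_alias(
    FunctionName="predict-fn",
    Name="live",
    FunctionVersion="4",
    RoutingConfig={"AdditionalVersionWeights": {"5": 0.10}},
)

# Rollback: clear the canary weights and point the alias back at version 4 only.
lam.update_alias(
    FunctionName="predict-fn",
    Name="live",
    FunctionVersion="4",
    RoutingConfig={},
)
```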
  • 44. Full Stack Deep Learning Model Serving • Web deployment options specialized for machine learning models • Tensorflow Serving (Google) • Model Server for MXNet (Amazon) • Clipper (Berkeley RISE Lab) • SaaS solutions like Algorithmia • The most important feature is batching requests for GPU inference 44Testing & Deployment - Web Deployment
  • 45. Full Stack Deep Learning Tensorflow Serving • Able to serve Tensorflow predictions at high throughput. • Used by Google and their “all-in-one” machine learning offerings • Overkill unless you know you need to use GPU for inference 45 [Figure 2: TFS2 (hosted model serving) architecture; the excerpt shown discusses inter-request batching.]
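For reference, a minimal sketch of calling a TensorFlow Serving REST endpoint; the host, model name, and input shape are placeholders for your own deployment.

```python
# Query a TensorFlow Serving REST endpoint (default REST port 8501).
import requests

URL = "http://localhost:8501/v1/models/my_model:predict"  # placeholder host and model name

payload = {"instances": [[0.1, 0.2, 0.3, 0.4]]}  # one request; the server batches across requests
response = requests.post(URL, json=payload, timeout=5)
response.raise_for_status()
print(response.json()["predictions"])
```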
  • 46. Full Stack Deep Learning MXNet Model Server • Amazon’s answer to Tensorflow Serving • Part of their SageMaker “all-in-one” machine learning offering. 46
  • 47. Full Stack Deep Learning Clipper • Open-source model serving via REST, using Docker. • Nice-to-haves on top of the basics • e.g “state-of-the-art bandit and ensemble methods to intelligently select and combine predictions” • Probably too early to be actually useful (unless you know why you need it) 47
  • 48. Full Stack Deep Learning Algorithmia • Some startups promise effortless model serving 48
  • 49. Full Stack Deep Learning Seldon 49 https://www.seldon.io/tech/products/core/ Testing & Deployment - Web Deployment
  • 50. Full Stack Deep Learning Takeaway • If you are doing CPU inference, you can get away with scaling by launching more servers, or going serverless. • The dream is deploying Docker as easily as Lambda • Next best thing is either Lambda or Docker, depending on your model’s needs. • If using GPU inference, tools like TF Serving and Clipper become useful thanks to adaptive batching, etc. 50Testing & Deployment - Web Deployment
  • 51. Full Stack Deep Learning Questions? 51
  • 52. Full Stack Deep Learning “All-in-one” Deployment or or Development Training/Evaluation or Data or Experiment Management Labeling Interchange Storage Monitoring ? Hardware / Mobile Software Engineering Frameworks Database Hyperparameter TuningDistributed Training Web Versioning Processing CI / Testing ? Resource Management Testing & Deployment - Monitoring
  • 53. Full Stack Deep Learning 53 Training System - Process raw data - Run experiments - Manage results Prediction System - Process input data - Construct networks with trained weights - Make predictions Serving System
 - Serve predictions - Scale to demand Training System Tests
 - Test full training pipeline - Start from raw data - Runs in < 1 day (nightly) - Catches upstream regressions Validation Tests
 - Test prediction system on validation set - Start from processed data - Runs in < 1 hour (every code push) - Catches model regressions Functionality Tests
 - Test prediction system on a few important examples - Runs in <5 minutes (during development) - Catches code regressions Monitoring
 - Alert to downtime, errors, distribution shifts - Catches service and data regressions Training/Validation Data Production Data Testing & Deployment - Monitoring
  • 54. Full Stack Deep Learning 54 [Recap of the four ML Test Score checklists from slide 8: Data Tests, Model Tests, ML Infrastructure Tests, Monitoring Tests.] Testing & Deployment - Monitoring
  • 55. Full Stack Deep Learning Monitoring • Alarms for when things go wrong, and records for tuning things. • Cloud providers have decent monitoring solutions. • Anything that can be logged can be monitored (i.e. data skew) • Will explore in lab 55Testing & Deployment - Monitoring
  • 56. Full Stack Deep Learning Data Distribution Monitoring • Underserved need! 56 Domino Data Lab
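A minimal sketch of one way to monitor data distribution shift: log a feature's production values and compare them to the training distribution with a two-sample KS test. The synthetic data and alert threshold are illustrative assumptions.

```python
# Compare a logged production feature against its training distribution.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_values = rng.normal(loc=0.0, scale=1.0, size=10_000)  # stand-in for training data
prod_values = rng.normal(loc=0.3, scale=1.0, size=2_000)    # stand-in for logged production inputs

statistic, p_value = ks_2samp(train_values, prod_values)
if p_value < 0.01:
    # In a real system this would fire an alert, not just print.
    print(f"Possible distribution shift: KS={statistic:.3f}, p={p_value:.2g}")
```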
  • 57. Full Stack Deep Learning Closing the Flywheel • Important to monitor the business uses of the model, not just its own statistics • Important to be able to contribute failures back to dataset 57Testing & Deployment - Monitoring
  • 58. Full Stack Deep Learning Closing the Flywheel 58Testing & Deployment - Monitoring
  • 59. Full Stack Deep Learning 59Testing & Deployment - Monitoring Closing the Flywheel
  • 60. Full Stack Deep Learning 60Testing & Deployment - Monitoring Closing the Flywheel
  • 61. Full Stack Deep Learning 61Testing & Deployment - Monitoring Closing the Flywheel
  • 62. Full Stack Deep Learning Questions? 62
  • 63. Full Stack Deep Learning “All-in-one” Deployment or or Development Training/Evaluation or Data or Experiment Management Labeling Interchange Storage Monitoring ? Hardware / Mobile Software Engineering Frameworks Database Hyperparameter TuningDistributed Training Web Versioning Processing CI / Testing ? Resource Management Testing & Deployment - Hardware/Mobile
  • 64. Full Stack Deep Learning Problems • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge 64Testing & Deployment - Hardware/Mobile
  • 65. Full Stack Deep Learning Problems • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge 65Testing & Deployment - Hardware/Mobile
  • 66. Full Stack Deep Learning Tensorflow Lite • Not all models will work, but much better than "Tensorflow Mobile" used to be 66Testing & Deployment - Hardware/Mobile
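A minimal sketch of the TensorFlow Lite conversion path, with post-training quantization enabled; the toy Keras model is only a placeholder.

```python
# Convert a trained Keras model to TensorFlow Lite with weight quantization.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables post-training quantization
tflite_bytes = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
```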
  • 67. Full Stack Deep Learning PyTorch Mobile • Optional quantization • TorchScript: static execution graph • Libraries for Android and iOS 67
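A minimal sketch of the PyTorch Mobile flow described above (optional dynamic quantization plus TorchScript export); the toy model is a placeholder.

```python
# Dynamic quantization + TorchScript export for on-device inference.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# Optional quantization: shrink Linear weights to int8 for mobile inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# TorchScript: a static execution graph the mobile runtime can load.
scripted = torch.jit.script(quantized)
scripted.save("model_mobile.pt")
```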
  • 68. Full Stack Deep Learning 68 https://heartbeat.fritz.ai/core-ml-vs-ml-kit-which-mobile-machine-learning-framework-is-right-for-you-e25c5d34c765 CoreML: released at Apple WWDC 2017; inference only; https://coreml.store/ . Supposed to work with both CoreML and MLKit. MLKit: announced at Google I/O 2018; either via API or on-device; offers pre-trained models, or can upload a Tensorflow Lite model. Testing & Deployment - Hardware/Mobile
  • 69. Full Stack Deep Learning Problems 69Testing & Deployment - Hardware/Mobile • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge
  • 70. Full Stack Deep Learning • Open Neural Network Exchange: open-source format for deep learning models • The dream is to mix different frameworks, such that frameworks that are good for development (PyTorch) don’t also have to be good at inference (Caffe2) 70Testing & Deployment - Hardware/Mobile Interchange Format
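A minimal sketch of the interchange idea: export a PyTorch model to ONNX and run it with ONNX Runtime; the model, shapes, and file names are placeholders.

```python
# Export a PyTorch model to ONNX, then run it with ONNX Runtime.
import numpy as np
import onnxruntime as ort
import torch
import torch.nn as nn

model = nn.Linear(4, 2).eval()
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

session = ort.InferenceSession("model.onnx")
outputs = session.run(None, {"input": np.random.randn(1, 4).astype(np.float32)})
print(outputs[0])
```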
  • 71. Full Stack Deep Learning 71Testing & Deployment - Hardware/Mobile Interchange Format https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-onnx
  • 72. Full Stack Deep Learning Problems • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge 72Testing & Deployment - Hardware/Mobile
  • 73. Full Stack Deep Learning NVIDIA for Embedded • Todo 73Testing & Deployment - Hardware/Mobile
  • 74. Full Stack Deep Learning Problems 74Testing & Deployment - Hardware/Mobile • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge
  • 75. Full Stack Deep Learning 75 MobileNets [Figure comparing a standard convolution block with the 1x1 / depthwise-separable blocks used in Inception and MobileNet] https://medium.com/@yu4u/why-mobilenet-and-its-variants-e-g-shufflenet-are-fast-1c7048b9618d Testing & Deployment - Hardware/Mobile
  • 76. Full Stack Deep Learning 76 https://medium.com/@yu4u/why-mobilenet-and-its-variants-e-g-shufflenet-are-fast-1c7048b9618d Testing & Deployment - Hardware/Mobile
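The efficiency trick behind MobileNet-style blocks is replacing a full convolution with a depthwise convolution followed by a 1x1 pointwise convolution; a minimal PyTorch sketch follows (channel counts are arbitrary examples).

```python
# Depthwise-separable block vs. a standard convolution, same output shape.
import torch
import torch.nn as nn

in_ch, out_ch = 32, 64

depthwise_separable = nn.Sequential(
    nn.Conv2d(in_ch, in_ch, kernel_size=3, padding=1, groups=in_ch),  # depthwise: one filter per channel
    nn.Conv2d(in_ch, out_ch, kernel_size=1),                          # pointwise: 1x1 conv mixes channels
)

standard = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)  # for comparison

x = torch.randn(1, in_ch, 56, 56)
print(depthwise_separable(x).shape, standard(x).shape)  # same shape, far fewer multiply-adds
```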
  • 77. Full Stack Deep Learning Problems 77Testing & Deployment - Hardware/Mobile • Embedded and mobile frameworks are less fully featured than full PyTorch/Tensorflow • Have to be careful with architecture • Interchange format • Embedded and mobile devices have little memory and slow/expensive compute • Have to reduce network size / quantize weights / distill knowledge
  • 78. Full Stack Deep Learning Knowledge distillation • First, train a large "teacher" network on the data directly • Next, train a smaller "student" network to match the "teacher" network's prediction distribution 78
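A minimal sketch of the distillation objective described above: the student matches the teacher's temperature-softened output distribution in addition to the hard labels. The temperature and loss weighting are common defaults, not values from the lecture.

```python
# Knowledge distillation loss: soft (teacher-matching) term + hard-label term.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard T^2 scaling so gradients stay comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard


# Toy usage: batch of 8 examples, 10 classes.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(distillation_loss(student_logits, teacher_logits, labels))
```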
  • 79. Full Stack Deep Learning Recommended case study: DistilBERT 79 https://medium.com/huggingface/distilbert-8cf3380435b5
  • 80. Full Stack Deep Learning Questions? 80
  • 81. Full Stack Deep Learning Thank you! 81