Here in the DS team at Wix we want to help people create stunning sites by applying recent achievements of AI research in production. Since Data Science engineering practices are still not fully formed, we found it crucial to bring in the best practices from software engineering: give Data Scientists the ability to deliver models fast, without losing quality or computational efficiency, so we stay competitive in this overhyped market. To achieve this we are developing our own infrastructure for creating pipelines and deploying them to production with minimal (to no) engineering involvement.
This talk covers the initial motivation, the technical issues we solved, and the lessons learned while building such an ML delivery system.
Website: https://fwdays.com/en/event/data-science-fwdays-2019/review/continuous-delivery-of-ml-pipelines-to-production
2. In the late 1990s
software development was:
● An invaluable capability for improving business outcomes
● Transforming from solo practitioners/small teams to large collaborative teams
● Undergoing a rapid evolution of best practices and tooling
● Experiencing heavy demand for competent practitioners
Source: https://blog.dominodatalab.com/joel-test-data-science/
3. Joel Test
to rate the quality of a software team (August 2000)
1. Do you use source control?
2. Can you make a build in one step?
3. Do you make daily builds?
4. Do you have a bug database?
5. Do you fix bugs before writing new code?
6. Do you have an up-to-date schedule?
7. Do you have a spec?
8. Do programmers have quiet working conditions?
9. Do you use the best tools money can buy?
10. Do you have testers?
11. Do new candidates write code during their interview?
12. Do you do hallway usability testing?
Source: https://www.joelonsoftware.com/2000/08/09/the-joel-test-12-steps-to-better-code/
4. Joel Test for Data Science
1. Are results reproducible?
2. Do you create a data pipeline that you can rebuild with one command?
3. Do you rebuild pipelines frequently?
4. Can Data Scientists deploy their models with minimal dependencies on
engineering and infrastructure?
5. Do you use source control?
6. Do you track bugs in your models and your pipeline code?
7. Do you translate model performance to commercial KPIs?
Source: http://guerrilla-analytics.net/joel-test-of-data-science-maturity/
5. Joel Test for Data Science (edited version)
1. Are results reproducible?
2. Do you create a data pipeline that you can rebuild with one command?
3. Do you rebuild pipelines frequently?
4. Can Data Scientists deploy their models with minimal dependencies on engineering and infrastructure?
5. Do you use version control?
6. Do you write auto-tests for your models and your pipeline code?
7. Do you translate model efficiency to commercial KPIs?
6. Code Reuse
What we usually do first when setting up a new project:
1. Create an environment
2. Get data
3. Copy/paste data processing code from the previous project
As a result, each Data Scientist has their own unique,
carefully crafted environment, which is also not aligned
with production.
(not exactly)
8. Code Reuse
preprocessing_pipeline = ImagePipeline(
    pipe=[
        ops.validation.SizeLimit(max_pixels=9000000),
        ops.image.Resize(target=(500, 500)),
        ops.scale.Scale0to1(),
    ]
)

inputs = preprocessing_pipeline.tf_generator(input_path="raw/*.jpg")
labels = preprocessing_pipeline.tf_generator(input_path="labels/*.jpg")

dataset = tf.data.Dataset.zip((
    tf.data.Dataset.from_generator(inputs),
    tf.data.Dataset.from_generator(labels),
))
● Recreate your prod environment in research
● Choose efficient dependencies
● Make data processing serializable
● Universal solution for PyTorch and TensorFlow
9. Meet the BA guild
Question 2: Do you create a data pipeline you can rebuild with one command?
10. A pipeline consists of
● Pre-processing (Python) code
(usually shared between models / maintained by engineers)
● Estimator - e.g. a TensorFlow model
(usually stored as a binary file with weights and graph)
● Post-processing (Python) code
(usually custom code that is frequently updated by the Researcher)
11. Code Publishing
1. Organize your project as a Python package
2. You now have a unified API to install dependencies / run tests / etc.
3. Your code can be installed anywhere from PyPI / GitHub / a local Python Package Index (PyPI)
12. Declare a serializable Pipeline
● Good example: Spark ML Pipeline
● Bad example: scikit-learn serialization
(avoid pickling at any cost)
● Better: a custom solution,
Python -> YAML -> Python

tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.001)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
from ds_auto_enhance.ops import increase_brightness

pipeline = Pipeline(
    name='auto-enhance',
    pipe=[
        preprocessing_pipeline,
        TensorflowEstimator.from_storage(model_path),
        increase_brightness(),
        ScaleToUInt(),
    ]
)
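The Python -> YAML -> Python idea can be sketched roughly as follows. All names here are illustrative rather than the talk's real API, and JSON stands in for YAML only to keep the sketch standard-library-only; the mechanism is identical:

```python
# Illustrative sketch of a serializable pipeline step (hypothetical names).
# A registry maps step names to classes so a text spec can be turned back
# into live Python objects without pickling.
import json

REGISTRY = {}

def register(cls):
    """Register a step class under its name so it can be rebuilt from a spec."""
    REGISTRY[cls.__name__] = cls
    return cls

@register
class Scale0to1:
    def __init__(self, factor=255.0):
        self.factor = factor

    def __call__(self, x):
        return x / self.factor

    def to_spec(self):
        return {"func": self.__class__.__name__, "params": {"factor": self.factor}}

def dump_pipeline(steps):
    # Python -> spec text (YAML in the talk; JSON here to stay stdlib-only)
    return json.dumps({"steps": [s.to_spec() for s in steps]})

def load_pipeline(text):
    # spec text -> Python, by looking each step up in the registry
    return [REGISTRY[s["func"]](**s["params"]) for s in json.loads(text)["steps"]]
```

Because only plain parameters are stored, the spec survives library upgrades far better than a pickled object graph.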
14. Meet the BA guild
Questions 3/4: Do you rebuild pipelines frequently?
Can Data Scientists deploy their models by themselves?
15. Model Registry
pipeline = Pipeline(
    name='auto-enhance',
    pipe=[
        preprocessing_pipeline,
        TensorflowEstimator.from_storage(model_path),
        increase_brightness(),
        ScaleToUInt(),
    ]
)

pipeline.predict(test_image)
pipeline.deploy()

(diagram: pipeline.deploy() -> Model Registry -> Evaluator)
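A minimal sketch of the registry that deploy() could publish to, assuming each deploy bumps the version automatically; the class and method names are hypothetical, not the team's actual implementation:

```python
# Hypothetical in-memory model registry: each deploy of a named pipeline
# stores its serialized spec under an auto-incremented version.
class ModelRegistry:
    def __init__(self):
        self._store = {}  # name -> list of (version, spec)

    def register(self, name, spec):
        """Store a new version of the pipeline spec; returns the version."""
        versions = self._store.setdefault(name, [])
        version = len(versions) + 1  # each deploy bumps the version
        versions.append((version, spec))
        return version

    def latest(self, name):
        """Return (version, spec) of the most recently deployed pipeline."""
        return self._store[name][-1]
```

In production this would be backed by shared storage, and an Evaluator could pull each newly registered version for validation before serving.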
16. Meet the BA guild
Question 5: Do you use version control?
17. Each deploy bumps version automatically
model:
name: object-segmentation
version: 1.0
steps:
- ...
- func: ds_object_segmentation.ops.PostProcess
lib_name: ds_object_segmentation
lib_version: 2.1.1
● Model versioning enables A/B tests
● The pipeline version must be tied to the code version
● Manage different library versions with pkg_resources
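One way to pin the installed library version next to each step, using pkg_resources as the slide suggests; the helper function itself is illustrative:

```python
# Sketch: record the version of the library that provides a step, so the
# pipeline spec pins func + lib_name + lib_version as in the YAML above.
import pkg_resources

def step_spec(func, lib_name):
    """Build a spec entry for one pipeline step from a callable."""
    return {
        "func": f"{func.__module__}.{func.__qualname__}",
        "lib_name": lib_name,
        "lib_version": pkg_resources.get_distribution(lib_name).version,
    }
```

(`importlib.metadata.version` is the modern stdlib equivalent of `pkg_resources.get_distribution(...).version`.)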
18. Meet the BA guild
Question 6: Do you write auto-tests for your models and your pipeline code?
19. Testing approaches
● Smoke testing based on the pipeline type
(ImageToImage, ImageToBoundingBox, etc.)
We collect a test set for each type
● Unit tests for custom operations written by Researchers
(sounds like utopia)
● “Behavior-Driven Development” style - divide the job between
Researchers and Engineers
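The per-type smoke test idea can be sketched like this; the check below is an illustrative invariant test for ImageToImage pipelines, not the team's actual harness:

```python
# Smoke check for any ImageToImage pipeline: run the shared test set through
# it and verify only basic output invariants (same shape, values in [0, 1]),
# so one generic test covers every model of this type.
import numpy as np

def smoke_check_image_to_image(pipeline, images):
    for img in images:
        out = pipeline(img)
        assert out.shape == img.shape, "output must keep the input shape"
        assert out.min() >= 0.0 and out.max() <= 1.0, "output must be in [0, 1]"
    return True
```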
20. BDD for e2e tests
example 1

Feature: Face Detector
  A service that returns coordinates of
  bounding boxes around faces on a given image

  Scenario: Simple Check
    Given loaded pipeline face detector-1.0
    And image stored at test_data/1.jpg
    When I call pipeline
    Then I should receive response with 1 bounding-box

from pytest_bdd import given, when, then, parsers
from PIL import Image

@given(parsers.parse("image stored at {image_path}"))
def input_image(image_path):
    return Image.open('tests/' + image_path)

@given(parsers.parse("loaded pipeline {name_and_version}"))
def pipeline(name_and_version):
    name, version = name_and_version.rsplit('-', 1)
    return find_pipeline(name, version)

@when(parsers.parse('I call {pipeline_name}'))
def call(input_image, pipeline):
    pipeline(input_image)

@then(parsers.re(r'I should receive response with (?P<expected>\d+) (?P<object_type>\S+)'),
      converters={'expected': int})
def verify(pipeline, expected, object_type):
    assert len(pipeline.result['faces']) == expected
21. BDD for e2e tests
example 2

Feature: Semantic Search
  A service that finds the closest
  (semantically) image to a given paragraph

  Scenario: Search
    Given loaded pipeline semantic-search-1.0
    When image 1.jpg with labels train,mountain
    And image 2.jpg with labels car,highway
    And I search for railway station
    Then I should receive list that starts with 1.jpg

@pytest.fixture
def database():
    return {}

@when(parsers.cfparse("image {key} with labels {labels:Label+}",
                      {'Label': str}))
def input_image(database, key, labels):
    database[key] = labels

@given(parsers.parse("loaded pipeline {name_and_version}"))
def pipeline(name_and_version):
    name, version = name_and_version.rsplit('-', 1)
    return find_pipeline(name, version)

@when(parsers.re(r'I search for (?P<search>.+)'))
def call(database, pipeline, search):
    pipeline(search, context=database)

@then(parsers.parse('I should receive list that starts with {expected}'))
def verify(pipeline, expected):
    assert pipeline.result[0] == expected
22. Behavior-Driven Development
for Data Science
● Good for end-to-end testing
● Separation of concerns between Researcher and Engineer
● Code can be easily reused across various models
● Easy to write scenarios & read reports
25. Get maximum from your hardware
(obvious steps)
● Compile your kernels on the target hardware (get the benefits of SIMD, FMA)
● Fixed shapes enable graph compilation (e.g. XLA)
It is sometimes much more efficient to have bucketing + padding + several fixed-shape models
● Always (like, seriously, always) have separate graphs for training & inference
(especially if you want to compile the model)
● (For TF) Use NCHW instead of NHWC where C > 16
It will actually be converted to a blocked NCHW[c] layout
Reason: CPU cache and vector operations
To fully support NCHW you need to compile TensorFlow with MKL
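The bucketing + padding idea from the second bullet, sketched with hypothetical bucket sizes: each image is routed to the smallest fixed shape it fits, so every bucket can use its own shape-specialized, compilable model.

```python
# Pad each image up to the nearest fixed "bucket" shape; each bucket shape
# gets a dedicated model compiled for that exact input size.
import numpy as np

BUCKETS = [(256, 256), (512, 512), (1024, 1024)]  # illustrative fixed shapes

def pad_to_bucket(img):
    """Return the image zero-padded to the smallest bucket that fits it."""
    h, w = img.shape[:2]
    for bh, bw in BUCKETS:
        if h <= bh and w <= bw:
            padded = np.zeros((bh, bw) + img.shape[2:], dtype=img.dtype)
            padded[:h, :w] = img
            return padded
    raise ValueError("image larger than the biggest bucket")
```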
26. Get maximum from your hardware
(advanced steps)
● Quantization:
○ TensorRT offers float16 for GPUs
○ Intel offers int8 for CPUs
○ Google offers bfloat16 for TPUs
● TVM - compile your model to low-level code with LLVM
TF/PyTorch/ONNX -> Relay -> Bytecode
(the open-sourced technology behind Amazon SageMaker Neo)
27. Joel Test for Data Science
✓ Are results reproducible?
✓ Do you create a data pipeline that you can rebuild with one command?
✓ Do you rebuild pipelines frequently?
✓ Can Data Scientists deploy their models with minimal dependencies on
engineering and infrastructure?
✓ Do you use version control?
✓ Do you write auto-tests for your models and your pipeline code?
✓ Do you translate model efficiency to commercial KPIs?