Operationalizing Machine Learning in the Enterprise

TDWI / BARC
Munich
June 22, 2019
Mark Madsen

Copyright Third Nature, Inc.
PSA: The reality behind most ML/AI production case studies
TL;DR: Embedded ML and AI are
harder than people realize.
Most companies will not be
able to do it at the current state
of IT and market maturity.

Overview
We won’t:
▪ Talk much about development and building of
models, since this is about operations
▪ Talk about technology or techniques, since this is
about operations
We will:
▪ Talk a lot about concepts, observations, and practices

What does analytics management care about?
There is a key stakeholder:
analytics management - the
CAO, CDO, VP of analytics, aka
“your boss” if you’re a data
scientist.
The person responsible for
oversight of the team and its
efforts has a perspective, and
problems, that span the
organization and multiple
projects.

Explainability (or interpretability)

Job #1 - Repeatability

Job #2 - Operational predictability

Job #3 - Reproducibility

Scientific
reproducibility
Can you get the same
results given the same
starting conditions?
▪ Experimental replication
may not be the same for
many reasons
▪ Detailed statistical
analysis is needed
▪ And critical assessment
of experiments

Data science reproducibility
Can you get the same
results on the same
data?
▪ Direct replication is
expected
▪ Assumption is that
same input = same
output
▪ You can have this with
an unexplainable box
▪ But there are also
confounding factors
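
The "same input = same output" assumption can be checked rather than assumed. A minimal sketch, using a toy stand-in for training: `fingerprint` and `train_toy_model` are hypothetical names invented for this illustration, and the random sampling stands in for any stochastic step in a real pipeline. An unseeded random step is one of the confounding factors mentioned above.

```python
import hashlib
import json
import random


def fingerprint(rows):
    """Stable hash of the input data, so 'same data' is verified, not assumed."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def train_toy_model(rows, seed):
    """Toy 'training': mean of a random half of the rows (stand-in for a real fit)."""
    rng = random.Random(seed)  # local RNG, so no hidden global state leaks in
    sample = rng.sample(rows, k=len(rows) // 2)
    return sum(sample) / len(sample)


rows = [3.0, 1.5, 4.0, 1.0, 5.5, 9.0, 2.5, 6.0]

# Same data and same seed: direct replication is expected.
run_1 = train_toy_model(rows, seed=42)
run_2 = train_toy_model(rows, seed=42)
same_result = run_1 == run_2

# The data hash is what you would store next to the result for a later audit.
same_data = fingerprint(rows) == fingerprint(list(rows))
```

Storing the seed and the data fingerprint alongside each result is one cheap way to support an answer at a later date. It does not cover other confounders such as library versions or hardware.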

Interpretability and reproducibility are driven by trust,
which only matters when there is enough risk
Regulation and compliance
Material decisions
Big penalties
Complication: the costs of false
positives and false negatives
are usually different, so model
characteristics matter here.
[Figure: grid of cost of error (low to high) against frequency of decision (low to high), with regions labeled “Don’t care” and “Oh crap”]

The real questions
Can I support this answer at a later date?
▪ Do I care?
• Is the risk (cost) worth worrying about?
• Is the cost of reproducibility less than the risk and cost?
▪ Do I only need to justify it?
• interpretability and trust may be enough
▪ Do I need to reproduce the results?
• May not need interpretability
• Need a lot of other things

The real need is trust. Our trust is
based on all the elements that
are involved, not just the model.
The higher the stakes the more
you must think about all the ways
it could be wrong, because we all
want to be right.
Reliability and robustness of the
technology environment are as
important as the model.
Reproducibility is everyone’s
problem – it is an operational
concern.

The technology components are relevant to reproducibility

Starting with the process everyone does: building stuff
▪ Define the business problem
▪ Translate the problem into an analytic context
▪ Select appropriate data
▪ Learn the data
▪ Create a model set
▪ Fix problems with data
▪ Transform data
▪ Build models
▪ Assess models
▪ Deploy models
▪ Assess results
% of time spent: 70% / 30%
Source: Michael Berry, Data Miners Inc.

"Always design a thing by considering it in its next larger context - a
chair in a room, a room in a house, a house in an environment, an
environment in a city plan." – Eliel Saarinen

Expanding the perspective beyond the initial bit
• There are upstream parts to the development process:
collecting and managing data, both for dev and in prod.
• There are downstream parts, in deployment and then in
production operation.
• Data and artifacts are exchanged as part of the workflows
Collect Build Deploy Operate

Deploying autonomous ML is one of the biggest challenges

The operation workflow itself is complicated
Sense, Process, Interact, Learn
▪ Sense: collect inputs (AKA clean data)
▪ Process (“reasoning”): execute model to select and post actions
▪ Interact (“acting”): perform the desired actions
▪ Learn: measure, compare, adjust
Learning: could be human methods (manual adjustment) or
machine methods (e.g. reinforcement learning), which change the
sensing and processing.
Exchanged between the stages of Operate: Inputs, Actions, Obs

Feedback requires lots of data that you must record
Data volumes explode with all the telemetry:
1 execution = raw data in, inputs, the action, each metric used
(expected values), execution log, metric data (actuals), deltas,
model changes, technical resource information
▪ Record the inputs
▪ Record the action, the expected outcome (tracking metrics and OEC)
▪ Record the execution
▪ Record the metric deltas
▪ Record the changes
Inputs Actions Obs
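
The list of things to record per execution can be sketched as a single log record. This is an illustration only: the `ExecutionRecord` class, its field names, and the example values (`churn-v3`, `send_offer`, `conversion_rate`) are assumptions, not something the slides specify.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class ExecutionRecord:
    """One model execution: inputs, action, expected vs. actual metrics."""
    model_version: str
    inputs: dict
    action: str
    expected: dict                      # metric name -> expected value (incl. OEC)
    actual: dict = field(default_factory=dict)   # filled in once outcomes are known
    execution_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: float = field(default_factory=time.time)

    def deltas(self):
        """Actual minus expected, for every metric that has been observed."""
        return {
            name: self.actual[name] - value
            for name, value in self.expected.items()
            if name in self.actual
        }

    def to_json(self):
        """Serialize the record as one JSON line for an append-only log."""
        row = asdict(self)
        row["deltas"] = self.deltas()
        return json.dumps(row, sort_keys=True)


record = ExecutionRecord(
    model_version="churn-v3",           # hypothetical model name
    inputs={"tenure_months": 14},
    action="send_offer",
    expected={"conversion_rate": 0.10},
)
record.actual["conversion_rate"] = 0.07  # observed later
line = record.to_json()
```

Every execution producing a row like this is where the volume comes from: the record is often larger than the inputs that triggered it.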

Criteria for models: not just accuracy!
You must track performance in development and production relative
to the metrics that are most important, in addition to the OEC.
▪ Predictive accuracy: highly accurate model
▪ Speed: fast processing time for training and test, OR for execution
▪ Simplicity: resulting model has few parameters and is easy to monitor and explain
▪ Robustness: results are stable over time
▪ Scalability: model can handle growing data volume and/or high concurrency
▪ Interpretability: easy to understand model results

ML is not like code: it can get better in production
ML in production usually starts out at the expected level
and improves over time, if you are doing it right.
Continuous improvement is the norm.
ML always goes wrong at some point. The best way to
protect against that is to constantly monitor and test
models on real production data.
e.g. sometimes your training data is not representative in
unexpected ways.
Excessive Invariance Causes Adversarial Vulnerability
https://openreview.net/forum?id=BkfbpsAcF7

A (contrived) example: Detect dogs

Dog or Not Dog?

Dog or Not Dog: 100%! Um, why?

Beyond toy examples, this problem matters
The AI edge case error problems
will limit AI applicability until they
are solved (don’t hold your breath).
The uncertainty challenge means:
▪ Calculate error costs and apply them
to your model before (and after)
▪ Use AI where errors are low cost, or keep a human in the loop (HitL)
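
Applying error costs to a model can be as simple as pricing false positives and false negatives separately and picking the decision threshold with the lowest total cost. A minimal sketch: the function names, labels, scores, and costs below are made up for illustration.

```python
def error_cost(y_true, y_pred, cost_fp, cost_fn):
    """Total cost of errors when false positives and false negatives are priced differently."""
    false_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return false_pos * cost_fp + false_neg * cost_fn


def cheapest_threshold(y_true, scores, thresholds, cost_fp, cost_fn):
    """Return the candidate threshold with the lowest total error cost."""
    def cost_at(threshold):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        return error_cost(y_true, y_pred, cost_fp, cost_fn)
    return min(thresholds, key=cost_at)


# Illustrative labels and model scores (not from any real system).
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.55, 0.60, 0.20, 0.45]
candidates = [0.3, 0.5, 0.7]

# Missing a positive is 10x worse than a false alarm: prefer a low threshold.
when_misses_hurt = cheapest_threshold(y_true, scores, candidates, cost_fp=1, cost_fn=10)

# A false alarm is 10x worse than a miss: prefer a high threshold.
when_alarms_hurt = cheapest_threshold(y_true, scores, candidates, cost_fp=10, cost_fn=1)
```

The same model gives two different operating points depending on which error is expensive, which is why the cost structure belongs in the evaluation and not only in the business case.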

You need to protect against model execution problems
You have to track the actions / executions and their results,
including the OEC, in real time, to protect against failures.
This adds monitors and circuit breakers.
Inputs Actions Obs
Monitor Breaker
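
A circuit breaker in this setting can be sketched as a small wrapper around model execution: it watches recent outcomes and, once too many are bad, routes decisions to a safe fallback instead of the model. The class below is a hypothetical illustration. The window size, the failure-rate limit, and what counts as a "failure" (for example, an OEC outside its expected range) are assumptions you would set per use case.

```python
from collections import deque


class CircuitBreaker:
    """Stop using the model when too many recent executions went wrong."""

    def __init__(self, window, max_failure_rate, fallback):
        self.results = deque(maxlen=window)   # most recent outcomes only
        self.max_failure_rate = max_failure_rate
        self.fallback = fallback              # safe default, e.g. a fixed rule
        self.open = False                     # open = model is bypassed

    def record(self, ok):
        """Record whether the last execution met expectations."""
        self.results.append(ok)
        if len(self.results) == self.results.maxlen:
            failure_rate = self.results.count(False) / len(self.results)
            if failure_rate > self.max_failure_rate:
                self.open = True              # stays open until a human resets it

    def decide(self, model, inputs):
        """Use the model while the breaker is closed, the fallback otherwise."""
        if self.open:
            return self.fallback(inputs)
        return model(inputs)


def model(inputs):
    return "model_action"


def default_rule(inputs):
    return "default_action"


breaker = CircuitBreaker(window=5, max_failure_rate=0.4, fallback=default_rule)
before = breaker.decide(model, {})
for ok in [True, False, False, True, False]:  # 3 of 5 recent executions failed
    breaker.record(ok)
after = breaker.decide(model, {})
```

In this sketch the breaker never closes on its own. Requiring a person to reset it is a deliberate choice for high-cost errors, and an automatic retry after a cool-down would be the alternative.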

ML Principle: CACE, Change Anything Change Everything
In embedded or
autonomous ML
everything is
connected.
Events usually
happen in real time.
ML is very sensitive
to context and input.
“So what if I
changed NumPy
in dev?”
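
One way to answer the NumPy question is to record the software environment next to every model build and run, then diff the records. A minimal sketch using only the Python standard library: `environment_snapshot` and `diff_snapshots` are hypothetical helper names, and the version strings in the example are illustrative.

```python
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Capture the Python version and installed versions of the named packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None             # not installed in this environment
    return {"python": platform.python_version(), "packages": versions}


def diff_snapshots(dev, prod):
    """Return everything that differs between two snapshots as (dev, prod) pairs."""
    changed = {}
    if dev["python"] != prod["python"]:
        changed["python"] = (dev["python"], prod["python"])
    names = set(dev["packages"]) | set(prod["packages"])
    for name in sorted(names):
        dev_version = dev["packages"].get(name)
        prod_version = prod["packages"].get(name)
        if dev_version != prod_version:
            changed[name] = (dev_version, prod_version)
    return changed


# Illustrative snapshots; in practice each comes from environment_snapshot().
dev = {"python": "3.11.4", "packages": {"numpy": "1.26.4", "pandas": "2.2.0"}}
prod = {"python": "3.11.4", "packages": {"numpy": "1.24.0", "pandas": "2.2.0"}}
drift = diff_snapshots(dev, prod)

here = environment_snapshot(["numpy"])        # numpy may or may not be installed
```

An empty diff does not prove the environments behave identically (native libraries, hardware, and configuration also matter), but a non-empty one is a quick first suspect.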

“A production ML system is never all green”
Much of the time, the ML app is a distributed system.
Distributed systems are hard.
Monolithic architectures are great if you can use them.
This is fine.
Everything is fine

Not just protection - diagnostics
You need telemetry about the entire environment for monitoring,
but you also need it for diagnostics.
This means you need to think about observability.
Inputs Actions Obs
Monitor Breaker

Ask yourself “What could go wrong?” and you’ll
probably be right.

Therefore: Test in Production?
Bad, right? Goes against everything IT says about testing
▪ This is a culture change for IT, and a hard one to make.
But production is always-on, real time. Conditions are
constantly changing. You can’t replicate the environment.
If ML is highly sensitive to conditions, and you can’t
replicate the conditions exactly, then… test in production.
▪ On real data
▪ With real network and server configurations
▪ And real concurrency
But also:
▪ Keep testing in dev/QA and CI/CD environments.

ML is not like code: Monitoring in production
Unlike BI, ML has different metrics for “correct”
The metrics are relative and can change over time
You must monitor performance closely, which is
like doing BI on your AI.
This requires “observability”, because the problem may not be
the model but the data or the infrastructure.
▪ Reduce the time to diagnose, rather than
emphasizing the prevention of coding errors
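
Because "correct" is relative, monitoring usually means comparing a recent window of a metric against a baseline rather than against a fixed pass/fail rule. A minimal sketch: the `RollingMetricMonitor` class, the baseline of 0.90, and the tolerance of 0.05 are assumptions chosen for the illustration.

```python
from collections import deque
from statistics import mean


class RollingMetricMonitor:
    """Compare the recent average of a metric against a baseline."""

    def __init__(self, baseline, tolerance, window):
        self.baseline = baseline              # e.g. the level measured at deployment
        self.tolerance = tolerance            # how much degradation is acceptable
        self.values = deque(maxlen=window)

    def observe(self, value):
        self.values.append(value)

    def status(self):
        """Return 'warming_up', 'ok', or 'alert' for the current window."""
        if len(self.values) < self.values.maxlen:
            return "warming_up"
        drop = self.baseline - mean(self.values)
        return "alert" if drop > self.tolerance else "ok"


monitor = RollingMetricMonitor(baseline=0.90, tolerance=0.05, window=3)
early = monitor.status()

for daily_accuracy in [0.91, 0.89, 0.90]:
    monitor.observe(daily_accuracy)
healthy = monitor.status()

for daily_accuracy in [0.80, 0.78, 0.79]:     # something changed upstream
    monitor.observe(daily_accuracy)
degraded = monitor.status()
```

An alert here says only that the metric moved. Whether the cause is the model, the data, or the infrastructure is the diagnostic question that the rest of the telemetry has to answer.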

Machine learning is the smallest part of the environment
[Figure: a small “ML Code” box surrounded by the rest of the system: Configuration, Data Collection, Feature Extraction, Data Verification, Machine Resource Management, Analysis Tools, Process Management Tools, Serving Infrastructure, Monitoring]
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems
©2018 Teradata

How an organization works

Many applications, many activities

How do you get the full picture?
What does a person do when there’s a problem?
One report from each application isn’t a sustainable answer

The goal is to make decisions, not get reports
This is the “decision support” model
Analyze
Decide
Act
KPIs, metrics

We had batch (and stream) models decades ago
Analyze
Decide
Act
e.g. segmentation, NBO queue, churn, fraud
Usually batch, ran from the DW (but probably not on it),
resulting data loaded into the DW for use
Someone oversees and acts on the information
Analyze
KPIs, metrics

Applying analytics within a process context
Analyze
Decide Act
e.g. purchasing changes, upsell/cross-sell recommendations
Machines gain agency, humans lose it; “act” is curtailed
This is the human-in-the-loop model

When there’s a problem, the fix is a message
The model’s results should be visible via the KPIs and metrics
Act: People call people to see what’s happening
KPIs, metrics

Somebody built the models – the data scientist
More communication is required
The data scientist needs to observe and
change system behaviors

Enter the black boxes – the “autonomous” model

Black boxes still need oversight

Black boxes beget gray boxes because of speed

Three ML deployment categories
Decision support
(aka BI) is the final
arbiter of success.
Autonomous
Human in
the loop
Decision
support

Distributed Agency
Today’s application
model of ML is usually
embedded in a fixed
central system.
Governing a model and
its application is more
complex when the ML
system is not controlled
via a central service.
This level of autonomy
is still in its infancy. Cf.
Roomba

Three ML deployment categories
Autonomous
Human in
the loop
Decision
support
All of these
independent /
separate
architectures
are dependent
on some level
of shared
context.
That means
shared
operational
data, managed
over time.

ML and AI have a lot of requirements: no shortcuts
https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007

“Eliminate the time spent on data prep” – Nope
The work you do on the data is what makes it valuable.
You can’t eliminate the prep work without eliminating
good models. Instead, optimize workflows where most of
the time is spent.

The Lake + self-service model: Individuals get and
manage their own data, Yes but…

BYOT can lead to extreme behaviors

Self-service tradeoffs
Pay now or pay later,
but you will always pay.
The question is which
payment will be less.
Self-service gives
flexibility and agility, but
can reduce repeatability,
add duplicated effort and
conflicts, and increase risk.

A Data Science Approach to data
One-Pipeline-Per-Process: Redundant Effort / Cost / Complexity / etc.
[Figure labels: WELL HEAD, WELL HEAD, Example use cases]

Models in production at massive scale?
Most organizations have a project-based approach. This
makes it easy to deliver new projects.
With the silo/pipeline approach:
▪ If each model takes X% of effort to maintain, how many
models can you build before you use up 100% of your time?
▪ Automation helps, a little.
▪ Efficiency helps more.
The projects-as-silos
and pipeline approach
will not work when
running models in
production at the
massive scale required
for total automation
[Chart: number of models in production and staffing, 2019 to 2029]
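
The X% question above is simple arithmetic, and working it through makes the ceiling concrete. A minimal sketch: the function names and the 5% figure are illustrative assumptions, not measurements.

```python
def max_models(maintenance_pct_per_model, team_capacity_pct=100.0):
    """How many models the team can maintain before maintenance uses all capacity."""
    return int(team_capacity_pct // maintenance_pct_per_model)


def capacity_left(models_in_production, maintenance_pct_per_model, team_capacity_pct=100.0):
    """Share of team capacity still available for new projects, floored at zero."""
    used = models_in_production * maintenance_pct_per_model
    return max(0.0, team_capacity_pct - used)


# If each model costs 5% of the team's time to keep running:
ceiling = max_models(5.0)                     # models before new work stops entirely
after_twelve = capacity_left(12, 5.0)         # capacity left with 12 models live

# Halving the per-model cost (efficiency) doubles the ceiling.
ceiling_efficient = max_models(2.5)
```

The ceiling scales with one over X, so the only lever besides adding staff is reducing per-model maintenance cost, which is the point of the efficiency bullet.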

Moving from individual to shared environments
is harder than most vendors lead you to believe

It takes more than common tools to create a
functional environment

The enterprise focus needs to be on
repeatability - where it can be supported

The nature of data science and BI differs
• In data science, the data is unknown at the start. The process
creates a data model. The same schema may not be reusable.
• The equivalent to a report is not a model. That would be the
model’s output. The equivalent to a model is more like ETL.
• Data science may require access to more than one zone.
1  Marge Inovera   $150,000  Statistician
2  Anita Bath      $120,000  Sewer inspector
3  Ivan Awfulitch  $160,000  Dermatologist
4  Nadia Geddit    $36,000   DBA
Source data Model extract Models

Managing data is a bigger problem than bigness

Data can be maintained at multiple levels: not raw or DW
Ingredients
Goal: available
User needs a recipe
in order to make
use of the data.
Pre-mixed
Goal: discoverable
and integrateable
User needs a menu
to choose from the
data available
Meals
Goal: usable
User needs utensils
but is given a
finished meal

We need a discipline of AnalyticOps
We need to enable the
full end-to-end lifecycle.
No product will do this –
it’s a workflow, process,
and architecture problem.
Analytics: external data, iteration, data-mining, statistics, value-driven, flexibility, exploration, discovery, modelling, blue-sky ideation
Operations: security, governance, compliance, curation, deployment, maintenance, integration, testing, engineering, process-driven
Lifecycle: Plan and Measure, Develop and Test, Release and Deploy, Monitor and Optimize
©2018 Teradata

Culture
The hard problem
is changing the
organization so
that it more
readily challenges
the rationale for
decisions, uses
data to back up
the discussion, and
creates new
explanations.

Moving from predictable rule-based systems to complex
mathematical systems, and from there to systems that
exhibit stochasticity, makes the task harder, not different.
One thing worse than
a black box is a
random black box.

Culture: Experimental Mindset
Sometimes you can’t build the thing
you want (meet the required OEC)
▪ ML is experimental, you should fail
▪ Budget to experiment – and fail?
▪ Data: type, quality, amount
▪ Technique: theoretical limits, appropriateness
▪ Feasibility: technical, resources and time
Useful background for online experiments
https://www.researchgate.net/publication/316116834_Online_Controlled_Experiments_and_AB_Testing
https://ai.stanford.edu/~ronnyk/2007GuideControlledExperiments.pdf

Analysts and engineers work from opposing directions
▪ Exploration: help people ask the right questions, frame them, define measurable goals
▪ Modeling: define models that run to determine answers or carry out actions
▪ Integration: deliver the results / product in production, at scale
▪ Applications: build data science models into applications and delivery systems
▪ Infrastructure: provide the systems and practices to build and run the desired models

Mark Madsen is an engineering fellow
at Teradata. Prior to that he was
president of Third Nature, a research
and consulting firm focused on
analytics, data integration and data
management. Mark is an award-
winning architect, author, and CTO
whose work has been featured in
numerous industry publications. He is
an international speaker and is
involved with several conferences in
the data science and analytics industry.
Mark Madsen
