Pariveda's Ryan Gross presented on the ways that companies are transforming themselves using data and data science. Many of the challenges that organizations run into are cultural and/or process related. The presentation goes through a framework for getting your organization started successfully with Data Science.
Readiness for DS Value Realization (Forming → Storming → Norming → Performing)

Forming
• Not currently in a position to build a production machine learning solution
• Requires major data engineering to get ready
• Can also implement POCs where data is ready to show the benefits of ML to executives

Storming
• Putting first machine learning MVP solutions into production
• Educating the business on utilizing ML predictions
• Sourcing data for ML models ad hoc
• Hiring leadership roles on the data science team

Norming
• Building data lake platforms and data governance processes to make data available
• Using change management to adopt ML solutions across the business
• Filling out roles on the data science team
• Business actively looking

Performing
• Executing against a value-driven backlog of ML opportunities
• Scaling up the data science function, supplementing with platforms
• Business understands data-driven decision making
• Utilizing controlled experiments during roll-out
Organizations now collect an enormous volume of data about their customers, products, and processes
Those organizations with the capability to turn data into actionable insights leap ahead of their competitors – delighting customers, reducing costs, and opening new markets
Human experts and traditional decision support tools become overwhelmed as the amount of data increases
Distinguishing subtle differences across hundreds or thousands of interacting factors is difficult
The sheer volume of data to be analyzed becomes a limiting factor
Machine learning is being used by a rapidly increasing number of organizations to overcome the challenge of generating insights from large data sets
Cheap computing power and easy access to advanced algorithms have lowered the barrier to entry – ML is not just for academics or bleeding-edge companies anymore
There are three kinds of data problems that really push the big data envelope, commonly known as the Vs of Big Data. Most people agree on at least three of them (volume, velocity, and variety) because they help us categorize the technical problem, while the others tend to be dimensions of the solution or business problem.
The value is in decoupling storage from processing
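To make the decoupling concrete, here is a minimal sketch in plain Python. A local JSON-lines file stands in for object storage (e.g. S3); the writer and the reader are hypothetical illustrations, not part of any actual reference architecture, and the point is only that the processing step can be rerun or replaced without touching how the data is stored:

```python
import json
import tempfile
from pathlib import Path

def store(records, path):
    """Persist raw records as JSON lines -- a stand-in for object storage.
    Storage knows nothing about how the data will later be processed."""
    with open(path, "a") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def process(path):
    """A separate, independently scalable step reads the stored data and
    computes an aggregate. It can be changed without changing storage."""
    totals = {}
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            totals[record["region"]] = totals.get(record["region"], 0) + record["amount"]
    return totals

# Example: two independent writers, one later processing pass
lake = Path(tempfile.mkdtemp()) / "sales.jsonl"
store([{"region": "east", "amount": 10}, {"region": "west", "amount": 5}], lake)
store([{"region": "east", "amount": 7}], lake)
print(process(lake))  # {'east': 17, 'west': 5}
```

Because storage and processing only meet at the file format, either side can scale or evolve on its own schedule.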
Pariveda Solutions has developed a Big Data/AWS reference architecture referred to as the “analytics pipeline”
Expands on the familiar [ Extract ] [ Transform ] [ Load ] pattern
Terminology-wise, this maps directly to the Analytics Pipeline
Extract = Ingest/Collect
Transform = Store (Model)/Process (Enhance/Transform)
Load = Consume/Visualize (Distribute)
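The ETL-to-pipeline mapping above can be sketched as a chain of stages. The function names mirror the pipeline terminology from the slide; the bodies are toy placeholders, not the actual reference architecture:

```python
def ingest(source):
    """Extract -> Ingest/Collect: pull raw events from a source system."""
    return list(source)

def store(raw):
    """Transform -> Store (Model): land the raw data in a modeled form."""
    return [{"value": float(x)} for x in raw]

def process(stored):
    """Transform -> Process (Enhance/Transform): derive an analytic result."""
    return sum(row["value"] for row in stored) / len(stored)

def consume(result):
    """Load -> Consume/Visualize (Distribute): hand the result to consumers."""
    return f"average = {result:.1f}"

# Run the pipeline end to end on a toy source
print(consume(process(store(ingest([1, 2, 3, 6])))))  # average = 3.0
```

In a real deployment each stage would be a separate service or job, but the data flow between them is the same.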
The data science work provides insight and value, but operationalizing that work is the challenge.
We haven’t yet fully figured out how data science work fits into the agile software development process
More Info: Jon Landers, Ryan Gross
One thing we didn’t cover in the prior section is how to get your models into production
Most of the marketecture around Machine Learning will tell you that it’s this easy to deploy your first model as a RESTful service!
But then you remember that you’re modeling the real world, and you’ve read some HBR articles about how the real world changes fast, so you’ll need to retrain the model to keep up
So you build an API to trigger retraining and validation, and trigger it on a timer
Then you build automated deployment of the new model and everyone is happy: the marketecture was right!
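The retrain-validate-deploy loop described above can be sketched in a few lines. Everything here is a simplified stand-in: `MeanModel` is a toy estimator, the `deployed_model` variable stands in for whatever the RESTful service would serve, and the validation tolerance is an arbitrary assumption, but the gating logic (only swap in a candidate that passes validation) is the point:

```python
import statistics

class MeanModel:
    """Toy 'model' that predicts the training mean; stands in for a real estimator."""
    def fit(self, y):
        self.mean_ = statistics.mean(y)
        return self

    def predict(self):
        return self.mean_

deployed_model = None  # what the (hypothetical) REST endpoint would serve

def validate(model, holdout, tolerance=2.0):
    """Gate a candidate: mean absolute error on a holdout must stay small."""
    mae = statistics.mean(abs(model.predict() - y) for y in holdout)
    return mae <= tolerance

def retrain_and_deploy(train, holdout):
    """What the timer-triggered retraining API would do: fit, validate,
    and only swap in the new model if validation passes."""
    global deployed_model
    candidate = MeanModel().fit(train)
    if validate(candidate, holdout):
        deployed_model = candidate
        return True
    return False

# First scheduled run: the candidate passes validation and is deployed
assert retrain_and_deploy(train=[10, 11, 9], holdout=[10, 12])
print(deployed_model.predict())  # prints 10
```

A candidate that fails validation leaves the currently deployed model untouched, which is exactly why the validation gate sits between retraining and deployment.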
More Info: Ryan Gross
Let’s go back to your prediction pipeline
When the next transaction, day, week, or month goes by, we can check our predictions against the actual values, detecting the need to re-train our models using the new actuals.
If the prediction is off by too much, we can alert the team so they can figure out why.
If, as is common, it’s just drift because the real world changed, we can re-train the model
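The prediction-versus-actuals check above can be sketched as a simple drift monitor. This is a deliberately simplistic illustration, not the actual pipeline: the 10% threshold is an arbitrary assumption, and a real system would alert the team for investigation rather than jump straight to retraining:

```python
def check_drift(predictions, actuals, threshold=0.1):
    """Compare last period's predictions to the actuals that have since
    arrived. Returns 'ok', or 'retrain' when the mean absolute percentage
    error exceeds the threshold, signaling drift worth acting on."""
    errors = [abs(p - a) / abs(a) for p, a in zip(predictions, actuals)]
    mape = sum(errors) / len(errors)  # mean absolute percentage error
    return "retrain" if mape > threshold else "ok"

# Predictions held up this period...
print(check_drift([100, 205, 150], [102, 200, 148]))  # ok
# ...but drifted badly the next, so we alert and retrain on the new actuals
print(check_drift([100, 205, 150], [140, 260, 200]))  # retrain
```

Running this check on every new batch of actuals is what closes the loop between monitoring and the scheduled retraining described earlier.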