Data Science Innovation Summit Philadelphia 2019 - Pariveda
1 © Pariveda Solutions. Confidential & Proprietary.
Ryan Gross
Principal Architect, Chicago
Worked with a couple of the Fortune 500 to use data as an asset, and others to build AWS solutions and strategies.
10 years of technology leadership with Pariveda Solutions and General Electric
Cloud architect, big data wrangler, professional hobbyist, amateur cyclist, and lifelong Pittsburgh Steelers fan. Entertainer to many, and entertained by life’s mysteries.
Table of Contents
Machine Learning and The Business
Lay the Foundation with Data Engineering
Operationalizing Data Science
Questions
Appendix
from automation, improved asset utilization, DevOps optimization, etc.
Net valuation of the world’s largest
companies
VAPORIZED
ML as a Vaporizing Agent
What does it mean to be ‘Vaporized’?
The Software-Defined Society
Democratizing Machine Learning
Automatically build and evaluate hundreds of models in parallel
Where state-of-the-art algorithms are always live and accessible to anyone
Accelerate the development and delivery of models with infrastructure automation, seamless collaboration, and automated reproducibility
Build, train, and deploy machine learning models at scale
ORGANIZATIONS ARE SOLVING REAL-WORLD PROBLEMS WITH MACHINE LEARNING
Pediatrics hospital predicts individual patients' risk of contracting Central Line-Associated Bloodstream Infections (CLABSIs) up to 3 days earlier than normal infection detection, allowing them to take action to minimize this risk
Oilfield equipment provider predicts the win/loss probability of bids based on market conditions, customer information, and historical sales data. They can easily see the impact of different pricing strategies, speeding the quoting process and increasing the likelihood of a sale
Retail energy company forecasts the electricity usage of individual customers instead of aggregate groups. Customers are provided with an increased awareness of their upcoming energy costs and the energy company can better hedge energy prices
Heavy equipment manufacturer predicts that a part is going to fail up to 3 weeks before failure. This allows customers to avoid expensive lost productivity due to unexpected downtime. Customers also reduce maintenance costs by only replacing parts as they are about to fail, rather than 20% of parts during each maintenance window
• Building Data Lake Platforms & Data Governance processes to make data available
• Change management being utilized to adopt ML solutions across the business
• Filling out roles on Data Science team
• Business actively looking
• Executing against a value-driven backlog of ML opportunities
• Scaling up the data science function, supplementing with platforms
• Business understands data-driven decision making,
• Utilizing controlled experiments during roll-out
• Putting first Machine Learning MVP solutions into production
• Educating the business on utilizing ML predictions
• Sourcing data for ML models ad-hoc
• Hiring leadership roles on data science team
• Not currently in position to build a production machine learning solution.
• Require major Data Engineering to get ready.
• Can also implement POCs where data is ready to show the benefits of ML to executives
FORMING STORMING NORMING PERFORMING
Readiness for DS Value Realization
Building a Value Realization Team
Data Scientists spend ~80% of their time on activities that are not core to their skill sets. A team with varied skills can help focus scarce Data Science resources.
Software Engineering
Data Engineering
Data Science
$$$ $$ $$
Business Analysis
Building a Value Realization Pipeline
Metrics
Prototypes
& LEARN
Identify, Assess,
and Prioritize
Experiment and
Learn
Insights
Opportunities
Deploy, Test,
and Run
Building a Value Realization Pipeline
[Diagram: agile delivery flow. Labels shown: Sprint; Release Planning; Weekly Work Item Conversation; Design; Develop; Test; Peer Review; Regression Testing; Acceptance Testing; Deploy to Dev; Code Analysis; Iteration Planning; Backlog Grooming; Status Meeting; Deploy to Test; Visual Design; System Docs & Test Scripts; Code Review; Facilitated UAT; User Validation; QA Testing; Potentially Shippable Product; Iteration Review; Retro; Deploy to Stage; Automated UI Regression Testing; API Load & Performance Testing; Integration & Performance Testing; Holistic Usability Testing; Deploy to Production; Product Release; Conceptualization; Learn & Repeat]
Identify, Assess, and Prioritize: Business Focused Leveraged Use Cases; Opportunity Mapping; Process Impact; Technical Complexity; Business Value; Value Mapping; Generate Concept Cards
Research Spike (Learn & Repeat): Business Assessment; Data Availability Assessment; Operationalization Assessment; Data Acquisition; Data Understanding; Data Validation and Cleanup; Feature Engineering; Model Development; Model Evaluation
Deployment Spike (Learn & Repeat): Testing; Deployment; Change Management; Monitoring; Automation; Performance Tuning
Experiment and Learn
Deploy, Test, and Run
Concept Card Review; Research Spike Results; Deployment Spike Results
User Stories; Progress; Concept Cards
Pivot | Persevere | Promote | Quit
Software Engineering; Data Engineering; Data Science; Business Analysis
$$$ $$ $$
Lay the Foundation with Data Engineering
Volume, Velocity, Variety
Volume, Velocity, Variety, Veracity, Value
Data Forge: Modern Data Pipeline
LEVERAGING YOUR DATA PLATFORM
A well-structured and maintained data lake supports and integrates better with enabling technologies and implemented use cases
ENABLING TECHNOLOGIES, ENABLING CAPABILITIES, OUTCOMES
Validation Engine; Data Platform; Analytics; Data Catalog; Lineage Tracker; Notifications / Logging; Directory Service Integration; Data Provisioning; Infrastructure as Code; Structure; Formats
New Data Product Development; Maintenance and Operations; Data Governance; Experimentation
Agility; Data Quality; Security; Return on Investment; Visibility; Compliance
Technology; People; Process
Data Forge: Modern Data Governance
GOVERNANCE IS NOT A PROJECT
It is a level of rigor that must be applied continuously.
Motivation to govern is driven by demonstrating the value of the data.
The activities of governance have not changed, but they are applied as needed.
EXPERIMENTATION MUST BE ENABLED
Governance shouldn’t hinder the ability to experiment with new ideas quickly.
It should be applied after the value of the data has been established.
Raw data should be easy to discover, access, and understand
Traditional Governance (75% failure rate)
Value Driven Governance
Data Forge: Experiment to Show Value
Data Lab Experimentation Platform Prototypes
DATA LAKE
Operationalizing Data Science
Moving Data Science to Production
IDEA MATURATION: CONCEPT TO REALITY
DEVELOPMENT PHASES
IDEA: Ideate - Create Concept Cards. Engage internal stakeholders, capture new business ideas and fill in Concept Cards
GOOD IDEA: Review & Prioritize. Review Concept Cards w/ Executive Leadership, prioritize and select concepts to research further
REAL IDEA: Research & Prototype. Engage external & related parties to formalize dependencies; develop mock-ups and/or prototypes
VIABLE IDEA: Focused/Market Test. Fully spike prototype for subset of customers or users
VALUABLE IDEA: Release. Production launch of product/service after incorporating lessons learned from Market Test
“LINE OF FEASIBILITY”
Set up RESTful APIs to score new examples and retrain your model periodically if necessary
ModelOps: Deploying a Machine Learning Model to Production
Machine Learning Model
Hey, I need a prediction! (Score)
Time to update our model! (Train, Validate, Deploy)
Step 1: Do some manual data science and create a predictive model
Step 2: Deploy a score API that can return predictions for users and calling applications
Step 3: Deploy an API that can retrain, validate, and deploy models, and potentially set up a timer job to hit that API
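The three steps can be sketched with nothing but the Python standard library. This is a minimal illustration, not the deck's implementation: the one-parameter model, the `/score` and `/retrain` paths, the payload shapes, and the "deploy only if the holdout error does not get worse" rule are all assumptions made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from statistics import mean


class ModelRegistry:
    """Holds the currently deployed model: y = slope * x (toy stand-in for Step 1)."""

    def __init__(self, slope=1.0):
        self.slope = slope
        self.version = 1


def train(rows):
    """Least-squares fit through the origin: slope = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)


def mean_abs_error(slope, rows):
    return mean(abs(slope * x - y) for x, y in rows)


def handle_score(registry, payload):
    """Step 2: return a prediction from the deployed model."""
    return {"prediction": registry.slope * payload["x"],
            "model_version": registry.version}


def handle_retrain(registry, payload):
    """Step 3: train a candidate, validate on a holdout, deploy if not worse."""
    rows = [tuple(r) for r in payload["rows"]]
    split = max(1, len(rows) * 3 // 4)
    train_rows = rows[:split]
    holdout = rows[split:] or train_rows  # tiny inputs: reuse training rows
    candidate = train(train_rows)
    old_error = mean_abs_error(registry.slope, holdout)
    new_error = mean_abs_error(candidate, holdout)
    deployed = new_error <= old_error
    if deployed:
        registry.slope = candidate
        registry.version += 1
    return {"deployed": deployed, "old_error": old_error,
            "new_error": new_error, "model_version": registry.version}


ROUTES = {"/score": handle_score, "/retrain": handle_retrain}  # hypothetical paths


def make_handler(registry):
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            route = ROUTES.get(self.path)
            if route is None:
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps(route(registry, payload)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return Handler


def serve(port=8080):
    """Blocks forever; a timer job could POST to /retrain on a schedule."""
    HTTPServer(("127.0.0.1", port), make_handler(ModelRegistry())).serve_forever()
```

Keeping the handlers as plain functions separate from the HTTP wrapper means the scoring and retraining logic can be exercised without starting a server.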
ModelOps: Chaining a Machine Learning Model in Production
You can use the modern data platform to monitor problems and alert or act on them
Enhanced Sales Transactions
Weighted Average Sales by customer (Hourly)
Weighted Average Sales by customer (Daily)
Weighted Average Sales (Daily, Weekly, Monthly)
Weighted Average Sales (Weekly, Monthly, etc.)
Buying pattern by customer (used for fraud in Enhance)
Enhanced Product Manufacturing Records
Average cost per product by location
Weather report and predictions
Weather impact factor by customer
Predicted Customer Revenue (Daily)
Predicted Customer Profit (Daily)
…
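One way to picture chaining is as a dependency graph of datasets that is refreshed in topological order. The sketch below uses dataset names from the slide, but the arrows were lost in extraction, so every edge in `DEPENDS_ON` is an assumption made for illustration. It relies on `graphlib`, which is in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter

# dataset -> datasets it is derived from.
# ASSUMPTION: these edges are guesses; the slide's arrows did not survive.
DEPENDS_ON = {
    "weighted_avg_sales_by_customer_hourly": {"enhanced_sales_transactions"},
    "weighted_avg_sales_by_customer_daily": {"weighted_avg_sales_by_customer_hourly"},
    "avg_cost_per_product_by_location": {"enhanced_product_manufacturing_records"},
    "weather_impact_factor_by_customer": {
        "weather_report_and_predictions",
        "weighted_avg_sales_by_customer_daily",
    },
    "predicted_customer_revenue_daily": {
        "weighted_avg_sales_by_customer_daily",
        "weather_impact_factor_by_customer",
    },
    "predicted_customer_profit_daily": {
        "predicted_customer_revenue_daily",
        "avg_cost_per_product_by_location",
    },
}


def refresh_order(depends_on):
    """Return an order in which every dataset comes after its inputs."""
    return list(TopologicalSorter(depends_on).static_order())


def downstream_of(dataset, depends_on):
    """Everything that must be recomputed when `dataset` changes."""
    affected, frontier = set(), {dataset}
    while frontier:
        frontier = {d for d, parents in depends_on.items()
                    if parents & frontier} - affected
        affected |= frontier
    return affected
```

For example, `downstream_of("weather_report_and_predictions", DEPENDS_ON)` lists the model outputs that a late or corrected weather feed would invalidate under these assumed edges.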
ModelOps: Monitoring a Machine Learning Model in Production
You can use the modern data platform to monitor problems and alert or act on them
Same dataset chain as the chaining slide, with these additions:
Predicted Customer Revenue (Weekly, Monthly)
Prediction Accuracy: Customer Revenue (Weekly, Monthly)
Alert the team!
Train, Validate, Deploy
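A monitoring check of this kind can be sketched as a comparison of predicted and actual revenue, followed by a choice between doing nothing, alerting the team, and kicking off retraining. The deck does not say which accuracy metric or thresholds are used; mean absolute percentage error and the two cut-offs below are assumptions for the example.

```python
from statistics import mean


def mape(actuals, predictions):
    """Mean absolute percentage error, skipping periods with zero actuals.

    Raises statistics.StatisticsError if every actual value is zero.
    """
    pairs = [(a, p) for a, p in zip(actuals, predictions) if a != 0]
    return mean(abs(a - p) / abs(a) for a, p in pairs)


def check_model(actuals, predictions, alert_at=0.10, retrain_at=0.25):
    """Map prediction accuracy to an action.

    ASSUMPTION: the thresholds are illustrative, not taken from the deck.
    """
    error = mape(actuals, predictions)
    if error >= retrain_at:
        action = "retrain"   # Train, Validate, Deploy
    elif error >= alert_at:
        action = "alert"     # Alert the team!
    else:
        action = "ok"
    return {"mape": error, "action": action}
```

Running this on weekly or monthly aggregates rather than daily values is one way to keep noisy single days from triggering a retrain.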
ModelOps: Model-Building Flow w/ Artifact Management
Manage notebook development in source control
Use S3 Object Versioning to Ensure Data Consistency
Build Docker Images for specific algorithms in CodeBuild, store in ECR
Manage model artifacts in S3 using artifact repository structure (Ivy, Maven)
Capture data transformations from notebooks for production as either Lambda Step Functions or EMR Steps
Leverage canary deployment features of the platform to test real-world effectiveness before cutting over
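As an illustration of the artifact repository idea, the sketch below builds a Maven-style object key for a model artifact and a small manifest that ties the artifact to the S3 version ID of its training data. The function names, the manifest fields, and the example group and artifact names are hypothetical; only the Maven path layout (group ID with dots as slashes, then artifact ID, version, and file name) follows the standard convention.

```python
import hashlib


def artifact_key(group_id, artifact_id, version, extension="tar.gz"):
    """Maven-style layout: group/path/artifact/version/artifact-version.ext"""
    group_path = group_id.replace(".", "/")
    return f"{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.{extension}"


def content_digest(data: bytes) -> str:
    """SHA-256 of the artifact bytes, for verifying what was deployed."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(group_id, artifact_id, version, model_bytes,
                   training_data_version_id):
    """Describe one model artifact (hypothetical manifest shape).

    training_data_version_id is meant to hold the version ID that S3 assigns
    to the training data object when bucket versioning is enabled.
    """
    return {
        "key": artifact_key(group_id, artifact_id, version),
        "sha256": content_digest(model_bytes),
        "training_data_version_id": training_data_version_id,
    }
```

Storing the manifest next to the artifact makes it possible to trace a deployed model back to the exact data version it was trained on.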

 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 

Data Science Innovation Summit Philadelphia 2019 - pariveda

  • 1. © Pariveda Solutions. Confidential & Proprietary. Ryan Gross, Principal Architect, Chicago. Worked with a couple of the Fortune 500 to use data as an asset, and others to build AWS solutions and strategies. 10 years of technology leadership with Pariveda Solutions and General Electric. Cloud architect, big data wrangler, professional hobbyist, amateur cyclist, and lifelong Pittsburgh Steelers fan. Entertainer to many, and entertained by life’s mysteries.
  • 2. Table of Contents: Machine Learning and The Business; Lay the Foundation with Data Engineering; Operationalizing Data Science; Questions; Appendix
  • 3. from automation, improved asset utilization, DevOps optimization, etc. Net valuation of the world’s largest companies
  • 7. VAPORIZED. ML as a Vaporizing Agent. What does it mean to be ‘Vaporized’? The Software-Defined Society
  • 8. Democratizing Machine Learning. Automatically build and evaluate hundreds of models in parallel. Where state of the art algorithms are always live and accessible to anyone. Accelerate the development and delivery of models with infrastructure automation, seamless collaboration, and automated reproducibility. Build, train, and deploy machine learning models at scale.
  • 9. Democratizing Machine Learning
  • 10. ORGANIZATIONS ARE SOLVING REAL-WORLD PROBLEMS WITH MACHINE LEARNING. Pediatrics hospital predicts individual patients' risk of contracting Central Line-Associated Bloodstream Infections (CLABSIs) up to 3 days earlier than normal infection detection, allowing them to take action to minimize this risk. Oilfield equipment provider predicts the win/loss probability of bids based on market conditions, customer information, and historical sales data. They can easily see the impact of different pricing strategies, speeding the quoting process and increasing the likelihood of a sale. Retail energy company forecasts the electricity usage of individual customers instead of aggregate groups. Customers are provided with an increased awareness of their upcoming energy costs and the energy company can better hedge energy prices. Heavy equipment manufacturer predicts that a part is going to fail up to 3 weeks before failure. This allows customers to avoid expensive lost productivity due to unexpected downtime. Customers also reduce maintenance costs by only replacing parts as they are about to fail, rather than 20% of parts during each maintenance window.
  • 11. Readiness for DS Value Realization (stages: FORMING, STORMING, NORMING, PERFORMING) • Building Data Lake Platforms & Data Governance processes to make data available • Change management being utilized to adopt ML solutions across the business • Filling out roles on Data Science team • Business actively looking • Executing against a value-driven backlog of ML opportunities • Scaling up the data science function, supplementing with platforms • Business understands data-driven decision making • Utilizing controlled experiments during roll-out • Putting first Machine Learning MVP solutions into production • Educating the business on utilizing ML predictions • Sourcing data for ML models ad-hoc • Hiring leadership roles on data science team • Not currently in position to build a production machine learning solution • Require major Data Engineering to get ready • Can also implement POCs where data is ready to show the benefits of ML to executives
  • 12. Building a Value Realization Team. Data Scientists spend ~80% of their time on activities that are not core to their skill sets. A team with varied skills can help focus scarce Data Science resources. Software Engineering, Data Engineering, Data Science $$$ $$ $$ Business Analysis
  • 13. Building a Value Realization Pipeline: Metrics; Prototypes & LEARN; Identify, Assess, and Prioritize; Experiment and Learn; Insights; Opportunities; Deploy, Test, and Run
  • 14. Building a Value Realization Pipeline Sprint Release Planning Weekly Work Item Conversation Design Develop Test Peer Review Regression Testing Acceptance Testing Deploy to Dev Code Analysis Iteration Planning Backlog Grooming Status Meeting Deploy to Test Visual Design System Docs & Test Scripts Code Review Facilitated UAT User Validation QA Testing Potentially Shippable Product Iteration Review Retro Deploy to Stage Automated UI Regression Testing API Load & Performance Testing Integration & Performance Testing Holistic Usability Testing Deploy to Production Product Release Conceptualization Learn & Repeat Identify, Assess, and Prioritize Business Focused Leveraged Use Cases Opportunity Mapping Process Impact Technical Complexity Business Value Value Mapping Generate Concept Cards Research Spike Learn & Repeat Business Assessment Data Availability Assessment Operationalization Assessment Data Acquisition Data Understanding Data Validation and Cleanup Feature Engineering Model Development Model Evaluation Deployment Spike Learn & Repeat Testing Deployment Change Management Monitoring Automation Performance Tuning Experiment and Learn Deploy, Test, and Run Concept Card Review Research Spike Results Deployment Spike Results User Stories Progress Concept Cards Pivot | Persevere | Promote | Quit Pivot | Persevere | Promote | Quit Software Engineering Data Engineering Data Science $$$ $$ $$ Business Analysis
  • 15. Table of Contents: Machine Learning and The Business; Lay the Foundation with Data Engineering; Operationalizing Data Science; Questions; Appendix
  • 18. Volume, Velocity, Variety
  • 21. Volume, Velocity, Variety, Veracity, Value
  • 22. Data Forge: Modern Data Pipeline
  • 23. LEVERAGING YOUR DATA PLATFORM. ENABLING TECHNOLOGIES, ENABLING CAPABILITIES, OUTCOMES. A well-structured and maintained data lake supports and integrates better with enabling technologies and implemented use cases. Validation Engine Data Platform Analytics Data Catalog Lineage Tracker Notifications / Logging Directory Service Integration Data Provisioning Infrastructure as Code Structure Formats New Data Product Development Maintenance and Operations Data Governance Experimentation Agility Data Quality Security Return on Investment Visibility Compliance Technology People Process
  • 24. Data Forge: Modern Data Governance. GOVERNANCE IS NOT A PROJECT: It is a level of rigor that must be applied continuously. Motivation to govern is driven by demonstrating the value of the data. The activities of governance have not changed, but they are applied as needed. EXPERIMENTATION MUST BE ENABLED: Governance shouldn’t hinder the ability to experiment with new ideas quickly. It should be applied after the value of the data has been established. Raw data should be easy to discover, access, and understand. Traditional Governance (75% failure rate); Value Driven Governance
  • 25. Data Forge: Experiment to Show Value. Data Lab Experimentation Platform Prototypes DATA LAKE
  • 26. Table of Contents: Machine Learning and The Business; Lay the Foundation with Data Engineering; Operationalizing Data Science; Questions; Appendix
  • 28. Moving Data Science to Production. Diagram labels: CONCEPT TO REALITY, “LINE OF FEASIBILITY”, DEVELOPMENT PHASES, IDEA MATURATION. IDEA (Ideate - Create Concept Cards): Engage internal stakeholders, capture new business ideas and fill in Concept Cards. GOOD IDEA (Review & Prioritize): Review Concept Cards w/ Executive Leadership, prioritize and select concepts to research further. REAL IDEA (Research & Prototype): Engage external & related parties to formalize dependencies; develop mock-ups and/or prototypes. VIABLE IDEA (Focused/Market Test): Fully spike prototype for subset of customers or users. VALUABLE IDEA (Release): Production launch of product/service after incorporating lessons learned from Market Test.
  • 29-33. ModelOps: Deploying a Machine Learning Model to Production. Set up RESTful APIs to score new examples and retrain your model periodically if necessary. Step 1: Do some manual data science and create a predictive model (Machine Learning Model). Step 2: Deploy a score API that can return predictions for users and calling applications ("Hey, I need a prediction!" Score). Step 3: Deploy an API that can retrain, validate, and deploy models, and potentially set up a timer job to hit that API ("Time to update our model!" Train, Validate, Deploy).
  • 34. ModelOps: Chaining a Machine Learning Model in Production. You can use the modern data platform to monitor problems and alert or act on them. Enhanced Sales Transactions; Weighted Average Sales by customer (Daily); Weighted Average Sales (Weekly, Monthly, etc.); Buying pattern by customer (used for fraud in Enhance); Enhanced Product Manufacturing Records; Average cost per product by location; Weather report and predictions; Weather impact factor by customer; Predicted Customer Revenue (Daily); Predicted Customer Profit (Daily); …; Weighted Average Sales by customer (Hourly); Weighted Average Sales (Daily, Weekly, Monthly)
  • 35-36. ModelOps: Monitoring a Machine Learning Model in Production. You can use the modern data platform to monitor problems and alert or act on them. Prediction Accuracy: Customer Revenue (Weekly, Monthly); Enhanced Sales Transactions; Weighted Average Sales by customer (Daily); Weighted Average Sales (Weekly, Monthly, etc.); Buying pattern by customer (used for fraud in Enhance); Enhanced Product Manufacturing Records; Average cost per product by location; Weather report and predictions; Weather impact factor by customer; Predicted Customer Revenue (Daily); Predicted Customer Revenue (Weekly, Monthly); Predicted Customer Profit (Daily); …; Weighted Average Sales by customer (Hourly); Weighted Average Sales (Daily, Weekly, Monthly). Train, Validate, Deploy. Alert the team!
  • 37. ModelOps: Model-Building Flow w/ Artifact Management. Manage notebook development in source control. Use S3 Object Versioning to ensure data consistency. Build Docker images for specific algorithms in CodeBuild, store in ECR. Manage model artifacts in S3 using artifact repository structure (ivy, maven). Capture data transformations from notebooks for production as either lambda step functions or EMR Steps. Leverage canary deployment features of the platform to test real-world effectiveness before cutting over.
  • 38. Table of Contents: Machine Learning and The Business; Lay the Foundation with Data Engineering; Operationalizing Data Science; Questions; Appendix
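
The three ModelOps steps on the deployment slides (build a model, expose a score API, expose a retrain/validate/deploy API) can be sketched roughly as follows. This is a minimal illustration rather than the deck's implementation: `train_linear`, `ModelService`, and the "deploy only if the candidate is no worse on a holdout set" rule are hypothetical, the model is a one-feature least-squares line, and the HTTP layer and timer job are left out so the handlers stay plain functions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence, Tuple

Example = Tuple[float, float]  # (feature, target); one feature keeps the sketch small


def train_linear(data: Sequence[Example]) -> Callable[[float], float]:
    """Step 1 stand-in: fit y = a*x + b by ordinary least squares."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    var = sum((x - x_bar) ** 2 for x in xs)
    a = sum((x - x_bar) * (y - y_bar) for x, y in data) / var if var else 0.0
    b = y_bar - a * x_bar
    return lambda x: a * x + b


def mean_abs_error(model: Callable[[float], float], data: Sequence[Example]) -> float:
    """Average absolute difference between predictions and targets."""
    return mean(abs(model(x) - y) for x, y in data)


@dataclass
class ModelService:
    """Holds the deployed model; each method is what a REST handler might call."""
    model: Callable[[float], float]
    version: int = 1

    def score(self, x: float) -> dict:
        # Step 2: body a hypothetical score endpoint could return.
        return {"prediction": self.model(x), "model_version": self.version}

    def retrain(self, train: Sequence[Example], holdout: Sequence[Example]) -> dict:
        # Step 3: train a candidate, validate it, and deploy only if it is
        # no worse than the current model on the holdout data (assumed rule).
        candidate = train_linear(train)
        current_error = mean_abs_error(self.model, holdout)
        candidate_error = mean_abs_error(candidate, holdout)
        deployed = candidate_error <= current_error
        if deployed:
            self.model = candidate
            self.version += 1
        return {
            "deployed": deployed,
            "candidate_error": candidate_error,
            "current_error": current_error,
            "model_version": self.version,
        }
```

A scheduler (the "timer job" on the slide) would call `retrain` periodically with fresh data; keeping validation inside that call means a bad retraining run leaves the deployed model untouched.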

Editor's Notes

  1. http://mattturck.com/the-power-of-data-network-effects/
  2. Organizations now collect an enormous volume of data about their customers, products, and processes. Those organizations with the capability to turn data into actionable insights leap ahead of their competitors: delighting customers, reducing costs, opening new markets. Human experts and traditional decision support tools become overwhelmed as the amount of data increases. Distinguishing subtle differences across hundreds or thousands of interacting factors is difficult. The sheer volume of data to be analyzed becomes a limiting factor. Machine learning is being used by a rapidly increasing number of organizations to overcome the challenge of generating insights from large data sets. Cheap computing power and easy accessibility to advanced algorithms have reduced the barrier to entry. ML is not just for academics or bleeding-edge companies any more.
  3. Come up with the examples of what each group should be doing
  4. There are three kinds of data problems that really push the big data envelope. They’re commonly known as the Vs of Big Data. Most people agree on at least 3 of them because they help us categorize the technical problem, while others tend to be dimensions of the solution or business problem.
  5. The value is to decouple the storage from the process
  7. Pariveda Solutions has developed a Big Data/AWS reference architecture referred to as the “analytics pipeline”. It expands on the familiar [ Extract ] → [ Transform ] → [ Load ] pattern. Terminology-wise, this maps directly to the Analytics Pipeline: Extract = Ingest/Collect; Transform = Store (Model)/Process (Enhance/Transform); Load = Consume/Visualize (Distribute).
  12. The Data Science work provides insight and value, but how do we operationalize the work? This is the challenge. We haven’t necessarily completely figured out how the data science work is part of the agile software development process.
  13. More Info: Jon Landers, Ryan Gross. One thing we didn’t cover in the prior section is how to get your models into production. Most of the marketecture around Machine Learning will tell you that it’s this easy to deploy your first model as a RESTful service! But then you remember that you’re modeling the real world, and you’ve read some HBR articles about how the real world changes fast, so you’ll need to retrain the model to keep up. So you build an API to trigger retraining and validation, and trigger it on a timer. Then you build automated deployment of the new model and everyone is happy, the marketecture was right!
  18. More Info: Ryan Gross. Let’s go back to your prediction pipeline. When the next transaction, day, week, or month goes by, we can check our predictions against the actual values, detecting the need to re-train our models using the new actuals. If the prediction is off by too much, we can alert the team so they can figure out why. If (as is common) it’s just drift because the real world changed, we can re-train the model.
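
The monitoring idea in the notes above (compare predictions with actuals once they arrive, re-train on ordinary drift, alert the team when the miss is too large) could look something like the sketch below. The error metric (mean absolute percentage error), the function names, and both thresholds are assumptions chosen for illustration; the notes do not say how "off by too much" is measured.

```python
from typing import Sequence


def mean_abs_pct_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error over pairs whose actual value is non-zero."""
    pairs = [(p, a) for p, a in zip(predicted, actual) if a != 0]
    if not pairs:
        raise ValueError("need at least one non-zero actual value")
    return sum(abs(p - a) / abs(a) for p, a in pairs) / len(pairs)


def monitoring_action(
    predicted: Sequence[float],
    actual: Sequence[float],
    retrain_at: float = 0.10,  # hypothetical threshold: treat as drift, re-train
    alert_at: float = 0.25,    # hypothetical threshold: too far off, alert the team
) -> str:
    """Return 'ok', 'retrain', or 'alert' for one period of predictions."""
    error = mean_abs_pct_error(predicted, actual)
    if error >= alert_at:
        return "alert"
    if error >= retrain_at:
        return "retrain"
    return "ok"
```

For example, `monitoring_action(predicted_revenue, actual_revenue)` could run each time a day or week of actuals lands in the data platform, with "retrain" wired to the retraining API and "alert" to a notification channel.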