SlideShare uma empresa Scribd logo
1 de 36
Baixar para ler offline
1
Continuous
Intelligence
Keeping your AI Application
in Production
Arif Wider
Emily Gorcenski
NDC Porto, April 23-24, 2020
7000+ technologists with 43 offices in 14 countries
Partner for technology driven business transformation
Barcelona - Madrid - London - Manchester - Berlin - Hamburg - Munich - Cologne
#1
in Agile and
Continuous Delivery
100+
books written
©ThoughtWorks 2020
©ThoughtWorks 2020
TECHNIQUES
Continuous delivery
for machine
learning (CD4ML)
models
#8
TRIAL
8
©ThoughtWorks 2020
TECHNIQUES
Continuous delivery
for machine
learning (CD4ML)
models
#8
TRIAL
8
©ThoughtWorks 2020
7
WHERE
IT ALL
STARTED
©ThoughtWorks 2020
8
Data Scientists
Developers
9
Data Scientists
Developers
prediction
model
Feedback,
requests
10
Data Scientists
Developers
prediction
model
Feedback,
requests
SOUNDS LIKE DEV/OPS?
11
Data Scientists
Developers
12
LET’S DO
CONTINUOUS
DELIVERY!
CONTINUOUS DELIVERY PIPELINES
13
Prediction Model Pipeline
Web Application Pipeline
NEW CHALLENGES
14
NEW CHALLENGES
15
→ How to version control training data?
→ Training data and prediction models don’t fit into Git :-(
→ Model re-training slows down the entire continuous delivery server!
→ Data scientists want to evaluate several solutions at the same time...
→ ...and they use analytics notebooks which are hard to version control!
→ How to unit test data science code that is tied to changing data?
→ How to prevent behaviour changes of the model to break the application?
WHAT ARE THE REASONS?
16
Continuous Delivery is the ability to get
changes of all types — including new features,
configuration changes, bug fixes and
experiments — into production, or into the
hands of users, safely and quickly in a
sustainable way.
Jez Humble & Dave Farley
MANY SOURCES OF CHANGE
18
ModelData Code
+ +
Schema
Sampling over Time
Volume
...
Research, Experiments
Training on New Data
Performance
...
New Features
Bug Fixes
Dependencies
...
Icons created by Noura Mbarki and I Putu Kharismayadi from Noun Project
©ThoughtWorks 2020
NDC Porto (Virtual!), April 23, 2020
19
Code is a broad concept. It can refer to
the source code of our services and
systems. It can represent our
infrastructure. It can refer to the code
used to access, transform, or prepare
data. It can refer to the code used to
train, validate, and assess models.
Code
HOW DOES CODE
CHANGE VERSION?
Bug fixes
New features
Dependencies
Operating systems
Scope change
Etc.
This axis is the best-understood axis of
version change for machine learning
systems.
©ThoughtWorks 2020
NDC Porto (Virtual!), April 23, 2020
20
Model
Model here refers to both the modeling
approach (e.g. random forest
classification) and the actual artifact
produced (e.g. a pkl file). A model is a
product of hyperparameters (not varied
during training) and parameters
(computed during training).
HOW DOES A MODEL
CHANGE VERSION?
Research — trying new approaches
Dependencies
New data
Performance improvements
Etc.
This axis is not as well-understood and
machine learning, as a field, is changing so
rapidly that many models can be expected
to be obsolete in short order.
©ThoughtWorks 2020
NDC Porto (Virtual!), April 23, 2020
21
Data
DATA IS MORE COMPLICATED
©ThoughtWorks 2020
NDC Porto (Virtual!), April 23, 2020
22
Data
DATA IS MORE COMPLICATED
Random Sample by Becris from the Noun Project, Puzzle by
shashank singh from the Noun Project
Sampling Schema
Data is an assessment of the state of the universe at a point in time. It has a measurement and a shape.
©ThoughtWorks 2020
NDC Porto (Virtual!), April 23, 2020
23
Sampling
Sampling refers to the values in the data
themselves. We can’t measure everything,
but what we do measure is a sample of
the universe. These values have
properties like support, distribution,
cardinality, etc.
HOW DOES SAMPLING
CHANGE VERSION?
Time—it marches ever on
Extrinsic changes in the universe
Inserts, Deletions, Updates
Etc.
It is very difficult to specify or detect shifts
in “versions” here except by example. We
often think of this as moving forward with
time, but that versions are not necessarily
time-dependent.
©ThoughtWorks 2020
NDC Porto (Virtual!), April 23, 2020
24
Schema
Schema refers to the shape of our data.
This can be a database schema, e.g. a
table/column/constraint definition, or it
could just refer to shape of the incoming
data (e.g. JSON fields, XML schema, etc).
HOW DOES SCHEMA
CHANGE VERSION?
Software updates
Requirements changes
Migrations
Etc.
It is very difficult to specify or detect shifts
in “versions” here except by example. We
often think of this as moving forward with
time, but that versions are not necessarily
time-dependent.
©ThoughtWorks 2020
NDC Porto (Virtual!), April 23, 2020
25
ML SYSTEMS HAVE FOUR “VERSION AXES”
Any data-driven service can experience version drift along any of these axes independently or jointly.
Schema
Sampling
Model
Code
©ThoughtWorks 2020
NDC Porto (Virtual!), April 23, 2020
DIFFERENT WORKFLOWS
26
master
DEVELOPERS
DIFFERENT WORKFLOWS
27
master
change-max-depth
try-random-forest
©ThoughtWorks 2020
DATA SCIENTISTS
MORE TYPES OF PIPELINES
DATA PIPELINE
ML PIPELINE DEPLOYMENT
PIPELINE
PUTTING EVERYTHING TOGETHER
29
Data Science,
Model
Building
Training Data
Source Code
+
Executables
Model
Evaluation
Productionize
Model
Integration
Testing
Deployment
Test Data
Model +
parameters
CD Tools and Repositories
DiscoverableandAccessibleData
Monitoring
©ThoughtWorks 2020
Production Data
WHAT DO WE NEED IN OUR STACK?
30
Doing CD with Machine Learning is still a hard problem
MODEL
PERFORMANCE
ASSESSMENT
VERSION
CONTROL AND
ARTIFACT
REPOSITORIES
©ThoughtWorks 2020
MONITORING
AND
OBSERVABILITY
DISCOVERABLE
AND
ACCESSIBLE
DATA
CONTINUOUS
DELIVERY
ORCHESTRATION
TO COMBINE
PIPELINES
INFRASTRUCTURE
FOR MULTIPLE
ENVIRONMENTS
AND
EXPERIMENTS
WHAT DO WE NEED IN OUR STACK?
31
Doing CD with Machine Learning is still a hard problem
MODEL
PERFORMANCE
ASSESSMENT
VERSION
CONTROL AND
ARTIFACT
REPOSITORIES
©ThoughtWorks 2020
MONITORING
AND
OBSERVABILITY
DISCOVERABLE
AND
ACCESSIBLE
DATA
CONTINUOUS
DELIVERY
ORCHESTRATION
TO COMBINE
PIPELINES
INFRASTRUCTURE
FOR MULTIPLE
ENVIRONMENTS
AND
EXPERIMENTS
WHAT DO WE NEED IN OUR STACK?
32
Doing CD with Machine Learning is still a hard problem
MODEL
PERFORMANCE
ASSESSMENT
VERSION
CONTROL AND
ARTIFACT
REPOSITORIES
©ThoughtWorks 2020
MONITORING
AND
OBSERVABILITY
DISCOVERABLE
AND
ACCESSIBLE
DATA
CONTINUOUS
DELIVERY
ORCHESTRATION
TO COMBINE
PIPELINES
INFRASTRUCTURE
FOR MULTIPLE
ENVIRONMENTS
AND
EXPERIMENTS
DEMO
DEMO
33
Data Scientist
Develops ML Model
Test and productionize
the model
Deploy to production
servers©ThoughtWorks 2020
Application
34
YOUR OBJECTIVE FUNCTION MUST LINK TO
BUSINESS VALUE
35
©ThoughtWorks 2020
NDC Porto (Virtual!), April 23, 2020
3636
THANK YOU!
Arif Wider
awider@thoughtworks.com
Emily Gorcenski
egorcens@thoughtworks.com
©ThoughtWorks 2020
join.thoughtworks.com

Mais conteúdo relacionado

Mais procurados

The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsThe Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
Denodo
 

Mais procurados (20)

The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern AnalyticsThe Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
The Role of the Logical Data Fabric in a Unified Platform for Modern Analytics
 
Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7Take your Data Management Practice to the Next Level with Denodo 7
Take your Data Management Practice to the Next Level with Denodo 7
 
Minimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data VirtualizationMinimizing the Complexities of Machine Learning with Data Virtualization
Minimizing the Complexities of Machine Learning with Data Virtualization
 
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
Big Data Fabric for At-Scale Real-Time Analysis by Edwin Robbins
 
Data Virtualization: From Zero to Hero (Middle East)
Data Virtualization: From Zero to Hero (Middle East)Data Virtualization: From Zero to Hero (Middle East)
Data Virtualization: From Zero to Hero (Middle East)
 
Multi-Cloud-Datenintegration mit Datenvirtualisierung
Multi-Cloud-Datenintegration mit DatenvirtualisierungMulti-Cloud-Datenintegration mit Datenvirtualisierung
Multi-Cloud-Datenintegration mit Datenvirtualisierung
 
Data Virtualization: From Zero to Hero
Data Virtualization: From Zero to HeroData Virtualization: From Zero to Hero
Data Virtualization: From Zero to Hero
 
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data StrategyDenodo DataFest 2017: Business Needs for a Fast Data Strategy
Denodo DataFest 2017: Business Needs for a Fast Data Strategy
 
The Virtualization of Clouds - The New Enterprise Data Architecture Opportunity
The Virtualization of Clouds - The New Enterprise Data Architecture OpportunityThe Virtualization of Clouds - The New Enterprise Data Architecture Opportunity
The Virtualization of Clouds - The New Enterprise Data Architecture Opportunity
 
Why Data Virtualization? An Introduction
Why Data Virtualization? An IntroductionWhy Data Virtualization? An Introduction
Why Data Virtualization? An Introduction
 
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical DemonstrationMaximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
Maximizing Data Lake ROI with Data Virtualization: A Technical Demonstration
 
Datamesh community meetup 28th jan 2021
Datamesh community meetup 28th jan 2021Datamesh community meetup 28th jan 2021
Datamesh community meetup 28th jan 2021
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
Data Mesh in Practice: How Europe’s Leading Online Platform for Fashion Goes ...
 
Why Data Virtualization? An Introduction.
Why Data Virtualization? An Introduction.Why Data Virtualization? An Introduction.
Why Data Virtualization? An Introduction.
 
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
Dr. Christian Kurze from Denodo, "Data Virtualization: Fulfilling the Promise...
 
Denodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data VirtualizationDenodo DataFest 2017: Conquering the Edge with Data Virtualization
Denodo DataFest 2017: Conquering the Edge with Data Virtualization
 
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT IntegrationDenodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
 
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan...
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 

Semelhante a Continuous Intelligence: Keeping your AI Application in Production

Legacy Migration Overview
Legacy Migration OverviewLegacy Migration Overview
Legacy Migration Overview
Bambordé Baldé
 
Legacy Migration
Legacy MigrationLegacy Migration
Legacy Migration
WORPCLOUD LTD
 

Semelhante a Continuous Intelligence: Keeping your AI Application in Production (20)

Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
Limited Budget but Effective End to End MLOps Practices (Machine Learning Mod...
 
CD4ML and the challenges of testing and quality in ML systems
CD4ML and the challenges of testing and quality in ML systemsCD4ML and the challenges of testing and quality in ML systems
CD4ML and the challenges of testing and quality in ML systems
 
CAST Imaging: Map & Master Your Software
CAST Imaging: Map & Master Your SoftwareCAST Imaging: Map & Master Your Software
CAST Imaging: Map & Master Your Software
 
Production machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scaleProduction machine learning: Managing models, workflows and risk at scale
Production machine learning: Managing models, workflows and risk at scale
 
Legacy Migration Overview
Legacy Migration OverviewLegacy Migration Overview
Legacy Migration Overview
 
Legacy Migration
Legacy MigrationLegacy Migration
Legacy Migration
 
IBM Think Milano
IBM Think MilanoIBM Think Milano
IBM Think Milano
 
The REMICS model-driven process for migrating legacy applications to the cloud
The REMICS model-driven process for migrating legacy applications to the cloudThe REMICS model-driven process for migrating legacy applications to the cloud
The REMICS model-driven process for migrating legacy applications to the cloud
 
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...Extending open source and hybrid cloud to drive OT transformation - Future Oi...
Extending open source and hybrid cloud to drive OT transformation - Future Oi...
 
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdfDagster - DataOps and MLOps for Machine Learning Engineers.pdf
Dagster - DataOps and MLOps for Machine Learning Engineers.pdf
 
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
¿Cómo modernizar una arquitectura de TI con la virtualización de datos?
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
DataOps: Control-M's role in data pipeline orchestration
DataOps: Control-M's role in data pipeline orchestrationDataOps: Control-M's role in data pipeline orchestration
DataOps: Control-M's role in data pipeline orchestration
 
From Model-based to Model and Simulation-based Systems Architectures
From Model-based to Model and Simulation-based Systems ArchitecturesFrom Model-based to Model and Simulation-based Systems Architectures
From Model-based to Model and Simulation-based Systems Architectures
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
 
Data Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data ScienceData Science as a Service: Intersection of Cloud Computing and Data Science
Data Science as a Service: Intersection of Cloud Computing and Data Science
 
Multi datastores - CLOSER'14
Multi datastores - CLOSER'14Multi datastores - CLOSER'14
Multi datastores - CLOSER'14
 
Notes on Deploying Machine-learning Models at Scale
Notes on Deploying Machine-learning Models at ScaleNotes on Deploying Machine-learning Models at Scale
Notes on Deploying Machine-learning Models at Scale
 
Engineering 4.0: Digitization through task automation and reuse
Engineering 4.0:  Digitization through task automation and reuseEngineering 4.0:  Digitization through task automation and reuse
Engineering 4.0: Digitization through task automation and reuse
 
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
[DSC Europe 22] Reproducibility and Versioning of ML Systems - Spela Poklukar
 

Mais de Dr. Arif Wider

Mais de Dr. Arif Wider (10)

Data Mesh - It's not about technology, it's about people
Data Mesh - It's not about technology, it's about peopleData Mesh - It's not about technology, it's about people
Data Mesh - It's not about technology, it's about people
 
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...
 
Continuous Intelligence: Moving Machine Learning into Production Reliably
Continuous Intelligence: Moving Machine Learning into Production ReliablyContinuous Intelligence: Moving Machine Learning into Production Reliably
Continuous Intelligence: Moving Machine Learning into Production Reliably
 
Continuous Intelligence: Keeping your AI Application in Production
Continuous Intelligence: Keeping your AI Application in ProductionContinuous Intelligence: Keeping your AI Application in Production
Continuous Intelligence: Keeping your AI Application in Production
 
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & AnalyticsDataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
 
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & AnalyticsDataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
 
DataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices
DataDevOps - A Manifesto on Shared Data Responsibility in Times of MicroservicesDataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices
DataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices
 
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...
 
A High-Performance Solution to Microservice UI Composition @ XConf Hamburg
A High-Performance Solution to Microservice UI Composition @ XConf HamburgA High-Performance Solution to Microservice UI Composition @ XConf Hamburg
A High-Performance Solution to Microservice UI Composition @ XConf Hamburg
 
An Unexpected Solution to Microservices UI Composition
An Unexpected Solution to Microservices UI CompositionAn Unexpected Solution to Microservices UI Composition
An Unexpected Solution to Microservices UI Composition
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 

Continuous Intelligence: Keeping your AI Application in Production

  • 1. 1 Continuous Intelligence Keeping your AI Application in Production Arif Wider Emily Gorcenski NDC Porto, April 23-24, 2020
  • 2. 7000+ technologists with 43 offices in 14 countries Partner for technology driven business transformation Barcelona - Madrid - London - Manchester - Berlin - Hamburg - Munich - Cologne
  • 3. #1 in Agile and Continuous Delivery 100+ books written ©ThoughtWorks 2020
  • 5. TECHNIQUES Continuous delivery for machine learning (CD4ML) models #8 TRIAL 8 ©ThoughtWorks 2020
  • 6. TECHNIQUES Continuous delivery for machine learning (CD4ML) models #8 TRIAL 8 ©ThoughtWorks 2020
  • 13. CONTINUOUS DELIVERY PIPELINES 13 Prediction Model Pipeline Web Application Pipeline
  • 15. NEW CHALLENGES 15 → How to version control training data? → Training data and prediction models don’t fit into Git :-( → Model re-training slows down the entire continuous delivery server! → Data scientists want to evaluate several solutions at the same time... → ...and they use analytics notebooks which are hard to version control! → How to unit test data science code that is tied to changing data? → How to prevent behaviour changes of the model to break the application?
  • 16. WHAT ARE THE REASONS? 16
  • 17. Continuous Delivery is the ability to get changes of all types — including new features, configuration changes, bug fixes and experiments — into production, or into the hands of users, safely and quickly in a sustainable way. Jez Humble & Dave Farley
  • 18. MANY SOURCES OF CHANGE 18 ModelData Code + + Schema Sampling over Time Volume ... Research, Experiments Training on New Data Performance ... New Features Bug Fixes Dependencies ... Icons created by Noura Mbarki and I Putu Kharismayadi from Noun Project ©ThoughtWorks 2020 NDC Porto (Virtual!), April 23, 2020
  • 19. 19 Code is a broad concept. It can refer to the source code of our services and systems. It can represent our infrastructure. It can refer to the code used to access, transform, or prepare data. It can refer to the code used to train, validate, and assess models. Code HOW DOES CODE CHANGE VERSION? Bug fixes New features Dependencies Operating systems Scope change Etc. This axis is the best-understood axis of version change for machine learning systems. ©ThoughtWorks 2020 NDC Porto (Virtual!), April 23, 2020
  • 20. 20 Model Model here refers to both the modeling approach (e.g. random forest classification) and the actual artifact produced (e.g. a pkl file). A model is a product of hyperparameters (not varied during training) and parameters (computed during training). HOW DOES A MODEL CHANGE VERSION? Research — trying new approaches Dependencies New data Performance improvements Etc. This axis is not as well-understood and machine learning, as a field, is changing so rapidly that many models can be expected to be obsolete in short order. ©ThoughtWorks 2020 NDC Porto (Virtual!), April 23, 2020
  • 21. 21 Data DATA IS MORE COMPLICATED ©ThoughtWorks 2020 NDC Porto (Virtual!), April 23, 2020
  • 22. 22 Data DATA IS MORE COMPLICATED Random Sample by Becris from the Noun Project, Puzzle by shashank singh from the Noun Project Sampling Schema Data is an assessment of the state of the universe at a point in time. It has a measurement and a shape. ©ThoughtWorks 2020 NDC Porto (Virtual!), April 23, 2020
  • 23. 23 Sampling Sampling refers to the values in the data themselves. We can’t measure everything, but what we do measure is a sample of the universe. These values have properties like support, distribution, cardinality, etc. HOW DOES SAMPLING CHANGE VERSION? Time—it marches ever on Extrinsic changes in the universe Inserts, Deletions, Updates Etc. It is very difficult to specify or detect shifts in “versions” here except by example. We often think of this as moving forward with time, but that versions are not necessarily time-dependent. ©ThoughtWorks 2020 NDC Porto (Virtual!), April 23, 2020
  • 24. 24 Schema Schema refers to the shape of our data. This can be a database schema, e.g. a table/column/constraint definition, or it could just refer to shape of the incoming data (e.g. JSON fields, XML schema, etc). HOW DOES SCHEMA CHANGE VERSION? Software updates Requirements changes Migrations Etc. It is very difficult to specify or detect shifts in “versions” here except by example. We often think of this as moving forward with time, but that versions are not necessarily time-dependent. ©ThoughtWorks 2020 NDC Porto (Virtual!), April 23, 2020
  • 25. 25 ML SYSTEMS HAVE FOUR “VERSION AXES” Any data-driven service can experience version drift along any of these axes independently or jointly. Schema Sampling Model Code ©ThoughtWorks 2020 NDC Porto (Virtual!), April 23, 2020
  • 28. MORE TYPES OF PIPELINES DATA PIPELINE ML PIPELINE DEPLOYMENT PIPELINE
  • 29. PUTTING EVERYTHING TOGETHER 29 Data Science, Model Building Training Data Source Code + Executables Model Evaluation Productionize Model Integration Testing Deployment Test Data Model + parameters CD Tools and Repositories DiscoverableandAccessibleData Monitoring ©ThoughtWorks 2020 Production Data
  • 30. WHAT DO WE NEED IN OUR STACK? 30 Doing CD with Machine Learning is still a hard problem MODEL PERFORMANCE ASSESSMENT VERSION CONTROL AND ARTIFACT REPOSITORIES ©ThoughtWorks 2020 MONITORING AND OBSERVABILITY DISCOVERABLE AND ACCESSIBLE DATA CONTINUOUS DELIVERY ORCHESTRATION TO COMBINE PIPELINES INFRASTRUCTURE FOR MULTIPLE ENVIRONMENTS AND EXPERIMENTS
  • 31. WHAT DO WE NEED IN OUR STACK? 31 Doing CD with Machine Learning is still a hard problem MODEL PERFORMANCE ASSESSMENT VERSION CONTROL AND ARTIFACT REPOSITORIES ©ThoughtWorks 2020 MONITORING AND OBSERVABILITY DISCOVERABLE AND ACCESSIBLE DATA CONTINUOUS DELIVERY ORCHESTRATION TO COMBINE PIPELINES INFRASTRUCTURE FOR MULTIPLE ENVIRONMENTS AND EXPERIMENTS
  • 32. WHAT DO WE NEED IN OUR STACK? 32 Doing CD with Machine Learning is still a hard problem MODEL PERFORMANCE ASSESSMENT VERSION CONTROL AND ARTIFACT REPOSITORIES ©ThoughtWorks 2020 MONITORING AND OBSERVABILITY DISCOVERABLE AND ACCESSIBLE DATA CONTINUOUS DELIVERY ORCHESTRATION TO COMBINE PIPELINES INFRASTRUCTURE FOR MULTIPLE ENVIRONMENTS AND EXPERIMENTS DEMO
  • 33. DEMO 33 Data Scientist Develops ML Model Test and productionize the model Deploy to production servers©ThoughtWorks 2020 Application
  • 34. 34
  • 35. YOUR OBJECTIVE FUNCTION MUST LINK TO BUSINESS VALUE 35 ©ThoughtWorks 2020 NDC Porto (Virtual!), April 23, 2020
  • 36. 3636 THANK YOU! Arif Wider awider@thoughtworks.com Emily Gorcenski egorcens@thoughtworks.com ©ThoughtWorks 2020 join.thoughtworks.com