
Continuous Intelligence: Keeping your AI Application in Production

A talk by Arif Wider & Emily Gorcenski presented at NDC Porto '20

Abstract:
It is already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, Continuous Delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months. Nevertheless, in the data science world Continuous Delivery has rarely been applied holistically.

This is partly due to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as-is to machine learning projects. Learn how we used our expertise in both fields to adapt practices and tools to allow for Continuous Intelligence: the practice of delivering AI applications continuously.


  1. Continuous Intelligence: Keeping your AI Application in Production. Arif Wider, Emily Gorcenski. NDC Porto, April 23-24, 2020
  2. 7000+ technologists with 43 offices in 14 countries. Partner for technology-driven business transformation. Barcelona - Madrid - London - Manchester - Berlin - Hamburg - Munich - Cologne
  3. #1 in Agile and Continuous Delivery. 100+ books written. ©ThoughtWorks 2020
  5. TECHNIQUES: Continuous delivery for machine learning (CD4ML) models, #8, TRIAL
  7. WHERE IT ALL STARTED
  8. Data Scientists, Developers
  9. Data Scientists, Developers: prediction model; feedback, requests
  10. Data Scientists, Developers: prediction model; feedback, requests. SOUNDS LIKE DEV/OPS?
  11. Data Scientists, Developers
  12. LET’S DO CONTINUOUS DELIVERY!
  13. CONTINUOUS DELIVERY PIPELINES: Prediction Model Pipeline, Web Application Pipeline
  14. NEW CHALLENGES
  15. NEW CHALLENGES
      → How to version control training data?
      → Training data and prediction models don’t fit into Git :-(
      → Model re-training slows down the entire continuous delivery server!
      → Data scientists want to evaluate several solutions at the same time...
      → ...and they use analytics notebooks which are hard to version control!
      → How to unit test data science code that is tied to changing data?
      → How to prevent behaviour changes of the model from breaking the application?
  16. WHAT ARE THE REASONS?
  17. "Continuous Delivery is the ability to get changes of all types — including new features, configuration changes, bug fixes and experiments — into production, or into the hands of users, safely and quickly in a sustainable way." Jez Humble & Dave Farley
  18. MANY SOURCES OF CHANGE: Data + Model + Code. Data: schema, sampling over time, volume, ... Model: research, experiments, training on new data, performance, ... Code: new features, bug fixes, dependencies, ... Icons created by Noura Mbarki and I Putu Kharismayadi from Noun Project. ©ThoughtWorks 2020 NDC Porto (Virtual!), April 23, 2020
  19. Code. Code is a broad concept. It can refer to the source code of our services and systems. It can represent our infrastructure. It can refer to the code used to access, transform, or prepare data. It can refer to the code used to train, validate, and assess models. HOW DOES CODE CHANGE VERSION? Bug fixes, new features, dependencies, operating systems, scope change, etc. This axis is the best-understood axis of version change for machine learning systems.
  20. Model. Model here refers to both the modeling approach (e.g. random forest classification) and the actual artifact produced (e.g. a pkl file). A model is a product of hyperparameters (not varied during training) and parameters (computed during training). HOW DOES A MODEL CHANGE VERSION? Research (trying new approaches), dependencies, new data, performance improvements, etc. This axis is not as well understood, and machine learning, as a field, is changing so rapidly that many models can be expected to be obsolete in short order.
  21. Data: DATA IS MORE COMPLICATED
  22. Data: DATA IS MORE COMPLICATED. Sampling, Schema. Data is an assessment of the state of the universe at a point in time. It has a measurement and a shape. Icons: Random Sample by Becris from the Noun Project, Puzzle by shashank singh from the Noun Project.
  23. Sampling. Sampling refers to the values in the data themselves. We can’t measure everything, but what we do measure is a sample of the universe. These values have properties like support, distribution, cardinality, etc. HOW DOES SAMPLING CHANGE VERSION? Time (it marches ever on), extrinsic changes in the universe, inserts, deletions, updates, etc. It is very difficult to specify or detect shifts in “versions” here except by example. We often think of this as moving forward with time, but versions are not necessarily time-dependent.
  24. Schema. Schema refers to the shape of our data. This can be a database schema, e.g. a table/column/constraint definition, or it could just refer to the shape of the incoming data (e.g. JSON fields, XML schema, etc). HOW DOES SCHEMA CHANGE VERSION? Software updates, requirements changes, migrations, etc. It is very difficult to specify or detect shifts in “versions” here except by example. We often think of this as moving forward with time, but versions are not necessarily time-dependent.
  25. ML SYSTEMS HAVE FOUR “VERSION AXES”: Schema, Sampling, Model, Code. Any data-driven service can experience version drift along any of these axes independently or jointly.
  26. DIFFERENT WORKFLOWS: DEVELOPERS. Branches: master
  27. DIFFERENT WORKFLOWS: DATA SCIENTISTS. Branches: master, change-max-depth, try-random-forest
  28. MORE TYPES OF PIPELINES: DATA PIPELINE, ML PIPELINE, DEPLOYMENT PIPELINE
  29. PUTTING EVERYTHING TOGETHER: Data Science, Model Building; Training Data; Source Code + Executables; Model Evaluation; Productionize Model; Integration Testing; Deployment; Test Data; Model + parameters; CD Tools and Repositories; Discoverable and Accessible Data; Monitoring; Production Data
  30. WHAT DO WE NEED IN OUR STACK? Doing CD with Machine Learning is still a hard problem. MODEL PERFORMANCE ASSESSMENT; VERSION CONTROL AND ARTIFACT REPOSITORIES; MONITORING AND OBSERVABILITY; DISCOVERABLE AND ACCESSIBLE DATA; CONTINUOUS DELIVERY ORCHESTRATION TO COMBINE PIPELINES; INFRASTRUCTURE FOR MULTIPLE ENVIRONMENTS AND EXPERIMENTS
  33. DEMO: Data Scientist Develops ML Model; Test and productionize the model; Deploy to production servers; Application
  35. YOUR OBJECTIVE FUNCTION MUST LINK TO BUSINESS VALUE
  36. THANK YOU! Arif Wider (awider@thoughtworks.com), Emily Gorcenski (egorcens@thoughtworks.com). join.thoughtworks.com
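
Slide 25 names four version axes (schema, sampling, model, code) that can drift independently or jointly. One way to make that concrete is to record an identifier per axis and compare two records. The following is only a minimal sketch, not the tooling shown in the talk: `SystemVersion`, `describe`, and `digest` are hypothetical names, the code revision is assumed to be something like a Git commit id, and the data is assumed to be small enough to hash in memory.

```python
import hashlib
import json
from dataclasses import dataclass


def digest(payload: bytes) -> str:
    """Short, stable content hash used as a version identifier."""
    return hashlib.sha256(payload).hexdigest()[:12]


@dataclass(frozen=True)
class SystemVersion:
    code: str      # assumed: a source revision id, e.g. a Git commit
    model: str     # hash of the serialized model artifact
    schema: str    # hash of the field names and value types
    sampling: str  # hash of the actual rows

    def changed_axes(self, other):
        """Names of the axes on which two versions differ."""
        return [
            axis for axis in ("code", "model", "schema", "sampling")
            if getattr(self, axis) != getattr(other, axis)
        ]


def describe(code_rev, model_bytes, rows):
    """Build a SystemVersion from a revision id, model bytes, and a list of dict rows."""
    schema = sorted({(k, type(v).__name__) for row in rows for k, v in row.items()})
    return SystemVersion(
        code=code_rev,
        model=digest(model_bytes),
        schema=digest(json.dumps(schema).encode()),
        sampling=digest(json.dumps(rows, sort_keys=True).encode()),
    )


# Illustrative data only: same code and model, one extra row of data.
before = describe("abc123", b"model-v1", [{"age": 31, "city": "Porto"}])
after = describe("abc123", b"model-v1",
                 [{"age": 31, "city": "Porto"}, {"age": 45, "city": "Berlin"}])
print(before.changed_axes(after))  # ['sampling']
```

Hashing the rows treats any insert, deletion, or update as a new sampling version, which matches the slide's point that such versions are easier to identify by example than to specify in advance.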

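Slides 23 and 24 note that shifts in sampling and schema are hard to specify except by example. A sketch of that "by example" idea is to compare a new batch against a reference batch. Everything here is an assumption made for illustration: the function names are hypothetical, rows are plain dictionaries, the sampling check only covers categorical fields, and the 0.1 tolerance is arbitrary.

```python
from collections import Counter


def schema_of(rows):
    """Map each field name to the set of Python type names seen for it."""
    shape = {}
    for row in rows:
        for field, value in row.items():
            shape.setdefault(field, set()).add(type(value).__name__)
    return shape


def schema_drift(reference, current):
    """Fields that were added, removed, or changed type between two batches."""
    ref, cur = schema_of(reference), schema_of(current)
    return {
        "added": sorted(cur.keys() - ref.keys()),
        "removed": sorted(ref.keys() - cur.keys()),
        "retyped": sorted(f for f in ref.keys() & cur.keys() if ref[f] != cur[f]),
    }


def sampling_drift(reference, current, field, tolerance=0.1):
    """Categories whose relative frequency moved by more than `tolerance`."""
    def frequencies(rows):
        counts = Counter(row[field] for row in rows if field in row)
        total = sum(counts.values()) or 1  # avoid dividing by zero on empty batches
        return {value: n / total for value, n in counts.items()}

    ref, cur = frequencies(reference), frequencies(current)
    return sorted(
        value for value in ref.keys() | cur.keys()
        if abs(ref.get(value, 0.0) - cur.get(value, 0.0)) > tolerance
    )


# Illustrative batches only.
reference = [{"plan": "free"}, {"plan": "free"}, {"plan": "paid"}, {"plan": "free"}]
current = [
    {"plan": "paid", "region": "EU"},
    {"plan": "paid", "region": "EU"},
    {"plan": "free", "region": "US"},
    {"plan": "paid", "region": "EU"},
]
print(schema_drift(reference, current))
# {'added': ['region'], 'removed': [], 'retyped': []}
print(sampling_drift(reference, current, "plan"))
# ['free', 'paid']
```

A check like this could run as a step in a data pipeline so that a schema or sampling shift is flagged before a model is retrained on the new batch.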