O slideshow foi denunciado.
Utilizamos seu perfil e dados de atividades no LinkedIn para personalizar e exibir anúncios mais relevantes. Altere suas preferências de anúncios quando desejar.

Continuous Intelligence: Keeping your AI Application in Production

A talk by Emily Gorcenski and Arif Wider presented a Strata Data Conference 2019 in London.

Abstract:
It’s already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, continuous delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months.

Nevertheless, in the data science world, continuous delivery is rarely applied holistically—due in part to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as is to machine learning projects.

Arif Wider and Emily Gorcenski explore continuous delivery (CD) for AI/ML along with case studies for applying CD principles to data science workflows. Join in to learn how they drew on their expertise to adapt practices and tools to allow for continuous intelligence—the practice of delivering AI applications continuously.

Livros relacionados

Gratuito durante 30 dias do Scribd

Ver tudo

Audiolivros relacionados

Gratuito durante 30 dias do Scribd

Ver tudo
  • Seja o primeiro a comentar

  • Seja a primeira pessoa a gostar disto

Continuous Intelligence: Keeping your AI Application in Production

  1. 1. 5500 technologists with 40 offices in 14 countries Partner for technology driven business transformation London - Hamburg - Manchester - Berlin - Barcelona - Munich - Cologne - Madrid - Stuttgart
  2. 2. #1 in Agile and Continuous Delivery 100+ books written
  3. 3. #8 TRIAL 8
  4. 4. “Continuous Delivery is the ability to get — including new features, configuration changes, bug fixes and experiments — into production, or into the hands of users, in a ”
  5. 5. → → → → → → →
  6. 6. CI CD
  7. 7. TO MAKE MY LIFE EASIER!
  8. 8. 11
  9. 9. Continuous Delivery isn’t a technology or a tool; it is a practice and a set of principles. Quality is built into software and improvement is always possible. But machine learning systems have unique challenges; unlike deterministic software, it is difficult—or impossible—to understand the behavior of data-driven intelligent systems. This poses a huge challenge when it comes to deploying machine learning systems in accordance with CD principles. ● ● ● ● ● ● ● ●
  10. 10. ModelData Code + + Schema Sampling over Time Volume ... Research, Experiments Training on New Data Performance ... New Features Bug Fixes Dependencies ... Icons created by Noura Mbarki and I Putu Kharismayadi from Noun Project
  11. 11. Code ● ● ● ● ● ●
  12. 12. Model hyperparameters parameters ● ● ● ● ●
  13. 13. Data Sampling Schema Data is an assessment of the state of the universe at a point in time. It has a measurement and a shape.
  14. 14. Sampling support distribution cardinality ● ● ● ●
  15. 15. Schema ● ● ● ●
  16. 16. Any data-driven service can experience version drift along any of these axes independently or jointly. Schema Sampling Model Code
  17. 17. Our code, data, and models should be versioned and shareable without unnecessary work. ■ Data and models can be very large ■ Data can vary invisibly ■ Data scientists need to share work and it must be repeatable What are the challenges? ■ Large artifacts stored in arbitrary storage linked to source repo ■ Data scientists can encode work process at repeat in one step What does the ideal solution look like? ■ Storage: S3, HDFS, etc ■ Git LFS ■ Shell scripts ■ dvc ■ Pachyderm ■ jupyterhub What solutions are out there now?
  18. 18. We should be able to scale model development to try multiple modeling approaches simultaneously. ■ Hyperparameter tuning and model selection is hard ■ Tracking performance depends on other moving parts (e.g. data) What are the challenges? ■ Links models to specific training sets and parameter sets ■ Is differentiable ■ Allows visualization of results What does the ideal solution look like? ■ dvc ■ MLFlow ■ etc... What solutions are out there now?
  19. 19. Our code, data, and models should be versioned and shareable without unnecessary work. ■ We should be monitoring the health and quality of our production ML system ■ Deployment is usually deeply coupled to code What are the challenges? ■ One-click deployment ■ Easy, context-aware logging ■ Performance can be benchmarked to research results What does the ideal solution look like? ■ Elasticsearch ■ Kibana ■ Logstash ■ CircleCI ■ GoCD ■ etc. What solutions are out there now?
  20. 20. Machine learning may automate things but it does not automatically generate value. ■ It is hard to map data science investment to business value ■ We should be extracting continuous insights from data What are the challenges? ■ KPIs can be visualized and linked to work efforts ■ Patterns should emerge ■ New questions arise What does the ideal solution look like? ■ BI tools (e.g. Tabelau, Chartio, SAS, etc.) ■ Work tracking and documentation tools (Github, JIRA, Confluence) What solutions are out there now?
  21. 21. Doing CD with Machine Learning is still a hard problem MODEL PERFORMANCE ASSESSMENT TRACKING VERSION CONTROL AND ARTIFACT REPOSITORIES MODEL MONITORING AND OBSERVABILITY DISCOVERABLE AND ACCESSIBLE DATA CONTINUOUS DELIVERY ORCHESTRATION TO COMBINE PIPELINES INFRASTRUCTURE FOR MULTIPLE ENVIRONMENTS AND EXPERIMENTS
  22. 22. Data Science, Model Building Training Data Source Code + Executables Model Evaluation Productionize Model Integration Testing Deployment Test Data Model + parameters CD Tools and Repositories DiscoverableandAccessibleData Monitoring Production Data
  23. 23. There are many options for tools and technologies to implement CD4ML

×