
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics

A talk by Sebastian Herold & Dr. Arif Wider at TDWI 2018 Munich.

Abstract:
More and more companies are migrating their monolithic applications to a microservices architecture. This has only made it more challenging to maintain a consistent and usable data landscape: there are huge amounts of structured and unstructured data, and hundreds of data sources.
Furthermore, data-driven product development multiplies the analytics requirements: every product team needs constantly updated, specially tailored metrics, which often combine product-specific data with company-wide data.
A centralized data team does not scale in this setting, because it becomes the bottleneck between data producers and data consumers.
We created a manifesto based on five general themes that break with the traditional separation of roles and show a path for dealing with distributed data in a federated and scalable fashion. This leads to DataDevOps: a culture shift similar to DevOps in which application developers own their data and take over responsibilities for data & analytics.
Learn about our experiences and best practices in facilitating this cultural transformation at Zalando, one of Europe's largest online fashion platforms.

DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics

  1. #DataDevOps: A MANIFESTO FOR A DEVOPS-LIKE CULTURE SHIFT IN DATA & ANALYTICS. SEBASTIAN HEROLD, DR. ARIF WIDER. 2018-06-26, MUNICH.
  2. Sebastian Herold: Big Data Architect @ Zalando, @heroldamus. Previously 7 years @ Scout24. TDWI’18 Munich - DataDevOps - Sebastian Herold & Arif Wider
  3. AGENDA: Data Challenges, Data Manifesto, AI Empowerment, Data Architecture, Data-Driven Company, DataDevOps Culture.
  4. ZALANDO IN NUMBERS: > 300,000 product choices as at June 2018; ~4.5 billion EURO revenue 2017; > 75% of visits via mobile devices; > 200 million visits per month; > 23 million active customers; > 15,000 employees in Europe; 17 countries; ~ 2,000 brands.
  5. ZALANDO IN NUMBERS: GB/s on Kafka read: >2; People in Tech: >2000; Dev Teams: >250; MSTR User: >2000; AWS Accounts: >260; Data Scientists: >150.
  6. DATA CHALLENGES: FRAUD
  7. DATA CHALLENGES: PRICING & FORECASTING
  8. DATA CHALLENGES: PERSONALISATION
  9. DATA CHALLENGES: SIZING
  10. DATA CHALLENGES: VISUAL SEARCH
  11. DATA CHALLENGES: MANY MORE
  12. AI EMPOWERMENT: INSTITUTIONALISING MACHINE LEARNING AT SCALE
  13. INNOVATION: TOP-DOWN vs BOTTOM-UP
  14. AI SKILLS SHIFT. [Chart: AI MATURITY (BEGINNERS, EXPERTS) over TIME (2017, 2018, 2019)]
  15. FACETS OF INSTITUTIONALISING: Processes Infrastructure Data Quality Education Marketing Serving Events Data Metadata Sharability Compliance Connectivity Guilds
  16. DIFFERENT OFFERS FOR DIFFERENT PEOPLE & STEPS. DATA PRODUCT JOURNEY: Learn, Define, Explore, Extract, Model, Serve, Observe. TEAM AI MATURITY: “LEVEL ZERO”, ANALYTICS & REPORTING, AI EXPERTS. Basic Training Offers, BI Consulting, AI Consulting, AI Literacy Training, Expert Training, Data Science Guild. MSTR, MS Excel, Data Catalog incl. meta data, SQL Engine / SuperSet, Kafka, Jupyter Notebook Hub, ETL, RStudio, Shiny, Spark.
  17. WHERE WE CAME FROM: DISTRIBUTED DATA PLATFORMS. ZALON’S DATA PLATFORM, FASHION STORE’S DATA PLATFORM, OTHER BUSINESS UNIT’S DATA PLATFORM, BI PLATFORM, OTHER BUSINESS UNIT’S DATA PLATFORM, CENTRAL DATA PLATFORM.
  18. WHERE WE WANT TO GO: INTEGRATED DATA PLATFORM. FASHION STORE’S DATA PLATFORM, OTHER BUSINESS UNIT’S DATA PLATFORM, OTHER BUSINESS UNIT’S DATA PLATFORM, CENTRAL DATA PLATFORM, ZALON’S DATA PLATFORM.
  19. DATA PLATFORM DESIGN PRINCIPLES: CLOUD FIRST; DATA FRESHNESS & QUALITY; MERGE ANALYTICS AND DATA SCIENCE; EMPOWER CONSUMERS AND PRODUCERS. INNOVATION, SCALABILITY, FLEXIBILITY, STREAMING, MICRO-BATCHING, BI, AI, SELF-SERVICE, RESPONSIBILITIES, TOOLING, ROLES, METADATA, PROCESSES.
  20. DATA PLATFORM ARCHITECTURE. Data Sources: DBs, MicroServices, GAP/Others.
  21. DATA PLATFORM ARCHITECTURE. Data Sources: DBs, MicroServices, GAP/Others. Layers: Ingestion. Components: Event-Bus, Batch or Delta Loads + CDC Connectors, Data Gateway.
  22. DATA PLATFORM ARCHITECTURE. Data Sources: DBs, MicroServices, GAP/Others. Layers: Ingestion, Storage. Components: Event-Bus, Batch or Delta Loads + CDC Connectors, Data Storage, Data Catalog, Data Gateway, Model Repo. Legend: Metadata Flow, Data Flow.
  23. DATA PLATFORM ARCHITECTURE. Data Sources: DBs, MicroServices, GAP/Others. Layers: Ingestion, Storage, Processing. Components: Event-Bus, Batch or Delta Loads + CDC Connectors, Data Storage, Data Catalog, Orchestration, Batch Process., Acceleration Layer, SQL Engine, Stream Process., Data Gateway, Model Repo, Model Serving. Legend: Metadata Flow, Data Flow.
  24. DATA PLATFORM ARCHITECTURE. Data Sources: DBs, MicroServices, GAP/Others. Layers: Ingestion, Storage, Processing, Access. Components: Event-Bus, Batch or Delta Loads + CDC Connectors, Data Storage, Data Catalog, Orchestration, Batch Process., Acceleration Layer, BI Tools, SQL/Apps, Notebooks, Data Catalog UI, SQL Engine, Stream Process., Data Gateway, Model Repo, Model Serving. Legend: Metadata Flow, Data Flow.
  25. DATA PLATFORM ARCHITECTURE. Governance Processes & Glossary. Data Sources: DBs, MicroServices, GAP/Others. Layers: Ingestion, Storage, Processing, Access. Components: Event-Bus, Batch or Delta Loads + CDC Connectors, Data Storage, Data Catalog, Orchestration, Batch Process., Acceleration Layer, BI Tools, SQL/Apps, Notebooks, Data Catalog UI, SQL Engine, Stream Process., Data Gateway, Model Repo, Model Serving. Legend: Metadata Flow, Data Flow.
  26. MERGE OF BI AND DATA SCIENCE JOURNEY: BI Product Journey ≈ AI Product Journey. Steps: Learn, Define, Explore, Extract, Model, Serve, Observe.
  27. MERGE OF BI AND DATA SCIENCE JOURNEY: BI Product Journey ≈ AI Product Journey. Steps: Learn, Define, Explore, Extract, Model, Serve, Observe. LET’S FOCUS ON THE TECHNICAL PART!
  28. BI Product Journey
  29. BI PRODUCT JOURNEY: EXPLORATION (Explore, Extract, Model, Serve, Observe). Data Catalog UI.
  30. BI PRODUCT JOURNEY: EXTRACTION (Explore, Extract, Model, Serve, Observe).
  31. BI PRODUCT JOURNEY: MODELING (Explore, Extract, Model, Serve, Observe).
  32. BI PRODUCT JOURNEY: SERVING (Explore, Extract, Model, Serve, Observe).
  33. BI PRODUCT JOURNEY: OBSERVING (Explore, Extract, Model, Serve, Observe).
  34. AI Product Journey
  35. AI PRODUCT JOURNEY: EXPLORATION (Explore, Extract, Model, Serve, Observe). Data Catalog UI.
  36. AI PRODUCT JOURNEY: EXTRACTION (Explore, Extract, Model, Serve, Observe).
  37. AI PRODUCT JOURNEY: MODELING (Explore, Extract, Model, Serve, Observe). Model Repo.
  38. AI PRODUCT JOURNEY: SERVING (Explore, Extract, Model, Serve, Observe). Panda Serving.
  39. AI PRODUCT JOURNEY: OBSERVING (Explore, Extract, Model, Serve, Observe).
  40. THE DATA DRIVEN COMPANY. “The McKinsey Global Institute indicates that data driven organizations are 23 times more likely to acquire customers, 6 times as likely to retain those customers, and 19 times as likely to be profitable as a result.” What does “data driven” mean? ● Data is a key asset of the company ● All decisions in the company (products and processes) are data-driven, i.e. based on objective data insights ● Data Analytics and Data Science are commonplace in the company ● Company-wide data architecture in place ● Company-wide data governance rules in place. Source: https://www.mckinsey.com/business-functions/marketing-and-sales/our-insights/five-facts-how-customer-analytics-boosts-corporate-performance
  41. DATA MANIFESTO: THEMES FOR A DATA-DRIVEN COMPANY AT SCALE
  42. THEMES FOR DATA AT SCALE. Diagram: DATA PRODUCER, DATA LANDSCAPE, METRIC CONSUMER.
  43. THEMES FOR DATA AT SCALE. Diagram: DATA PRODUCER, DATA LANDSCAPE, METRIC CONSUMER. Themes: Autonomy, Alignment, Ownership, Platform, Transparency.
  44. THEMES FOR DATA AT SCALE. Diagram: DATA PRODUCER, DATA LANDSCAPE, METRIC CONSUMER. Themes: Autonomy.
  45. THEMES FOR DATA AT SCALE. Diagram: DATA PRODUCER, DATA LANDSCAPE, METRIC CONSUMER. Themes: Autonomy, Alignment.
  46. THEMES FOR DATA AT SCALE. Diagram: DATA PRODUCER, DATA LANDSCAPE, METRIC CONSUMER. Themes: Autonomy, Alignment, Ownership.
  47. THEMES FOR DATA AT SCALE. Diagram: DATA PRODUCER, DATA LANDSCAPE, DATA PLATFORM, METRIC CONSUMER. Themes: Autonomy, Alignment, Ownership, Platform.
  48. THEMES FOR DATA AT SCALE. Diagram: DATA PRODUCER, DATA LANDSCAPE, DATA PLATFORM, METRIC CONSUMER. Themes: Autonomy, Alignment, Ownership, Platform, Transparency.
  49. ROLES & RESPONSIBILITIES. Diagram: AWS, CENTRAL DATA LAKE ON S3, DATA CATALOG, DATA INFRA, CHECKOUT SERVICE (PRODUCER), SPECIAL OFFER SERVICE (CONSUMER).
  50. ROLES & RESPONSIBILITIES. Diagram: AWS, CENTRAL DATA LAKE ON S3, DATA CATALOG, DATA INFRA, ORDER EVENTS, EVENT METADATA, CHECKOUT SERVICE (PRODUCER), SPECIAL OFFER SERVICE (CONSUMER).
  51. ROLES & RESPONSIBILITIES. Diagram: AWS, CENTRAL DATA LAKE ON S3, DATA CATALOG, DATA INFRA, INGESTION TEMPLATE, ORDER EVENTS, EVENT METADATA, CHECKOUT SERVICE (PRODUCER), SPECIAL OFFER SERVICE (CONSUMER).
  52. ROLES & RESPONSIBILITIES. Diagram: AWS, CENTRAL DATA LAKE ON S3, DATA CATALOG, DATA INFRA, INGESTION TEMPLATE, ORDER EVENTS, EVENT METADATA, VIEW: ORDER HISTORY BY USER, CHECKOUT SERVICE (PRODUCER), SPECIAL OFFER SERVICE (CONSUMER).
  53. DATADEVOPS: A CULTURE OF DISTRIBUTED RESPONSIBILITIES ABOUT DATA & ANALYTICS
  54. DATADEVOPS: WHAT IS DEVOPS? Distributed Ops skills; Shared Ops responsibilities; Self-service platforms; Cross-functional dev teams.
  55. DATADEVOPS: WHAT IS DATADEVOPS? Distributed Data skills; Shared Data responsibilities; Self-service Data platform; Cross-functional product teams.
  56. Consequences for Product Teams: ‣ Think about data & reporting ‣ Deliver your data to the lake ‣ Provide meta data ‣ Eat your own dog food: Consume your own data
  57. Benefits for Product Teams: ‣ Independently work with data ‣ No dependencies on data teams ‣ It’s easy to consume data produced by other teams ‣ Faster product & measurement iterations
  58. THANKS! QUESTIONS? A MANIFESTO FOR A DEVOPS-LIKE CULTURE SHIFT IN DATA & ANALYTICS. SEBASTIAN HEROLD, DR. ARIF WIDER. 2018-06-26, MUNICH.
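
The roles-and-responsibilities slides (49 to 52) show a checkout service producing order events with event metadata, and a special offer service consuming a view named "order history by user". The following is a minimal Python sketch of that split, not the speakers' implementation: the `OrderEvent` fields, the `ORDER_EVENT_METADATA` layout, and the helper names `matches_metadata` and `order_history_by_user` are all hypothetical, and an in-memory list stands in for the data lake.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEvent:
    """Hypothetical order event emitted by the producing checkout service."""
    order_id: str
    user_id: str
    total_eur: float
    placed_at: str  # ISO 8601 timestamp, so string order equals time order


# Hypothetical metadata the producer would register in a data catalog.
ORDER_EVENT_METADATA = {
    "name": "order-events",
    "owner": "checkout-service",
    "fields": {
        "order_id": "string",
        "user_id": "string",
        "total_eur": "float",
        "placed_at": "timestamp",
    },
}


def matches_metadata(event, metadata):
    """Check that an event carries exactly the fields the metadata declares."""
    return set(vars(event)) == set(metadata["fields"])


def order_history_by_user(events):
    """Consumer-side view (assumed shape): order ids per user, oldest first."""
    history = defaultdict(list)
    for event in sorted(events, key=lambda e: e.placed_at):
        history[event.user_id].append(event.order_id)
    return dict(history)


# An in-memory list stands in for events delivered to the lake.
events = [
    OrderEvent("o-2", "u-1", 59.90, "2018-06-02T10:00:00Z"),
    OrderEvent("o-1", "u-1", 19.99, "2018-06-01T09:30:00Z"),
    OrderEvent("o-3", "u-2", 120.00, "2018-06-03T14:15:00Z"),
]

all_valid = all(matches_metadata(e, ORDER_EVENT_METADATA) for e in events)
view = order_history_by_user(events)
print(all_valid, view)
```

In this sketch the producer owns both the events and their description, while the consumer derives its own view without involving a central data team, which is the division of responsibilities the slides argue for.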
