
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics

A talk given by Dr. Arif Wider (ThoughtWorks) and Sebastian Herold (Zalando) at OOP 2018 in Munich.

Abstract:
More and more companies are migrating their monolithic applications to a microservices architecture. However, this has only made it more challenging to maintain a consistent and usable data landscape, with huge amounts of structured and unstructured data and hundreds of data sources.
Furthermore, data-driven product development multiplies the analytics requirements: every product team needs constantly updated, specially tailored metrics that often combine product-specific data with company-wide data.
In this setting, a centralized data team does not scale, because it becomes the bottleneck between data producers and data consumers.
We created a Manifesto of seven principles that break with the traditional separation of roles and show how to deal with distributed data in a federal, scalable fashion. This leads to DataDevOps: a culture shift similar to DevOps, in which application developers own their data and take over responsibilities for data & analytics.
Learn about our experiences and best practices in facilitating this cultural transformation at Scout24, the provider of Europe’s largest online marketplaces for cars and real estate.
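In this model, application developers publish their own data. Principle #4 of the manifesto and slide 24 ask producers to deliver data to the lake together with metadata (schema, descriptions, versions). The following is a minimal Python sketch of that producer side. It assumes hypothetical in-memory stand-ins (`DATA_LAKE`, `DATA_CATALOG`) for the S3-based Data Lake and the Data Catalog, plus a made-up `order_events` dataset. It does not reflect Scout24's actual ingestion template or tooling.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass

# Hypothetical in-memory stand-ins for the central Data Lake (S3 in the talk)
# and the Data Catalog; a real producer would write to object storage.
DATA_LAKE: dict[str, list[str]] = {}
DATA_CATALOG: dict[str, dict] = {}

# Supported schema type names (an assumption made for this sketch).
TYPES = {"str": str, "int": int, "float": float}


@dataclass
class DatasetMetadata:
    name: str
    owner: str               # producing team, accountable for data quality
    version: int             # schema version; bump it on breaking changes
    description: str
    schema: dict[str, str]   # field name -> type name from TYPES


def dataset_key(meta: DatasetMetadata) -> str:
    return f"{meta.name}/v{meta.version}"


def register(meta: DatasetMetadata) -> None:
    """Publish metadata so that consumers can find and understand the data."""
    DATA_CATALOG[dataset_key(meta)] = asdict(meta)


def publish(meta: DatasetMetadata, event: dict) -> None:
    """Check an event against the declared schema, then append it to the lake."""
    missing = set(meta.schema) - set(event)
    if missing:
        raise ValueError(f"event lacks fields: {sorted(missing)}")
    for field_name, type_name in meta.schema.items():
        if not isinstance(event[field_name], TYPES[type_name]):
            raise TypeError(f"{field_name} should be of type {type_name}")
    DATA_LAKE.setdefault(dataset_key(meta), []).append(json.dumps(event, sort_keys=True))


# Usage: a (hypothetical) checkout team publishes denormalized order events.
order_events = DatasetMetadata(
    name="order_events",
    owner="checkout-team",
    version=1,
    description="One record per completed order, denormalized for general use",
    schema={"order_id": "str", "user_id": "str", "segment": "str", "amount_eur": "float"},
)
register(order_events)
publish(order_events, {"order_id": "o-1", "user_id": "u-42",
                       "segment": "cars", "amount_eur": 19.9})
```

Validating against the declared schema at publish time keeps the quality check with the producing team, which is where the manifesto places responsibility for data quality.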

  1. DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics. Dr. Arif Wider & Sebastian Herold, Munich, Feb 7th, 2018
  2. Speakers: Dr. Arif Wider, Senior Consultant/Dev, Scala/FP enthusiast, ThoughtWorks Germany data strategy group (@arifwider); Sebastian Herold, Chief Data Architect @Scout24 until Dec, BigData Architect @Zalando from Jan, Data Evangelist (@heroldamus)
  3. Road to MicroService Architecture: How we started in 2007 (Diagram, 2007: Web Tier, Middle Tier, Core DB, CRM, Staging, DWH, BI Tool; roles: Analyst, BI Dev). DataDevOps – Data Manifesto | Sebastian Herold & Arif Wider
  4. Road to MicroService Architecture: How things got complicated in 2011 (Diagram, 2011: Web, API, Middle Tier, Core DB, CRM, Staging, DWH, BI Tool, apps with their own MySQL database; roles: Analyst, BI Dev)
  5. Road to MicroService Architecture: How we sliced the monolith in 2013 (Diagram, 2013: Web, API, Core DB, CRM, Staging, DWH, BI Tool, several apps with their own MySQL databases, EXP on Mongo, SEA on Elastic, Sync, APIs, REST API, HADOOP; roles: Analyst, BI Dev, DE)
  6. Road to MicroService Architecture: How a central data team doesn’t scale (Diagram, 2015: the 2013 landscape plus more apps, some on AWS; Web, API, Core DB, CRM, Staging, DWH, BI Tool, apps with MySQL, EXP on Mongo, SEA on Elastic, Sync, APIs, REST API, HADOOP; roles: Analyst, BI Dev, DE)
  7. Road to MicroService Architecture: How we rearchitected our Data Landscape (Diagram, 2017: Central Data Lake on S3, many apps on AWS, Core DB, REST API, CRM, DWH, BI Tool; roles: Analyst, DE, BI Dev)
  8. Scout24 wants to become a truly data-driven company: fast & easy data-driven product development… …supported by Data & Analytics
  9. Scout24 wants to become a truly data-driven company: everywhere in the company... ...without bloating up D’n’A (Image source: https://www.oddsemiconductorservices.com/)
  10. SCOUT24 DATA LANDSCAPE MANIFESTO: ROLES, RESPONSIBILITIES, AND VALUES FOR A DATA-DRIVEN COMPANY AT SCALE
  11. #1 Preamble: Data is a key asset of our company.
  12. #2 Our Responsibility: We, Data & Analytics, are responsible for providing a solid Data Platform as well as clear guidelines and training on how to participate in the Data Landscape. (Diagram: Data Platform, D’n’A, Data Landscape)
  13. #3 Data Autonomy, Not Anarchy: Data autonomy puts data producers & data consumers in control of their data & of their metrics and thereby allows us to be data-driven at scale, but this comes with responsibility. (Diagram: Data Platform, Data Producer, Consumer, D’n’A, Data Landscape)
  14. Roles & Responsibilities (Diagram: Checkout service, Special offer service, Central Data Lake on S3, Data Catalog; roles: Producer, Consumer, D’n’A)
  15. #4 Producer’s Responsibility: Data producers are responsible for publishing data to the central Data Lake, for the data's quality, and for publishing metadata that makes it easy to find and consume the data. (Diagram: Data Platform, Metadata, Data Producer, D’n’A, Data Landscape)
  16. Roles & Responsibilities (Diagram: Checkout service, order events, Central Data Lake on S3, Data Catalog, Special offer service; roles: Producer, Consumer, D’n’A)
  17. Roles & Responsibilities (Diagram: as before, plus an Ingestion Template)
  18. #5 Consumer’s Responsibility: Data consumers are responsible for the definition & visualization of metrics and for driving the implementation and maintenance of these metrics. (Diagram: Data Platform, Producer, Consumer, D’n’A, Data Landscape)
  19. Roles & Responsibilities (Diagram: as before, plus View: order history by user)
  20. #6 Exception: Core KPIs: We, Data & Analytics, take full ownership of, and responsibility for, the few top company-wide core KPIs. (Diagram: Data Platform, Producer, Consumer, D’n’A, Data Landscape, Core metric)
  21. Roles & Responsibilities (Diagram: as before, plus BI Tool, Analyst, and View: revenue generated from orders by segments)
  22. #7 Transparency Over Continuity: We value data transparency over data continuity, which means we may break metric comparability if doing so enables better insights. (Diagram: Data Platform, Producer, Consumer, D’n’A, Data Landscape, Core metric)
  23. The Ultimate Goal: A federal landscape of data producers and consumers with just enough rules to ensure seamless cooperation without severely impeding autonomy. (Diagram: Data Platform, Metadata, Data Producer, Consumer, D’n’A, Data Landscape, Core metric, Data products)
  24. Consequences for Product Development Teams?
      - Think about data & reporting
      - Deliver your data to the lake
      - Provide metadata (schema, descriptions, versions)
      - Eat your own dog food: consume your own data for reporting, and thereby take responsibility for data quality
  25. Benefits for Product Development Teams?
      - Work with data independently
      - No dependencies on data teams
      - Company data is curated, and it’s easy to consume data produced by other teams
  26. #DataDevOps (image: DevOps)
  27. Learnings and lessons
      - Publish exhaustive, general, and denormalized event data
      - Avoid consumer-specific tailoring of the data you publish
      - Consume your own data, e.g. for KPI reports
      - Try out ad-hoc analytics notebooks to get better insights
      - Inform data producers if you rely on their data
      - Invest in documentation and guidelines for your data platform to keep your support effort low
  28. Thanks! Questions? Sebastian Herold & Arif Wider, www.scout24.com
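The Roles & Responsibilities slides show two views built on the checkout service's order events. The first, "order history by user", is owned by a consuming team. The second, "revenue generated from orders by segments", is shown next to the BI Tool and the Analyst, in line with principle #6 on core KPIs. Both can be sketched as plain transformations over published events. This is a minimal Python sketch over in-memory sample data, assuming hypothetical field names (`user_id`, `segment`, `amount_eur`, `ts`). The talk does not show how the real views are implemented; they would run against the Data Lake.

```python
from collections import defaultdict

# Illustrative sample of denormalized "order events" as a producer might
# publish them; field names are assumptions, not Scout24's actual schema.
ORDER_EVENTS = [
    {"order_id": "o-1", "user_id": "u-1", "segment": "cars", "amount_eur": 20.0, "ts": "2018-01-02"},
    {"order_id": "o-2", "user_id": "u-2", "segment": "real_estate", "amount_eur": 50.0, "ts": "2018-01-03"},
    {"order_id": "o-3", "user_id": "u-1", "segment": "cars", "amount_eur": 30.0, "ts": "2018-01-05"},
]


def order_history_by_user(events):
    """Consumer-owned view: each user's order IDs, oldest first."""
    history = defaultdict(list)
    for event in sorted(events, key=lambda e: e["ts"]):
        history[event["user_id"]].append(event["order_id"])
    return dict(history)


def revenue_by_segment(events):
    """Core-KPI-style view: total order revenue per segment."""
    revenue = defaultdict(float)
    for event in events:
        revenue[event["segment"]] += event["amount_eur"]
    return dict(revenue)


print(order_history_by_user(ORDER_EVENTS))  # {'u-1': ['o-1', 'o-3'], 'u-2': ['o-2']}
print(revenue_by_segment(ORDER_EVENTS))     # {'cars': 50.0, 'real_estate': 50.0}
```

Because the events are published in a general, denormalized form, both views can be derived without asking the producing team for consumer-specific extracts. This matches the first two points on the "Learnings and lessons" slide.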
