Zalando moved from a centralized data platform to a data mesh architecture. Ownership of datasets and pipelines shifted from a central data infrastructure team to the individual domains, while the infrastructure itself stayed central: a self-service data platform and central governance let domains operate independently and keep their data globally interoperable. Making domains responsible for their own data was meant to address the lack of ownership and quality seen under the centralized model.
How Europe's Leading Fashion Site Implements Data Mesh
1. Data Mesh in Practice
Max Schultze - max.schultze@zalando.de
Arif Wider - awider@thoughtworks.com
17-11-2020
How Europe’s Leading Online Platform for Fashion Goes Beyond the Data Lake
@mcs1408 @arifwider
2. 2
Who are we?
Max Schultze
● Lead Data Engineer
● MSc in Computer Science
● Took part in early development of Apache Flink
● Retired semi-professional Magic: the Gathering player
Arif Wider
● Software engineering professor (full) at HTW Berlin, Germany
● Fellow technology consultant with ThoughtWorks Germany (part-time)
● Former Head of AI at ThoughtWorks
● Coffee geek
11. 11
Centralization Challenges
Datasets provided by central data infrastructure team
● Lack of ownership
Data pipelines operated by central data infrastructure team
● Lack of quality
Organizational scaling
● Central team becomes the bottleneck
18. 18
What is Data Mesh?
Old wine applied to new bottles…
→ Product Thinking
→ Domain-Driven Distributed Architecture
→ Infrastructure as a Platform
… creates value from Data
https://martinfowler.com/articles/data-monolith-to-mesh.html by Zhamak Dehghani
19. 19
Data as a Product
Data Product
What is my market?
What are the desires of my customers?
What “price” is justified?
How to do marketing?
What’s the USP?
Are my customers happy?
23. 23
...backed by domain-agnostic self-service data infrastructure
Data Infra as a Platform
Discoverable
Addressable
Self-describing
Trustworthy
Interoperable
Secure
Domain → Aggregated Domain
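The six properties listed on this slide can be made concrete with a small descriptor. The following is a minimal sketch, not Zalando's actual tooling: the class `DataProduct`, its field names, the bucket address, and the team name are all hypothetical, and "trustworthy" is reduced to the crude proxy of having a named owning team.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor; every field name here is illustrative."""
    name: str                      # discoverable: listed under a searchable name
    address: str                   # addressable: one stable location
    schema: dict                   # self-describing: column name -> type
    owner_team: str                # trustworthy (proxy): a named owning team
    data_format: str = "parquet"   # interoperable: a shared, agreed format
    allowed_readers: frozenset = field(default_factory=frozenset)  # secure

    def missing_properties(self):
        """Return the slide's properties this descriptor does not yet satisfy."""
        checks = {
            "discoverable": bool(self.name),
            "addressable": bool(self.address),
            "self-describing": bool(self.schema),
            "trustworthy": bool(self.owner_team),
            "interoperable": bool(self.data_format),
            "secure": bool(self.allowed_readers),
        }
        return [prop for prop, ok in checks.items() if not ok]


# Example values are made up for illustration.
orders = DataProduct(
    name="orders",
    address="s3://example-domain-bucket/orders/",
    schema={"order_id": "string", "created_at": "timestamp"},
    owner_team="example-checkout-team",
)
print(orders.missing_properties())  # no readers granted yet -> ['secure']
```

A platform could run such a check when a domain publishes a dataset, so that the properties are enforced by tooling rather than by convention.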
24. 24
It’s a mindset shift
FROM → TO
Centralized ownership → Decentralized ownership
Pipelines as first class concern → Domain Data as first class concern
Data as a by-product → Data as a Product
Siloed Data Engineering Team → Cross-functional Domain-Data Teams
Centralized Data Lake / Warehouse → Ecosystem of Data Products
27. 27
Recap:
● From Bottleneck to Infra Platform
● From Data Monolith to Interoperable Services
Data Mesh in Practice
Data Infra as a Platform
central data platform
32. 32
Central Services with Global Interoperability
Decentralized ownership does not imply decentralized infrastructure!
Interoperability is created through convenient solutions of a self service platform.
Decentral Storage, Central Infrastructure
Decentral Ownership, Central Governance
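The split on this slide, with storage and ownership held by the domains and governance held centrally, can be sketched as one shared registry in front of team-owned buckets. This is an illustrative assumption about how such a service might look, not a description of Zalando's platform: `CentralCatalog`, its methods, and all team and bucket names are hypothetical.

```python
class CentralCatalog:
    """Hypothetical central governance service: one registry for all domains."""

    def __init__(self):
        self._datasets = {}

    def register(self, dataset, owner_team, bucket_uri):
        # Central governance: the same rules apply to every domain.
        if not owner_team:
            raise ValueError("every dataset needs an owning team")
        if dataset in self._datasets:
            raise ValueError(f"dataset {dataset!r} is already registered")
        # Decentral storage: only the location is recorded, the data stays
        # in the owning team's bucket.
        self._datasets[dataset] = {"owner": owner_team, "location": bucket_uri}

    def locate(self, dataset):
        """Global interoperability: any consumer resolves data by name."""
        return self._datasets[dataset]["location"]

    def owned_by(self, team):
        """Decentral ownership: list the datasets a team is responsible for."""
        return sorted(
            name for name, entry in self._datasets.items()
            if entry["owner"] == team
        )


# Example values are made up for illustration.
catalog = CentralCatalog()
catalog.register("orders", "example-checkout-team", "s3://example-checkout/orders/")
catalog.register("returns", "example-logistics-team", "s3://example-logistics/returns/")
print(catalog.locate("returns"))
print(catalog.owned_by("example-checkout-team"))
```

Consumers never need to know which team's bucket holds a dataset; they ask the central service, which is where the interoperability comes from.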
43. 43
Some Numbers
● 40 teams using BYOB (Bring Your Own Bucket)
● 100 teams using the processing platform
● First curated data teams
● 0 operational effort for the central team
Data Products
On Data Products
On Data Products
Processing Platform
It’s a journey ;)
48. 48
“Off the shelf” data tooling
Template driven data preparation
De-centralized archiving
De-centralized GDPR deletion tooling