At Gloo.us, we face the challenge of providing platform data to heterogeneous applications in a way that eliminates access contention, avoids high-latency ETLs, and ensures consistency across many teams. We're solving this problem by adopting Data Mesh principles and leveraging Kafka, Kafka Connect, and Kafka Streams to build an event-driven architecture that connects applications to the data they need. A domain-driven design keeps the boundaries between specialized process domains and singularly focused data domains clear, distinct, and disciplined. Applying the principles of a Data Mesh, process domains assume the responsibility of transforming, enriching, or aggregating data rather than pushing these changes onto the source of truth: the data domains. Architecturally, we've broken centralized big data lakes into smaller data stores that can be consumed into storage managed by process domains.
This session covers how we're applying Kafka tools to enable our data mesh architecture: how we interpret and apply the data mesh paradigm, the role of Kafka as the backbone of the mesh's connectivity, the role of Kafka Connect in producing and consuming data events, and the use of KSQL to perform light transformations for consumers.
How a Data Mesh is Driving our Platform | Trey Hicks, Gloo
1. How the Data Mesh is Driving Our Platform
Trey Hicks
Director of Engineering
2. • Mentors
• Faith
• Recovery Centers
• Resources
Applications That Help People
Building Technologies To Connect People
3. • Diverse application types and purposes
• Serving several verticals
• Varying resource needs
• Apps are built internally by Gloo or with partners
• Common means of connectivity to data and services
Supporting The Mission
Common Platform Must Consider
4. Technical Landscape
• Microservices
• Datastores per service or application domain
• Domain-based services
• Event-driven
• Domain-driven
• Kubernetes
• AWS
• Confluent Cloud
◦ Kafka
◦ KsqlDB
• Kafka Connect cluster
• Docker
Our Approach Consists of
Architectural Infrastructure
5. • Heterogeneous apps
• Resource contention
• Gravitational pull to put application use-cases lower in the stack
• Tight coupling due to customization of shared services
• Blocking development due to cross-team dependencies
• Limits to our ability to scale the organization
Challenges
Challenges in Building the Platform
6. • Our value prop isn’t the applications, it’s the data
• Application-specific use-cases low in the stack cause problems
Platform Facts
7. Enter Data Mesh
Principles
• Domain-driven architecture
• Data as a product
• Self-serve architecture
• Governance

Zhamak Dehghani
https://martinfowler.com/articles/data-monolith-to-mesh.html

Perhaps the ideas have existed before:
• Data emphasis
• Domain-Driven Design
• Service-Oriented Architectures

Provides terminology to shift the conversation UPWARDS to form a BROAD data strategy, as opposed to being a technical concern.

Data Mesh Paradigm
8. Solving the Challenges

Principle – Appeal – Solves

• Domain-Driven Architecture
  Appeal: Microservice architecture
  Solves: Many apps; Resource contention; App requirements in core services
• Data As a Product
  Appeal: Primary value; Apps are transient
  Solves: Blocking development; Tight coupling
• Self-Serve Infrastructure
  Appeal: Easy connectivity to data and domains
  Solves: Blocking development; App requirements in stack
• Governance
  Appeal: Secure data ports; Community trust; Privacy
  Solves: Tight coupling; Blocking development
9. Adopting The Principles
• Establish common terminology and language
• Promote a data first philosophy
• Embrace democratized ownership and the associated responsibilities
• Accept eventual consistency
• In our case, embracing event streams
Culture Shift
10. Data As a Product
How We Define Data Products
• Our data is our unique value
• Foundation for apps and services that drive success
• Requires governance
◦ Security
◦ Availability
◦ Accessibility
◦ Change controls
• Free of application use-cases
• Integrity
11. • Person
• Organization
• Catalysts
• Relationships
Data Product Examples
Core Data Objects
Secondary Objects
• Cohorts/Collections
• Growth Intelligence
• Assessments
13. Sharing the Data
• Distributed Data Products
• Domain boundaries
• Process/application domains apply their use-cases
• Domains may use subsets or combinations
• Derived Data Products
Conceptual Architecture
17. Connecting to the Data Mesh
Sharing the Data Product
• Governed data made available
• Options for access
◦ Download with ETL or ELT
◦ Kafka
• Both have complications
◦ Manual processes
◦ Lack of a consuming process
◦ Skill sets not aligned
19. Enter Kafka Ecosystem
Data Mesh Platform Using Kafka
• Kafka is perfect for one-to-many
• Event streams/batches provide a means of keeping the consuming domains in sync with the data product
• Kafka Connect is perfect for turning datastores into event streams
• Kafka Connect is perfect for sinking the streams into a datastore
• KsqlDB is perfect for selecting subsets of data or combining streams to shape the data
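The KsqlDB point above can be sketched concretely. A minimal, hypothetical example (stream, topic, and field names are illustrative, not Gloo's actual data products) of deriving a consumer-shaped subset from a data product's event stream:

```sql
-- Register the data product's topic as a stream (names are hypothetical).
CREATE STREAM person_events (
  id VARCHAR,
  first_name VARCHAR,
  last_name VARCHAR,
  org_id VARCHAR
) WITH (KAFKA_TOPIC = 'person-events', VALUE_FORMAT = 'JSON');

-- Derive a narrowed stream for one consuming domain: a subset of fields,
-- filtered to the rows that domain cares about.
CREATE STREAM org_person_names AS
  SELECT id, first_name, last_name
  FROM person_events
  WHERE org_id = 'org-123'
  EMIT CHANGES;
```

The consuming domain then sinks `org_person_names` into its own datastore rather than asking the data domain to add its use-case at the source.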
20. Kafka Connect
Building the Mesh
• Connect the Data Product
◦ S3 Source Connector
• Connect the Consumers
◦ JDBC Sinks
◦ Elasticsearch Sink
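As a concrete sketch of the sink side, a JDBC sink definition might look like the following. This is a hypothetical example (connector name, topic, and connection details are invented; exact settings depend on the Confluent JDBC connector version in use):

```json
{
  "name": "person-events-jdbc-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "topics": "person-events",
    "connection.url": "jdbc:postgresql://db.example.internal:5432/persons",
    "insert.mode": "upsert",
    "pk.mode": "record_key",
    "pk.fields": "id",
    "auto.create": "true"
  }
}
```

Each consuming domain owns a config like this, which keeps data movement declarative rather than hand-built per team.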
26. • Bloated infrastructure
◦ Expensive footprint
◦ K8s is great, but maybe too easy to spin up new instances
• Experimentation leaves behind dead instances and other bones
• Complicated data model and APIs
Revisiting Technical Landscape
New Concerns
27. • Simplify the overall footprint
◦ Fewer and simpler services
◦ Smaller clusters
◦ Fewer instances
• Improve database schema
• Rethink our APIs
Going Forward In Reverse
Rethinking Parts of the Platform
28. Event Sourcing
• Major changes without interruption
◦ Tables restructured
◦ Elements combined or removed
• Existing streams via Connectors
• Need additional JDBC sinks
Changing the Schema
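A minimal sketch (not Gloo's actual code; event shapes and field names are hypothetical) of why event sourcing allows the restructuring above without interruption: the same event stream is replayed into a new projection, so an old table and a restructured one can coexist while consumers migrate.

```python
from dataclasses import dataclass, field

@dataclass
class PersonProjection:
    """A read model built by replaying person events into a new table shape."""
    rows: dict = field(default_factory=dict)

    def apply(self, event):
        kind = event["type"]
        if kind == "person_created":
            # The restructured schema combines first/last name into one field.
            self.rows[event["id"]] = {
                "name": f"{event['first_name']} {event['last_name']}",
            }
        elif kind == "person_renamed":
            self.rows[event["id"]]["name"] = event["name"]

def replay(events):
    """Rebuild the projection from scratch by applying every event in order."""
    projection = PersonProjection()
    for event in events:
        projection.apply(event)
    return projection

events = [
    {"type": "person_created", "id": "p1",
     "first_name": "Ada", "last_name": "Lovelace"},
    {"type": "person_renamed", "id": "p1", "name": "Ada King"},
]
print(replay(events).rows["p1"]["name"])  # replay yields the latest state
```

In the mesh, the "replay" is an additional JDBC sink consuming the existing stream into the new table, which is why the schema change needs new sinks but no downtime.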
30. More On Infrastructure
• Structured like other engineering “pods”
◦ Engineers
◦ Product
• Charter is to build the self-serve connectivity
• Responsible for Data Mesh infrastructure
• Create reference configs for all Kafka Connectors
• Make it super simple to define, add, and govern new data products
• One team responsible for connectivity and data movement
Creation of Data Mesh Engineering
31. Discovery
• Provide a catalog of all data products
◦ Documentation or manual catalogs are DOA
◦ Must be automatic
• The catalog tracks:
◦ All data products
◦ Communication channels
◦ Consuming domains
◦ Schemas
◦ Data ports
Keeping Track of All the Things
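One possible shape for an automatically generated catalog entry, purely illustrative (every field name here is an assumption, not an existing Gloo schema):

```json
{
  "dataProduct": "person",
  "owningDomain": "person-data-domain",
  "dataPorts": ["kafka:person-events", "s3://data-products/person/"],
  "schemaSubject": "person-events-value",
  "consumingDomains": ["search", "growth-intelligence"],
  "channel": "#data-product-person"
}
```

Because connectors, topics, and schemas are already declared in configuration, an entry like this could be generated rather than maintained by hand.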
32. Deployment
• Kafka Configs project
◦ A single project holding all Connector, KsqlDB, and topic configurations
◦ Updates trigger deployment
• Uses REST proxies to deploy updates
• Open source?
• Kafka JMX Exporter collects metrics used in Grafana dashboards
Continuous Deployment
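The deployment trigger can be sketched as follows, assuming the standard Kafka Connect REST API (the host, connector name, and file path are hypothetical; `connector-config.json` holds just the connector's config map):

```shell
# CI step: push an updated connector config to the Connect cluster.
# PUT /connectors/{name}/config creates the connector or updates it in place.
curl -X PUT \
  -H "Content-Type: application/json" \
  --data @connector-config.json \
  http://connect.example.internal:8083/connectors/person-events-jdbc-sink/config
```

Using idempotent PUTs means the configs project stays the source of truth: re-running the pipeline converges the cluster to whatever is committed.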
33. Closure
• Data-first organization
• The data mesh paradigm helps us solve problems
• The Kafka ecosystem is the core of the data mesh driving the platform
• Serving our application domains by using Kafka Connect and KsqlDB
• Future
◦ Improve self-serve
◦ Discovery app → If you have experienced this problem, let’s chat!
Summary
34. Acknowledgments
• Collin Shaafsma – Leadership
• Ken Griesi – Inspiration, guidance, and discovering the articles
• Alex Lauderbaugh – All things data and ghostwriter
• Scott Symmank – Technical lead
• Hannah Manry – Amazing engineer
• Mitch Ertle – Resident BA expert and principal consumer
• Chicken – Mascot
* We’re Hiring