How Kroger embraced a "schema first" philosophy in building real-time data pipelines (Rob Hoeting, Rob Hammonds & Lauren McDonald, Kroger) Kafka Summit SF 2019
Early attempts at real-time business event streaming at Kroger were based on JSON-formatted events. Modifications to the event formats occasionally broke downstream consumers, causing costly downtime. In the course of reimagining what an industrial-strength streaming platform would look like, we decided to focus heavily on schema lifecycle and management as a foundation. The schema registry is a great service, but it's only one part of the schema lifecycle management process.

Here are the core principles around schema management:
(1) Event schemas are expressed in Avro
(2) New versions will be fully compatible with older versions
(3) Event producers create, manage, and fully document event schemas
(4) Avro schemas are managed in git, which represents the source of truth
(5) Complex schemas can be broken into smaller reusable component schemas and referenced in larger schemas

The CI/CD build process, in conjunction with customized Gradle plugins, performs the following:
(1) Constructs the full, registerable event schemas from component schemas
(2) Generates Java source code based on the event schemas
(3) Checks compatibility with prior registered versions
(4) Registers the new/updated version in the schema registry
(5) Publishes the generated JAR file to Artifactory for producers and consumers
(6) Other source code generation (future)
(7) Publishes the schema into other metadata tools to help make them more discoverable (future)
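As an illustration of the first build step (constructing a registerable schema from component schemas), here is a minimal Python sketch. It is not the Gradle plugin described in the talk. The `inline_components` helper, the `COMPONENTS` lookup table, and the `City` field of `Location` are assumptions made up for this example.

```python
import json

# Hypothetical component schema; the fields of Location are invented for illustration.
COMPONENTS = {
    "com.kroger.commons.Location": {
        "type": "record",
        "name": "Location",
        "namespace": "com.kroger.commons",
        "fields": [{"name": "City", "type": "string"}],
    }
}

# Event schema that references the component by its full name.
STORE_CREATED = {
    "type": "record",
    "name": "Store",
    "namespace": "com.kroger",
    "fields": [
        {"name": "StoreName", "type": "string"},
        {"name": "StoreID", "type": "int"},
        {"name": "Location", "type": "com.kroger.commons.Location"},
    ],
}


def inline_components(schema, components, defined=None):
    """Return a copy of `schema` with component references replaced by definitions.

    A named type is inlined only the first time it is referenced; later
    references stay as names, which Avro resolves against the earlier definition.
    """
    defined = set() if defined is None else defined
    if isinstance(schema, str):
        if schema in components and schema not in defined:
            defined.add(schema)
            return inline_components(components[schema], components, defined)
        return schema  # primitive type or an already-defined named type
    if isinstance(schema, list):  # union: resolve each branch
        return [inline_components(branch, components, defined) for branch in schema]
    if isinstance(schema, dict) and schema.get("type") == "record":
        resolved = dict(schema)
        resolved["fields"] = [
            {**field, "type": inline_components(field["type"], components, defined)}
            for field in schema["fields"]
        ]
        return resolved
    return schema  # arrays, maps, enums etc. are left untouched in this sketch


full_schema = inline_components(STORE_CREATED, COMPONENTS)
print(json.dumps(full_schema, indent=2))
```

The sketch leaves the input schema unmodified, so the small component files in git can remain the source of truth while the expanded schema is what gets registered.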
1. How Kroger Embraced a “Schema First” Philosophy in Building Real-time Data Pipelines
5. This is just a data swamp.
My report just broke!
How do I use the data?
Where do I find the data?
I want this new business feature by Friday.
Let's use AI and streaming analytics to solve all our problems!
I’m spending too much time spraying data to everyone!
I had to roll back my prod release because my data change broke someone!
14. My pipeline isn’t registering the schemas
I register schemas but my jar isn’t being published
I’m failing compatibility but I HAVE to change this field name
The bootstrap script isn’t working for me!
16. Schema Editor
Event Schema Lifecycle
V1
V2
V2
TIME
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [
{
"name": "StoreName",
"type": "string"
},
{
"name": "SID",
"type": "int"
}
]
}
StoreCreated.avsc – v1
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [
{
"name": "StoreName",
"type": "string"
},
{
"name": "SID",
"type": "int"
},
{
"name": "Location",
"type": "string",
"default": "Cincinnati"
}
]
}
StoreCreated.avsc – v2
PRODUCTION Schema Registry: Full Compatibility
STAGE Schema Registry: Full Compatibility
DEVELOPMENT Schema Registry: Full Compatibility
17. Schema Editor
Event Schema Lifecycle
V1
V2
V2
TIME
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [
{
"name": "StoreName",
"type": "string"
},
{
"name": "SID",
"type": "int"
}
]
}
StoreCreated.avsc – v1
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [
{
"name": "StoreName",
"type": "string"
},
{
"name": "SID",
"type": "int"
},
{
"name": "Location",
"type": "string",
"default": "Cincinnati"
}
]
}
StoreCreated.avsc – v2
V3
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [
{
"name": "StoreName",
"type": "string"
},
{
"name": "StoreID",
"type": "int"
},
{
"name": "Location",
"type": "string",
"default": "Cincinnati"
}
]
}
StoreCreated.avsc – v3
Changing the name of an attribute breaks compatibility!
PRODUCTION Schema Registry: Full Compatibility
STAGE Schema Registry: Full Compatibility
DEVELOPMENT Schema Registry: Full Compatibility
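To make the difference between the v1 to v2 change (accepted) and the v2 to v3 rename (rejected) concrete, here is a simplified Python sketch of a full-compatibility check. It is an assumption-laden approximation, not the Schema Registry's actual algorithm: it only looks at top-level record fields and ignores aliases, type promotions, unions, and nested types. The function names are made up for this example.

```python
V1 = {
    "type": "record", "name": "Store", "namespace": "com.kroger",
    "fields": [
        {"name": "StoreName", "type": "string"},
        {"name": "SID", "type": "int"},
    ],
}
# v2 adds a field with a default value.
V2 = {**V1, "fields": V1["fields"] + [
    {"name": "Location", "type": "string", "default": "Cincinnati"},
]}
# v3 renames SID to StoreID.
V3 = {**V2, "fields": [
    {"name": "StoreName", "type": "string"},
    {"name": "StoreID", "type": "int"},
    {"name": "Location", "type": "string", "default": "Cincinnati"},
]}


def read_problems(reader, writer):
    """List reasons why `reader` could not decode data written with `writer`."""
    writer_fields = {f["name"]: f for f in writer["fields"]}
    problems = []
    for field in reader["fields"]:
        written = writer_fields.get(field["name"])
        if written is None:
            # The reader expects a field the writer never wrote.
            if "default" not in field:
                problems.append(f"{field['name']}: not written and has no default")
        elif written["type"] != field["type"]:
            problems.append(
                f"{field['name']}: type changed from {written['type']} to {field['type']}"
            )
    return problems


def full_compatibility_problems(old, new):
    """Full compatibility: new reads old data (backward) and old reads new data (forward)."""
    return read_problems(new, old) + read_problems(old, new)


print(full_compatibility_problems(V1, V2))
print(full_compatibility_problems(V2, V3))
```

Under these simplified rules the first call reports no problems, because the added field carries a default. The second reports one problem in each direction: new readers cannot find `StoreID` in old data, and old readers cannot find `SID` in new data.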
18. I accidentally named a field wrong. Seems rather harsh, don’t you think?
I’m blocked.
Hey Robs and Lauren, can you delete this schema and clear my topic?
19. Schema Editor
Event Schema Lifecycle
V1
V2
TIME
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [
{
"name": "StoreName",
"type": "string"
},
{
"name": "StoreID",
"type": "int"
},
{
"name": "Location",
"type": "string"
}
]
}
V2 V2 V3
StoreCreated.avsc – v2
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [
{
"name": "StoreName",
"type": "string"
},
{
"name": "StoreID",
"type": "int"
},
{
"name": "Location",
"type": "com.kroger.commons.Location"
}
]
}
StoreCreated.avsc – v3
You can’t change the type of an attribute!
PRODUCTION Schema Registry: Full Compatibility
STAGE Schema Registry: Full Compatibility
DEVELOPMENT Schema Registry: Full Compatibility
20. How do we protect clients in production, but improve the DevX of schema development?
21. Schema Editor
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [{
"name": "StoreName",
"type": "string"
}, {
"name": "StoreID",
"type": "int"
},{
"name": "Location",
"type": "com.kroger.commons.Location"
}]
}
StoreCreated.avsc – v2.0.0
TIME
Major Version Compatibility (MVC)
2.0.0
PRODUCTION Schema Registry: Full Transitive Compatibility
STAGE Schema Registry: No Compatibility
DEVELOPMENT Schema Registry: No Compatibility
2.0.0
2.0.0
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [{
"name": "StoreName",
"type": "string"
}, {
"name": "StoreID",
"type": "int"
},{
"name": "Location",
"type": "com.kroger.commons.Location"
},{
"name": "StoreManager",
"type": "com.kroger.commons.Person",
"doc": "Manager of the store"
}]
}
StoreCreated.avsc – v2.0.1
2.0.1
When adding an attribute, make it nullable and add a default!
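The per-environment compatibility levels on this slide could be applied through the Schema Registry REST API, which accepts `PUT /config/{subject}` with a `compatibility` value such as `NONE` or `FULL_TRANSITIVE`. The sketch here only builds the requests and does not send them. The registry URLs and the subject name `StoreCreated-value` are hypothetical, and the talk does not say how Kroger actually configures its registries.

```python
import json
from urllib.request import Request

# Compatibility level per environment, as shown on the slide.
COMPATIBILITY_BY_ENV = {
    "development": "NONE",
    "stage": "NONE",
    "production": "FULL_TRANSITIVE",
}

# Hypothetical registry endpoints, one per environment.
REGISTRY_URLS = {
    "development": "http://schema-registry.dev.example.com:8081",
    "stage": "http://schema-registry.stage.example.com:8081",
    "production": "http://schema-registry.prod.example.com:8081",
}


def compatibility_request(env, subject):
    """Build (but do not send) the request that sets a subject's compatibility level."""
    body = json.dumps({"compatibility": COMPATIBILITY_BY_ENV[env]}).encode("utf-8")
    return Request(
        url=f"{REGISTRY_URLS[env]}/config/{subject}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    )


for env in COMPATIBILITY_BY_ENV:
    req = compatibility_request(env, "StoreCreated-value")  # hypothetical subject name
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it. With `NONE` in development and stage, schema authors can iterate freely, while production keeps rejecting anything that is not compatible with every previously registered version.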
22. Schema Editor
TIME
2.0.0
2.0.1
2.1.0
2.0.0
2.0.0
2.1.0
Major Version Compatibility (MVC)
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [{
"name": "StoreName",
"type": "string"
}, {
"name": "StoreID",
"type": "int"
},{
"name": "Location",
"type": "com.kroger.commons.Location"
},{
"name": "StoreManager",
"type": ["null",
"com.kroger.commons.Person"],
"default": null,
"doc": "Manager of the store"
}]
}
StoreCreated.avsc – v2.0.1
PRODUCTION Schema Registry: Full Transitive Compatibility
STAGE Schema Registry: No Compatibility
DEVELOPMENT Schema Registry: No Compatibility
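The "nullable with a default" rule can be sketched in Python as follows. In Avro, the default of a union field has to match the first branch of the union, so `"null"` is listed first and the default is JSON `null` rather than the string `"null"`. The helpers `nullable_field` and `fill_defaults` are made up for this example, the sample event values are invented, and the reader schema is trimmed to three fields.

```python
import json


def nullable_field(name, type_name, doc=""):
    """Build an optional field: 'null' comes first so the default can be JSON null."""
    return {"name": name, "type": ["null", type_name], "default": None, "doc": doc}


def fill_defaults(record, reader_schema):
    """Return `record` as seen through `reader_schema`, using defaults for absent fields."""
    out = {}
    for field in reader_schema["fields"]:
        if field["name"] in record:
            out[field["name"]] = record[field["name"]]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value and no default for field {field['name']!r}")
    return out


manager = nullable_field("StoreManager", "com.kroger.commons.Person", "Manager of the store")
reader_schema = {
    "type": "record",
    "name": "Store",
    "namespace": "com.kroger",
    "fields": [
        {"name": "StoreName", "type": "string"},
        {"name": "StoreID", "type": "int"},
        manager,
    ],
}

# An event produced before StoreManager existed (sample values are invented).
old_event = {"StoreName": "Example Store", "StoreID": 42}

print(json.dumps(manager))
print(fill_defaults(old_event, reader_schema))
```

Because the new field has a default, a consumer on the newer schema can still read events produced before the field existed. A producer that omits the field is equally safe for consumers that expect it.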
23. Schema Editor
TIME
2.0.0
2.0.1
2.1.0
2.0.0
2.0.0
2.1.0 2.1.1
Major Version Compatibility (MVC)
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [{
"name": "Store",
"type": "string"
}, {
"name": "StoreID",
"type": "int"
},{
"name": "Location",
"type": "com.kroger.commons.Location"
},{
"name": "StoreManager",
"type": ["null",
"com.kroger.commons.Person"],
"default": null,
"doc": "Manager of the store"
}]
}
StoreCreated.avsc – v2.1.1
You can’t change the name of an attribute, but you can add an alias!
PRODUCTION Schema Registry: Full Transitive Compatibility
STAGE Schema Registry: No Compatibility
DEVELOPMENT Schema Registry: No Compatibility
24. Schema Editor
TIME
2.0.0
2.0.1
2.1.0
2.0.0
2.0.0
2.1.0
Major Version Compatibility (MVC)
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [{
"name": "StoreName",
"aliases": ["Store"],
"type": "string"
}, {
"name": "StoreID",
"type": "int"
},{
"name": "Location",
"type": "com.kroger.commons.Location"
},{
"name": "StoreManager",
"type": ["null",
"com.kroger.commons.Person"],
"default": null,
"doc": "Manager of the store"
}]
}
StoreCreated.avsc – v2.1.1
2.1.1
PRODUCTION Schema Registry: Full Transitive Compatibility
STAGE Schema Registry: No Compatibility
DEVELOPMENT Schema Registry: No Compatibility
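How an alias helps can be shown with a small Python sketch of alias-aware reading. In the Avro specification a field's `aliases` attribute is a JSON array of alternate names, and a reader may use them to match fields that the writer stored under a different name. The `read_with_aliases` function is a simplified stand-in written for this example, not an Avro library call, and the sample record values are invented.

```python
READER_SCHEMA = {
    "type": "record",
    "name": "Store",
    "namespace": "com.kroger",
    "fields": [
        {"name": "StoreName", "aliases": ["Store"], "type": "string"},
        {"name": "StoreID", "type": "int"},
    ],
}


def read_with_aliases(written_record, reader_schema):
    """Map a written record onto the reader's field names, trying aliases as fallbacks."""
    out = {}
    for field in reader_schema["fields"]:
        candidates = [field["name"], *field.get("aliases", [])]
        for name in candidates:
            if name in written_record:
                out[field["name"]] = written_record[name]
                break
        else:
            # No candidate name was found in the written data: fall back to the default.
            if "default" not in field:
                raise KeyError(f"field {field['name']!r} not found under {candidates}")
            out[field["name"]] = field["default"]
    return out


# A record written with the field named "Store" is still readable as "StoreName".
print(read_with_aliases({"Store": "Example Store", "StoreID": 7}, READER_SCHEMA))
# A record written with the current name reads the same way.
print(read_with_aliases({"StoreName": "Example Store", "StoreID": 7}, READER_SCHEMA))
```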
25. Schema Editor
TIME
2.0.0
2.0.1
2.1.0
2.0.0
2.0.0
2.1.0
Major Version Compatibility (MVC)
{
"type": "record",
"name": "Store",
"doc": "This is an avro schema for a store",
"namespace": "com.kroger",
"fields": [{
"name": "StoreName",
"aliases": ["Store"],
"type": "string"
}, {
"name": "StoreID",
"type": "int"
},{
"name": "Location",
"type": "com.kroger.commons.Location"
},{
"name": "StoreManager",
"type": ["null",
"com.kroger.commons.Person"],
"default": null,
"doc": "Manager of the store"
}]
}
StoreCreated.avsc – v2.1.1
2.1.1
PRODUCTION Schema Registry: Full Transitive Compatibility
STAGE Schema Registry: No Compatibility
DEVELOPMENT Schema Registry: No Compatibility
32. Can you help me understand all the fields in those inventory events?
No problem, I just sent you a 500-line Avro schema file which has everything you need.
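One way to make a long schema file easier to digest is to turn its `doc` attributes into a short field listing. The Python sketch here illustrates that idea under stated assumptions: it is not one of the metadata tools from the talk, the `describe` and `type_label` helpers are made up for this example, and the schema is a trimmed version of the Store schema from the earlier slides.

```python
def type_label(avro_type):
    """Render an Avro type as a short human-readable label."""
    if isinstance(avro_type, str):
        return avro_type
    if isinstance(avro_type, list):  # union
        return " | ".join(type_label(branch) for branch in avro_type)
    return avro_type.get("name", avro_type["type"])  # named type, else array/map


def describe(schema, prefix=""):
    """Return one line per field: path, type, and documentation (or a warning)."""
    lines = []
    for field in schema.get("fields", []):
        path = prefix + field["name"]
        doc = field.get("doc", "UNDOCUMENTED")
        lines.append(f"{path} ({type_label(field['type'])}): {doc}")
        nested = field["type"]
        if isinstance(nested, dict) and nested.get("type") == "record":
            lines.extend(describe(nested, path + "."))  # recurse into inline records
    return lines


STORE = {
    "type": "record",
    "name": "Store",
    "doc": "This is an avro schema for a store",
    "namespace": "com.kroger",
    "fields": [
        {"name": "StoreName", "type": "string"},
        {
            "name": "StoreManager",
            "type": ["null", "com.kroger.commons.Person"],
            "default": None,
            "doc": "Manager of the store",
        },
    ],
}

for line in describe(STORE):
    print(line)
```

Flagging fields without a `doc` attribute also gives producers a quick check on the principle that they fully document their event schemas.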