Data Modelling at Scale
David Simons | @SwamWithTurtles
WHO AM I?
• David Simons (@SwamWithTurtles)
• Data Architect at Ovo Energy
• Technical All-Rounder
• Kafka implementation & Cloud
integration at Citi
• Linking of the court and prison
services with the Ministry of Justice
• Organised our wedding seating plan
with Python.
A BRIEF (BUT IMPORTANT) ASIDE
Black Lives
Matter
(UK orgs, Worldwide Orgs)
Trans Rights are
Human Rights
(LGBT orgs, Trans specific orgs,
International orgs)
Open Source Projects that could use contributions
• Github Collection of Open Source Projects for Social Good
• Data Kind
• Kaggle (Data Science Volunteering that often has Social Good causes)
• Police Brutality Register
• Data Police Subreddit (Increasing Accessibility of Policing Data)
• Data for Black Lives
AGENDA
• What is Data Modelling & Why
should I care?
• Data Modelling with Kafka
• Scaling your Data Model
WHAT IS DATA MODELLING AND WHY SHOULD I CARE?
CHAPTER ONE
Recommended Reading:
Data & Reality
— William Kent
CAUTION
PHILOSOPHY AHEAD
what is the world?
A COMPLEX MESS OF HUMANS AND JOKES AND EMOTIONS AND FICTION AND ILLOGIC AND FAKE NEWS AND TIME AND ENTROPY AND CONFUSION AND COLOUR AND WONDER AND FATE AND HATRED AND LOVE AND WAR AND NUANCE AND JEALOUSY AND DOGS (WHO ARE GOOD BOYS) AND CATS (WHO ARE NOT) AND COUNTRIES AND BORDERS AND MUSIC AND SOUND AND TIME AND TIDE AND DARKNESS AND BOOKS AND WORDS AND COMPLEXITIES AND ROLLERCOASTERS AND FAVOURITE FLAVOURS OF ICE CREAM AND JOHN LENNON AND GENDER (OR MAYBE NOT.)
THAT NO COMPUTER CAN EVER HOPE TO CAPTURE
does our software need to exist in the world?
MAYBE NOT THE WORLD…
BUT ANY SOFTWARE OF SUFFICIENT COMPLEXITY EXISTS WITHIN A {business | problem | world | domain | industry}
what subset of the world does our software exist in?
data model, n:
an agreed set of
assumptions and features
that distill the world (in
which our software
exists) into something we
can hope to capture
programmatically
BUT I’M NOT A PHILOSOPHER…
WHAT DOES THAT MEAN FOR TECHNICAL PEOPLE?
TYPES OF MODEL
THE REAL WORLD
CONCEPTUAL MODEL
• Expresses the subset of the domain in terms of concepts and relations, independent of design concerns.
LOGICAL MODEL
• Expresses the concepts in terms of data structures or underlying technologies.
PHYSICAL MODEL
• Explicitly expresses how we have stored our data in systems (column names, DB).
Towards the real world, more: Accurate, Generic, Conceptual
Towards the physical model, more: Usable, Low-Level, Requires Technical Expertise
THE QUESTIONS WE ASK…
• What kind of things do we deal with?
• For each kind of thing, what aspects
of it do we care about? What are the
constraints of these aspects?
• When are two things the same thing?
• As something evolves, when does it
stop being the same thing?
• How do two things relate to each
other?
DATA MODELLING…
ISN’T THIS EASY?
Are two of these the same thing?
Are two of these the same thing?
{
amount: 500,
currency: "USD"
}
{
amount: 500,
currency: "USD"
}
Are two of these the same thing?
Sugababes
2009
Sugababes
1998
Are two of these the same thing?
MBS
2011
Sugababes
1998
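The "same thing" question above can be made concrete in code. The following is a minimal sketch, not taken from the talk: `Money` is treated as a value object (equal when its fields are equal), while `Band` is treated as an entity whose identity rests on a hypothetical `band_id` that the modeller assigns. Whether the 1998 and 2009 line-ups share an id is a modelling decision, not a fact the code can discover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """Value object: two instances are the same thing if their fields match."""
    amount: int
    currency: str


@dataclass(eq=False)
class Band:
    """Entity: identity comes from band_id, not from the current attributes."""
    band_id: str  # hypothetical identifier chosen by the modeller
    name: str
    year: int

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Band) and self.band_id == other.band_id

    def __hash__(self) -> int:
        return hash(self.band_id)


# Two separate objects with identical fields compare equal.
same_money = Money(500, "USD") == Money(500, "USD")

# Assumption for illustration: identity follows the band, not the line-up.
sugababes_1998 = Band("band-1", "Sugababes", 1998)
sugababes_2009 = Band("band-1", "Sugababes", 2009)
same_band = sugababes_1998 == sugababes_2009
```

Giving the 2009 line-up a different `band_id` would flip the second answer, which is the point: the data model encodes the decision.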
MAKE IT STOP!
WHAT IS THE CORRECT DATA MODEL?
SIGNS OF A GOOD DATA MODEL
• It is simple. It models what you need
and nothing else.
• It is built with your technology and
software system in mind
• It does not contradict the actual world
• It is extensible
• Non-technical people understand it.
Domain experts even chip in.
DATA MODELLING WITH KAFKA
CHAPTER TWO
Recommended Reading:
Designing Event-Driven
Systems
— Ben Stopford
TYPES OF MODEL
LOGICAL MODEL
• Expresses the concepts in terms of data
structures or underlying technologies
SQL/RDBMS
What are the tables, and for which entities? What are the keys/constraints? How do we normalise everything?
Neo4j/Graph DBs
Graph modelling: what are our entities? What are their properties? How do they relate (and what are the relations’ properties)? (more details here)
Mongo/Document Stores
What are our entities? Which ones get top-level documents? What document validations should we enforce?
Kafka
???
JUST IN CASE…
WHAT IS KAFKA?
• Immutable log data store, with a
multicast/pub-sub message
interface
• It’s technically not these things but they may
be a helpful abstraction:
• Message Queue with a DB store
behind it
• Real-time Streaming with a catch-up
facility
• ESB without the rules and bloatedness
that make it bad.
WHY DOES THIS SHAPE OUR DATA MODELS?
• Easy Answer: It’s a different data
store and therefore our low-level
models will be shaped by its
implementation details
• But Kafka has taken off not despite
its implementation details but
because of them.
A NEW DATA MODELLING PARADIGM
EVENT STREAMING
• Do not store the “state” of an object
as your primary model
• Instead store a sequence of events
that have transpired that will build up
that state.
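As a minimal sketch of the idea above: the stored data is the list of events, and the state is derived by folding over it. The event names `Deposited` and `Withdrawn` and the bank-balance domain are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


def apply(balance: int, event: object) -> int:
    """Return the new balance after one event; unknown events are ignored."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    return balance


# The event sequence is the primary model; the balance is never stored.
events = [Deposited(100), Withdrawn(30), Deposited(5)]
balance = reduce(apply, events, 0)
print(balance)  # 75
```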
MOTIVATION
• You can construct this state in
different ways for different purposes
• Back-up/Restoration for free!
• You can reconstruct the state of the
system at a given moment in time
• Better support for distributed/highly
concurrent systems
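Two of the motivations above, building state in different ways and reconstructing state at a given moment, can be sketched with a single in-memory log. The `StockChanged` event, its integer `at` timestamp and the stock-keeping domain are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class StockChanged:
    at: int      # logical timestamp (hypothetical)
    item: str
    delta: int


LOG = [
    StockChanged(1, "apples", 10),
    StockChanged(2, "pears", 4),
    StockChanged(3, "apples", -3),
    StockChanged(4, "pears", -4),
]


def stock_levels(events: Iterable[StockChanged], as_of: int) -> dict[str, int]:
    """Rebuild stock levels as they stood at logical time `as_of`."""
    levels: dict[str, int] = {}
    for e in events:
        if e.at <= as_of:
            levels[e.item] = levels.get(e.item, 0) + e.delta
    return levels


def movement_count(events: Iterable[StockChanged]) -> dict[str, int]:
    """A different view over the same events: how often each item moved."""
    counts: dict[str, int] = {}
    for e in events:
        counts[e.item] = counts.get(e.item, 0) + 1
    return counts


levels_at_2 = stock_levels(LOG, as_of=2)
levels_now = stock_levels(LOG, as_of=4)
moves = movement_count(LOG)
```

Neither view needed to be planned when the events were written; both are derived afterwards from the same log.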
SOME FAMILIAR EXAMPLES
• Git
• Bank Statements
• Blockchain
• Accounting Ledgers
BUT…
WHAT DOES THIS LOOK LIKE IN THE REAL WORLD?
EXAMPLE: DECRYPTO
https://github.com/SwamWithTurtles/decrypto-be
https://github.com/SwamWithTurtles/decrypto-fe
• A multi-player board game.
• Players can see a subset of words and
must communicate them to their team
mates without being intercepted (by
being too literal).
• Challenge: Make a web-app version
of this game for people to play over
hangouts during lockdown.
TYPES OF MODEL
THE REAL WORLD
CONCEPTUAL MODEL
• The list of events & their attributes
• The constraints of when they can occur
• The impact they have
LOGICAL MODEL
• The classes/text keys for events
• The name and types of their attributes
PHYSICAL MODEL
?
BUT…
WHAT GOOD IS A STREAM OF EVENTS?
• (Some) Domain Experts
• Front-end Applications
• Human Reasoning
• Data Science/Analytics Teams
CAN I STORE MY DATA ELSEWHERE?
• Yes!
• Pull into whatever high-fidelity data
store you want - e.g. Neo4j,
DynamoDB, Elasticsearch,
Firebase…*
• * ksqlDB is attempting to solve this problem
• You can even use Kafka Connect but
be careful about the coupling of
physical data models.
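Pulling events into another store amounts to writing a small projection. The sketch below uses Python's built-in `sqlite3` as a stand-in for whichever read store is chosen; the event shapes (`PlayerJoined`, `PlayerLeft`) and the `players` table are hypothetical and are not taken from the Decrypto repositories.

```python
import sqlite3

# Hypothetical events, in the order they might be read from a topic.
events = [
    {"type": "PlayerJoined", "game": "g1", "player": "ana"},
    {"type": "PlayerJoined", "game": "g1", "player": "ben"},
    {"type": "PlayerLeft", "game": "g1", "player": "ana"},
]

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE players (game TEXT, player TEXT, PRIMARY KEY (game, player))"
)

# The projection: each event becomes an update to the queryable state.
for event in events:
    if event["type"] == "PlayerJoined":
        db.execute(
            "INSERT OR IGNORE INTO players VALUES (?, ?)",
            (event["game"], event["player"]),
        )
    elif event["type"] == "PlayerLeft":
        db.execute(
            "DELETE FROM players WHERE game = ? AND player = ?",
            (event["game"], event["player"]),
        )

current_players = [
    row[0]
    for row in db.execute(
        "SELECT player FROM players WHERE game = ? ORDER BY player", ("g1",)
    )
]
```

Because the table is derived, it can be dropped and rebuilt from the events whenever its shape needs to change.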
GIVE ME SOME…
PRACTICAL TIPS
SHOULD I DO IT?
• There is an overhead involved
(development, performance, resiliency). It
is not right for every team.
• Highly stateful services
• Highly concurrent services
• High throughput inputs
• Futureproofing
• Many different consumers of data.
[SPOILERS!]
SHOULD I DO IT: PART II
• Event sourcing wants you to keep
events in an immutable log forever.
• GDPR frowns upon keeping personal
data forever.
HOW DO I DEFINE EVENTS
• Events should be driven by domain
understanding from domain experts.
They should not be simple CRUD
statements (“UserAccountMapping
Created”)
• Events should correspond to actual,
definitive changes in state - not requests
to do so.
• Event Storming is the name given to a
domain modelling session in the event
sourcing world.
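The distinction between a request and an event can be sketched as two types and a decision function. This borrows the court-scheduling domain used later in the talk; the names `ScheduleHearing`, `HearingScheduled` and `decide`, and the double-booking rule, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScheduleHearing:
    """A request. It may be rejected, so it is not recorded as an event."""
    case_id: str
    court_room: str
    time: str


@dataclass(frozen=True)
class HearingScheduled:
    """An event: a past-tense fact, named in the domain's own language."""
    case_id: str
    court_room: str
    time: str


def decide(
    request: ScheduleHearing, booked: set[tuple[str, str]]
) -> Optional[HearingScheduled]:
    """Emit an event only if the request leads to a definitive state change."""
    if (request.court_room, request.time) in booked:
        return None  # rejected: nothing happened, so nothing is logged
    return HearingScheduled(request.case_id, request.court_room, request.time)


booked = {("Room 1", "2020-07-01T10:00")}
accepted = decide(ScheduleHearing("case-7", "Room 2", "2020-07-01T10:00"), booked)
rejected = decide(ScheduleHearing("case-8", "Room 1", "2020-07-01T10:00"), booked)
```

Only `accepted` would be appended to the log; a CRUD-style name such as "HearingRowInserted" would describe storage instead of the domain.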
THE TECHNICAL BITS
• Architectural Patterns: CQS/CQRS,
Event-Driven or Reactive
Programming
• Tooling to look into: RxJS (JS/front-end),
Akka (backend), Kafka, Event
Store (data layer)
• Recommended Videos:
• https://www.infoq.com/presentations/event-sourcing-jvm/
• https://www.infoq.com/presentations/event-driven-benefits-pitfalls/
• https://www.infoq.com/presentations/systems-event-driven/
SCALING YOUR DATA MODEL
CHAPTER THREE
Recommended Reading:
Domain-Driven Design
— Eric Evans
WHAT ARE THE PROBLEMS AS YOU GET BIGGER?
• As your scope grows, the complexity
of your model increases
(“user”, “person” or “account” will often be the worst offenders.)
• As your software grows,
discoverability, traceability and lineage
grow harder
• As your team grows, you will either
have many more meetings or will suffer
from breaking changes and poor
communication around your model.
THE SOLUTION…
SPLIT IT UP
CAUTION
OPINION PRESENTED AS FACT AHEAD
DATA SCOPING UTOPIA
The rules
• Each piece of data must have exactly one
point of truth on your system.
• Models within other contexts can
duplicate concepts from other contexts as
long as they know who is the boss.
• Models within a context should be
encapsulated and should not be impacted
by changes to other teams’ models*.
MASTERY OF DATA
• Each piece of data must have exactly
one point of truth on your system
• Does everyone know who the
point of truth is? Is it defined or
documented?
• How do we ensure all changes
are registered in this system?
DENORMALISED MODELS
• Models within other contexts can —
perhaps even should — duplicate
concepts from other contexts in the
format they want. As long as they know
who is the boss.
• This means they must stay in sync
(including respecting of alterations)
• They should only get the data they
need but they should feel free to
transform it.
MODEL ENCAPSULATION
• Models within a context should be
encapsulated and should not be
impacted by changes to other teams’
models*.
• This includes validation - this
should only be applied where
data is mastered.
• *Possible exception: Changes to translation layers may
need to happen due to changes in physical model.
BOUNDED CONTEXT UTOPIA
Prisons
Inmate
• Name
• CrimeType
Court Scheduling
Defendant
• Name
• CrimeType
• Availability
• Special Needs
Hearing
• Time
• Court Room
In-Court Transcription
Person
• Name
Role
• Type, e.g. Defense Barrister, Judge, Defendant
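The contexts sketched above can each keep their own model of the same person and translate at the boundary. A minimal sketch follows, assuming (for illustration only) that Court Scheduling masters the defendant and Prisons keeps a reduced copy; the field names follow the slide, and `to_inmate` is a hypothetical translation function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defendant:
    """Court Scheduling context: what is needed to schedule a hearing."""
    name: str
    crime_type: str
    availability: tuple[str, ...]
    special_needs: str


@dataclass(frozen=True)
class Inmate:
    """Prisons context: a deliberately smaller view of the same person."""
    name: str
    crime_type: str


def to_inmate(defendant: Defendant) -> Inmate:
    """Translation layer: Prisons takes only the fields it needs."""
    return Inmate(name=defendant.name, crime_type=defendant.crime_type)


defendant = Defendant("A. Example", "fraud", ("Mon", "Tue"), "none")
inmate = to_inmate(defendant)
```

Adding a field to `Defendant` leaves `Inmate` untouched; only `to_inmate` would need revisiting if a copied field changed shape.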
THE PROBLEMS… (NO MORE!)
PROBLEM:
• As your scope grows, the complexity of
your model increases
SOLVED BY…
• Each piece of data must have exactly one
point of truth on your system.
PROBLEM:
• As your software grows,
discoverability, traceability and lineage
grow harder
SOLVED BY…
• Models within other contexts can
duplicate concepts from other contexts
as long as they know who is the boss.
PROBLEM:
• As your team grows, you will either
have many more meetings or will suffer
from breaking changes and poor
communication around your model.
SOLVED BY…
• Models within a context should be
encapsulated and should not be
impacted by changes to other teams’
models.
BUT…
WHERE ARE THE BOUNDARIES?
“A bounded context delimits the applicability of a particular
model so that team members have a clear and shared understanding
of what has to be consistent and how it relates to other contexts.
Within that context, work to keep the model logically unified but
do not worry about applicability outside those bounds.”
– ERIC EVANS
“BOUNDED CONTEXT” SMELLS
• Too Big:
• Polysemes/False Cognates
• Duplicate Concepts
• Too Small:
• Data/Feature Envy
• Incomplete Model
WITHIN YOUR CONTEXT…
• See Chapters 1 and 2 of this talk.
AN ASIDE…
THIS MAY BE A GOOD WAY TO STRUCTURE YOUR TEAMS
I CLAIM THAT…
THIS IS MADE MUCH EASIER IN AN EVENT SOURCING WORLD.
IN THE OLD STATE-WORLD
• How would we notify and push
changes?
• How do we translate information
between the different services?
• How do we decouple the physical
models?
INSTEAD…
• All state changes are business-driven
events. Contexts can listen to these
and do what they want with them.
• New contexts can be spun up and
construct their state from past
events.
• Events are perfect candidates for
MQs or Kafka to unlock a push-based
system.
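The points above can be sketched with an in-memory log that pushes new events to listeners and lets a late-joining context replay history first. `EventLog` is a hypothetical stand-in for a durable log such as a Kafka topic; it is not Kafka's client API, and the event shape is invented for the example.

```python
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class EventLog:
    """In-memory stand-in for a durable, replayable log (not Kafka's API)."""

    def __init__(self) -> None:
        self._events: list[Event] = []
        self._handlers: list[Handler] = []

    def publish(self, event: Event) -> None:
        """Append the event, then push it to every current subscriber."""
        self._events.append(event)
        for handler in self._handlers:
            handler(event)

    def subscribe(self, handler: Handler, from_start: bool = True) -> None:
        """Register a context; optionally replay all past events first."""
        if from_start:
            for event in self._events:
                handler(event)
        self._handlers.append(handler)


log = EventLog()
log.publish({"type": "HearingScheduled", "case": "case-7"})

# A new context spun up after the first event still sees it via replay.
hearings_seen: list[str] = []
log.subscribe(lambda e: hearings_seen.append(e["case"]))
log.publish({"type": "HearingScheduled", "case": "case-8"})
```

The publisher never learns who is listening, which is what keeps the contexts decoupled.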
PUTTING IT ALL TOGETHER
Split your domain model up into
“bounded contexts”.
This may incorporate multiple
teams or systems but should be a
reasonable size.
All stakeholders of this bounded
context should define the
boundary and understand
the conceptual model (state,
entities, relationships). This should
be documented and
discoverable.
After that, they should
event storm to drive
understanding of their data
model:
What (in the real world) can make the entities/
relationships in the conceptual data model change?
When can they happen? What info is needed to
action them?
Enter… Kafka.
These events should be published on Kafka. They represent
your team’s (internally) public interface and should be
documented/publicised.
This should be the source of truth.
You and any other team that cares about this event can now
use it to update their readable/high-fidelity state (e.g. RDBMS,
Elastic, Neo4j).
And it ended happily ever after.
IN SUMMARY
• Data Modelling is the crystallisation of the assumptions we have made
about the real world within our domain. It is an imprecise science, but a
good model will allow frictionless progress.
• Event Sourcing asks: what if we build our domain model around state
changes instead of state? Kafka is a great backbone for this kind of
architecture that is reliable, futureproof and highly scalable.
• As we scale up, data modelling as a whole org is unsustainable. We break
our model into independently changeable sections that are called
bounded contexts. Kafka can act well as a central nervous system.
Questions?
David Simons | @SwamWithTurtles

The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Data Modelling at Scale

  • 1. Data Modelling at Scale David Simons | @SwamWithTurtles
  • 2. WHO AM I? • David Simons (@SwamWithTurtles) • Data Architect at Ovo Energy • Technical All-Rounder • Kafka implementation & Cloud integration at Citi • Linking of the court and prison services with the Ministry of Justice • Organised our wedding seating plan with Python.
  • 3. A BRIEF (BUT IMPORTANT) ASIDE • Black Lives Matter (UK orgs, Worldwide Orgs) • Trans Rights are Human Rights (LGBT orgs, Trans specific orgs, International orgs) • Open Source Projects that could use contributions: • GitHub Collection of Open Source Projects for Social Good • Data Kind • Kaggle (Data Science Volunteering that often has Social Good causes) • Police Brutality Register • Data Police Subreddit (Increasing Accessibility of Policing Data) • Data for Black Lives
  • 4. AGENDA • What is Data Modelling & Why should I care? • Data Modelling with Kafka • Scaling your Data Model
  • 5. CHAPTER ONE: WHAT IS DATA MODELLING AND WHY SHOULD I CARE? Recommended Reading: Data & Reality, by William Kent
  • 6. CAUTION: PHILOSOPHY AHEAD
  • 7. what is the world?
  • 8. A COMPLEX MESS OF HUMANS AND JOKES AND EMOTIONS AND FICTION AND ILLOGIC AND FAKE NEWS AND TIME AND ENTROPY AND CONFUSION AND COLOUR AND WONDER AND FATE AND HATRED AND LOVE AND WAR AND NUANCE AND JEALOUSY AND DOGS (WHO ARE GOOD BOYS) AND CATS (WHO ARE NOT) AND COUNTRIES AND BORDERS AND MUSIC AND SOUND AND TIME AND TIDE AND DARKNESS AND BOOKS AND WORDS AND COMPLEXITIES AND ROLLERCOASTERS AND FAVOURITE FLAVOURS OF ICE CREAM AND JOHN LENNON AND GENDER (OR MAYBE NOT.) THAT NO COMPUTER CAN EVER HOPE TO CAPTURE
  • 9. does our software need to exist in the world?
  • 10. MAYBE NOT THE WORLD… BUT ANY SOFTWARE OF SUFFICIENT COMPLEXITY EXISTS WITHIN A {business|problem|world|domain|industry}
  • 11. what subset of the world does our software exist in?
  • 12. data model, n: an agreed set of assumptions and features that distill the world (in which our software exists) into something we can hope to capture programmatically
  • 13. BUT I’M NOT A PHILOSOPHER… WHAT DOES THAT MEAN FOR TECHNICAL PEOPLE?
  • 14. TYPES OF MODEL • The real world • Conceptual model: Expresses the subset of the domain in terms of concepts and relations independent of design concerns. • Logical model: Expresses the concepts in terms of data structures or underlying technologies • Physical model: Explicitly expresses how we have stored our data in systems (column names, DB) • Towards the physical model, more: Usable, Low-Level, Requires Technical Expertise • Towards the real world, more: Accurate, Generic, Conceptual
  • 15. THE QUESTIONS WE ASK… • What kind of things do we deal with? • For each kind of thing, what aspects of it do we care about? What are the constraints of these aspects? • When are two things the same thing? • As something evolves, when does it stop being the same thing? • How do two things relate to each other?
  • 16. DATA MODELLING… ISN’T THIS EASY?
  • 17. Are two of these the same thing?
  • 18. Are two of these the same thing? { amount: 500, currency: “USD” } { amount: 500, currency: “USD” }
  • 19. Are two of these the same thing? Sugababes 2009 Sugababes 1998
  • 20. Are two of these the same thing? MBS 2011 Sugababes 1998
  • 21. MAKE IT STOP!
  • 22. WHAT IS THE CORRECT DATA MODEL?
  • 23. SIGNS OF A GOOD DATA MODEL • It is simple. It models what you need and nothing else. • It is built with your technology and software system in mind • It does not contradict the actual world • It is extensible • Non-technical people understand it. Domain experts even chip in.
  • 24. CHAPTER TWO: DATA MODELLING WITH KAFKA Recommended Reading: Designing Event-Driven Systems, by Ben Stopford
  • 25. TYPES OF MODEL • The real world • Conceptual model: Expresses the subset of the domain in terms of concepts and relations independent of design concerns. • Logical model: Expresses the concepts in terms of data structures or underlying technologies • Physical model: Explicitly expresses how we have stored our data in systems (column names, DB) • Towards the physical model, more: Usable, Low-Level, Requires Technical Expertise • Towards the real world, more: Accurate, Generic, Conceptual
  • 26. TYPES OF MODEL: LOGICAL MODEL • Expresses the concepts in terms of data structures or underlying technologies • SQL/RDBMS: What are the tables & for which entities? What are the keys/constraints? How do we normalise everything? • Neo4j/Graph DBs: Graph modelling: what are our entities? What are their properties? How do they relate (and what are the relations’ properties)? (more details here) • Mongo/Document Stores: What are our entities? Which ones get top-level documents? What document validations should we enforce? • Kafka: ???
  • 27. JUST IN CASE… WHAT IS KAFKA?
  • 28. WHAT IS KAFKA? • Immutable log data store, with a multicast/pub-sub message interface • It’s technically not these things but they may be a helpful abstraction: • Message Queue with a DB store behind it • Real-time Streaming with a catch-up facility • ESB without the rules and bloat that make it bad.
  • 29. WHY DOES THIS SHAPE OUR DATA MODELS? • Easy Answer: It’s a different data store and therefore our low-level models will be shaped by its implementation details • But Kafka has taken off not despite its implementation details but because of them.
  • 30. A NEW DATA MODELLING PARADIGM: EVENT STREAMING
  • 31. EVENT STREAMING • Do not store the “state” of an object as your primary model • Instead, store the sequence of events that have transpired and that build up that state.
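A minimal sketch of the idea on this slide, assuming a toy bank account (the `Deposited` and `Withdrawn` event names are hypothetical and not from the talk). The event log is the primary model and the balance is only ever derived by folding over it:

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


def apply(balance: int, event) -> int:
    """Return the new balance after applying one event."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise ValueError(f"unknown event: {event!r}")


# The log is what we store; the state is computed from it on demand.
event_log = [Deposited(100), Withdrawn(30), Deposited(5)]
balance = reduce(apply, event_log, 0)
print(balance)  # 75
```

Nothing ever overwrites the balance: a correction would be recorded as another event appended to the log.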
  • 32. MOTIVATION • You can construct this state in different ways for different purposes • Back-up/Restoration for free! • You can reconstruct the state of the system at a given moment in time • Better support for distributed/highly concurrent systems
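As an illustration of the first and third points, here is a small sketch in which one ordered log feeds two different views, and one of them can be rebuilt as of a past moment. The stock-keeping domain, the `StockChanged` event and the integer timestamps are all assumptions made for the example:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockChanged:
    at: int      # logical timestamp (hypothetical unit)
    item: str
    delta: int


LOG = [
    StockChanged(1, "apples", 10),
    StockChanged(2, "pears", 4),
    StockChanged(3, "apples", -3),
    StockChanged(5, "pears", -4),
]


def stock_levels(events, as_of=None):
    """View 1: quantity per item, optionally as it stood at a past moment."""
    levels = {}
    for e in events:
        if as_of is not None and e.at > as_of:
            break  # the log is ordered, so everything after this is later
        levels[e.item] = levels.get(e.item, 0) + e.delta
    return levels


def movement_counts(events):
    """View 2: number of changes per item, e.g. for an analytics team."""
    counts = {}
    for e in events:
        counts[e.item] = counts.get(e.item, 0) + 1
    return counts


print(stock_levels(LOG))           # {'apples': 7, 'pears': 0}
print(stock_levels(LOG, as_of=2))  # {'apples': 10, 'pears': 4}
print(movement_counts(LOG))        # {'apples': 2, 'pears': 2}
```

Restoring from back-up is the same operation as `stock_levels(LOG)`: replay the log into an empty store.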
  • 33. SOME FAMILIAR EXAMPLES • Git • Bank Statements • Blockchain • Accounting Ledgers
  • 34. BUT… WHAT DOES THIS LOOK LIKE IN THE REAL WORLD?
  • 35. EXAMPLE: DECRYPTO https://github.com/SwamWithTurtles/decrypto-be https://github.com/SwamWithTurtles/decrypto-fe • A multi-player board game. • Players can see a subset of words and must communicate them to their team mates without being intercepted (by being too literal). • Challenge: Make a web-app version of this game for people to play over hangouts during lockdown.
  • 36. EXAMPLE: DECRYPTO
  • 37. EXAMPLE: DECRYPTO
  • 38. EXAMPLE: DECRYPTO
  • 39. TYPES OF MODEL (the real world, conceptual model, logical model, physical model) • The list of events & their attributes • The constraints of when they can occur • The impact they have • The classes/text keys for events • The name and types of their attributes • ?
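To make "the classes/text keys for events" and "the name and types of their attributes" concrete, here is a hedged sketch. The event names (`ClueGiven`, `GuessSubmitted`), their fields and the JSON envelope are invented for illustration and are not taken from the Decrypto repositories:

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ClueGiven:
    team: str
    round_number: int
    clue: str


@dataclass(frozen=True)
class GuessSubmitted:
    team: str
    round_number: int
    guess: list


# Text key -> class. The key is what is written to the log, so renaming a
# class does not change the stored format.
EVENT_TYPES = {"clue-given": ClueGiven, "guess-submitted": GuessSubmitted}
KEY_FOR_CLASS = {cls: key for key, cls in EVENT_TYPES.items()}


def serialise(event) -> str:
    """Wrap the event's attributes in an envelope tagged with its text key."""
    return json.dumps({"type": KEY_FOR_CLASS[type(event)], "data": asdict(event)})


def deserialise(raw: str):
    """Look up the class by text key and rebuild the event."""
    payload = json.loads(raw)
    return EVENT_TYPES[payload["type"]](**payload["data"])


original = GuessSubmitted(team="white", round_number=1, guess=[3, 1, 4])
assert deserialise(serialise(original)) == original
```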
  • 40. BUT… WHAT GOOD IS A STREAM OF EVENTS?
  • 41. WHAT GOOD IS A STREAM OF EVENTS? • (Some) Domain Experts • Front-end Applications • Human Reasoning • Data Science/Analytics Teams
  • 42. CAN I STORE MY DATA ELSEWHERE? • Yes! • Pull into whatever high-fidelity data store you want, e.g. Neo4j, DynamoDB, Elasticsearch, Firebase…* • * ksqlDB is attempting to solve this problem • You can even use Kafka Connect but be careful about the coupling of physical data models.
  • 43. GIVE ME SOME… PRACTICAL TIPS
  • 44. SHOULD I DO IT? • There is an overhead involved (development, performance, resiliency). It is not right for every team. • Highly stateful services • Highly concurrent services • High throughput inputs • Futureproofing • Many different consumers of data. [SPOILERS!]
  • 45. SHOULD I DO IT: PART II • Event sourcing wants you to keep events in an immutable log forever. • GDPR frowns upon keeping personal data forever.
  • 46. HOW DO I DEFINE EVENTS? • Events should be driven by domain understanding from domain experts. They should not be simple CRUD statements (“UserAccountMapping Created”) • Events should correspond to actual, definitive changes in state, not requests to make them. • Event Storming is the name given to a domain modelling session in the event sourcing world.
  • 47. THE TECHNICAL BITS • Architectural Patterns: CQS/CQRS, Event-Driven or Reactive Programming • Tooling to look into: RxJS (JS/front-end), Akka (backend), Kafka, Event Store (data layer) • Recommended Videos: • https://www.infoq.com/presentations/event-sourcing-jvm/ • https://www.infoq.com/presentations/event-driven-benefits-pitfalls/ • https://www.infoq.com/presentations/systems-event-driven/
  • 48. CHAPTER THREE: SCALING YOUR DATA MODEL Recommended Reading: Domain-Driven Design, by Eric Evans
  • 49. WHAT ARE THE PROBLEMS AS YOU GET BIGGER?
  • 50. WHAT ARE THE PROBLEMS AS YOU GET BIGGER? • As your scope grows, the complexity of your model increases (“user”, “person” or “account” will often be the worst offenders.) • As your software grows, discoverability, traceability and lineage grow harder • As your team grows, you will either have many more meetings or will suffer from breaking changes and poor communication around your model.
  • 51. THE SOLUTION… SPLIT IT UP
  • 52. CAUTION: OPINION PRESENTED AS FACT AHEAD
  • 53. DATA SCOPING UTOPIA The rules: • Each piece of data must have exactly one point of truth on your system. • Models within other contexts can duplicate concepts from other contexts as long as they know who is the boss. • Models within a context should be encapsulated and should not be impacted by changes to other teams’ models*.
  • 54. MASTERY OF DATA • Each piece of data must have exactly one point of truth on your system • Does everyone know who the point of truth is? Is it defined or documented? • How do we ensure all changes are registered in this system?
  • 55. DENORMALISED MODELS • Models within other contexts can (perhaps even should) duplicate concepts from other contexts in the format they want, as long as they know who is the boss. • This means they must stay in sync (including respecting alterations) • They should only get the data they need but they should feel free to transform it.
  • 56. MODEL ENCAPSULATION • Models within a context should be encapsulated and should not be impacted by changes to other teams’ models*. • This includes validation: it should only be applied where data is mastered. • *Possible exception: Changes to translation layers may need to happen due to changes in physical model.
  • 57. BOUNDED CONTEXT UTOPIA • Court Scheduling: Defendant (Name, CrimeType, Availability, Special Needs); Hearing (Time, Court Room) • Prisons: Inmate (Name, CrimeType) • In-Court Transcription: Person (Name); Role Type (e.g. Defense Barrister, Judge, Defendant)
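One way to read this slide is that each context keeps its own, smaller view of the same person. The sketch below assumes, purely for illustration, that Court Scheduling masters the data and publishes `DefendantRegistered` and `DefendantRenamed` events (both hypothetical), and that the Prisons context translates them into its own `Inmate` shape:

```python
from dataclasses import dataclass


# Events published by the context that masters the data (assumed here to be
# Court Scheduling; the slide does not say who the owner is).
@dataclass(frozen=True)
class DefendantRegistered:
    person_id: str
    name: str
    crime_type: str
    availability: str
    special_needs: str


@dataclass(frozen=True)
class DefendantRenamed:
    person_id: str
    name: str


# The Prisons context's own model: only the fields it cares about.
@dataclass
class Inmate:
    name: str
    crime_type: str


class PrisonsContext:
    def __init__(self):
        self.inmates = {}

    def handle(self, event):
        if isinstance(event, DefendantRegistered):
            # Translate into the local shape and drop what Prisons ignores.
            self.inmates[event.person_id] = Inmate(event.name, event.crime_type)
        elif isinstance(event, DefendantRenamed):
            # Stay in sync with alterations made by the owner.
            self.inmates[event.person_id].name = event.name


prisons = PrisonsContext()
prisons.handle(DefendantRegistered("p1", "A. Person", "Fraud", "Mondays", "None"))
prisons.handle(DefendantRenamed("p1", "A. N. Other"))
print(prisons.inmates["p1"])  # Inmate(name='A. N. Other', crime_type='Fraud')
```

The Prisons context applies no validation of its own to the name; under the rules above that belongs to whichever context masters it.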
  • 58. THE PROBLEMS… (NO MORE!) • Problem: As your scope grows, the complexity of your model increases. Solved by: Each piece of data must have exactly one point of truth on your system. • Problem: As your software grows, discoverability, traceability and lineage grow harder. Solved by: Models within other contexts can duplicate concepts from other contexts as long as they know who is the boss. • Problem: As your team grows, you will either have many more meetings or will suffer from breaking changes and poor communication around your model. Solved by: Models within a context should be encapsulated and should not be impacted by changes to other teams’ models.
  • 59. BUT… WHERE ARE THE BOUNDARIES?
  • 60. “A bounded context delimits the applicability of a particular model so that team members have a clear and shared understanding of what has to be consistent and how it relates to other contexts. Within that context, work to keep the model logically unified but do not worry about applicability outside those bounds.” (Eric Evans)
  • 61. “BOUNDED CONTEXT” SMELLS • Too Big: Polysemes/False Cognates; Duplicate Concepts • Too Small: Data/Feature Envy; Incomplete Model
  • 62. WITHIN YOUR CONTEXT… • See Chapters 1 and 2 of this talk.
  • 63. AN ASIDE… THIS MAY BE A GOOD WAY TO STRUCTURE YOUR TEAMS
  • 64. I CLAIM THAT… THIS IS MADE MUCH EASIER IN AN EVENT SOURCING WORLD.
  • 65. IN THE OLD STATE-WORLD • How would we notify and push changes? • How do we translate information between the different services? • How do we decouple the physical models?
  • 66. INSTEAD… • All state changes are business-driven events. Contexts can listen to these and do what they want with them. • New contexts can be spun up and construct their state from past events. • Events are perfect candidates for MQs or Kafka to unlock a push-based system.
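The three points on this slide can be sketched with a toy, in-memory log. This is not Kafka's API; `EventLog`, `publish` and `subscribe` are hypothetical names standing in for a durable topic with replay:

```python
class EventLog:
    """A toy, in-memory stand-in for a durable log such as a Kafka topic."""

    def __init__(self):
        self._events = []
        self._subscribers = []

    def publish(self, event):
        self._events.append(event)
        for handler in self._subscribers:
            handler(event)  # push to every listening context

    def subscribe(self, handler, from_beginning=True):
        if from_beginning:
            for event in self._events:
                handler(event)  # catch up on history first
        self._subscribers.append(handler)


log = EventLog()

# An existing context keeps whatever view it wants of the events.
billing_view = []
log.subscribe(billing_view.append)

log.publish("CustomerMovedHouse")
log.publish("MeterReadingSubmitted")

# A context spun up later rebuilds its state from past events, then
# receives new ones as they are pushed.
analytics_view = []
log.subscribe(analytics_view.append)
log.publish("TariffChanged")

print(billing_view == analytics_view)  # True
```

The publisher never needs to know which contexts exist, which is what lets their physical models stay decoupled.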
  • 67. PUTTING IT ALL TOGETHER
  • 68. Split your domain model up into “bounded contexts”. This may incorporate multiple teams or systems but should be a reasonable size.
  • 69. All stakeholders of this bounded context should define the boundary and understand what the conceptual model is (state, entities, relationships). This should be documented and discoverable.
  • 70. After that, they should event storm to drive understanding of their data model: What (in the real world) can make the entities/relationships in the conceptual data model change? When can these changes happen? What info is needed to action them?
  • 71. Enter… Kafka. These events should be published on Kafka. They represent your team’s (internally) public interface and should be documented/publicised. This should be the source of truth.
  • 72. You and any other team that cares about this event can now use it to update their readable/high-fidelity state (e.g. RDBMS, Elastic, Neo4j).
  • 73. And it ended happily ever after.
  • 74. IN SUMMARY • Data Modelling is the crystallisation of the assumptions we have made about the real world within our domain. It is an imprecise science, but a good model will allow frictionless progress. • Event Sourcing asks: what if we build our domain model around state changes instead of state? Kafka is a great backbone for this kind of architecture: reliable, futureproof and highly scalable. • As we scale up, data modelling as a whole org is unsustainable. We break our model into independently changeable sections that are called bounded contexts. Kafka can act well as a central nervous system.
  • 75. Questions? David Simons | @SwamWithTurtles