Presentation from April, 2010 summarizing the principles of data-centric design and how they apply to DDS technology. Message-centric design is presented by way of contrast.
Data-Centric and Message-Centric System Architecture
1. The Real-Time
Middleware Experts
Data-Centric and Message-Centric
System Architecture
Making Design Explicit to Reduce Cost and Risk
Rick Warren, Principal Engineer rick.warren@rti.com
April 2010
From the beginning, the data stream is associated with the schema of the data that will be propagated on that stream. Your applications already have some expectations; if you express those to a data-centric infrastructure, it can help you. For example, you can use this schema to automatically transform data into other formats. (This is how the Routing Service and Web Integration Service work.) The infrastructure can also dissect your data to filter on content (for example “give me updates where x > 5”).
“Key” means “this field establishes the identity of a unique object.” Like the key in a relational database table. In DDS, can be any number of fields of any type(s).
New track you’ve never seen before. Notice that since type is already known, only need to send field values, not field names or types.
Update to a track you’ve already seen
Another new track – notice that the key is different
A track you’ve seen before has gone away
The first thing to notice is that the knowledge of your data model that was associated with the data stream in the data-centric technology disappears when you use a message-centric technology. That makes it much harder to develop a generic component such as the Web Integration Service, which much transform arbitrary data types to and from XML, downsample data by based on content, etc.
First message arrives. It has the same structure as we saw before, except without a known type definition, the type information must be embedded within the message itself, significantly increasing its size.
The second message arrives. It’s in a totally different format than the first! This one is just a blob of binary-encoded data.
Maybe the consuming application understands how to decode it and maybe not. Each application connected to the network will have expectations about the formats of the messages it receives. But a messaging infrastructure can’t support those expectations, so they have to be enforced by an organizational policy. I write up a Word document that describes how you should format your messages and email it to you, and you have to follow my instructions. If you make a mistake, we’ll have to debug it at integration time. In a data-centric approach, data type enforcement is built in: developers work with typed objects in their programming languages, errors are detected when the code is compiled before it’s ever deployed, and runtime mismatches that do occur are detected automatically by the middleware.
How do I describe a content-based filter on a binary blob? How do I transform it into another format? How do I map it into a database?
The third message arrives. It’s in yet a third format: a plain text string. Because the messaging system doesn’t have any concept of object lifecycle, each system has to define its own ad hoc system of sentinels: “create” messages, “dispose” messages, etc. More work, and it makes it much more difficult to leverage something you’ve built for one project on the next project. By comparison, Web Integration Service takes advantage of the built-in lifecycle support in DDS – you saw that when tracks were marked with “X” or “?”.
And without any knowledge of your objects or their lifecycle, a messaging infrastructure can only support qualities of service that make sense across an entire topic: for example time-to-live (“lifespan” in language of DDS).
Last point is most subtle. But lack of explicit data model makes integration more difficult.
If message definitions and formats are ad hoc and supported only in documentation, not by the infrastructure, problems arise:
More teams need to share this documentation and implement correctly. More chances for errors, more chances for change management to break down. Team members can call each other up and sort out these problems; multiple geographically and organizationally separated have a harder time.
Infrastructural components – persistence, logging, technology/protocol gateways, custom tooling – become coupled to system-specific message definitions, making them very brittle in the face of change. If one team manages everything, they can roll out changes to whole line at once. But incremental updates across multiple teams is hard.
This is how corporations roll out messaging solutions: one IT department from one company has total control over the deployment. Deployment may span several groups within the company, but almost never spans companies.