The New York Times stores every piece of content it has ever published in Apache Kafka. It takes a log-based approach where all content is published as events to a central "Monolog" topic in Kafka. This log becomes the single source of truth, and all applications consume from the log. The content is stored in Kafka as Protobuf messages with a well-defined schema. A separate "Skinny Log" topic contains notifications that content was processed, to enable caching and track service level objectives. This log-based architecture aims to decouple producers and consumers of content and make the system more flexible and scalable.
Apache Kafka® Delivers a Single Source of Truth for The New York Times
1. The Source of Truth
2018-05-09
The New York Times
Why The New York Times
Stores Every Piece of Content
Ever Published
in Kafka
2. Boerge Svingen
Director of Engineering, The New York Times
Boerge Svingen was a founder of Fast Search &
Transfer (alltheweb.com, FAST ESP). He was later a
founder and CTO of Open AdExchange, doing
contextual advertising for online news. He is now
working on search and backend platforms at The
New York Times.
3. Housekeeping Items
● This session will last about an hour.
● This session will be recorded.
● You can submit your questions by entering them into the GoToWebinar panel.
● The last 10-15 minutes will consist of Q&A.
● The slides and recording will be available after the talk.
42. Why not PubSub or AWS SNS/SQS/
Kinesis?
For most use cases, Kafka is used as a
message broker, and the log is an
implementation detail.
43. Why not PubSub or AWS SNS/SQS/
Kinesis?
For this use case, the log is the point.
44. Why not PubSub or AWS SNS/SQS/
Kinesis?
Two requirements:
1. We must retain the log forever.
2. Since messages have causal relationships,
consumption must be ordered.
45. Why not PubSub or AWS SNS/SQS/
Kinesis?
Only Kafka gives us an
ordered log with infinite
retention.
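The two requirements above can be illustrated with a minimal in-memory sketch. It is not Kafka's API: `Log`, `append`, and `read_from` are hypothetical names chosen for illustration. The log never discards records, and a consumer starting at offset 0 sees every record in append order, so an article that references an image is always seen after the image.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Log:
    """Toy append-only log: records are never deleted and keep their append order."""

    _records: List[str] = field(default_factory=list)

    def append(self, record: str) -> int:
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int = 0) -> Iterator[Tuple[int, str]]:
        """Yield (offset, record) pairs in append order, starting at `offset`."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


log = Log()
log.append("publish image-1")
log.append("publish article-1 (references image-1)")
log.append("update article-1")

# A consumer added long after publication can still replay from the beginning.
replayed = [record for _, record in log.read_from(0)]
```

Because nothing is ever dropped, a new consumer can rebuild its state purely by replaying the log, which is what makes the log usable as the source of truth.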
46. Agenda
1. A little history
2. How things used to work
3. Log-based architectures
4. The schema
5. The Monolog
6. The Skinny Log
7. Some challenges
61. message Article {
  PublicationProperties publicationProperties = 1; // Publication metadata
  CreativeWork creativeWork = 2; // Editorial metadata
  PromotionalProperties promotionalProperties = 3; // Promotional metadata
  PrintInformation printInformation = 4; // Print metadata
  Dateline dateline = 5; // Where reporting for the article occurred
  int32 wordCount = 6; // The number of words of body text
  DocumentBlock body = 7; // The body of the article
}
81. The Skinny Log
Example: A list service consumes a publish
for an asset, updates the list, and posts a
notification saying that the list has been updated.
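The example above can be sketched as follows. This is a toy model in which plain Python lists stand in for the Monolog and Skinny Log topics. `ListService`, `handle_publish`, and the notification fields are assumptions made for illustration, not the Times' actual implementation.

```python
from typing import Dict, List

monolog: List[Dict[str, str]] = []     # stands in for the Monolog topic
skinny_log: List[Dict[str, str]] = []  # stands in for the Skinny Log topic


class ListService:
    """Hypothetical list service that keeps a list of published asset IDs."""

    def __init__(self, list_id: str) -> None:
        self.list_id = list_id
        self.asset_ids: List[str] = []

    def handle_publish(self, event: Dict[str, str]) -> None:
        # 1. Consume a publish for an asset and update the list.
        asset_id = event["asset_id"]
        if asset_id not in self.asset_ids:
            self.asset_ids.append(asset_id)
        # 2. Post a notification saying that the list has been updated.
        skinny_log.append(
            {"type": "list_updated", "list_id": self.list_id, "caused_by": asset_id}
        )


monolog.append({"type": "publish", "asset_id": "article-1"})

service = ListService("homepage")
for event in monolog:
    service.handle_publish(event)
```

The notification carries only identifiers, not the content itself, which is one plausible reading of why the log is called "skinny".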