Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2lbrffT.
Celia Kung talks about Brooklin – LinkedIn’s managed data streaming service that supports multiple pluggable sources and destinations, which can be data stores or messaging systems. She dives deeper into Brooklin’s architecture and use cases, as well as their future plans. Filmed at qconnewyork.com.
Celia Kung manages the data pipelines team at LinkedIn. Previously, she was the lead engineer for building Oracle change-data capture support for Brooklin, as well as a new Kafka mirroring solution that has fully replaced Kafka MirrorMaker at LinkedIn.
2. InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
linkedin-streams-brooklin/
3. Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
6. Nearline Applications
• Require near real-time response
• Thousands of applications at LinkedIn
○ E.g. Live search indices, Notifications
7. Nearline Applications
• Require continuous, low-latency access to data
○ Data could be spread across multiple database systems
• Need an easy way to move data to applications
○ App devs should focus on event processing and not on
data access
9. Building the Right Infrastructure
• Build separate, specialized
solutions to stream data from
and to each different system?
○ Slows down development
○ Hard to manage!
Microsoft
EventHubs
...
Streaming
System C
Streaming
System D
Streaming
System A
Streaming
System B
...
Nearline
Applications
10. Need a centralized, managed, and extensible service
to continuously deliver data in near real-time
12. Brooklin
• Streams are dynamically
provisioned and individually
configured
• Extensible: Plug-in support for
additional sources/destinations
• Streaming data pipeline service
• Propagates data from many source
types to many destination types
• Multitenant: Can run several
thousand streams simultaneously
20. Capturing Live Updates
Member DB
Updates
News Feed
Service
query
Notifications
Service
Standardization
Service
...
query
query
query
Search Indices
Service
21. Capturing Live Updates
Member DB
Updates
News Feed
Service
query
Notifications
Service
Standardization
Service
...
query
query
query
Search Indices
Service
22. Capturing Live Updates
Member DB
Updates
News Feed
Service
query
Notifications
Service
Standardization
Service
...
query
query
query
Search Indices
Service
23. • Isolation: Applications are decoupled
from the sources and don’t compete
for resources with online queries
• Applications can be at different
points in change timelines
• Brooklin can stream database
updates to a change stream
• Data processing applications
consume from change streams
Change Data Capture (CDC)
24. Change Data Capture (CDC)
Messaging System
News Feed
Service
Notifications
Service
Standardization
Service
Search Indices
Service
Member DB
Updates
26. ● Across…
○ cloud services
○ clusters
○ data centers
Stream Data from X to Y
27. Streaming Bridge
• Data pipe to move data between
different environments
• Enforce policy: Encryption,
Obfuscation, Data formats
28. ● Aggregating data from all data centers into a centralized place
● Moving data between LinkedIn and external cloud services (e.g. Azure)
● Brooklin has replaced Kafka MirrorMaker (KMM) at LinkedIn
○ Issues with KMM: didn’t scale well, difficult to operate and manage, poor
failure isolation
Mirroring Kafka Data
29. Use Brooklin to Mirror Kafka Data
DestinationsSources
Messaging systems
Microsoft
EventHubs
Messaging systems
Microsoft
EventHubs
Databases Databases
32. Brooklin Kafka Mirroring
● Optimized for stability and operability
● Manually pause and resume mirroring at every level
○ Entire pipeline, topic, topic-partition
● Can auto-pause partitions facing mirroring issues
○ Auto-resumes the partitions after a configurable duration
● Flow of messages from other partitions is unaffected
63. Current
• Consumers:
○ Espresso
○ Oracle
○ Kafka
○ EventHubs
○ Kinesis
• Producers:
○ Kafka
○ EventHubs
• APIs are standardized to support additional
sources and destinations
• Multitenant: Can power thousands of
datastreams across several source and
destination types
• Guarantees: At-least-once delivery, order is
maintained at partition level
• Kafka mirroring improvements: finer control
of pipelines (pause/auto-pause partitions),
improved latency with flushless-produce
mode
Sources & Destinations Features