Architecture patterns for distributed, hybrid, edge and global Apache Kafka deployments
Multi-cluster and cross-data center deployments of Apache Kafka have become the norm rather than an exception. This session gives an overview of several scenarios that may require multi-cluster solutions and discusses real-world examples with their specific requirements and trade-offs, including disaster recovery, aggregation for analytics, cloud migration, mission-critical stretched deployments and global Kafka.
Key takeaways:
In many scenarios, one Kafka cluster is not enough. Understand different architectures and alternatives for multi-cluster deployments.
Zero data loss and high availability are two key requirements. Understand how to achieve both, including the trade-offs.
Learn about the features and limitations of Kafka for multi-cluster deployments.
Global Kafka and mission-critical multi-cluster deployments with zero data loss and high availability have become the norm, not the exception.
1. 1
Kai Waehner | Technology Evangelist, Confluent
contact@kai-waehner.de | LinkedIn | @KaiWaehner | www.confluent.io | www.kai-waehner.de
6. 6
The Beginning of a New Era
https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
The first use case: Log Analytics. This is why Kafka was created!
9. 9
Apache Kafka (kafka.apache.org) includes Kafka Connect and Kafka Streams
[Diagram: source → Kafka Connect → Kafka Cluster → Kafka Connect → sink; your app uses Kafka Streams]
10. 10
A Streaming Platform is the Underpinning of an Event-driven Architecture
[Diagram: producers and consumers (microservices, DBs, SaaS apps, Customer 360, real-time fraud detection, data warehouse) exchange streams of real-time events (database changes, microservices events, SaaS data, customer experiences) through connectors and stream processing apps]
11. 11
Apache Kafka at Scale at Tech Giants
> 7 trillion messages / day > 6 Petabytes / day
“You name it”
* Kafka is not just used by tech giants
** Kafka is not just used for big data
12. 12
Business Value: Key Drivers
• Increase Revenue (make money)
• Decrease Costs (save money)
• Mitigate Risk (protect money)
Strategic Objectives (sample)
• Improve Customer Experience (CX)
• Core Business Platform
• Increase Operational Efficiency
• Migrate to Cloud
• Regulatory
• Digital Transformation
Example Use Cases
• Fraud Detection
• IoT sensor ingestion
• Digital replatforming / Mainframe Offload
• Customer 360
• Faster transactional processing / analysis incl. Machine Learning / AI
• Microservices Architecture
• Online Fraud Detection
• Online Security (syslog, log aggregation, Splunk replacement)
• Middleware replacement
• Website / Core Operations (Central Nervous System)
• Real-time app updates
Example Case Studies (of many)
• Connected Car: Navigation & improved in-car experience: Audi
• Simplifying Omni-channel Retail at Scale: Target
• Mainframe Offload: RBC
• Application Modernization: Multiple Examples
• The [Silicon Valley] Digital Natives: LinkedIn, Netflix, Uber, Yelp...
• Predictive Maintenance: Audi
• Streaming Platform in a regulated environment (e.g. Electronic Medical Records): Celmatix
• Real Time Streaming Platform for Communications and Beyond: Capital One
• Developer Velocity - Building Stateful Financial Applications with Kafka Streams: Funding Circle
• Detect Fraud & Prevent Fraud in Real Time: PayPal
• Kafka as a Service - A Tale of Security and Multi-Tenancy: Apple
15. 15
Why Multiple Kafka Clusters?
* Not a representative survey :-)
** Many DCs do NOT necessarily mean more than one Kafka cluster
16. 16
Disaster Recovery – RPO and RTO
RPO = Recovery Point Objective
RTO = Recovery Time Objective
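As a rough illustration of the two terms: RPO bounds how much data (measured in time) may be lost in a disaster, and RTO bounds how long the service may be down. The sketch below checks one outage scenario against both objectives. The class name, field names and all numbers are assumptions made up for this illustration; they do not come from the talk.

```python
from dataclasses import dataclass


@dataclass
class DisasterScenario:
    """Hypothetical numbers describing one outage; all values in seconds."""
    replication_lag_s: float  # data not yet copied to the surviving site
    detection_s: float        # time until the outage is noticed
    failover_s: float         # time to redirect clients to the surviving site

    @property
    def data_loss_window_s(self) -> float:
        # Everything written during the lag window is lost with the failed site.
        return self.replication_lag_s

    @property
    def downtime_s(self) -> float:
        return self.detection_s + self.failover_s

    def meets(self, rpo_s: float, rto_s: float) -> bool:
        """True if the scenario stays within both objectives."""
        return self.data_loss_window_s <= rpo_s and self.downtime_s <= rto_s


if __name__ == "__main__":
    async_dr = DisasterScenario(replication_lag_s=5, detection_s=60, failover_s=240)
    print(async_dr.meets(rpo_s=10, rto_s=600))  # True: 5 s lost, 300 s down
    print(async_dr.meets(rpo_s=0, rto_s=600))   # False: any lag violates RPO = 0
```

An RPO of zero therefore rules out any setup with replication lag, which is why the later slides separate synchronous (stretched) from asynchronous (replicated) deployments.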
21. 21
A Stretched Kafka Cluster over 3DC
[Diagram: one Kafka cluster stretched across DC1, DC2 and DC3, each DC running a ZooKeeper node and a Kafka broker, with producers, consumers, Kafka Connect and Schema Registry attached]
High availability (Survives DC outage)
Zero data loss and zero downtime
Automatic client fail-over
Works well in cloud (3 AZs in 1 region)
Requires “good” latency (→ DCs “close” to each other)
Requires three DCs (Quorum / split brain)
Complex to configure and operate
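One common way to back the "zero data loss and zero downtime" claim in a 3-DC stretched cluster is one replica per DC (replication factor 3), `min.insync.replicas=2` on the topic and `acks=all` on the producer. The sketch below is plain arithmetic over those two settings, not a Kafka API; the function names are illustrative, and it assumes unclean leader election stays disabled.

```python
def failures_without_data_loss(min_insync_replicas: int) -> int:
    """Replica failures that an acknowledged acks=all write is guaranteed to survive.

    An acknowledged write exists on at least `min_insync_replicas` replicas,
    so it survives the loss of all but one of them.
    """
    return min_insync_replicas - 1


def failures_without_write_outage(replication_factor: int,
                                  min_insync_replicas: int) -> int:
    """Replica failures after which acks=all producers can still write."""
    return replication_factor - min_insync_replicas


if __name__ == "__main__":
    # One replica per DC, so one replica failure corresponds to one DC outage.
    print(failures_without_data_loss(2))        # 1
    print(failures_without_write_outage(3, 2))  # 1
    # min.insync.replicas=3 improves durability but blocks writes on any outage.
    print(failures_without_data_loss(3))        # 2
    print(failures_without_write_outage(3, 3))  # 0
```

With three replicas and `min.insync.replicas=2`, a single DC outage costs neither acknowledged data nor write availability, which matches the bullets above.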
23. 23
A Stretched Kafka Cluster over 2DC
High availability (Survives DC outage)
Zero data loss or zero downtime
Automatic client fail-over
Stopgap solution for on premise (if only 2 DCs available)
→ 2.5 DC deployment as workaround
Requires “good” latency (→ DCs “close” to each other)
Quorum in 2 DCs not possible
Complex to configure and operate
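The quorum limitation can be checked with a few lines of arithmetic. ZooKeeper needs a strict majority of its nodes to stay available, so the sketch below counts surviving voters after one site fails. The site names and node counts are assumptions for illustration, and "2.5 DC" is modelled here as two full DCs plus a small third site that hosts only one ZooKeeper node.

```python
from typing import Dict


def has_quorum(voters_per_site: Dict[str, int], failed_site: str) -> bool:
    """True if the surviving sites still hold a strict majority of voters."""
    total = sum(voters_per_site.values())
    surviving = total - voters_per_site[failed_site]
    return surviving > total / 2


if __name__ == "__main__":
    three_dc = {"DC1": 1, "DC2": 1, "DC3": 1}
    two_dc = {"DC1": 2, "DC2": 1}
    two_and_a_half_dc = {"DC1": 2, "DC2": 2, "tie-breaker": 1}

    print(has_quorum(three_dc, "DC1"))           # True: 2 of 3 voters remain
    print(has_quorum(two_dc, "DC1"))             # False: 1 of 3 voters remains
    print(has_quorum(two_dc, "DC2"))             # True, but only for this DC
    print(has_quorum(two_and_a_half_dc, "DC1"))  # True: 3 of 5 voters remain
```

However the nodes are split over exactly two sites, one of the two sites always holds at least half of the voters, so losing that site loses the quorum.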
[Diagram: one Kafka cluster stretched across DC1 and DC2, with three ZooKeeper nodes and four Kafka brokers split between the two DCs, plus producers, consumers, Kafka Connect and Schema Registry]
24. 24
A Single Kafka Cluster
[Diagram: a single Kafka cluster (ZooKeeper, Kafka broker, Schema Registry, Kafka Connect, KSQL) together with OPC-UA, MQTT, PLC4X, Grafana and Postgres]
Simple setup
Works
Often used “at the Edge”
No high availability
26. 26
Independent Kafka Clusters
Total Independence
Owned by the project teams, central ops or SaaS
Different sizing, security, infrastructure
Related projects should run on the same Kafka cluster
Independent projects can run on the same Kafka cluster
• similar SLAs and requirements
• e.g. do NOT mix instant payment, log analytics, file transfer and video streaming on one cluster
• ACLs / RBAC for fine-grained authorization
• throughput typically no issue (Confluent Cloud processes 1 Gigabyte / sec and more in one cluster)
• reduce overhead (operations, hardware, …)
[Diagram: three independent Kafka clusters, each with three ZooKeeper nodes, three Kafka brokers, Schema Registry, Kafka Connect, producers and consumers]
27. 27
Hybrid Integration of 2 Kafka Clusters
Hybrid integration
On premise and cloud or multi-cloud scenarios (due to technical, business or legal reasons)
Uni- or bi-directional
Know the best practices or get help
Know your SLAs and timelines
Choose the right (battle-tested?) tool
Works
Relatively easy to set up (but some tools are complex, not up to date, not mature or not well documented)
Example: Replicate data from production to analytics cluster
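For the production-to-analytics example, a uni-directional MirrorMaker 2 setup needs only a handful of properties. The sketch below renders such a file from Python. The cluster aliases (`prod`, `analytics`), host names and topic pattern are hypothetical; the property keys follow the MirrorMaker 2 conventions (`clusters`, `<alias>.bootstrap.servers`, `<source>-><target>.enabled`, `<source>-><target>.topics`) and should be checked against the Kafka version in use.

```python
def render_mm2_properties(source: str, source_bootstrap: str,
                          target: str, target_bootstrap: str,
                          topics: str = ".*") -> str:
    """Render a minimal, uni-directional MirrorMaker 2 properties file."""
    lines = [
        f"clusters = {source}, {target}",
        f"{source}.bootstrap.servers = {source_bootstrap}",
        f"{target}.bootstrap.servers = {target_bootstrap}",
        # Only source -> target is enabled; a bi-directional setup would
        # add the mirrored target -> source entries as well.
        f"{source}->{target}.enabled = true",
        f"{source}->{target}.topics = {topics}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    print(render_mm2_properties("prod", "prod-kafka:9092",
                                "analytics", "analytics-kafka:9092",
                                topics="orders.*"))
```

The resulting file would typically be passed to `bin/connect-mirror-maker.sh`. With MirrorMaker 2's default replication policy, a source topic `orders` shows up on the target cluster as `prod.orders`, so consumers on the analytics side need to subscribe to the prefixed name.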
[Diagram: Kafka clusters in DC1 and DC2 connected by streaming replication (MirrorMaker 1 / 2, Confluent Replicator, uReplicator (Uber), Mirus (Salesforce), Brooklin (LinkedIn) or custom replication)]
28. 28
Migration of Kafka Clusters
[Diagram: Kafka clusters in DC1 and DC2 connected by streaming replication (MirrorMaker 1 / 2, Confluent Replicator, uReplicator (Uber), Mirus (Salesforce), Brooklin (LinkedIn) or custom replication)]
Common migration scenarios
On premise → Cloud
Cloud A → Cloud B
Vendor 1 → Vendor 2
Self-Managed → SaaS
Migration steps:
1) Create new Kafka cluster
2) Producer / Consumer re-configuration
3) Shutdown of old Cluster
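Step 3 is the risky one: the old cluster should only go away once nothing writes to it and everything it holds has been replicated and consumed. The sketch below expresses that check. The function, its inputs and the zero-lag rule are assumptions for illustration; in practice the numbers would come from your monitoring or replication tool.

```python
from typing import Dict


def safe_to_shut_down(active_producers_on_old: int,
                      replication_lag: Dict[str, int],
                      old_cluster_consumer_lag: Dict[str, int]) -> bool:
    """True if the old cluster holds no data that is still needed.

    replication_lag: records per topic not yet copied to the new cluster.
    old_cluster_consumer_lag: unread records per consumer group on the old cluster.
    """
    return (active_producers_on_old == 0
            and all(lag == 0 for lag in replication_lag.values())
            and all(lag == 0 for lag in old_cluster_consumer_lag.values()))


if __name__ == "__main__":
    print(safe_to_shut_down(0, {"orders": 0}, {"billing": 0}))   # True
    print(safe_to_shut_down(0, {"orders": 12}, {"billing": 0}))  # False: not fully replicated
    print(safe_to_shut_down(2, {"orders": 0}, {"billing": 0}))   # False: producers still attached
```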
Know the best practices or get help
Know your SLAs and timelines
Choose the right (battle-tested?) tool
29. 29
Disaster Recovery with 2 Kafka Clusters
[Diagram: Kafka clusters in DC1 and DC2 connected by streaming replication (MirrorMaker 1 / 2, Confluent Replicator, uReplicator (Uber), Mirus (Salesforce), Brooklin (LinkedIn) or custom replication)]
Know the trade-offs!
If Kafka Cluster 1 is down, Kafka Cluster 2 is still live and running
Timestamp preservation
Offset translation
Manual client-failover / custom client code
Data loss in case of DC outage (asynchronous replication)
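Offset translation is needed because the same record usually sits at a different offset in each cluster, so an offset committed on cluster 1 cannot be reused on cluster 2. If the replication tool preserves timestamps, one workable approach is to resume by timestamp. The sketch below illustrates the idea on an in-memory list; the data and the function name are made up, and replication tools ship their own mechanisms for this. It mirrors the semantics of the Kafka consumer's `offsetsForTimes` lookup (earliest offset whose timestamp is greater than or equal to the requested one).

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def translate_offset(target_log: List[Tuple[int, int]],
                     resume_timestamp_ms: int) -> Optional[int]:
    """Return the earliest target offset whose timestamp is >= resume_timestamp_ms.

    target_log: (offset, timestamp_ms) pairs of one partition on the DR
    cluster, assumed to be sorted by timestamp.
    Returns None if the DR cluster holds no record that new.
    """
    timestamps = [ts for _, ts in target_log]
    i = bisect_left(timestamps, resume_timestamp_ms)
    return target_log[i][0] if i < len(target_log) else None


if __name__ == "__main__":
    # Same records as on the primary, but at different offsets.
    dr_partition = [(100, 1000), (101, 1010), (102, 1020), (103, 1030)]
    print(translate_offset(dr_partition, 1020))  # 102
    print(translate_offset(dr_partition, 1015))  # 102
    print(translate_offset(dr_partition, 2000))  # None
```

Resuming by timestamp can re-deliver a few records, so consumers should tolerate duplicates after a fail-over.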
38. 38
Replication Between Kafka Clusters over Multiple Regions or Continents
Streaming replication works (MirrorMaker 2, Confluent Replicator)
Same challenges as in one region (data loss, custom code for fail-over, offset translation, etc.)
[Diagram: three Kafka clusters in China, USA and Europe, each with three ZooKeeper nodes, three Kafka brokers, Schema Registry, Kafka Connect, producers and consumers]
39. 39
A Single Kafka Cluster over 3 Regions
with Multi-Region Replication
[Diagram: one Kafka cluster spanning US-East, US-Central and US-West, each region running a ZooKeeper node and a Kafka broker, with producers, consumers, Kafka Connect and Schema Registry attached]
40. 40
A Single Multi Region Kafka Cluster (MRC)
High availability (Survives region outage)
Zero data loss and zero downtime
Automatic client fail-over over regions
Works well in cloud and on premise
No external tools (like MirrorMaker) needed
Not part of Open Source Kafka → Build vs. Buy
[Diagram: one Kafka cluster spanning US-East, US-Central and US-West, each region running a ZooKeeper node and a Kafka broker, with producers, consumers, Kafka Connect and Schema Registry attached]
How does this work?
Region-awareness
Synchronous or asynchronous replication per Topic
Follower-fetching
Regional topic locality
Replication rules
…
(Confluent Platform)
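Follower-fetching is the piece that keeps reads local: a consumer announces its region and may then read from a replica in that region instead of from a remote leader. In Apache Kafka this is driven by the consumer setting `client.rack` together with a rack-aware `replica.selector.class` on the brokers. The sketch below only models the selection rule; the function name, broker IDs and region names are illustrative, not the broker's actual implementation.

```python
from typing import Dict, Optional


def pick_read_replica(leader: int,
                      replica_racks: Dict[int, str],
                      client_rack: Optional[str]) -> int:
    """Pick the broker a consumer should fetch from.

    Prefers the leader if it is in the client's region, then any replica in
    that region, and falls back to the leader otherwise.
    """
    if client_rack is None or replica_racks.get(leader) == client_rack:
        return leader
    for broker_id in sorted(replica_racks):
        if replica_racks[broker_id] == client_rack:
            return broker_id
    return leader


if __name__ == "__main__":
    replicas = {1: "us-west", 2: "us-west", 4: "us-east"}
    print(pick_read_replica(1, replicas, "us-east"))     # 4: local follower
    print(pick_read_replica(1, replicas, "us-west"))     # 1: leader is already local
    print(pick_read_replica(1, replicas, "us-central"))  # 1: no local replica
```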
41. 41
A Single Multi Region Kafka Cluster (MRC)
[Diagram: a single Kafka cluster (Confluent Platform) spanning us-west-1, us-central-1 and us-east-1, with brokers 1-6, three ZooKeeper nodes (ZK1-ZK3) including one in a “tie-breaker” datacenter, and observer replicas at the failover site. On a site failure, clients fail over automatically.]
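# Sketch from the slide: the region-aware --config keys illustrate the design
# and may differ from the released Confluent Platform syntax.
$ bin/kafka-topics.sh --bootstrap-server localhost:9092 \
    --create \
    --topic trades-west \
    --partitions 3 \
    --config 'replication-factor={us-west: 2}' \
    --config min.insync.replicas=2 \
    --config 'async.replication-factor={us-east: 2}' \
    --config max.async.time.behind.min=5 \
    --config replay.truncated.messages=true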
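The command above describes the intent: two synchronous replicas in us-west and two asynchronous (observer) replicas in us-east. As far as the Confluent Platform documentation describes it, released versions express this as a replica placement JSON document that is passed to the topic tooling, with region names matching each broker's `broker.rack`. The sketch below builds such a document; the exact schema shown here is an assumption and should be verified against the documentation for your version.

```python
import json


def replica_placement(sync_region: str, sync_count: int,
                      observer_region: str, observer_count: int) -> str:
    """Build a replica placement document (schema assumed, see lead-in)."""
    placement = {
        "version": 1,
        # Synchronous replicas: can join the ISR and count towards min.insync.replicas.
        "replicas": [
            {"count": sync_count, "constraints": {"rack": sync_region}},
        ],
        # Observers: replicate asynchronously and do not join the ISR by default.
        "observers": [
            {"count": observer_count, "constraints": {"rack": observer_region}},
        ],
    }
    return json.dumps(placement, indent=2)


if __name__ == "__main__":
    print(replica_placement("us-west", 2, "us-east", 2))
```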
42. 42
Vision: One Global Kafka Cluster
Topic ‘pos_payments’
44. 44
Architecture patterns for
distributed, hybrid, edge and global
Apache Kafka deployments…
→ There is no best solution. It depends!
46. 46
Infrastructure Options
Infrastructure is your choice!
→ Bare metal vs. VM vs. container vs. cloud…
Software is your choice!
→ Open source vs. commercial vs. SaaS
Ops and management are your choice!
→ Self-Managed vs. PaaS vs. fully-managed
Integration is your choice!
→ Kafka-native vs. other tools / services
Find the right solution for your business case and for your SLAs…
47. 47
Confluent Platform
[Diagram: Confluent Platform builds on Apache Kafka and is available as a fully managed cloud service or as self-managed software (freedom of choice, hybrid infrastructure), backed by committer-led expertise (partners, training, professional services, enterprise support)]
• Unrestricted developer productivity (developer): multi-language development, rich pre-built ecosystem, SQL-based stream processing
• Efficient operations at scale (operator): GUI-driven management & monitoring, flexible DevOps automation, dynamic performance & elasticity
• Production-stage prerequisites (architect): enterprise-grade security, data compatibility, global availability
48. 48
Kafka as a Service – Fully Managed?
Infrastructure
management
(commodity)
Scaling
● Upgrades (latest stable version of Kafka)
● Patching
● Maintenance
● Sizing (retention, latency, throughput, storage, etc.)
● Data balancing for optimal performance
● Performance tuning for real-time and latency requirements
● Fixing Kafka bugs
● Uptime monitoring and proactive remediation of issues
● Recovery support from data corruption
● Scaling the cluster as needed
● Data balancing the cluster as nodes are added
● Support for any Kafka issue with less than X minutes response time
Infra-as-a-Service
Harness full power of Kafka
Kafka-specific
management
Platform-as-a-Service
Evolve as you need
Future-proof
Mission-critical reliability
Most Kafka-as-a-Service offerings are partially-managed
Kafka as a Service should be a serverless experience with consumption-based pricing!
49. 49
Event Streaming Maturity Model
[Chart: value over investment & time, in five stages]
1) Initial Awareness / Pilot (1 Kafka Cluster)
2) Start to Build Pipeline / Deliver 1 New Outcome (1 Kafka Cluster)
3) Mission-Critical Deployment (Stretched, Hybrid, Multi-Region)
4) Build Contextual Event-Driven Apps (Stretched, Hybrid, Multi-Region)
5) Central Nervous System (Global Kafka)
Product, Support, Training, Partners, Technical Account Management...
50. 50
End-to-End Integration with 24/7 Uptime and Zero Data Loss?
… more components, clusters and technologies mean more risks, conflicts, incompatibilities and operations burden!
ETL MQ
Storage Streaming
Messaging: Kafka Core
Storage: Kafka Core
Caching: Kafka Core
Real-Time, Batch: Kafka Clients
Integration: Kafka Connect
Stream Processing: Kafka Streams / KSQL
Request-Response: REST Proxy
Replication between Kafka Clusters
(Edge, On Premises, Hybrid, Cloud)
Multi-Region Kafka Cluster
“Eat your own dog food”
vs.
http://www.kai-waehner.de/blog/2019/03/07/apache-kafka-middleware-mq-etl-esb-comparison/