This document gives an overview of Kafka: its problem statement, use cases, key terminology, architecture, and components. A topic is a stream of data split into partitions, and each message has an offset that is unique within its partition. Producers write data to brokers, and partitions are replicated across brokers for fault tolerance. Consumers, organized into consumer groups, read data from partitions. ZooKeeper manages the metadata. In the closing analogy, ZooKeeper is the project manager, brokers are the developers, topics are modules, and partitions are tasks.
2. Problem statement
[Diagram: three source systems and three target systems]
• Integration between many source and target systems
• High velocity streams
3. Use cases
[Diagram: three source systems and three target systems, with Kafka between them]
• Tracking user activity
• Log aggregation
• Decoupling systems
• Stream processing
4. Kafka
[Diagram: three producers and three consumers, with Kafka between them]
• Scales to 100s of nodes
• Handles millions of messages per second
• Real-time processing (~10 ms latency)
Kafka is a horizontally scalable, fault-tolerant, and fast messaging system.
5. Topic
• Stream of data
• Similar to a table in a NoSQL database
• Split into partitions
• Data is retrieved by offset
• Offsets are unique within each partition of a topic
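The topic model above can be sketched in a few lines of Python. This is only an in-memory illustration, not Kafka client code: the `Topic` and `Partition` classes and the topic name are hypothetical, and offsets start at 0 here because the list index is used as the offset.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Append-only log; the list index doubles as the offset."""
    records: list = field(default_factory=list)

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to the new record

    def read(self, offset):
        return self.records[offset]


@dataclass
class Topic:
    """A named stream split into independent partitions (hypothetical model)."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [Partition() for _ in range(self.num_partitions)]


topic = Topic("topic-a", num_partitions=3)
first = topic.partitions[0].append("login")    # first record of partition 0
second = topic.partitions[0].append("click")   # next offset in partition 0
other = topic.partitions[1].append("logout")   # partition 1 counts on its own
```

Each partition numbers its records independently, so the same offset value can appear in several partitions of one topic, and a record is identified by the pair (partition, offset).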
7. Partition
[Diagram: Kafka cluster of Broker 1, Broker 2, and Broker 3 hosting Topic A (Partitions 0, 1, 2) and Topic B (Partitions 0, 1)]
• Topic A – 3 Partitions
• Topic B – 2 Partitions
• Enables a topic to be distributed
• Unit of parallelism
• A topic usually has many partitions
• Order is guaranteed only within a partition
• Messages are immutable
8. Partition – offset & key
[Diagram: a producer writes to Topic A in Kafka; Partitions 0, 1, and 2 hold messages at offsets 1 to 6, 1 to 4, and 1 to 5 respectively]
• Key
Messages are written to a partition based on the key
With no key, partitions are chosen round-robin
A well-distributed key avoids hotspots.
• Offsets
Incrementing unique ID per partition
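A minimal sketch of the key rule, assuming three partitions. The function `choose_partition` is hypothetical, and `zlib.crc32` is only a stand-in hash (Kafka's default Java partitioner uses murmur2), so the partition numbers will not match a real cluster.

```python
import itertools
import zlib

NUM_PARTITIONS = 3  # assumed partition count for the illustration
_round_robin = itertools.cycle(range(NUM_PARTITIONS))


def choose_partition(key=None):
    """Pick a partition: hash of the key if present, else round-robin."""
    if key is None:
        return next(_round_robin)
    # crc32 is a stand-in; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


same_key = [choose_partition("user-42") for _ in range(3)]  # one partition, repeated
no_key = [choose_partition() for _ in range(4)]             # cycles through partitions
```

Because every message with the same key lands in the same partition, per-key ordering is preserved; the flip side is that one very frequent key sends all of its traffic to a single partition, which is the hotspot the slide warns about.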
10. ZooKeeper
[Diagram: a ZooKeeper ensemble (ZooKeeper 0: follower, ZooKeeper 1: leader, ZooKeeper 2: follower) and Brokers 0, 1, 2, 3, and 5; labels: "All metadata", "Writes"]
• Hierarchical key-value store
• Configuration, synchronization and name registry services
• Ensemble layer
• Ties things together
• Ensures high availability
• Use an odd number of nodes (decisions need a majority)
• More than 7 nodes is not recommended
• Kafka can’t work without ZooKeeper (true for the versions covered here; newer releases can run without it in KRaft mode)
• Stores metadata
• Leader and follower nodes
• All writes go through the leader node
• From Kafka 0.10, consumer offsets are not managed by ZooKeeper
• Acts like a project manager (analogy)
ZooKeeper is a centralized service for managing distributed systems.
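The advice on odd ensemble sizes follows from majority-quorum arithmetic, worked through below. The helper names `quorum_size` and `tolerated_failures` are hypothetical and exist only for this illustration.

```python
def quorum_size(nodes):
    """Smallest majority of an ensemble with `nodes` members."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """Nodes that can fail while a majority is still available."""
    return nodes - quorum_size(nodes)


# Columns: ensemble size, quorum size, failures tolerated
for n in (3, 4, 5, 6, 7):
    print(n, quorum_size(n), tolerated_failures(n))
```

An ensemble of 4 tolerates one failure, the same as an ensemble of 3, and 6 tolerates two, the same as 5. The extra even node adds coordination cost without adding fault tolerance.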
11. Broker
[Diagram: three producers and three consumers around a Kafka cluster in which Topic A is spread over two brokers: Broker 1 holds Partitions 0 and 1, Broker 2 holds Partitions 2 and 3, each partition holding a few messages]
• A single Kafka node
• Managed by ZooKeeper
• A topic is distributed across brokers according to its partitions and replication
• Acts like a developer (analogy)
12. Replication
[Diagram: Kafka cluster of Broker 1, Broker 2, and Broker 3; each partition of Topic B (Partition 0, Partition 1) has a leader replica and a follower replica on different brokers; a producer and a consumer group connect to the cluster]
• Topic B – 2 Partitions
• Replication factor of 2
• A copy of a partition on another broker
• Enables fault tolerance
• Follower replicas replicate from the leader
• Only the leader serves producers and consumers
• ISR – In-Sync Replica
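The replication bullets can be illustrated with a toy model of Topic B (2 partitions, replication factor 2, 3 brokers). This is a simplified sketch, not Kafka's actual assignment or election logic: `assign_replicas` and `fail_broker` are hypothetical helpers, and the placement rule (consecutive brokers) is an assumption.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on consecutive brokers (round-robin)."""
    assignment = {}
    for p in range(num_partitions):
        replicas = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        # The first replica starts as leader; all replicas start in sync.
        assignment[p] = {"leader": replicas[0], "isr": list(replicas)}
    return assignment


def fail_broker(assignment, broker):
    """Drop a failed broker from every ISR and promote a surviving replica."""
    for state in assignment.values():
        if broker in state["isr"]:
            state["isr"].remove(broker)
        if state["leader"] == broker:
            # Promote the first remaining in-sync replica, if any is left.
            state["leader"] = state["isr"][0] if state["isr"] else None


topic_b = assign_replicas(num_partitions=2, brokers=[1, 2, 3], replication_factor=2)
fail_broker(topic_b, 1)  # Broker 1 led partition 0; Broker 2 takes over
```

With a replication factor of 2, each partition survives the loss of one broker: the follower that was in sync becomes the new leader, and producers and consumers continue against it.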
13. Replication – IT team analogy
[Diagram: the replication diagram redrawn as a dev team: Developers 1, 2, and 3 in place of brokers; Module B with Task 0 and Task 1, each having a leader and a follower; a Manager (leader); a Task Assigner; and a Testing Team]
• Module B – 2 parallel tasks
• 1 backup resource for Module B