Apache Kafka - Messaging System 
Dmitry Tolpeko, EPAM Systems – September 2014
Kafka Overview
Kafka is a real-time, fault-tolerant, scalable messaging system.
It is a publish-subscribe system that connects various applications (producers and consumers of information) with the help of messages.
Producers and consumers are independent: messages are queued, and one producer can serve multiple consumers.
Kafka was originally developed at LinkedIn.
SECTION: Apache Kafka Concepts
Kafka Architecture
[Diagram: Producer(s) on the client side, Broker(s) on the server side, Consumer(s) on the client side, plus ZooKeeper]
• Brokers act as the server part of Kafka. Brokers are peers; there is no master broker.
• Brokers can run on multiple nodes, and you can also run multiple brokers on a single node. Each broker has its own IP address and port for client connections.
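Running several brokers on one node works as long as each broker gets its own identity, port and data directory. The sketch below generates such per-broker settings. The property names broker.id, host.name, port and log.dirs follow the 0.8-era server.properties, and 9092 is Kafka's default port; the helper functions and the directory layout are hypothetical, for illustration only.

```python
from typing import Dict, List


def broker_configs(host: str, count: int, base_port: int = 9092,
                   data_root: str = "/data") -> List[Dict[str, str]]:
    """Build one settings dict per broker sharing a single node (hypothetical helper)."""
    configs = []
    for i in range(count):
        configs.append({
            "broker.id": str(i),                        # unique id within the cluster
            "host.name": host,                          # every broker runs on the same node
            "port": str(base_port + i),                 # ...but listens on its own port
            "log.dirs": f"{data_root}/kafka-logs-{i}",  # and keeps its own data directory
        })
    return configs


def endpoints(configs: List[Dict[str, str]]) -> List[str]:
    """Client-facing host:port pairs; these must not collide."""
    return [f"{c['host.name']}:{c['port']}" for c in configs]


if __name__ == "__main__":
    cfgs = broker_configs("node1", 3)
    print(endpoints(cfgs))  # ['node1:9092', 'node1:9093', 'node1:9094']
```

Because brokers are peers, nothing in these settings marks one broker as special; only the (host, port) pair and the id have to be distinct.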
Topics
A topic is a way to handle multiple data streams (e.g., different data feeds). Each producer sends messages to a specified topic, and consumers read messages from a specified topic.
New topics can be created automatically when a message for a new topic arrives, or you can create a topic explicitly with the --create option of the kafka-topics.sh tool.
[Diagram: Producers 1-3 send messages to Topic 1 and Topic 2 on a broker; Consumers 1-2 read from them]
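To make the auto-creation rule concrete, here is a toy in-memory broker that keeps one message list per topic. It is a sketch only, not Kafka's API: ToyBroker and its methods are hypothetical names, and the auto_create flag plays the role of the real broker setting auto.create.topics.enable.

```python
from typing import Dict, List


class ToyBroker:
    """In-memory stand-in for a broker: one message list per topic (not Kafka's API)."""

    def __init__(self, auto_create: bool = True) -> None:
        self.auto_create = auto_create
        self.topics: Dict[str, List[bytes]] = {}

    def create_topic(self, name: str) -> None:
        # Explicit creation, analogous to creating the topic with --create.
        self.topics.setdefault(name, [])

    def publish(self, topic: str, message: bytes) -> None:
        if topic not in self.topics:
            if not self.auto_create:
                raise KeyError(f"unknown topic {topic!r}")
            self.topics[topic] = []  # auto-create the topic on its first message
        self.topics[topic].append(message)

    def read(self, topic: str) -> List[bytes]:
        # Consumers name the topic they want; other topics are invisible to them.
        return list(self.topics.get(topic, []))
```

Each topic is an independent stream: publishing to "clicks" never affects what a reader of "logs" sees.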
Partitions
A topic can contain one or more partitions. Each partition is stored on a single server, and multiple partitions allow a topic to scale beyond the limits of a single system. Partitions also allow a single consumer to read messages concurrently in multiple threads. You can add new partitions dynamically.
An offset uniquely identifies a message within a partition.
[Diagram: Broker 1 hosts Topic 1 Partition 1 and Topic 2 Partition 1; Broker 2 hosts Topic 2 Partitions 2 and 3]
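A partition behaves like an append-only log in which a message's offset is its position. The sketch below models that with plain Python lists; PartitionLog and Topic are hypothetical names, not classes from any Kafka library. Note that offsets restart at 0 in every partition, which is why an offset identifies a message only within its partition.

```python
from typing import List, Tuple


class PartitionLog:
    """Append-only log for one partition; an offset is a message's position in it."""

    def __init__(self) -> None:
        self._messages: List[bytes] = []

    def append(self, message: bytes) -> int:
        offset = len(self._messages)  # next offset = current log length
        self._messages.append(message)
        return offset

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        end = min(offset + max_messages, len(self._messages))
        return [(o, self._messages[o]) for o in range(offset, end)]


class Topic:
    def __init__(self, num_partitions: int) -> None:
        self.partitions = [PartitionLog() for _ in range(num_partitions)]

    def add_partitions(self, extra: int) -> None:
        # Partitions can be added later; existing partitions keep their offsets.
        self.partitions.extend(PartitionLog() for _ in range(extra))
```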
Replication
Each partition is replicated for fault tolerance. Each partition has one server that acts as the Leader and handles all read and write requests for it.
Zero or more servers act as Followers: they replicate the leader, and if the leader fails, one of them becomes the new Leader.
The leader uses the ZooKeeper heartbeat mechanism to indicate that it is alive.
A follower acts as a normal consumer: it pulls messages from the leader and updates its own log. A message can be sent to consumers only after all in-sync followers (the ISR, in-sync replicas) have replicated it. When a follower rejoins after downtime, it can re-sync with the leader.
[Diagram: Topic 1 Partition 1 has its Leader on Broker 1 and a Follower on Broker 2]
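The rule "consumers see a message only once every ISR member has it" can be modeled in a few lines. This is a simplified sketch, not Kafka code: ReplicatedPartition is a hypothetical class, followers sync only when fetch() is called, and it does not model how Kafka drops lagging followers from the ISR. The committed() position corresponds to what Kafka calls the high watermark.

```python
from typing import Dict, List, Set


class ReplicatedPartition:
    """Toy model of one partition: a leader log plus follower logs (not Kafka code)."""

    def __init__(self, leader: str, followers: Set[str]) -> None:
        self.leader = leader
        self.logs: Dict[str, List[bytes]] = {leader: [], **{f: [] for f in followers}}
        self.isr: Set[str] = {leader, *followers}  # in-sync replicas

    def produce(self, message: bytes) -> None:
        self.logs[self.leader].append(message)  # only the leader accepts writes

    def fetch(self, follower: str) -> None:
        # A follower pulls whatever it is missing, like a normal consumer.
        own = self.logs[follower]
        own.extend(self.logs[self.leader][len(own):])

    def committed(self) -> int:
        # Visible to consumers only up to the shortest log in the ISR.
        return min(len(self.logs[r]) for r in self.isr)

    def consume(self) -> List[bytes]:
        return self.logs[self.leader][: self.committed()]

    def fail_leader(self) -> None:
        # Remove the failed leader from the ISR and promote an in-sync follower.
        self.isr.discard(self.leader)
        self.leader = sorted(self.isr)[0]
```

Because only committed messages are ever handed to consumers, promoting any ISR follower after a leader failure cannot take back a message a consumer has already seen.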
Consumer Groups
Consumers are organized into consumer groups.
For a single message to be consumed by multiple consumers, they must belong to different consumer groups.
A consumer group acts as a single logical consumer: consumers within one group read messages as if from a queue, and there is no message broadcast within the group. This helps balance load among consumers of the same type (fault tolerance, scalability).
The state of consumed messages is handled by consumers, not brokers. Consumers store this state in ZooKeeper: the offset within each partition for each consumer group, not for each consumer (!).
A consumer group name is unique within the Kafka cluster.
[Diagram: Topic 1 with Partitions 1-3, read by Group 1 (two consumers) and Group 2 (one consumer)]
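Since offsets are kept per group and partition rather than per consumer, the stored state can be pictured as a map keyed by (group, topic, partition). In 0.8-era deployments this lives in ZooKeeper under paths of the form /consumers/<group>/offsets/<topic>/<partition>; the sketch below replaces ZooKeeper with a hypothetical in-memory OffsetStore and a hypothetical poll() helper.

```python
from typing import Dict, List, Tuple

# (group, topic, partition) -> next offset to read
OffsetKey = Tuple[str, str, int]


class OffsetStore:
    """Hypothetical in-memory stand-in for the ZooKeeper offset store."""

    def __init__(self) -> None:
        self._offsets: Dict[OffsetKey, int] = {}

    def get(self, group: str, topic: str, partition: int) -> int:
        return self._offsets.get((group, topic, partition), 0)

    def commit(self, group: str, topic: str, partition: int, offset: int) -> None:
        self._offsets[(group, topic, partition)] = offset


def poll(store: OffsetStore, log: List[str], group: str, topic: str, partition: int) -> List[str]:
    """Read everything after the group's committed offset, then advance it."""
    start = store.get(group, topic, partition)
    batch = log[start:]
    store.commit(group, topic, partition, start + len(batch))
    return batch
```

Two groups each receive every message, while any consumer of the same group resumes from the group's shared position.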
Order Guarantees and Delivery Semantics
Each partition can be consumed by only one consumer within a consumer group.
Kafka provides a total order guarantee only within a partition, not across the different partitions of a topic.
If you need total order over all messages, you have to use a single partition, and in that case you can use only one consumer process per consumer group.
By default, Kafka guarantees at-least-once delivery semantics: messages are never lost but may be redelivered (keys can be used to detect duplicates). Kafka offers options to disable retries (so messages can be lost) if the application can tolerate this and needs higher performance.
Kafka retains all published messages, whether or not they have been consumed, for a configured period of time (7 days by default, set by log.retention.hours).
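With at-least-once delivery the consumer has to tolerate redelivered messages. One common approach is to skip messages whose key has already been processed. The sketch assumes each logical message carries a unique key (for example an event ID set by the producer); the function name is hypothetical, and in a real system the set of seen keys would have to be stored durably together with the processing results.

```python
from typing import Iterable, List, Set, Tuple

Message = Tuple[str, str]  # (key, payload); keys assumed unique per logical message


def process_at_least_once(messages: Iterable[Message], seen: Set[str]) -> List[str]:
    """Apply each logical message once, even if the broker redelivers it."""
    applied = []
    for key, payload in messages:
        if key in seen:  # duplicate from a retry or a replay: skip it
            continue
        seen.add(key)    # must be persisted alongside the results in practice
        applied.append(payload)
    return applied
```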
Producers
The producer decides which partition each message is published to, and it can assign a key to a message for this purpose:
• Random (the default when no partitioner class or key is specified)
• Round-robin, for load balancing
• Partition function (e.g., a hash of the message key): if the key identifies a category of messages (e.g., a source ID), all messages with the same key go to the same partition.
A producer can optionally require an acknowledgment from the broker that the message was received (written to the Leader, or replicated to all in-sync followers).
Kafka can group multiple messages into a batch and compress them.
[Diagram: Producers 1 and 2 publish to Partitions 1-3 of Topic 1]
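The three strategies in the list can be sketched as follows. The Partitioner class is hypothetical, and the CRC32 hash is a stand-in chosen for determinism: Kafka's own default partitioner uses a different hash function, so the partition numbers produced here will not match a real cluster.

```python
import itertools
import random
import zlib
from typing import Optional


class Partitioner:
    """Sketch of random, round-robin and key-hash partition selection."""

    def __init__(self, num_partitions: int, seed: Optional[int] = None) -> None:
        self.num_partitions = num_partitions
        self._rr = itertools.cycle(range(num_partitions))
        self._rng = random.Random(seed)

    def pick_random(self) -> int:
        return self._rng.randrange(self.num_partitions)

    def round_robin(self) -> int:
        return next(self._rr)

    def by_key(self, key: bytes) -> int:
        # Same key -> same partition, so per-key ordering is preserved.
        return zlib.crc32(key) % self.num_partitions
```

One design consequence of hashing modulo the partition count: adding partitions later changes which partition a given key maps to, so per-key ordering holds only while the partition count stays fixed.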
Consumers
Consumers read messages from the brokers that lead the partitions (pull model). A consumer labels itself with a consumer group name.
If a consumer group has more consumers than there are partitions, some consumers will never see a message.
If there are more partitions than consumers in a group, a consumer can get messages from multiple partitions (with no ordering guarantee across them). When you add consumers, Kafka rebalances the partitions among them.
Consumers can receive a compressed batch of messages as a single message.
[Diagram: Partitions 1-3 consumed by Group 1, which has Consumers 1-4]
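The two cases above (idle consumers, consumers owning several partitions) both follow from giving each partition to exactly one group member. The sketch uses a range-style split over sorted partitions and consumers, similar in spirit to the high-level consumer's default strategy but not its actual code; assign() is a hypothetical function.

```python
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give each partition to exactly one consumer of the group (range-style split)."""
    parts = sorted(partitions)
    members = sorted(consumers)
    per_consumer, extra = divmod(len(parts), len(members))
    result: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)  # first `extra` members get one more
        result[member] = parts[start:start + count]
        start += count
    return result
```

With three partitions and four consumers, one consumer receives an empty list; rerunning assign() after a consumer joins or leaves is a simple model of a rebalance.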
Consumer Advanced Features
There are two consumer APIs: the High Level Consumer API and the Simple Consumer API.
The High Level Consumer has an auto.commit.interval.ms option that defines how often the consumed offset is committed to ZooKeeper. If an error occurs between commits, the consumer will receive replayed messages (!)
The Simple Consumer is a low-level API that allows you to start from any offset, explicitly read messages multiple times, or ensure that a message is processed only once.
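The replay effect of periodic offset commits can be reproduced with a small simulation. To keep it deterministic, the commit interval is counted in messages instead of milliseconds, so commit_every only stands in for auto.commit.interval.ms; run_with_crash() is a hypothetical helper.

```python
from typing import List, Tuple


def run_with_crash(log: List[str], commit_every: int, crash_after: int) -> Tuple[List[str], int]:
    """Simulate periodic offset commits, a crash, and a restart.

    Returns the full processing history and how many messages were replayed.
    """
    processed: List[str] = []
    committed = 0
    # First run: process messages until the crash, committing periodically.
    for offset in range(crash_after):
        processed.append(log[offset])
        if (offset + 1) % commit_every == 0:
            committed = offset + 1
    # Restart: resume from the last committed offset, not from where we crashed.
    replayed = crash_after - committed
    processed.extend(log[committed:])
    return processed, replayed
```

For example, with commits every 4 messages and a crash after 7, messages at offsets 4 to 6 are processed twice. A shorter interval narrows the replay window but costs more commits.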
SECTION: Apache Kafka Internals
Persistence
Kafka relies heavily on the OS page cache rather than the JVM heap, even for caching messages. Data is immediately written (appended) to a file. Consumed messages are not deleted.
Data files (called logs) are stored in the directories listed in log.dirs.
Each topic partition has its own directory containing log segments (files such as 00000000000000000000.log, each named after the offset of the first message in the segment). log.segment.bytes and log.roll.hours define the segment rotation policy. The log.flush.interval.* options (log.flush.interval.messages, log.flush.interval.ms) define how often fsync is performed on the files.
All of these options can be specified either globally or per topic.
[Diagram: the broker JVM app writes through the OS page cache to /data/kafka-logs/TopicName-0/00000.log]
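Naming each segment after its first offset makes it cheap to find the file that holds a given offset: a binary search over the sorted base offsets. The sketch below shows only that lookup and the naming scheme (20-digit zero padding, one directory per topic partition); the function names are hypothetical, and Kafka additionally keeps an index file per segment, which is not modeled here.

```python
import bisect
from typing import List


def partition_dir(log_dir: str, topic: str, partition: int) -> str:
    # One directory per topic partition, e.g. /data/kafka-logs/TopicName-0
    return f"{log_dir}/{topic}-{partition}"


def segment_file_name(base_offset: int) -> str:
    # A segment is named after the offset of its first message, zero-padded to 20 digits.
    return f"{base_offset:020d}.log"


def find_segment(base_offsets: List[int], offset: int) -> str:
    """Return the segment file holding `offset`; base_offsets must be sorted ascending."""
    i = bisect.bisect_right(base_offsets, offset) - 1
    if i < 0:
        raise ValueError(f"offset {offset} precedes the oldest segment")
    return segment_file_name(base_offsets[i])
```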
Network I/O
Messages can be grouped together to minimize the number of network round-trips.
Multiple messages can also be compressed together (GZIP, Snappy), which achieves a better compression ratio and reduces the amount of data sent over the network. The producer can specify compression.codec and compressed.topics.
[Diagram: Message1, Message2 and Message3 are compressed together and sent over the network]
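Why does compressing a batch beat compressing messages one by one? Similar messages share most of their bytes, and a compressor can only exploit that redundancy when it sees them together; each separate stream also pays its own header overhead. The sketch compares both approaches with the standard-library gzip module (Snappy is not in the standard library, so it is left out). The newline framing and the sample messages are assumptions made for illustration, not Kafka's wire format.

```python
import gzip
from typing import List


def individually_compressed_size(messages: List[bytes]) -> int:
    return sum(len(gzip.compress(m)) for m in messages)


def batch_compressed_size(messages: List[bytes]) -> int:
    # Newline-joined batch as a stand-in for Kafka's message-set framing.
    return len(gzip.compress(b"\n".join(messages)))


if __name__ == "__main__":
    msgs = [b'{"user": %d, "action": "click", "page": "/home"}' % i for i in range(100)]
    print("raw bytes:", sum(len(m) for m in msgs))
    print("compressed one by one:", individually_compressed_size(msgs))
    print("compressed as a batch:", batch_compressed_size(msgs))
```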
Memory
There is no in-memory application-level cache; data lives in the OS page cache.
Kafka uses the Linux sendfile system call, which sends data directly from the page cache to a network socket, so the data does not have to be read into and written back from application memory.
Grouped messages are stored compressed in the log and are decompressed only by consumers.
[Diagram: data flows from the OS page cache directly to the network, bypassing the broker JVM app]
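The broker itself is a JVM application (in Java, FileChannel.transferTo maps to sendfile on Linux), but the same zero-copy idea can be tried from Python: socket.sendfile() uses os.sendfile() where the platform supports it and falls back to regular send() otherwise. The sketch serves a byte range of a temporary "segment" file over a local socket pair; the file name, the message contents and both helper functions are hypothetical.

```python
import os
import socket
import tempfile
from typing import Optional


def serve_segment(path: str, sock: socket.socket, offset: int = 0,
                  count: Optional[int] = None) -> int:
    """Send part of a log file to a socket; uses os.sendfile when available."""
    with open(path, "rb") as f:
        return sock.sendfile(f, offset=offset, count=count)


def demo(offset: int = 0, count: Optional[int] = None) -> bytes:
    data = b"message-1\nmessage-2\nmessage-3\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "00000000000000000000.log")
        with open(path, "wb") as f:
            f.write(data)
        reader, writer = socket.socketpair()
        with reader, writer:
            sent = serve_segment(path, writer, offset, count)
            received = b""
            while len(received) < sent:
                received += reader.recv(4096)
    return received
```

Passing an offset and count mirrors how a broker can hand a consumer an arbitrary slice of a segment without copying it through user space.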
Log Compaction
Without log compaction (time series data), every update is kept:
Key1  Key2  Key3  Key1  Key2  Key1  Key3
A     B     C     AA    BB    AAA   CC
With log compaction, only the last update is stored for each key:
Key2  Key1  Key3
BB    AAA   CC
Log compaction can be enabled per topic. It can speed up roll-forward operations and reduce storage.
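The example above can be reproduced with a simple pass that keeps only the latest record per key while preserving the original offsets and order. This is a sketch of the idea, not the broker's log cleaner (which, among other things, works segment by segment and is enabled per topic with cleanup.policy=compact); compact() is a hypothetical function.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(log: List[Record]) -> List[Record]:
    """Keep only the latest record per key, preserving original offsets and order."""
    latest: Dict[str, int] = {}
    for offset, key, _ in log:
        latest[key] = offset  # later records overwrite earlier ones
    return [rec for rec in log if latest[rec[1]] == rec[0]]


if __name__ == "__main__":
    log = [(0, "Key1", "A"), (1, "Key2", "B"), (2, "Key3", "C"), (3, "Key1", "AA"),
           (4, "Key2", "BB"), (5, "Key1", "AAA"), (6, "Key3", "CC")]
    print(compact(log))  # [(4, 'Key2', 'BB'), (5, 'Key1', 'AAA'), (6, 'Key3', 'CC')]
```

Replaying the compacted log into a dictionary (a roll-forward) yields the same final state as replaying the full log, while reading three records instead of seven.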
Kafka Use Cases
• Messaging - decouple processing or buffer messages
• Monitoring and Tracking - collect activity, clickstream and status data, and logs from various systems
• Stream Processing - aggregate, enrich, handle micro-batches, etc.
• Commit Log - facilitate replication between systems
Thanks!
Join us at https://www.linkedin.com/groups/Belarus-Hadoop-User-Group-BHUG-8104884
dmitry_tolpeko@epam.com
