SlideShare a Scribd company logo
1 of 41
Apache Kafka
Monitoring vs Alerting
Ratish Ravindran
Sr. Site Reliability Engineer - LinkedIn
Agenda
Kafka 101
History of Kafka monitoring at LinkedIn
Key Metrics to alert on and why
Future of Kafka monitoring at LinkedIn
Q&A
Kafka 101
Brokers and
Clusters
● Single Kafka Server
● Multiple brokers
Topics and
Partitions
● Categorized messages
● Broken down into partitions
Leader and
Replica
● Broker owning partition
● Broker with partition copy
Producer and
Consumer
Controller
● Broker with more responsibilities
● Partition management
• Leader election
• Leader Switch
• New topic and partition
• New broker
History of Kafka Monitoring
at LinkedIn
Common Metrics
● Host metrics
● JVM metrics
● Broker or Application metrics
Host Metrics
● Load average
● Memory usage (free, used, %)
● CPU
● Inbound/Outbound network stats
● Disk usage
● RAID stats
● Inode (free, used, %)
● IO stats
● Disk states
● TCP connections (est. count, error etc)
~ 320 host metrics
JVM Metrics
● Minor/Full GC time
● Minor/Full GC count
● Total thread count
● Used thread count
● Total heap size available
● Total heap size used
Broker Metrics
● Bytes and messages in/out rate
● Fetch/produce queue size
● Request handler idle ratio
● Request pool usage
● Request latencies – at different stages
● Log flush rate
● Produce and consume latency
2100 broker metrics and 165 graphs
Kafka at LinkedIn
Kafka at LinkedIn
● 2500+ Kafka brokers
● 1 PB In per day
● 3.9 PB Out per day
Kafka Key Metrics
Kafka as a service
● Broker/Cluster Health
● Message Delivery
● Performance
● Capacity
Metrics you need to know
Partitions that are
not fully replicated
within the cluster
Partitions
unavailable for
produce and
consume
Should always be 1
URP Offline
Partitions
Active Controller
Count of partitions
with ISR < MinISR
Under MinISR
Count
Under Replicated Partitions
● Highly discussed
● Overall cluster health
● Replication is consumer and producer
Under Replicated Partitions
Under Replicated Partitions
Under Replicated Partitions
Metrics you need to know
Partitions that are
not fully replicated
within the cluster
Partitions
unavailable for
produce and
consume
Should always be 1
URP Offline
Partitions
Active Controller
Count of partitions
with ISR < MinISR
UnderMinIsrP
artitionCount
Offline partition count
● Partition(s) unavailable
● All brokers hosting replica down OR
● unclean.leader.election.enabled=false
● Potential data loss
Metrics you need to know
Partitions that are
not fully replicated
within the cluster
Partitions
unavailable for
produce and
consume
Should always be 1
URP Offline
Partitions
Active Controller
Count of partitions
with ISR < MinISR
UnderMinIsrP
artitionCount
Active Controller Count
● What is Kafka controller ?
● Partition management
● There should only 1 controller
Active Controller Count < 1
Active Controller Count > 1
Metrics you need to know
Partitions that are
not fully replicated
within the cluster
Partitions
unavailable for
produce and
consume
Should always be 1
URP Offline
Partitions
Active Controller
Count of partitions
with ISR < MinISR
UnderMinIsrP
artitionCount
UnderMinIsrPartitionCount
● min.insync.replicas
● in-sync replicas count < min.insync.replicas
● Per-broker metrics
UnderMinIsrPartitionCount
Screen Shot 2018-06-18 at 3.12.17 PMOffline partitions URP
Under MinIsr Partition count
Operating System
and Hardware
Metrics
● Should I be worried ?
● What application is causing it ?
● Don’t alert unless:
● 100% clear signal
● 100% actionable
Capacity
Planning
● Plan in advance
● Multi-factor
● Don’t alert for capacity
Capacity
Metrics
● Request Handler Idle Ratio
● Disk Utilization
● Partition Count
● Network Utilization
Future of Kafka Monitoring at
LInkedIn
Future of Monitoring Kafka at LinkedIn
● SLO based
● Monitor and alert on SLO metrics
• Latency – Produce and consume
• Availability – Produce and consume
• Retention
Availability
Monitoring
● Measured externally
● Client focused
● github.com/linkedin/kafka-monitor
Conclusion
Monitoring is not alerting
● Collect everything
● Alert on almost nothing
● SLO based monitoring
Getting (and Giving) Help
• Community
users@kafka.apache.org
dev@kafka.apache.org
• Bugs and Work
https://issues.apache.org/jira/projects/KAFKA
• Kafka Monitor
https://github.com/linkedin/kafka-monitor
• Burrow
https://github.com/linkedin/Burrow
• Cruise Control
https://github.com/linkedin/cruise-control
• kafka-tools
https://github.com/linkedin/kafka-tools
Questions

More Related Content

What's hot

Data Loss and Duplication in Kafka
Data Loss and Duplication in KafkaData Loss and Duplication in Kafka
Data Loss and Duplication in Kafka
Jayesh Thakrar
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancing
confluent
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
PyData
 

What's hot (20)

Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
Introducing KRaft: Kafka Without Zookeeper With Colin McCabe | Current 2022
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su...
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
Everything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @TwitterEverything I Ever Learned About JVM Performance Tuning @Twitter
Everything I Ever Learned About JVM Performance Tuning @Twitter
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to Kafka
 
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron SchildkroutKafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
 
No data loss pipeline with apache kafka
No data loss pipeline with apache kafkaNo data loss pipeline with apache kafka
No data loss pipeline with apache kafka
 
Data Loss and Duplication in Kafka
Data Loss and Duplication in KafkaData Loss and Duplication in Kafka
Data Loss and Duplication in Kafka
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
Api observability
Api observability Api observability
Api observability
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
Loki - like prometheus, but for logs
Loki - like prometheus, but for logsLoki - like prometheus, but for logs
Loki - like prometheus, but for logs
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
6 Nines: How Stripe keeps Kafka highly-available across the globe with Donny ...
6 Nines: How Stripe keeps Kafka highly-available across the globe with Donny ...6 Nines: How Stripe keeps Kafka highly-available across the globe with Donny ...
6 Nines: How Stripe keeps Kafka highly-available across the globe with Donny ...
 
Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)Exploring Java Heap Dumps (Oracle Code One 2018)
Exploring Java Heap Dumps (Oracle Code One 2018)
 
Design and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative RebalancingDesign and Implementation of Incremental Cooperative Rebalancing
Design and Implementation of Incremental Cooperative Rebalancing
 
Ray distributed python framework
Ray distributed python framework Ray distributed python framework
Ray distributed python framework
 
How I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with AirflowHow I learned to time travel, or, data pipelining and scheduling with Airflow
How I learned to time travel, or, data pipelining and scheduling with Airflow
 
Designing a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd productsDesigning a complete ci cd pipeline using argo events, workflow and cd products
Designing a complete ci cd pipeline using argo events, workflow and cd products
 

Similar to Apache Kafka : Monitoring vs Alerting

Similar to Apache Kafka : Monitoring vs Alerting (20)

Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®Tips & Tricks for Apache Kafka®
Tips & Tricks for Apache Kafka®
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big Data
 
kafka
kafkakafka
kafka
 
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark StreamingIntro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
Intro to Apache Apex (next gen Hadoop) & comparison to Spark Streaming
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Netflix Data Pipeline With Kafka
Netflix Data Pipeline With KafkaNetflix Data Pipeline With Kafka
Netflix Data Pipeline With Kafka
 
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performance
 
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache ApexApache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
Apache Big Data 2016: Next Gen Big Data Analytics with Apache Apex
 
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
Learnings from the Field. Lessons from Working with Dozens of Small & Large D...
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
Architectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark StreamingArchitectual Comparison of Apache Apex and Spark Streaming
Architectual Comparison of Apache Apex and Spark Streaming
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Building zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafkaBuilding zero data loss pipelines with apache kafka
Building zero data loss pipelines with apache kafka
 
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - HackacIntro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
Intro to Apache Apex - Next Gen Native Hadoop Platform - Hackac
 
Ingestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache ApexIngestion and Dimensions Compute and Enrich using Apache Apex
Ingestion and Dimensions Compute and Enrich using Apache Apex
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
Kafka Needs no Keeper( Jason Gustafson & Colin McCabe, Confluent) Kafka Summi...
 
Introduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas WeiseIntroduction to Apache Apex by Thomas Weise
Introduction to Apache Apex by Thomas Weise
 

Recently uploaded

Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
Epec Engineered Technologies
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 

Recently uploaded (20)

data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLEGEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
GEAR TRAIN- BASIC CONCEPTS AND WORKING PRINCIPLE
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptxA CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
A CASE STUDY ON CERAMIC INDUSTRY OF BANGLADESH.pptx
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
NO1 Top No1 Amil Baba In Azad Kashmir, Kashmir Black Magic Specialist Expert ...
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best ServiceTamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
Tamil Call Girls Bhayandar WhatsApp +91-9930687706, Best Service
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 

Apache Kafka : Monitoring vs Alerting