A Bridge over Troubled Water: Implementing Exactly-Once Semantics and Escaping Kafka Rebalance Storms
Antonovsky Yulia
© 2023 Akamai
About Me
Senior Software Engineer II at Akamai Technologies since 2020
Big data engineering experience since 2016
Started my career as a student intern at SAP Labs Israel in 2007
yulia-antonovsky
Agenda
➢ Introduction
➢ CSI Ingest architecture
➢ Managing Kafka Transactions
➢ Avoiding endless Kafka rebalancing
➢ Q&A
About Akamai Technologies
Akamai Technologies is the world's largest content delivery network (CDN) provider and also offers cloud and security services.
In numbers:
● 350K servers across the world
● 8B requests per day
● ~30% of global internet traffic
We power and protect life online
About CSI Group (Cloud Security Intelligence)
Our team develops and maintains a platform designed to collect, analyze, and distill high-quality security intelligence. We handle about 10 GB/s of traffic, processing approximately 150 billion raw data events per day.
[Diagram: CSI Cluster]
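As a rough sanity check on these figures, assuming (the slide does not say so) that the ~10 GB/s is sustained around the clock, uses decimal units, and consists of the raw events themselves, the event rate and average event size work out to:

```latex
\[
\frac{150 \times 10^{9}\ \text{events/day}}{86\,400\ \text{s/day}} \approx 1.74 \times 10^{6}\ \text{events/s},
\qquad
\frac{10 \times 10^{9}\ \text{B/s}}{1.74 \times 10^{6}\ \text{events/s}} \approx 5.7 \times 10^{3}\ \text{B/event}
\]
```

That is on the order of 1.7 million events per second at roughly 6 KB each.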
CSI Ingest Architecture
Drill Down
Standard iteration flow:
1. Consume Kafka messages
2. Read files from Blob storage
3. Process the data
4. Write the results to Blob storage
5. Produce Kafka messages
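The five steps can be sketched as a single loop. This is a minimal, hypothetical Python model, not the CSI code: `InMemoryBlobStore` stands in for Blob storage, `produce` for the Kafka producer, and `process` (here: uppercasing) for the real processing logic. Writing each result to a deterministic path derived from its input is our assumption; the slide does not specify it.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryBlobStore:
    """Hypothetical stand-in for Blob storage: path -> contents."""
    objects: dict = field(default_factory=dict)

    def read(self, path: str) -> str:
        return self.objects[path]

    def write(self, path: str, data: str) -> None:
        # Writing the same path again replaces the object rather than adding a copy.
        self.objects[path] = data


def process(data: str) -> str:
    """Placeholder for the real processing step (assumed here: uppercase the text)."""
    return data.upper()


def run_iteration(messages, blob_store, produce):
    """One pass of the five-step flow; `messages` are consumed values, assumed to be blob paths."""
    for path in messages:                           # 1. consume Kafka messages
        raw = blob_store.read(path)                 # 2. read files from Blob storage
        result = process(raw)                       # 3. process the data
        result_path = f"results/{path}"             # deterministic output path (assumption)
        blob_store.write(result_path, result)       # 4. write the results to Blob storage
        produce(result_path)                        # 5. produce Kafka messages
```

Nothing in this loop is transactional: if a pod dies after step 4 but before its offsets are committed, the same messages are consumed again. The deterministic result path turns the retried Blob write into an overwrite, but duplicate Kafka output can only be prevented on the Kafka side, which is what the transaction handling in the following slides addresses.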
Guardians of the Data
Just like the Guardians of the Galaxy protect the universe, we are dedicated to protecting the accuracy of our customers' data.
How can we prevent data loss or duplication when application pods are
continuously scaled in and out to handle data traffic?
Managing Kafka Transactions
● We actively manage partition offsets to ensure that we consume data from Kafka exactly
once.
● We rely on the idempotent writes supported by Kafka's Transactional API to prevent duplicate data, even in the event of failures or retries.
● We leverage Kafka's Transactional API to write data to multiple Kafka topics, ensuring that all
writes either succeed or fail together.
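To make the "all or nothing" behavior concrete, here is a toy Python model (hypothetical classes, not the Kafka client API) of a transaction that writes to two topics and stages consumed offsets. It mirrors two real properties of Kafka transactions: records are appended to the log before the commit, and consumers configured with isolation.level=read_committed skip records from aborted transactions. It simplifies away the rest; for example, a real read_committed consumer stops at the first still-open transaction rather than skipping past it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model of transactional writes; not Kafka's actual log or protocol."""
    logs: dict = field(default_factory=dict)       # topic -> [(txn_id, value)]
    committed: set = field(default_factory=set)    # ids of committed transactions
    offsets: dict = field(default_factory=dict)    # (group, partition) -> next offset

    def read(self, topic, isolation_level="read_committed"):
        records = self.logs.get(topic, [])
        if isolation_level == "read_uncommitted":
            return [value for _, value in records]
        # read_committed: hide records whose transaction has not committed.
        return [value for txn, value in records if txn in self.committed]


class ToyTransaction:
    def __init__(self, cluster, txn_id):
        self.cluster, self.txn_id = cluster, txn_id
        self.pending_offsets = {}

    def send(self, topic, value):
        # As in Kafka, records reach the log immediately, tagged with their transaction.
        self.cluster.logs.setdefault(topic, []).append((self.txn_id, value))

    def send_offsets(self, offsets):
        # Consumed offsets are staged and applied only if the transaction commits.
        self.pending_offsets.update(offsets)

    def commit(self):
        # A single commit makes every topic write and the offsets take effect together.
        self.cluster.committed.add(self.txn_id)
        self.cluster.offsets.update(self.pending_offsets)

    def abort(self):
        # Aborted records stay in the log but are never shown to read_committed readers.
        self.pending_offsets.clear()
```

After an abort, neither output topic exposes the records to read_committed readers and the consumed offsets do not move; after a commit, both topics and the offsets change together.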
KafkaTransactionManager
● To simplify the use of Kafka transactions across all our applications, we developed a component called KafkaTransactionManager.
● It supports seamless processing of transactional data across one or more source and target topics.
● It handles the entire process, from message consumption to committing or aborting Kafka transactions.
KafkaTransactionManager API
kafkaTransactionManager.beginTransaction() starts a new transaction and resets the tracked offsets.
kafkaTransactionManager.consumeRecords(pollTimeout) executes a single poll on the subscribed Kafka topics, returns the consumed messages, and updates the tracked offsets if needed. It can be called multiple times within the same transaction to retrieve additional messages.
kafkaTransactionManager.produceRecord(topics, key, value) produces a record to one or more target topics. It can be called multiple times within the same transaction to send additional messages.
kafkaTransactionManager.commitTransaction() finalizes the current transaction: it sends the updated consumed offsets to the consumer group and commits the consumed offsets together with the produced messages on all topics. If a failure occurs, abortTransaction() must be called to roll the transaction back.
kafkaTransactionManager.abortTransaction() aborts the current transaction and resets the consumed offsets by calling seek() for every assigned TopicPartition on the Kafka consumer client. If the abort itself fails, the Kafka producer client is closed and a new one is created.
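To make the call sequence concrete, here is a minimal Python model of the API above. Everything in it is hypothetical: `FakeConsumer` and `FakeProducer` are in-memory stand-ins whose method names loosely mirror the Java client's poll/seek and beginTransaction/sendOffsetsToTransaction/commitTransaction/abortTransaction, the output topics `out-a` and `out-b` are invented, and `TransactionManagerSketch` is not Akamai's implementation, only an illustration of the behavior described on the slide.

```python
class FakeConsumer:
    """In-memory stand-in for a consumer: partition -> list of values."""

    def __init__(self, partitions, group_offsets, max_records=2):
        self.partitions = partitions
        self.position = {p: group_offsets.get(p, 0) for p in partitions}
        self.max_records = max_records

    def poll(self):
        batch = []
        for p, values in self.partitions.items():
            end = min(self.position[p] + self.max_records, len(values))
            batch += [(p, o, values[o]) for o in range(self.position[p], end)]
            self.position[p] = end
        return batch

    def seek(self, partition, offset):
        self.position[partition] = offset


class FakeProducer:
    """In-memory transactional producer; broken=True makes commit and abort fail."""

    def __init__(self, topics, group_offsets, broken=False):
        self.topics, self.group_offsets, self.broken = topics, group_offsets, broken
        self.pending, self.pending_offsets = [], {}

    def begin_transaction(self):
        self.pending, self.pending_offsets = [], {}

    def send(self, topic, key, value):
        self.pending.append((topic, key, value))

    def send_offsets_to_transaction(self, offsets):
        self.pending_offsets = dict(offsets)

    def commit_transaction(self):
        if self.broken:
            raise RuntimeError("simulated commit failure")
        # Records and consumed offsets become visible together.
        for topic, key, value in self.pending:
            self.topics.setdefault(topic, []).append((key, value))
        self.group_offsets.update(self.pending_offsets)

    def abort_transaction(self):
        if self.broken:
            raise RuntimeError("simulated abort failure")
        self.pending, self.pending_offsets = [], {}


class TransactionManagerSketch:
    """Hypothetical analogue of the KafkaTransactionManager API."""

    def __init__(self, consumer, group_offsets, new_producer):
        self.consumer, self.group_offsets = consumer, group_offsets
        self.new_producer = new_producer  # factory used to replace a failed producer
        self.producer = new_producer()
        self.offsets = {}

    def begin_transaction(self):
        self.offsets = {}  # reset the offsets tracked for this transaction
        self.producer.begin_transaction()

    def consume_records(self):
        records = self.consumer.poll()
        for partition, offset, _ in records:
            self.offsets[partition] = offset + 1  # commit the *next* offset to read
        return records

    def produce_record(self, topics, key, value):
        for topic in topics:
            self.producer.send(topic, key, value)

    def commit_transaction(self):
        self.producer.send_offsets_to_transaction(self.offsets)
        self.producer.commit_transaction()

    def abort_transaction(self):
        try:
            self.producer.abort_transaction()
        except Exception:
            # The slide closes the broken producer and creates a new one.
            self.producer = self.new_producer()
        for partition in self.consumer.partitions:  # seek back to committed offsets
            self.consumer.seek(partition, self.group_offsets.get(partition, 0))


def run_transactional_iteration(tm, process):
    """One iteration: consume, produce to two topics, then commit or abort."""
    tm.begin_transaction()
    try:
        for _, _, value in tm.consume_records():
            tm.produce_record(["out-a", "out-b"], None, process(value))
        tm.commit_transaction()
    except Exception:
        tm.abort_transaction()
```

In this model, a failed commit leaves both output topics and the group's offsets untouched and rewinds the consumer, so the retried iteration re-reads the same records and each value ends up on each output topic exactly once.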
Kafka Clients’ “Transactional” Configurations
● Kafka consumer client configurations:
○ enable.auto.commit = false
○ isolation.level = read_committed
● Kafka producer client configurations:
○ transactional.id = randomUUID()
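○ transaction.timeout.ms = depends on the application (the transaction coordinator proactively aborts a transaction that stays open longer than this)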
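The same settings can be expressed as plain key/value maps using the Java client property names. This is a sketch: bootstrap.servers and group.id are added only because any consumer needs them, and the values in the test are placeholders. A random transactional.id per producer, as on the slide, fits the consumer-group-based fencing introduced in Kafka 2.5 (KIP-447), where offsets are committed together with the consumer's group metadata; older setups needed a stable id per input partition to fence zombie producers. Note also that transaction.timeout.ms cannot exceed the broker's transaction.max.timeout.ms.

```python
import uuid


def transactional_consumer_config(bootstrap_servers: str, group_id: str) -> dict:
    """Consumer settings from the slide, as Java-client property names."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        # Offsets are committed only through the producer's transaction.
        "enable.auto.commit": "false",
        # Skip records from aborted (and still-open) transactions.
        "isolation.level": "read_committed",
    }


def transactional_producer_config(bootstrap_servers: str, transaction_timeout_ms: int) -> dict:
    """Producer settings from the slide; the timeout is application-specific."""
    return {
        "bootstrap.servers": bootstrap_servers,
        # A random id per producer instance, mirroring randomUUID() on the slide.
        "transactional.id": str(uuid.uuid4()),
        "transaction.timeout.ms": str(transaction_timeout_ms),
    }
```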
Avoiding Endless Kafka Rebalancing
Within a consumer group, Kafka moves partition ownership from one consumer to another when certain events occur. This process of changing partition ownership across the consumers is called a rebalance.
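A rebalance can be described as the difference between two assignments. The Python sketch below (hypothetical helpers, not a Kafka API) computes which partitions change owner, and contrasts what each consumer gives up when the rebalance starts under the classic eager protocol (all of its partitions) with cooperative rebalancing as provided by CooperativeStickyAssignor (only the partitions that are actually moving).

```python
def owner_map(assignment):
    """Invert {consumer: [partitions]} into {partition: consumer}."""
    return {p: c for c, parts in assignment.items() for p in parts}


def moved_partitions(old, new):
    """Partitions whose owner differs between two assignments."""
    before, after = owner_map(old), owner_map(new)
    return sorted(p for p, owner in after.items() if before.get(p) != owner)


def revoked_at_rebalance_start(old, new, protocol):
    """Partitions each consumer gives up when the rebalance begins.

    eager:       every member revokes all of its partitions up front.
    cooperative: a member revokes only the partitions that are moving away.
    """
    moved = set(moved_partitions(old, new))
    revoked = {}
    for consumer, parts in old.items():
        if protocol == "eager":
            revoked[consumer] = sorted(parts)
        else:
            revoked[consumer] = sorted(p for p in parts if p in moved)
    return revoked
```

For example, when a third consumer joins a group in which c1 owns p0 to p2 and c2 owns p3 to p5, only p2 and p5 need to move, yet the eager protocol still pauses processing on all six partitions.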
What Triggers a Rebalance?
● The partition count of a subscribed topic changes
● Consumer group properties are changed
● A consumer joins or leaves the group
Why can it rebalance forever?
● Networking issues
● System complexity
● Inappropriate configurations
● Scaling up/down, or Kubernetes moving pods
● Application/pod restarts
● Pods not starting at the same time
Kafka “Rebalance” Configurations
All of the related settings are Kafka consumer configurations.
session.timeout.ms: the maximum time the group coordinator waits for a heartbeat from a consumer before removing it from the group and triggering a rebalance.
heartbeat.interval.ms: the expected time between heartbeats sent to the group coordinator; it must be lower than session.timeout.ms.
max.poll.interval.ms: the maximum delay between invocations of poll() when using consumer group management; a consumer that exceeds it is considered failed and the group rebalances.
group.instance.id: a unique identifier provided by the end user for the consumer instance. Setting it enables static membership, so a consumer that restarts within session.timeout.ms rejoins without triggering a rebalance.
partition.assignment.strategy: a list of class names or class types, ordered by preference, of the supported partition assignment strategies that the client uses to distribute partition ownership.
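The relationships between these settings can be checked mechanically. The hypothetical helper below encodes the usual guidance from the Kafka consumer documentation: heartbeat.interval.ms no higher than one third of session.timeout.ms, max.poll.interval.ms longer than the slowest gap between poll() calls, and group.instance.id set so that restarts can rejoin without a rebalance. `worst_case_iteration_ms` is an assumed input that you would measure for your own processing loop; this is not an exhaustive validator.

```python
def check_rebalance_config(cfg, worst_case_iteration_ms):
    """Return warnings for consumer settings that tend to cause repeated rebalances."""
    warnings = []
    session = int(cfg["session.timeout.ms"])
    heartbeat = int(cfg["heartbeat.interval.ms"])
    max_poll = int(cfg["max.poll.interval.ms"])

    # Several heartbeats should fit into one session timeout.
    if heartbeat * 3 > session:
        warnings.append("heartbeat.interval.ms should be at most 1/3 of session.timeout.ms")

    # A slow iteration between poll() calls makes the consumer look dead.
    if max_poll <= worst_case_iteration_ms:
        warnings.append("max.poll.interval.ms must exceed the longest gap between poll() calls")

    # Without static membership, each restart means a leave and a rejoin rebalance.
    if not cfg.get("group.instance.id"):
        warnings.append("group.instance.id is not set; restarts will trigger rebalances")

    return warnings
```

On Kubernetes, a StatefulSet pod name such as `ingest-0` is a natural source for group.instance.id because it stays the same when the pod is restarted or rescheduled, whereas Deployment pod names change; the value must also be unique within the group.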
Partition Assignment Strategy
RangeAssignor - Assigns partitions on a per-topic basis; each consumer is assigned a contiguous range of partitions of each topic.
RoundRobinAssignor - Assigns all partitions to consumers in a round-robin fashion.
StickyAssignor - Guarantees an assignment that is as balanced as possible while preserving as many existing partition assignments as possible.
CooperativeStickyAssignor - Follows the same logic as StickyAssignor but allows cooperative rebalancing, in which consumers keep the partitions that are not being moved instead of revoking all of them. Available since Kafka 2.4.
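The difference between the two simplest strategies shows up as soon as a group consumes more than one topic. The Python functions below are simplified re-implementations for illustration only, assuming every consumer subscribes to every topic and ignoring member-id ordering details; they are not the Kafka classes themselves.

```python
def range_assign(consumers, topics):
    """Per topic, split partitions into contiguous ranges (RangeAssignor-style)."""
    result = {c: [] for c in consumers}
    members = sorted(consumers)
    for topic, count in sorted(topics.items()):
        per_consumer, extra = divmod(count, len(members))
        start = 0
        for i, c in enumerate(members):
            n = per_consumer + (1 if i < extra else 0)  # first `extra` members get one more
            result[c] += [f"{topic}-{p}" for p in range(start, start + n)]
            start += n
    return result


def round_robin_assign(consumers, topics):
    """Deal all topic-partitions across consumers in turn (RoundRobinAssignor-style)."""
    result = {c: [] for c in consumers}
    members = sorted(consumers)
    all_partitions = [f"{t}-{p}" for t, count in sorted(topics.items()) for p in range(count)]
    for i, tp in enumerate(all_partitions):
        result[members[i % len(members)]].append(tp)
    return result
```

With two topics of three partitions each and two consumers, the range-style split gives C0 four partitions and C1 two, while round-robin gives each consumer three. The sticky assignors depend on the previous assignment and are not modeled here; in the Java client a strategy is selected by class name, for example partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor.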
Kafka Rebalance Listener
ConsumerPartitionAssignor is a high-level interface that allows you to implement your own custom partition assignment strategy.
ConsumerRebalanceListener is a low-level interface that allows you to receive notifications before and after partition assignment.
● A rebalance listener can't prevent rebalancing, but it can minimize its impact
● Its callbacks are triggered only during polling
● In transactional iterations, it can save processing costs, for example by aborting an iteration whose partitions were revoked instead of finishing work that should no longer be committed
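How a rebalance listener can save processing costs in a transactional loop can be sketched as follows. This is a hypothetical Python model: the callback names mirror the Java ConsumerRebalanceListener methods, and the `abort` callable would be something like the transaction manager's abortTransaction(). If partitions consumed in the current transaction are revoked, the iteration is cancelled during the poll() that performs the rebalance, instead of running the remaining steps only to commit offsets for partitions another consumer now owns.

```python
class RevocationAwareIteration:
    """Hypothetical listener state for one transactional iteration."""

    def __init__(self, abort):
        self.abort = abort          # callback, e.g. the transaction manager's abort
        self.consumed = set()       # partitions read in the current transaction
        self.cancelled = False

    def begin(self):
        self.consumed.clear()
        self.cancelled = False

    def record_consumed(self, partitions):
        self.consumed.update(partitions)

    # Mirrors ConsumerRebalanceListener.onPartitionsRevoked; in the Java client
    # it runs on the polling thread, inside poll().
    def on_partitions_revoked(self, partitions):
        if not self.cancelled and self.consumed & set(partitions):
            self.cancelled = True
            self.abort()            # roll back now; this work should not be committed

    # Mirrors ConsumerRebalanceListener.onPartitionsAssigned.
    def on_partitions_assigned(self, partitions):
        pass                        # nothing to do in this sketch
```

The processing loop would check `cancelled` after each consume call and skip the expensive Blob and produce steps when it is set.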
Summary
★ Manage consumed offsets manually when using Kafka's transactional API.
★ Disable auto commit and use the read_committed isolation level in the consumer client config, and set transactional.id in the producer config.
★ Use ConsumerRebalanceListener to minimize the impact of Kafka rebalances.
★ Configure appropriate timeouts on the consumer client and define group.instance.id, when possible, to avoid unnecessary Kafka rebalances.
★ Choose a partition assignment strategy carefully, and experiment with different strategies to find the best fit.
Q&A
Thank you :)
Feel free to reach out to me: yulia-antonovsky
