SlideShare uma empresa Scribd logo
1 de 44
Baixar para ler offline
Nielsen Presents: Fun
with Kafka, Spark and
Offset Management
or how we stopped worrying and started to enjoy offset
management
Agenda
Agenda
• How we used to work
• What we wanted to do and why
• What we ended up doing
• Extra Benefits
• Simona
• Big Data Engineer at Nielsen Marketing Cloud
• Data lover
• Concert Goer
• Japan enthusiast
whoami
How we used to
work
• Kafka Spark Consumer
• Offsets Strategies
• Kafka Offset Manager
• TEAM A vs TEAM B Flows
Our Different Flows
TEAM RON TEAM ITAI
Micro Batch
Length
4-6 minutes 1 hour
Output to
Commit
Kafka
Offsets
Kafka
Offsets, File
names
Defining Starting
Offsets
• Kafka Offset Manager
• earliest
• latest
• XML
Defining Starting
Offsets
• Kafka Offset Manager
• earliest
• latest
• XML
Defining Starting
Offsets
• Kafka Offset Manager
• earliest
• latest
• XML
Defining Starting
Offsets
• Kafka Offset Manager
• earliest
• latest
• XML
Defining Starting
Offsets
• Kafka Offset Manager
• earliest
• latest
• XML
Defining Starting
Offsets
• Kafka Offset Manager
• earliest
• latest
• XML
kafka offset
manager
• Self made
• Offsets stored in kafka
• Everything is against brokers
What we wanted to
do and why
• JUST A SIMPLE UPGRADE
• Two main goals to achieve with our
infrastructure
• Some of the problems
we encountered
Just a Simple
Upgrade
● New Kafka Consumer API
● New consumer config
● Subscribing to Kafka
SOME OF THE PROBLEMS
WE ENCOUNTERED
Committing offsets to Kafka fails
Longer timeouts then!
Spark graceful shutdown
Spark Kafka aSync Commit
……?
what we ended up
doing
• RDS Offset Store
• Consistent Committing
• Subscribing to topics (Offsets
Strategies)
• Unified infrastructure
• Timeouts
RDS Offset Store
• Table Structure
• Constraints and upserts
• Triggers
RDS Offset Store
RDS Offset Store
RDS Offset Store
Subscribing to
Topics
Building the
Offsets Map
● What is the offsets Map[TopicPartition,Long]
● Getting the number of partitions
● Constructing the offsets map
Getting Number of
Partitions
Getting Current
Offsets
Getting Current
Offsets
Getting Current
Offsets
Getting Current
Offsets
Getting Current
Offsets
Getting Current
Offsets
Getting Current
Offsets
Getting Current
Offsets
● Each map represents a row
● Reduce rows into the offsets map
“Two Phase”
Commit
● What is a “two phase” commit?
● Why do we need it?
● How did we end up implementing it?
“Two Phase”
Commit
“Two Phase”
Commit
“Two Phase”
Commit
“Two Phase”
Commit
“Two Phase”
Commit
“Two Phase”
Commit
Extra benefits
• Restart Time
• Monitoring
• Disaster Recovery
• Unified Configurations
Committing to
Kafka
● Why?
● Two options of committing to Kafka
New Offsets
Strategies
• DataBase
• Earliest
• Latest
Working against
bd-commons
Nielsen Presents: Fun with Kafka, Spark and Offset Management

Mais conteúdo relacionado

Mais procurados

Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
DataArt
 

Mais procurados (20)

Infrastructure as code (iac) - Terraform for AWS
Infrastructure as code (iac) - Terraform for AWSInfrastructure as code (iac) - Terraform for AWS
Infrastructure as code (iac) - Terraform for AWS
 
The great migration embracing serverless first
The great migration  embracing serverless first The great migration  embracing serverless first
The great migration embracing serverless first
 
Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...
Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...
Best practices for running Windows workloads on AWS - AWS Summit Stockholm (M...
 
Serverless Computing @ x-celerate 2018
Serverless Computing @ x-celerate 2018Serverless Computing @ x-celerate 2018
Serverless Computing @ x-celerate 2018
 
Serverless Computing: Run code, not servers
Serverless Computing: Run code, not serversServerless Computing: Run code, not servers
Serverless Computing: Run code, not servers
 
Fall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using GrafanaFall in Love with Graphs and Metrics using Grafana
Fall in Love with Graphs and Metrics using Grafana
 
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
MongoDB World 2018: Solving Your Backup Needs Using MongoDB Ops Manager, Clou...
 
How to move a mission critical system to 4 AWS regions in one year?
How to move a mission critical system to 4 AWS regions in one year?How to move a mission critical system to 4 AWS regions in one year?
How to move a mission critical system to 4 AWS regions in one year?
 
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
 
What is new in pass summit 2014
What is new in pass summit 2014What is new in pass summit 2014
What is new in pass summit 2014
 
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
The Problem is Data: Gwen Shapira, Confluent, Serverless NYC 2018
 
Algolia's Fury Road to a Worldwide API
Algolia's Fury Road to a Worldwide APIAlgolia's Fury Road to a Worldwide API
Algolia's Fury Road to a Worldwide API
 
Elk meetup
Elk meetupElk meetup
Elk meetup
 
Choosing the right messaging service for your serverless app [with lumigo]
Choosing the right messaging service for your serverless app [with lumigo]Choosing the right messaging service for your serverless app [with lumigo]
Choosing the right messaging service for your serverless app [with lumigo]
 
How we use the play framework
How we use the play frameworkHow we use the play framework
How we use the play framework
 
CICD in the World of Serverless
CICD in the World of ServerlessCICD in the World of Serverless
CICD in the World of Serverless
 
Tracing Java Applications on Azure
Tracing Java Applications on AzureTracing Java Applications on Azure
Tracing Java Applications on Azure
 
FME Cloud Tips for Success
FME Cloud Tips for SuccessFME Cloud Tips for Success
FME Cloud Tips for Success
 
Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
Михаил Максимов ( Software engineer, DataArt. AWS certified Solution Architect)
 
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
Building the Serverless Container Experience: Kevin McGrath, Spotinst, Server...
 

Semelhante a Nielsen Presents: Fun with Kafka, Spark and Offset Management

2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
asya999
 

Semelhante a Nielsen Presents: Fun with Kafka, Spark and Offset Management (20)

Learn from HomeAway Hadoop Development and Operations Best Practices
Learn from HomeAway Hadoop Development and Operations Best PracticesLearn from HomeAway Hadoop Development and Operations Best Practices
Learn from HomeAway Hadoop Development and Operations Best Practices
 
Distributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola ScaleDistributed Kafka Architecture Taboola Scale
Distributed Kafka Architecture Taboola Scale
 
Stream Processing @ Lyft
Stream Processing @ LyftStream Processing @ Lyft
Stream Processing @ Lyft
 
Introduction to Kafka
Introduction to KafkaIntroduction to Kafka
Introduction to Kafka
 
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
(ARC310) Solving Amazon's Catalog Contention With Amazon Kinesis
 
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
Scylla Summit 2022: Rakuten’s Catalog Platform Migration from Cassandra to Sc...
 
Apache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectosApache spark y cómo lo usamos en nuestros proyectos
Apache spark y cómo lo usamos en nuestros proyectos
 
High Performance Object Pascal Code on Servers (at EKON 22)
High Performance Object Pascal Code on Servers (at EKON 22)High Performance Object Pascal Code on Servers (at EKON 22)
High Performance Object Pascal Code on Servers (at EKON 22)
 
Moving to microservices – a technology and organisation transformational journey
Moving to microservices – a technology and organisation transformational journeyMoving to microservices – a technology and organisation transformational journey
Moving to microservices – a technology and organisation transformational journey
 
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginners
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginnersKoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginners
KoprowskiT_SQLRelay2014#4_Caerdydd_MaintenancePlansForBeginners
 
Rubyslava + PyVo #48
Rubyslava + PyVo #48Rubyslava + PyVo #48
Rubyslava + PyVo #48
 
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraCassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
 
The Metamorphosis of Database Changes With Tim Steinbach | Current 2022
The Metamorphosis of Database Changes With Tim Steinbach | Current 2022The Metamorphosis of Database Changes With Tim Steinbach | Current 2022
The Metamorphosis of Database Changes With Tim Steinbach | Current 2022
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using Kafka
 
023 Quick Deploy GraphQL API With SST - NODES2022 AMERICAS Intermediate 9 - S...
023 Quick Deploy GraphQL API With SST - NODES2022 AMERICAS Intermediate 9 - S...023 Quick Deploy GraphQL API With SST - NODES2022 AMERICAS Intermediate 9 - S...
023 Quick Deploy GraphQL API With SST - NODES2022 AMERICAS Intermediate 9 - S...
 
Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.Modern MySQL Monitoring and Dashboards.
Modern MySQL Monitoring and Dashboards.
 
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
2013 CPM Conference, Nov 6th, NoSQL Capacity Planning
 
Inside Wordnik's Architecture
Inside Wordnik's ArchitectureInside Wordnik's Architecture
Inside Wordnik's Architecture
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
Building and Deploying Large Scale SSRS using Lessons Learned from Customer D...
 

Último

VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
dollysharma2066
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
dharasingh5698
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ankushspencer015
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 

Último (20)

Intro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdfIntro To Electric Vehicles PDF Notes.pdf
Intro To Electric Vehicles PDF Notes.pdf
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 

Nielsen Presents: Fun with Kafka, Spark and Offset Management