Contents
❏ Introduction
❏ First use case: LinkedIn
❏ Architecture
❏ Main components
❏ Kafka protocol
❏ Kafka ecosystem
❏ Kafka Connect
❏ Schema Registry
❏ Installation/Configuration
❏ Datalab Use Cases
❏ Demo
Introduction
Maslow's hierarchy of human needs
Introduction
Data hierarchy of needs (pyramid, top to bottom): Vision, Mission, Products, Data Science, Data Infrastructure, Data Access
Introduction - Data Sources
Introduction - Data Ingestion
Diagram: every data source is fed by its own ad hoc script (scripts to read data, aggregation scripts, a tweet-fetch script, a REST API script).
Introduction - Data Ingestion
Ingest: to take something in; to absorb something.
LinkedIn data pipeline problem
They had a lot of data
● User activity tracking
● Server logs and metrics
● Messaging
● Analytics
They build products on data
● Newsfeed
● Recommendation
● Search
● Metrics and monitoring
Problem
How do they integrate this variety of data and make it available to all their products?
Diagram: a Frontend Server and a Metrics Server linked by an inter-process communication channel.
LinkedIn data pipeline problem
Many publishers using direct connections
Diagram: frontend servers, a database server, a chat server and a backend server each connect directly to the consumers (Metrics Analysis, Metrics UI, Database Monitor, Active Monitoring).
LinkedIn data pipeline problem
Publish/subscribe system
Diagram: the same servers publish to a single metrics pub/sub system, and Metrics Analysis, Metrics UI, Database Monitor and Active Monitoring subscribe to it.
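LinkedIn data pipeline problem
Publish-Subscribe
Diagram: several publishers write to a topic, and several subscribers read from it.
Pattern → Protocol → Technology: publish-subscribe is the pattern. Protocols such as AMQP and MQTT implement it, and technologies in turn implement those protocols.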
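To make the pattern concrete, here is a minimal in-memory sketch. The class name `PubSub` and the topic and message strings are illustrative assumptions, not any product's API. Publishers only name a topic, and every subscriber of that topic receives a copy of each message.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class PubSub:
    """Hypothetical in-memory publish/subscribe hub.

    Publishers never call subscribers directly; they only know the topic.
    """

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver message to every subscriber of topic; return how many got it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    bus = PubSub()
    metrics_ui: List[str] = []
    metrics_analysis: List[str] = []
    bus.subscribe("metrics", metrics_ui.append)
    bus.subscribe("metrics", metrics_analysis.append)
    # Two publishers (e.g. a frontend server and a chat server) share one topic.
    bus.publish("metrics", "frontend: cpu=0.7")
    bus.publish("metrics", "chat: cpu=0.4")
    print(metrics_ui == metrics_analysis)  # True
```

Adding a new consumer only means adding a subscription. None of the publishers change, which is what removes the point-to-point connections from the earlier diagram.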
LinkedIn data pipeline problem
Multiple publish/subscribe systems
Diagram: separate metrics, logging and tracking pub/sub systems connect the servers (frontend, database, chat, backend) to their consumers (Metrics Analysis, Metrics UI, Database Monitor, Active Monitoring, Log Search, Offline processing).
LinkedIn data pipeline problem
Custom infrastructure for the data pipeline
Diagram: the separate metrics, logging and tracking pub/sub systems are replaced by a single shared pipeline. It connects the same servers (frontend, database, chat, backend) to the same consumers (Metrics Analysis, Metrics UI, Database Monitor, Log Search, Offline processing).
LinkedIn data pipeline problem - Kafka Goals
● Decouple data pipelines
● Provide persistence for message data to allow multiple consumers
● Optimize for high throughput of messages
● Allow horizontal scaling, so the system can grow as the data streams grow
Kafka Architecture - Elements
Diagram: producers (e.g. frontend servers) write to the Kafka cluster, and consumers (e.g. Metrics Analysis, Log Search) read from it.
Kafka → distributed, replicated commit log
Each broker stores partitions (Partition X / Topic Y), and each partition is a commit log.
Kafka Architecture - Broker
Diagram: a producer appends to a partition's commit log in the Kafka cluster, while Consumer 1 and Consumer 2 each read at their own offset.
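A partition behaves like an append-only log in which each consumer keeps its own read position. The sketch below is a simplified in-memory model: the `PartitionLog` and `Consumer` classes are hypothetical and are not the Kafka client API. As in Kafka, offsets start at 0.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PartitionLog:
    """Simplified model of one partition: an append-only commit log."""
    records: List[str] = field(default_factory=list)

    def append(self, value: str) -> int:
        """Append a record and return its offset (its position in the log)."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, str]]:
        """Return up to max_records (offset, value) pairs starting at offset."""
        end = min(offset + max_records, len(self.records))
        return [(o, self.records[o]) for o in range(offset, end)]


@dataclass
class Consumer:
    """A reader that tracks its own offset, so readers never interfere."""
    name: str
    offset: int = 0

    def poll(self, log: PartitionLog, max_records: int = 10) -> List[Tuple[int, str]]:
        batch = log.read(self.offset, max_records)
        if batch:
            self.offset = batch[-1][0] + 1  # next offset to read
        return batch


if __name__ == "__main__":
    log = PartitionLog()
    for value in ["a", "b", "c"]:
        log.append(value)
    c1 = Consumer("consumer-1")
    c2 = Consumer("consumer-2", offset=2)
    print(c1.poll(log))  # [(0, 'a'), (1, 'b'), (2, 'c')]
    print(c2.poll(log))  # [(2, 'c')]
```

Reading never removes anything from the log. That is why, in this model, any number of consumers can read the same data at different speeds.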
Kafka Architecture - Broker
Diagram: Broker 1, Broker 2 and Broker 3 together host Partition 0 / Topic A, Partition 1 / Topic A, Partition 0 / Topic B and Partition 0 / Topic C. Partition 0 / Topic B appears on two brokers.
Partitions are distributed across brokers and replicated.
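A simple round-robin placement illustrates how partitions and their copies can end up spread over brokers. This is a sketch under stated assumptions, not Kafka's algorithm. Kafka's own assignment also randomizes the starting broker and can take racks into account. The function name `assign_replicas` is hypothetical, and the first broker in each list plays the role of the leader.

```python
from typing import Dict, List, Tuple


def assign_replicas(
    partitions: List[Tuple[str, int]], brokers: List[int], replication_factor: int
) -> Dict[Tuple[str, int], List[int]]:
    """Place each (topic, partition) on replication_factor distinct brokers.

    Replica lists are consecutive brokers, starting one broker further along
    for each successive partition (simplified round-robin).
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    placement: Dict[Tuple[str, int], List[int]] = {}
    for i, tp in enumerate(partitions):
        placement[tp] = [brokers[(i + r) % len(brokers)] for r in range(replication_factor)]
    return placement


if __name__ == "__main__":
    parts = [("A", 0), ("A", 1), ("B", 0), ("C", 0)]
    for tp, replicas in assign_replicas(parts, [1, 2, 3], 2).items():
        print(tp, replicas)  # e.g. ('A', 0) [1, 2]
```

Because every partition has copies on two different brokers, losing any single broker still leaves one copy of each partition in this sketch.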
Kafka Architecture - Producer/Consumer
Diagram: a producer (e.g. a frontend server) writes to the Kafka cluster, and a consumer (e.g. Log Search) reads from it.
Basic concepts
● Latency
● Throughput
● Quality of service: at most once, at least once, exactly once
Use case requirements
o Quality of service vs. latency
o Throughput vs. latency
Producer/consumer technology
o Ingestion technologies
o Kafka Client API
o Kafka Connect
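Two of the quality-of-service levels follow directly from when a consumer commits its offset relative to processing. The simulation below is a toy model with hypothetical functions `run` and `deliver`. It crashes a consumer once and restarts it from the committed offset. Committing first loses the in-flight record (at most once), and processing first replays it (at least once). Exactly once needs additional support and is not modeled here; in Kafka that support comes from idempotent producers and transactions.

```python
from typing import List, Optional, Tuple


def run(log: List[str], start: int, commit_first: bool,
        crash_at: Optional[int] = None) -> Tuple[List[str], int]:
    """Process log[start:] record by record, committing each offset.

    If crash_at is given, the consumer crashes while handling that offset.
    Returns (records processed in this run, committed position).
    """
    processed: List[str] = []
    committed = start
    for offset in range(start, len(log)):
        if commit_first:
            committed = offset + 1              # commit, then process
            if offset == crash_at:
                return processed, committed     # committed but never processed
            processed.append(log[offset])
        else:
            processed.append(log[offset])       # process, then commit
            if offset == crash_at:
                return processed, committed     # processed but never committed
            committed = offset + 1
    return processed, committed


def deliver(log: List[str], commit_first: bool, crash_at: int) -> List[str]:
    """Crash once, restart from the committed position, return all processing."""
    first, committed = run(log, 0, commit_first, crash_at)
    second, _ = run(log, committed, commit_first)
    return first + second


if __name__ == "__main__":
    messages = ["m0", "m1", "m2"]
    print(deliver(messages, commit_first=True, crash_at=1))   # ['m0', 'm2']: at most once
    print(deliver(messages, commit_first=False, crash_at=1))  # ['m0', 'm1', 'm1', 'm2']: at least once
```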
Kafka Architecture - Producer/Consumer API
Diagram: the producer (Frontend Server) and the consumer (Log Search) talk to the Kafka cluster through the Kafka client API.
Kafka Protocol
Kafka Producer
Kafka Protocol - Producer
Producer Record: Topic, [Partition], [Key], Value (fields in brackets are optional)
Producer.send(record) → Broker (Partition 0 / Topic A) → returns metadata or an exception
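The send contract on this slide can be modeled as follows. A record names a topic, optionally a partition and a key, and carries a value. Sending it either returns metadata (topic, partition, offset) or raises an exception. Everything here is hypothetical: `InMemoryBroker` stands in for a real broker, and `UnknownTopicOrPartition` for the broker's error. Records without a partition simply go to partition 0, because choosing a partition is the partitioner's job and that step is left out of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ProducerRecord:
    topic: str
    value: bytes
    partition: Optional[int] = None  # optional, as on the slide
    key: Optional[bytes] = None      # optional, as on the slide


@dataclass(frozen=True)
class RecordMetadata:
    topic: str
    partition: int
    offset: int


class UnknownTopicOrPartition(Exception):
    """Hypothetical error raised when the target log does not exist."""


class InMemoryBroker:
    """Hypothetical broker stand-in: one list (log) per (topic, partition)."""

    def __init__(self, partitions: Dict[str, int]) -> None:
        self.logs: Dict[Tuple[str, int], List[ProducerRecord]] = {
            (t, p): [] for t, n in partitions.items() for p in range(n)
        }

    def send(self, record: ProducerRecord) -> RecordMetadata:
        partition = 0 if record.partition is None else record.partition
        log = self.logs.get((record.topic, partition))
        if log is None:
            raise UnknownTopicOrPartition(f"{record.topic}/{partition}")
        log.append(record)
        return RecordMetadata(record.topic, partition, len(log) - 1)


if __name__ == "__main__":
    broker = InMemoryBroker({"A": 1})
    print(broker.send(ProducerRecord("A", b"hello")))
    # RecordMetadata(topic='A', partition=0, offset=0)
```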
Kafka Protocol - Producer
Kafka Producer internals
Diagram: Producer Record {Topic, [Partition], [Key], Value} → Serializer → Partitioner → topic/partition buffer (e.g. Topic A / Partition 0 holds Batch 0 and Batch 1) → Sender Thread. The sender thread ships batches per topic and partition to the broker (Batch 0 / Topic A / Partition 0, Batch 0 / Topic B / Partition 0, Batch 0 / Topic B / Partition 1, Batch 1 / Topic B / Partition 0).
● If the broker commits the batch, the producer returns metadata: topic, partition and offset.
● If the send fails and can be retried, the producer retries it; otherwise it returns an exception.
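The serializer → partitioner → per-partition batch steps can be sketched as a small accumulator. The class `RecordAccumulator` here is a simplified model, not the client's internal class, and it rests on two labeled assumptions. First, CRC32 stands in for the murmur2 key hash used by Kafka's Java client. Second, keyless records are spread round-robin, whereas recent clients use a "sticky" strategy. Batches are also counted in records, while Kafka's `batch.size` is measured in bytes.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Tuple

TopicPartition = Tuple[str, int]


class RecordAccumulator:
    """Sketch of a producer's per-partition buffer (hypothetical class).

    Each record is serialized, assigned a partition and appended to the
    newest batch for its topic-partition; a batch is full at batch_size records.
    """

    def __init__(self, partitions_per_topic: Dict[str, int], batch_size: int = 2) -> None:
        self.partitions_per_topic = partitions_per_topic
        self.batch_size = batch_size
        self.batches: DefaultDict[TopicPartition, List[List[bytes]]] = defaultdict(list)
        self._next = 0  # round-robin counter for records without a key

    def partition_for(self, topic: str, key: Optional[bytes]) -> int:
        n = self.partitions_per_topic[topic]
        if key is None:
            self._next += 1
            return (self._next - 1) % n
        # Same key -> same partition, which keeps per-key ordering.
        return zlib.crc32(key) % n

    def append(self, topic: str, key: Optional[bytes], value: str,
               partition: Optional[int] = None) -> TopicPartition:
        data = value.encode("utf-8")              # Serializer step
        if partition is None:                     # Partitioner step
            partition = self.partition_for(topic, key)
        tp = (topic, partition)
        batches = self.batches[tp]
        if not batches or len(batches[-1]) >= self.batch_size:
            batches.append([])                    # open Batch 0, Batch 1, ...
        batches[-1].append(data)
        return tp

    def drain(self) -> Dict[TopicPartition, List[List[bytes]]]:
        """Hand every batch to the sender thread, grouped by topic-partition."""
        ready = dict(self.batches)
        self.batches.clear()
        return ready


if __name__ == "__main__":
    acc = RecordAccumulator({"A": 1, "B": 2}, batch_size=2)
    for v in ["a1", "a2", "a3"]:
        acc.append("A", None, v)
    print(acc.batches[("A", 0)])  # [[b'a1', b'a2'], [b'a3']]
```

Batching per topic-partition is what lets one request carry many records to the same partition. This design choice favors throughput over the latency of individual records.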
Kafka Protocol - Consumer
Diagram: a consumer reads Partition 0 on the broker; the partition holds offsets 1 to 9.
● Subscribe (topic) and poll
● Read by topic, partition and offset
● Order is guaranteed only within a partition
● Data is kept only for a limited time (configurable)
● The numbers represent offsets, not messages
● Deserialize data
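The poll loop above can be sketched with a hypothetical two-partition topic held in memory. The consumer keeps one offset per partition, reads from each partition, deserializes the JSON bytes and advances its offsets. The topic layout, the record fields and the `poll` function are assumptions for illustration only. The run shows that records from different partitions interleave, while within one partition the order is preserved.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical topic with two partitions; each partition is an ordered log of
# JSON-encoded records (a stand-in for serialized bytes on the broker).
TOPIC: Dict[int, List[bytes]] = {
    0: [json.dumps({"user": "ana", "seq": i}).encode("utf-8") for i in range(3)],
    1: [json.dumps({"user": "bob", "seq": i}).encode("utf-8") for i in range(3)],
}


def poll(topic: Dict[int, List[bytes]], positions: Dict[int, int],
         max_per_partition: int = 1) -> List[Tuple[int, int, dict]]:
    """Read up to max_per_partition records from each partition, starting at
    the consumer's current offset, deserialize them and advance the offsets."""
    batch = []
    for partition, log in topic.items():
        start = positions[partition]
        end = min(start + max_per_partition, len(log))
        for offset in range(start, end):
            batch.append((partition, offset, json.loads(log[offset])))
        positions[partition] = max(start, end)
    return batch


if __name__ == "__main__":
    positions = {0: 0, 1: 0}
    seen = []
    while True:
        batch = poll(TOPIC, positions)
        if not batch:
            break
        seen.extend(batch)
    # Partitions interleave in `seen`, but each partition's "seq" stays ordered.
    for p in TOPIC:
        print(p, [record["seq"] for q, _, record in seen if q == p])
```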
Kafka Protocol - Consumer
Diagram: Topic A on the broker has four partitions (Partition 0: offsets 1 to 9; Partition 1: 1 to 7; Partition 2: 1 to 6; Partition 3: 1 to 8). A consumer group made up of Consumer 0, Consumer 1 and Consumer 2 reads them.
Kafka Protocol - Consumer
Consumer Group
Diagrams: how the partitions of Topic A are shared among the consumers of one group.
● 4 partitions, 4 consumers: each consumer reads exactly one partition.
● 4 partitions, 2 consumers: Consumer 0 and Consumer 1 each read two partitions.
● 2 partitions, 4 consumers: Consumer 0 reads Partition 0, Consumer 1 reads Partition 1, and Consumer 2 and Consumer 3 stay idle.
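The three cases above follow from one rule: within a group, each partition goes to exactly one consumer. The sketch below is modeled on Kafka's range strategy for a single topic. It sorts the consumers, splits the partitions into contiguous ranges, and gives the first `num_partitions % len(consumers)` consumers one extra partition. The function name `range_assign` is hypothetical, and real assignors also handle multiple topics and rebalancing.

```python
from typing import Dict, List


def range_assign(num_partitions: int, consumers: List[str]) -> Dict[str, List[int]]:
    """Assign one topic's partitions to the consumers of a group, range-style.

    Consumers beyond the partition count receive an empty list (idle).
    """
    members = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


if __name__ == "__main__":
    four = [f"Consumer {i}" for i in range(4)]
    print(range_assign(4, four))      # one partition each
    print(range_assign(4, four[:2]))  # two partitions each
    print(range_assign(2, four))      # Consumer 2 and Consumer 3 get []
```

This is why adding consumers beyond the number of partitions does not increase a group's read parallelism.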
Kafka Protocol - Consumer
Diagram: AMQP vs. Kafka. In AMQP, producers publish to an exchange on the server, which routes messages to queues (aemet, google places 1, google places 2), and consumers read from those queues. In Kafka, producers publish to topics on the broker (aemet, google places), and consumer groups read from those topics.
Kafka
● Consumes a topic
● High-throughput scenarios
AMQP
● Consumes a queue
● Low-latency scenarios
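The difference in consumption model can be sketched with two toy classes. Both have hypothetical names and are deliberately simplified: a single partition, no routing, and acknowledgement assumed to be immediate. In the queue, a message goes to one consumer and is then gone. In the topic, the message stays in the log, and each consumer group reads it at its own offset.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class Queue:
    """AMQP-style queue (simplified): each message is handed to one consumer
    and removed; acknowledgement is assumed to be immediate."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def publish(self, msg: str) -> None:
        self._messages.append(msg)

    def consume(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


class Topic:
    """Kafka-style topic (simplified, one partition): messages stay in the log
    and every consumer group keeps its own offset."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._offsets: Dict[str, int] = {}

    def publish(self, msg: str) -> None:
        self._log.append(msg)

    def consume(self, group: str) -> Optional[str]:
        offset = self._offsets.get(group, 0)
        if offset >= len(self._log):
            return None
        self._offsets[group] = offset + 1
        return self._log[offset]


if __name__ == "__main__":
    q, t = Queue(), Topic()
    q.publish("aemet-1")
    t.publish("aemet-1")
    print(q.consume(), q.consume())                        # aemet-1 None
    print(t.consume("dashboard"), t.consume("archive"))    # aemet-1 aemet-1
```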
Kafka Ecosystem
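Kafka Connect
Makes it easy to add new systems to your scalable and secure stream data pipelines.
Kafka Connect is a framework included in Apache Kafka that integrates Kafka with other systems.
Diagram: source connectors (KafkaSourceConnect) bring data from external systems into Kafka; sink connectors (KafkaSinkConnect) deliver data from Kafka to external systems.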
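A source connector moves data from an external system into Kafka, and a sink connector moves it out again. The toy model below shows the two roles around a Kafka topic represented as a plain list. All names in it are hypothetical; Kafka Connect's real interfaces are Java classes such as `SourceTask` (with `poll()`) and `SinkTask` (with `put()`), which the method names here only mirror.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FileLikeSourceTask:
    """Hypothetical source task: reads new lines from an external system
    (a list standing in for a file) and returns them as records."""
    lines: List[str]
    position: int = 0  # source offset, so a restarted task can resume

    def poll(self) -> List[str]:
        new = self.lines[self.position:]
        self.position = len(self.lines)
        return new


@dataclass
class TableSinkTask:
    """Hypothetical sink task: writes the records it receives to an external
    system (a list standing in for a database table)."""
    table: List[str] = field(default_factory=list)

    def put(self, records: List[str]) -> None:
        self.table.extend(records)


def run_round(source: FileLikeSourceTask, topic_log: List[str],
              sink: TableSinkTask, sink_offset: int) -> int:
    """One round of worker activity: source -> Kafka topic -> sink.
    Returns the sink's new consumer offset (tracked by the worker, not the task)."""
    topic_log.extend(source.poll())      # source side: produce to Kafka
    sink.put(topic_log[sink_offset:])    # sink side: consume from Kafka
    return len(topic_log)


if __name__ == "__main__":
    source, topic, sink = FileLikeSourceTask(["l1", "l2"]), [], TableSinkTask()
    offset = run_round(source, topic, sink, 0)
    source.lines.append("l3")
    offset = run_round(source, topic, sink, offset)
    print(sink.table)  # ['l1', 'l2', 'l3']
```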
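Kafka Connect
Diagram: a connector, configured through properties, is divided into tasks. Tasks run as task threads inside workers. Workers run in standalone or distributed mode and use a Kafka producer (for sources) or a Kafka consumer (for sinks).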
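The connector → tasks → workers split can be sketched in two steps. First, the connector divides its work into at most a configured number of task configurations. In Kafka Connect that limit is the `tasks.max` property, and the Java `Connector.taskConfigs(int maxTasks)` method produces the configurations. Second, the tasks are spread over the available workers. The file-based connector and both functions below are hypothetical illustrations.

```python
from typing import Dict, List


def task_configs(files: List[str], max_tasks: int) -> List[Dict[str, str]]:
    """Hypothetical file connector: split its work (a list of files) into at
    most max_tasks task configurations, assigning files round-robin.
    Assumes max_tasks >= 1."""
    n = min(max_tasks, len(files))
    groups: List[List[str]] = [[] for _ in range(n)]
    for i, name in enumerate(files):
        groups[i % n].append(name)
    return [{"files": ",".join(group)} for group in groups]


def place_tasks(num_tasks: int, workers: List[str]) -> Dict[str, List[int]]:
    """Spread task indices over the available workers: one worker in
    standalone mode, several in distributed mode."""
    placement: Dict[str, List[int]] = {w: [] for w in workers}
    for i in range(num_tasks):
        placement[workers[i % len(workers)]].append(i)
    return placement


if __name__ == "__main__":
    configs = task_configs(["a.log", "b.log", "c.log"], max_tasks=2)
    print(configs)                                              # [{'files': 'a.log,c.log'}, {'files': 'b.log'}]
    print(place_tasks(len(configs), ["worker-1"]))              # {'worker-1': [0, 1]}
    print(place_tasks(len(configs), ["worker-1", "worker-2"]))  # {'worker-1': [0], 'worker-2': [1]}
```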
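Schema Registry
Diagram (source side): the source connector's task reads streams and produces source records. The worker calls sendRecords(), and a converter turns each source record into a producer record, registering its schema with the Schema Registry and writing the returned schema id into the record.
The Schema Registry stores each schema under a subject (derived from the topic) together with a version.
Diagram (sink side): the worker reads consumer records with pollConsumer(). A converter uses each record's schema id to fetch the schema from the Schema Registry, and the resulting sink records are handed to the sink connector.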
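A converter that uses a schema registry does not embed the full schema in every record. Instead, it registers the schema once and writes only the schema's id. The sketch below models this with an in-memory registry. It follows the layout used by Confluent's serializers: a zero "magic byte", then the schema id as a 4-byte big-endian integer, then the payload. It also uses their default subject name, `<topic>-value`. The class and function names are hypothetical, and the payload encoding (Avro in that setup) is not modeled.

```python
import struct
from typing import Dict, List, Tuple

MAGIC_BYTE = 0


class SchemaRegistry:
    """In-memory stand-in for a schema registry: each distinct schema gets a
    global id, and each subject keeps a list of ids ordered by version."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}             # schema text -> id
        self._schemas: Dict[int, str] = {}         # id -> schema text
        self._subjects: Dict[str, List[int]] = {}  # subject -> ids (version = index + 1)

    def register(self, subject: str, schema: str) -> int:
        schema_id = self._ids.get(schema)
        if schema_id is None:
            schema_id = len(self._schemas) + 1
            self._ids[schema] = schema_id
            self._schemas[schema_id] = schema
        versions = self._subjects.setdefault(subject, [])
        if schema_id not in versions:
            versions.append(schema_id)
        return schema_id

    def get(self, schema_id: int) -> str:
        return self._schemas[schema_id]


def serialize(registry: SchemaRegistry, topic: str, schema: str, payload: bytes) -> bytes:
    """Producer-side converter: register under '<topic>-value', then prefix
    the payload with the magic byte and the 4-byte big-endian schema id."""
    schema_id = registry.register(f"{topic}-value", schema)
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def deserialize(registry: SchemaRegistry, data: bytes) -> Tuple[str, bytes]:
    """Consumer-side converter: read the id back and look up the schema."""
    magic, schema_id = struct.unpack(">bI", data[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("unknown magic byte")
    return registry.get(schema_id), data[5:]


if __name__ == "__main__":
    reg = SchemaRegistry()
    schema = '{"type": "string"}'
    wire = serialize(reg, "tweets", schema, b"hello")
    print(wire[:5])                  # b'\x00\x00\x00\x00\x01'
    print(deserialize(reg, wire))    # ('{"type": "string"}', b'hello')
```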
Installation
Configuration
✓ zookeeper.properties
✓ server.properties
✓ schema-registry.properties
✓ connect-distributed.properties
✓ connect-standalone.properties
✓ kafka-rest.properties
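The components are configured through Java-style `.properties` files. As a hedged illustration, this small parser handles the simple `key=value` lines those files mostly contain. It does not implement the full format: escapes, line continuations and whitespace-only separators are not supported. The sample keys are real `server.properties` settings, but the values are only example values for a single local broker.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple 'key=value' or 'key: value' lines; skip blanks and
    lines starting with '#' or '!' (comments)."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split on whichever of '=' or ':' appears first in the line.
        seps = [i for i in (line.find("="), line.find(":")) if i != -1]
        if not seps:
            props[line] = ""
            continue
        sep = min(seps)
        props[line[:sep].strip()] = line[sep + 1:].strip()
    return props


SERVER_PROPERTIES = """
# example values for a single local broker
broker.id=0
listeners=PLAINTEXT://localhost:9092
log.dirs=/tmp/kafka-logs
zookeeper.connect=localhost:2181
"""

if __name__ == "__main__":
    for key, value in parse_properties(SERVER_PROPERTIES).items():
        print(f"{key} -> {value}")
```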
Components: Zookeeper, Kafka Server, Kafka Connect, Schema Registry, Kafka REST
Demo
Diagram: data sources feed a Kafka producer, which writes to Kafka (single broker, single instance); Kafka Connect and Schema Registry run alongside.

 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 

Introduction to Apache Kafka

  • 2. Contents: ❏ Introduction ❏ First use case: LinkedIn ❏ Architecture ❏ Main components ❏ Kafka protocol ❏ Kafka ecosystem ❏ Kafka Connect ❏ Schema Registry ❏ Installation/Configuration ❏ Datalab use cases ❏ Demo
  • 5. Introduction: the data hierarchy of needs, from top to bottom: Vision/mission, Products, Data Science, Data Infrastructure, Data Access.
  • 7. Introduction - Data Ingestion: data arrives through a collection of separate scripts (scripts to read data, aggregation scripts, a tweet-fetch script, a REST API script).
  • 9. Introduction - Data Ingestion: to ingest is to take something in, or absorb something.
  • 11. LinkedIn data pipeline problem. They had a lot of data: ● User activity tracking ● Server logs and metrics ● Messaging ● Analytics. They built products on that data: ● Newsfeed ● Recommendation ● Search ● Metrics and monitoring. Problem: how to integrate this variety of data and make it available to all their products?
  • 12. LinkedIn data pipeline problem: a frontend server sends metrics to a metrics server over an inter-process communication channel.
  • 13. LinkedIn data pipeline problem: many publishers using direct connections. Frontend servers, a database server, a chat server and a backend server each connect directly to metrics analysis, the metrics UI, the database monitor and active monitoring.
  • 14. LinkedIn data pipeline problem: a publish/subscribe system. The same servers publish to a single metrics pub/sub system, which feeds metrics analysis, the metrics UI, the database monitor and active monitoring.
  • 16. LinkedIn data pipeline problem: multiple publish/subscribe systems. Separate metrics, logging and tracking pub/sub systems connect the frontend, database, chat and backend servers to metrics analysis, the metrics UI, the database monitor, active monitoring, log search and offline processing.
  • 17. LinkedIn data pipeline problem: custom infrastructure for the data pipeline. Each kind of data (metrics, logging, tracking) ends up with its own pub/sub system between the same servers and consumers (metrics analysis, metrics UI, database monitor, active monitoring, log search, offline processing).
  • 18. LinkedIn data pipeline problem - Kafka goals (producers: frontend, database, chat and backend servers; consumers: metrics analysis, metrics UI, database monitor, log search, offline processing): ● Decouple data pipelines ● Provide persistence for message data to allow multiple consumers ● Optimize for high throughput of messages ● Allow horizontal scaling of the system as the data streams grow
  • 20. Kafka Architecture - Elements: producers (e.g. frontend servers) write to a Kafka cluster and consumers (e.g. metrics analysis, log search) read from it. Kafka is a distributed, replicated commit log; the cluster is made of brokers, each hosting partitions of topics (Partition X / Topic Y).
  • 21. Kafka Architecture - Broker: the broker stores a commit log. A producer appends to it, while Consumer 1 and Consumer 2 each read at their own offset.
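To make slide 21 concrete, here is a minimal in-memory sketch of one partition's commit log, assuming a plain Python list in place of Kafka's on-disk log. `CommitLog` and `LogReader` are hypothetical names, not part of any Kafka client; the point is only that reading never removes data and that each consumer keeps its own offset.

```python
# Hypothetical sketch: an append-only log with independent reader offsets.
from dataclasses import dataclass, field


@dataclass
class CommitLog:
    """Append-only log for one partition; records are never modified in place."""
    records: list = field(default_factory=list)

    def append(self, value: str) -> int:
        """Append a record and return the offset it was written at."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        """Return up to max_records records starting at offset (non-destructive)."""
        return self.records[offset:offset + max_records]


@dataclass
class LogReader:
    """Each consumer tracks its own position; other readers are unaffected."""
    log: CommitLog
    offset: int = 0

    def poll(self, max_records: int = 10) -> list:
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance past what was returned
        return batch


log = CommitLog()
for event in ["login", "click", "logout"]:
    log.append(event)

consumer_1 = LogReader(log)
consumer_2 = LogReader(log)
print(consumer_1.poll(2))  # ['login', 'click']
print(consumer_2.poll())   # ['login', 'click', 'logout']
print(consumer_1.poll())   # ['logout']
```

Because the log keeps every record, a slow consumer or a new one simply starts from an earlier offset instead of missing data.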
  • 22. Kafka Architecture - Broker: a Kafka cluster of three brokers (Broker 1, 2 and 3). Partitions (Partition 0 / Topic A, Partition 1 / Topic A, Partition 0 / Topic B, Partition 0 / Topic C) are distributed across the brokers and replicated, so Partition 0 / Topic B appears on more than one broker.
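The "distributed, replicated" placement on slide 22 can be illustrated with a simplified round-robin assignment. This is an assumption for illustration only: Kafka's own assignment algorithm is more involved (for example, it can be rack-aware). `assign_replicas` is a hypothetical helper; the first broker listed for each partition plays the role of the preferred leader.

```python
# Hypothetical sketch: simplified round-robin replica placement (no racks).
def assign_replicas(num_partitions: int, brokers: list, replication_factor: int) -> dict:
    """Map each partition number to the list of brokers holding its replicas."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # Shift the starting broker per partition so leaders are spread out;
        # consecutive brokers hold the follower replicas.
        assignment[p] = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
    return assignment


print(assign_replicas(3, [1, 2, 3], 2))  # {0: [1, 2], 1: [2, 3], 2: [3, 1]}
```

Each partition's replicas land on distinct brokers, so losing one broker leaves at least one copy of every partition when the replication factor is 2 or more.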
  • 23. Kafka Architecture - Producer/Consumer: producers (e.g. a frontend server) and consumers (e.g. log search) around a Kafka cluster. Basic concepts: ● Latency ● Throughput ● Quality of service: at most once, at least once, exactly once. Use case requirements: o Quality of service / latency o Throughput / latency. Producer/consumer technology: o Ingestion technologies o Kafka Client API o Kafka Connect
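The quality-of-service terms on slide 23 largely come down to when a consumer commits its offset relative to processing a record. The sketch below simulates a crash while one record is being handled, then a restart from the last committed offset. `run` and `deliver` are hypothetical helpers; exactly-once is not modeled, since in Kafka it relies on additional machinery such as idempotent producers and transactions.

```python
# Hypothetical sketch: commit order decides at-most-once vs at-least-once.
def run(records, committed, commit_first, crash_at=None):
    """Consume from `committed`; return (handled records, committed offset).

    `crash_at` simulates the consumer dying while handling that offset.
    """
    handled = []
    for offset in range(committed, len(records)):
        if commit_first:
            committed = offset + 1            # at-most-once: commit first...
            if offset == crash_at:
                return handled, committed     # ...crash: record never processed
            handled.append(records[offset])
        else:
            handled.append(records[offset])   # at-least-once: process first...
            if offset == crash_at:
                return handled, committed     # ...crash: commit never happens
            committed = offset + 1
    return handled, committed


def deliver(records, commit_first, crash_at):
    """Run once with a crash, then restart from the last committed offset."""
    before_crash, committed = run(records, 0, commit_first, crash_at)
    after_restart, _ = run(records, committed, commit_first)
    return before_crash + after_restart


events = ["a", "b", "c"]
print(deliver(events, commit_first=True, crash_at=1))   # ['a', 'c']: 'b' is lost
print(deliver(events, commit_first=False, crash_at=1))  # ['a', 'b', 'b', 'c']: 'b' twice
```

At-most-once trades possible loss for no duplicates; at-least-once trades possible duplicates for no loss, which is why downstream processing is often made idempotent.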
  • 24. Kafka Architecture - Producer / Consumer
  • 25. Kafka Architecture - Producer/Consumer API: producers (e.g. a frontend server) and consumers (e.g. log search) talk to the Kafka cluster through the client API.
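  • 27. Kafka Protocol - Producer: the Kafka producer builds a producer record (topic, [partition], [key], value; bracketed fields are optional) and calls producer.send(record). The broker stores the record, e.g. in Partition 0 / Topic A, and the producer gets back either metadata or an exception.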
  • 28. Kafka Protocol - Producer: inside the producer, a producer record (topic, [partition], [key], value) passes through the serializer and the partitioner and is then added to a buffer of batches per topic/partition (e.g. Batch 0 / Topic A / Partition 0; Batch 0 and Batch 1 / Topic B / Partition 0; Batch 0 / Topic B / Partition 1). A sender thread sends the batches to Kafka. If a send fails and can be retried, it is retried; otherwise an exception is returned. On success the producer returns metadata (topic, partition, offset).
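The producer pipeline on slide 28 (serializer, partitioner, per-partition batches, sender) can be sketched in memory as follows. Everything here is hypothetical: `SketchProducer` is not the Kafka client, JSON stands in for a real serializer, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to keys. Retries and broker acknowledgements are left out, and keyless records simply rotate through partitions, which is a simplification.

```python
# Hypothetical sketch of the producer flow: serialize -> partition -> batch -> send.
import json
import zlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProducerRecord:
    topic: str
    value: dict
    key: Optional[str] = None
    partition: Optional[int] = None


class SketchProducer:
    """In-memory model of the slide's producer internals (not the real client)."""

    def __init__(self, partitions_per_topic: dict, batch_size: int = 2):
        self.partitions_per_topic = partitions_per_topic
        self.batch_size = batch_size
        self.buffer = defaultdict(list)   # (topic, partition) -> pending batch
        self.sent_batches = []            # batches handed to the "sender thread"
        self._round_robin = 0

    def _serialize(self, record):
        key = record.key.encode("utf-8") if record.key is not None else None
        value = json.dumps(record.value, sort_keys=True).encode("utf-8")
        return key, value

    def _partition(self, record, key_bytes):
        if record.partition is not None:   # an explicit partition wins
            return record.partition
        n = self.partitions_per_topic[record.topic]
        if key_bytes is not None:          # same key -> same partition
            return zlib.crc32(key_bytes) % n
        self._round_robin += 1             # no key: spread records out
        return self._round_robin % n

    def send(self, record):
        key, value = self._serialize(record)
        partition = self._partition(record, key)
        batch = self.buffer[(record.topic, partition)]
        batch.append((key, value))
        if len(batch) >= self.batch_size:  # a full batch is ready to send
            self.flush_partition(record.topic, partition)
        return record.topic, partition

    def flush_partition(self, topic, partition):
        batch = self.buffer.pop((topic, partition), [])
        if batch:
            self.sent_batches.append((topic, partition, batch))

    def flush(self):
        for topic, partition in list(self.buffer):
            self.flush_partition(topic, partition)


producer = SketchProducer({"A": 1, "B": 2})
p1 = producer.send(ProducerRecord("B", {"temp": 21}, key="station-1"))
p2 = producer.send(ProducerRecord("B", {"temp": 22}, key="station-1"))
producer.send(ProducerRecord("A", {"msg": "hi"}))
producer.flush()
print(p1 == p2)                    # True: same key, same partition
print(len(producer.sent_batches))  # 2
```

Batching per topic/partition is what lets the real producer trade a little latency for much higher throughput.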
  • 29. Kafka Protocol - Consumer: a consumer reads Partition 0 of Topic A from the broker (offsets 1 to 9). ● Subscribe (topic) & poll ● Reads topic-partition-offset ● Order is guaranteed only within a partition ● Data is kept only for a limited time (configurable) ● Numbers represent offsets, not messages ● Deserialize data
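Slide 29 notes that data is kept only for a configurable time, while offsets keep their meaning. Here is a minimal sketch of time-based retention, assuming one record per list entry; real Kafka deletes whole log segments, governed by settings such as `retention.ms`. `RetainedPartition` is a hypothetical name.

```python
# Hypothetical sketch: time-based retention that never renumbers offsets.
from dataclasses import dataclass, field


@dataclass
class RetainedPartition:
    """One partition with simplified time-based retention."""
    retention_ms: int
    start_offset: int = 0                        # offset of the oldest retained record
    entries: list = field(default_factory=list)  # (timestamp_ms, value) pairs

    def append(self, timestamp_ms: int, value: str) -> int:
        self.entries.append((timestamp_ms, value))
        return self.start_offset + len(self.entries) - 1

    def expire(self, now_ms: int) -> None:
        """Drop records older than the retention period; offsets are not reused."""
        while self.entries and now_ms - self.entries[0][0] > self.retention_ms:
            self.entries.pop(0)
            self.start_offset += 1

    def read(self, offset: int) -> list:
        if offset < self.start_offset:
            raise LookupError(f"offset {offset} deleted; oldest is {self.start_offset}")
        return [value for _, value in self.entries[offset - self.start_offset:]]


partition = RetainedPartition(retention_ms=1000)
for ts, value in [(0, "a"), (500, "b"), (1200, "c")]:
    partition.append(ts, value)
partition.expire(now_ms=1200)   # "a" is 1200 ms old, beyond the retention period
print(partition.start_offset)   # 1
print(partition.read(1))        # ['b', 'c']
```

A consumer whose saved offset has fallen below the oldest retained offset can no longer resume from it, which is why retention has to be sized against how far consumers may lag.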
  • 30. Kafka Protocol - Consumer: a consumer group (Consumer 0, 1 and 2) reads Topic A, which has four partitions (Partition 0 to 3) on the broker. Within a group each partition is read by exactly one consumer, so one of the three consumers handles two partitions.
  • 31. Kafka Protocol - Consumer: three consumer-group scenarios. With four partitions and four consumers, each consumer reads one partition. With four partitions and two consumers, each consumer reads two partitions. With two partitions and four consumers, two consumers read one partition each and the other two stay idle.
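The scenarios on slides 30 and 31 follow from one rule: within a group, each partition goes to exactly one consumer. Kafka offers several assignment strategies; the sketch below imitates the range style for a single topic (consumers sorted, partitions split into contiguous ranges, leftover partitions going to the first consumers). `range_assign` is a hypothetical helper, not the client's assignor API.

```python
# Hypothetical sketch: range-style partition assignment for one topic.
def range_assign(partitions: int, consumers: list) -> dict:
    """Return {consumer: [partition numbers]} for partitions 0..partitions-1."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    per_consumer, extra = divmod(partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        # The first `extra` consumers take one additional partition each.
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


# The three scenarios from slide 31: (partitions, consumers).
for partitions, consumers in [(4, 4), (4, 2), (2, 4)]:
    members = [f"consumer-{i}" for i in range(consumers)]
    print(partitions, consumers, range_assign(partitions, members))
```

With 2 partitions and 4 consumers, `consumer-2` and `consumer-3` receive empty lists, which is the idle-consumer case on the slide; adding consumers beyond the partition count does not add parallelism.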
  • 32. Kafka Protocol - Consumer: consumer groups, AMQP vs. Kafka. In the Kafka setup, producers write aemet and google places data to the broker and consumer groups read it. In the AMQP setup, producers publish to an exchange on the server, which routes messages to queues (aemet, google places 1, google places 2) that consumers read. ● Kafka: consumes a topic; high-throughput scenario ● AMQP: consumes a queue; low-latency scenario
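The queue-versus-topic distinction on slide 32 can be shown with two toy delivery functions. Assumptions: the AMQP side is modeled as a single work queue with round-robin dispatch, and the Kafka side as one log read by independent consumer groups. `queue_delivery` and `log_delivery` are hypothetical names.

```python
# Hypothetical sketch: queue semantics (consume and remove) vs log semantics.
from collections import deque


def queue_delivery(messages, consumers):
    """Work queue: each message goes to one consumer and leaves the queue."""
    queue = deque(messages)
    received = {c: [] for c in consumers}
    i = 0
    while queue:
        received[consumers[i % len(consumers)]].append(queue.popleft())
        i += 1
    return received, list(queue)


def log_delivery(messages, groups):
    """Log: every consumer group reads all messages; the log keeps them."""
    log = list(messages)
    offsets = {g: 0 for g in groups}   # each group tracks its own position
    received = {g: [] for g in groups}
    for g in groups:
        while offsets[g] < len(log):
            received[g].append(log[offsets[g]])
            offsets[g] += 1
    return received, log


print(queue_delivery(["m1", "m2", "m3"], ["c1", "c2"]))
print(log_delivery(["m1", "m2", "m3"], ["analytics", "archive"]))
```

In the queue model the work is split and the messages are gone afterwards; in the log model every group sees the whole stream and the data is still available for the next reader.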
  • 34. Kafka Connect: make it easy to add new systems to your scalable and secure stream data pipelines. Kafka Connect is a framework included in Apache Kafka that integrates Kafka with other systems through source connectors (external system to Kafka) and sink connectors (Kafka to external system).
  • 36. Schema Registry: on the source side, a connector's tasks run inside a worker, read streams and produce source records. sendRecords() passes them through a converter, which sends the schema to the Schema Registry and gets back a schema id, so the producer records carry only that id. The registry stores schemas per subject (topic), each with versions. On the sink side, pollConsumer() fetches consumer records, the converter sends the id to the registry to get the schema back, and the sink connector's tasks receive sink records.
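The id-for-schema exchange on slide 36 can be sketched with an in-memory registry. The 5-byte prefix (magic byte 0 followed by a 4-byte big-endian schema id) and the `<topic>-value` subject name follow the wire format used by Confluent's serializers and the default topic-based subject naming. `InMemorySchemaRegistry`, `serialize` and `deserialize` are hypothetical; the real registry is a REST service, and JSON stands in for Avro here.

```python
# Hypothetical sketch: schema ids instead of full schemas inside each message.
import json
import struct


class InMemorySchemaRegistry:
    """Stand-in registry: subject -> list of schema ids, id -> schema."""

    def __init__(self):
        self.schemas_by_id = {}
        self.versions = {}  # subject -> schema ids (version = index + 1)

    def register(self, subject, schema):
        canonical = json.dumps(schema, sort_keys=True)
        for schema_id, existing in self.schemas_by_id.items():
            if existing == canonical:   # reuse the id of an identical schema
                break
        else:
            schema_id = len(self.schemas_by_id) + 1
            self.schemas_by_id[schema_id] = canonical
        ids = self.versions.setdefault(subject, [])
        if schema_id not in ids:
            ids.append(schema_id)
        return schema_id


def serialize(registry, topic, schema, value):
    """Producer-side converter: prefix the payload with magic byte + schema id."""
    schema_id = registry.register(f"{topic}-value", schema)
    payload = json.dumps(value).encode("utf-8")   # JSON stands in for Avro
    return struct.pack(">bI", 0, schema_id) + payload


def deserialize(registry, data):
    """Consumer-side converter: look the schema up by the id in the prefix."""
    magic, schema_id = struct.unpack(">bI", data[:5])
    if magic != 0:
        raise ValueError("unknown magic byte")
    schema = json.loads(registry.schemas_by_id[schema_id])
    return schema, json.loads(data[5:].decode("utf-8"))


registry = InMemorySchemaRegistry()
schema = {"type": "record", "name": "Reading", "fields": [{"name": "temp", "type": "double"}]}
message = serialize(registry, "weather", schema, {"temp": 21.5})
print(message[:5])                        # b'\x00\x00\x00\x00\x01'
print(deserialize(registry, message)[1])  # {'temp': 21.5}
print(registry.versions)                  # {'weather-value': [1]}
```

Sending a 4-byte id instead of the full schema keeps every message small, while the registry remains the single place where schema versions are tracked.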
  • 38. Configuration: one properties file per component: ✓ zookeeper.properties (ZooKeeper) ✓ server.properties (Kafka server) ✓ schema-registry.properties (Schema Registry) ✓ connect-distributed.properties and connect-standalone.properties (Kafka Connect) ✓ kafka-rest.properties (Kafka REST)
  • 39. Demo
  • 40. Demo: sources, Kafka producer, single broker (single instance), Kafka Connect, Schema Registry