1
Streaming in Practice
Putting Apache Kafka in Production
Roger Hoover, Engineer, Confluent
2
Apache Kafka: Online Talk Series
Part 1: September 27 | Part 2: October 6 | Part 3: October 27
Part 4: November 17 | Part 5: December 1 | Part 6: December 15
• Introduction To Streaming Data and Stream Processing with Apache Kafka
• Deep Dive into Apache Kafka
• Demystifying Stream Processing with Apache Kafka
• Data Integration with Apache Kafka
• A Practical Guide to Selecting a Stream Processing Technology
https://www.confluent.io/apache-kafka-talk-series/
3
Agenda
• Kafka Basics
• Tuning Kafka For Your Application
• Data Balancing
• Spanning Multiple Datacenters
4
5
6
Architecture
[Figure: a Kafka cluster of brokers 1 through n, each hosting topic partitions; producers write to the cluster and consumers read from it, while a ZooKeeper cluster (servers 1, 2, and 3) coordinates the brokers.]
7
Operations
• Simple Deployment
• Rolling Upgrades
• Good metrics for component monitoring
8
Agenda
• Kafka Basics
• Tuning Kafka For Your Application
• Data Balancing
• Spanning Multiple Datacenters
9
Two Example Apps
• User activity tracking
  • Collect page view events while users are browsing our web and mobile storefronts
  • Persist the data to HDFS for subsequent use in a recommendation engine
• Inventory adjustments
  • Track sales, maintain inventory, and re-order on demand
10
Application Priorities
• User activity tracking
  • High throughput (100x the sales stream)
  • Availability is most important
  • Short retention required: 3 days
• Inventory adjustments
  • Relatively low throughput
  • Durability is most important
  • Long retention required: 6 months
11
Knobs
- Partition count
- Replication factor
- Retention
- Batching + compression
- Producer send acknowledgements
- Minimum ISRs
- Unclean Leader Election
12
Partition Count
- Partitions are the unit of consumer parallelism
- Over-partition your topics (especially keyed topics)
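- Easy to add consumers but hard to add partitions for keyed topics: keys are mapped to partitions by hash modulo the partition count, so adding partitions sends many keys' new messages to a different partition than their earlier ones
- A Kafka cluster can support tens of thousands of partitions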
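To see why adding partitions is hard for keyed topics, here is a minimal Python sketch of hash-modulo partitioning. It uses zlib.crc32 as a stand-in hash (an assumption; Kafka's default Java partitioner hashes the serialized key with murmur2), and the key names are hypothetical.

```python
import zlib
from typing import Iterable, List


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition as hash(key) mod partition count.

    zlib.crc32 is a stand-in hash; Kafka's default Java partitioner
    uses murmur2, but the modulo step is the same idea.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def moved_keys(keys: Iterable[str], old_count: int, new_count: int) -> List[str]:
    """Keys that land on a different partition after the topic is resized."""
    return [k for k in keys if partition_for(k, old_count) != partition_for(k, new_count)]


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(1000)]  # hypothetical keys
    moved = moved_keys(keys, old_count=6, new_count=8)
    print(f"{len(moved)} of {len(keys)} keys move when growing from 6 to 8 partitions")
```

Every key that moves now has its history split across two partitions, so per-key ordering is no longer guaranteed across the resize. Starting with more partitions than you need today avoids having to resize later.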
13
Partition Count
- High Throughput (User activity tracking)
- Large number of partitions (~100)
- Fewer Resources (Inventory adjustments)
- Smaller number of partitions (< 50)
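One commonly cited rule of thumb for picking a partition count: measure the throughput a single partition sustains on the producer side (p) and on the consumer side (c), then use max(t/p, t/c) for a target topic throughput t. The Python sketch below applies it; the throughput figures and the headroom multiplier (extra partitions for growth, in the spirit of over-partitioning) are illustrative assumptions, not measurements from this talk.

```python
import math


def partitions_needed(target_mb_s: float,
                      producer_mb_s_per_partition: float,
                      consumer_mb_s_per_partition: float,
                      headroom: float = 1.0) -> int:
    """Rule of thumb: max(t/p, t/c) partitions, scaled by a headroom factor.

    t = target topic throughput, p = measured producer throughput per partition,
    c = measured consumer throughput per partition (all in MB/s).
    """
    base = max(target_mb_s / producer_mb_s_per_partition,
               target_mb_s / consumer_mb_s_per_partition)
    return math.ceil(base * headroom)


if __name__ == "__main__":
    # Hypothetical high-throughput stream (user activity tracking)
    print(partitions_needed(500, producer_mb_s_per_partition=10, consumer_mb_s_per_partition=5))
    # Hypothetical low-throughput stream (inventory adjustments), doubled for growth
    print(partitions_needed(20, producer_mb_s_per_partition=10,
                            consumer_mb_s_per_partition=5, headroom=2.0))
```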
14
Replication Factor
- More replicas require more storage, disk I/O, and network bandwidth
- More replicas can tolerate more failures
[Figure: topic1 and topic2 with two partitions each (topic1-part1, topic1-part2, topic2-part1, topic2-part2); each partition's replicas are stored in the logs of different brokers across brokers 1 to 4.]
15
Replication Factor
- Lower cost (User activity tracking)
- replication.factor = 2
- High Fault Tolerance (Inventory adjustments)
- replication.factor = 3
- Defaults to 1
16
Retention
- Retention time can be set per topic
- Longer retention times require more storage (imagine that!)
- Longer retention allows consumers to rewind further back in time
- Part of the consumer’s SLA!
17
Retention
- Less Storage (User activity tracking)
- log.retention.hours=72 (3 days); per topic: retention.ms=259200000
- Longer Time Travel (Inventory adjustments)
- log.retention.hours=4380 (6 months); per topic: retention.ms=15768000000
- Default is 7 days (log.retention.hours=168)
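A back-of-the-envelope check on what those retention settings cost, as a Python sketch. The ingest rate is a hypothetical figure; retention.ms is the per-topic counterpart of the broker-wide log.retention.hours.

```python
HOUR_MS = 60 * 60 * 1000


def retention_ms(hours: int) -> int:
    """Convert a retention period in hours to a retention.ms topic value."""
    return hours * HOUR_MS


def storage_bytes(ingest_bytes_per_sec: int, retention_hours: int, replication_factor: int) -> int:
    """Rough disk footprint: ingest rate x retention window x replicas.

    Use the on-disk (post-compression) ingest rate. Ignores index files and the
    fact that retention deletes whole log segments, so real usage runs higher.
    """
    return ingest_bytes_per_sec * retention_hours * 3600 * replication_factor


if __name__ == "__main__":
    print(retention_ms(72))    # 3 days
    print(retention_ms(4380))  # about 6 months
    # Hypothetical 1 MB/s of page views, kept 3 days, replication.factor=2
    print(storage_bytes(1_000_000, 72, 2) / 1e9, "GB")
```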
18
Side-note: Time Travel
- Kafka 0.10.1 supports rewinding by time
- E.g. “Rewind to 10 minutes ago”
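The semantics behind "rewind to 10 minutes ago" fit in a few lines: find the earliest offset whose timestamp is at or after the target time, then seek there. In the Java client this lookup is KafkaConsumer.offsetsForTimes (added in 0.10.1) followed by seek; the Python sketch below only models the lookup over an in-memory list, and it assumes timestamps never decrease along the log, which holds for LogAppendTime but not necessarily for producer-assigned CreateTime.

```python
from bisect import bisect_left
from typing import List, Optional


def offset_for_time(timestamps: List[int], target_ms: int, base_offset: int = 0) -> Optional[int]:
    """Earliest offset whose timestamp is >= target_ms, or None if there is none.

    timestamps[i] is the timestamp (ms) of the message at offset base_offset + i,
    assumed non-decreasing.
    """
    i = bisect_left(timestamps, target_ms)
    return base_offset + i if i < len(timestamps) else None


if __name__ == "__main__":
    now_ms = 1_000_000_000  # hypothetical wall-clock time in ms
    minute = 60_000
    log = [now_ms - 15 * minute, now_ms - 12 * minute, now_ms - 8 * minute, now_ms - minute]
    # Offset to seek to for "10 minutes ago"
    print(offset_for_time(log, now_ms - 10 * minute, base_offset=100))
```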
19
Batching & Compression
- Producer: batch.size, linger.ms, compression.type
- Consumer: fetch.min.bytes, fetch.max.wait.ms
[Figure: the producer's send() calls accumulate into compressed batches 1, 2, and 3, which are flushed asynchronously to the broker; the consumer's poll() receives the same compressed batches.]
20
Batching & Compression
- High throughput (User activity tracking)
- Producer: compression.type=lz4, batch.size (256KB), linger.ms (~10ms) or flush manually
- Consumer: fetch.min.bytes (256KB), fetch.max.wait.ms (~10ms)
- Low latency (Inventory adjustments)
- Producer: linger.ms=0
- Consumer: fetch.min.bytes=1
- Defaults
- compression.type = none
- linger.ms = 0 (i.e. send immediately)
- fetch.min.bytes = 1 (i.e. receive immediately)
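A toy model of the batch.size / linger.ms trade-off, in Python. It is a deliberate simplification (an assumption, not the producer's actual batching code): one partition, no limit on in-flight requests, and a batch that is sent as soon as it is full or its linger window has passed.

```python
from typing import List, Tuple


def simulate_batches(records: List[Tuple[int, int]], batch_size: int, linger_ms: int) -> List[List[int]]:
    """Group records into batches the way a simplified producer would.

    records: (arrival_ms, size_bytes) pairs in arrival order.
    A batch closes when the next record would push it past batch_size bytes,
    or when that record arrives more than linger_ms after the batch's first record.
    Returns the batches as lists of record indexes.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    first_ms = 0
    for i, (arrival_ms, size) in enumerate(records):
        lingered_out = bool(current) and arrival_ms > first_ms + linger_ms
        would_overflow = bool(current) and current_bytes + size > batch_size
        if lingered_out or would_overflow:
            batches.append(current)
            current, current_bytes = [], 0
        if not current:
            first_ms = arrival_ms
        current.append(i)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Hypothetical 1 KB page-view events arriving every 2 ms
    events = [(t, 1024) for t in range(0, 40, 2)]
    print(len(simulate_batches(events, batch_size=256 * 1024, linger_ms=10)))
    print(len(simulate_batches(events, batch_size=256 * 1024, linger_ms=0)))
```

With these 20 hypothetical events, a 10 ms linger produces 4 batches instead of 20 requests; the cost is up to 10 ms of extra latency per record, which is why the low-latency inventory app keeps linger.ms=0.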
21
Producer Acknowledgements on Send
[Figure: the producer sends to the leader of topic1-part1 on broker 1; the followers on brokers 2 and 3 replicate the message; once it is committed the leader sends the ack, and the consumer reads committed messages.]
When producer receives ack     | Latency              | Durability on failures
acks=0 (no ack)                | no network delay     | some data loss
acks=1 (wait for leader)       | 1 network roundtrip  | little data loss
acks=all (wait for committed)  | 2 network roundtrips | no data loss
22
Producer Acknowledgements on Send
- Throughput++ (User activity tracking)
- acks = 1
- Durability++ (Inventory adjustments)
- acks = all
- Default
- acks = 1
23
In-Sync Replicas (ISRs)
[Figure: the producer sends m1 and m2 to the leader of topic1-part1 on broker 1; the followers on brokers 2 and 3 fetch both, so all three replicas are in the ISR and m2 is the last committed message.]
In sync: a follower has fetched up to the leader's log end within the last replica.lag.time.max.ms
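The in-sync rule is a time check. A minimal Python sketch, assuming we track for each follower the last time (ms) it had fetched up to the leader's log end; broker IDs, timestamps, and the 10-second lag window are hypothetical example values.

```python
from typing import Dict, List


def in_sync(now_ms: int, last_caught_up_ms: int, replica_lag_time_max_ms: int) -> bool:
    """True if the follower reached the leader's log end within the lag window."""
    return now_ms - last_caught_up_ms <= replica_lag_time_max_ms


def current_isr(leader: int, followers_caught_up_ms: Dict[int, int],
                now_ms: int, replica_lag_time_max_ms: int) -> List[int]:
    """The leader plus every follower that satisfies the in-sync rule."""
    followers = sorted(
        broker for broker, caught_up in followers_caught_up_ms.items()
        if in_sync(now_ms, caught_up, replica_lag_time_max_ms)
    )
    return [leader] + followers


if __name__ == "__main__":
    now = 100_000
    # Broker 2 caught up 2 s ago; broker 3 has been behind for 15 s.
    print(current_isr(1, {2: now - 2_000, 3: now - 15_000}, now, replica_lag_time_max_ms=10_000))
```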
24
Minimum In-Sync Replicas
[Figure: topic1-part1 replicated on a leader (broker 1) and followers (brokers 2 and 3); the followers hold m1 and m2, while later messages m3 to m5 have reached fewer replicas and the ISR has shrunk.]
- Topic config that tells Kafka how to handle writes during severe (rare) outages
- With acks=all, the leader rejects writes if the ISR has fewer than min.insync.replicas members
topic1: min.insync.replicas=2
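The leader's rule can be written as a single check. A simplified Python sketch (an illustration, not broker code), treating acks as the string values the producer config accepts; when the check fails, the Java producer receives a NotEnoughReplicas error for that write.

```python
def leader_accepts_write(acks: str, isr_size: int, min_insync_replicas: int) -> bool:
    """Simplified leader-side check for a produce request.

    min.insync.replicas is enforced only for acks=all (equivalently acks=-1);
    acks=0 and acks=1 writes are accepted as long as the leader is up.
    """
    if acks in ("all", "-1"):
        return isr_size >= min_insync_replicas
    return True


if __name__ == "__main__":
    # topic1: min.insync.replicas=2, replication.factor=3
    for isr_size in (3, 2, 1):
        print(isr_size,
              leader_accepts_write("all", isr_size, 2),
              leader_accepts_write("1", isr_size, 2))
```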
25
Minimum In-Sync Replicas
- Availability++ (User activity tracking)
- min.insync.replicas = 1
- Durability++ (Inventory adjustments)
- min.insync.replicas = 2
- Defaults to 1
26
Unclean Leader Election
- Topic config that tells Kafka how to handle topic leadership during severe (rare) outages
- Allows automatic recovery in exchange for losing data
[Figure: after the leader of topic1-part1 on broker 1 fails, a replica on broker 2 that was not in the ISR becomes leader; messages it never received are lost.]
27
Unclean Leader Election
- Availability++ (User activity tracking)
- unclean.leader.election.enable = true
- Durability++ (Inventory adjustments)
- unclean.leader.election.enable = false
- Defaults to true (in Kafka 0.10.x; from 0.11.0 the default is false)
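How the setting changes the choice of a new leader, as a simplified Python sketch. It illustrates the policy (prefer an in-sync replica, fall back to any live replica only if unclean election is enabled), not the controller's actual code; broker IDs are hypothetical.

```python
from typing import List, Optional, Set


def elect_leader(assigned_replicas: List[int], live_brokers: Set[int],
                 isr: Set[int], unclean_leader_election_enable: bool) -> Optional[int]:
    """Pick a new partition leader after the old one fails (simplified).

    Clean election: first live replica, in assignment order, that is in the ISR.
    Unclean election: if no ISR member is alive and the setting allows it,
    fall back to any live replica; messages it never received are lost.
    Returns None when the partition must stay offline.
    """
    for broker in assigned_replicas:
        if broker in live_brokers and broker in isr:
            return broker
    if unclean_leader_election_enable:
        for broker in assigned_replicas:
            if broker in live_brokers:
                return broker
    return None


if __name__ == "__main__":
    # Leader on broker 1 died while it was the only ISR member.
    print(elect_leader([1, 2, 3], live_brokers={2, 3}, isr={1},
                       unclean_leader_election_enable=False))  # partition offline
    print(elect_leader([1, 2, 3], live_brokers={2, 3}, isr={1},
                       unclean_leader_election_enable=True))   # out-of-sync broker 2 takes over
```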
28
Mission Critical Data
- Producer acknowledgments
- acks=all
- Replication factor
- replication.factor = 3
- Minimum ISRs
- min.insync.replicas = 2
- Unclean Leader Election
- unclean.leader.election.enable = false
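The four settings above can be captured and sanity-checked together. A minimal Python sketch; the dictionary layout and the checks are our own illustration, not a Kafka API. With replication.factor=3 and min.insync.replicas=2, acks=all writes keep succeeding with one replica down.

```python
from typing import Dict, List

Profile = Dict[str, Dict[str, object]]

MISSION_CRITICAL: Profile = {
    "producer": {"acks": "all"},
    # replication.factor is chosen when the topic is created; the other two are topic configs.
    "topic": {
        "replication.factor": 3,
        "min.insync.replicas": 2,
        "unclean.leader.election.enable": False,
    },
}


def writable_replica_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Replicas that can be lost while acks=all writes still succeed."""
    return max(replication_factor - min_insync_replicas, 0)


def profile_problems(profile: Profile) -> List[str]:
    """List the ways a profile falls short of the mission-critical settings."""
    producer, topic = profile["producer"], profile["topic"]
    rf = int(topic["replication.factor"])
    min_isr = int(topic["min.insync.replicas"])
    problems = []
    if producer["acks"] not in ("all", "-1"):
        problems.append("acks is not 'all', so min.insync.replicas is not enforced")
    if min_isr < 2:
        problems.append("min.insync.replicas < 2 lets a write commit with only the leader in sync")
    if min_isr >= rf:
        problems.append("min.insync.replicas >= replication.factor, so losing any replica blocks writes")
    if topic["unclean.leader.election.enable"]:
        problems.append("unclean leader election may discard committed messages")
    return problems


if __name__ == "__main__":
    print(profile_problems(MISSION_CRITICAL))
    print(writable_replica_failures(3, 2))
```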
29
Agenda
• Kafka Basics
• Tuning Kafka For Your Application
• Data Balancing
• Spanning Multiple Datacenters
30
Replica Placement
• Partitions are replicated
• Replicas are spread evenly across the cluster
• Placement is decided only when a topic is created or modified
[Figure: topic1 and topic2 with two partitions each; each partition's replicas are stored in the logs of different brokers across brokers 1 to 4.]
31
Replica Placement
• Over time broker load and storage become unbalanced
• Initial replica placement does not account for topic throughput or retention
• Adding or removing brokers
[Figure: the same topic1 and topic2 replicas across brokers 1 to 4, plus a newly added broker 5.]
32
Replica Reassignment
• Create plan to rebalance replicas
• Upload new assignment to the cluster
• Kafka migrates replicas without disruption
[Figure: Before and After replica layouts across brokers 1 to 5, with some replicas migrated between brokers to rebalance the cluster.]
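The plan in the first step is a JSON document listing each partition's new replica set. The Python sketch below renders that format; the specific moves are hypothetical, and note that Kafka numbers partitions from 0.

```python
import json
from typing import Dict, List, Tuple


def reassignment_plan(moves: Dict[Tuple[str, int], List[int]]) -> str:
    """Render moves as the JSON read by kafka-reassign-partitions.sh.

    moves maps (topic, partition) to the new ordered replica list (broker IDs);
    the first broker listed is the preferred leader.
    """
    partitions = [
        {"topic": topic, "partition": partition, "replicas": replicas}
        for (topic, partition), replicas in sorted(moves.items())
    ]
    return json.dumps({"version": 1, "partitions": partitions}, indent=2)


if __name__ == "__main__":
    # Hypothetical plan: shift one replica of two partitions onto the new broker 5.
    print(reassignment_plan({
        ("topic1", 0): [5, 2],
        ("topic2", 1): [3, 5],
    }))
```

Saved to a file, a plan like this is passed to kafka-reassign-partitions.sh with --reassignment-json-file and --execute (plus --zookeeper on 0.10.x), and running it again with --verify reports progress. Check your version's --help for the throttle option that backs the 0.10.1 replication quotas.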
33
Data Balancing: Tricky Parts
• Creating a good plan
• Balance broker disk space
• Balance broker load
• Minimize data movement
• Preserve rack placement
• Movement of replicas can overload I/O and bandwidth resources
• Use replication quota feature in 0.10.1
34
Data Balancing: Solutions
• DIY
• kafka-reassign-partitions.sh script in Apache Kafka
• Confluent Enterprise Auto Data Balancing
• Optimizes storage utilization
• Rack awareness and minimal data movement
• Leverages replication quotas during rebalance
35
Agenda
• Kafka Basics
• Tuning Kafka For Your Application
• Data Balancing
• Spanning Multiple Datacenters
36
Use cases
• Disaster Recovery
• Replicate data out to geo-localized data centers
• Aggregate data from other data centers for analysis
• Part of hybrid cloud or cloud migration strategy
37
Multi-DC: Two Approaches
• Stretched cluster
• Mirroring across clusters
38
Stretched Cluster
• Low-latency links between 3 DCs. Typically AZs in a single AWS region.
• Applications in all 3 DCs share the same cluster and handle failures automatically.
• Relies on intra-cluster replication to copy data across DCs (replication.factor >= 3)
• Use rack awareness in Kafka 0.10; manual partition placement otherwise
[Figure: a single Kafka cluster spanning AZ 1, AZ 2, and AZ 3 in one AWS region, with producers and consumers running in every AZ.]
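The goal of rack awareness in a stretched cluster is one replica of every partition per AZ, so losing a whole AZ still leaves the other replicas. Kafka 0.10 does this when each broker sets broker.rack; the Python function below is a simplified round-robin illustration of that outcome, not Kafka's actual assignment algorithm, and the broker-to-AZ map is hypothetical.

```python
from typing import Dict, List


def rack_aware_assignment(brokers_by_rack: Dict[str, List[int]],
                          num_partitions: int,
                          replication_factor: int) -> List[List[int]]:
    """Assign replicas so each partition has at most one replica per rack (AZ).

    Rotates the starting rack per partition so leaders (first replica) spread
    across AZs, and cycles through the brokers inside each rack.
    """
    racks = sorted(brokers_by_rack)
    if replication_factor > len(racks):
        raise ValueError("need at least as many racks as replicas")
    assignment = []
    for p in range(num_partitions):
        replicas = []
        for i in range(replication_factor):
            rack = racks[(p + i) % len(racks)]
            rack_brokers = brokers_by_rack[rack]
            replicas.append(rack_brokers[(p // len(racks)) % len(rack_brokers)])
        assignment.append(replicas)
    return assignment


if __name__ == "__main__":
    azs = {"az1": [1, 4], "az2": [2, 5], "az3": [3, 6]}  # hypothetical broker IDs per AZ
    for partition, replicas in enumerate(rack_aware_assignment(azs, 6, 3)):
        print(partition, replicas)
```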
39
Mirroring Across Clusters
• Separate Kafka clusters in each DC. Mirroring process copies data between them.
• Several variations of this pattern. Some require manual intervention on failover and recovery.
40
How to Mirror Across Clusters
• MirrorMaker tool in Apache Kafka
• Manual topic creation
• Manual sync of topic configuration
• Confluent Enterprise Multi-DC
• Dynamic topic creation at the destination
• Automatic sync for topic configurations (including access controls)
• Can be configured and managed from the Control Center UI
• Leverages Connect API
41
More Information: Tuning Tradeoffs
• Apache Kafka and Confluent Documentation
• When it Absolutely, Positively, Has to be There: Reliability Guarantees in Kafka
• Gwen Shapira and Jeff Holoman - https://www.confluent.io/kafka-summit-2016-ops-when-it-absolutely-positively-has-to-be-there/
• Chapter 6: Reliability Guarantees
• Neha Narkhede, Gwen Shapira, Todd Palino – Kafka: The Definitive Guide
• Confluent Operations Training
42
More Information: Multi-DC
• Building Large Scale Stream Infrastructures Across Multiple Data Centers with Apache Kafka – Jun Rao
• Video: https://www.youtube.com/watch?v=XcvHmqmh16g
• Slides: http://www.slideshare.net/HadoopSummit/building-largescale-stream-infrastructures-across-multiple-data-centers-with-apache-kafka
• Confluent Enterprise Multi-DC - https://www.confluent.io/product/multi-datacenter/
43
More Information: Metadata Management
• Yes, Virginia, You Really Do Need a Schema Registry
• Gwen Shapira - https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/
44
Thank you!
www.kafka-summit.org
• May 8, 2017: New York City, Hilton Midtown
• August 28, 2017: San Francisco, Hilton Union Square

 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 

Mais de confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Último

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...software pro Development
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfryanfarris8
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 

Último (20)

AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdfAzure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
Azure_Native_Qumulo_High_Performance_Compute_Benchmarks.pdf
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 

Streaming in Practice - Putting Apache Kafka in Production

  • 1. Streaming in Practice: Putting Apache Kafka in Production
    Roger Hoover, Engineer, Confluent
  • 2. Apache Kafka: Online Talk Series
    - Part 1 (September 27): Introduction To Streaming Data and Stream Processing with Apache Kafka
    - Part 2 (October 6): Deep Dive into Apache Kafka
    - Part 3 (October 27): Demystifying Stream Processing with Apache Kafka
    - Part 4 (November 17): Data Integration with Apache Kafka
    - Part 5 (December 1): A Practical Guide to Selecting a Stream Processing Technology
    - Part 6 (December 15): this talk
    https://www.confluent.io/apache-kafka-talk-series/
  • 3. Agenda
    - Kafka Basics
    - Tuning Kafka For Your Application
    - Data Balancing
    - Spanning Multiple Datacenters
  • 4. Agenda (as on slide 3), introducing the Kafka Basics section
  • 5. (no text on this slide)
  • 6. Architecture
    [Diagram: producers and consumers connect to a Kafka cluster of brokers 1 to n that hold topic partitions; a ZooKeeper cluster (servers 1, 2, 3) coordinates the Kafka cluster]
  • 7. Operations
    - Simple Deployment
    - Rolling Upgrades
    - Good metrics for component monitoring
  • 8. Agenda (as on slide 3), introducing the Tuning Kafka For Your Application section
  • 9. Two Example Apps
    - User activity tracking
      - Collect page view events while users are browsing our web and mobile storefronts
      - Persist the data to HDFS for subsequent use in a recommendation engine
    - Inventory adjustments
      - Track sales, maintain inventory, and re-order on demand
  • 10. Application Priorities
    - User activity tracking
      - High throughput (100x the sales stream)
      - Availability is most important
      - Low retention required: 3 days
    - Inventory adjustments
      - Relatively low throughput
      - Durability is most important
      - Long retention required: 6 months
  • 11. Knobs
    - Partition count
    - Replication factor
    - Retention
    - Batching + compression
    - Producer send acknowledgements
    - Minimum ISRs
    - Unclean Leader Election
  • 12. Partition Count
    - Partitions are the unit of consumer parallelism
    - Over-partition your topics (especially keyed topics)
    - Easy to add consumers but hard to add partitions for keyed topics
    - Kafka can support tens of thousands of partitions
  • 13. Partition Count
    - High Throughput (User activity tracking): large number of partitions (~100)
    - Fewer Resources (Inventory adjustments): smaller number of partitions (< 50)
  • 14. Replication Factor
    - More replicas require more storage, disk I/O, and network bandwidth
    - More replicas can tolerate more failures
    [Diagram: replicas of the topic1 and topic2 partitions spread across the logs of brokers 1 to 4]
  • 15. Replication Factor
    - Lower cost (User activity tracking): replication.factor = 2
    - High Fault Tolerance (Inventory adjustments): replication.factor = 3
    - Defaults to 1
  • 16. Retention
    - Retention time can be set per topic
    - Longer retention times require more storage (imagine that!)
    - Longer retention allows consumers to rewind further back in time
    - Part of the consumer’s SLA!
  • 17. Retention
    - Less Storage (User activity tracking): log.retention.hours=72 (3 days)
    - Longer Time Travel (Inventory adjustments): log.retention.hours=4380 (6 months)
    - Default is 7 days
    - log.retention.hours is the broker-wide default; the per-topic override is retention.ms
  • 18. Side-note: Time Travel
    - Kafka 0.10.1 supports rewinding by time
    - E.g. “Rewind to 10 minutes ago”
  • 19. Batching & Compression
    - Producer: batch.size, linger.ms, compression.type
    - Consumer: fetch.min.bytes, fetch.max.wait.ms
    [Diagram: the producer's send() calls are grouped and flushed asynchronously as compressed batches 1 to 3; the broker stores the batches and the consumer receives them with poll()]
  • 20. Batching & Compression
    - High throughput (User activity tracking)
      - Producer: compression.type=lz4, batch.size (256KB), linger.ms (~10ms) or flush manually
      - Consumer: fetch.min.bytes (256KB), fetch.max.wait.ms (~10ms)
    - Low latency (Inventory adjustments)
      - Producer: linger.ms=0
      - Consumer: fetch.min.bytes=1
    - Defaults
      - compression.type = none
      - linger.ms = 0 (i.e. send immediately)
      - fetch.min.bytes = 1 (i.e. receive immediately)
  • 21. Producer Acknowledgements on Send
    [Diagram: the producer sends to the topic1-part1 leader on broker 1; followers on brokers 2 and 3 replicate the message; once it is committed, the leader acks the producer and consumer 1 can read it]
    When the producer receives the ack (latency; durability on failures):
    - acks=0 (no ack): no network delay; some data loss
    - acks=1 (wait for leader): 1 network roundtrip; a little data loss
    - acks=all (wait for committed): 2 network roundtrips; no data loss
  • 22. Producer Acknowledgements on Send
    - Throughput++ (User activity tracking): acks = 1
    - Durability++ (Inventory adjustments): acks = all
    - Default: acks = 1 (changed to all in Kafka 3.0)
  • 23. In-Sync Replicas (ISRs)
    [Diagram: the producer writes m1 and m2 to the topic1-part1 leader on broker 1; the followers on brokers 2 and 3 have replicated both, so all three replicas are in the ISR and m2 is the last committed message]
    - In-sync: the replica reads up to the leader’s log end within replica.lag.time.max.ms
  • 24. Minimum In-Sync Replicas
    [Diagram: topic1-part1 replicas on brokers 1 to 3; the followers lag the leader (m1 and m2 versus m1 to m5), shrinking the ISR]
    - Topic config to tell Kafka how to handle writes during severe outages (rare)
    - Leader will reject writes if the ISR count is too small
    - topic1: min.insync.replicas=2
  • 25. Minimum In-Sync Replicas
    - Availability++ (User activity tracking): min.insync.replicas = 1
    - Durability++ (Inventory adjustments): min.insync.replicas = 2
    - Defaults to 1
  • 26. Unclean Leader Election
    - Topic config to tell Kafka how to handle topic leadership during severe outages (rare)
    - Allows automatic recovery in exchange for losing data
    [Diagram: the topic1-part1 leader on broker 1 (holding up to m5) fails while the followers on brokers 2 and 3 are out of sync; electing one of them as the new leader loses the messages it never received]
  • 27. Unclean Leader Election
    - Availability++ (User activity tracking): unclean.leader.election.enable = true
    - Durability++ (Inventory adjustments): unclean.leader.election.enable = false
    - Defaults to true (changed to false in Kafka 0.11)
  • 28. Mission Critical Data
    - Producer acknowledgments: acks=all
    - Replication factor: replication.factor = 3
    - Minimum ISRs: min.insync.replicas = 2
    - Unclean Leader Election: unclean.leader.election.enable = false
  • 29. Agenda (as on slide 3), introducing the Data Balancing section
  • 30. Replica Placement
    - Partitions are replicated
    - Replicas are spread evenly across the cluster
    - Only when the topic is created or modified
    [Diagram: topic1 and topic2 partition replicas spread evenly across brokers 1 to 4]
  • 31. Replica Placement
    - Over time, broker load and storage become unbalanced
      - Initial replica placement does not account for topic throughput or retention
      - Adding or removing brokers
    [Diagram: the same replicas on brokers 1 to 4, with a newly added broker 5 holding none]
  • 32. Replica Reassignment
    - Create plan to rebalance replicas
    - Upload new assignment to the cluster
    - Kafka migrates replicas without disruption
    [Diagram: replica placement across brokers 1 to 5 before and after reassignment; afterwards some replicas have moved to broker 5]
  • 33. Data Balancing: Tricky Parts
    - Creating a good plan
      - Balance broker disk space
      - Balance broker load
      - Minimize data movement
      - Preserve rack placement
    - Movement of replicas can overload I/O and bandwidth resources
      - Use replication quota feature in 0.10.1
  • 34. Data Balancing: Solutions
    - DIY
      - kafka-reassign-partitions.sh script in Apache Kafka
    - Confluent Enterprise Auto Data Balancing
      - Optimizes storage utilization
      - Rack awareness and minimal data movement
      - Leverages replication quotas during rebalance
  • 35. Agenda (as on slide 3), introducing the Spanning Multiple Datacenters section
  • 36. Use cases
    - Disaster Recovery
    - Replicate data out to geo-localized data centers
    - Aggregate data from other data centers for analysis
    - Part of hybrid cloud or cloud migration strategy
  • 37. Multi-DC: Two Approaches
    - Stretched cluster
    - Mirroring across clusters
  • 38. Stretched Cluster
    - Low-latency links between 3 DCs, typically AZs in a single AWS region
    - Applications in all 3 DCs share the same cluster and handle failures automatically
    - Relies on intra-cluster replication to copy data across DCs (replication.factor >= 3)
    - Use rack awareness in Kafka 0.10; manual partition placement otherwise
    [Diagram: one Kafka cluster spanning AZ 1, AZ 2, and AZ 3 of an AWS region, with producers and consumers in each AZ]
  • 39. Mirroring Across Clusters
    - Separate Kafka clusters in each DC; a mirroring process copies data between them
    - Several variations of this pattern; some require manual intervention on failover and recovery
  • 40. How to Mirror Across Clusters
    - MirrorMaker tool in Apache Kafka
      - Manual topic creation
      - Manual sync of topic configuration
    - Confluent Enterprise Multi-DC
      - Dynamic topic creation at the destination
      - Automatic sync for topic configurations (including access controls)
      - Can be configured and managed from the Control Center UI
      - Leverages Connect API
  • 41. More Information: Tuning Tradeoffs
    - Apache Kafka and Confluent Documentation
    - When it Absolutely, Positively, Has to be There: Reliability Guarantees in Kafka
      - Gwen Shapira and Jeff Holoman: https://www.confluent.io/kafka-summit-2016-ops-when-it-absolutely-positively-has-to-be-there/
    - Chapter 6: Reliability Guarantees
      - Neha Narkhede, Gwen Shapira, Todd Palino: Kafka: The Definitive Guide
    - Confluent Operations Training
  • 42. More Information: Multi-DC
    - Building Large Scale Stream Infrastructures Across Multiple Data Centers with Apache Kafka (Jun Rao)
      - Video: https://www.youtube.com/watch?v=XcvHmqmh16g
      - Slides: http://www.slideshare.net/HadoopSummit/building-largescale-stream-infrastructures-across-multiple-data-centers-with-apache-kafka
    - Confluent Enterprise Multi-DC: https://www.confluent.io/product/multi-datacenter/
  • 43. More Information: Metadata Management
    - Yes, Virginia, You Really Do Need a Schema Registry
      - Gwen Shapira: https://www.confluent.io/blog/schema-registry-kafka-stream-processing-yes-virginia-you-really-need-one/
  • 44. Thank you!
    www.kafka-summit.org
    - May 8, 2017: New York City, Hilton Midtown
    - August 28, 2017: San Francisco, Hilton Union Square

Editor's Notes

  1. I’m an engineer at Confluent. In a previous job, I took Kafka from proof of concept all the way to production, with some pipelines handling more than 5B events per day. My goal is to share what I think are the most important things to know when taking Kafka to production. This is the last talk in a series of 6. The previous talks cover components of the Kafka ecosystem and stream processing in general; this talk is about taking Kafka to production. In general, I think Kafka is pretty easy to operate and has great documentation compared to other technologies I’ve worked with. Since we cannot cover everything, I want to focus on the important concepts and hopefully give you enough insight that you know what to plan for and where to find more information. Not covered here: patterns for integrating with existing data systems and applications (covered in a previous talk), and metadata management at enterprise scale (I’ll include a link at the end to a great blog post by Gwen Shapira).
  2. Review the basics, talk about tuning Kafka (what tradeoffs you can make), then cover data balancing and spanning multiple datacenters.
  3. First, I want to review a few basics of Kafka to make sure we have enough context for the rest of the talk. This will be a quick review if you’ve seen other talks in the series. If not, that’s OK; you should be able to follow along.
  4. Kafka is a streaming platform. A streaming platform can be THE common point of data integration across an organization. It allows the teams and systems within the organization to share data in real time and react as fast as necessary, and it lets teams work together without tightly coupling their services. Kafka has some key characteristics that make it well suited to being a streaming platform. First, it scales well and cheaply: it is very efficient, you can do hundreds of MB/sec of writes per server, and you can have many servers. Kafka doesn’t get slower as you store more data in it, which is a huge win if you’ve operated other data systems. It is distributed by design, with replication, fault tolerance, partitioning, and elastic scaling. It has strong guarantees around ordering and durability. It has some unique features, such as compacted topics, that let it handle some unique use cases. And it has enterprise features like fine-grained security controls.
  5. Brokers hold data. Topics are logical streams that are broken into partitions, and partitions are the unit of parallelism for consumers. Topic-partitions are replicated and spread over multiple brokers. Producers write to brokers and consumers read from them. Kafka relies on ZooKeeper for its own internal cluster management.
  6. Deployment is pretty easy. There are only a few components, and they run as JVM processes. Rolling upgrades mean no downtime.
  7. In distributed systems, there are tradeoffs. The goal of this section is to highlight the tradeoffs you can make to tune Kafka to match your application's priorities and get the most out of Kafka.
  8. You can imagine that we have a demand model to match supply with demand while keeping our inventory as low as possible.
  9. What knobs do we have in Kafka to match these priorities?
  10. We’re going to look at each of these and how to apply them to our example applications
  11. Resources vs. Throughput
  12. F replicas can tolerate F-1 failures. Topic1 has 3 replicas in this example, spread over different brokers. Cost vs. Availability.
  13. F replicas can tolerate F-1 failures.
  14. Storage vs. Time Travel
  15. Storage vs. Time Travel
  16. Latency vs. Throughput. My experiments showed a 4x compression ratio with lz4, even with Avro data.
  17. Latency vs. Throughput. Compression works on compacted topics now, too.
  18. Latency vs. Durability
  19. Latency vs. Durability
  20. Before we get to the next knob, we need to review the idea of In-Sync Replicas.
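A follower counts as in sync only if it has caught up with the leader recently, within the broker setting `replica.lag.time.max.ms`. A simplified sketch of that membership rule. The function, broker IDs, and timestamps are hypothetical, and the real broker tracks this per partition with more detail:

```python
def in_sync_replicas(leader, follower_last_caught_up_ms, now_ms, lag_time_max_ms):
    """Leader plus every follower that caught up within lag_time_max_ms.

    Mirrors the idea behind replica.lag.time.max.ms: a follower that has
    not caught up recently enough is dropped from the ISR.
    """
    return [leader] + sorted(
        replica for replica, caught_up in follower_last_caught_up_ms.items()
        if now_ms - caught_up <= lag_time_max_ms
    )


# Hypothetical partition: broker 1 leads; broker 3 last caught up 9 s ago.
isr = in_sync_replicas(1, {2: 9_500, 3: 1_000}, now_ms=10_000, lag_time_max_ms=5_000)
print(isr)  # -> [1, 2]
```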
  21. Availability vs. Durability
  22. Availability vs. Durability
  23. Availability vs. Durability. This should be very rare, but in a severe outage some applications prefer automatic recovery even if data is lost.
  24. Availability vs. Durability
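Two knobs decide what happens when replicas fail. The first is `min.insync.replicas`: with `acks=all`, the leader rejects writes (the producer sees a NotEnoughReplicas error) once the ISR shrinks below it. The second is `unclean.leader.election.enable`, which decides whether an out-of-sync replica may become leader. A simplified sketch of both decisions. The function names and broker IDs are hypothetical, and real Kafka prefers replicas in assignment order:

```python
def write_accepted(acks, isr_size, min_insync_replicas):
    """min.insync.replicas only applies to acks=all; acks=0/1 ignore it."""
    return not (acks == "all" and isr_size < min_insync_replicas)


def elect_leader(live_replicas, isr, unclean_leader_election_enable):
    """Pick a new leader after the old one fails.

    Clean election only picks a live member of the last known ISR. If none
    is alive, unclean election picks any live replica (the partition stays
    available but may lose acknowledged writes). Otherwise the partition
    stays offline, returned here as None.
    """
    for replica in isr:
        if replica in live_replicas:
            return replica
    if unclean_leader_election_enable and live_replicas:
        return live_replicas[0]
    return None


# Replication factor 3, min.insync.replicas=2, only one replica still in sync:
print(write_accepted("all", isr_size=1, min_insync_replicas=2))  # -> False
# ISR was [1, 2]; both died and only out-of-sync broker 3 is alive:
print(elect_leader([3], [1, 2], unclean_leader_election_enable=False))  # -> None
print(elect_leader([3], [1, 2], unclean_leader_election_enable=True))   # -> 3
```

The first choice favors durability (the partition waits for an in-sync replica to return). The second favors availability at the risk of data loss.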
  25. This is a good place to start; adjust down if you find you need to optimize further. With batching and compression, you should be able to get very good throughput and safety.
  26. Now let’s assume that you’ve got a cluster up and running. You’ve set up good component-level monitoring, so you can tell that ZooKeeper and Kafka are healthy, and you’ve tuned the cluster for your application priorities. Kafka handles failures very smoothly and requires little attention. I’ve run it in production at a previous job and had a broker die without any interruption to the application (which was handling more than 4B events/day). However, there is some maintenance you have to do around data balancing, so I think it’s important to understand it and plan for it.
  27. In the example, Broker 2 is under-utilized and Broker 5 is not being utilized at all.
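The imbalance can be quantified by counting partition replicas per broker, including idle ones. A naive fix moves replicas from the busiest broker to the idlest. The sketch below is a greedy illustration under the assumption that every replica costs the same. Real balancing also weighs disk usage, leadership, rack placement, and the cost of copying data. All names and the sample assignment are hypothetical:

```python
from collections import Counter


def replica_counts(assignment, brokers):
    """Number of partition replicas hosted by each broker (idle ones too)."""
    counts = Counter({b: 0 for b in brokers})
    for replicas in assignment.values():
        counts.update(replicas)
    return counts


def rebalance(assignment, brokers):
    """Greedy sketch: move one replica at a time from the most-loaded to the
    least-loaded broker until replica counts differ by at most 1."""
    plan = {p: list(r) for p, r in assignment.items()}
    moves = []
    while True:
        counts = replica_counts(plan, brokers)
        hot = max(brokers, key=lambda b: counts[b])
        cold = min(brokers, key=lambda b: counts[b])
        if counts[hot] - counts[cold] <= 1:
            return plan, moves
        for p, replicas in plan.items():
            # Never put two replicas of one partition on the same broker.
            if hot in replicas and cold not in replicas:
                replicas[replicas.index(hot)] = cold
                moves.append((p, hot, cold))
                break
        else:
            return plan, moves  # no legal move left


brokers = [1, 2, 3, 4, 5]
# Hypothetical layout: broker 2 is under-utilized and broker 5 holds nothing.
assignment = {0: [1, 3], 1: [3, 4], 2: [4, 1], 3: [1, 2]}
plan, moves = rebalance(assignment, brokers)
print(moves)  # -> [(0, 1, 5)]
print(dict(replica_counts(plan, brokers)))
```

Computing a plan is the easy part. Executing it means copying partition data between brokers (for example with Kafka's partition reassignment tool) without starving production traffic, which is where much of the operational effort goes.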
  28. We’ve heard from many customers that this is a pain point: “Took us 2 weeks.”
  29. We’ve talked about running Kafka reliably in a single datacenter. Another important consideration for putting Kafka in production is how to handle multiple datacenters. The topic is too deep to cover in detail, so the goal of this section is to give an introduction and motivate you to watch an excellent talk on this subject by Confluent Co-Founder Jun Rao.
  30. This is the simplest setup for failure handling, but it does not work across regions.
  31. There are a number of variations on this, and some of them require manual intervention for failure recovery. The details are in Jun’s talk; please watch it. Over time, you’ll probably need to support multiple replication patterns to match different use cases. This picture shows an example of 1) aggregating data from other DCs for analytics and 2) cross-replicating between DCs so each can see the other’s data.
  32. The goal of this section is to highlight the tradeoffs you can make to align Kafka with your application's priorities