ZipRecruiter, Inc. Proprietary and Confidential.
Copyright © 2018 ZipRecruiter, Inc. All Rights Reserved.
Kafka Chaos Engineering
Shlomi Hassan & Yaniv Ranen
It’s a lovely day, let’s upgrade the production cluster
We didn’t even get a chance to say goodbye
Where did we go wrong?
Chaos engineering
Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
Hello!
Yaniv Ranen
Shlomi Hassan
ZipRecruiter’s mission
We actively connect people to their next great opportunity.
Logging infrastructure scheme
Logging Kafka cluster
Cluster spec
1) Kafka
a) 8 EC2 instances, m4.4xlarge
b) EBS: io1
2) ZooKeeper
a) 5 EC2 instances, m4.large
b) EBS: io1
3) Cluster volume: 3 TB/day
4) Broker data spread across multiple AZs
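As a rough back-of-the-envelope check, the 3 TB/day figure can be turned into an average write rate. This is a sketch under stated assumptions: decimal units (1 TB = 10^12 bytes), load spread perfectly evenly over the day and across the 8 brokers, and a replication factor of 2. The replication factor is a hypothetical value; the spec above does not state it.

```python
# Back-of-the-envelope sizing for the logging cluster spec.
# Assumptions (not from the slides): decimal units, uniform load over the
# day and across brokers, and a hypothetical replication factor of 2.

SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(bytes_per_day: float, brokers: int, replication_factor: int) -> dict:
    """Return average ingress and disk-write rates in MB/s (1 MB = 10**6 bytes)."""
    ingress = bytes_per_day / SECONDS_PER_DAY        # what producers send
    disk_writes = ingress * replication_factor        # every replica writes its own copy
    return {
        "cluster_ingress_mb_s": ingress / 1e6,
        "cluster_disk_writes_mb_s": disk_writes / 1e6,
        "per_broker_disk_writes_mb_s": disk_writes / brokers / 1e6,
    }


if __name__ == "__main__":
    rates = average_rates(bytes_per_day=3e12, brokers=8, replication_factor=2)
    for name, value in rates.items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the cluster averages roughly 35 MB/s of producer traffic; real traffic peaks are likely well above the daily average, so these numbers are a floor rather than a capacity target.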
Basic Kafka terminology
[Diagram: a Kafka cluster of three brokers (broker1, broker2, broker3) coordinated by ZooKeeper, with a producer and a consumer. Partitions P0 to P3 each have two replicas, R1 and R2, on different brokers. Legend: Px Ry is replica y of partition x; the leader replica and the active controller are shown in bold.]
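To make the terminology concrete, here is a small Python model of the diagram: each partition has an ordered replica list, and the first broker in that list is its preferred leader, which is how Kafka picks the preferred replica. The round-robin layout is a hypothetical simplification for illustration, not Kafka's actual assignment algorithm and not ZipRecruiter's real topic layout.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    topic: str
    partition: int
    replicas: list[int]                            # broker IDs; order matters
    isr: list[int] = field(default_factory=list)   # in-sync replicas

    @property
    def preferred_leader(self) -> int:
        # Kafka treats the first replica in the assignment as the preferred leader.
        return self.replicas[0]


def round_robin_assignment(topic: str, partitions: int, brokers: list[int],
                           replication_factor: int) -> list[Partition]:
    """Hypothetical layout: spread replicas round-robin over the brokers."""
    result = []
    for p in range(partitions):
        replicas = [brokers[(p + r) % len(brokers)] for r in range(replication_factor)]
        # Right after creation every replica is assumed to be in sync.
        result.append(Partition(topic, p, replicas, isr=list(replicas)))
    return result


if __name__ == "__main__":
    for part in round_robin_assignment("logs", partitions=4, brokers=[1, 2, 3], replication_factor=2):
        print(f"P{part.partition}: replicas={part.replicas} leader={part.preferred_leader}")
```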
Chaos engineering method
+ Ask a question
+ Write it down as a scenario in a collaborative document
+ Check the behaviour in a controlled environment
+ Document the exact commands and output
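One way to follow the last two steps is to run every scenario step through a small wrapper that records the exact command and its output, ready to paste into the shared document. This is a hypothetical helper, not the authors' tooling; in a real experiment the commands would be the Kafka CLI tools run against the controlled environment, and the placeholder command here only prints a line.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class Step:
    command: list[str]
    returncode: int
    output: str


@dataclass
class Scenario:
    question: str
    hypothesis: str
    steps: list[Step] = field(default_factory=list)

    def run(self, command: list[str]) -> Step:
        """Run a command, capture stdout and stderr, and record it verbatim."""
        proc = subprocess.run(command, capture_output=True, text=True)
        step = Step(command, proc.returncode, proc.stdout + proc.stderr)
        self.steps.append(step)
        return step

    def to_text(self) -> str:
        """Render the scenario as plain text for the collaborative document."""
        lines = [f"Question: {self.question}", f"Hypothesis: {self.hypothesis}"]
        for i, step in enumerate(self.steps, start=1):
            lines.append(f"Step {i}: $ {' '.join(step.command)} (exit {step.returncode})")
            lines.append(step.output.rstrip())
        return "\n".join(lines)


if __name__ == "__main__":
    s = Scenario("What happens if we kill one broker?", "Kafka will self-heal")
    # Placeholder command; replace with real checks against a test cluster.
    s.run([sys.executable, "-c", "print('0 under-replicated partitions')"])
    print(s.to_text())
```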
Scenario #1
+ What happens if we kill one broker?
+ Will Kafka self-heal?
Scenario #1: Stopping a broker
[Diagram: three brokers (Broker 1 to Broker 3), each with its own EBS volumes, coordinated by ZooKeeper and serving a producer and a consumer; partitions P0 to P3 each have two replicas, R1 and R2.]
+ Cluster health at the start: 0 under-replicated partitions
+ After one broker is stopped: 54 under-replicated partitions
+ After reassigning partitions: 0 under-replicated partitions
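The jump from 0 to 54 under-replicated partitions follows from how replicas are spread: every partition with a replica on the stopped broker loses a member of its replica set, and nothing recreates that replica elsewhere until someone reassigns it. A minimal sketch, assuming a replica counts as in sync exactly when its broker is alive (a simplification of Kafka's ISR-based metric); the 12-partition layout is hypothetical.

```python
def partition_health(assignment: dict[int, list[int]], live_brokers: set[int]) -> dict[str, int]:
    """Count healthy, under-replicated and offline partitions.

    assignment maps partition number -> list of replica broker IDs.
    Simplification: a replica is considered in sync iff its broker is alive.
    """
    counts = {"healthy": 0, "under_replicated": 0, "offline": 0}
    for replicas in assignment.values():
        alive = [b for b in replicas if b in live_brokers]
        if not alive:
            counts["offline"] += 1            # no replica left to lead the partition
        elif len(alive) < len(replicas):
            counts["under_replicated"] += 1   # still served, but with fewer copies
        else:
            counts["healthy"] += 1
    return counts


if __name__ == "__main__":
    brokers = [1, 2, 3]
    # Hypothetical layout: 12 partitions, replication factor 2, round-robin.
    layout = {p: [brokers[p % 3], brokers[(p + 1) % 3]] for p in range(12)}
    print(partition_health(layout, live_brokers={1, 2, 3}))
    # → {'healthy': 12, 'under_replicated': 0, 'offline': 0}
    print(partition_health(layout, live_brokers={1, 2}))  # broker 3 stopped
    # → {'healthy': 4, 'under_replicated': 8, 'offline': 0}
```

With replication factor 2 spread over three brokers, each broker carries a replica of two thirds of the partitions, so stopping any one of them leaves that share under-replicated.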
Demonstration
Scenario #1: Stopping a broker
+ Kafka is not self-healing
+ Manual partition reassignment is needed
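Manual reassignment is normally done with `kafka-reassign-partitions.sh`, which reads a JSON plan passed via `--reassignment-json-file`. The sketch below builds such a plan by replacing a stopped broker in every replica list with a surviving one. The replacement policy (the least-loaded live broker not already in the list) is a hypothetical choice, and any generated plan deserves a review before it is executed.

```python
import json
from collections import Counter


def plan_reassignment(topic: str, assignment: dict[int, list[int]], dead: int,
                      live: list[int]) -> dict:
    """Build a kafka-reassign-partitions.sh style plan that moves replicas off `dead`."""
    # Current replica count per surviving broker (Counter returns 0 for missing keys).
    load = Counter(b for replicas in assignment.values() for b in replicas if b != dead)
    partitions = []
    for p, replicas in sorted(assignment.items()):
        if dead not in replicas:
            continue  # untouched partitions are left out of the plan
        candidates = [b for b in live if b not in replicas]
        if not candidates:
            raise ValueError(f"no spare broker for partition {p}")
        target = min(candidates, key=lambda b: (load[b], b))
        load[target] += 1
        # Replace the dead broker in place, keeping the order of the other replicas.
        new_replicas = [target if b == dead else b for b in replicas]
        partitions.append({"topic": topic, "partition": p, "replicas": new_replicas})
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    layout = {0: [1, 2], 1: [2, 3], 2: [3, 4], 3: [4, 1]}  # hypothetical topic "logs"
    plan = plan_reassignment("logs", layout, dead=3, live=[1, 2, 4])
    print(json.dumps(plan, indent=2))
```

The printed JSON would be saved to a file and handed to the tool with `--execute`, together with the connection flag your Kafka version expects (`--zookeeper` on older releases, `--bootstrap-server` on newer ones).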
Scenario #2
+ What can we do to revive an offline partition?
Scenario #2: Reviving an offline partition
[Diagram: Broker 1 holds the only replica (R1) of partitions P0 to P3 on its EBS volumes; Broker 2, Broker 3, ZooKeeper, a producer and a consumer complete the cluster.]
+ Cluster health at the start: 0 offline partitions
+ After Broker 1 goes down: 4 offline partitions
+ Reassigning the partitions: still 4 offline partitions
+ Removing Broker 1 and creating a new one: still 4 offline partitions
+ Once the new Broker 1 hosts the partitions again: 0 offline partitions, but as a new topic, with data loss
Scenario #2: Reviving an offline partition
+ Restarting the broker revives the offline partition
+ Partition reassignment doesn’t work while the partition is offline
+ Recreating a broker leads to:
○ Data loss
○ Inconsistent state
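The "Inconsistent state" bullet is easiest to see from the consumer side: consumer groups still hold committed offsets for the old log, but a partition recreated on an empty disk starts numbering again from 0. The sketch mimics how a Kafka consumer resolves such an out-of-range position with its `auto.offset.reset` setting (`earliest`, `latest` or `none`). The function and the offsets are hypothetical illustrations, not the client library's code.

```python
class OffsetOutOfRange(Exception):
    """Raised when the position is invalid and auto.offset.reset is 'none'."""


def resolve_position(committed: int, log_start: int, log_end: int,
                     auto_offset_reset: str) -> int:
    """Return the offset a consumer would resume from (simplified model).

    Valid positions are log_start..log_end inclusive, where log_end is the
    offset the next produced record will get.
    """
    if log_start <= committed <= log_end:
        return committed
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    raise OffsetOutOfRange(f"offset {committed} not in [{log_start}, {log_end}]")


if __name__ == "__main__":
    # Before: the partition holds offsets 0..999_999 and the group committed 750_000.
    print(resolve_position(750_000, 0, 1_000_000, "latest"))  # 750000
    # After the broker is recreated with an empty disk, 120 new records arrive.
    print(resolve_position(750_000, 0, 120, "latest"))        # 120: skips the new records
    print(resolve_position(750_000, 0, 120, "earliest"))      # 0: rereads the new records
```

Either way the records written before the broker was recreated are gone; the reset policy only decides whether the consumer also skips what was written afterwards.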
Scenario #3
+ How can we recover data from an offline partition?
Scenario #3: Recovering offline partitions without data loss
[Diagram: Brokers 1 to 3 with per-broker EBS volumes, ZooKeeper, a producer and a consumer; partitions P0 to P3 each have two replicas, R1 and R2.]
+ Cluster health at the start: 0 offline partitions
+ After the nodes hosting the replicas are lost: 4 offline partitions
+ A replacement node, "Broker 1", is spawned: still 4 offline partitions while it holds no replicas
+ With the original EBS volumes attached to the replacement node: the partitions are back online, leaving 4 under-replicated partitions
Scenario #3: How can we recover from a lost AZ?
+ Spawn a replacement node using the same EBS volumes
+ Keep the same broker ID
+ Consumer group offsets are preserved
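Keeping the same broker ID matters because each Kafka log directory records the ID it belongs to in a `meta.properties` file (with a line such as `broker.id=1`), and a broker refuses to start when that value conflicts with its configured `broker.id`. A replacement node could run a small pre-flight check before starting Kafka; the helper names and paths below are hypothetical, and the parser handles only simple `key=value` lines, not the full Java properties syntax.

```python
from pathlib import Path


def read_properties(path: Path) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and '#'/'!' comments."""
    props = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_broker_id(server_properties: Path, log_dirs: list[Path]) -> list[str]:
    """Return a list of problems; an empty list means the volumes match the config."""
    configured = read_properties(server_properties).get("broker.id")
    problems = []
    for log_dir in log_dirs:
        meta = log_dir / "meta.properties"
        if not meta.exists():
            # An empty volume would make the broker start without its old data.
            problems.append(f"{log_dir}: no meta.properties (empty or wrong volume?)")
            continue
        stored = read_properties(meta).get("broker.id")
        if stored != configured:
            problems.append(
                f"{log_dir}: stored broker.id={stored}, configured broker.id={configured}"
            )
    return problems
```

On the replacement node this would run once the EBS volumes are mounted, for example `check_broker_id(Path("/etc/kafka/server.properties"), [Path("/data/kafka")])` with your own paths, and block the start if the list is not empty.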
Our conclusions
+ Chaos engineering helps us gain knowledge of the system
+ Kafka is not self-healing
+ Offline partitions can be brought back using the original EBS volumes
+ Problems with the health check
+ Running different versions of Kafka side by side might introduce lag
○ Consider pinning the old protocol and message format versions: inter.broker.protocol.version, log.message.format.version
+ Upgrade the active controller last
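The last two conclusions fit together as a rolling-upgrade plan: pin `inter.broker.protocol.version` and `log.message.format.version` to the running version, restart the brokers one at a time with the active controller last, and only then raise the pinned versions in further rolling restarts. This sketch only produces the ordering and the property overrides. The version strings and broker IDs are placeholders, and finding the controller (for example from ZooKeeper's `/controller` znode) is left out.

```python
def restart_order(brokers: list[int], controller: int) -> list[int]:
    """Restart every broker once, leaving the active controller for last."""
    if controller not in brokers:
        raise ValueError("controller must be one of the brokers")
    return [b for b in sorted(brokers) if b != controller] + [controller]


def upgrade_phases(current: str, target: str) -> list[dict[str, str]]:
    """Property overrides for each rolling-restart phase.

    Phase 1: new binaries, protocol and message format pinned to the old version.
    Phase 2: bump the inter-broker protocol once every broker runs the new binaries.
    Phase 3: bump the message format (typically after clients are upgraded).
    """
    return [
        {"inter.broker.protocol.version": current, "log.message.format.version": current},
        {"inter.broker.protocol.version": target, "log.message.format.version": current},
        {"inter.broker.protocol.version": target, "log.message.format.version": target},
    ]


if __name__ == "__main__":
    order = restart_order([1, 2, 3, 4, 5, 6, 7, 8], controller=3)
    for phase, props in enumerate(upgrade_phases("1.0", "2.0"), start=1):
        print(f"phase {phase}: restart {order} with {props}")
```

Controllership moves whenever the controller itself restarts, so in practice it is worth looking up the current controller again before each phase instead of reusing one order for all three.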
Deployment strategies
+ In-place deployments can prove risky
Blue green deployment
[Diagram: the old cluster (Broker 1 to Broker 3 with ZooKeeper; partitions P0 to P3 with replicas R1 and R2) serving producers and consumers.]
Disconnecting producers and consumers
[Diagram: the old cluster next to a new cluster with the same layout (Broker 1 to Broker 3, ZooKeeper, partitions P0 to P3 with replicas R1 and R2).]
Draining the old cluster into the new one
[Diagram: a replication tool copies data from the old cluster to the new cluster.]
Reconnecting the producers and consumers
[Diagram: the old and new clusters linked by the replication tool, with producers and consumers connected to the new cluster.]
The new cluster becomes the production one
[Diagram: the new cluster (Broker 1 to Broker 3 with ZooKeeper; partitions P0 to P3 with replicas R1 and R2) serving producers and consumers.]
Replication = data + offsets metadata
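Copying the records is only half the job: a consumer group's committed offsets on the old cluster generally do not point at the same records on the new one, because the mirrored topic starts numbering from 0 and retention may already have removed the oldest source records. A minimal sketch of offset translation, assuming a hypothetical replication tool that remembers the source and destination offset of every record it copies; the class below is illustrative, not part of any of the tools listed next.

```python
import bisect


class OffsetMap:
    """Hypothetical per-partition mapping from source offsets to destination offsets."""

    def __init__(self) -> None:
        self._src: list[int] = []
        self._dst: list[int] = []

    def record_copy(self, src_offset: int, dst_offset: int) -> None:
        # Records are copied in order, so both lists stay sorted.
        self._src.append(src_offset)
        self._dst.append(dst_offset)

    def translate(self, committed: int) -> int:
        """Translate a committed source offset (the next record to read).

        Returns the destination offset of the first copied record at or after
        `committed` (which also covers gaps in the source log), or the
        destination end if the group has consumed everything copied so far.
        """
        i = bisect.bisect_left(self._src, committed)
        if i == len(self._src):
            return self._dst[-1] + 1 if self._dst else 0
        return self._dst[i]


if __name__ == "__main__":
    m = OffsetMap()
    # The source log starts at 1000 because older records were deleted by retention.
    for dst, src in enumerate(range(1000, 1005)):
        m.record_copy(src, dst)
    print(m.translate(1003))  # 3
    print(m.translate(1005))  # 5
```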
Replication tools
+ MirrorMaker
○ Pros: open source; relatively easy to use
○ Cons: not a real mirror
+ uReplicator
○ Pros: creates a real mirror; scalable; open source
○ Cons: maintained mainly (or only) by Uber
+ Kafka Connect Replicator
○ Pros: supports smart replication; creates a real mirror; based on the Kafka Connect ecosystem
○ Cons: paid solution
Blue green as a failover solution
[Diagram: the old (active) cluster and a passive cluster with the same layout, kept in sync by a replication tool; producers and consumers are shown for both clusters.]
Kafka Pro Tips
+ Be part of the community
○ Join the Confluent Slack team
○ Follow and suggest new KIPs (Kafka Improvement Proposals)
○ Contribute fixes to Kafka and its ecosystem
+ Use smart metrics (like a health check) for better visibility
+ Try chaos engineering at home using our GitHub repo
+ Don’t fall behind
○ Use up-to-date Kafka consumers and producers
○ Update your cluster regularly
+ Follow the Confluent white papers
+ Kafka health check repo
Q & A
Thank you
https://www.linkedin.com/in/yaniv-ranen-284b003/
https://www.linkedin.com/in/shlomihassan/

Eventos y Microservicios - Santander TechTalkconfluent
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloudconfluent
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluentconfluent
 

Mais de confluent (20)

Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluent
 

Último

React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFMichael Gough
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...amber724300
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialJoão Esperancinha
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...BookNet Canada
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 

Último (20)

React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
JET Technology Labs White Paper for Virtualized Security and Encryption Techn...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
Kuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorialKuma Meshes Part I - The basics - A tutorial
Kuma Meshes Part I - The basics - A tutorial
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 

Using Chaos Engineering to Level up Apache Kafka Skills

  • 1. Kafka Chaos Engineering. Shlomi Hassan & Yaniv Ranen. ZipRecruiter, Inc. Proprietary and Confidential. Copyright © 2018 ZipRecruiter, Inc. All Rights Reserved.
  • 2. It’s a lovely day, let’s upgrade the production cluster
  • 3. We didn’t even get a chance to say goodbye
  • 4. Where did we go wrong?
  • 5. Chaos engineering: Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production.
  • 6. Hello! Yaniv Ranen, Shlomi Hassan
  • 7. ZipRecruiter’s mission: We actively connect people to their next great opportunity.
  • 8. Logging infrastructure scheme
  • 9. Logging Kafka cluster
    Cluster spec:
    1) Kafka
       a) 8 EC2 instances (m4.4xlarge)
       b) EBS io1 volumes
    2) ZooKeeper
       a) 5 EC2 instances (m4.large)
       b) EBS io1 volumes
    3) Cluster volume: 3 TB/day
    4) Broker data spread across multiple AZs
    [Diagram: Kafka and ZooKeeper nodes]
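To put the 3 TB/day figure in per-second terms, here is a back-of-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes), traffic spread evenly across the day and across all 8 brokers, and that 3 TB/day counts each message once. The slide does not state a replication factor, so `replication_factor` is a hypothetical parameter.

```python
# Back-of-envelope ingest rates for the logging cluster on slide 9.
# Assumptions (not from the slides): decimal units, traffic spread evenly
# over the day and across brokers, 3 TB/day counted before replication.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_ingest_mb_per_s(tb_per_day: float) -> float:
    """Average producer ingest in MB/s (1 MB = 10**6 bytes)."""
    return tb_per_day * 10**12 / SECONDS_PER_DAY / 10**6


def per_broker_write_mb_per_s(
    tb_per_day: float, brokers: int, replication_factor: int = 1
) -> float:
    """Average bytes written to disk per broker per second, in MB/s.

    replication_factor is a hypothetical parameter: every extra replica
    multiplies the bytes the cluster writes in total.
    """
    return average_ingest_mb_per_s(tb_per_day) * replication_factor / brokers


if __name__ == "__main__":
    print(f"cluster ingest: {average_ingest_mb_per_s(3):.1f} MB/s")
    for rf in (1, 2, 3):
        print(f"per broker, RF={rf}: {per_broker_write_mb_per_s(3, 8, rf):.2f} MB/s")
```

That works out to roughly 35 MB/s across the cluster on average; peak rates can be well above a daily average.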
  • 10. Basic Kafka terminology
  • 11. Basic Kafka terminology [Diagram: a Kafka cluster of three brokers (broker1, broker2, broker3) and ZooKeeper, with a producer and a consumer; partitions P0 to P3 each have replicas R1 and R2 spread across the brokers. Legend: Px Ry = replica y of partition x; the leader replica and the active controller are shown in bold.]
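The terminology slide can be modeled as plain data: each partition maps to an ordered list of brokers holding its replicas, and by Kafka convention the first broker in that list is the partition's preferred leader. The round-robin placement below is a simplified, hypothetical scheme chosen for illustration, not Kafka's exact assignment algorithm.

```python
from collections import Counter


def round_robin_assignment(partitions, brokers, replication_factor):
    """Map each partition to an ordered list of broker IDs holding its replicas.

    Simplified illustration: partition p starts at brokers[p % n] and takes
    the next replication_factor brokers. Kafka's real assignment also
    randomizes the starting broker and can be rack-aware.
    """
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + r) % n] for r in range(replication_factor)]
        for p in range(partitions)
    }


def preferred_leader_counts(assignment):
    """Count, per broker, the partitions whose first-listed replica it holds."""
    return Counter(replicas[0] for replicas in assignment.values())


if __name__ == "__main__":
    layout = round_robin_assignment(partitions=4, brokers=[1, 2, 3], replication_factor=2)
    for partition, replicas in layout.items():
        print(f"P{partition}: replicas on brokers {replicas}, preferred leader {replicas[0]}")
    print(dict(preferred_leader_counts(layout)))
```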
  • 12.
  • 13. Chaos engineering method
    + Ask a question
    + Write it down as a scenario in a collaborative document
    + Check the behaviour in a controlled environment
    + Document the exact commands and output
  • 14. Scenario #1
    + What happens if we kill one broker?
    + Will Kafka self-heal?
  • 15. Scenario #1: Stopping a broker [Diagram: Kafka cluster of brokers 1 to 3, each with an EBS volume, plus ZooKeeper, a producer and a consumer; Px Ry = replica y of partition x.] Cluster health: 0 under-replicated partitions.
  • 16. Scenario #1: Stopping a broker [Diagram: the same cluster after one broker is stopped.] Cluster health: 54 under-replicated partitions.
  • 17. Scenario #1: Stopping a broker [Diagram: the same cluster layout.]
  • 18. Scenario #1: Stopping a broker [Diagram: the same cluster layout.]
  • 19. Scenario #1: Stopping a broker [Diagram: partitions reassigned to the remaining brokers.] Cluster health: 0 under-replicated partitions.
  • 20. Demonstration
  • 21. Scenario #1: Stopping a broker
    + Kafka is not self-healing
    + Manual partition reassignment is needed
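The manual reassignment step can be prepared with a small script. The sketch below assumes the partition layout is already known (for example, read off `kafka-topics.sh --describe` output) and builds the JSON document that `kafka-reassign-partitions.sh` accepts via `--reassignment-json-file`. The topic name, layouts and the least-loaded placement rule are hypothetical choices for illustration.

```python
import json
from collections import Counter


def affected_partitions(assignment, live_brokers):
    """Partitions with at least one replica assigned to a broker that is down."""
    return sorted(
        p for p, replicas in assignment.items()
        if any(b not in live_brokers for b in replicas)
    )


def reassignment_plan(topic, assignment, live_brokers):
    """Build a reassignment JSON plan that replaces replicas on dead brokers.

    Each dead replica is swapped for the least-loaded live broker that does
    not already hold a replica of that partition. Surviving replicas keep
    their position, so a live preferred leader stays first.
    """
    load = Counter(
        b for replicas in assignment.values() for b in replicas if b in live_brokers
    )
    plan = []
    for p in affected_partitions(assignment, live_brokers):
        new_replicas = []
        for b in assignment[p]:
            if b in live_brokers:
                new_replicas.append(b)
                continue
            candidates = [
                c for c in sorted(live_brokers)
                if c not in assignment[p] and c not in new_replicas
            ]
            if not candidates:
                raise ValueError(f"not enough live brokers to re-replicate partition {p}")
            target = min(candidates, key=lambda c: load[c])
            load[target] += 1
            new_replicas.append(target)
        plan.append({"topic": topic, "partition": p, "replicas": new_replicas})
    return {"version": 1, "partitions": plan}


if __name__ == "__main__":
    # Hypothetical topic "logs": 4 partitions, replication factor 2, broker 3 stopped.
    layout = {0: [1, 2], 1: [2, 3], 2: [3, 1], 3: [1, 2]}
    print(json.dumps(reassignment_plan("logs", layout, live_brokers={1, 2}), indent=2))
```

The printed plan can be saved to a file and passed to `kafka-reassign-partitions.sh` with `--reassignment-json-file` and `--execute` (plus `--zookeeper` or `--bootstrap-server`, depending on the Kafka version); `--verify` reports when the move has finished.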
  • 22. Scenario #2
    + What can we do to revive an offline partition?
  • 23. Scenario #2: Reviving an offline partition [Diagram: brokers 1 to 3 with EBS volumes, ZooKeeper, a producer and a consumer; broker 1 holds partitions P0 to P3.] Cluster health: 0 offline partitions.
  • 24. Scenario #2: Reviving an offline partition [Diagram: the same cluster.] Cluster health: 4 offline partitions.
  • 25. Scenario #2: Reviving an offline partition [Diagram: a partition reassignment is attempted.] Cluster health: 4 offline partitions.
  • 26. Scenario #2: Reviving an offline partition [Diagram: a partition reassignment is attempted.] Cluster health: 4 offline partitions.
  • 27. Scenario #2: Reviving an offline partition [Diagram: the same cluster.] Cluster health: 4 offline partitions.
  • 28. Scenario #2: Reviving an offline partition [Diagram: broker 1 is gone; only brokers 2 and 3 remain.] Cluster health: 4 offline partitions.
  • 29. Scenario #2: Reviving an offline partition [Diagram: broker 1 is re-created without its partition data.] Cluster health: 4 offline partitions.
  • 30. Scenario #2: Reviving an offline partition [Diagram: broker 1 again hosts P0 to P3, but as a new, empty topic.] Cluster health: 0 offline partitions, with data loss.
  • 31. Scenario #2: Reviving an offline partition
    + Restarting the broker revives the offline partition
    + Partition reassignment doesn’t work while the partition is offline
    + Re-creating a broker leads to:
      ○ Data loss
      ○ Inconsistent state
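The difference between the "under-replicated" and "offline" counts on these slides comes down to whether any replica of a partition still sits on a live broker. Below is a minimal model of that classification, assuming a known replica layout (hypothetical here). It also hints at why reassignment did not help in Scenario #2: new replicas copy their data from the partition leader, and an offline partition has none.

```python
from collections import Counter


def partition_state(replicas, live_brokers):
    """Classify one partition from its replica list and the set of live brokers.

    Simplification: ISR membership is ignored. In a real cluster a live
    replica that had fallen out of the in-sync set normally cannot become
    leader either, so real offline counts can be higher than this model's.
    """
    alive = [b for b in replicas if b in live_brokers]
    if not alive:
        return "offline"  # no broker can serve as leader
    if len(alive) < len(replicas):
        return "under-replicated"
    return "healthy"


def cluster_health(assignment, live_brokers):
    """Count partitions per state, like the 'Cluster health' box on the slides."""
    return Counter(partition_state(r, live_brokers) for r in assignment.values())


if __name__ == "__main__":
    # Hypothetical layout: 4 partitions, replication factor 2, only broker 3 alive.
    layout = {0: [1, 2], 1: [2, 3], 2: [3, 1], 3: [1, 2]}
    print(dict(cluster_health(layout, live_brokers={3})))
```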
  • 32. Scenario #3
    + How can we recover data from an offline partition?
  • 33. Scenario #3: Recovering offline partitions without data loss [Diagram: brokers 1 to 3 with EBS volumes, ZooKeeper, a producer and a consumer; replicas R1 and R2 of partitions P0 to P3.] Cluster health: 0 offline partitions.
  • 34. Scenario #3: Recovering offline partitions without data loss [Diagram: the same cluster.] Cluster health: 4 offline partitions.
  • 35. Scenario #3: Recovering offline partitions without data loss [Diagram: the same cluster.] Cluster health: 4 offline partitions.
  • 36. Scenario #3: Recovering offline partitions without data loss [Diagram: a replacement node labeled “Broker 1” takes the place of a failed broker; partitions P0 to P3 are listed with no replicas.] Cluster health: 4 offline partitions.
  • 37. Scenario #3: Recovering offline partitions without data loss [Diagram: the replacement “Broker 1” is attached to the original EBS volumes, and the P0 to P3 replicas reappear.] Cluster health: 4 under-replicated partitions.
  • 38. Scenario #3: How can we recover from a lost AZ?
    + Spawn a replacement node using the same EBS volumes
    + Keep the same broker ID
    + Consumer group offsets are kept
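Scenario #3 depends on the replacement node reusing the failed broker's ID. In ZooKeeper-based Kafka, each log directory contains a `meta.properties` file recording the `broker.id` that wrote it, and the broker refuses to start if that value disagrees with its configured `broker.id`. A pre-start check along these lines could catch a mis-attached volume early. The helper names are hypothetical, and the parser handles only simple `key=value` lines.

```python
from pathlib import Path


def read_simple_properties(path):
    """Parse key=value lines, skipping blanks and # or ! comments.

    Simplified: ignores ':' separators and line continuations that full
    Java .properties files allow.
    """
    props = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def broker_id_on_volume(log_dir):
    """Return the broker.id recorded in a log directory's meta.properties."""
    props = read_simple_properties(Path(log_dir) / "meta.properties")
    return int(props["broker.id"])


def check_reattached_volume(log_dir, configured_broker_id):
    """Raise if the reattached volume was written by a different broker."""
    stored = broker_id_on_volume(log_dir)
    if stored != configured_broker_id:
        raise RuntimeError(
            f"{log_dir} belongs to broker {stored}, "
            f"but this node is configured as broker {configured_broker_id}"
        )
```

A check like this would run against the mounted data directory (the broker's `log.dirs` path) before Kafka starts on the replacement node, with `broker.id` in `server.properties` set to the failed broker's ID.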
  • 39. Our conclusions
    + Chaos engineering helps in gaining knowledge
    + Kafka is not self-healing
    + Offline partitions can be brought back using EBS volumes
    + Problems with the health check
    + Using different versions of Kafka might introduce lag
    + Consider freezing the old protocol versions (inter.broker.protocol.version, log.message.format.version)
    + Upgrade the active controller last
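The last two conclusions combine into a rolling-upgrade plan. The sketch below assumes a ZooKeeper-based cluster: it parses the active controller's ID from the `/controller` znode payload, orders restarts so the controller goes last, and produces the two properties that pin the old protocol and message format while broker versions are mixed. Reading the znode itself (for example with `zookeeper-shell.sh` or a client library) is left out, and the version strings are placeholders.

```python
import json


def controller_id_from_znode(data: bytes) -> int:
    """Extract the active controller's broker ID from the /controller znode.

    In ZooKeeper-based clusters this znode holds JSON such as
    {"version":1,"brokerid":2,"timestamp":"..."}.
    """
    return int(json.loads(data)["brokerid"])


def rolling_upgrade_order(broker_ids, controller_id):
    """Restart order for a rolling upgrade: every other broker first, controller last.

    Controllership can move while brokers restart, so re-read /controller
    before each restart and recompute the order if it changed.
    """
    if controller_id not in broker_ids:
        raise ValueError(f"controller {controller_id} is not in the broker list")
    return sorted(b for b in broker_ids if b != controller_id) + [controller_id]


def pinned_version_overrides(current_protocol, current_message_format):
    """server.properties overrides that keep the old formats during the roll.

    Bump inter.broker.protocol.version only after every broker runs the new
    binaries, then do a second rolling restart.
    """
    return {
        "inter.broker.protocol.version": current_protocol,
        "log.message.format.version": current_message_format,
    }


if __name__ == "__main__":
    znode = b'{"version":1,"brokerid":3,"timestamp":"1546300800000"}'
    print(rolling_upgrade_order([1, 2, 3, 4, 5, 6, 7, 8], controller_id_from_znode(znode)))
    for key, value in pinned_version_overrides("1.1", "1.1").items():
        print(f"{key}={value}")
```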
  • 40. Deployment strategies
    + In-place deployment might prove risky
  • 41. Blue-green deployment [Diagram: the old cluster (brokers 1 to 3 and ZooKeeper) serving producers and consumers.]
  • 42. Disconnecting producers and consumers [Diagram: the old cluster and a new cluster with the same layout, side by side.]
  • 43. Draining the old cluster into the new one [Diagram: a replication tool copies data from the old cluster into the new cluster.]
  • 44. Reconnecting the producers and consumers [Diagram: the old cluster, the replication tool and the new cluster, with producers and consumers now also attached to the new cluster.]
  • 45. The new cluster becomes the production cluster [Diagram: the new cluster serving producers and consumers.]
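Before producers and consumers are reconnected, the old cluster should be fully drained into the new one. Once producers are disconnected, one way to check this is to compare each partition's log-end offset on the old cluster with the offset committed by the replication tool's consumer group. `kafka-consumer-groups.sh --describe` reports both values. The sketch below assumes those numbers have already been collected. The function names and the sample offsets are hypothetical.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]
Offsets = Dict[TopicPartition, int]


def replication_lag(source_end: Offsets, mirror_committed: Offsets) -> Offsets:
    """Per-partition lag of the replication tool's consumer group on the old cluster.

    source_end: log-end offset of each partition on the old cluster.
    mirror_committed: offsets committed by the replication tool's consumer group.
    A partition with no committed offset is measured from offset 0.
    """
    return {tp: end - mirror_committed.get(tp, 0) for tp, end in source_end.items()}


def is_drained(source_end: Offsets, mirror_committed: Offsets) -> bool:
    """True once, with producers disconnected, everything has been copied."""
    return all(lag <= 0 for lag in replication_lag(source_end, mirror_committed).values())


# Usage with hypothetical numbers for a two-partition "logs" topic.
end_offsets = {("logs", 0): 1200, ("logs", 1): 950}
committed = {("logs", 0): 1200, ("logs", 1): 940}
print(replication_lag(end_offsets, committed))  # partition 1 still lags by 10
print(is_drained(end_offsets, committed))
```

The lag is measured on the source side on purpose. Offsets in the new cluster need not match the old ones, so comparing end offsets across the two clusters would be misleading.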
  • 46. Replication = data + offsets metadata
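Copying the data alone is not enough. When a replication tool re-produces messages into the target cluster, a message's offset there generally differs from its offset in the source, so committed consumer-group offsets cannot be copied verbatim.

One approach is to record (source offset, target offset) pairs as messages are copied. A committed offset is then translated to the nearest checkpoint at or before it. This is similar in spirit to the offset syncs that MirrorMaker 2 (KIP-382) records. The sketch assumes such per-partition checkpoints exist. `translate_offset` and the sample checkpoints are hypothetical.

```python
import bisect
from typing import List, Tuple


def translate_offset(syncs: List[Tuple[int, int]], source_committed: int) -> int:
    """Translate a consumer group's committed offset on the source cluster into
    a resume offset on the target cluster, for ONE partition.

    syncs: hypothetical (source_offset, target_offset) checkpoints recorded by
    the replication tool, sorted by source_offset.
    """
    source_offsets = [src for src, _ in syncs]
    # Index of the last checkpoint at or before the committed offset.
    i = bisect.bisect_right(source_offsets, source_committed) - 1
    if i < 0:
        return 0  # no usable checkpoint: replay the target partition from offset 0
    # Resume at the checkpoint: messages after it may be re-delivered,
    # but none are skipped (at-least-once).
    return syncs[i][1]


# Usage: source offsets 100 and 200 were written at target offsets 80 and 170.
checkpoints = [(0, 0), (100, 80), (200, 170)]
print(translate_offset(checkpoints, 150))  # resumes at target offset 80
print(translate_offset(checkpoints, 200))  # exact checkpoint: target offset 170
```

Rounding down to a checkpoint trades some duplicate delivery for never losing messages. That matches the "without data loss" goal of the earlier scenarios.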
  • 47. Replication tools
      Tool | Pros | Cons
      MirrorMaker | Open source; relatively easy to use | Not a real mirror
      uReplicator | Creates a real mirror; scalable; open source | Maintained mainly (or only) by Uber
      Kafka Connect Replicator | Supports smart replication; creates a real mirror; based on the Kafka Connect ecosystem | Paid solution
  • 48. Blue-green as a failover solution [Diagram: the old cluster serving producers and consumers, a replication tool copying its data into a passive cluster, and producers and consumers also shown against the passive cluster.]
  • 49. Blue-green as a failover solution [Diagram: same elements as slide 48.]
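In the failover setup, clients keep using the old cluster while the replication tool keeps the passive cluster up to date. Clients switch only when the active side is unhealthy. The sketch below shows that client-side decision, assuming one boolean health-check result per polling interval and hypothetical bootstrap addresses. It fails over only after several consecutive failures to avoid flapping, and leaves fail-back as a manual step. Consumers that switch would also need translated offsets, since the two clusters' offsets can differ.

```python
from dataclasses import dataclass


@dataclass
class FailoverSelector:
    """Decide which cluster clients should bootstrap from (hypothetical helper)."""
    active_bootstrap: str
    passive_bootstrap: str
    failure_threshold: int = 3      # consecutive failed checks before failing over
    consecutive_failures: int = 0
    failed_over: bool = False

    def record_health(self, healthy: bool) -> str:
        """Feed one health-check result; return the bootstrap servers to use."""
        if self.failed_over:
            return self.passive_bootstrap  # fail-back is left as a manual step
        if healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.failed_over = True
                return self.passive_bootstrap
        return self.active_bootstrap


# Usage: a transient blip does not trigger failover; three misses in a row do.
selector = FailoverSelector("old-kafka:9092", "passive-kafka:9092")
choices = [selector.record_health(h) for h in [False, True, False, False, False, True]]
print(choices)
```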
  • 50. Kafka Pro Tips + Be part of the community ○ Join the Confluent Slack team ○ Follow and suggest new KIPs (Kafka Improvement Proposals) ○ Contribute fixes to Kafka and its ecosystem + Use smart metrics (such as a health check) for better visibility + Try chaos engineering at home using our GitHub repo + Don’t fall behind ○ Use up-to-date Kafka consumers/producers ○ Update your cluster regularly + Follow the Confluent white papers + Kafka health check repo
  • 51. Q & A
  • 52. Thank you https://www.linkedin.com/in/yaniv-ranen-284b003/ https://www.linkedin.com/in/shlomihassan/