Kafka is the bedrock of Wix's distributed microservices system. Over the last 5 years we have learned a lot about how to successfully scale our event-driven architecture to roughly 1500 microservices.
We’ve achieved greater decoupling and independence for our various services and dev teams, which have very different use cases, while keeping a single uniform infrastructure in place.
In these slides you will learn about 8 key decisions and steps you can take to safely scale up your Kafka-based system. These include:
* How to increase dev velocity of event-driven code.
* How to optimize working with Kafka in a polyglot setting.
* How to support a growing amount of traffic and a growing number of developers.
2. At Wix
>180M registered users (website builders) from 190 countries
5% of all Internet websites run on Wix
4000+ people work at Wix
>500B HTTP Requests / Day
6PB of static content
@NSilnitsky
3. At Wix
                          2017    2020
Kafka messages per day    300M    1510M
Microservices             500     1500
(Service) developers      300     900
4. What do you do when the traffic, metadata, and number of developers and use cases grow?
6. #1 Common infra with common features.
23. #2 Retry topics will cause your cluster to grow faster. 😐
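Why retry topics make the cluster grow can be seen with a little arithmetic. The sketch below assumes a hypothetical scheme in which every (topic, consumer group) pair that enables retries gets its own retry topics, one per backoff interval, each with as many partitions as the source topic. The `Subscription` type, the naming pattern, and the example topics are illustrative assumptions, not Greyhound's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    topic: str
    group: str
    partitions: int
    retry_backoffs_sec: tuple  # empty tuple means retries are disabled


def retry_topic_names(sub: Subscription) -> list:
    # Hypothetical naming scheme: <topic>-<group>-retry-<attempt index>
    return [
        f"{sub.topic}-{sub.group}-retry-{i}"
        for i in range(len(sub.retry_backoffs_sec))
    ]


def extra_partitions(subs: list) -> int:
    # Assumption: each retry topic mirrors the partition count of its source topic.
    return sum(sub.partitions * len(sub.retry_backoffs_sec) for sub in subs)


subs = [
    Subscription("orders", "billing", partitions=12, retry_backoffs_sec=(1, 60, 600)),
    Subscription("orders", "emails", partitions=12, retry_backoffs_sec=(1, 60, 600)),
    Subscription("contacts", "search", partitions=6, retry_backoffs_sec=()),
]

print(retry_topic_names(subs[0]))
print(extra_partitions(subs))  # 12*3 + 12*3 + 0 = 72 extra partitions
```

Under these assumptions, two consumer groups with three retry intervals each turn one 12-partition topic into 72 additional partitions, which is the growth the slide warns about.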
32. #3 Self-service tooling and documentation.
43. Self-service docs
How do I investigate this lag?
How do I add retries on errors?
1. GitHub README for Greyhound code
2. Internal Stack Overflow Q&A
3. Slack bot that answers without you
49. #4 Async event-driven monitoring is less trivial.
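One signal that is specific to asynchronous consumers is lag: how far a consumer group's committed offset trails the end of each partition. The sketch below only shows the arithmetic. It assumes the offsets have already been fetched from the cluster and are passed in as plain dictionaries; the function names and the alert threshold are hypothetical.

```python
def partition_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Lag per partition = log end offset - committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no commit yet is counted from offset 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag


def lagging_partitions(lag: dict, threshold: int) -> list:
    # Partitions whose lag is strictly above the (hypothetical) alert threshold.
    return sorted(p for p, value in lag.items() if value > threshold)


end_offsets = {0: 1000, 1: 500, 2: 40}
committed_offsets = {0: 990, 1: 500}  # partition 2 has no commit yet

lag = partition_lag(end_offsets, committed_offsets)
print(lag)
print(sum(lag.values()))
print(lagging_partitions(lag, threshold=20))
```

Tracking lag per partition rather than only the total helps tell a slow consumer apart from a single stuck partition.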
52. #5 Proactive broker maintenance.
53. Don’t let brokers break...
● Add brokers when needed
● Split clusters when needed
● Delete unused topics
● Avoid hard failures
As a rule of thumb, we recommend each broker to have up to 4,000 partitions and each cluster to have up to 200,000 partitions.
https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions
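The rule of thumb above can be turned into a simple headroom check. This is a minimal sketch: the per-broker counts are assumed to be collected elsewhere, how partitions are counted (leaders only or all hosted replicas) is left to the caller, and the function names are hypothetical.

```python
import math

MAX_PARTITIONS_PER_BROKER = 4_000
MAX_PARTITIONS_PER_CLUSTER = 200_000


def check_cluster(partitions_per_broker: dict) -> list:
    """Return human-readable warnings for brokers or clusters over the limits."""
    warnings = []
    for broker, count in sorted(partitions_per_broker.items()):
        if count > MAX_PARTITIONS_PER_BROKER:
            warnings.append(
                f"broker {broker}: {count} partitions exceeds {MAX_PARTITIONS_PER_BROKER}"
            )
    total = sum(partitions_per_broker.values())
    if total > MAX_PARTITIONS_PER_CLUSTER:
        warnings.append(
            f"cluster: {total} partitions exceeds {MAX_PARTITIONS_PER_CLUSTER}"
        )
    return warnings


def min_brokers(total_partitions: int) -> int:
    # Smallest broker count that keeps every broker at or under the per-broker limit,
    # assuming partitions are spread evenly.
    return math.ceil(total_partitions / MAX_PARTITIONS_PER_BROKER)


print(check_cluster({1: 3500, 2: 4200}))
print(min_brokers(10_000))  # 10,000 / 4,000 rounds up to 3
```

Running a check like this on a schedule is one way to add brokers or split clusters before a limit is reached rather than after.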
54. We’re migrating to Confluent Cloud
● High availability
● No need to worry about scaling clusters
55. #6 Avoid using the Kafka SDK directly in Node.js.
59. #7 Consume and project.
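The slide gives only the name of the pattern. One common reading, assumed here, is that a service consumes another service's events and projects them into its own read-optimized view, instead of calling the owning service on every request. The sketch below is in-memory and single-partition; the event shapes and the `ContactsProjection` class are hypothetical.

```python
class ContactsProjection:
    """Local view built by consuming contact events in offset order."""

    def __init__(self):
        self._by_id = {}
        self._last_offset = -1

    def apply(self, offset: int, event: dict) -> None:
        # Skip anything already applied, so redelivered records are harmless.
        # Assumption: all events come from a single partition.
        if offset <= self._last_offset:
            return
        if event["type"] == "ContactUpdated":
            self._by_id[event["id"]] = {"name": event["name"], "email": event["email"]}
        elif event["type"] == "ContactDeleted":
            self._by_id.pop(event["id"], None)
        self._last_offset = offset

    def get(self, contact_id: str):
        # Reads are served locally, without a call to the owning service.
        return self._by_id.get(contact_id)


projection = ContactsProjection()
events = [
    (0, {"type": "ContactUpdated", "id": "c1", "name": "Ann", "email": "ann@example.com"}),
    (1, {"type": "ContactUpdated", "id": "c2", "name": "Bob", "email": "bob@example.com"}),
    (2, {"type": "ContactDeleted", "id": "c2"}),
    (0, {"type": "ContactUpdated", "id": "c1", "name": "Old", "email": "old@example.com"}),
]
for offset, event in events:
    projection.apply(offset, event)

print(projection.get("c1"))
print(projection.get("c2"))
```

The trade-off in this reading is that the view is eventually consistent with its source, in exchange for reads that do not add load to the owning service.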
66. Kafka messaging is event driven. It is only relevant to service-service communications, not for browser-server interactions, where a user is waiting, right?
67. Wrong.
68. #8 WebSockets are Kafka’s best friend.
71. Use case: long-running async business process
[Diagram: Browser, WebSockets Service, Kafka Broker, Contacts Importer, Contacts Jobs. Labels: "Subscribe for notifications", Consumer, Producer]
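The flow in this use case can be sketched without a real broker or socket. The code below is an in-memory stand-in and assumes one plausible wiring of the diagram: the browser subscribes for notifications on a job, the importer produces progress events, and the WebSockets service consumes them and pushes each one to the subscribed browser. All class names, function names, and event fields are hypothetical.

```python
from collections import defaultdict


class InMemoryTopic:
    """Stand-in for a Kafka topic: delivers each record synchronously to every handler."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def produce(self, record: dict):
        for handler in self._handlers:
            handler(record)


class WebSocketsService:
    """Consumes job events and forwards them to browsers subscribed to that job."""

    def __init__(self, topic: InMemoryTopic):
        self._sockets = defaultdict(list)  # job id -> send callbacks (stand-ins for sockets)
        topic.subscribe(self._on_record)

    def subscribe_for_notifications(self, job_id: str, send):
        self._sockets[job_id].append(send)

    def _on_record(self, record: dict):
        for send in self._sockets.get(record["job_id"], []):
            send(record)


def import_contacts(job_id: str, contacts: list, topic: InMemoryTopic):
    # Long-running work: report progress after each imported contact.
    for done, _contact in enumerate(contacts, start=1):
        topic.produce({"job_id": job_id, "done": done, "total": len(contacts)})


topic = InMemoryTopic()
ws_service = WebSocketsService(topic)

browser_messages = []
ws_service.subscribe_for_notifications("job-1", browser_messages.append)

import_contacts("job-1", ["ann", "bob", "cy"], topic)
import_contacts("job-2", ["dee"], topic)  # nobody subscribed to this job

print(browser_messages)
```

The browser never polls in this sketch: it subscribes once, and the user sees progress as soon as each event is produced.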