Zhaopin.com is a Chinese online recruitment services provider. As a bilingual job board, Zhaopin.com has one of the largest selections of job vacancies in China, with listings from prominent local and foreign companies. It has over 2.2 million clients, and its average daily page views exceed 68 million.
Zhaopin.com has used RabbitMQ for years as its enterprise event bus. As the company grew, the amount of data grew and the use cases diversified. As a result, the original RabbitMQ-based architecture became hard to scale, maintain, and operate. In 2018, Zhaopin.com chose Apache Pulsar to replace its RabbitMQ-based architecture.
Sijie Guo and Penghui Li describe the advantages of Pulsar over RabbitMQ and explain how Pulsar meets the requirements of high durability, high throughput, and low latency. Finally, they detail how Pulsar was put into production at Zhaopin.com and share use cases and their experience operating Pulsar at scale.
How Zhaopin built its Event Center using Apache Pulsar
1. How Zhaopin built its Event Center using Apache Pulsar
Penghui Li
Sijie Guo
2. Zhaopin.com
Zhaopin.com is the biggest online recruitment service provider in China.
Zhaopin.com provides job seekers with a comprehensive resume service, the latest employment and career development information, and in-depth online job search for positions throughout China.
Zhaopin.com provides professional HR services to over 2.2 million clients, and its average daily page views are over 68 million.
3. Who are we
Penghui Li
-Tech lead of the infrastructure team at Zhaopin.com
-5+ years of experience developing message queues and microservices
-Apache Pulsar Committer
4. Who are we
Sijie Guo
-Apache Pulsar Committer & PMC Member
-Apache BookKeeper Committer & PMC Member
-Interested in technologies around Event Streaming
-Previously worked at Twitter and Yahoo
5. Agenda
1. Why building an Event Center
2. Why Apache Pulsar
3. Apache Pulsar at Zhaopin
4. Streaming Platform
5. Zhaopin’s contributions to Apache Pulsar
7. Data Silos
To Enterprises: MSMQ
To End Users: RabbitMQ
Data Processing: Kafka
Pain Points:
• High maintenance cost
• Extremely hard to share data across teams
• Inconsistency between data silos
• Doesn’t scale
• No consistent SLA
9. Unification - MQService
[Diagram: Job Search, Resume Service, and Submission Service; Thrift, HTTP, and MQTT; MQService; multiple RabbitMQ instances]
Problems Solved:
• Simplified Operations
• Scale-out Service
• High availability
Problems Unsolved:
• Keep messages for longer period
• Data rewind
• Order Guarantee
11. Why Building an Event Center
[Figure: a queue holding messages 0-3 served to Consumer-1, Consumer-2, Consumer-3, and a new consumer, labeled "Better consumption parallelism"; Partition-0, Partition-1, and Partition-2, each holding messages 0-3, with consumers each receiving 0,1,2,3, labeled "Better order guarantee"]
12. Why Building an Event Center
RabbitMQ is better suited to work queue use cases: adding consumers increases consumption throughput, whereas Kafka needs more partitions to increase consumption.
We used RabbitMQ a lot for work queue use cases.
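The difference in consumption parallelism can be illustrated with a small simulation. This is a minimal sketch of the two dispatch models, not RabbitMQ or Kafka client code: the function names `work_queue_dispatch` and `partitioned_dispatch` and the consumer names are hypothetical, and real brokers use more elaborate assignment logic.

```python
from collections import defaultdict
from itertools import cycle


def work_queue_dispatch(messages, consumers):
    """Hand each message to the next consumer in turn (work-queue model)."""
    assignment = defaultdict(list)
    for message, consumer in zip(messages, cycle(consumers)):
        assignment[consumer].append(message)
    return dict(assignment)


def partitioned_dispatch(partitions, consumers):
    """Give each partition exactly one owner (partitioned-log model).

    Consumers beyond the number of partitions receive nothing.
    """
    assignment = defaultdict(list)
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return dict(assignment)


consumers = ["consumer-1", "consumer-2", "consumer-3", "new-consumer"]

# Work queue: all four consumers share the messages.
print(work_queue_dispatch(range(8), consumers))

# Three partitions: only three consumers get work, the fourth stays idle.
print(partitioned_dispatch(["partition-0", "partition-1", "partition-2"], consumers))
```

In the work-queue model every added consumer takes a share of the messages, at the cost of ordering across consumers. In the partitioned model each partition keeps a single reader, which preserves per-partition order but caps parallelism at the partition count.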
13. Why Building an Event Center
Kafka integrates well with the data processing ecosystem (Flink, Spark) and provides high throughput.
We used Kafka a lot for data processing.
14. Why Building an Event Center
But
• The cost of operating two different messaging systems is high
• Data sits in two different silos
We need a unified platform to handle both scenarios
22. Unification - Apache Pulsar
[Diagram: Online Services (Queue) and Data Processing (Streaming), both served by Apache Pulsar]
Problems Solved:
• No Data Silos
• Queue + Streaming
• Disaster Recovery
• Infinite Message Storage (via Tiered Storage)
• Data rewinding
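Data rewinding depends on messages being retained after they are consumed, with each subscription tracking its own read position. The following is a minimal in-memory sketch of that idea, assuming a simple append-only list as the log. The `RetainedTopic` class and its methods are hypothetical and are not the Pulsar client API.

```python
class RetainedTopic:
    """Append-only log with one read cursor per subscription."""

    def __init__(self):
        self._log = []       # retained messages, never removed on read
        self._cursors = {}   # subscription name -> next offset to read

    def publish(self, payload):
        """Append a message and return its offset."""
        self._log.append(payload)
        return len(self._log) - 1

    def receive(self, subscription):
        """Return the next unread message, or None if caught up."""
        position = self._cursors.get(subscription, 0)
        if position >= len(self._log):
            return None
        self._cursors[subscription] = position + 1
        return self._log[position]

    def seek(self, subscription, offset):
        """Rewind (or fast-forward) a subscription to the given offset."""
        if not 0 <= offset <= len(self._log):
            raise ValueError("offset out of range")
        self._cursors[subscription] = offset


topic = RetainedTopic()
for payload in ["resume-created", "resume-updated", "job-applied"]:
    topic.publish(payload)

first_pass = [topic.receive("analytics") for _ in range(3)]
topic.seek("analytics", 0)          # rewind to the beginning
replayed = topic.receive("analytics")
print(first_pass, replayed)
```

Because reading does not delete anything, a second subscription can consume the same data independently, which is what lets one copy of the data serve both queue-style and streaming-style consumers.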
23. Milestones
• 2018/07: POC
• 2018/09: Pulsar in production
• 2018/10: Pulsar-based Event Center, 1 billion msgs/day
• 2018/11: Won the best innovative platform award at Zhaopin
• 2018/12: 3 billion msgs/day
• 2019/02: 6 billion msgs/day
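To put the daily volumes in perspective, they can be converted to average per-second rates. This is simple arithmetic on the figures above; it assumes traffic spread evenly over 24 hours, so actual peak rates (not given in the slides) would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_msgs_per_second(msgs_per_day):
    """Average rate, assuming traffic is spread evenly over the day."""
    return msgs_per_day / SECONDS_PER_DAY


for msgs_per_day in (1_000_000_000, 3_000_000_000, 6_000_000_000):
    rate = average_msgs_per_second(msgs_per_day)
    print(f"{msgs_per_day:,} msgs/day is about {rate:,.0f} msgs/s on average")
```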
26. Pulsar at Zhaopin
1. One copy of data, single source-of-truth.
2. No need to worry about data consistency between RabbitMQ and Kafka
3. Multi-tenancy makes topic management easier
4. Strong data durability allows us to stop worrying about message loss
32. Zhaopin’s Contributions to Pulsar
• Client interceptors (we use this feature to track messages between producers and consumers)
• Dead Letter Topic
• Time partitioned message tracker
• Service URL provider (we use this feature to dynamically switch traffic)
• Hive Pulsar integration
• Multi-version Schema and more…
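As an illustration of the dead letter topic idea from the list above: a message that keeps failing is set aside after a bounded number of redeliveries instead of blocking the subscription. The sketch below is a pure-Python model of that behavior, not the Pulsar implementation. The function name `consume_with_dead_letter`, the `max_redeliveries` parameter, and the choice of one initial delivery plus `max_redeliveries` retries are assumptions made for illustration.

```python
def consume_with_dead_letter(messages, handler, max_redeliveries=3):
    """Process messages, routing repeated failures to a dead letter list.

    Each message gets one initial delivery plus up to `max_redeliveries`
    retries (an assumption of this sketch).
    """
    processed, dead_letters = [], []
    for message in messages:
        for _attempt in range(1 + max_redeliveries):
            try:
                handler(message)
            except Exception:
                continue  # simulate a negative acknowledgement and redelivery
            processed.append(message)
            break
        else:
            # Every delivery attempt failed: park the message for inspection.
            dead_letters.append(message)
    return processed, dead_letters


def handle(message):
    """Hypothetical handler that rejects malformed payloads."""
    if message.startswith("malformed"):
        raise ValueError(f"cannot parse {message!r}")


processed, dead_letters = consume_with_dead_letter(
    ["resume-1", "malformed-2", "resume-3"], handle
)
print(processed, dead_letters)
```

Keeping failed messages in a separate topic lets healthy traffic continue while the failures remain available for debugging or later replay.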