This is an overview of interesting features of Apache Pulsar. Keep in mind that at the time I gave this presentation I had not used Pulsar yet. These are just my first impressions from the list of features.
3. 3 •
Kafka is an amazing tool, with incredible throughput and resilience, but it has some
drawbacks or lacks a few features:
Capacity of a partition is limited by the smallest node
Ops: adding or removing a broker requires cluster rebalancing
No long term storage
Only pub/sub client pattern (no work queue)
No namespace or tenancy management
No multi-cluster replication
Motivation
13. 13 •
It uses BookKeeper for storage, but another schema registry can be plugged in
Can be uploaded when a typed Producer is created or via REST API
Versioned
Defined at topic level
Format types:
String (used for UTF-8-encoded strings)
JSON
Protobuf
Avro
Only works with Java
Schema Registry
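To make "versioned" and "defined at topic level" concrete, here is a minimal in-memory sketch. It is a toy model, not Pulsar's implementation: the class names `TopicSchemas` and `SchemaRegistry`, the version numbering starting at 0, and the rule that re-uploading an unchanged schema reuses the current version are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TopicSchemas:
    """Version history of the schemas uploaded for one topic (toy model)."""
    versions: list = field(default_factory=list)

    def upload(self, schema_type, definition):
        """Store a schema and return its version number.

        Assumption: uploading the current schema again returns the
        existing version instead of creating a new one.
        """
        entry = (schema_type, definition)
        if self.versions and self.versions[-1] == entry:
            return len(self.versions) - 1
        self.versions.append(entry)
        return len(self.versions) - 1


class SchemaRegistry:
    """Hypothetical registry: one independent version history per topic."""

    def __init__(self):
        self._topics = {}

    def upload(self, topic, schema_type, definition):
        history = self._topics.setdefault(topic, TopicSchemas())
        return history.upload(schema_type, definition)

    def latest(self, topic):
        """Return (version, (schema_type, definition)) for the newest schema."""
        history = self._topics[topic].versions
        return len(history) - 1, history[-1]


registry = SchemaRegistry()
v0 = registry.upload("orders", "JSON", '{"id": "int"}')
v0_again = registry.upload("orders", "JSON", '{"id": "int"}')
v1 = registry.upload("orders", "JSON", '{"id": "int", "total": "float"}')
other = registry.upload("payments", "AVRO", '{"amount": "double"}')
```

Each topic keeps its own history, so a new schema on `orders` does not affect the version numbers on `payments`.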
16. 16 •
Message Retention
Applies to messages that have been acknowledged and would otherwise be deleted
It's a time limit applied on a topic.
TTL
Applies to messages that were not consumed
It's a time limit on consumption within a subscription.
Retention
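The difference between the two limits is easier to see on a small example. The sketch below is a simplified model under stated assumptions: a single subscription, message age measured from a hypothetical `published_at` timestamp, and TTL modeled as "an expired message is treated as acknowledged". The names `Message`, `apply_ttl` and `apply_retention` are made up for illustration and are not Pulsar APIs.

```python
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: int
    published_at: float  # seconds on a hypothetical clock
    acked: bool = False


def apply_ttl(messages, now, ttl_seconds):
    """TTL: unacknowledged messages older than the limit are marked acknowledged."""
    for m in messages:
        if not m.acked and now - m.published_at > ttl_seconds:
            m.acked = True


def apply_retention(messages, now, retention_seconds):
    """Retention: acknowledged messages are kept only while inside the window.

    Unacknowledged messages are never dropped by retention.
    """
    return [
        m for m in messages
        if not m.acked or now - m.published_at <= retention_seconds
    ]


backlog = [
    Message(1, published_at=0.0, acked=True),   # acked, age 100
    Message(2, published_at=50.0, acked=True),  # acked, age 50
    Message(3, published_at=60.0),              # unacked, age 40
    Message(4, published_at=95.0),              # unacked, age 5
]

apply_ttl(backlog, now=100.0, ttl_seconds=30.0)
kept = apply_retention(backlog, now=100.0, retention_seconds=60.0)
kept_ids = [m.msg_id for m in kept]  # [2, 3, 4]
```

Message 3 is expired by TTL but still stored because it is inside the retention window; message 1 is dropped because it is acknowledged and older than the retention limit.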
25. 25 •
Geo Replication (Sync)
Requires a global ZooKeeper installation
Region Aware Placement Policy
Higher latency
26. 26 •
Geo Replication (Async)
Rack Aware Placement Policy
First persisted to the local cluster and then replicated asynchronously to the remote clusters
Enabled on a per-tenant basis
Types:
master-slave replication
active-active bidirectional replication
full-mesh replication between multiple data centers
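The "persist locally first, replicate later" idea can be sketched with two in-memory clusters. This is a toy model for illustration only: the `Cluster` class, its `pending` queue and the explicit `replicate()` call (which stands in for a background replicator) are assumptions, not Pulsar internals.

```python
from collections import deque


class Cluster:
    """Toy cluster: a local log plus a queue of messages awaiting replication."""

    def __init__(self, name):
        self.name = name
        self.log = []           # messages persisted in this cluster
        self.peers = []         # remote clusters to replicate to
        self.pending = deque()  # (peer, message) pairs not yet replicated

    def publish(self, message):
        """Persist locally and return at once; replication is deferred."""
        self.log.append(message)
        for peer in self.peers:
            self.pending.append((peer, message))

    def replicate(self):
        """Drain the queue. Replicated messages are not forwarded again."""
        while self.pending:
            peer, message = self.pending.popleft()
            peer.log.append(message)


# Active-active bidirectional setup between two hypothetical regions.
us, eu = Cluster("us"), Cluster("eu")
us.peers.append(eu)
eu.peers.append(us)

us.publish("a")
eu.publish("b")
before = (list(us.log), list(eu.log))  # each side only has its own message

us.replicate()
eu.replicate()
after = (list(us.log), list(eu.log))   # both sides have both messages
```

Note that the two clusters end up with the same messages in a different order, since each one persists its local message before receiving the remote one.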
27. 27 •
Per producer/topic sequence numbers to detect duplicates
Each topic owner broker maintains an in-memory hash map of the latest sequence number per topic/producer.
The broker periodically snapshots the latest sequence number to a cursor, which allows the map to be reconstructed by another broker after a fail-over.
Deduplication
https://jack-vanlightly.com/blog/2018/10/25/testing-producer-deduplication-in-apache-kafka-and-apache-pulsar
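The mechanism on this slide can be sketched for a single topic as follows. This is a simplified model, not broker code: the `DedupTopic` class, the `snapshot_every` parameter and the `fail_over()` helper are hypothetical, and replaying the log entries written after the last snapshot is my assumption about how the new owner catches up.

```python
class DedupTopic:
    """Toy model of one topic's deduplication state on its owner broker."""

    def __init__(self, snapshot_every=2):
        self.log = []            # persisted entries: (producer, seq, payload)
        self.last_seq = {}       # in-memory map: producer -> latest sequence number
        self.snapshot = ({}, 0)  # (copy of the map, log position), kept "in the cursor"
        self.snapshot_every = snapshot_every

    def publish(self, producer, seq, payload):
        """Return True if persisted, False if rejected as a duplicate."""
        if seq <= self.last_seq.get(producer, -1):
            return False
        self.log.append((producer, seq, payload))
        self.last_seq[producer] = seq
        # Periodic snapshot, here every `snapshot_every` persisted entries.
        if len(self.log) % self.snapshot_every == 0:
            self.snapshot = (dict(self.last_seq), len(self.log))
        return True

    def fail_over(self):
        """Build the state a new owner broker would reconstruct."""
        new_owner = DedupTopic(self.snapshot_every)
        new_owner.log = list(self.log)  # the log itself is durable
        saved_map, position = self.snapshot
        new_owner.last_seq = dict(saved_map)
        new_owner.snapshot = (dict(saved_map), position)
        # Assumption: entries written after the snapshot are replayed.
        for producer, seq, _ in new_owner.log[position:]:
            current = new_owner.last_seq.get(producer, -1)
            new_owner.last_seq[producer] = max(seq, current)
        return new_owner


topic = DedupTopic(snapshot_every=2)
first = topic.publish("producer-a", 0, "x")
second = topic.publish("producer-a", 1, "y")  # snapshot taken after this entry
third = topic.publish("producer-a", 2, "z")   # not covered by the snapshot
retry = topic.publish("producer-a", 2, "z")   # producer retry, rejected

new_owner = topic.fail_over()
retry_after_failover = new_owner.publish("producer-a", 2, "z")
```

Without the replay step, the new owner would only know sequence number 1 from the snapshot and would accept the retried message 2 a second time.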
28. 28 •
Lightweight compute framework for Pulsar
Can run inside or outside the cluster
State storage is handled by BookKeeper
"Serverless" idea
Pulsar Functions
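To give an idea of how lightweight a function can be, here is a minimal sketch in Python. As far as I understand, Pulsar's Python runtime accepts a plain function named `process` that receives one message and returns the value to publish, but treat that as an assumption to check against the documentation. Deployment, input and output topics are not shown.

```python
def process(input):
    """Append an exclamation mark to each incoming message."""
    return "{}!".format(input)


# Local dry run with made-up messages, no Pulsar cluster involved.
outputs = [process(message) for message in ["hello", "pulsar"]]
```

The function holds no reference to topics or clients, which is what gives it the "serverless" feel: the framework decides where it runs and wires it to the topics.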
Editor's Notes
Quick introduction of each other
Short agenda (first Kafka basics, then the design choices that made it a great tool for our scale)