Distributed pub/sub platform
github.com/yahoo/pulsar
Matteo Merli — mmerli@yahoo-inc.com
11/17/2016

Pulsar - Distributed pub/sub platform


Tech talk on Pulsar


  1. Distributed pub/sub platform
     github.com/yahoo/pulsar
     Matteo Merli (mmerli@yahoo-inc.com), 11/17/2016
  2. Agenda
     1. Pulsar Overview
     2. Common use cases
     3. Messaging API
     4. Architecture
     5. Future
     6. Q&A
  3. What is Pulsar?
     ▪ Hosted pub-sub messaging
     ▪ Simple messaging model
     ▪ Highly scalable
       › Topics, message throughput
     ▪ Ordering, durability & delivery guarantees
     ▪ Supports multi-tenancy
     ▪ Geo-replication
     ▪ Easy to operate (admin APIs, add capacity, replace machines)
     [Diagram labels: Pulsar Cluster, Broker, Bookie, ZK, Global ZK, Producer, Consumer, Replication]
  4. Pulsar production usage stats
     ▪ 1.5+ years
     ▪ 1.4 million topics
     ▪ Publishes 100 billion messages/day (delivery 7x)
     ▪ Average latency < 5 ms, 99th percentile 15 ms
     ▪ Zero data loss
     ▪ 80+ applications
     ▪ Critical component of major Yahoo systems:
       › Mail, Finance, Sports, Gemini Ads
     ▪ Self-service provisioning
     ▪ Full-mesh cross-datacenter replication across 8 data centers
  5. Why build a new system?
     ▪ No existing solution satisfied the requirements
       › Multi-tenant, 1M topics, low latency, durability, geo-replication
     ▪ Kafka doesn’t scale well with many topics:
       › Storage model based on an individual directory per topic partition
       › Enabling durability kills the performance
     ▪ Operations are not very convenient
       › e.g. replacing a server requires manual commands to copy the data and involves clients
       › Client access to ZK clusters is not desirable
     ▪ Ability to manage large backlogs
     ▪ No scalable support to keep consumer position
  6. Common use cases
  7. Message queue
     ▪ Decouple online / background
     ▪ Provide high-availability
     ▪ Reliable data transport
     [Diagram labels: Online events, Pulsar topic 1, Worker 1, Worker 2, Worker 3, Pulsar topic 2, Low latency publish, Long running task, Notification]
  8. Notifications
     ▪ Listeners are frequently different tenants
     ▪ Quotas need to ensure the producer is not affected
     [Diagram labels: Event, Pulsar topic, Component 1, Component 2, Component 3, Listeners]
  9. Feedback system
     ▪ Coordinate a large number of machines
     ▪ Propagate state
     [Diagram labels: External inputs, Pulsar topic 1, Serving system (x3), Pulsar topic 2, Controller, Updates, Feedback]
  10. Geo replication
      ▪ Asynchronous replication
      ▪ Integrated in the broker message flow
      ▪ Simple configuration to add/remove regions
  11. Platforms
      ▪ Pulsar used to build other platforms
      ▪ Provide high-level abstraction with strict guarantees
      ▪ Example: Sherpa distributed key-value store
        › Massive database powering most of Yahoo’s online data serving applications
        › Built upon the concept of a common message bus
        › Pulsar provides:
          • Durable log
          • Replication within and across geo-locations
  12. Messaging API
  13. Messaging Model
      Consumer-A1 receives all messages published on T; B1, B2, B3 receive one third each
      [Diagram labels: Producer-X, Producer-Y, Topic-T, Subscription-A, Subscription-B, Exclusive, Shared, Consumer-A1, Consumer-B1, Consumer-B2, Consumer-B3]
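The delivery rule on this slide can be illustrated with a small simulation. This is a toy model, not the broker's dispatcher: the class and method names are hypothetical, it assumes Subscription-A is the exclusive one and Subscription-B the shared one, and it assumes a shared subscription hands messages to its consumers in round-robin order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of one topic with several subscriptions (hypothetical, not Pulsar code). */
final class SubscriptionSketch {

    /** A subscription with its attached consumers and a round-robin cursor. */
    static final class Subscription {
        private final List<String> consumers;
        private int next = 0;

        Subscription(List<String> consumers) {
            this.consumers = consumers;
        }

        /** Picks the consumer that receives the next message of this subscription. */
        String pickConsumer() {
            String consumer = consumers.get(next);
            next = (next + 1) % consumers.size();
            return consumer;
        }
    }

    /** Delivers every message once per subscription and counts deliveries per consumer. */
    static Map<String, Integer> dispatch(int messages, List<Subscription> subscriptions) {
        Map<String, Integer> received = new LinkedHashMap<>();
        for (int m = 0; m < messages; m++) {
            for (Subscription s : subscriptions) {
                received.merge(s.pickConsumer(), 1, Integer::sum);
            }
        }
        return received;
    }

    /** Nine messages on Topic-T with one single-consumer and one three-consumer subscription. */
    static Map<String, Integer> demo() {
        Subscription a = new Subscription(Arrays.asList("Consumer-A1"));
        Subscription b = new Subscription(
                Arrays.asList("Consumer-B1", "Consumer-B2", "Consumer-B3"));
        return dispatch(9, Arrays.asList(a, b));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Each subscription sees the full message stream; consumers inside a shared subscription split it between them.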
  14. Producer example
        PulsarClient client = PulsarClient.create(
                "http://broker.usw.example.com:8080");
        Producer producer = client.createProducer(
                "persistent://my-property/us-west/my-namespace/my-topic");

        // handles retries in case of failure
        producer.send("my-message".getBytes());

        // Async version:
        producer.sendAsync("my-message".getBytes()).thenRun(() -> {
            // Message was persisted
        });
  15. Consumer example
        PulsarClient client = PulsarClient.create(
                "http://broker.usw.example.com:8080");
        Consumer consumer = client.subscribe(
                "persistent://my-property/us-west/my-namespace/my-topic",
                "my-subscription-name");

        while (true) {
            // Wait for a message
            Message msg = consumer.receive();

            // getData() returns the payload as a byte[], so decode it before printing
            System.out.println("Received message: " + new String(msg.getData()));

            // Acknowledge the message so that it can be deleted by the broker
            consumer.acknowledge(msg);
        }
  16. Additional client library features
      ▪ Partitioned topics
      ▪ Transparent batching of messages
      ▪ Compression
      ▪ End-to-end checksum
      ▪ TLS encryption
      ▪ Individual and cumulative acknowledgment
      ▪ Client side stats
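The difference between individual and cumulative acknowledgment can be sketched with a set of pending message ids. This is an illustration only: `AckSketch` and its methods are hypothetical names, and message ids are simplified to increasing `long` values.

```java
import java.util.TreeSet;

/** Tracks delivered-but-unacknowledged message ids (hypothetical, not Pulsar code). */
final class AckSketch {
    private final TreeSet<Long> unacked = new TreeSet<>();

    /** Records that a message was delivered and is now waiting for an ack. */
    void delivered(long id) {
        unacked.add(id);
    }

    /** Individual ack: only the given message is marked as processed. */
    void ackIndividual(long id) {
        unacked.remove(id);
    }

    /** Cumulative ack: the given message and every earlier one are marked as processed. */
    void ackCumulative(long id) {
        unacked.headSet(id, true).clear();
    }

    /** Number of messages still waiting for an ack. */
    int pending() {
        return unacked.size();
    }

    public static void main(String[] args) {
        AckSketch acks = new AckSketch();
        for (long id = 1; id <= 5; id++) {
            acks.delivered(id);
        }
        acks.ackIndividual(3);
        acks.ackCumulative(4);
        System.out.println("pending: " + acks.pending());
    }
}
```

Cumulative acknowledgment needs one call to cover a whole prefix of the stream, while individual acknowledgment lets a consumer confirm messages out of order.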
  17. Architecture
  18. Architecture / 1
      Broker
        ‣ Clients interact only with brokers
        ‣ No durable state
      Bookie
        ‣ Apache BookKeeper storage nodes
        ‣ Distributed write-ahead log
        ‣ Each machine stores data from many topics
      [Diagram labels: Pulsar Cluster, ZK, Producer, Consumer, Broker 1 to Broker 3, Bookie 1 to Bookie 5]
  19. Architecture / 2
      Separate layers between brokers and bookies
        ‣ Brokers and bookies can be added independently
        ‣ Traffic can be shifted very quickly across brokers
        ‣ New bookies will ramp up on traffic quickly
      [Diagram labels: Pulsar Cluster, ZK, Producer, Consumer, Broker 1 to Broker 3, Bookie 1 to Bookie 5]
  20. Architecture / 3
      Client library
        ‣ Lookup correct broker through service discovery
        ‣ Direct connection to broker
        ‣ When connection is established, authentication and authorization are enforced
        ‣ Reconnect with back off strategy
      [Diagram labels: Pulsar Cluster, Broker, Bookie, ZK, Global ZK, Service discovery, Producer App, Consumer App, Pulsar lib, Replication, Managed Ledger, BK Client, Global replicators, Cache, Dispatcher, Load Balancer]
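A reconnect-with-back-off policy is commonly implemented as an exponential delay with an upper bound. The sketch below shows that pattern under stated assumptions: the class name, the doubling factor, and the delay values are illustrative and are not taken from the Pulsar client.

```java
/** Exponential back-off with a cap (hypothetical helper, not the Pulsar client's class). */
final class BackoffSketch {
    private final long initialMillis;
    private final long maxMillis;
    private long nextMillis;

    BackoffSketch(long initialMillis, long maxMillis) {
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
        this.nextMillis = initialMillis;
    }

    /** Returns the delay before the next reconnect attempt, then doubles it up to the cap. */
    long next() {
        long current = nextMillis;
        nextMillis = Math.min(maxMillis, nextMillis * 2);
        return current;
    }

    /** Call after a successful connection so the next failure starts from the initial delay. */
    void reset() {
        nextMillis = initialMillis;
    }

    public static void main(String[] args) {
        BackoffSketch backoff = new BackoffSketch(100, 1000);
        for (int attempt = 1; attempt <= 6; attempt++) {
            System.out.println("attempt " + attempt + ": wait " + backoff.next() + " ms");
        }
    }
}
```

Production clients often add random jitter to each delay so that many clients do not reconnect at the same instant; it is left out here to keep the sketch deterministic.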
  21. Architecture / 4
      Dispatcher
        ‣ End-to-end async message processing
        ‣ Messages are relayed across producers, bookies and consumers with no copies
        ‣ Pooled ref-counted buffers
      Managed Ledger
        ‣ Abstraction for single topic storage
        ‣ Cache recent messages
      [Diagram labels: Pulsar Cluster, Broker, Bookie, ZK, Global ZK, Service discovery, Producer App, Consumer App, Pulsar lib, Replication, Managed Ledger, BK Client, Global replicators, Cache, Dispatcher, Load Balancer]
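The idea behind pooled ref-counted buffers is that one buffer can be shared by several holders (for example the storage write and the consumer dispatch) and goes back to a pool only when the last holder releases it. The sketch below is a minimal single-threaded illustration with hypothetical names; it is not the buffer implementation the broker uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Minimal buffer pool with reference counting (hypothetical, single-threaded sketch). */
final class BufferPoolSketch {

    /** A fixed-size buffer that returns to its pool when the reference count reaches zero. */
    static final class PooledBuffer {
        final byte[] data;
        private final BufferPoolSketch pool;
        private int refCount;

        private PooledBuffer(BufferPoolSketch pool, int size) {
            this.pool = pool;
            this.data = new byte[size];
        }

        /** Adds one more holder of this buffer. */
        void retain() {
            if (refCount <= 0) {
                throw new IllegalStateException("buffer already released");
            }
            refCount++;
        }

        /** Drops one holder; the last release hands the buffer back to the pool. */
        void release() {
            if (refCount <= 0) {
                throw new IllegalStateException("buffer already released");
            }
            refCount--;
            if (refCount == 0) {
                pool.free.push(this);
            }
        }
    }

    private final Deque<PooledBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    BufferPoolSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Reuses a free buffer when one is available, otherwise allocates a new one. */
    PooledBuffer acquire() {
        PooledBuffer buffer = free.isEmpty() ? new PooledBuffer(this, bufferSize) : free.pop();
        buffer.refCount = 1;
        return buffer;
    }

    /** Number of buffers currently idle in the pool. */
    int freeCount() {
        return free.size();
    }

    public static void main(String[] args) {
        BufferPoolSketch pool = new BufferPoolSketch(1024);
        PooledBuffer buffer = pool.acquire();
        buffer.retain();   // second holder, e.g. dispatch to a consumer
        buffer.release();  // first holder done
        buffer.release();  // second holder done, buffer returns to the pool
        System.out.println("free buffers: " + pool.freeCount());
    }
}
```

Sharing one buffer between holders is what avoids copying the payload, and pooling avoids allocating a fresh buffer per message.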
  22. BookKeeper
      ▪ Replicated log service
      ▪ Offers consistency and durability
      ▪ Restores replication factor after node failures
      ▪ Why is it a good choice for Pulsar?
        › Very efficient storage for sequential data
        › Very good distribution of IO across all bookies
          • For each topic we are creating multiple ledgers over time
        › Isolation of writes and reads
        › Flexible model for quorum writes with different tradeoffs
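The quorum-write model can be illustrated with the three settings BookKeeper exposes for a ledger: ensemble size, write quorum, and ack quorum. Those terms come from BookKeeper rather than from this slide, and the class below is a hypothetical sketch of round-robin striping, not BookKeeper's own code.

```java
import java.util.ArrayList;
import java.util.List;

/** Round-robin striping of ledger entries across an ensemble (illustrative sketch). */
final class QuorumSketch {

    /**
     * Returns the positions, within the ensemble, of the bookies that store an entry.
     * Entry e goes to positions e, e+1, ... (modulo the ensemble size), writeQuorum in total.
     */
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorum) {
        if (writeQuorum > ensembleSize) {
            throw new IllegalArgumentException("write quorum cannot exceed ensemble size");
        }
        List<Integer> bookies = new ArrayList<>();
        for (int i = 0; i < writeQuorum; i++) {
            bookies.add((int) ((entryId + i) % ensembleSize));
        }
        return bookies;
    }

    /** A write is confirmed once ackQuorum bookies of the write set have acknowledged it. */
    static boolean isConfirmed(int acksReceived, int ackQuorum) {
        return acksReceived >= ackQuorum;
    }

    public static void main(String[] args) {
        // Assumed example settings: ensemble of 5 bookies, 3 copies per entry, 2 acks required.
        for (long entryId = 0; entryId < 5; entryId++) {
            System.out.println("entry " + entryId + " -> bookies " + writeSet(entryId, 5, 3));
        }
        System.out.println("confirmed after 2 acks: " + isConfirmed(2, 2));
    }
}
```

An ensemble larger than the write quorum spreads IO over more bookies, while an ack quorum smaller than the write quorum lets a write complete without waiting for the slowest replica. These are the kind of tradeoffs the slide refers to.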
  23. BookKeeper - Storage
      ▪ A single bookie can serve and store thousands of ledgers
      ▪ Write and read paths are separated:
        › Prevents read activity from impacting write latency
        › Writes are added to an in-memory write cache and committed to the journal
        › Write cache is flushed in the background to a separate device
      ▪ Entries are sorted to allow for mostly sequential reads
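The write path described here can be modeled in a few lines: every entry is appended to a journal in arrival order and also placed in a write cache, and the background flush emits the cached entries sorted by ledger and entry id. This is a simplified in-memory sketch with hypothetical names and non-negative ids; a real bookie persists both structures to disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** In-memory model of a journal plus a sorted write cache (hypothetical sketch). */
final class BookieWriteSketch {
    private final List<String> journal = new ArrayList<>();            // arrival order
    private final TreeMap<String, byte[]> writeCache = new TreeMap<>(); // sorted by key

    /** Builds a key that sorts by ledger id first, then by entry id (ids assumed >= 0). */
    static String key(long ledgerId, long entryId) {
        return String.format("%010d/%010d", ledgerId, entryId);
    }

    /** Write path: append to the journal and add to the write cache. */
    void addEntry(long ledgerId, long entryId, byte[] payload) {
        String k = key(ledgerId, entryId);
        journal.add(k);
        writeCache.put(k, payload);
    }

    /** Background flush: returns the cached keys grouped per ledger and empties the cache. */
    List<String> flush() {
        List<String> sorted = new ArrayList<>(writeCache.keySet());
        writeCache.clear();
        return sorted;
    }

    /** Keys in the order they were written to the journal. */
    List<String> journalOrder() {
        return new ArrayList<>(journal);
    }

    public static void main(String[] args) {
        BookieWriteSketch bookie = new BookieWriteSketch();
        bookie.addEntry(2, 0, new byte[0]);
        bookie.addEntry(1, 0, new byte[0]);
        bookie.addEntry(2, 1, new byte[0]);
        bookie.addEntry(1, 1, new byte[0]);
        System.out.println("journal: " + bookie.journalOrder());
        System.out.println("flushed: " + bookie.flush());
    }
}
```

Entries from different ledgers interleave in the journal, but the flushed order keeps each ledger's entries together, which is what makes later reads of a single ledger mostly sequential.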
  24. Single topic: throughput and latency
      [Chart: throughput and 99th percentile publish latency, 1 topic, 1 producer. Vertical axis: latency (ms), 0 to 6. Horizontal axis: throughput (msg/s), 1,000 to 10,000,000, with 1,800,000 marked. Series: 10 bytes, 100 bytes, 1 KB]
  25. Future
  26. Future
      ▪ WebSocket API
        › More language bindings based on top of it
      ▪ C++ API
        › Existing C++ client library is being prepared for OSS release
      ▪ End-to-end data encryption
        › Use symmetric/asymmetric encryption from producer to consumer
        › Data encrypted in flight and at rest
        › Don’t need to trust the service for security
      ▪ Globally consistent topics
        › Store the data in multiple regions
        › Can migrate across regions with consistency
  27. Final Remarks
      • Check out the code and docs at github.com/yahoo/pulsar
      • Give feedback or ask for more details on mailing lists:
        • Pulsar-Users
        • Pulsar-Dev
