Distributed pub/sub platform
github.com/yahoo/pulsar
Matteo Merli — mmerli@yahoo-inc.com
11/17/2016
Agenda
1. Pulsar Overview
2. Common use cases
3. Messaging API
4. Architecture
5. Future
6. Q&A
What is Pulsar?
▪ Hosted pub-sub messaging
▪ Simple messaging model
▪ Highly scalable
› In number of topics and in message throughput
▪ Ordering, durability & delivery guarantees
▪ Supports multi-tenancy
▪ Geo-replication
▪ Easy to operate (admin APIs, adding capacity, replacing machines)
Diagram: a Pulsar cluster (Broker, Bookie, ZK) alongside a Global ZK, with a producer and a consumer attached and replication between clusters.
Pulsar production usage stats
▪ 1.5+ years in production
▪ 1.4 million topics
▪ Publishes 100 billion messages/day (about 7x as many deliveries)
▪ Average latency < 5 ms, 99th percentile 15 ms
▪ Zero data loss
▪ 80+ applications
▪ Critical component of major Yahoo systems:
› Mail, Finance, Sports, Gemini Ads
▪ Self-service provisioning
▪ Full-mesh cross-datacenter replication across 8 data centers
Why build a new system?
▪ No existing solution satisfied our requirements
› Multi-tenancy, 1M topics, low latency, durability, geo-replication
▪ Kafka doesn’t scale well with many topics:
› Its storage model uses an individual directory per topic partition
› Enabling durability severely hurts performance
▪ Operations are inconvenient
› e.g., replacing a server requires manual commands to copy the data and involves the clients
› Clients need access to the ZK clusters, which is not desirable
▪ Ability to manage large backlogs
▪ No scalable support for storing consumer positions
Common use cases
Message queue
▪ Decouple online / background
▪ Provide high-availability
▪ Reliable data transport
Diagram: online events are published with low latency to Pulsar topic 1; Worker 1, Worker 2 and Worker 3 consume them to run long-running tasks and publish notifications to Pulsar topic 2.
Notifications
▪ Listeners are frequently different tenants
▪ Quotas are needed to ensure the producer is not affected
Diagram: an event is published to a Pulsar topic and delivered to the listeners Component 1, Component 2 and Component 3.
Feedback system
Diagram: a controller and several serving systems exchange updates and feedback through Pulsar topic 1 and Pulsar topic 2, with external inputs also entering the flow.
▪ Coordinate a large number of machines
▪ Propagate state
Geo replication
▪ Asynchronous replication
▪ Integrated in the broker message flow
▪ Simple configuration to add/remove regions
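To make "asynchronous replication" concrete, here is a minimal single-process sketch. Everything in it is hypothetical (the class, its methods, and the idea of one read position per remote region); it is not Pulsar's replicator code. Publishing returns as soon as the message is in the local log, and a per-region replicator forwards whatever it has not sent yet whenever that region is reachable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of asynchronous geo-replication for a single topic.
class GeoReplicatedTopic {
    final List<String> log = new ArrayList<>();
    // remote region -> index of the next local message to forward there
    private final Map<String, Integer> replicatorPositions = new LinkedHashMap<>();

    // In this toy model a newly added region starts from the beginning of the log.
    void addRegion(String region) { replicatorPositions.putIfAbsent(region, 0); }

    void removeRegion(String region) { replicatorPositions.remove(region); }

    // Local publish: completes without waiting for any remote region.
    void publish(String message) { log.add(message); }

    // Number of messages not yet forwarded to the given region.
    int backlog(String region) { return log.size() - replicatorPositions.get(region); }

    // Background step: forward everything pending to a reachable region
    // and advance that region's position.
    List<String> replicate(String region) {
        int from = replicatorPositions.get(region);
        List<String> batch = new ArrayList<>(log.subList(from, log.size()));
        replicatorPositions.put(region, log.size());
        return batch;
    }
}
```

Because producers never wait on remote regions, a slow or disconnected region only grows its own backlog, and adding or removing a region amounts to adding or removing one position.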
Platforms
▪ Pulsar is used to build other platforms
▪ Provides high-level abstractions with strict guarantees
▪ Example: Sherpa distributed key-value store
› Massive database powering most of Yahoo’s online data serving applications
› Built upon the concept of a common message bus
› Pulsar provides:
• Durable log
• Replication within and across geo-locations
Messaging API
Messaging Model
Consumer-A1 receives all messages published on Topic-T; Consumer-B1, Consumer-B2 and Consumer-B3 each receive one third.
Diagram: Producer-X and Producer-Y publish to Topic-T. Subscription-A (Exclusive) has a single consumer, Consumer-A1; Subscription-B (Shared) has Consumer-B1, Consumer-B2 and Consumer-B3.
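The Exclusive and Shared subscription types can be illustrated with a small in-memory model. This is a hypothetical sketch, not Pulsar's dispatcher: the class names are invented, and the round-robin policy for Shared subscriptions is an assumption made for illustration. It shows that every subscription sees every message on the topic, while consumers inside a Shared subscription split the messages between them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one topic with Exclusive and Shared subscriptions.
class SubscriptionModel {
    enum Type { EXCLUSIVE, SHARED }

    static final class Subscription {
        final Type type;
        // consumer name -> messages delivered to that consumer
        final Map<String, List<String>> consumers = new LinkedHashMap<>();
        // messages published while no consumer was attached
        final List<String> backlog = new ArrayList<>();
        private int next = 0; // round-robin index (assumed policy for SHARED)

        Subscription(Type type) { this.type = type; }

        void addConsumer(String name) {
            if (type == Type.EXCLUSIVE && !consumers.isEmpty()) {
                throw new IllegalStateException("exclusive subscription already has a consumer");
            }
            consumers.put(name, new ArrayList<>());
        }

        void dispatch(String message) {
            List<List<String>> queues = new ArrayList<>(consumers.values());
            if (queues.isEmpty()) {
                backlog.add(message); // keep it until a consumer attaches
                return;
            }
            // EXCLUSIVE: only one queue, so it receives everything.
            // SHARED: consumers take turns.
            int i = next % queues.size();
            next = i + 1;
            queues.get(i).add(message);
        }
    }

    final Map<String, Subscription> subscriptions = new LinkedHashMap<>();

    Subscription subscribe(String name, Type type) {
        return subscriptions.computeIfAbsent(name, n -> new Subscription(type));
    }

    // Each subscription independently receives every published message.
    void publish(String message) {
        for (Subscription s : subscriptions.values()) s.dispatch(message);
    }
}
```

With the layout from the diagram (Consumer-A1 on an Exclusive subscription, three consumers on a Shared one), publishing nine messages gives Consumer-A1 all nine and each B consumer three.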
Producer example
import com.yahoo.pulsar.client.api.*;

PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");
Producer producer = client.createProducer(
    "persistent://my-property/us-west/my-namespace/my-topic");
// Blocking send; the client library handles retries in case of failure
producer.send("my-message".getBytes());
// Async version:
producer.sendAsync("my-message".getBytes()).thenRun(() -> {
    // Message was persisted
});
Consumer example
import com.yahoo.pulsar.client.api.*;

PulsarClient client = PulsarClient.create(
    "http://broker.usw.example.com:8080");
Consumer consumer = client.subscribe(
    "persistent://my-property/us-west/my-namespace/my-topic",
    "my-subscription-name");
while (true) {
    // Wait for a message
    Message msg = consumer.receive();
    // getData() returns the raw payload bytes; decode them before printing
    System.out.println("Received message: " + new String(msg.getData()));
    // Acknowledge the message so that it can be deleted by the broker
    consumer.acknowledge(msg);
}
Additional client library features
▪ Partitioned topics
▪ Transparent batching of messages
▪ Compression
▪ End-to-end checksum
▪ TLS encryption
▪ Individual and cumulative acknowledgment
▪ Client side stats
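Individual and cumulative acknowledgment differ in how much of the stream a single ack covers. The sketch below assumes message ids are plain increasing `long`s (real Pulsar message ids are richer) and keeps a "mark-delete" position below which everything is acknowledged, plus the set of ids above it that were acknowledged one by one. The `AckTracker` class is hypothetical.

```java
import java.util.TreeSet;

// Hypothetical sketch of a subscription's acknowledgment state.
class AckTracker {
    // Every id <= markDelete is acknowledged.
    private long markDelete = -1;
    // Ids above markDelete that were acknowledged individually.
    private final TreeSet<Long> individuallyAcked = new TreeSet<>();

    // Individual ack: covers exactly this message.
    void acknowledge(long id) {
        if (id <= markDelete) return;
        individuallyAcked.add(id);
        advance();
    }

    // Cumulative ack: covers this message and every earlier one.
    void acknowledgeCumulative(long id) {
        if (id <= markDelete) return;
        markDelete = id;
        individuallyAcked.headSet(id, true).clear();
        advance();
    }

    // Slide markDelete forward over any contiguous run of individually acked ids.
    private void advance() {
        while (individuallyAcked.remove(markDelete + 1)) markDelete++;
    }

    boolean isAcked(long id) { return id <= markDelete || individuallyAcked.contains(id); }

    long markDeletePosition() { return markDelete; }
}
```

Only the mark-delete position and the (usually small) set of gaps above it need to be stored, which keeps the per-subscription state compact even when acks arrive out of order.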
Architecture
Architecture / 1
Broker
‣ Clients interact only with brokers
‣ No durable state
Bookie
‣ Apache BookKeeper storage nodes
‣ Distributed write-ahead log
‣ Each machine stores data from many topics
Diagram: a Pulsar cluster with ZK, Broker 1 to Broker 3 and Bookie 1 to Bookie 5; the producer and consumer connect only to the brokers.
Architecture / 2
Separate layers between brokers and bookies
‣ Brokers and bookies can be added independently
‣ Traffic can be shifted very quickly across brokers
‣ New bookies will ramp up on traffic quickly
Diagram: a Pulsar cluster with ZK, Broker 1 to Broker 3 and Bookie 1 to Bookie 5, serving a producer and a consumer.
Architecture / 3
Diagram: the producer and consumer apps each use the Pulsar client library and locate brokers through service discovery. Inside the Pulsar cluster, the broker contains a Dispatcher, a Cache, a Managed Ledger with a BK Client, Global replicators and a Load Balancer; it stores data in bookies, uses ZK and Global ZK, and replicates to other clusters.
Client library
‣ Looks up the correct broker through service discovery
‣ Connects directly to the broker
‣ When the connection is established, authentication and authorization are enforced
‣ Reconnects with a back-off strategy
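The reconnect-with-back-off behavior can be illustrated with a small exponential back-off helper. The initial delay, the cap, the doubling factor and the 10% jitter are assumptions chosen for the example, not Pulsar's actual settings, and the `Backoff` class is hypothetical.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical exponential back-off policy for reconnection attempts.
class Backoff {
    private final long initialMillis;
    private final long maxMillis;
    private long nextMillis;

    Backoff(long initialMillis, long maxMillis) {
        this.initialMillis = initialMillis;
        this.maxMillis = maxMillis;
        this.nextMillis = initialMillis;
    }

    // Delay before the next attempt; doubles after each call, capped at maxMillis.
    long next() {
        long current = nextMillis;
        nextMillis = Math.min(nextMillis * 2, maxMillis);
        return current;
    }

    // Adds up to 10% random jitter so many clients do not reconnect in lockstep.
    long nextWithJitter() {
        long base = next();
        return base + ThreadLocalRandom.current().nextLong(base / 10 + 1);
    }

    // Call once a connection succeeds.
    void reset() { nextMillis = initialMillis; }
}
```

A reconnect loop would sleep for `nextWithJitter()` after each failed attempt and call `reset()` once connected, so a broker restart does not trigger a burst of simultaneous reconnections.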
Architecture / 4
Diagram: broker internals (Dispatcher, Cache, Managed Ledger, BK Client, Global replicators, Load Balancer) between the client apps, the bookies, ZK and Global ZK.
Dispatcher
‣ End-to-end async message processing
‣ Messages are relayed across producers, bookies and consumers with no copies
‣ Pooled ref-counted buffers
Managed Ledger
‣ Abstraction for single-topic storage
‣ Caches recent messages
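The "pooled ref-counted buffers" point can be illustrated with a standalone sketch: one buffer is shared by several holders (for example, several consumers receiving the same message) without copying, and it returns to a pool when the last holder releases it. Pulsar's broker is built on Netty, whose `ByteBuf` is reference-counted in this way; the `BufferPool` class below is a hypothetical simplification, not Pulsar or Netty code.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical pool of fixed-size, reference-counted buffers.
class BufferPool {
    private final ArrayDeque<PooledBuffer> free = new ArrayDeque<>();
    private final int size;

    BufferPool(int size) { this.size = size; }

    synchronized PooledBuffer acquire() {
        PooledBuffer b = free.poll();
        if (b == null) b = new PooledBuffer(this, new byte[size]);
        b.refCnt.set(1); // the caller holds the first reference
        return b;
    }

    synchronized void recycle(PooledBuffer b) { free.push(b); }

    synchronized int freeCount() { return free.size(); }

    static final class PooledBuffer {
        final byte[] data;
        final AtomicInteger refCnt = new AtomicInteger();
        private final BufferPool pool;

        PooledBuffer(BufferPool pool, byte[] data) {
            this.pool = pool;
            this.data = data;
        }

        // One more holder shares the same bytes (no copy).
        PooledBuffer retain() {
            while (true) {
                int c = refCnt.get();
                if (c <= 0) throw new IllegalStateException("buffer already released");
                if (refCnt.compareAndSet(c, c + 1)) return this;
            }
        }

        // Drop one reference; the last release returns the buffer to the pool.
        void release() {
            int left = refCnt.decrementAndGet();
            if (left == 0) pool.recycle(this);
            else if (left < 0) throw new IllegalStateException("released too many times");
        }
    }
}
```

Reusing buffers avoids allocating a new one per message, and the reference count is what makes it safe to hand the same bytes to several consumers at once.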
BookKeeper
▪ Replicated log service
▪ Offers consistency and durability
▪ Restores the replication factor after node failures
▪ Why is it a good choice for Pulsar?
› Very efficient storage for sequential data
› Very good distribution of IO across all bookies
• For each topic we create multiple ledgers over time
› Isolation of writes and reads
› Flexible model for quorum writes with different tradeoffs
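The flexible quorum model refers to BookKeeper's per-ledger ensemble size (E), write quorum (Qw) and ack quorum (Qa). The sketch below models round-robin striping of entries over the ensemble, which, as far as we know, matches BookKeeper's default placement; the `QuorumPlacement` class itself is a hypothetical illustration, not BookKeeper code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of quorum-write placement with E / Qw / Qa parameters.
class QuorumPlacement {
    final int ensembleSize; // E: bookies the ledger is striped across
    final int writeQuorum;  // Qw: copies written for each entry
    final int ackQuorum;    // Qa: acks needed before the write is confirmed

    QuorumPlacement(int ensembleSize, int writeQuorum, int ackQuorum) {
        if (!(ensembleSize >= writeQuorum && writeQuorum >= ackQuorum && ackQuorum >= 1)) {
            throw new IllegalArgumentException("need E >= Qw >= Qa >= 1");
        }
        this.ensembleSize = ensembleSize;
        this.writeQuorum = writeQuorum;
        this.ackQuorum = ackQuorum;
    }

    // Bookie indexes (0..E-1) storing an entry: Qw consecutive bookies starting
    // at entryId mod E, so successive entries rotate across the whole ensemble.
    List<Integer> writeSet(long entryId) {
        List<Integer> set = new ArrayList<>(writeQuorum);
        for (int k = 0; k < writeQuorum; k++) {
            set.add((int) ((entryId + k) % ensembleSize));
        }
        return set;
    }

    // The entry is confirmed once Qa of its Qw bookies have acknowledged it.
    boolean isConfirmed(int acksReceived) { return acksReceived >= ackQuorum; }
}
```

With the illustrative values E = 5, Qw = 3, Qa = 2, each entry lands on three of five bookies and is confirmed after two acks, so one slow bookie does not delay the write. Choosing E larger than Qw spreads IO over more bookies; choosing Qa smaller than Qw trades some durability margin at ack time for lower tail latency.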
BookKeeper - Storage
▪ A single bookie can serve and store thousands of ledgers
▪ Write and read paths are separated:
› Avoids read activity impacting write latency
› Writes are added to an in-memory write cache and committed to the journal
› The write cache is flushed in the background to a separate device
▪ Entries are sorted to allow for mostly sequential reads
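The separated write path and the sorting step can be sketched as follows. This is a hypothetical simplification, not the bookie's storage code: each entry is appended to a journal in arrival order and also kept in a write cache ordered by (ledger id, entry id), so a flush writes each ledger's entries contiguously and later reads of one ledger are mostly sequential.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch of a journal plus a sorted write cache.
class SortedWriteCache {
    static final class Key implements Comparable<Key> {
        final long ledgerId;
        final long entryId;

        Key(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        // Order by ledger first, then by entry within the ledger.
        @Override
        public int compareTo(Key o) {
            int c = Long.compare(ledgerId, o.ledgerId);
            return c != 0 ? c : Long.compare(entryId, o.entryId);
        }

        @Override
        public String toString() { return ledgerId + ":" + entryId; }
    }

    // Journal: arrival order (a real journal would be fsynced before acknowledging).
    final List<Key> journal = new ArrayList<>();
    // Write cache: sorted order, flushed later to the ledger storage device.
    private final TreeMap<Key, byte[]> cache = new TreeMap<>();

    void addEntry(long ledgerId, long entryId, byte[] payload) {
        Key k = new Key(ledgerId, entryId);
        journal.add(k);
        cache.put(k, payload);
    }

    // Returns the order in which cached entries would be written out, then empties the cache.
    List<Key> flush() {
        List<Key> order = new ArrayList<>(cache.keySet());
        cache.clear();
        return order;
    }
}
```

Entries from many ledgers arrive interleaved in the journal, but after the flush each ledger's entries sit next to each other, which is what enables the mostly sequential reads described above.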
Single topic: throughput and latency
Chart: throughput vs. 99th-percentile publish latency, 1 topic, 1 producer. X axis: throughput (msg/s, log scale from 1,000 to 10,000,000); Y axis: latency (ms, 0 to 6); series for 10-byte, 100-byte and 1 KB messages; one point is annotated 1,800,000.
Future
Future
▪ WebSocket API
› More language bindings built on top of it
▪ C++ API
› Existing C++ client library is being prepared for open-source release
▪ End-to-end data encryption
› Use symmetric/asymmetric encryption from producer to consumer
› Data encrypted in flight and at rest
› No need to trust the service for security
▪ Globally consistent topics
› Store the data in multiple regions
› Can migrate across regions with consistency
Final Remarks
• Check out the code and docs at github.com/yahoo/pulsar
• Give feedback or ask for more details on mailing lists:
• Pulsar-Users
• Pulsar-Dev

Mais conteúdo relacionado

Mais procurados

Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developersconfluent
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...HostedbyConfluent
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryJean-Paul Azar
 
NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?Anton Zadorozhniy
 
Messaging queue - Kafka
Messaging queue - KafkaMessaging queue - Kafka
Messaging queue - KafkaMayank Bansal
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overviewiamtodor
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Kai Wähner
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planningconfluent
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache KafkaAmir Sedighi
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connectconfluent
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practicesconfluent
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafkaconfluent
 
Benefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use CasesBenefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use Casesconfluent
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...HostedbyConfluent
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Worksconfluent
 
Kafka tiered-storage-meetup-2022-final-presented
Kafka tiered-storage-meetup-2022-final-presentedKafka tiered-storage-meetup-2022-final-presented
Kafka tiered-storage-meetup-2022-final-presentedSumant Tambe
 

Mais procurados (20)

Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra... Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
Disaster Recovery Options Running Apache Kafka in Kubernetes with Rema Subra...
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?
 
Messaging queue - Kafka
Messaging queue - KafkaMessaging queue - Kafka
Messaging queue - Kafka
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
 
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
Architecture patterns for distributed, hybrid, edge and global Apache Kafka d...
 
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity PlanningFrom Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
 
An Introduction to Apache Kafka
An Introduction to Apache KafkaAn Introduction to Apache Kafka
An Introduction to Apache Kafka
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Kafka 101 and Developer Best Practices
Kafka 101 and Developer Best PracticesKafka 101 and Developer Best Practices
Kafka 101 and Developer Best Practices
 
Deep Dive into Apache Kafka
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafka
 
Benefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use CasesBenefits of Stream Processing and Apache Kafka Use Cases
Benefits of Stream Processing and Apache Kafka Use Cases
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ... A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
A Hitchhiker's Guide to Apache Kafka Geo-Replication with Sanjana Kaundinya ...
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Works
 
Kafka tiered-storage-meetup-2022-final-presented
Kafka tiered-storage-meetup-2022-final-presentedKafka tiered-storage-meetup-2022-final-presented
Kafka tiered-storage-meetup-2022-final-presented
 

Semelhante a Pulsar - Distributed pub/sub platform

October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...Yahoo Developer Network
 
Hands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache PulsarHands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache PulsarSijie Guo
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarKarthik Ramasamy
 
Cloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical OverviewCloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical OverviewMessaging Meetup
 
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...
Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...Streamlio
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Monal Daxini
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scaleMatteo Merli
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022StreamNative
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPCMax Alexejev
 
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesMulti-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesLINE Corporation
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouseAltinity Ltd
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLEdunomica
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...Lucas Jellema
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache KafkaChhavi Parasher
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafkaSamuel Kerrien
 
High performance messaging with Apache Pulsar
High performance messaging with Apache PulsarHigh performance messaging with Apache Pulsar
High performance messaging with Apache PulsarMatteo Merli
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stackNitin Mehta
 

Semelhante a Pulsar - Distributed pub/sub platform (20)

October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
October 2016 HUG: Pulsar,  a highly scalable, low latency pub-sub messaging s...
 
Hands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache PulsarHands-on Workshop: Apache Pulsar
Hands-on Workshop: Apache Pulsar
 
Linked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache PulsarLinked In Stream Processing Meetup - Apache Pulsar
Linked In Stream Processing Meetup - Apache Pulsar
 
Cloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical OverviewCloud Messaging Service: Technical Overview
Cloud Messaging Service: Technical Overview
 
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...
Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...Messaging, storage, or both?  The real time story of Pulsar and Apache Distri...
Messaging, storage, or both? The real time story of Pulsar and Apache Distri...
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
Netflix Keystone Pipeline at Big Data Bootcamp, Santa Clara, Nov 2015
 
Pulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scalePulsar - flexible pub-sub for internet scale
Pulsar - flexible pub-sub for internet scale
 
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
Pulsar's Journey in Yahoo!: On-prem, Cloud and Hybrid - Pulsar Summit SF 2022
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPC
 
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messagesMulti-Tenancy Kafka cluster for LINE services with 250 billion daily messages
Multi-Tenancy Kafka cluster for LINE services with 250 billion daily messages
 
Redpanda and ClickHouse
Redpanda and ClickHouseRedpanda and ClickHouse
Redpanda and ClickHouse
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for ML
 
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
AMIS SIG - Introducing Apache Kafka - Scalable, reliable Event Bus & Message ...
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
High performance messaging with Apache Pulsar
High performance messaging with Apache PulsarHigh performance messaging with Apache Pulsar
High performance messaging with Apache Pulsar
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stack
 
kafka
kafkakafka
kafka
 

Último

Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptrcbcrtm
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 

Último (20)

Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.ppt
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
Software Project Health Check: Best Practices and Techniques for Your Product...

Pulsar - Distributed pub/sub platform

  • 1. Distributed pub/sub platform github.com/yahoo/pulsar Matteo Merli — mmerli@yahoo-inc.com 11/17/2016
  • 2. Agenda 1. Pulsar Overview 2. Common use cases 3. Messaging API 4. Architecture 5. Future 6. Q&A
  • 3. What is Pulsar? ▪ Hosted pub-sub messaging ▪ Simple messaging model ▪ Highly scalable › Topics, message throughput ▪ Ordering, durability & delivery guarantees ▪ Supports multi-tenancy ▪ Geo-replication ▪ Easy to operate (admin APIs, add capacity, replace machines) [Diagram: a Pulsar cluster (broker, bookie, ZK) with a producer and a consumer, a global ZK, and replication to other clusters]
  • 4. Pulsar production usage stats ▪ In production for 1.5+ years ▪ 1.4 million topics ▪ Publishes 100 billion messages/day (delivery 7x) ▪ Average latency < 5 ms, 99th percentile 15 ms ▪ Zero data loss ▪ 80+ applications ▪ Critical component of major Yahoo systems: › Mail, Finance, Sports, Gemini Ads ▪ Self-service provisioning ▪ Full-mesh cross-datacenter replication across 8 data centers
  • 5. Why build a new system? ▪ No existing solution satisfied the requirements › Multi-tenancy, 1M topics, low latency, durability, geo-replication ▪ Kafka doesn’t scale well with many topics: › Storage model based on an individual directory per topic partition › Enabling durability kills the performance ▪ Operations are not very convenient › e.g. replacing a server requires manual commands to copy the data and involves the clients › Client access to ZK clusters is not desirable ▪ Ability to manage large backlogs ▪ No scalable support for storing consumer positions
  • 7. Message queue ▪ Decouple online and background processing ▪ Provide high availability ▪ Reliable data transport [Diagram: online events are published with low latency to Pulsar topic 1; workers 1, 2 and 3 consume them to run long-running tasks and publish notifications to Pulsar topic 2]
  • 8. Notifications ▪ Listeners are frequently different tenants ▪ Quotas are needed to ensure the producer is not affected [Diagram: an event published to a Pulsar topic is received by listeners Component 1, Component 2 and Component 3]
  • 9. Feedback system [Diagram: external inputs, Pulsar topic 1, several serving systems, Pulsar topic 2 and a controller, connected by update and feedback flows] ▪ Coordinate a large number of machines ▪ Propagate state
  • 10. Geo replication ▪ Asynchronous replication ▪ Integrated in the broker message flow ▪ Simple configuration to add/remove regions
  • 11. Platforms ▪ Pulsar is used to build other platforms ▪ Provides a high-level abstraction with strict guarantees ▪ Example: Sherpa distributed key-value store › Massive database powering most of Yahoo’s online data serving applications › Built upon the concept of a common message bus › Pulsar provides: • Durable log • Replication within and across geo-locations
  • 13. Messaging Model [Diagram: Producer-X and Producer-Y publish to Topic-T; Subscription-A (exclusive) has Consumer-A1; Subscription-B (shared) has Consumer-B1, Consumer-B2 and Consumer-B3] Consumer-A1 receives all messages published on T; B1, B2 and B3 receive one third each.
  • 14. Producer example
      PulsarClient client = PulsarClient.create("http://broker.usw.example.com:8080");
      Producer producer = client.createProducer(
          "persistent://my-property/us-west/my-namespace/my-topic");
      // Synchronous send; the client handles retries in case of failure
      producer.send("my-message".getBytes());
      // Async version:
      producer.sendAsync("my-message".getBytes()).thenRun(() -> {
          // Message was persisted
      });
  • 15. Consumer example
      PulsarClient client = PulsarClient.create("http://broker.usw.example.com:8080");
      Consumer consumer = client.subscribe(
          "persistent://my-property/us-west/my-namespace/my-topic",
          "my-subscription-name");
      while (true) {
          // Wait for a message
          Message msg = consumer.receive();
          // getData() returns a byte[], so decode it before printing
          System.out.println("Received message: " + new String(msg.getData()));
          // Acknowledge the message so that the broker can delete it
          consumer.acknowledge(msg);
      }
  • 16. Additional client library features ▪ Partitioned topics ▪ Transparent batching of messages ▪ Compression ▪ End-to-end checksum ▪ TLS encryption ▪ Individual and cumulative acknowledgment ▪ Client-side stats
  • 18. Architecture / 1 Broker ‣ Clients interact only with brokers ‣ No durable state Bookie ‣ Apache BookKeeper storage nodes ‣ Distributed write-ahead log ‣ Each machine stores data from many topics [Diagram: a Pulsar cluster with ZK, brokers 1 to 3 and bookies 1 to 5, serving a producer and a consumer]
  • 19. Architecture / 2 Separate layers for brokers and bookies ‣ Brokers and bookies can be added independently ‣ Traffic can be shifted very quickly across brokers ‣ New bookies ramp up on traffic quickly [Diagram: the same cluster as slide 18, with ZK, brokers 1 to 3 and bookies 1 to 5]
  • 20. Architecture / 3 [Diagram: producer and consumer apps using the Pulsar lib, service discovery, and a Pulsar cluster with broker components (dispatcher, managed ledger, cache, BK client, global replicators, load balancer), bookies, ZK and global ZK, plus replication to other clusters] Client library ‣ Looks up the correct broker through service discovery ‣ Connects directly to the broker ‣ Authentication and authorization are enforced when the connection is established ‣ Reconnects with a backoff strategy
  • 21. Architecture / 4 [Diagram: the same components as slide 20] Dispatcher ‣ End-to-end async message processing ‣ Messages are relayed across producers, bookies and consumers with no copies ‣ Pooled ref-counted buffers Managed Ledger ‣ Abstraction for single-topic storage ‣ Cache of recent messages
  • 22. BookKeeper ▪ Replicated log service ▪ Offers consistency and durability ▪ Restores the replication factor after node failures ▪ Why is it a good choice for Pulsar? › Very efficient storage for sequential data › Very good distribution of I/O across all bookies • For each topic, multiple ledgers are created over time › Isolation of writes and reads › Flexible model for quorum writes with different tradeoffs
  • 23. BookKeeper - Storage ▪ A single bookie can serve and store thousands of ledgers ▪ Write and read paths are separated: › Read activity does not impact write latency › Writes are added to an in-memory write cache and committed to the journal › The write cache is flushed in the background to a separate device ▪ Entries are sorted to allow for mostly sequential reads
  • 24. Single topic: throughput and latency [Chart: throughput and 99th-percentile publish latency for 1 topic and 1 producer. Latency (ms, 0 to 6) plotted against throughput (msg/s, 1,000 to 10,000,000) for 10-byte, 100-byte and 1 KB messages; the chart is annotated with 1,800,000 msg/s.]
  • 26. Future ▪ WebSocket API › More language bindings built on top of it ▪ C++ API › Existing C++ client library is being prepared for OSS release ▪ End-to-end data encryption › Use symmetric/asymmetric encryption from producer to consumer › Data encrypted in flight and at rest › No need to trust the service for security ▪ Globally consistent topics › Store the data in multiple regions › Can migrate across regions with consistency
  • 27. Final Remarks • Check out the code and docs at github.com/yahoo/pulsar • Give feedback or ask for more details on mailing lists: • Pulsar-Users • Pulsar-Dev
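The volume figure on the production-stats slide (slide 4) can be turned into average rates. This reads "delivery 7x" as seven deliveries per published message on average (an interpretation, not spelled out on the slide) and assumes traffic spread evenly over the 86,400 seconds in a day:

```latex
\begin{aligned}
\text{publish rate} &\approx \frac{100 \times 10^{9}\ \text{msg/day}}{86\,400\ \text{s/day}} \approx 1.16 \times 10^{6}\ \text{msg/s} \\
\text{delivery rate} &\approx 7 \times 1.16 \times 10^{6}\ \text{msg/s} \approx 8.1 \times 10^{6}\ \text{msg/s}
\end{aligned}
```

These are daily averages; the slide gives no peak rates, which would be higher.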
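The geo-replication slide (slide 10) says replication is asynchronous and integrated in the broker message flow. The following is a minimal conceptual model in plain Java, not Pulsar's replicator: publishing returns once the message is in the local log, and a separate background step forwards to each remote region everything past that region's cursor. The class name, the region names and the choice that a newly added region receives the existing backlog are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model of asynchronous geo-replication; not Pulsar's replicator implementation.
class GeoReplicationSketch {
    final List<String> localLog = new ArrayList<>();
    // One replication cursor per remote region: index of the next local message to forward.
    final Map<String, Integer> cursors = new LinkedHashMap<>();
    // Stand-ins for the copies of the topic held in each remote region.
    final Map<String, List<String>> remoteLogs = new LinkedHashMap<>();

    // Assumption: a newly added region starts from the beginning of the retained local log.
    void addRegion(String region) {
        cursors.put(region, 0);
        remoteLogs.put(region, new ArrayList<>());
    }

    void removeRegion(String region) {
        cursors.remove(region);
        remoteLogs.remove(region);
    }

    // Publishing only touches the local log; it never waits for remote regions.
    void publish(String message) {
        localLog.add(message);
    }

    // Background step: forward to each region whatever lies past its cursor.
    void replicateOnce() {
        for (Map.Entry<String, Integer> cursor : cursors.entrySet()) {
            List<String> remote = remoteLogs.get(cursor.getKey());
            for (int i = cursor.getValue(); i < localLog.size(); i++) {
                remote.add(localLog.get(i));
            }
            cursor.setValue(localLog.size());
        }
    }

    public static void main(String[] args) {
        GeoReplicationSketch topic = new GeoReplicationSketch();
        topic.addRegion("us-east");
        topic.publish("m1");
        topic.publish("m2");
        System.out.println(topic.remoteLogs); // {us-east=[]}
        topic.replicateOnce();
        System.out.println(topic.remoteLogs); // {us-east=[m1, m2]}
    }
}
```

Because `publish` never waits on remote regions, a slow or unreachable region only grows its own backlog instead of adding latency for local producers; the price is that remote copies can lag behind the local one.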
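The messaging-model slide (slide 13) shows two subscriptions on the same topic: an exclusive one whose single consumer sees every message, and a shared one whose three consumers each see a third. The sketch below models that behaviour with plain Java collections; it is not the Pulsar client API, and the round-robin assignment in the shared case is an assumption (the slide only states that each consumer receives one third).

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual model of exclusive vs shared subscriptions; not the Pulsar client API.
class SubscriptionModelSketch {

    // Delivers every message of a topic to the consumers of ONE subscription.
    // Each subscription gets its own full copy of the topic's messages.
    // exclusive == true: the single consumer receives everything.
    // exclusive == false (shared): messages are spread round-robin across consumers.
    static List<List<String>> dispatch(List<String> topic, int consumers, boolean exclusive) {
        if (exclusive && consumers != 1) {
            throw new IllegalArgumentException("an exclusive subscription has exactly one consumer");
        }
        List<List<String>> received = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            received.add(new ArrayList<>());
        }
        for (int i = 0; i < topic.size(); i++) {
            received.get(i % consumers).add(topic.get(i));
        }
        return received;
    }

    public static void main(String[] args) {
        List<String> topicT = new ArrayList<>();
        for (int i = 0; i < 9; i++) {
            topicT.add("m" + i);
        }
        // Subscription-A (exclusive): Consumer-A1 gets all nine messages.
        System.out.println("A: " + dispatch(topicT, 1, true));
        // prints A: [[m0, m1, m2, m3, m4, m5, m6, m7, m8]]
        // Subscription-B (shared): B1, B2, B3 get three messages each.
        System.out.println("B: " + dispatch(topicT, 3, false));
        // prints B: [[m0, m3, m6], [m1, m4, m7], [m2, m5, m8]]
    }
}
```

The exclusive mode preserves ordering for one consumer, while the shared mode is what makes the work-queue use case on slide 7 possible: adding consumers to the subscription splits the load further.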
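Slide 16 lists individual and cumulative acknowledgment. One way to picture what has to be tracked per subscription is a cursor with a "mark-delete" position (everything up to it is acknowledged) plus a set of messages acknowledged out of order beyond it. The class below is a hypothetical, single-threaded model that uses sequential integer ids; real Pulsar message ids are not simple integers, and this is not the broker's cursor code.

```java
import java.util.TreeSet;

// Hypothetical model of one subscription's acknowledgment state (single-threaded).
// Message ids are simplified to sequential longs 0, 1, 2, ...
class AckCursorSketch {
    // Every id <= markDelete has been acknowledged.
    private long markDelete = -1;
    // Ids above markDelete that were acknowledged out of order.
    private final TreeSet<Long> individuallyAcked = new TreeSet<>();

    // Individual acknowledgment: exactly one message.
    void acknowledge(long id) {
        if (id > markDelete) {
            individuallyAcked.add(id);
            advance();
        }
    }

    // Cumulative acknowledgment: this message and every earlier one.
    void acknowledgeCumulative(long id) {
        if (id > markDelete) {
            markDelete = id;
            individuallyAcked.headSet(id, true).clear();
            advance();
        }
    }

    // Move markDelete over any contiguous run of individually acknowledged ids.
    private void advance() {
        while (individuallyAcked.remove(markDelete + 1)) {
            markDelete++;
        }
    }

    boolean isAcked(long id) {
        return id <= markDelete || individuallyAcked.contains(id);
    }

    long markDeletePosition() {
        return markDelete;
    }

    public static void main(String[] args) {
        AckCursorSketch cursor = new AckCursorSketch();
        cursor.acknowledge(1);                           // out of order: id 0 still pending
        System.out.println(cursor.markDeletePosition()); // -1
        cursor.acknowledge(0);                           // fills the gap, cursor moves to 1
        cursor.acknowledgeCumulative(3);                 // acknowledges 2 and 3 in one call
        System.out.println(cursor.markDeletePosition()); // 3
    }
}
```

Cumulative acknowledgment lets a consumer that processes messages in order confirm a whole run with one call; individual acknowledgment suits consumers that finish messages out of order, at the cost of tracking the gaps.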
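Slide 20 says the client library reconnects with a backoff strategy. A common form is capped exponential backoff, sketched below; the initial delay, the doubling factor and the cap are illustrative assumptions, not values taken from Pulsar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical capped exponential backoff for reconnect attempts.
// The initial delay, doubling factor and cap are illustrative, not Pulsar defaults.
class ReconnectBackoffSketch {
    private final long initialMs;
    private final long maxMs;
    private long nextMs;

    ReconnectBackoffSketch(long initialMs, long maxMs) {
        this.initialMs = initialMs;
        this.maxMs = maxMs;
        this.nextMs = initialMs;
    }

    // Returns the delay before the next attempt, then doubles it up to the cap.
    long next() {
        long current = nextMs;
        nextMs = Math.min(nextMs * 2, maxMs);
        return current;
    }

    // Called once a connection succeeds, so the next failure starts small again.
    void reset() {
        nextMs = initialMs;
    }

    public static void main(String[] args) {
        ReconnectBackoffSketch backoff = new ReconnectBackoffSketch(100, 1_000);
        List<Long> delays = new ArrayList<>();
        for (int attempt = 0; attempt < 6; attempt++) {
            delays.add(backoff.next());
        }
        System.out.println(delays); // [100, 200, 400, 800, 1000, 1000]
    }
}
```

Clients commonly add random jitter on top of this so that many clients that lost the same broker do not retry in lockstep; it is left out here to keep the output deterministic.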
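Slide 21 mentions pooled, reference-counted buffers, which let the broker hand the same message bytes to the storage write and to consumers without copying them. The sketch below shows the idea with a hypothetical single-threaded pool; it is not Netty's `ByteBuf` API and not Pulsar's actual buffer management, and the class and method names are invented for illustration.

```java
import java.util.ArrayDeque;

// Hypothetical pooled, reference-counted buffer (single-threaded; not Netty's ByteBuf).
class RefCountedBufferSketch {

    static final class Pool {
        private final ArrayDeque<Buffer> free = new ArrayDeque<>();
        private final int size;
        int allocations = 0; // buffers created from scratch rather than reused

        Pool(int size) {
            this.size = size;
        }

        Buffer acquire() {
            Buffer b = free.poll();
            if (b == null) {
                b = new Buffer(this, new byte[size]);
                allocations++;
            }
            b.refCnt = 1;
            return b;
        }

        void recycle(Buffer b) {
            free.push(b);
        }
    }

    static final class Buffer {
        final Pool pool;
        final byte[] data;
        int refCnt;

        Buffer(Pool pool, byte[] data) {
            this.pool = pool;
            this.data = data;
        }

        // Another holder (e.g. the storage write path) now shares the same bytes, no copy.
        Buffer retain() {
            if (refCnt <= 0) throw new IllegalStateException("buffer already released");
            refCnt++;
            return this;
        }

        // When the last holder releases, the buffer returns to the pool instead of the GC.
        void release() {
            if (refCnt <= 0) throw new IllegalStateException("buffer already released");
            if (--refCnt == 0) {
                pool.recycle(this);
            }
        }
    }

    public static void main(String[] args) {
        Pool pool = new Pool(1024);
        Buffer msg = pool.acquire();  // message received from a producer, refCnt = 1
        msg.retain();                 // shared with the storage write, refCnt = 2
        msg.release();                // storage write done, refCnt = 1
        msg.release();                // dispatched to the consumer, refCnt = 0, recycled
        Buffer next = pool.acquire(); // reuses the same byte[] without a new allocation
        System.out.println(next == msg && pool.allocations == 1); // true
    }
}
```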
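Slide 22 credits BookKeeper with spreading I/O across all bookies and with a flexible quorum model. A BookKeeper ledger is configured with an ensemble size E, a write quorum Qw and an ack quorum Qa: each entry is written to Qw of the E bookies, and the write is confirmed once Qa of them acknowledge it. The sketch below assumes entries are striped round-robin (entry i goes to bookies i mod E, (i+1) mod E, and so on); treat that exact rule, and the E = 5, Qw = 3 values, as assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of round-robin striping of ledger entries over an ensemble of bookies.
class LedgerStripingSketch {

    // Bookie indices (0..ensembleSize-1) that store entry `entryId`:
    // writeQuorum consecutive bookies starting at entryId mod ensembleSize (assumed rule).
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorum) {
        if (writeQuorum < 1 || writeQuorum > ensembleSize) {
            throw new IllegalArgumentException("need 1 <= writeQuorum <= ensembleSize");
        }
        List<Integer> bookies = new ArrayList<>();
        for (int i = 0; i < writeQuorum; i++) {
            bookies.add((int) ((entryId + i) % ensembleSize));
        }
        return bookies;
    }

    // How many of the first `entries` entries each bookie ends up storing.
    static int[] entriesPerBookie(long entries, int ensembleSize, int writeQuorum) {
        int[] counts = new int[ensembleSize];
        for (long e = 0; e < entries; e++) {
            for (int bookie : writeSet(e, ensembleSize, writeQuorum)) {
                counts[bookie]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative settings: E = 5, Qw = 3 (Qa only affects when the client is acknowledged).
        for (long entry = 0; entry < 5; entry++) {
            System.out.println("entry " + entry + " -> bookies " + writeSet(entry, 5, 3));
        }
        System.out.println(Arrays.toString(entriesPerBookie(10, 5, 3))); // [6, 6, 6, 6, 6]
    }
}
```

With E = 5 and Qw = 3, every bookie stores three fifths of the entries, so the write load is spread evenly across the ensemble. Choosing Qa < Qw lets the client proceed without waiting for the slowest replica, which lowers tail latency while the remaining copies complete in the background.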
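Slide 23 describes how a bookie separates its write and read paths. The in-memory model below illustrates the idea under simplifying assumptions: a list stands in for the journal, a sorted map for the write cache, and another list for the entry log on the separate device. No real I/O, fsync or on-disk index is modelled. Entries arrive interleaved across ledgers but are flushed grouped by ledger, which is what makes later reads of one ledger mostly sequential.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of a bookie's separated write and read paths (no real I/O).
class BookieStorageSketch {
    // Journal: every write appended in arrival order, as on a dedicated journal device.
    final List<String> journal = new ArrayList<>();
    // Write cache: ledgerId -> (entryId -> payload), kept sorted so flushes come out grouped.
    private TreeMap<Long, TreeMap<Long, String>> writeCache = new TreeMap<>();
    // Entry log on the separate storage device, in the order entries were flushed.
    final List<String> entryLog = new ArrayList<>();
    // Stand-in for the index that locates flushed entries.
    private final Map<String, String> flushed = new HashMap<>();

    // Write path: append to the journal and add to the write cache.
    void addEntry(long ledgerId, long entryId, String payload) {
        journal.add(ledgerId + ":" + entryId);
        writeCache.computeIfAbsent(ledgerId, id -> new TreeMap<>()).put(entryId, payload);
    }

    // Background flush: swap in an empty cache, then write the old one out in sorted order.
    void flush() {
        TreeMap<Long, TreeMap<Long, String>> toFlush = writeCache;
        writeCache = new TreeMap<>();
        for (Map.Entry<Long, TreeMap<Long, String>> ledger : toFlush.entrySet()) {
            for (Map.Entry<Long, String> entry : ledger.getValue().entrySet()) {
                String key = ledger.getKey() + ":" + entry.getKey();
                entryLog.add(key);
                flushed.put(key, entry.getValue());
            }
        }
    }

    // Read path: recent entries come from the write cache, older ones from flushed storage.
    String read(long ledgerId, long entryId) {
        TreeMap<Long, String> ledger = writeCache.get(ledgerId);
        if (ledger != null && ledger.containsKey(entryId)) {
            return ledger.get(entryId);
        }
        return flushed.get(ledgerId + ":" + entryId);
    }

    public static void main(String[] args) {
        BookieStorageSketch bookie = new BookieStorageSketch();
        // Entries for ledgers 7 and 3 (e.g. two different topics) arrive interleaved.
        bookie.addEntry(7, 0, "a");
        bookie.addEntry(3, 0, "b");
        bookie.addEntry(7, 1, "c");
        bookie.addEntry(3, 1, "d");
        bookie.flush();
        System.out.println(bookie.journal);    // [7:0, 3:0, 7:1, 3:1]
        System.out.println(bookie.entryLog);   // [3:0, 3:1, 7:0, 7:1]
        System.out.println(bookie.read(3, 1)); // d
    }
}
```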