THE DREAM STREAM TEAM FOR PULSAR AND SPRING
TIM SPANN - STREAMNATIVE
For building Java application, Spring is the universal answer as it supplies all the connectors and integrations one could want. The same is true for Apache Pulsar as it provides connectors, integration and flexibility to any use case. Apache Pulsar has a robust native Java library to use with Spring as well as other protocol options.
ApachePulsar provides a cloud native, geo-replicated unified messaging platform that allows for many messaging paradims. This lends it self well to upgrading existing applications as Pulsar supports using libraries for WebSockets, MQTT, Kafka, JMS, AMQP and RocketMQ. In this talk I will build some example applications utilizing several different protocols for building a variety of applications from IoT to Microservices to Log Analytics.
https://2022.springio.net/sessions/the-dream-stream-team-for-pulsar-and-spring
SPRING I/O 2022
THE CONFERENCE
Spring I/O is the leading european conference focused on the Spring Framework ecosystem.
Join us in our 9th in-person edition!
May 26/27, 2022 Barcelona, Spain
3. Tim Spann
Developer Advocate
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems/ Data Architect
● Experience:
○ 15+ years of experience with batch and streaming
technologies including Pulsar, Flink, Spark, NiFi, Spring,
Java, Big Data, Cloud, MXNet, Hadoop, Datalakes, IoT
and more.
4. Our bags are packed.
Let’s begin the
Journey to
Real-Time
Unified Messaging
And
Streaming
6. ● Introduction
● What is Apache Pulsar?
● Spring Apps
○ Native Pulsar
○ AMQP
○ MQTT
○ Kafka
● Demo
● Q&A
This is the cover art for the vinyl LP "Unknown
Pleasures" by the artist Joy Division. The cover
art copyright is believed to belong to the label,
Factory Records, or the graphic artist(s).
13. Component Description
Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although
message data can also conform to data schemas.
Key Messages are optionally tagged with keys, used in partitioning and also is useful for
things like topic compaction.
Properties An optional key/value map of user-defined properties.
Producer name The name of the producer who produces the message. If you do not specify a producer
name, the default name is used. Message De-Duplication.
Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of
the message is its order in that sequence. Message De-Duplication.
Messages - The Basic Unit of Pulsar
14. Pulsar Subscription Modes
Different subscription modes
have different semantics:
Exclusive/Failover -
guaranteed order, single active
consumer
Shared - multiple active
consumers, no order
Key_Shared - multiple active
consumers, order for given key
Producer 1
Producer 2
Pulsar Topic
Subscription D
Consumer D-1
Consumer D-2
Key-Shared
<
K
1,
V
10
>
<
K
1,
V
11
>
<
K
1,
V
12
>
<
K
2
,V
2
0
>
<
K
2
,V
2
1>
<
K
2
,V
2
2
>
Subscription C
Consumer C-1
Consumer C-2
Shared
<
K
1,
V
10
>
<
K
2,
V
21
>
<
K
1,
V
12
>
<
K
2
,V
2
0
>
<
K
1,
V
11
>
<
K
2
,V
2
2
>
Subscription A Consumer A
Exclusive
Subscription B
Consumer B-1
Consumer B-2
In case of failure in
Consumer B-1
Failover
15. Flexible Pub/Sub API for Pulsar - Shared
Consumer consumer = client.newConsumer()
.topic("my-topic")
.subscriptionName("work-q-1")
.subscriptionType(SubType.Shared)
.subscribe();
16. Flexible Pub/Sub API for Pulsar - Failover
Consumer consumer = client.newConsumer()
.topic("my-topic")
.subscriptionName("stream-1")
.subscriptionType(SubType.Failover)
.subscribe();
17. Reader Interface
byte[] msgIdBytes = // Some byte
array
MessageId id =
MessageId.fromByteArray(msgIdBytes);
Reader<byte[]> reader =
pulsarClient.newReader()
.topic(topic)
.startMessageId(id)
.create();
Create a reader that will read from
some message between earliest and
latest.
Reader
24. Schema Registry
Schema Registry
schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3
(value=Avro/Protobuf/JSON)
Schema
Data
ID
Local Cache
for Schemas
+
Schema
Data
ID +
Local Cache
for Schemas
Send schema-1
(value=Avro/Protobuf/JSON) data
serialized per schema ID
Send (register)
schema (if not in
local cache)
Read schema-1
(value=Avro/Protobuf/JSON) data
deserialized per schema ID
Get schema by ID (if
not in local cache)
Producers Consumers
25. Pulsar Functions
● Lightweight
computation similar to
AWS Lambda.
● Specifically designed to
use Apache Pulsar as a
message bus.
● Function runtime can
be located within
Pulsar Broker.
A serverless event streaming
framework
26. import org.apache.pulsar.client.impl.schema.JSONSchema;
import org.apache.pulsar.functions.api.*;
public class AirQualityFunction implements Function<byte[], Void> {
@Override
public Void process(byte[] input, Context context) {
context.getLogger().debug("DBG:” + new String(input));
context.newOutputMessage(“topicname”,
JSONSchema.of(Observation.class))
.key(UUID.randomUUID().toString())
.property(“prop1”, “value1”)
.value(observation)
.send();
}
}
Your Code Here
Pulsar
Function SDK
27. ● Consume messages from one
or more Pulsar topics.
● Apply user-supplied
processing logic to each
message.
● Publish the results of the
computation to another topic.
● Support multiple
programming languages
(Java, Python, Go)
● Can leverage 3rd-party
libraries to support the
execution of ML models on
the edge.
Pulsar Functions
28. You can start really
small!
● A Docker container
● A Standalone Node
● Small MiniKube Pods
● Free Tier SN Cloud
31. Spring - Kafka - Pulsar
https://www.baeldung.com/spring-kafka
@Bean
public KafkaTemplate<String, Observation> kafkaTemplate() {
KafkaTemplate<String, Observation> kafkaTemplate =
new KafkaTemplate<String, Observation>(producerFactory());
return kafkaTemplate;
}
ProducerRecord<String, Observation> producerRecord = new ProducerRecord<>(topicName,
uuidKey.toString(),
message);
kafkaTemplate.send(producerRecord);
32. Spring - MQTT - Pulsar
https://roytuts.com/publish-subscribe-message-onto-mqtt-using-spring/
@Bean
public IMqttClient mqttClient(
@Value("${mqtt.clientId}") String clientId,
@Value("${mqtt.hostname}") String hostname,
@Value("${mqtt.port}") int port)
throws MqttException {
IMqttClient mqttClient = new MqttClient(
"tcp://" + hostname + ":" + port, clientId);
mqttClient.connect(mqttConnectOptions());
return mqttClient;
}
MqttMessage mqttMessage = new MqttMessage();
mqttMessage.setPayload(DataUtility.serialize(payload));
mqttMessage.setQos(0);
mqttMessage.setRetained(true);
mqttClient.publish(topicName, mqttMessage);
33. Spring - AMQP - Pulsar
https://www.baeldung.com/spring-amqp
rabbitTemplate.convertAndSend(topicName,
DataUtility.serializeToJSON(observation));
@Bean
public CachingConnectionFactory
connectionFactory() {
CachingConnectionFactory ccf =
new CachingConnectionFactory();
ccf.setAddresses(serverName);
return ccf;
}
34. Spring - Websockets - Pulsar
https://www.baeldung.com/websockets-spring
https://docs.streamnative.io/cloud/stable/connect/client/connect-websocket
35. Spring - Presto SQL/Trino -
Pulsar via Presto or JDBC
https://github.com/ifengkou/spring-boot-starter-data-presto
https://github.com/tspannhw/phillycrime-springboot-phoenix
https://github.com/tspannhw/FLiP-Into-Trino
https://trino.io/docs/current/installation/jdbc.html
48. streamnative.io
Passionate and dedicated team.
Founded by the original developers of
Apache Pulsar.
StreamNative helps teams to capture,
manage, and leverage data using Pulsar’s
unified messaging and streaming
platform.
49. Founded By The
Creators Of Apache Pulsar
Sijie Guo
ASF Member
Pulsar/BookKeeper PMC
Founder and CEO
Jia Zhai
Pulsar/BookKeeper PMC
Co-Founder
Matteo Merli
ASF Member
Pulsar/BookKeeper PMC
CTO
Data veterans with extensive industry experience
52. Built-in
Back
Pressure
Producer<String> producer = client.newProducer(Schema.STRING)
.topic("SpringIOBarcelona2022")
.blockIfQueueFull(true) // enable blocking if queue is full
.maxPendingMessages(10) // max queue size
.create();
// During Back Pressure: the sendAsync call blocks
// with no room in queues
producer.newMessage()
.key("mykey")
.value("myvalue")
.sendAsync(); // can be a blocking call
54. Producer-Consumer
Producer Consumer
Publisher sends data and
doesn't know about the
subscribers or their status.
All interactions go through
Pulsar and it handles all
communication.
Subscriber receives data
from publisher and never
directly interacts with it
Topic
Topic
55. Apache Pulsar: Messaging vs Streaming
Message Queueing - Queueing
systems are ideal for work queues
that do not require tasks to be
performed in a particular order.
Streaming - Streaming works
best in situations where the
order of messages is important.
56. 56
➔ Perform in Real-Time
➔ Process Events as They Happen
➔ Joining Streams with SQL
➔ Find Anomalies Immediately
➔ Ordering and Arrival Semantics
➔ Continuous Streams of Data
DATA STREAMING
57. Pulsar’s Publish-Subscribe model
Broker
Subscription
Consumer 1
Consumer 2
Consumer 3
Topic
Producer 1
Producer 2
● Producers send messages.
● Topics are an ordered, named channel that producers
use to transmit messages to subscribed consumers.
● Messages belong to a topic and contain an arbitrary
payload.
● Brokers handle connections and routes
messages between producers / consumers.
● Subscriptions are named configuration rules
that determine how messages are delivered to
consumers.
● Consumers receive messages.
58. Using Pulsar for Java Apps
58
High performance
High security
Multiple data consumers:
Transactions, CDC, fraud
detection with ML
Large data
volumes, high
scalability
Multi-tenancy and
geo-replication
59. streamnative.io
Accessing historical as well as real-time data
Pub/sub model enables event streams to be sent from
multiple producers, and consumed by multiple consumers
To process large amounts of data in a highly scalable way
67. import java.util.function.Function;
public class MyFunction implements Function<String, String> {
public String apply(String input) {
return doBusinessLogic(input);
}
}
Your Code Here
Pulsar Function
Java
69. Subscribing to a Topic and setting
Subscription Name Java
Consumer<byte[]> consumer = pulsarClient.newConsumer()
.topic(topic)
.subscriptionName(“subscriptionName")
.subscribe();
70. Run a Local Standalone Bare Metal
wget
https://archive.apache.org/dist/pulsar/pulsar-2.10.0/apache-pulsar-2.10.0-
bin.tar.gz
tar xvfz apache-pulsar-2.10.0-bin.tar.gz
cd apache-pulsar-2.10.0
bin/pulsar standalone
(For Pulsar SQL Support)
bin/pulsar sql-worker start
https://pulsar.apache.org/docs/en/standalone/
71. Building Tenant, Namespace, Topics
bin/pulsar-admin tenants create conference
bin/pulsar-admin namespaces create conference/spring
bin/pulsar-admin tenants list
bin/pulsar-admin namespaces list conference
bin/pulsar-admin topics create persistent://conference/spring/first
bin/pulsar-admin topics list conference/spring
77. Producer
Steps
(Async)
PulsarClient client = PulsarClient.builder() // Step 1 Create a client.
.serviceUrl("pulsar://broker1:6650")
.build(); // Client discovers all brokers
// Step 2 Create a producer from client.
Producer<String> producer = client.newProducer(Schema.STRING)
.topic("hellotopic")
.create(); // Topic lookup occurs, producer registered with broker
// Step 3 Message is built, which will be buffered in async.
CompletableFuture<String> msgFuture = producer.newMessage()
.key("mykey")
.value("myvalue")
.sendAsync(); // Client returns when message is stored in buffer
// Step 4 Message is flushed (usually automatically).
producer.flush();
// Step 5 Future returns once the message is sent and ack is returned.
msgFuture.get();
78. Producer
Steps
(Sync)
// Step 1 Create a client.
PulsarClient client = PulsarClient.builder()
.serviceUrl("pulsar://broker1:6650")
.build(); // Client discovers all brokers
// Step 2 Create a producer from client.
Producer<String> producer = client.newProducer(Schema.STRING)
.topic("persistent://public/default/hellotopic")
.create(); // Topic lookup occurs, producer registered with
broker.
// Step 3 Message is built, which will be sent immediately.
producer.newMessage()
.key("mykey")
.value("myvalue")
.send(); // client waits for send and ack
79. Subscribing to a Topic and setting
Subscription Name Java
Consumer<byte[]> consumer = pulsarClient.newConsumer()
.topic(topic)
.subscriptionName(“subscriptionName")
.subscribe();
80. Fanout, Queueing, or Streaming
In Pulsar, you have flexibility to use different subscription modes.
If you want to... Do this... And use these subscription
modes...
achieve fanout messaging
among consumers
specify a unique subscription name
for each consumer
exclusive (or failover)
achieve message queuing
among consumers
share the same subscription name
among multiple consumers
shared
allow for ordered consumption
(streaming)
distribute work among any number of
consumers
exclusive, failover or key shared
82. Delayed
Message
Producing
// Producer creation like normal
Producer<String> producer = client.newProducer(Schema.STRING)
.topic("hellotopic")
.create();
// With an explicit time
producer.newMessage()
.key("mykey")
.value("myvalue")
.deliverAt(millisSinceEpoch)
.send();
// Relative time
producer.newMessage()
.key("mykey")
.value("myvalue")
.deliverAfter(1, TimeUnit.HOURS)
.send();
83. Messaging
versus
Streaming
Messaging Use Cases Streaming Use Cases
Service x commands service y to
make some change.
Example: order service removing
item from inventory service
Moving large amounts of data to
another service (real-time ETL).
Example: logs to elasticsearch
Distributing messages that
represent work among n
workers.
Example: order processing not in
main “thread”
Periodic jobs moving large amounts of
data and aggregating to more
traditional stores.
Example: logs to s3
Sending “scheduled” messages.
Example: notification service for
marketing emails or push
notifications
Computing a near real-time aggregate
of a message stream, split among n
workers, with order being important.
Example: real-time analytics over page
views
84. Differences in consumption
Messaging Use Case Streaming Use Case
Retention The amount of data retained is
relatively small - typically only a day
or two of data at most.
Large amounts of data are retained,
with higher ingest volumes and
longer retention periods.
Throughput Messaging systems are not designed
to manage big “catch-up” reads.
Streaming systems are designed to
scale and can handle use cases
such as catch-up reads.
89. Messaging Ordering Guarantees
To guarantee message ordering, architect Pulsar to
take advantage of subscription modes and topic
ordering guarantees.
Topic Ordering Guarantees
● Messages sent to a single topic or partition DO have an ordering guarantee.
● Messages sent to different partitions DO NOT have an ordering guarantee.
Subscription Mode Guarantees
● A single consumer can receive messages from the same partition in order using an exclusive or failover
subscription mode.
● Multiple consumers can receive messages from the same key in order using the key_shared subscription
mode.