Python Web Conference 2022: Apache Pulsar Development 101 with Python (FLiP-Py)
March 18, 2022
● What is Apache Pulsar?
● Python 3 Coding
● Python Consumers
● Python Producers
● Python via MQTT, Web Sockets, Kafka
● Python for Pulsar Functions
● Schemas
Tim Spann, Developer Advocate at StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more.
○ Today, he helps grow the Pulsar community by sharing technical knowledge and experience at global conferences and through individual conversations.
streamnative.io
Passionate and dedicated team, founded by the original developers of Apache Pulsar.
StreamNative helps teams capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
Apache Pulsar Training
● Instructor-led courses
○ Pulsar Fundamentals
○ Pulsar Developers
○ Pulsar Operations
● On-demand learning with labs
● 300+ engineers, admins and architects trained!
StreamNative Academy: on-demand Pulsar training now available at Academy.StreamNative.io
Pulsar Cluster
● “Bookies”
○ Store messages and cursors
○ Messages are grouped in segments/ledgers
○ A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
○ Handle message routing and connections
○ Stateless, but with caches
○ Automatic load-balancing
○ Topics are composed of multiple segments
● Metadata Store (ZK, RocksDB, etcd, …)
○ Stores metadata for both Pulsar and BookKeeper
○ Service discovery
Pulsar’s Publish-Subscribe model
(Diagram: Producer 1 and Producer 2 publish to a Topic on the Broker; a Subscription delivers the messages to Consumers 1, 2, and 3.)
● Producers send messages.
● Topics are ordered, named channels that producers use to transmit messages to subscribed consumers.
● Messages belong to a topic and contain an arbitrary payload.
● Brokers handle connections and route messages between producers and consumers.
● Subscriptions are named configuration rules that determine how messages are delivered to consumers.
● Consumers receive messages.
Messages - the Basic Unit of Pulsar
Component: Description
● Value / data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
● Key: Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
● Properties: An optional key/value map of user-defined properties.
● Producer name: The name of the producer that produced the message. If you do not specify a producer name, the default name is used.
● Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
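A minimal sketch of how these message components might map onto the Python client. It assumes `producer` is a pulsar-client `Producer` and `msg` is a received pulsar-client `Message`; the helper names `send_tagged` and `describe` are hypothetical and only for illustration.

```python
# Sketch only: `producer` is expected to be a pulsar-client Producer and
# `msg` a received pulsar-client Message. Any object with the same methods works.

def send_tagged(producer, text, key=None, properties=None):
    """Send text as UTF-8 bytes, optionally tagged with a key and properties."""
    producer.send(
        text.encode("utf-8"),    # value / data payload: raw bytes
        partition_key=key,       # key: used for partitioning and compaction
        properties=properties,   # optional user-defined key/value map
    )


def describe(msg):
    """Collect the payload, key, and properties of a message into a dict."""
    return {
        "value": msg.data().decode("utf-8"),
        "key": msg.partition_key(),
        "properties": msg.properties(),
    }
```

The producer name is set when the producer is created (the `producer_name` argument of `create_producer`), not per message.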
Schema Registry
(Diagram: the Schema Registry holds schema-1, schema-2, and schema-3, each with value = Avro/Protobuf/JSON. Each entry pairs schema data with an ID. Producers and consumers each keep a local cache for schemas.)
Producers
● Send (register) schema (if not in local cache)
● Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID
Consumers
● Get schema by ID (if not in local cache)
● Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID
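The register-once, look-up-once flow above can be sketched with a toy in-memory model. This is not the Pulsar schema API: `ToyRegistry`, `ToyProducer`, and `ToyConsumer` are hypothetical classes, schemas are plain dicts, and JSON stands in for the real serialization formats.

```python
import json


class ToyRegistry:
    """In-memory stand-in for a schema registry: definition <-> numeric ID."""

    def __init__(self):
        self._ids = {}     # canonical definition string -> schema ID
        self._by_id = {}   # schema ID -> definition
        self.lookups = 0   # how often a consumer had to ask the registry

    def register(self, definition):
        key = json.dumps(definition, sort_keys=True)
        if key not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[key] = schema_id
            self._by_id[schema_id] = definition
        return self._ids[key]

    def get(self, schema_id):
        self.lookups += 1
        return self._by_id[schema_id]


class ToyProducer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # local cache: definition string -> schema ID

    def send(self, definition, record):
        key = json.dumps(definition, sort_keys=True)
        if key not in self.cache:  # register only if not in local cache
            self.cache[key] = self.registry.register(definition)
        missing = set(definition["fields"]) - set(record)
        if missing:
            raise ValueError(f"record is missing fields: {sorted(missing)}")
        payload = json.dumps(record).encode("utf-8")
        return {"schema_id": self.cache[key], "payload": payload}


class ToyConsumer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # local cache: schema ID -> definition

    def read(self, message):
        schema_id = message["schema_id"]
        if schema_id not in self.cache:  # fetch only if not in local cache
            self.cache[schema_id] = self.registry.get(schema_id)
        record = json.loads(message["payload"])
        return {name: record[name] for name in self.cache[schema_id]["fields"]}
```

In the real Python client, schemas are declared with classes from `pulsar.schema` and passed to the producer and consumer; the caching shown here happens inside the client.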
Pulsar Functions
● Lightweight computation similar to AWS Lambda.
● Specifically designed to use Apache Pulsar as a message bus.
● Function runtime can be located within Pulsar Broker.
● Python Functions
A serverless event streaming framework
● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go)
● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
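The consume, process, publish cycle can be sketched with the language-native style of Python function, where a module exposes a plain `process(input)` function and the runtime publishes its return value to the output topic. The `text` and `shout` field names are hypothetical, chosen only for illustration.

```python
import json


def process(input):
    """Turn one incoming JSON message into one outgoing JSON message."""
    fields = json.loads(input)
    # Hypothetical transformation: add an upper-cased copy of the "text" field.
    fields["shout"] = str(fields.get("text", "")).upper()
    # The returned string is what would be published to the output topic.
    return json.dumps(fields)
```

Because the function is plain Python, it can be unit-tested without a running cluster before it is deployed.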
Pulsar Functions
Function Mesh
Pulsar Functions, along with Pulsar IO/Connectors, provide a powerful API for ingesting, transforming, and outputting data.
Function Mesh, another StreamNative project, makes it easier for developers to create entire applications built from sources, functions, and sinks all through a declarative API.
Run a Local Standalone Cluster on Bare Metal
wget https://archive.apache.org/dist/pulsar/pulsar-2.9.1/apache-pulsar-2.9.1-bin.tar.gz
tar xvfz apache-pulsar-2.9.1-bin.tar.gz
cd apache-pulsar-2.9.1
bin/pulsar standalone
(For Pulsar SQL Support)
bin/pulsar sql-worker start
https://pulsar.apache.org/docs/en/standalone/
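Before running any client code, it can help to confirm that the standalone broker is accepting connections. A minimal sketch, assuming the broker listens on the default binary port 6650 on localhost; `broker_is_listening` is a hypothetical helper, and an open port does not prove the broker is fully initialized.

```python
import socket


def broker_is_listening(host="localhost", port=6650, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False
```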
<or> Run in StreamNative Cloud
Scan the QR code to earn
$200 in cloud credit
Building Tenant, Namespace, Topics
bin/pulsar-admin tenants create conference
bin/pulsar-admin namespaces create conference/pythonweb
bin/pulsar-admin tenants list
bin/pulsar-admin namespaces list conference
bin/pulsar-admin topics create persistent://conference/pythonweb/first
bin/pulsar-admin topics list conference/pythonweb
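The topic created above follows the pattern `persistent://tenant/namespace/topic`. A small sketch of building and splitting such names in Python; `topic_name` and `parse_topic_name` are hypothetical helpers, not part of the Pulsar client.

```python
def topic_name(tenant, namespace, topic, persistent=True):
    """Build a fully qualified Pulsar topic name."""
    scheme = "persistent" if persistent else "non-persistent"
    return f"{scheme}://{tenant}/{namespace}/{topic}"


def parse_topic_name(name):
    """Split a fully qualified topic name back into its parts."""
    scheme, separator, rest = name.partition("://")
    parts = rest.split("/", 2)
    if not separator or len(parts) != 3:
        raise ValueError(f"not a fully qualified topic name: {name!r}")
    tenant, namespace, topic = parts
    return {
        "persistent": scheme == "persistent",
        "tenant": tenant,
        "namespace": namespace,
        "topic": topic,
    }
```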
Install Python 3 Pulsar Client
pip3 install 'pulsar-client[all]==2.9.1'
# Depending on the platform, you may need to build the C++ client
For Python on Pulsar on Pi: https://github.com/tspannhw/PulsarOnRaspberryPi
https://pulsar.apache.org/docs/en/client-libraries-python/
Building a Python 3 Producer
import pulsar
client = pulsar.Client('pulsar://localhost:6650')
producer = client.create_producer('persistent://conference/pythonweb/first')
producer.send(('Simple Text Message').encode('utf-8'))
client.close()
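The matching consumer side can be sketched as follows. This is a minimal sketch, assuming `client` is a `pulsar.Client` connected as in the producer above; the `consume` helper, its parameters, and any subscription name passed to it are hypothetical.

```python
def consume(client, topic, subscription, max_messages=10, timeout_millis=5000):
    """Receive up to max_messages text messages and acknowledge each one."""
    consumer = client.subscribe(topic, subscription_name=subscription)
    received = []
    try:
        for _ in range(max_messages):
            try:
                msg = consumer.receive(timeout_millis=timeout_millis)
            except Exception:
                # Assumption: the client raises when the timeout elapses with
                # no message; the exact exception class varies by version.
                break
            received.append(msg.data().decode("utf-8", errors="replace"))
            # Acknowledge so the broker does not redeliver the message.
            consumer.acknowledge(msg)
    finally:
        consumer.close()
    return received
```

With a broker running, this could be called as `consume(pulsar.Client('pulsar://localhost:6650'), 'persistent://conference/pythonweb/first', 'my-subscription')`.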
Pulsar IO Functions in Python
from pulsar import Function
import json

class Chat(Function):
    def __init__(self):
        pass

    def process(self, input, context):
        # The context provides a logger and the ID of the current message.
        logger = context.get_logger()
        msg_id = context.get_message_id()
        # Parse the incoming message body as JSON.
        fields = json.loads(input)
https://github.com/tspannhw/pulsar-pychat-function
Python For Pulsar on Pi
● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
● https://github.com/tspannhw/FLiP-Pi-Thermal
● https://github.com/tspannhw/FLiP-Pi-Weather
● https://github.com/tspannhw/FLiP-RP400
● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
● https://github.com/tspannhw/FLiP-PY-FakeDataPulsar
● https://github.com/tspannhw/FLiP-Py-Pi-EnviroPlus
● https://github.com/tspannhw/PythonPulsarExamples
● https://github.com/tspannhw/pulsar-pychat-function
Connect with the Community
● Join the Pulsar Slack channel - Apache-Pulsar.slack.com
● Follow @streamnativeio and @apache_pulsar on Twitter
● Subscribe to the monthly Pulsar Newsletter for major news, events, project updates, and resources in the Pulsar community
Let’s Keep in Touch!
Tim Spann
Developer Advocate
@PassDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
Pulsar Subscription Modes
Different subscription modes have different semantics:
● Exclusive/Failover - guaranteed order, single active consumer
● Shared - multiple active consumers, no order
● Key_Shared - multiple active consumers, order for given key
(Diagram: Producer 1 and Producer 2 publish <K1,V10>, <K1,V11>, <K1,V12>, <K2,V20>, <K2,V21>, <K2,V22> to a Pulsar Topic with four subscriptions.)
● Subscription A (Exclusive): Consumer A
● Subscription B (Failover): Consumer B-1, with Consumer B-2 taking over in case of failure in Consumer B-1
● Subscription C (Shared): Consumer C-1 and Consumer C-2; one receives <K1,V10>, <K2,V21>, <K1,V12> and the other <K2,V20>, <K1,V11>, <K2,V22>
● Subscription D (Key-Shared): Consumer D-1 and Consumer D-2; one receives <K1,V10>, <K1,V11>, <K1,V12> and the other <K2,V20>, <K2,V21>, <K2,V22>
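The difference between Shared and Key_Shared delivery can be illustrated with a toy dispatcher. This is a simplified model, not the broker's algorithm: round-robin stands in for Shared dispatch, a CRC32 hash of the key stands in for Key_Shared key-to-consumer assignment, and the function names are hypothetical.

```python
import zlib
from collections import defaultdict
from itertools import cycle


def dispatch_shared(messages, consumers):
    """Toy Shared mode: hand messages to consumers round-robin, ignoring keys."""
    assignment = defaultdict(list)
    for consumer, message in zip(cycle(consumers), messages):
        assignment[consumer].append(message)
    return dict(assignment)


def dispatch_key_shared(messages, consumers):
    """Toy Key_Shared mode: every message with the same key goes to one consumer."""
    assignment = defaultdict(list)
    for key, value in messages:
        # A stable hash keeps the key-to-consumer mapping deterministic.
        index = zlib.crc32(key.encode("utf-8")) % len(consumers)
        assignment[consumers[index]].append((key, value))
    return dict(assignment)
```

In the Shared model, messages for K1 can land on both consumers, so per-key order across consumers is lost. In the Key_Shared model, each key has a single owner and its messages stay in publish order.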
<or> Run in Docker
docker run -it \
  -p 6650:6650 \
  -p 8080:8080 \
  --mount source=pulsardata,target=/pulsar/data \
  --mount source=pulsarconf,target=/pulsar/conf \
  apachepulsar/pulsar:2.9.1 \
  bin/pulsar standalone
https://pulsar.apache.org/docs/en/standalone-docker/
<or> Run in K8
https://github.com/streamnative/terraform-provider-pulsar#requirements
https://pulsar.apache.org/docs/en/helm-overview/
https://docs.streamnative.io/platform/v1.0.0/quickstart
Using NVIDIA Jetson Devices With Pulsar
https://dev.to/tspannhw/unboxing-the-most-amazing-edge-ai-device-part-1-of-3-nvidia-jetson-xavier-nx-595k
https://github.com/tspannhw/minifi-xaviernx/
https://github.com/tspannhw/minifi-jetson-nano
https://github.com/tspannhw/Flip-iot
https://www.datainmotion.dev/2020/10/flank-streaming-edgeai-on-new-nvidia.html
https://github.com/tspannhw/FLiP-Mobile/blob/30bcc1ec98fc31e039b51a06180d98545c1e0542/python3/enviro.py
https://github.com/tspannhw/FLiP-Energy
https://github.com/tspannhw/FLiP-ApacheCon2021WrapUp
Messaging Ordering Guarantees
To guarantee message ordering, architect Pulsar to take advantage of subscription modes and topic ordering guarantees.
Topic Ordering Guarantees
● Messages sent to a single topic or partition DO have an ordering guarantee.
● Messages sent to different partitions DO NOT have an ordering guarantee.
Subscription Mode Guarantees
● A single consumer can receive messages from the same partition in order using an exclusive or failover subscription mode.
● Multiple consumers can receive messages from the same key in order using the key_shared subscription mode.
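In the Python client, per-key ordering might be set up as sketched below: the producer tags each message with a key, and each consumer subscribes with the Key_Shared consumer type. The helper names are hypothetical, and `pulsar` is imported lazily so the sending helper can be tried without the client installed.

```python
def send_in_key_order(producer, key, values):
    """Send values one after another, all tagged with the same key."""
    for value in values:
        # Messages that share a partition key keep their relative order.
        producer.send(str(value).encode("utf-8"), partition_key=key)


def subscribe_key_shared(client, topic, subscription):
    """Subscribe so that several consumers can share a topic, ordered per key."""
    import pulsar  # requires the pulsar-client package

    return client.subscribe(
        topic,
        subscription_name=subscription,
        consumer_type=pulsar.ConsumerType.KeyShared,
    )
```

Swapping in `pulsar.ConsumerType.Exclusive`, `Failover`, or `Shared` selects the other modes.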