Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrumgaard | Current 2022
At home, I monitor the temperature, humidity, gas levels, ozone, air quality, and other features around my desk.
Let's bring this to the different spots around the conference, including lunch tables, vendor booths, hotel rooms, and more. I need to know about these readings now, not when I get back home from the conference. We need to get these sensor readings immediately in case we need to turn on a fan or move to another area. We will also see if my talk produces a lot of hot air!
My setup is pretty simple: a Raspberry Pi, a Breakout Garden sensor mount, and as many sensors as I am willing to fly to Austin. The software stack is Python and Java, Apache Pulsar, MQTT, HTML, jQuery, and Apache Kafka.
https://dzone.com/articles/five-sensors-real-time-with-pulsar-and-python-on-a
https://www.datainmotion.dev/2022/04/flip-py-pi-enviroplus-using-apache.html
https://dzone.com/articles/pulsar-in-python-on-pi
3. Tim Spann
Developer Advocate at StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Pulsar, Flink, Kafka, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more.
○ Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
4. David Kjerrumgaard
Developer Advocate
● Apache Pulsar Committer | Author of Pulsar in Action
● Former Principal Software Engineer on Splunk’s messaging team responsible for Splunk’s internal Pulsar-as-a-Service platform
● Former Director of Solution Architecture at Streamlio
5. FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
10. Edge Computing
● Any computation happening outside of the cloud, closer to the edge of the network
● Operates on real-time data generated by sensors or users
● Improves response times in applications where real-time processing of data is required
12. Edge Computing with Pulsar
● Apache Pulsar’s two-tier architecture separates the compute and storage layers, which interact with one another over a TCP/IP connection. This allows us to run the computing layer (Broker) on either Edge servers or IoT Gateway devices.
● Our example native applications can stream data via MQTT. We can also write small apps in Java, Python, Golang and other languages to send messages via WebSockets, HTTP, Pulsar, Kafka or other protocols from modern Edge computers.
● Pulsar’s serverless computing framework, known as Pulsar Functions, can run inside the Broker as threads, effectively “stretching” the data processing layer.
streamnative.io
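Whatever protocol carries the message (MQTT, WebSockets, HTTP, Pulsar or Kafka), a small edge app first has to turn a sensor reading into bytes. The following is a minimal sketch of that step using only the Python standard library. The `SensorReading` class and its field names are hypothetical and are not taken from the talk's code; handing the bytes to an actual client is left out on purpose.

```python
import json
import time
from dataclasses import asdict, dataclass


@dataclass
class SensorReading:
    # Hypothetical field names, chosen only for illustration.
    device_id: str
    temperature_c: float
    humidity_pct: float
    ts: float


def to_payload(reading: SensorReading) -> bytes:
    """Serialize a reading to compact UTF-8 JSON bytes."""
    text = json.dumps(asdict(reading), separators=(",", ":"), sort_keys=True)
    return text.encode("utf-8")


def from_payload(payload: bytes) -> SensorReading:
    """Rebuild a reading from bytes produced by to_payload."""
    return SensorReading(**json.loads(payload.decode("utf-8")))


if __name__ == "__main__":
    reading = SensorReading("pi-desk-1", 23.5, 41.0, time.time())
    print(to_payload(reading))
```

Keeping serialization separate from transport means the same payload can be sent over any of the protocols listed above without changing the sensor code.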
13. Benefits of Running Pulsar Broker on the Edge
● Pulsar’s serverless computing framework can run inside the Pulsar Broker as a thread pool. This framework can be used as the execution environment for ML models.
● The Apache Pulsar Broker supports the MQTT protocol and therefore can directly receive incoming data from the sensor hubs and store it in a topic.
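To make the idea of processing inside the broker concrete, here is a small sketch written in the shape of a language-native Python function for Pulsar Functions, which as far as I know is just a module exposing `process(input)`. The JSON field names and the `HOT_THRESHOLD_F` value are assumptions made for illustration, and the function is plain Python, so it can be tried locally without a broker.

```python
import json

# Assumed threshold for flagging a hot spot; not a value from the talk.
HOT_THRESHOLD_F = 85.0


def process(input):
    """Add a Fahrenheit temperature and a too_hot flag to a JSON reading.

    `input` is expected to be a JSON string with a numeric
    "temperature_c" field (a hypothetical field name).
    """
    reading = json.loads(input)
    temp_f = reading["temperature_c"] * 9.0 / 5.0 + 32.0
    reading["temperature_f"] = round(temp_f, 1)
    reading["too_hot"] = temp_f > HOT_THRESHOLD_F
    return json.dumps(reading)


if __name__ == "__main__":
    print(process('{"device_id": "pi-desk-1", "temperature_c": 30.0}'))
```

How such a function is packaged and deployed depends on the Pulsar version and deployment, so that part is deliberately not shown.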
14. Edge Computing Power - Edge Server
● Containers
● 64-bit processors and operating systems
● 8-64 GB of modern RAM
● Fast WiFi / Bluetooth
● 300+ core GPUs
● Fast eMMC storage
● TBs of SSD
● Example: NVIDIA Jetson Xavier NX
23. Apache Pulsar is a Cloud-Native Messaging and Event-Streaming Platform.
25. Pulsar Benefits
● Unified Messaging Model: Simplify your data infrastructure and enable new use cases with queuing and streaming capabilities in one platform.
● Multi-tenancy: Enable multiple user groups to share the same cluster, either via access control, or in entirely different namespaces.
● Scalability: Decoupled data computing and storage enable horizontal scaling to handle data scale and management complexity.
● Geo-replication: Support for multi-datacenter replication with both asynchronous and synchronous replication for built-in disaster recovery.
● Tiered storage: Enable historical data to be offloaded to cloud-native storage and store event streams for indefinite periods of time.
26. Key Pulsar Concepts: Architecture
● “Bookies” (store messages)
○ Store messages and cursors
○ Messages are grouped in segments/ledgers
○ A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
○ Handle message routing and connections
○ Stateless, but with caches
○ Automatic load-balancing
○ Topics are composed of multiple segments
● Metadata Storage (metadata & service discovery)
○ Stores metadata for both Pulsar and BookKeeper
○ Service discovery
27. Pulsar Subscription Modes
Different subscription modes have different semantics:
● Exclusive/Failover - guaranteed order, single active consumer
● Shared - multiple active consumers, no order
● Key_Shared - multiple active consumers, order for given key
[Diagram: Producer 1 and Producer 2 publish to a Pulsar topic that has four subscriptions.]
● Subscription A (Exclusive): Consumer A
● Subscription B (Failover): Consumer B-1, with Consumer B-2 taking over in case of failure in Consumer B-1
● Subscription C (Shared): Consumer C-1 and Consumer C-2; message groups shown: <K1,V10> <K2,V21> <K1,V12> and <K2,V20> <K1,V11> <K2,V22>
● Subscription D (Key_Shared): Consumer D-1 and Consumer D-2; message groups shown: <K1,V10> <K1,V11> <K1,V12> and <K2,V20> <K2,V21> <K2,V22>
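The difference between Shared and Key_Shared dispatch can be illustrated with a toy simulation. This is not Pulsar's dispatcher: round-robin and a CRC32 hash of the key are simplified stand-ins chosen for the sketch, and the message order in `MESSAGES` is an assumed publish order built from the keys and values on the slide.

```python
from collections import defaultdict
from itertools import cycle
from zlib import crc32

# Assumed publish order; per-key order is V10, V11, V12 and V20, V21, V22.
MESSAGES = [
    ("K1", "V10"), ("K2", "V20"), ("K2", "V21"),
    ("K1", "V11"), ("K1", "V12"), ("K2", "V22"),
]


def dispatch_shared(messages, consumers):
    """Round-robin: all consumers are active, keys are spread across them."""
    out = defaultdict(list)
    for consumer, message in zip(cycle(consumers), messages):
        out[consumer].append(message)
    return dict(out)


def dispatch_key_shared(messages, consumers):
    """Hash each key to one consumer, so a key's messages stay together in order."""
    out = defaultdict(list)
    for key, value in messages:
        index = crc32(key.encode("utf-8")) % len(consumers)
        out[consumers[index]].append((key, value))
    return dict(out)


if __name__ == "__main__":
    print("Shared:    ", dispatch_shared(MESSAGES, ["C-1", "C-2"]))
    print("Key_Shared:", dispatch_key_shared(MESSAGES, ["D-1", "D-2"]))
```

With only two keys, the hash may place both on the same consumer; the property being shown is that a given key never splits across consumers, not that load is evenly balanced.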
28. Messaging Ordering Guarantees
Topic Ordering Guarantees:
● Messages sent to a single topic or partition DO have an ordering guarantee.
● Messages sent to different partitions DO NOT have an ordering guarantee.
Subscription Mode Guarantees:
● A single consumer can receive messages from the same partition in order using an exclusive or failover subscription mode.
● Multiple consumers can receive messages from the same key in order using the key_shared subscription mode.
37. Schema Registry
[Diagram: producers and consumers on either side of a Schema Registry holding schema-1, schema-2 and schema-3, each with value = Avro/Protobuf/JSON.]
● Schema Registry: stores each schema together with its ID
● Producers
○ Keep a local cache for schemas (schema data + ID)
○ Send (register) schema (if not in local cache)
○ Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID
● Consumers
○ Keep a local cache for schemas (schema data + ID)
○ Get schema by ID (if not in local cache)
○ Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID
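The register, cache and look-up-by-ID flow on this slide can be sketched with an in-memory stand-in. Everything here is hypothetical scaffolding (the `SchemaRegistry`, `Producer` and `Consumer` classes, the dict-based "schema", and JSON as the wire format); it only illustrates how local caches avoid repeated trips to the registry and is not Pulsar's schema API.

```python
import json


class SchemaRegistry:
    """In-memory stand-in: assigns an ID to each distinct schema."""

    def __init__(self):
        self._by_id = {}
        self._ids = {}
        self.lookups = 0  # counts get() calls, to show the cache working

    def register(self, schema):
        key = json.dumps(schema, sort_keys=True)
        if key not in self._ids:
            self._ids[key] = len(self._by_id) + 1
            self._by_id[self._ids[key]] = schema
        return self._ids[key]

    def get(self, schema_id):
        self.lookups += 1
        return self._by_id[schema_id]


class Producer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # schema JSON -> schema ID

    def send(self, schema, record):
        key = json.dumps(schema, sort_keys=True)
        if key not in self.cache:  # register only if not in local cache
            self.cache[key] = self.registry.register(schema)
        missing = set(schema["fields"]) - set(record)
        if missing:
            raise ValueError(f"record is missing fields: {sorted(missing)}")
        data = json.dumps(record).encode("utf-8")
        return {"schema_id": self.cache[key], "data": data}


class Consumer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # schema ID -> schema

    def receive(self, message):
        schema_id = message["schema_id"]
        if schema_id not in self.cache:  # fetch only if not in local cache
            self.cache[schema_id] = self.registry.get(schema_id)
        record = json.loads(message["data"].decode("utf-8"))
        return {name: record[name] for name in self.cache[schema_id]["fields"]}


if __name__ == "__main__":
    registry = SchemaRegistry()
    schema = {"name": "schema-1", "fields": ["aqi", "reportingArea"]}
    producer, consumer = Producer(registry), Consumer(registry)
    for aqi in (42, 55):
        message = producer.send(schema, {"aqi": aqi, "reportingArea": "Austin"})
        print(consumer.receive(message))
    print("registry lookups:", registry.lookups)
```

Only the schema ID travels with each message; the consumer contacts the registry once per new ID and serves later messages from its cache.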
40. SQL
select aqi, parameterName, dateObserved, hourObserved, latitude, longitude,
       localTimeZone, stateCode, reportingArea
from airquality;

select max(aqi) as MaxAQI, parameterName, reportingArea
from airquality
group by parameterName, reportingArea;

select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as AvgAQI,
       count(aqi) as RowCount, parameterName, reportingArea
from airquality
group by parameterName, reportingArea;
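To see what the last aggregate returns, the query can be tried against a tiny in-memory table. This sketch uses SQLite from the Python standard library purely as a stand-in engine; the slide does not say which SQL engine was used. The rows in `ROWS` are invented sample values, and an `ORDER BY` clause is added only to make the output order deterministic.

```python
import sqlite3

# Invented sample rows: (aqi, parameterName, reportingArea)
ROWS = [
    (42, "O3", "Austin"),
    (55, "O3", "Austin"),
    (30, "PM2.5", "Austin"),
    (61, "PM2.5", "Dallas"),
]

QUERY = """
SELECT max(aqi) AS MaxAQI, min(aqi) AS MinAQI, avg(aqi) AS AvgAQI,
       count(aqi) AS RowCount, parameterName, reportingArea
FROM airquality
GROUP BY parameterName, reportingArea
ORDER BY parameterName, reportingArea
"""


def summarize(rows):
    """Load rows into an in-memory table and run the aggregate query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE airquality "
            "(aqi INTEGER, parameterName TEXT, reportingArea TEXT)"
        )
        conn.executemany("INSERT INTO airquality VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(row)
```

Each output row is one (parameterName, reportingArea) group, so the two O3 readings for Austin collapse into a single row with their max, min, average and count.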
41. Building Spark SQL View
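// Read the sensor topic as a streaming DataFrame using the "pulsar" source
val dfPulsar = spark.readStream.format("pulsar")
  .option("service.url", "pulsar://pulsar1:6650")
  .option("admin.url", "http://pulsar1:8080")
  .option("topic", "persistent://public/default/pi-sensors")
  .load()

// Print the schema of the streaming DataFrame
dfPulsar.printSchema()

// Write every column to the console without truncating long values
val pQuery = dfPulsar.selectExpr("*")
  .writeStream.format("console")
  .option("truncate", false)
  .start()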
https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
42. IoT Data
IoT Ingestion: High-volume streaming sources, sensors, multiple message formats, diverse protocols and multi-vendor devices create data ingestion challenges.
Other Sources: Transit data, news, Twitter, status feeds, REST data, stock data and more.
46. Resources
● For a first look at the Pulsar benchmark report, share your email in the chat
● Join the Pulsar Slack channel - Apache-Pulsar.slack.com
● Follow @streamnativeio and @apache_pulsar on Twitter
● Contact StreamNative Sales - doug@streamnative.io
47. Too Many Tim Links
● https://dzone.com/articles/five-sensors-real-time-with-pulsar-and-python-on-a
● https://github.com/tspannhw/airquality
● https://github.com/tspannhw/FLiPN-AirQuality-REST
● https://github.com/tspannhw/pulsar-airquality-function
● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
● https://github.com/tspannhw/FLiPN-DEVNEXUS-2022
● https://github.com/tspannhw/FLiP-Pi-Thermal
● https://github.com/tspannhw/FLiP-Pi-Weather
● https://github.com/tspannhw/FLiP-RP400
● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
48. StreamNative: By the Creators Of Apache Pulsar
✓ Original creators of Apache Pulsar & BookKeeper
✓ Operated the largest Pulsar/BookKeeper cluster
✓ Data veterans with extensive industry experience
● Sijie Guo: Founder and CEO, ASF Member, Pulsar/BookKeeper PMC
● Matteo Merli: CTO, ASF Member, Pulsar/BookKeeper PMC
● Jia Zhai: Co-Founder, Pulsar/BookKeeper PMC