SlideShare uma empresa Scribd logo
1 de 49
Baixar para ler offline
Let's Monitor The Conditions at
the Conference
2
Tim Spann
Developer Advocate
Tim Spann, Developer Advocate at StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFI Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Pulsar,
Flink, Kafka, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and
more.
○ Today, he helps to grow the Pulsar community sharing rich technical
knowledge and experience at both global conferences and through
individual conversations.
David Kjerrumgaard
Developer Advocate
● Apache Pulsar Committer | Author of Pulsar
In Action
● Former Principal Software Engineer on
Splunk’s messaging team responsible for
Splunk’s internal Pulsar-as-a-Service
platform
● Former Director of Solution Architecture at
Streamlio
4
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
streamnative.io
Agenda
• Collect the Data
• Stream the Data
• Store the Data
• Share the Data
streamnative.io
What?
streamnative.io
Temp, Humidity, Air Quality, Energy, …
streamnative.io
Is this (I)IoT?
Edge Computing?
● Any computation happening
outside of the cloud, closer
to the edge of the network
● Operates on real-time data
generated by sensors or users
● Improves response times in
applications where real-time
processing of data is required
Edge Computing
streamnative.io
Extending the Data Processing Layer
PULSAR
Edge Compute
streamnative.io
● Apache Pulsar’s two-tier architecture separates the compute and
storage layers and interact with one another over a TCP/IP
connection. This allows us to run the computing layer (Broker) on
either Edge servers or IoT Gateway devices.
● Our example native applications can stream data via MQTT. We
can also write small apps in Java, Python, Golang and other
languages to send messages via WebSockets, HTTP, Pulsar, Kafka
or other protocols from modern Edge computers.
● Pulsar’s serverless computing framework, know as Pulsar
Functions, can run inside the Broker as threads. Effectively
“stretching” the data processing layer.
Edge Computing with Pulsar
streamnative.io
● Pulsar’s Serverless computing framework can run inside the Pulsar Broker
as a thread pool. This framework can be used as the execution environment
for ML models.
● The Apache Pulsar Broker supports the MQTT protocol and therefore can
directly receive incoming data from the sensor hubs and store it in a topic.
Benefits of Running Pulsar Broker on the Edge
PULSAR
Edge Compute
streamnative.io
● Containers
● 64 bit processors and operating systems
● 8-64 GB Modern RAM
● Fast WiFi / Bluetooth
● 300+ Core GPUs
● eMMC Fast Storage
● TBs of SSD
● Examples: NVIDIA JETSON XAVIER NX
Edge Computing Power - Edge Server
streamnative.io
Device 1 - AdaFruit Funhouse
• https://github.com/tspannhw/pulsar-adafruit-funhouse
(MQTT)
Raw JSON:
{"pressure": 1009.08,
"button_sel": "off",
"pir_sensor": "off",
"humidity": 36.0422, "temperature": 80.9526,
"button_down": "off", "captouch6": "off",
"captouch7": "off", "button_up": "off", "captouch8": "off",
"light": 6990}
Processor 240MHz / RAM 2+4MB
streamnative.io
Device 2 - Raspberry Pi
• https://github.com/tspannhw/FLiP-Pi-DeltaLake-Thermal
Pulsar Protocol
Raw JSON:
Processor 1.5 GHz, 64-bit quad-core / RAM 2-8 GB LPDDR4-3200 SDRAM
{"uuid": "thrml_zda_20220715182748", "ipaddress": "192.168.1.204",
"cputempf": 108, "runtime": 0, "host": "thermal", "hostname": "thermal",
"macaddress": "e4:5f:01:7c:3f:34", "endtime": "1657909668.7279365",
"te": "0.0007398128509521484", "cpu": 1.8,
"diskusage": "105078.0 MB",
"memory": 9.0, "rowid": "20220715182748_fc4cbbb1-79da-4c1a-8991-78bd23c9f221",
"systemtime": "07/15/2022 14:27:53", "ts": 1657909673,
"starttime": "07/15/2022 14:27:48",
"datetimestamp": "2022-07-15 18:27:52.492469+00:00", "
temperature": 28.238,
"humidity": 29.61, "co2": 992.0}
streamnative.io
Device 2 - RPI 4 - 2GB
streamnative.io
Device 3 - Mac M1 PowerBook
https://github.com/search?q=user%3Atspannhw+airquality&type=repositories
Pulsar, AMQP, MQTT, Kafka Protocols
Raw JSON: {"dateObserved":"2022-08-03",
"hourObserved":13,"localTimeZone":"CST",
"reportingArea":"El
Paso","stateCode":"TX","latitude":31.8493,
"longitude":-106.4375,
"parameterName":"PM10","aqi":23,
"category":{"number":1,"name":"Good","additionalP
roperties":{}},"additionalProperties":{}}
Processor Apple M1 Pro 10-core 3.2GHz CPU 16-core GPU/ RAM 32 GB
streamnative.io
HS100 Meter - Electric
https://github.com/tspannhw/FLiP-Py-Energy
streamnative.io
streamnative.io
streamnative.io
Apache Pulsar is a Cloud-Native
Messaging and Event-Streaming Platform.
Unified Messaging Model
Simplify your data infrastructure and
enable new use cases with queuing and
streaming capabilities in one platform.
Multi-tenancy
Enable multiple user groups to share the
same cluster, either via access control, or
in entirely different namespaces.
Scalability
Decoupled data computing and storage
enable horizontal scaling to handle data
scale and management complexity.
Geo-replication
Support for multi-datacenter replication
with both asynchronous and
synchronous replication for built-in
disaster recovery.
Tiered storage
Enable historical data to be offloaded to
cloud-native storage and store event
streams for indefinite periods of time.
Pulsar Benefits
● “Bookies”
● Stores messages and cursors
● Messages are grouped in
segments/ledgers
● A group of bookies form an
“ensemble” to store a ledger
● “Brokers”
● Handles message routing and
connections
● Stateless, but with caches
● Automatic load-balancing
● Topics are composed of
multiple segments
●
● Stores metadata for both
Pulsar and BookKeeper
● Service discovery
Store
Messages
Metadata &
Service Discovery
Metadata &
Service Discovery
Key Pulsar Concepts: Architecture
MetaData
Storage
Pulsar Subscription Modes
Different subscription modes
have different semantics:
Exclusive/Failover - guaranteed
order, single active consumer
Shared - multiple active
consumers, no order
Key_Shared - multiple active
consumers, order for given key
Producer 1
Producer 2
Pulsar Topic
Subscription D
Consumer D-1
Consumer D-2
Key-Shared
<
K
1,
V
10
>
<
K
1,
V
11
>
<
K
1,
V
12
>
<
K
2
,V
2
0
>
<
K
2
,V
2
1>
<
K
2
,V
2
2
>
Subscription C
Consumer C-1
Consumer C-2
Shared
<
K
1,
V
10
>
<
K
2,
V
21
>
<
K
1,
V
12
>
<
K
2
,V
2
0
>
<
K
1,
V
11
>
<
K
2
,V
2
2
>
Subscription A Consumer A
Exclusive
Subscription B
Consumer B-1
Consumer B-2
In case of failure in
Consumer B-1
Failover
Messaging
Ordering Guarantees
Topic Ordering Guarantees:
● Messages sent to a single topic or
partition DO have an ordering
guarantee.
● Messages sent to different partitions
DO NOT have an ordering guarantee.
28
Subscription Mode Guarantees:
● A single consumer can receive
messages from the same partition in
order using an exclusive or failover
subscription mode.
● Multiple consumers can receive
messages from the same key in order
using the key_shared subscription
mode.
Messaging
Ordering Guarantees
Topic Ordering Guarantees:
● Messages sent to a single topic or
partition DO have an ordering
guarantee.
● Messages sent to different partitions
DO NOT have an ordering guarantee.
29
Subscription Mode Guarantees:
● A single consumer can receive
messages from the same partition in
order using an exclusive or failover
subscription mode.
● Multiple consumers can receive
messages from the same key in order
using the key_shared subscription
mode.
Streaming
Consumer
Consumer
Consumer
Subscription
Shared
Failover
Consumer
Consumer
Subscription
In case of failure in
Consumer B-0
Consumer
Consumer
Subscription
Exclusive
X
Consumer
Consumer
Key-Shared
Subscription
Pulsar
Topic/Partition
Messaging
Unified Messaging
Model
Topics
Tenants
(Compliance)
Tenants
(Data Services)
Namespace
(Microservices)
Topic-1
(Cust Auth)
Topic-1
(Location Resolution)
Topic-2
(Demographics)
Topic-1
(Budgeted Spend)
Topic-1
(Acct History)
Topic-1
(Risk Detection)
Namespace
(ETL)
Namespace
(Campaigns)
Namespace
(ETL)
Tenants
(Marketing)
Namespace
(Risk Assessment)
Pulsar Cluster
Pulsar Cluster
Kafka
On Pulsar
(KoP)
MQTT
On Pulsar
(MoP)
AMQP
On Pulsar
(AoP)
Connectivity
• Functions - Lightweight Stream
Processing (Java, Python, Go)
• Connectors - Sources & Sinks
(Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP
(Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark,
Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage - (S3)
Schema Registry
Schema Registry
schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3
(value=Avro/Protobuf/JSON)
Schema
Data
ID
Local Cache
for Schemas
+
Schema
Data
ID +
Local Cache
for Schemas
Send schema-1
(value=Avro/Protobuf/JSON) data
serialized per schema ID
Send (register)
schema (if not in
local cache)
Read schema-1
(value=Avro/Protobuf/JSON) data
deserialized per schema ID
Get schema by ID (if
not in local cache)
Producers Consumers
Presto/Trino workers can read segments
directly from bookies (or offloaded storage) in
parallel. Bookie
1
Segment 1
Producer Consumer
Broker 1
Topic1-Part1
Broker 2
Topic1-Part2
Broker 3
Topic1-Part3
Segment
2
Segment
3
Segment
4
Segment X
Segment 1
Segment
1 Segment 1
Segment 3
Segment
3
Segment 3
Segment 2
Segment
2
Segment 2
Segment 4
Segment 4
Segment
4
Segment X
Segment X
Segment X
Bookie
2
Bookie
3
Query
Coordin
ator
.
.
.
.
.
.
SQL
Worker
SQL
Worker
SQL
Worker
SQL
Worker
Query
Topic
Metadata
Pulsar SQL
Apache NiFi Pulsar Connector
https://streamnative.io/apache-nifi-connector/
SQL
select aqi, parameterName, dateObserved, hourObserved, latitude,
longitude, localTimeZone, stateCode, reportingArea from
airquality;
select max(aqi) as MaxAQI, parameterName, reportingArea from
airquality group by parameterName, reportingArea;
select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as
AvgAQI, count(aqi) as RowCount, parameterName, reportingArea
from airquality group by parameterName, reportingArea;
Building Spark SQL View
val dfPulsar = spark.readStream.format("pulsar")
.option("service.url", "pulsar://pulsar1:6650")
.option("admin.url", "http://pulsar1:8080")
.option("topic", "persistent://public/default/pi-sensors")
.load()
dfPulsar.printSchema()
val pQuery = dfPulsar.selectExpr("*")
.writeStream.format("console")
.option("truncate", false)
.start()
https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
IoT Data
IoT Ingestion: High-volume
streaming sources, sensors,
multiple message formats,
diverse protocols and
multi-vendor devices
creates data ingestion
challenges.
Other Sources: Transit data,
news, twitter, status feeds,
REST data, stock data and
more.
Demo Time
Q&A
Now Available
On-Demand Pulsar
Training
Academy.StreamNative.io
45
Resources
● For a first look at Pulsar benchmark report, share your email in the chat
● Join the Pulsar Slack channel - Apache-Pulsar.slack.com
● Follow @streamnativeio and @apache_pulsar on Twitter
● Contact StreamNative Sales - doug@streamnative.io
Too Many Tim Links
● https://dzone.com/articles/five-sensors-real-time-with-pulsar-and-python-on-a
● https://github.com/tspannhw/airquality
● https://github.com/tspannhw/FLiPN-AirQuality-REST
● https://github.com/tspannhw/pulsar-airquality-function
● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
● https://github.com/tspannhw/FLiPN-DEVNEXUS-2022
● https://github.com/tspannhw/FLiP-Pi-Thermal
● https://github.com/tspannhw/FLiP-Pi-Weather
● https://github.com/tspannhw/FLiP-RP400
● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
StreamNative: By the Creators Of Apache Pulsar
✓ Original creators of Apache
Pulsar & BookKeeper
✓ Operated the largest
Pulsar/BookKeeper cluster
✓ Data veterans with extensive
industry experience
CONFIDENTIAL. DO NOT SHARE.
ASF Member
Pulsar/BookKeeper PMC
Founder and CEO
Sijie Guo
ASF Member
Pulsar/BookKeeper PMC
CTO
Matteo Merli
Pulsar/BookKeeper PMC
Co-Founder
Jia Zhai
Tim Spann
Developer Advocate
@PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw

Mais conteúdo relacionado

Semelhante a Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrumgaard | Current 2022

Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...Timothy Spann
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeTimothy Spann
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...Timothy Spann
 
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...Timothy Spann
 
[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...Timothy Spann
 
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays
 
Building an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache PulsarBuilding an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache PulsarScyllaDB
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureTimothy Spann
 
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8[Conf42-KubeNative] Building Real-time Pulsar Apps on K8
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8Timothy Spann
 
[AI Dev World 2022] Build ML Enhanced Event Streaming
[AI Dev World 2022] Build ML Enhanced Event Streaming[AI Dev World 2022] Build ML Enhanced Event Streaming
[AI Dev World 2022] Build ML Enhanced Event StreamingTimothy Spann
 
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)Timothy Spann
 
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Timothy Spann
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Timothy Spann
 
Serverless Event Streaming Applications as Functionson K8
Serverless Event Streaming Applications as Functionson K8Serverless Event Streaming Applications as Functionson K8
Serverless Event Streaming Applications as Functionson K8Timothy Spann
 
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022Timothy Spann
 
Using FLiP with influxdb for edgeai iot at scale 2022
Using FLiP with influxdb for edgeai iot at scale 2022Using FLiP with influxdb for edgeai iot at scale 2022
Using FLiP with influxdb for edgeai iot at scale 2022Timothy Spann
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesTimothy Spann
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsTimothy Spann
 

Semelhante a Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrumgaard | Current 2022 (20)

Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
 
Music city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
 
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
 
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
 
[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...
 
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
apidays New York 2022 - Leveraging Event Streaming to Super-Charge your Busin...
 
Building an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache PulsarBuilding an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache Pulsar
 
Cloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
 
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8[Conf42-KubeNative] Building Real-time Pulsar Apps on K8
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8
 
[AI Dev World 2022] Build ML Enhanced Event Streaming
[AI Dev World 2022] Build ML Enhanced Event Streaming[AI Dev World 2022] Build ML Enhanced Event Streaming
[AI Dev World 2022] Build ML Enhanced Event Streaming
 
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
 
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
 
Serverless Event Streaming Applications as Functionson K8
Serverless Event Streaming Applications as Functionson K8Serverless Event Streaming Applications as Functionson K8
Serverless Event Streaming Applications as Functionson K8
 
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022
Using FLiP with InfluxDB for EdgeAI IoT at Scale 2022
 
Using FLiP with influxdb for edgeai iot at scale 2022
Using FLiP with influxdb for edgeai iot at scale 2022Using FLiP with influxdb for edgeai iot at scale 2022
Using FLiP with influxdb for edgeai iot at scale 2022
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
DBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data Lakes
 
OSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
 

Mais de HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

Mais de HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrumgaard | Current 2022

  • 1. Let's Monitor The Conditions at the Conference
  • 2. 2
  • 3. Tim Spann Developer Advocate Tim Spann, Developer Advocate at StreamNative ● FLiP(N) Stack = Flink, Pulsar and NiFI Stack ● Streaming Systems & Data Architecture Expert ● Experience: ○ 15+ years of experience with streaming technologies including Pulsar, Flink, Kafka, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. ○ Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
  • 4. David Kjerrumgaard Developer Advocate ● Apache Pulsar Committer | Author of Pulsar In Action ● Former Principal Software Engineer on Splunk’s messaging team responsible for Splunk’s internal Pulsar-as-a-Service platform ● Former Director of Solution Architecture at Streamlio 4
  • 5. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  • 6. streamnative.io Agenda • Collect the Data • Stream the Data • Store the Data • Share the Data
  • 8. streamnative.io Temp, Humidity, Air Quality, Energy, …
  • 10. ● Any computation happening outside of the cloud, closer to the edge of the network ● Operates on real-time data generated by sensors or users ● Improves response times in applications where real-time processing of data is required Edge Computing
  • 11. streamnative.io Extending the Data Processing Layer PULSAR Edge Compute
  • 12. streamnative.io ● Apache Pulsar’s two-tier architecture separates the compute and storage layers and interact with one another over a TCP/IP connection. This allows us to run the computing layer (Broker) on either Edge servers or IoT Gateway devices. ● Our example native applications can stream data via MQTT. We can also write small apps in Java, Python, Golang and other languages to send messages via WebSockets, HTTP, Pulsar, Kafka or other protocols from modern Edge computers. ● Pulsar’s serverless computing framework, know as Pulsar Functions, can run inside the Broker as threads. Effectively “stretching” the data processing layer. Edge Computing with Pulsar
  • 13. streamnative.io ● Pulsar’s Serverless computing framework can run inside the Pulsar Broker as a thread pool. This framework can be used as the execution environment for ML models. ● The Apache Pulsar Broker supports the MQTT protocol and therefore can directly receive incoming data from the sensor hubs and store it in a topic. Benefits of Running Pulsar Broker on the Edge PULSAR Edge Compute
  • 14. streamnative.io ● Containers ● 64 bit processors and operating systems ● 8-64 GB Modern RAM ● Fast WiFi / Bluetooth ● 300+ Core GPUs ● eMMC Fast Storage ● TBs of SSD ● Examples: NVIDIA JETSON XAVIER NX Edge Computing Power - Edge Server
  • 15. streamnative.io Device 1 - AdaFruit Funhouse • https://github.com/tspannhw/pulsar-adafruit-funhouse (MQTT) Raw JSON: {"pressure": 1009.08, "button_sel": "off", "pir_sensor": "off", "humidity": 36.0422, "temperature": 80.9526, "button_down": "off", "captouch6": "off", "captouch7": "off", "button_up": "off", "captouch8": "off", "light": 6990} Processor 240MHz / RAM 2+4MB
  • 16. streamnative.io Device 2 - Raspberry Pi • https://github.com/tspannhw/FLiP-Pi-DeltaLake-Thermal Pulsar Protocol Raw JSON: Processor 1.5 GHz, 64-bit quad-core / RAM 2-8 GB LPDDR4-3200 SDRAM {"uuid": "thrml_zda_20220715182748", "ipaddress": "192.168.1.204", "cputempf": 108, "runtime": 0, "host": "thermal", "hostname": "thermal", "macaddress": "e4:5f:01:7c:3f:34", "endtime": "1657909668.7279365", "te": "0.0007398128509521484", "cpu": 1.8, "diskusage": "105078.0 MB", "memory": 9.0, "rowid": "20220715182748_fc4cbbb1-79da-4c1a-8991-78bd23c9f221", "systemtime": "07/15/2022 14:27:53", "ts": 1657909673, "starttime": "07/15/2022 14:27:48", "datetimestamp": "2022-07-15 18:27:52.492469+00:00", " temperature": 28.238, "humidity": 29.61, "co2": 992.0}
  • 18. streamnative.io Device 3 - Mac M1 PowerBook https://github.com/search?q=user%3Atspannhw+airquality&type=repositories Pulsar, AMQP, MQTT, Kafka Protocols Raw JSON: {"dateObserved":"2022-08-03", "hourObserved":13,"localTimeZone":"CST", "reportingArea":"El Paso","stateCode":"TX","latitude":31.8493, "longitude":-106.4375, "parameterName":"PM10","aqi":23, "category":{"number":1,"name":"Good","additionalP roperties":{}},"additionalProperties":{}} Processor Apple M1 Pro 10-core 3.2GHz CPU 16-core GPU/ RAM 32 GB
  • 19. streamnative.io HS100 Meter - Electric https://github.com/tspannhw/FLiP-Py-Energy
  • 23. Apache Pulsar is a Cloud-Native Messaging and Event-Streaming Platform.
  • 24.
  • 25. Unified Messaging Model Simplify your data infrastructure and enable new use cases with queuing and streaming capabilities in one platform. Multi-tenancy Enable multiple user groups to share the same cluster, either via access control, or in entirely different namespaces. Scalability Decoupled data computing and storage enable horizontal scaling to handle data scale and management complexity. Geo-replication Support for multi-datacenter replication with both asynchronous and synchronous replication for built-in disaster recovery. Tiered storage Enable historical data to be offloaded to cloud-native storage and store event streams for indefinite periods of time. Pulsar Benefits
  • 26. ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery Key Pulsar Concepts: Architecture MetaData Storage
  • 27. Pulsar Subscription Modes Different subscription modes have different semantics: Exclusive/Failover - guaranteed order, single active consumer Shared - multiple active consumers, no order Key_Shared - multiple active consumers, order for given key Producer 1 Producer 2 Pulsar Topic Subscription D Consumer D-1 Consumer D-2 Key-Shared < K 1, V 10 > < K 1, V 11 > < K 1, V 12 > < K 2 ,V 2 0 > < K 2 ,V 2 1> < K 2 ,V 2 2 > Subscription C Consumer C-1 Consumer C-2 Shared < K 1, V 10 > < K 2, V 21 > < K 1, V 12 > < K 2 ,V 2 0 > < K 1, V 11 > < K 2 ,V 2 2 > Subscription A Consumer A Exclusive Subscription B Consumer B-1 Consumer B-2 In case of failure in Consumer B-1 Failover
  • 28. Messaging Ordering Guarantees Topic Ordering Guarantees: ● Messages sent to a single topic or partition DO have an ordering guarantee. ● Messages sent to different partitions DO NOT have an ordering guarantee. 28 Subscription Mode Guarantees: ● A single consumer can receive messages from the same partition in order using an exclusive or failover subscription mode. ● Multiple consumers can receive messages from the same key in order using the key_shared subscription mode.
  • 29. Messaging Ordering Guarantees Topic Ordering Guarantees: ● Messages sent to a single topic or partition DO have an ordering guarantee. ● Messages sent to different partitions DO NOT have an ordering guarantee. 29 Subscription Mode Guarantees: ● A single consumer can receive messages from the same partition in order using an exclusive or failover subscription mode. ● Multiple consumers can receive messages from the same key in order using the key_shared subscription mode.
  • 30. Streaming Consumer Consumer Consumer Subscription Shared Failover Consumer Consumer Subscription In case of failure in Consumer B-0 Consumer Consumer Subscription Exclusive X Consumer Consumer Key-Shared Subscription Pulsar Topic/Partition Messaging Unified Messaging Model
  • 31. Topics Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Cluster Pulsar Cluster
  • 35. Connectivity • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3)
  • 36.
  • 37. Schema Registry Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  • 38. Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel. Bookie 1 Segment 1 Producer Consumer Broker 1 Topic1-Part1 Broker 2 Topic1-Part2 Broker 3 Topic1-Part3 Segment 2 Segment 3 Segment 4 Segment X Segment 1 Segment 1 Segment 1 Segment 3 Segment 3 Segment 3 Segment 2 Segment 2 Segment 2 Segment 4 Segment 4 Segment 4 Segment X Segment X Segment X Bookie 2 Bookie 3 Query Coordin ator . . . . . . SQL Worker SQL Worker SQL Worker SQL Worker Query Topic Metadata Pulsar SQL
  • 39. Apache NiFi Pulsar Connector https://streamnative.io/apache-nifi-connector/
  • 40. SQL select aqi, parameterName, dateObserved, hourObserved, latitude, longitude, localTimeZone, stateCode, reportingArea from airquality; select max(aqi) as MaxAQI, parameterName, reportingArea from airquality group by parameterName, reportingArea; select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as AvgAQI, count(aqi) as RowCount, parameterName, reportingArea from airquality group by parameterName, reportingArea;
  • 41. Building Spark SQL View val dfPulsar = spark.readStream.format("pulsar") .option("service.url", "pulsar://pulsar1:6650") .option("admin.url", "http://pulsar1:8080") .option("topic", "persistent://public/default/pi-sensors") .load() dfPulsar.printSchema() val pQuery = dfPulsar.selectExpr("*") .writeStream.format("console") .option("truncate", false) .start() https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
  • 42. IoT Data IoT Ingestion: High-volume streaming sources, sensors, multiple message formats, diverse protocols and multi-vendor devices creates data ingestion challenges. Other Sources: Transit data, news, twitter, status feeds, REST data, stock data and more.
  • 44. Q&A
  • 46. Resources ● For a first look at Pulsar benchmark report, share your email in the chat ● Join the Pulsar Slack channel - Apache-Pulsar.slack.com ● Follow @streamnativeio and @apache_pulsar on Twitter ● Contact StreamNative Sales - doug@streamnative.io
  • 47. Too Many Tim Links ● https://dzone.com/articles/five-sensors-real-time-with-pulsar-and-python-on-a ● https://github.com/tspannhw/airquality ● https://github.com/tspannhw/FLiPN-AirQuality-REST ● https://github.com/tspannhw/pulsar-airquality-function ● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden ● https://github.com/tspannhw/FLiPN-DEVNEXUS-2022 ● https://github.com/tspannhw/FLiP-Pi-Thermal ● https://github.com/tspannhw/FLiP-Pi-Weather ● https://github.com/tspannhw/FLiP-RP400 ● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
  • 48. StreamNative: By the Creators Of Apache Pulsar ✓ Original creators of Apache Pulsar & BookKeeper ✓ Operated the largest Pulsar/BookKeeper cluster ✓ Data veterans with extensive industry experience CONFIDENTIAL. DO NOT SHARE. ASF Member Pulsar/BookKeeper PMC Founder and CEO Sijie Guo ASF Member Pulsar/BookKeeper PMC CTO Matteo Merli Pulsar/BookKeeper PMC Co-Founder Jia Zhai