O slideshow foi denunciado.
Seu SlideShare está sendo baixado. ×

Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)

Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Anúncio
Carregando em…3
×

Confira estes a seguir

1 de 48 Anúncio

Mais Conteúdo rRelacionado

Semelhante a Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud) (20)

Mais de Timothy Spann (17)

Anúncio

Mais recentes (20)

Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)

  1. 1. Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud) Tim Spann Developer Advocate David Kjerrumgaard Developer Advocate
  2. 2. Proprietary & Confidential | ● Apache Pulsar Committer | Author of Pulsar In Action ● Former Principal Software Engineer on Splunk’s messaging team that is responsible for Splunk’s internal Pulsar-as-a-Service platform. ● Former Director of Solution Architecture at Streamlio. David Kjerrumgaard Developer Advocate
  3. 3. 3 Apache Pulsar in Action Please enjoy David’s complete book which is the ultimate guide to Pulsar.
  4. 4. Proprietary & Confidential | Agenda 4 ● Introduction to Apache Pulsar ● Let’s Build an App! ● Demo ● Resources
  5. 5. Proprietary & Confidential | 5 Why Pulsar?
  6. 6. Pulsar Has a Built-in Super Set of OSS Features Durability Scalability Geo-Replication Multi-Tenancy Unified Messaging Model Reduced Vendor Dependency Functions Open-Source Features
  7. 7. CREATED Originally developed inside Yahoo! as Cloud Messaging Service GROWTH 10x Contributors 10MM+ Downloads Ecosystem Expands Kafka on Pulsar AMQ on Pulsar Functions . . . 2012 2016 2018 TODAY APACHE TLP Pulsar becomes Apache top level project. OPEN SOURCE Pulsar committed to open source. Apache Pulsar Timeline
  8. 8. Key features Apache Pulsar Apache Kafka Message retention (time-based) ✔ ✔ Message replay ✔ ✔ Message retention (acknowledge-based) ✔ ✘ Built-in tiered storage ✔ ✘ Processing capabilities (fully managed) Pulsar Functions KStream Queue semantics (round robin) ✔ ✘ Queue semantics (key based) ✔ ✘ Dead letter queue ✔ ✘
  9. 9. Key features Apache Pulsar Apache Kafka Scheduled and delayed delivery ✔ ✔ Rebalance-free scaling ✔ ✔ Elastically scalable ✔ ✘ Maximum number of topics Millions Up to 100k Built-in multi-tenancy ✔ ✘ Built-in geo-replication ✔ ✘ Built-in schema management ✔ ✘ End-to-end encryption ✔ ✘
  10. 10. Proprietary & Confidential | Guaranteed Message Delivery 10 Unified Messaging Platform Resiliency The basics Infinite Scalability
  11. 11. Proprietary & Confidential | 11 Pulsar Cluster ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery Metadata Storage
  12. 12. Proprietary & Confidential | Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Cluster 12 Tenant - Namespaces - Topics
  13. 13. Proprietary & Confidential | Messages - the basic unit of Pulsar 13 Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
  14. 14. Proprietary & Confidential | 14 Producer-Consumer Publisher sends data and doesn't know about the subscribers or their status. Producer Consumer Topic All interactions go through Pulsar and it handles all communication. Subscriber receives data from publisher and never directly interacts with it.
  15. 15. Proprietary & Confidential | Streaming Consumer Consumer Consumer Subscription Shared Failover Consumer Consumer Subscription In case of failure in Consumer B-0 Consumer Consumer Subscription Exclusive X Consumer Consumer Key-Shared Subscription Pulsar Topic/Partition Messaging 15
  16. 16. Proprietary & Confidential | Pulsarʼs Publish-Subscribe model 16 ● Producers send messages. ● Topics are an ordered, named channels that producers use to transmit messages to subscribed consumers. ● Messages belong to a topic and contain an arbitrary payload. ● Brokers handle connections and routes messages between producers / consumers. ● Subscriptions are named configuration rules that determine how messages are delivered to consumers. ● Consumers receive messages. Broker Subscription Consumer 1 Consumer 2 Consumer 3 Topic Producer 1 Producer 2
  17. 17. Proprietary & Confidential | Pulsar Subscription Modes 17 Different subscription modes have different semantics: ● Exclusive/Failover - guaranteed order, single active consumer ● Shared - multiple active consumers, no order ● Key_Shared - multiple active consumers, order for given key Producer 1 Producer 2 Pulsar Topic Subscription D Consumer D-1 Consumer D-2 Key-Shared < K 1, V 10 > < K 1, V 11 > < K 1, V 12 > < K 2 ,V 2 0 > < K 2 ,V 2 1> < K 2 ,V 2 2 > Subscription C Consumer C-1 Consumer C-2 Shared < K 1, V 10 > < K 2, V 21 > < K 1, V 12 > < K 2 ,V 2 0 > < K 1, V 11 > < K 2 ,V 2 2 > Subscription A Consumer A Exclusive Subscription B Consumer B-1 Consumer B-2 In case of failure in Consumer B-1 Failover
  18. 18. Proprietary & Confidential | 18
  19. 19. Proprietary & Confidential | Integrated Schema Registry 19 Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  20. 20. Proprietary & Confidential | Data Offloaders (Tiered Storage) Client Libraries StreamNative Pulsar ecosystem hub.streamnative.io Connectors (Sources & Sinks) Protocol Handlers Pulsar Functions (Lightweight Stream Processing) Processing Engines … and more! … and more!
  21. 21. Proprietary & Confidential | The FliPN kitten crosses the stream 4 ways with Apache Pulsar 21
  22. 22. Proprietary & Confidential | Kafka on Pulsar (KoP) 22
  23. 23. Proprietary & Confidential | MQTT on Pulsar (MoP) 23
  24. 24. Proprietary & Confidential | AMQP on Pulsar (AoP) 24
  25. 25. Proprietary & Confidential | Kafka to Pulsar 25
  26. 26. Proprietary & Confidential | Lakehouse to Pulsar 26
  27. 27. Proprietary & Confidential | 28 A serverless event streaming framework Pulsar Functions ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within Pulsar Broker. ● Java Functions
  28. 28. Proprietary & Confidential | 29 Pulsar Functions ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
  29. 29. Proprietary & Confidential | 30 Tim Spann Developer Advocate at StreamNative FLiP(N) Stack = Flink, Pulsar and NiFi Stack Streaming Systems & Data Architecture Expert Experience: ● 15+ years of experience with streaming technologies including Pulsar, Kafka, Flink, Spark, NiFi, Spring, Big Data, Cloud, ML, IoT, Python and more. ● Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
  30. 30. Proprietary & Confidential | 31 Building Real-Time Requires a Team
  31. 31. Proprietary & Confidential | 32 NiFi-Pulsar Two-Way Connector
  32. 32. Proprietary & Confidential | Apache NiFi - Apache Pulsar Connector 33 https://github.com/streamnative/pulsar-nifi-bundle
  33. 33. Proprietary & Confidential | Apache NiFi - Apache Pulsar Connector 34
  34. 34. Proprietary & Confidential | Apache NiFi - Apache Pulsar Connector 35 https://github.com/david-streamlio/pulsar-nifi-bundle/releases/tag/v1.14.0
  35. 35. Proprietary & Confidential | 36 Flink-Pulsar Sink Connector
  36. 36. ● Unified computing engine ● Batch processing is a special case of stream processing ● Stateful processing ● Massive Scalability ● Flink SQL for queries, inserts against Pulsar Topics ● Streaming Analytics ● Continuous SQL ● Continuous ETL ● Complex Event Processing ● Standard SQL Powered by Apache Calcite Apache Flink
  37. 37. streamnative.io SQL select aqi, parameterName, dateObserved, hourObserved, latitude, longitude, localTimeZone, stateCode, reportingArea from airquality; select max(aqi) as MaxAQI, parameterName, reportingArea from airquality group by parameterName, reportingArea; select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as AvgAQI, count(aqi) as RowCount, parameterName, reportingArea from airquality group by parameterName, reportingArea;
  38. 38. streamnative.io SQL
  39. 39. Demo
  40. 40. REST + Spring Boot + Pulsar + Friends
  41. 41. https://streamnative.io/blog/engineering/2022-05-26-spring-into-pulsar/ Introduction to Spring with Apache Pulsar
  42. 42. Spring into Pulsar Part 2
  43. 43. Proprietary & Confidential | https://bit.ly/32dAJft 47 FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
  44. 44. @PaaSDev https://www.linkedin.com/in/timothyspann https://github.com/tspannhw https://streamnative.io/pulsar-python/ 48 Tim Spann Developer Advocate at StreamNative

×