Pulsar Functions is a succinct framework provided by Apache Pulsar for real-time data processing. Its use cases include ETL pipelines, event-driven applications, and simple data analytics. While Pulsar Functions already provides an extremely simple programming interface, we want to further lower the barrier for users to access real-time data. Since SQL is one of the universal languages of the technology world and is well accepted by the vast majority of data engineers, we decided to add a SQL expression layer on top of the Pulsar Functions runtime. In this talk, we will discuss the architecture and implementation of this new service. We will see how SQL syntax, Pulsar Functions, and Function Mesh can work together to deliver a unique development experience for real-time data jobs in the cloud environment. We will also walk through use cases like filtering, routing, and projecting messages, as well as integrating with the Pulsar IO Connectors framework.
Simplify Pulsar Functions Development with SQL - Pulsar Summit SF 2022
1. Pulsar Summit
San Francisco
Hotel Nikko
August 18, 2022
Ecosystem
Simplify Pulsar Functions Development with SQL
Neng Lu
Platform Engineering Lead • StreamNative
2. Neng Lu is the platform engineering lead of compute at StreamNative. He drives the development of Pulsar Functions, Serverless Computing and ecosystem integration. He is also a committer of Apache Pulsar.
Neng Lu
Platform Engineering Lead
StreamNative
Rui Fu is a senior software engineer at StreamNative and a committer of Apache Pulsar. He actively contributes to Pulsar Functions, Function Mesh and Serverless Computing.
Rui Fu
Senior Software Engineer
StreamNative
9. Function Worker Recap
● Function Worker interleaves with Pulsar Broker
● Need to set up separate Function Worker cluster
● Function Worker relies on Pulsar Topics for scheduling
● Function Worker’s k8s runtime not truly cloud native
11. Function Mesh – Recap
● Serverless framework to run Pulsar Functions in a cloud native way
● Consists of:
○ Set of CRDs for defining Pulsar Functions and Connectors
■ Function
■ Source
■ Sink
○ Operator that constantly reconciles the submitted CR
■ create sts, service, configmap, etc.
■ update according to user change
■ auto-scale if configured
13. Function Mesh – Summary
● Scheduling by Kubernetes not Function Worker
○ Simplicity
○ Reliability
○ Stability (both for function & brokers)
○ Extensibility (HPA, VPA, Scale-To-Zero etc)
● Compatible with Pulsar Admin Rest API
○ Seamless user experience
15. Use Case 1 – Filtering/Routing
● Commonly used for different business purposes → duplicated code development
● Go through the whole Pulsar Functions dev life cycle
○ (Learn)
○ Develop
○ Package
○ Debug
○ Deploy
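For comparison, this is roughly the kind of hand-written code that the life cycle above produces for a simple filter. It is a minimal sketch that relies on the Java native `java.util.function.Function` interface, which Pulsar Functions accepts; the class name `ErrorLogFilter` and the "ERROR" match rule are made up for illustration and do not come from the talk. Returning `null` means no message is published to the output topic.

```java
import java.util.function.Function;

// Hypothetical example: forwards only log lines that contain "ERROR".
final class ErrorLogFilter implements Function<String, String> {
    @Override
    public String apply(String input) {
        // Keep the message only when it matches the filter condition.
        if (input != null && input.contains("ERROR")) {
            return input;
        }
        // A null result tells the Pulsar Functions runtime to emit nothing.
        return null;
    }
}
```

Even a one-line condition like this still has to be packaged, debugged, and deployed as a function, which is the overhead the SQL layer aims to remove.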
16. Use Case 2 – Connector with Transformations
● Long pipeline:
○ Connector
○ Transformation Function (Often duplicated with minor diffs)
○ Intermediate topic
● Go through the Pulsar Functions life cycle TWICE:
○ Connector
■ Develop (optional)
■ …
○ Transformation Function
■ Develop
■ Package
■ …
25. SQL Abstraction – Syntax
● Value Expression
○ Literal: Primitive value, like string, number, or boolean
○ Field: message payload field
○ KEY: message key
○ PROPERTIES[P_KEY]: message property
● WITH Item Definition
○ WITH MERGE KEYVALUE: Merge the fields of KeyValue schema
○ WITH UNWRAP KEY|VALUE: Extract Key or Value fields from KeyValue schema
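The four kinds of value expression listed above can be pictured as small evaluators over a message. The sketch below is only an illustration of that idea, not the talk's implementation: `SketchMessage`, `ValueExpression`, and the factory method names are all hypothetical, and the payload is modeled as a plain map instead of a real Pulsar schema.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for a Pulsar message.
final class SketchMessage {
    final String key;
    final Map<String, String> properties;
    final Map<String, Object> payload;

    SketchMessage(String key, Map<String, String> properties, Map<String, Object> payload) {
        this.key = key;
        this.properties = properties;
        this.payload = payload;
    }
}

// Hypothetical evaluator: one factory per value expression kind on the slide.
interface ValueExpression {
    Object eval(SketchMessage msg);

    // Literal: a primitive value such as a string, number, or boolean.
    static ValueExpression literal(Object value) {
        return msg -> value;
    }

    // Field: a field of the message payload.
    static ValueExpression field(String name) {
        return msg -> msg.payload.get(name);
    }

    // KEY: the message key.
    static ValueExpression key() {
        return msg -> msg.key;
    }

    // PROPERTIES[P_KEY]: a single message property.
    static ValueExpression property(String propertyKey) {
        return msg -> msg.properties.get(propertyKey);
    }
}
```

For a message with key `user-1`, property `region=us`, and payload field `amount=42`, `ValueExpression.property("region").eval(msg)` would look up the property and `ValueExpression.field("amount").eval(msg)` would read the payload field.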
29. SQL Abstraction – Runner
● An implementation of Pulsar Functions API
● Accept the JSON representation
● Generate Filtering/Routing processor during initialization
● Utilize `GenericObject` to handle different schemas
● Directly push result into target topic
30. SQL Abstraction – Runner
● Processor
○ An interface for classes that implement data transformations
○ schema projections
○ data manipulations
○ data type conversions
● Chain Compiler
○ List<Processor>
○ Compiled from the SQL Context
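The processor-chain idea can be sketched as follows. The slides do not show the real interface signatures, so everything here is an assumption made for illustration: `RecordProcessor`, `ProcessorChain`, and `SketchProcessors` are hypothetical names, records are modeled as plain maps, and an empty `Optional` stands for a filtered-out message.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical processor: transforms a record, or drops it by returning empty.
interface RecordProcessor {
    Optional<Map<String, Object>> process(Map<String, Object> record);
}

// Hypothetical chain: runs a List of processors in order, stopping on a drop.
final class ProcessorChain implements RecordProcessor {
    private final List<RecordProcessor> steps;

    ProcessorChain(List<RecordProcessor> steps) {
        this.steps = new ArrayList<>(steps);
    }

    @Override
    public Optional<Map<String, Object>> process(Map<String, Object> record) {
        Optional<Map<String, Object>> current = Optional.of(record);
        for (RecordProcessor step : steps) {
            if (!current.isPresent()) {
                break; // an earlier step filtered the record out
            }
            current = step.process(current.get());
        }
        return current;
    }
}

// Hypothetical building blocks a compiler could emit for WHERE and SELECT.
final class SketchProcessors {
    private SketchProcessors() {}

    // Filtering: keep the record only if the condition holds.
    static RecordProcessor filter(Predicate<Map<String, Object>> condition) {
        return record -> {
            if (condition.test(record)) {
                return Optional.of(record);
            }
            return Optional.empty();
        };
    }

    // Schema projection: keep only the named fields, in the given order.
    static RecordProcessor project(String... fields) {
        return record -> {
            Map<String, Object> out = new LinkedHashMap<>();
            for (String field : fields) {
                if (record.containsKey(field)) {
                    out.put(field, record.get(field));
                }
            }
            return Optional.of(out);
        };
    }
}
```

Under these assumptions, a statement that selects `id` where `amount > 100` would compile to a two-element chain: a `filter` step followed by a `project("id")` step.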
31. SQL Gateway – REST APIs
Category             Path                        Method
Query Management     /snsql/query                POST
Query Management     /snsql/query/pause/$NAME    GET
Query Management     /snsql/query/resume/$NAME   GET
Query Management     /snsql/query/delete/$NAME   GET
Query Management     /snsql/query/status/$NAME   GET
Query Management     /snsql/query/stats/$NAME    GET
Gateway Information  /snsql/info                 GET
Gateway Information  /snsql/healthcheck          GET
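A client could address the GET endpoints listed above along these lines. This is a sketch only: the `SnsqlRequests` helper and the base URL used in the usage note are hypothetical, the requests are built but never sent, and the POST endpoint is left out because the slides do not describe its request body.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Set;

// Hypothetical helper that builds (but does not send) gateway requests.
final class SnsqlRequests {
    // The per-query GET actions listed on the slide.
    private static final Set<String> QUERY_ACTIONS =
            Set.of("pause", "resume", "delete", "status", "stats");

    private final String baseUrl;

    SnsqlRequests(String baseUrl) {
        // Strip one trailing slash so the paths below join cleanly.
        this.baseUrl = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
    }

    // Builds GET /snsql/query/{action}/{queryName}.
    HttpRequest queryAction(String action, String queryName) {
        if (!QUERY_ACTIONS.contains(action)) {
            throw new IllegalArgumentException("unknown action: " + action);
        }
        return get("/snsql/query/" + action + "/" + queryName);
    }

    // Builds GET /snsql/info.
    HttpRequest info() {
        return get("/snsql/info");
    }

    // Builds GET /snsql/healthcheck.
    HttpRequest healthcheck() {
        return get("/snsql/healthcheck");
    }

    private HttpRequest get(String path) {
        return HttpRequest.newBuilder(URI.create(baseUrl + path)).GET().build();
    }
}
```

With an assumed gateway address such as `http://localhost:8080`, `new SnsqlRequests("http://localhost:8080").queryAction("status", "my-query")` yields a GET request for `/snsql/query/status/my-query`, which could then be sent with `java.net.http.HttpClient`.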
32. SQL Gateway – REST Server
● Quarkus Framework
○ easy to implement
○ cloud-native support
● Metadata Management
○ write into Pulsar topic
○ read with TableView API
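The write-to-topic, read-with-TableView pattern amounts to keeping the latest value per key from an append-only stream of updates. The sketch below imitates that behavior with plain collections so it runs without a broker; `LatestValueView` and the sample keys and values are hypothetical, and in a real deployment the Pulsar client's `TableView` plays this role.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a key/value view over a metadata topic.
final class LatestValueView {
    private final Map<String, String> latest = new LinkedHashMap<>();

    // Applies one update read from the topic: the newest value for a key wins.
    void apply(String key, String value) {
        latest.put(key, value);
    }

    // Returns the most recent value seen for the key, or null if none.
    String get(String key) {
        return latest.get(key);
    }

    // Number of distinct keys currently in the view.
    int size() {
        return latest.size();
    }
}
```

Replaying the updates `("q1", "RUNNING")`, `("q2", "RUNNING")`, `("q1", "PAUSED")` leaves two keys in the view, with `q1` mapped to its latest state.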
33. SQL Abstraction – CLI
● Terminal based tool
● Interact with the SQL gateway APIs
● Query management