Citizen Streaming Engineer - A How To
https://apachecon.com/acasia2022/sessions/integration-1010.html
Democratizing the ability to build streaming data pipelines will enable everyone who needs streaming data to build those pipelines themselves. Utilizing the open source streaming stack known as FLiPNS lets us achieve this.
FLiPNS is a stack of Apache Flink, Apache Pulsar, Apache NiFi and Apache Spark. Apache NiFi provides a web UI for citizen streaming engineers to build their own data pipelines. These citizen applications can ingest, route, transform, enrich, join and store data. Within these applications, Pulsar provides a streaming data hub for ML model access as well as for plugging in additional processing components.
Speakers:
Timothy Spann: StreamNative, Developer Advocate. Tim Spann is a Developer Advocate for StreamNative. He works with StreamNative Cloud, Apache Pulsar, Apache Flink, Flink SQL, Apache NiFi, MiniFi, Apache MXNet, TensorFlow, Apache Spark, Big Data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.
2. Tim Spann
Developer Advocate
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems/ Data Architect
● Experience:
○ 15+ years of experience with batch and streaming technologies including Pulsar, Flink, Spark, NiFi, Spring, Java, Big Data, Cloud, MXNet, Hadoop, data lakes, IoT and more.
9. Easy to Build Streaming Data Pipelines
• Ingest Data
• Route, Transform, Enrich
• Join Data
• ML Model Access
• Store
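The consume → enrich → publish loop behind these steps can be sketched with the Pulsar Java client from Scala. This is a minimal sketch, not the talk's actual flow: the broker URL and topic names are illustrative assumptions, and the enrichment step is a placeholder where a real pipeline might call an ML model or a NiFi/Flink processor.

```scala
// Sketch only: assumes a Pulsar broker at pulsar://pulsar1:6650 and the
// org.apache.pulsar:pulsar-client dependency; topic names are illustrative.
import org.apache.pulsar.client.api.{PulsarClient, Schema}

object EnrichPipeline {
  def main(args: Array[String]): Unit = {
    val client = PulsarClient.builder()
      .serviceUrl("pulsar://pulsar1:6650")
      .build()

    // Consume raw ingested events (e.g., published by a NiFi flow).
    val consumer = client.newConsumer(Schema.STRING)
      .topic("persistent://public/default/airquality")
      .subscriptionName("enricher")
      .subscribe()

    // Publish enriched events to a downstream topic.
    val producer = client.newProducer(Schema.STRING)
      .topic("persistent://public/default/airquality-enriched")
      .create()

    while (true) {
      val msg = consumer.receive()
      // Placeholder transform/enrich step; a real flow might score the
      // event against an ML model here.
      val enriched = msg.getValue.toUpperCase
      producer.send(enriched)
      consumer.acknowledge(msg)
    }
  }
}
```

Because Pulsar supports independent subscriptions per topic, several such consumers (dashboards, ML scorers, archival jobs) can read the same stream without interfering with one another.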
10. Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow-specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over 300 components
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version control
11. Use Apache NiFi For Ingest
https://streamnative.io/apache-nifi-connector/
• Ingest Data
• Cleanse
21. Use Apache Spark To Store
val dfPulsar = spark.readStream.format("pulsar")
    .option("service.url", "pulsar://pulsar1:6650")
    .option("admin.url", "http://pulsar1:8080")
    .option("topic", "persistent://public/default/airquality")
    .load()

// The Parquet file sink requires an output path and a checkpoint
// location; the paths below are illustrative placeholders.
val pQuery = dfPulsar.selectExpr("*")
    .writeStream.format("parquet")
    .option("path", "/tmp/airquality-parquet")
    .option("checkpointLocation", "/tmp/airquality-checkpoint")
    .start()
https://pulsar.apache.org/docs/en/adaptors-spark/
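Once the streaming query is running, the stored records can be read back as an ordinary batch DataFrame. A brief sketch, assuming the streaming query was given an illustrative output path such as /tmp/airquality-parquet:

```scala
// Batch read of the Parquet files produced by the streaming query.
val stored = spark.read.parquet("/tmp/airquality-parquet")
stored.printSchema()
println(stored.count())
```

This is a convenient way to verify that events are landing, or to hand the stored data to downstream batch analytics.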