1. How to build streaming data pipelines with Akka Streams, Flink, and Spark using Cloudflow
2. Who is Lightbend?
• Open Source Leaders
• Total OSS Downloads: > 40m / month
• Cloud Native Application Platform
• Tier 1 Enterprise Customer Base
3. Digital Transformation and Application Architecture
• Digital transformation requires building modern, data-centric applications
• These applications need to be cloud native in design
• Lightbend delivers the architecture for building applications optimized for cloud infrastructure
4. Common Use Cases for Lightbend Platform
• Real-time analytics
• Real-time personalization
• Application modernization
• IoT
• Modern eCommerce
• Real-time financial processes
5.
In many of these use cases, the requirement is to build
run-the-business systems that allow real-time data to be
infused with intelligence.
This requires building and operationalizing streaming data
pipelines…
10. Data Processing Engines (more coming):
• Akka Streams: low-latency, efficient, complex event processing
• Alpakka: built on Akka Streams, for streaming integration
• Apache Spark: larger-scale, complex analytics
• Apache Flink: large-scale, complex analytics when low latency is essential
Heterogeneous Is Necessary!
11. • Some operations are fast, e.g. filtering
• Some are expensive, e.g. deep learning models
Different Latency and Scalability Needs
12. • Data never stops coming
• Eventually all rare problems will happen
Keeping Streams Healthy Is Hard
13. • Wire together the components? (and add new ones later?)
• Deploy and manage all this, including scaling and upgrading?
• Observe what’s going on?
How Do I…
14. Even more new things to learn…
And I’m Upgrading to New Infrastructure
15. Who Operates the Operators?
• The Kubernetes operator framework codifies ops
• Operators exist per framework
• But: how do you operate an application comprising multiple frameworks?
17. Accelerator for the streaming app dev lifecycle
sbt> runLocal # Run whole app locally
sbt> buildAndPublish # Build and upload app
$ kubectl plugin pipelines deploy … # Run it!
What Is Lightbend Cloudflow?
18. Development Productivity
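// Spark imports. The Cloudflow imports (SparkProcessor, AvroInlet, AvroOutlet,
// StreamletShape, SparkStreamletLogic) are not shown on the slide and are omitted here.
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.{avg, window}
import org.apache.spark.sql.streaming.OutputMode
import org.apache.spark.sql.types.TimestampType

object MovingAverageSparkStreamlet extends SparkProcessor {
  val in = AvroInlet[Data]("in")            // typed input
  val out = AvroOutlet[Agg]("out", _.id)    // typed output, partitioned by id
  val shape = StreamletShape(in, out)       // one inlet, one outlet

  override def createLogic() = new SparkStreamletLogic {
    override def buildStreamingQueries = {
      // readStream(in) assumes the reader helper provided by SparkStreamletLogic
      val outStream = process(readStream(in))
      writeStream(outStream, out, OutputMode.Append).toQueryExecution
    }

    // Average each (src, gauge) over 1-minute windows that slide every 30 seconds
    protected def process(inDataset: Dataset[Data]): Dataset[Agg] =
      inDataset
        .withColumn("ts", $"timestamp".cast(TimestampType))
        .withWatermark("ts", "1 minutes")
        .groupBy(window($"ts", "1 minute", "30 seconds"), $"src", $"gauge")
        .agg(avg($"value") as "avg")
        .select($"src", $"gauge", $"avg" as "value").as[Agg]
  }
}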
Drastic reduction in boilerplate
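To make the query on this slide concrete, here is a minimal plain-Scala sketch of what a 1-minute window sliding every 30 seconds computes. It is not Cloudflow or Spark code: the `MovingAverageSketch` object, the `windowStarts` helper, and the millisecond `Long` timestamp are assumptions made for illustration, and the `Data` and `Agg` fields are inferred from the columns the query references. Watermarking and late data are ignored.

```scala
// Plain-Scala illustration only; no Spark or Cloudflow dependency.
// Field names are inferred from the slide; the Long timestamp (epoch millis) is an assumption.
final case class Data(src: String, timestamp: Long, gauge: String, value: Double)
final case class Agg(src: String, gauge: String, value: Double)

object MovingAverageSketch {
  val WindowMs = 60000L // window length: 1 minute
  val SlideMs  = 30000L // a new window starts every 30 seconds

  // Start times of every window [start, start + WindowMs) that contains ts.
  // Window starts are aligned to multiples of SlideMs.
  def windowStarts(ts: Long): List[Long] = {
    val lastStart = ts - ts % SlideMs
    Iterator.iterate(lastStart)(_ - SlideMs).takeWhile(_ > ts - WindowMs).toList
  }

  // Average value per (window start, src, gauge).
  def movingAverages(events: Seq[Data]): Map[(Long, String, String), Agg] =
    events
      .flatMap(e => windowStarts(e.timestamp).map(w => ((w, e.src, e.gauge), e.value)))
      .groupBy(_._1)
      .map { case (key @ (_, src, gauge), rows) =>
        key -> Agg(src, gauge, rows.map(_._2).sum / rows.size)
      }
}
```

Because each window is twice as long as the slide interval, every event lands in two overlapping windows. For example, two readings at 10 s and 40 s from the same source and gauge both fall in the window starting at 0 s, so that window's average is the mean of the two values.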
19. Easily integrate streamlets written in Akka Streams, Spark Structured Streaming, and Flink
• Merge different input streams
• Validate record formats, field values
• Use ML for more sophisticated analysis
• Compute aggregations (e.g., statistics)
• Send results downstream
Development Productivity
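The stages named on this slide can be sketched as ordinary functions to show how they compose. This is a plain-Scala illustration under stated assumptions, not Cloudflow code: in a real application each stage would be a separate streamlet, and every name below (`Reading`, `Stats`, `PipelineSketch`, and the validation rules) is hypothetical. The ML stage is left out.

```scala
// Hypothetical record and result types, for illustration only.
final case class Reading(deviceId: String, value: Double)
final case class Stats(count: Int, mean: Double, max: Double)

object PipelineSketch {
  // Merge: combine records from different input streams into one.
  def merge(a: Seq[Reading], b: Seq[Reading]): Seq[Reading] = a ++ b

  // Validate: Right for a well-formed record, Left with a reason otherwise.
  def validate(r: Reading): Either[String, Reading] =
    if (r.deviceId.isEmpty) Left("missing deviceId")
    else if (r.value.isNaN || r.value < 0) Left(s"bad value for ${r.deviceId}")
    else Right(r)

  // Aggregate: simple per-device statistics over the valid records.
  def aggregate(valid: Seq[Reading]): Map[String, Stats] =
    valid.groupBy(_.deviceId).map { case (id, rs) =>
      val vs = rs.map(_.value)
      id -> Stats(vs.size, vs.sum / vs.size, vs.max)
    }

  // Wire the stages together; returns (rejection reasons, results to send downstream).
  def run(a: Seq[Reading], b: Seq[Reading]): (Seq[String], Map[String, Stats]) = {
    val checked = merge(a, b).map(validate)
    val invalid = checked.collect { case Left(reason) => reason }
    val valid   = checked.collect { case Right(r) => r }
    (invalid, aggregate(valid))
  }
}
```

Keeping invalid records as `Left` values rather than dropping them silently mirrors the idea of routing bad records to a separate output, so that they can be inspected later.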
20. Deploys to production easily, properly configured
sbt> buildAndPublish
my-app:12345 image uploaded to your cluster.
…
bash $ kubectl plugin pipelines deploy my-app:12345
Kubernetes kubectl plugin for deploying, scaling, managing, etc.
Production Success
21. Scale pods using the CLI and Pipelines operator
$ kubectl plugin pipelines scale my-app merge 2
Scale the merge pod to 2 instances.
Production Success
23. sbt> runLocal # Run whole app locally
sbt> buildAndPublish # Build and upload app
$ kubectl plugin pipelines deploy … # Run it!
• Build tool plugin (rapid dev cycle, full-integration tests locally)
• kubectl plugin (deployment, runtime management, including the Spark, Kafka, Flink, and Cloudflow installation and runtime operators)
• Runtime observability: Cloudflow-custom GUI
Open Source
What’s “In The Box”
24. Cloudflow Capabilities Matrix (Roadmap)
Feature                                          Cloudflow Sandbox   Cloudflow Open Source   Cloudflow with Lightbend Platform
Installation                                     sbt                 Helm/script             Operator
Develop streamlets in Akka Streams/Flink/Spark   ✓                   ✓                       ✓
Blueprint                                        ✓                   ✓                       ✓
Messaging between streamlets                     in-memory           persistent              persistent
Deployment                                       Local JVM           Kubernetes              Kubernetes
Observability / Console                          -                   -                       ✓
Upgrade Support                                  -                   -                       ✓
Schema Evolution Support                         -                   -                       ✓
Autoscale                                        -                   -                       ✓
25. Get Started with Open Source
• Go to Cloudflow.io
• Start with Sandbox to run locally
• Deploy to full Kubernetes cluster
• Use our sample apps
• More samples coming soon!