Technische Universität Berlin
DIMA – Databases and Information Management Group
The Apache Flink Platform
for Parallel Batch and Stream Analysis
Jonas Traub | Tilmann Rabl | Fabian Hueske | Till Rohrmann | Volker Markl
In this talk
▪ Apache Flink Primer
  • Architecture
  • Execution Engine
  • API Examples
▪ Stream Processing with Apache Flink
  • Micro Batching vs. Native Streaming
  • Flexible Windows/Stream Discretization
  • Fault Tolerance with distributed snapshotting
▪ Conclusion
Technische Universität Berlin - The Apache Flink Platform for Parallel Batch and Stream Analysis - FGDB 2015
Apache Flink Primer
What is Flink?
A platform for distributed batch and streaming analytics
Streaming dataflow runtime
Flink in the Analytics Ecosystem
Applications: Hive, Mahout, Cascading, Pig, Crunch, Giraph, …
Data processing engines: MapReduce, Flink, Spark, Storm, Tez
App and resource management: Yarn, Mesos
Storage, streams: HDFS, HBase, Kafka
What can I do with it?
Flink: an engine that can natively support all these workloads.
• Stream processing
• Batch processing
• Machine Learning at scale
• Graph Analysis
Sneak peek: Two of Flink’s APIs
case class Word (word: String, frequency: Int)
DataSet API (batch):
val lines: DataSet[String] = env.readTextFile(...)
lines.flatMap {line => line.split(" ")
    .map(word => Word(word,1))}
  .groupBy("word").sum("frequency")
  .print()
DataStream API (streaming):
val lines: DataStream[String] = env.fromSocketStream(...)
lines.flatMap {line => line.split(" ")
    .map(word => Word(word,1))}
  .keyBy("word")
  .window(Time.of(5,SECONDS)).every(Time.of(1,SECONDS))
  .sum("frequency")
  .print()
Execution Model
▪ Flink program = DAG (directed acyclic graph) of operators and intermediate results
▪ Operator = computation + state
▪ Intermediate result = logical stream of records
[Figure: example dataflow with map, join, and sum operators]
Architecture
▪ Pipelined/Streaming engine
  • Complete DAG deployed
[Figure: Job Manager and Workers 1-4]
Flink Stream Processing
Ingredients of a Streaming System
▪ Pipelined Execution Engine
▪ Streaming Windows/Discretization
▪ Fault Tolerance
▪ High Level Programming API (or language)
Micro Batching vs Native Streaming
Discretized Streams (D-Streams): a stream discretizer issues a sequence of batch jobs
while (true) {
  // get next few records
  // issue batch computation
}
Native streaming: long-standing operators
while (true) {
  // process next record
}
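The two loops above can be contrasted in a small, self-contained Java sketch. It is only an illustration under simple assumptions: the stream is a finite list, the computation is a sum, and the names `StreamingModes`, `microBatchSums`, and `runningSums` are made up for this example (they are not part of Flink or of any D-Streams implementation).

```java
import java.util.ArrayList;
import java.util.List;

class StreamingModes {
    // Micro-batching: collect a few records, then run one batch computation on them.
    static List<Integer> microBatchSums(List<Integer> stream, int batchSize) {
        List<Integer> results = new ArrayList<>();
        List<Integer> batch = new ArrayList<>();
        for (int record : stream) {
            batch.add(record);                       // get next few records
            if (batch.size() == batchSize) {         // issue batch computation
                results.add(batch.stream().mapToInt(Integer::intValue).sum());
                batch.clear();
            }
        }
        return results; // a trailing partial batch is not emitted in this sketch
    }

    // Native streaming: a long-standing operator updates its state for every record.
    static List<Integer> runningSums(List<Integer> stream) {
        List<Integer> results = new ArrayList<>();
        int sum = 0;                                 // operator state
        for (int record : stream) {
            sum += record;                           // process next record
            results.add(sum);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> stream = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(microBatchSums(stream, 3)); // [6, 15]
        System.out.println(runningSums(stream));       // [1, 3, 6, 10, 15, 21]
    }
}
```

In the micro-batch variant a result appears only once a batch is full, whereas the long-standing operator produces an updated result with every record.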
Stream Discretization
▪ Data is unbounded
  • Interested in a (recent) part of it, e.g., the last 10 days
▪ Most common windows: time and count
  • Mostly in sliding, fixed, and tumbling form
▪ Need for data-driven window definitions
  • e.g., user sessions (periods of user activity followed by inactivity), price changes, etc.
Great read: The world beyond batch: Streaming 101, Tyler Akidau
https://beta.oreilly.com/ideas/the-world-beyond-batch-streaming-101
Flink’s Discretization
▪ Allows very flexible windowing
▪ Borrows and extends ideas from IBM’s SPL
  • SLIDE = Trigger = When to emit a window
  • RANGE = Eviction = What the window contains
▪ Allows for lots of optimization
  • Not part of this talk
The Discretizer Operator
▪ Streams are represented as FIFO queues of data items
▪ The window operator keeps a FIFO buffer
▪ After some time, data items expire (they are deleted)
▪ The window operator is event-driven by data-item arrivals
1.) Trigger Policies (TPs) specify when to emit the current buffer content as a window.
2.) Eviction Policies (EPs) specify when data items are removed from the buffer.
Query Example (window of size 3):
dataStream.window(Count.of(3))
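How a trigger policy and an eviction policy interact on a FIFO buffer can be sketched in plain Java. This is a simplified model, not Flink's implementation: the interfaces `Trigger` and `Eviction`, the helpers `countTrigger` and `countEviction`, and the processing order (evict, then buffer the new item, then check the trigger) are assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class DiscretizerSketch {
    // Hypothetical policy interfaces for this sketch.
    interface Trigger<T> { boolean shouldEmit(T item); }
    interface Eviction<T> { int itemsToEvict(T item, int bufferSize); }

    // Fires on every n-th arriving item.
    static <T> Trigger<T> countTrigger(int n) {
        int[] seen = {0};
        return item -> {
            seen[0]++;
            if (seen[0] == n) { seen[0] = 0; return true; }
            return false;
        };
    }

    // Evicts the oldest items so the buffer holds at most maxSize items after the insert.
    static <T> Eviction<T> countEviction(int maxSize) {
        return (item, bufferSize) -> Math.max(0, bufferSize + 1 - maxSize);
    }

    // Event-driven loop: every arrival may evict old items and may emit a window.
    static <T> List<List<T>> discretize(List<T> stream, Trigger<T> trigger, Eviction<T> eviction) {
        Deque<T> buffer = new ArrayDeque<>();
        List<List<T>> windows = new ArrayList<>();
        for (T item : stream) {
            int evict = eviction.itemsToEvict(item, buffer.size());
            for (int i = 0; i < evict && !buffer.isEmpty(); i++) {
                buffer.removeFirst();               // oldest items expire first
            }
            buffer.addLast(item);
            if (trigger.shouldEmit(item)) {
                windows.add(new ArrayList<>(buffer)); // emit current buffer content as a window
            }
        }
        return windows;
    }

    public static void main(String[] args) {
        // Tumbling count window of size 3: trigger every 3 items, keep 3 items.
        System.out.println(discretize(List.of(1, 2, 3, 4, 5, 6, 7),
                DiscretizerSketch.<Integer>countTrigger(3),
                DiscretizerSketch.<Integer>countEviction(3))); // [[1, 2, 3], [4, 5, 6]]
        // Sliding count window: trigger on every item, keep the last 3 items.
        System.out.println(discretize(List.of(1, 2, 3, 4),
                DiscretizerSketch.<Integer>countTrigger(1),
                DiscretizerSketch.<Integer>countEviction(3))); // [[1], [1, 2], [1, 2, 3], [2, 3, 4]]
    }
}
```

Changing only the trigger turns the tumbling window into a sliding one, which is the kind of flexibility the separation into two policies is meant to give.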
Flexible Windowing
▪ Windows can be any combination of (multiple) triggers & evictions
  • Arbitrary tumbling, sliding, session, etc. windows can be constructed.
▪ Common triggers/evictions are part of the API
  • Time, Count & Delta.
▪ Even more flexibility: define your own UDF trigger/eviction
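A data-driven (delta-style) trigger, such as one reacting to price changes, could look like the following Java sketch. The semantics are an assumption for illustration: a window is emitted whenever the current value differs from the value at the previous trigger by more than a threshold, and the buffer is then cleared. The class `DeltaTriggerSketch` and the method `windowsByDelta` are hypothetical names, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;

class DeltaTriggerSketch {
    // Emits the buffered values as a window each time the value has moved by more
    // than `threshold` since the last trigger; the buffer then starts fresh
    // (tumbling eviction, chosen here only to keep the sketch short).
    static List<List<Double>> windowsByDelta(List<Double> values, double threshold) {
        List<List<Double>> windows = new ArrayList<>();
        List<Double> buffer = new ArrayList<>();
        Double reference = null;                 // value at the last trigger
        for (double value : values) {
            if (reference == null) {
                reference = value;               // first item defines the reference
            }
            buffer.add(value);
            if (Math.abs(value - reference) > threshold) {
                windows.add(new ArrayList<>(buffer));
                buffer.clear();
                reference = value;
            }
        }
        return windows;
    }

    public static void main(String[] args) {
        List<Double> prices = List.of(10.0, 10.5, 12.0, 12.5, 14.0);
        // Threshold 1.0: triggers at 12.0 (delta 2.0) and at 14.0 (delta 2.0).
        System.out.println(windowsByDelta(prices, 1.0)); // [[10.0, 10.5, 12.0], [12.5, 14.0]]
    }
}
```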
Fault Tolerance and Operator State
Comparing Fault Tolerance Solutions
Message tracking/acks (Storm): at-least-once guarantee
RDD re-computation (Spark Streaming)
Distributed snapshots (Flink):
• Based on consistent global snapshots
• Algorithm inspired by Chandy-Lamport
• Low runtime overhead
• Stateful exactly-once semantics
Example: A Stateful Map (counter)
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.checkpoint.Checkpointed;

public class Counter implements MapFunction<Long, Long>, Checkpointed<Long> {
  // persistent counter
  private long counter = 0;

  @Override
  public Long map(Long value) {
    return ++counter;
  }

  // regularly persists state during normal operation
  @Override
  public Long snapshotState(long checkpointId, long checkpointTimestamp) {
    return counter;
  }

  // restores state on recovery from failure
  @Override
  public void restoreState(Long state) {
    counter = state;
  }
}
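The snapshot/restore cycle of such a counter can be simulated without Flink. The sketch below is a toy harness under stated assumptions: a single operator, a repeatable source that replays records after the snapshot position, and hypothetical names (`CheckpointCycleSketch`, `runWithFailure`). It is not how the Flink runtime drives checkpoints; it only shows why restoring state and replaying the source yields each record counted exactly once.

```java
class CheckpointCycleSketch {
    // Stand-in for the slide's Counter, without the Flink interfaces.
    static class Counter {
        private long counter = 0;
        long map(long value) { return ++counter; }
        Long snapshotState() { return counter; }
        void restoreState(Long state) { counter = state; }
    }

    // Processes records 1..failAt, taking a snapshot after record `snapshotAt`,
    // then simulates a failure, restores, and replays the remaining records up to `total`.
    static long runWithFailure(int total, int snapshotAt, int failAt) {
        Counter op = new Counter();
        long snapshot = 0;      // state stored by the last completed snapshot
        int offset = 0;         // source position covered by that snapshot
        for (int i = 1; i <= failAt; i++) {
            op.map(i);
            if (i == snapshotAt) {
                snapshot = op.snapshotState();
                offset = i;
            }
        }
        // Failure: the in-memory state of `op` is lost.
        Counter recovered = new Counter();
        recovered.restoreState(snapshot);
        // The repeatable source replays everything after the snapshot position.
        for (int i = offset + 1; i <= total; i++) {
            recovered.map(i);
        }
        return recovered.counter;
    }

    public static void main(String[] args) {
        // 10 records, snapshot after record 4, failure after record 7.
        System.out.println(runWithFailure(10, 4, 7)); // 10
    }
}
```

Records 5 to 7 are processed twice by the harness, but because the restored state predates them they are counted only once in the final result.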
Distributed Snapshots
[Figure: timeline t1, t2, t3 with snapshots taken at t1 and t2; after a failure, execution is reset from snapshot t2]
Assumptions
• repeatable sources
• reliable FIFO channels
Taking Snapshots
Initial approach (e.g., Naiad)
• Pause execution on t1, t2, ...
• Collect state
• Restore execution
Asynchronous Snapshots in Flink
Push checkpoint barriers through the data flow
[Figure: data stream with a barrier]
• Before barrier → part of the snapshot
• After barrier → not in snapshot (backup till next snapshot)
[Carbone et al. 2015] “Lightweight Asynchronous Snapshots for Distributed Dataflows”, Tech. Report. http://arxiv.org/abs/1506.08603
[Figure: a barrier passing through an operator, with the stages "operator checkpoint starting", "checkpoint in progress", and "checkpoint done"]
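The effect of a barrier on a single operator can be sketched in Java as follows. This is a deliberately reduced model: one input channel, a running sum as operator state, and a synchronous copy of the state when the barrier arrives. It does not cover multiple input channels or the asynchronous state writes described in the cited report, and all names (`BarrierSketch`, `Element`, `snapshotsOfRunningSum`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BarrierSketch {
    // A stream element is either a data record or a checkpoint barrier.
    static final class Element {
        final boolean isBarrier;
        final long value; // record payload, or checkpoint id for a barrier

        private Element(boolean isBarrier, long value) {
            this.isBarrier = isBarrier;
            this.value = value;
        }
        static Element record(long payload) { return new Element(false, payload); }
        static Element barrier(long checkpointId) { return new Element(true, checkpointId); }
    }

    // Returns checkpoint id -> operator state captured when that barrier arrived.
    static Map<Long, Long> snapshotsOfRunningSum(List<Element> stream) {
        Map<Long, Long> snapshots = new LinkedHashMap<>();
        long sum = 0; // operator state
        for (Element e : stream) {
            if (e.isBarrier) {
                // State reflects exactly the records that arrived before the barrier.
                snapshots.put(e.value, sum);
            } else {
                sum += e.value; // records after the barrier belong to the next snapshot
            }
        }
        return snapshots;
    }

    public static void main(String[] args) {
        List<Element> stream = List.of(
                Element.record(1), Element.record(2), Element.barrier(1),
                Element.record(3), Element.barrier(2), Element.record(4));
        System.out.println(snapshotsOfRunningSum(stream)); // {1=3, 2=6}
    }
}
```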
Closing
Community
Flink started as the Stratosphere project in 2009, led by TU Berlin. It entered incubation in April 2014 and graduated in December 2014. It is now one of the most active big data projects after over a year in the Apache Software Foundation.
tl;dr: what was this about?
• The Berlin Big Data Center
• Native Streaming with Apache Flink
• Flexible Windowing
• Fault Tolerance with exactly-once guarantees
• Large (and growing!) community
Outlook: Introducing the BBDC
http://bbdc.berlin
BBDC Technology (10,000-foot view)
http://flink-forward.org
Thank you
If you find this exciting, get involved on Flink’s mailing list or stay tuned by subscribing to news@flink.apache.org, following flink.apache.org/blog, and @ApacheFlink on Twitter
Technische Universität Berlin
DIMA – Databases and Information Management Group
The Apache Flink Platform
for Parallel Batch and Stream Analysis
Jonas Traub | Tilmann Rabl | Fabian Hueske | Till Rohrmann | Volker Markl

Mais conteúdo relacionado

Mais procurados

Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...Flink Forward
 
Aggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream WindowsAggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream WindowsParis Carbone
 
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Stephan Ewen
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Flink Forward
 
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...Flink Forward
 
Stephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateStephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateFlink Forward
 
Let’s Build a Python Profiler in 25 LOC
Let’s Build a Python Profiler in 25 LOCLet’s Build a Python Profiler in 25 LOC
Let’s Build a Python Profiler in 25 LOCNoam Elfanbaum
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingKostas Tzoumas
 
Streaming Topic Maps API
Streaming Topic Maps APIStreaming Topic Maps API
Streaming Topic Maps APItmra
 
Monitoring Flink with Prometheus
Monitoring Flink with PrometheusMonitoring Flink with Prometheus
Monitoring Flink with PrometheusMaximilian Bode
 
Runtimeenvironment
RuntimeenvironmentRuntimeenvironment
RuntimeenvironmentAnusuya123
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingKostas Tzoumas
 
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingClick-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingRobert Metzger
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward
 
RSEP-QL: A Query Model to Capture Event Pattern Matching in RDF Stream Proces...
RSEP-QL: A Query Model to Capture Event Pattern Matching in RDF Stream Proces...RSEP-QL: A Query Model to Capture Event Pattern Matching in RDF Stream Proces...
RSEP-QL: A Query Model to Capture Event Pattern Matching in RDF Stream Proces...Daniele Dell'Aglio
 
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)Bhasker Kode
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemFlink Forward
 
Building ETL pipelines for tranSMART 17.X - New tools for the data loader
Building ETL pipelines for tranSMART 17.X - New tools for the data loaderBuilding ETL pipelines for tranSMART 17.X - New tools for the data loader
Building ETL pipelines for tranSMART 17.X - New tools for the data loaderAlessia Peviani
 
Archiving Oracle Primavera project plans with software development tools
Archiving Oracle Primavera project plans with software development toolsArchiving Oracle Primavera project plans with software development tools
Archiving Oracle Primavera project plans with software development toolsGunther Pippèrr
 

Mais procurados (20)

Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
 
Aggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream WindowsAggregate Sharing for User-Define Data Stream Windows
Aggregate Sharing for User-Define Data Stream Windows
 
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
 
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...
Flink Forward Berlin 2017: Matt Zimmer - Custom, Complex Windows at Scale Usi...
 
Stephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateStephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large State
 
Let’s Build a Python Profiler in 25 LOC
Let’s Build a Python Profiler in 25 LOCLet’s Build a Python Profiler in 25 LOC
Let’s Build a Python Profiler in 25 LOC
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream Processing
 
Streaming Topic Maps API
Streaming Topic Maps APIStreaming Topic Maps API
Streaming Topic Maps API
 
Monitoring Flink with Prometheus
Monitoring Flink with PrometheusMonitoring Flink with Prometheus
Monitoring Flink with Prometheus
 
Runtimeenvironment
RuntimeenvironmentRuntimeenvironment
Runtimeenvironment
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
A Gomez T Tat Cesga
A Gomez T Tat CesgaA Gomez T Tat Cesga
A Gomez T Tat Cesga
 
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingClick-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer Checkpointing
 
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache F...
 
RSEP-QL: A Query Model to Capture Event Pattern Matching in RDF Stream Proces...
RSEP-QL: A Query Model to Capture Event Pattern Matching in RDF Stream Proces...RSEP-QL: A Query Model to Capture Event Pattern Matching in RDF Stream Proces...
RSEP-QL: A Query Model to Capture Event Pattern Matching in RDF Stream Proces...
 
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)
Instrumenting Go (Gopherconindia Lightning talk by Bhasker Kode)
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
 
Building ETL pipelines for tranSMART 17.X - New tools for the data loader
Building ETL pipelines for tranSMART 17.X - New tools for the data loaderBuilding ETL pipelines for tranSMART 17.X - New tools for the data loader
Building ETL pipelines for tranSMART 17.X - New tools for the data loader
 
Archiving Oracle Primavera project plans with software development tools
Archiving Oracle Primavera project plans with software development toolsArchiving Oracle Primavera project plans with software development tools
Archiving Oracle Primavera project plans with software development tools
 

Semelhante a LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis

K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteFlink Forward
 
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System OverviewApache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System OverviewApache Flink Taiwan User Group
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in StreamsJamie Grier
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonStephan Ewen
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Kostas Tzoumas
 
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...Paris Carbone
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkFabian Hueske
 
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Apache Fink 1.0: A New Era  for Real-World Streaming AnalyticsApache Fink 1.0: A New Era  for Real-World Streaming Analytics
Apache Fink 1.0: A New Era for Real-World Streaming AnalyticsSlim Baltagi
 
Flink history, roadmap and vision
Flink history, roadmap and visionFlink history, roadmap and vision
Flink history, roadmap and visionStephan Ewen
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Guido Schmutz
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)Yuuki Takano
 
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...Dan Halperin
 
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiTowards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiBowen Li
 
Robust stream processing with Apache Flink
Robust stream processing with Apache FlinkRobust stream processing with Apache Flink
Robust stream processing with Apache FlinkAljoscha Krettek
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Robert Metzger
 
1 Vampir Overview
1 Vampir Overview1 Vampir Overview
1 Vampir OverviewPTIHPA
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkGOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkRobert Metzger
 

Semelhante a LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis (20)

K. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward KeynoteK. Tzoumas & S. Ewen – Flink Forward Keynote
K. Tzoumas & S. Ewen – Flink Forward Keynote
 
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System OverviewApache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
Apache Flink Training Workshop @ HadoopCon2016 - #1 System Overview
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in Streams
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016
 
Apache flink
Apache flinkApache flink
Apache flink
 
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
Asynchronous Epoch Commits for Fast and Reliable Data Stream Execution in Apa...
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
Apache Fink 1.0: A New Era  for Real-World Streaming AnalyticsApache Fink 1.0: A New Era  for Real-World Streaming Analytics
Apache Fink 1.0: A New Era for Real-World Streaming Analytics
 
Flink history, roadmap and vision
Flink history, roadmap and visionFlink history, roadmap and vision
Flink history, roadmap and vision
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
SF-TAP: Scalable and Flexible Traffic Analysis Platform (USENIX LISA 2015)
 
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
Introduction to Apache Beam & No Shard Left Behind: APIs for Massive Parallel...
 
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen LiTowards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
Towards Apache Flink 2.0 - Unified Data Processing and Beyond, Bowen Li
 
Robust stream processing with Apache Flink
Robust stream processing with Apache FlinkRobust stream processing with Apache Flink
Robust stream processing with Apache Flink
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
 
1 Vampir Overview
1 Vampir Overview1 Vampir Overview
1 Vampir Overview
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkGOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache Flink
 

LWA 2015: The Apache Flink Platform for Parallel Batch and Stream Analysis

  • 1. Technische Universität Berlin DIMA – Databases and Information Management Group The Apache Flink Platform for Parallel Batch and Stream Analysis Jonas Traub | Tilmann Rabl | Fabian Hueske | Till Rohrmann | Volker Markl
  • 2. In this talk: (1) Apache Flink Primer: Architecture, Execution Engine, API Examples. (2) Stream Processing with Apache Flink: Micro Batching vs. Native Streaming, Flexible Windows/Stream Discretization, Fault Tolerance with distributed snapshotting. (3) Conclusion. Technische Universität Berlin - The Apache Flink Platform for Parallel Batch and Stream Analysis - FGDB 2015
  • 4. What is Flink? A platform for distributed batch and streaming analytics, built on a streaming dataflow runtime.
  • 5. Flink in the Analytics Ecosystem. Applications: Hive, Cascading, Mahout, Pig, Crunch, Giraph, … Data processing engines: MapReduce, Flink, Spark, Storm, Tez. App and resource management: Yarn, Mesos. Storage, streams: HDFS, HBase, Kafka.
  • 6. What can I do with it? Flink is an engine that can natively support all these workloads: stream processing, batch processing, machine learning at scale, and graph analysis.
  • 7. Sneak peek: Two of Flink’s APIs
      DataStream API (streaming):
        case class Word (word: String, frequency: Int)
        val lines: DataStream[String] = env.fromSocketStream(...)
        lines.flatMap {line => line.split(" ")
            .map(word => Word(word,1))}
          .keyBy("word")
          .window(Time.of(5,SECONDS)).every(Time.of(1,SECONDS))
          .sum("frequency")
          .print()
      DataSet API (batch):
        val lines: DataSet[String] = env.readTextFile(...)
        lines.flatMap {line => line.split(" ")
            .map(word => Word(word,1))}
          .groupBy("word").sum("frequency")
          .print()
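To make the shape of the batch pipeline concrete, here is a minimal sketch in plain Java with no Flink dependency. It mirrors the flatMap, groupBy, and sum steps using the standard `java.util.stream` API. The class name `WordCountSketch` and the sample input are assumptions made for illustration; this is not how Flink executes the job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java sketch (hypothetical class, no Flink involved) of the batch word count.
final class WordCountSketch {

    // Splits lines into words, groups equal words, and sums a frequency of 1 per occurrence.
    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))  // flatMap: one line -> many words
            .filter(word -> !word.isEmpty())                   // ignore empty tokens
            .collect(Collectors.groupingBy(                    // groupBy("word")
                word -> word,
                TreeMap::new,                                  // sorted keys for stable printing
                Collectors.summingInt(word -> 1)));            // sum("frequency")
    }

    public static void main(String[] args) {
        // prints {be=2, not=1, or=1, to=2}
        System.out.println(count(List.of("to be or", "not to be")));
    }
}
```

The streaming variant on the slide differs in that the same aggregation is evaluated repeatedly over a sliding time window instead of once over a finite input.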
  • 8. Execution Model: A Flink program is a DAG* of operators and intermediate results. Operator = computation + state. Intermediate result = logical stream of records. Example operators: map, join, sum.
  • 9. Architecture: Pipelined/streaming engine; the complete DAG is deployed. Components shown: a Job Manager and Workers 1 to 4.
  • 11. Ingredients of a Streaming System: pipelined execution engine; streaming windows/discretization; fault tolerance; high-level programming API (or language).
  • 12. Micro Batching vs Native Streaming
      Discretized Streams (D-Streams): a stream discretizer issues a sequence of batch jobs.
        while (true) {
          // get next few records
          // issue batch computation
        }
  • 13. Micro Batching vs Native Streaming
      Discretized Streams (D-Streams): a stream discretizer issues a sequence of batch jobs.
        while (true) {
          // get next few records
          // issue batch computation
        }
      Native streaming: long-standing operators.
        while (true) {
          // process next record
        }
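The two loops can be contrasted with a small plain-Java sketch. It assumes a finite list standing in for the stream and a running sum as the computation. The class and method names are hypothetical and unrelated to any Flink or Spark API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the two processing models on a finite input.
final class ProcessingModels {

    // Micro-batching: take the next few records, then issue one batch computation for them.
    // Returns one result (the batch sum) per micro-batch.
    static List<Integer> microBatch(List<Integer> input, int batchSize) {
        List<Integer> results = new ArrayList<>();
        for (int start = 0; start < input.size(); start += batchSize) {
            int end = Math.min(start + batchSize, input.size());
            int sum = 0;
            for (int value : input.subList(start, end)) {
                sum += value;
            }
            results.add(sum);
        }
        return results;
    }

    // Native streaming: a long-standing operator keeps state and updates it per record.
    // Returns the running sum after each record.
    static List<Integer> nativeStreaming(List<Integer> input) {
        List<Integer> results = new ArrayList<>();
        int runningSum = 0; // operator state that survives across records
        for (int value : input) {
            runningSum += value;
            results.add(runningSum);
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5);
        System.out.println(microBatch(input, 2));   // prints [3, 7, 5]
        System.out.println(nativeStreaming(input)); // prints [1, 3, 6, 10, 15]
    }
}
```

In the micro-batch variant a result can only appear once a whole batch has been collected, whereas the per-record variant can emit after every arrival.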
  • 14. Stream Discretization: Data is unbounded, and we are usually interested in a (recent) part of it, e.g., the last 10 days. The most common windows are time-based and count-based, mostly in sliding, fixed, and tumbling form. There is also a need for data-driven window definitions, e.g., user sessions (periods of user activity followed by inactivity), price changes, etc. Recommended reading: “The world beyond batch: Streaming 101”, Tyler Akidau, https://beta.oreilly.com/ideas/the-world-beyond-batch-streaming-101
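A data-driven window such as a user session can be sketched in a few lines of plain Java. The sketch assumes that event timestamps arrive in ascending order and that a session ends when the gap to the next event exceeds a chosen `maxGap`. Both the class name and the gap parameter are illustrative assumptions, not part of the talk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical session windowing over ascending timestamps (no Flink involved).
final class SessionWindows {

    // Groups timestamps into sessions: a new session starts whenever the gap
    // between two consecutive events is larger than maxGap.
    static List<List<Long>> sessions(List<Long> timestamps, long maxGap) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : timestamps) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > maxGap) {
                result.add(current);          // inactivity detected: close the session
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);              // close the last open session
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [[1, 2, 3], [10, 11], [30]]
        System.out.println(sessions(List.of(1L, 2L, 3L, 10L, 11L, 30L), 5));
    }
}
```

Unlike a time or count window, the window boundaries here depend on the data itself, which is why such windows cannot be expressed by a fixed size alone.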
  • 15. Flink’s Discretization: Allows very flexible windowing. Borrows and extends ideas from IBM’s SPL: SLIDE = Trigger = when to emit a window; RANGE = Eviction = what the window contains. Allows for lots of optimization (not part of this talk).
  • 16. The Discretizer Operator: Streams are represented as FIFO queues of data-items. The window operator keeps a FIFO buffer. After some time, data-items expire (they are deleted).
  • 17. The Discretizer Operator: The window operator is event-driven by data-item arrivals.
  • 18. The Discretizer Operator: The window operator is event-driven by data-item arrivals. (1) Trigger Policies (TPs) specify when to emit the current buffer content as a window. (2) Eviction Policies (EPs) specify when data-items are removed from the buffer.
  • 19. The Discretizer Operator: (1) Trigger Policies (TPs) specify when to emit the current buffer content as a window. (2) Eviction Policies (EPs) specify when data-items are removed from the buffer. Query example (window of size 3): dataStream.window(Count.of(3))
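The interplay of buffer, trigger policy, and eviction policy can be sketched in plain Java as follows. This is an illustrative model under stated assumptions, not Flink's implementation: both policies are modeled as predicates over the buffer, eviction is checked before the trigger, and the factory `slidingCount` builds one possible combination (a count window that slides by one item). It is not claimed to reproduce the exact semantics of `dataStream.window(Count.of(3))`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical discretizer: a FIFO buffer driven by data-item arrivals.
final class Discretizer<T> {
    private final Deque<T> buffer = new ArrayDeque<>();
    private final Predicate<Deque<T>> trigger;   // when to emit the buffer as a window
    private final Predicate<Deque<T>> eviction;  // whether the oldest item must be removed
    private final List<List<T>> emitted = new ArrayList<>();

    Discretizer(Predicate<Deque<T>> trigger, Predicate<Deque<T>> eviction) {
        this.trigger = trigger;
        this.eviction = eviction;
    }

    // Called once per arriving data-item: buffer it, evict expired items, maybe emit.
    void onArrival(T item) {
        buffer.addLast(item);
        while (!buffer.isEmpty() && eviction.test(buffer)) {
            buffer.removeFirst();
        }
        if (trigger.test(buffer)) {
            emitted.add(new ArrayList<>(buffer)); // copy so later evictions do not alter it
        }
    }

    List<List<T>> windows() {
        return emitted;
    }

    // Count-based policies: keep at most `size` items, emit whenever the buffer is full.
    static <T> Discretizer<T> slidingCount(int size) {
        return new Discretizer<T>(buf -> buf.size() == size, buf -> buf.size() > size);
    }

    public static void main(String[] args) {
        Discretizer<Integer> d = Discretizer.slidingCount(3);
        for (int i = 1; i <= 5; i++) {
            d.onArrival(i);
        }
        System.out.println(d.windows()); // prints [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
    }
}
```

Swapping either predicate changes the window type without touching the operator, which is the flexibility the trigger/eviction split is meant to provide.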
  • 24. Flexible Windowing: Windows can be any combination of (multiple) triggers and evictions; arbitrary tumbling, sliding, session, etc. windows can be constructed. Common triggers/evictions are part of the API: Time, Count, and Delta. Even more flexibility: define your own UDF trigger/eviction.
  • 26. Comparing Fault Tolerance Solutions. Flink's approach: based on consistent global snapshots; algorithm inspired by Chandy-Lamport; low runtime overhead; stateful exactly-once semantics. Alternatives shown: message tracking/acks (at-least-once guarantee); RDD re-computation.
  • 28. Example: A Stateful Map (counter)
        import org.apache.flink.api.common.functions.MapFunction;
        import org.apache.flink.streaming.api.checkpoint.Checkpointed;
        public class Counter implements MapFunction<Long, Long>, Checkpointed<Long> {
          // persistent counter
          private long counter = 0;
          @Override
          public Long map(Long value) {
            return ++counter;
          }
          // regularly persists state during normal operation
          @Override
          public Long snapshotState(long checkpointId, long checkpointTimestamp) {
            return counter;
          }
          // restores state on recovery from failure
          @Override
          public void restoreState(Long state) {
            counter = state;
          }
        }
  • 29. Distributed Snapshots. [Figure: timeline t1, t2, t3 with snapshots snap-t1 and snap-t2; execution is reset from snap t2.] Assumptions: repeatable sources; reliable FIFO channels.
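Why a repeatable source matters can be shown with a small plain-Java simulation. It assumes a snapshot is a pair of source offset and operator state taken at the same logical point, and that a single failure discards the in-memory state. All names (`SnapshotRecovery`, `snapshotEvery`, `failAt`) are hypothetical; the sketch only illustrates reset-and-replay, not Flink's recovery code.

```java
import java.util.List;

// Hypothetical simulation of recovery from a snapshot with a repeatable source.
final class SnapshotRecovery {

    // A consistent snapshot: where the source was, and what the counter state was.
    static final class Snapshot {
        final int offset;
        final long counter;

        Snapshot(int offset, long counter) {
            this.offset = offset;
            this.counter = counter;
        }
    }

    // Counts all records of the source while injecting one failure before record `failAt`.
    // A snapshot is taken after every `snapshotEvery` records.
    static long countWithFailure(List<String> source, int snapshotEvery, int failAt) {
        Snapshot last = new Snapshot(0, 0);
        long counter = 0;
        boolean failed = false;
        int i = 0;
        while (i < source.size()) {
            if (!failed && i == failAt) {
                // Failure: reset both the source position and the state from the last snapshot.
                failed = true;
                i = last.offset;
                counter = last.counter;
                continue;
            }
            counter++; // the stateful operator: count the record
            i++;
            if (i % snapshotEvery == 0) {
                last = new Snapshot(i, counter);
            }
        }
        return counter;
    }

    public static void main(String[] args) {
        List<String> source = List.of("a", "b", "c", "d", "e", "f", "g", "h", "i", "j");
        // Snapshot after 4 records, failure before record 6, replay from offset 4.
        System.out.println(countWithFailure(source, 4, 6)); // prints 10
    }
}
```

Because the offset and the counter are restored together, replayed records are counted exactly once; restoring only the source position would double-count the records between the snapshot and the failure.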
  • 30. Taking Snapshots. [Figure: timeline t1, t2, t3 with snapshots snap-t1 and snap-t2; execution is reset from snap t2.] Initial approach (e.g., Naiad): pause execution at t1, t2, ...; collect state; restore execution.
  • 31. Asynchronous Snapshots in Flink: Push checkpoint barriers through the data flow. [Figure: data stream with a barrier.] Records before the barrier are part of the snapshot; records after the barrier are not in the snapshot (they are backed up until the next snapshot). [Carbone et al. 2015] “Lightweight Asynchronous Snapshots for Distributed Dataflows”, Tech. Report. http://arxiv.org/abs/1506.08603
  • 32. Asynchronous Snapshots in Flink: Push checkpoint barriers through the data flow. [Figure: data stream with a barrier passing through operators; operator states shown: checkpoint starting, checkpoint in progress, checkpoint done.] Records before the barrier are part of the snapshot; records after the barrier are not in the snapshot (they are backed up until the next snapshot). [Carbone et al. 2015] “Lightweight Asynchronous Snapshots for Distributed Dataflows”, Tech. Report. http://arxiv.org/abs/1506.08603
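The barrier idea can be sketched for the simplest case: one operator with a single input channel. The plain-Java sketch below assumes records are non-negative integers and uses a hypothetical marker value `BARRIER` for checkpoint barriers. It does not model multiple inputs, barrier alignment, or asynchronous state writes, all of which the cited report covers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-input summing operator that snapshots its state at barriers.
final class BarrierSnapshots {
    static final int BARRIER = -1; // assumed marker; real records are non-negative here

    // Returns the operator state captured at each barrier, in stream order.
    static List<Integer> snapshotsOf(List<Integer> stream) {
        List<Integer> snapshots = new ArrayList<>();
        int sum = 0; // operator state
        for (int element : stream) {
            if (element == BARRIER) {
                // State reflects exactly the records that arrived before this barrier.
                snapshots.add(sum);
                // A real operator would now forward the barrier downstream.
            } else {
                sum += element; // records after a barrier belong to the next snapshot
            }
        }
        return snapshots;
    }

    public static void main(String[] args) {
        List<Integer> stream = List.of(1, 2, BARRIER, 3, BARRIER, 4);
        System.out.println(snapshotsOf(stream)); // prints [3, 6]
    }
}
```

The operator never pauses the stream as a whole: it records its state when the barrier passes and continues with the next record.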
  • 34. Community: Flink started as the Stratosphere project in 2009, led by TU Berlin. It entered Apache incubation in April 2014 and graduated in December 2014. After over a year in the Apache Software Foundation, it is now one of the most active big data projects.
  • 35. tl;dr: what was this about? The Berlin Big Data Center; native streaming with Apache Flink; flexible windowing; fault tolerance with exactly-once guarantees; a large (and growing!) community.
  • 36. Outlook: Introducing the BBDC. http://bbdc.berlin
  • 37. BBDC Technology (10,000-foot view)
  • 38. http://flink-forward.org
  • 39. Thank you. If you find this exciting, get involved on Flink’s mailing list or stay tuned by subscribing to news@flink.apache.org, following flink.apache.org/blog, and following @ApacheFlink on Twitter.