SlideShare a Scribd company logo
1 of 44
Stephan Ewen
Flink committer
co-founder / CTO @ data Artisans
@StephanEwen
Apache
Flink
Looking back one year
2
April 16, 2014
3
Stratosphere 0.4
4
Stratosphere Optimizer
Pact API (Java)
Stratosphere Runtime
DataSet API (Scala)
Local Remote
Batch processing on a pipelining engine, with iterations …
Looking at now…
5
Flink
Historic data
Kafka, RabbitMQ, ...
HDFS, JDBC, ...
ETL, Graphs,
Machine Learning
Relational, …
Low latency,
windowing,
aggregations, ...
Event logs
Real-time data
streams
What is Apache Flink?
(master)
What is Apache Flink?
7
Python
Gelly
Table
ML
SAMOA
Flink Optimizer
DataSet (Java/Scala) DataStream (Java/Scala)
Stream Builder
Hadoop
M/R
Local Remote Yarn Tez Embedded
Dataflow
Dataflow
Flink Dataflow Runtime
HDFS
HBase
Kafka
RabbitMQ
Flume
HCatalog
JDBC
Batch / Steaming APIs
8
case class Word (word: String, frequency: Int)
val lines: DataStream[String] = env.fromSocketStream(...)
lines.flatMap {line => line.split(" ")
.map(word => Word(word,1))}
.window(Count.of(1000)).every(Count.of(100))
.groupBy("word").sum("frequency")
.print()
val lines: DataSet[String] = env.readTextFile(...)
lines.flatMap {line => line.split(" ")
.map(word => Word(word,1))}
.groupBy("word").sum("frequency")
.print()
DataSet API (batch):
DataStream API (streaming):
Technology inside Flink
case class Path (from: Long, to:
Long)
val tc = edges.iterate(10) {
paths: DataSet[Path] =>
val next = paths
.join(edges)
.where("to")
.equalTo("from") {
(path, edge) =>
Path(path.from, edge.to)
}
.union(paths)
.distinct()
next
}
Cost-based
optimizer
Type extraction
stack
Task
scheduling
Recovery
metadata
Pre-flight (Client)
Master
Workers
DataSourc
e
orders.tbl
Filter
Map
DataSourc
e
lineitem.tbl
Join
Hybrid Hash
build
HT
probe
hash-part [0] hash-part [0]
GroupRed
sort
forward
Program
Dataflow
Graph
Memory
manager
Out-of-core
algos
Batch &
Streaming
State &
Checkpoints
deploy
operators
track
intermediate
results
Flink by Feature / Use Case
10
Data Streaming Analysis
11
Life of data streams
 Create: create streams from event sources
(machines, databases, logs, sensors, …)
 Collect: collect and make streams available for
consumption (e.g., Apache Kafka)
 Process: process streams, possibly generating
derived streams (e.g., Apache Flink)
12
Stream Analysis in Flink
13
More at: http://flink.apache.org/news/2015/02/09/streaming-example.html
Defining windows in Flink
 Trigger policy
• When to trigger the computation on current window
 Eviction policy
• When data points should leave the window
• Defines window width/size
 E.g., count-based policy
• evict when #elements > n
• start a new window every n-th element
 Built-in: Count, Time, Delta policies
14
Checkpointing / Recovery
 Flink acknowledges batches of records
• Less overhead in failure-free case
• Currently tied to fault tolerant data sources
(e.g., Kafka)
 Flink operators can keep state
• State is checkpointed
• Checkpointing and record acks go together
 Exactly one semantics for state
15
Checkpointing / Recovery
16
Chandy-Lamport Algorithm for consistent asynchronous distributed snapshots
Pushes checkpoint barriers
through the data flow
Operator checkpoint
starting
Checkpoint done
Data Stream
barrier
Before barrier =
part of the snapshot
After barrier =
Not in snapshot
Checkpoint done
checkpoint in progress
(backup till next snapshot)
Heavy ETL Pipelines
17
Heavy Data Pipelines
18
Complex ETL programs
Apology: Graph had to be blurred for
online slides, due to confidentiality
Memory Management
public class WC {
public String word;
public int count;
}
empty
page
Pool of Memory Pages
Sorting,
hashing,
caching
Shuffling,
broadcasts
User code
objects
ManagedUnmanaged
19
Flink contains its own memory management stack. Memory is
allocated, de-allocated, and used strictly using an internal buffer pool
implementation. To do that, Flink contains its own type extraction and
serialization components.
More at: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525
Smooth out-of-core performance
20
More at: http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html
Single-core join of 1KB Java objects beyond memory (4 GB)
Blue bars are in-memory, orange bars (partially) out-of-core
Benefits of managed memory
 More reliable and stable performance (less GC
effects, easy to go to disk)
21
Table API
22
val customers = envreadCsvFile(…).as('id, 'mktSegment)
.filter( 'mktSegment === "AUTOMOBILE" )
val orders = env.readCsvFile(…)
.filter( o => dateFormat.parse(o.orderDate).before(date) )
.as('orderId, 'custId, 'orderDate, 'shipPrio)
val items = orders
.join(customers).where('custId === 'id)
.join(lineitems).where('orderId === 'id)
.select('orderId,'orderDate,'shipPrio,
'extdPrice * (Literal(1.0f) - 'discount) as 'revenue)
val result = items
.groupBy('orderId, 'orderDate, 'shipPrio)
.select('orderId, 'revenue.sum, 'orderDate, 'shipPrio)
Iterations in Data Flows
 Machine Learning Algorithms
23
Iterate by looping
 for/while loop in client submits one job per
iteration step
 Data reuse by caching in memory and/or disk
Step Step Step Step Step
Client
24
Iterate in the Dataflow
25
partial
solution
partial
solutionX
other
datasets
Y
initial
solution
iteration
result
Replace
Step function
Large-Scale Machine Learning
26
Factorizing a matrix with
28 billion ratings for
recommendations
(Scale of Netflix
or Spotify)
More at: http://data-artisans.com/computing-recommendations-with-flink.html
State in Iterations
 Graphs and Machine Learning
27
Iterate natively with deltas
28
partial
solution
delta
setX
other
datasets
Y
initial
solution
iteration
result
workset A B workset
Merge deltas
Replace
initial
workset
Effect of delta iterations…
0
5000000
10000000
15000000
20000000
25000000
30000000
35000000
40000000
45000000
1 6 11 16 21 26 31 36 41 46 51 56 61
#ofelementsupdated
iteration
… very fast graph analysis
30
… and mix and match
ETL-style and graph analysis
in one program
Performance competitive
with dedicated graph
analysis systems
More at: http://data-artisans.com/data-analysis-with-flink.html
Closing
31
Flink Roadmap for 2015
 Out-of-core state in Streaming
 Monitoring and scaling for streaming
 Streaming Machine Learning with SAMOA
 More additions to the libraries
• Batch Machine Learning
• Graph library additions (more algorithms)
 SQL on top of expression language
 Master failover
32
Flink community
0
20
40
60
80
100
120
Aug-10 Feb-11 Sep-11 Apr-12 Oct-12 May-13 Nov-13 Jun-14 Dec-14 Jul-15
#unique contributor ids by git commits
flink.apache.org
@ApacheFlink
35
Backup
Cornerpoints of Flink Design
36
Robust Algorithms on
Managed Memory
Pipelined Execution
of Batch Programs
 Better shuffle performance
 No OutOfMemory Errors
 Scales to very large JVMs
 Efficient an robust processing
Flexible Data
Streaming Engine
 Low Latency Steam Proc.
 Highly flexible windows
Native Iterations
 Very fast Graph Processing
 Stateful Iterations for ML
High-level APIs,
beyond key/value pairs
 Java/Scala/Python (upcoming)
 Relational-style optimizer
 Graphs / Machine Learning
 Streaming ML (coming)
 Scales to very large groups
Active Library Development
Program optimization
37
A simple program
38
val orders = …
val lineitems = …
val filteredOrders = orders
.filter(o => dataFormat.parse(l.shipDate).after(date))
.filter(o => o.shipPrio > 2)
val lineitemsOfOrders = filteredOrders
.join(lineitems)
.where(“orderId”).equalTo(“orderId”)
.apply((o,l) => new SelectedItem(o.orderDate, l.extdPrice))
val priceSums = lineitemsOfOrders
.groupBy(“orderDate”).sum(“l.extdPrice”);
Two execution plans
39
DataSource
orders.tbl
Filter
Map DataSource
lineitem.tbl
Join
Hybrid Hash
buildHT probe
broadcast forward
Combine
GroupRed
sort
DataSource
orders.tbl
Filter
Map DataSource
lineitem.tbl
Join
Hybrid Hash
buildHT probe
hash-part [0] hash-part [0]
hash-part [0,1]
GroupRed
sort
forwardBest plan
depends on
relative sizes
of input files
Examples of optimization
 Task chaining
• Coalesce map/filter/etc tasks
 Join optimizations
• Broadcast/partition, build/probe side, hash or sort-
merge
 Interesting properties
• Re-use partitioning and sorting for later operations
 Automatic caching
• E.g., for iterations
40
Visualization
41
Visualization tools
42
Visualization tools
43
Visualization tools
44

More Related Content

What's hot

Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream ProcessingGuido Schmutz
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKai Wähner
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewenconfluent
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkDatabricks
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataTimothy Spann
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationnathanmarz
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysisAmazon Web Services
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebookAniket Mokashi
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache SparkDatabricks
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?confluent
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internalsKostas Tzoumas
 
Transfer Learning: An overview
Transfer Learning: An overviewTransfer Learning: An overview
Transfer Learning: An overviewjins0618
 
Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing FrameworksSirKetchup
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019Timothy Spann
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Till Rohrmann
 
Spark streaming
Spark streamingSpark streaming
Spark streamingWhiteklay
 

What's hot (20)

Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Introduction to Stream Processing
Introduction to Stream ProcessingIntroduction to Stream Processing
Introduction to Stream Processing
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan EwenAdvanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
Advanced Streaming Analytics with Apache Flink and Apache Kafka, Stephan Ewen
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
 
Storm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computationStorm: distributed and fault-tolerant realtime computation
Storm: distributed and fault-tolerant realtime computation
 
Streaming data for real time analysis
Streaming data for real time analysisStreaming data for real time analysis
Streaming data for real time analysis
 
XStream: stream processing platform at facebook
XStream:  stream processing platform at facebookXStream:  stream processing platform at facebook
XStream: stream processing platform at facebook
 
Apache Storm
Apache StormApache Storm
Apache Storm
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache Spark
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
Apache Flink internals
Apache Flink internalsApache Flink internals
Apache Flink internals
 
Transfer Learning: An overview
Transfer Learning: An overviewTransfer Learning: An overview
Transfer Learning: An overview
 
Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing Frameworks
 
Introduction to Apache NiFi dws19 DWS - DC 2019
Introduction to Apache NiFi   dws19 DWS - DC 2019Introduction to Apache NiFi   dws19 DWS - DC 2019
Introduction to Apache NiFi dws19 DWS - DC 2019
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 

Viewers also liked

Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Slim Baltagi
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitSlim Baltagi
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Ververica
 
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...Ververica
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiSlim Baltagi
 
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingClick-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingRobert Metzger
 
Ansible pill09wp
Ansible pill09wpAnsible pill09wp
Ansible pill09wpIdeato
 
Apache Nifi - Custom Processor
Apache Nifi - Custom Processor Apache Nifi - Custom Processor
Apache Nifi - Custom Processor thotasrinath
 
January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016Robert Metzger
 
The Business Event Bus
The Business Event BusThe Business Event Bus
The Business Event BusJoris Meijer
 
Flink 1.0-slides
Flink 1.0-slidesFlink 1.0-slides
Flink 1.0-slidesJamie Grier
 
Machine Learning and Causal Inference
Machine Learning and Causal InferenceMachine Learning and Causal Inference
Machine Learning and Causal InferenceNBER
 
Themenstruktur Kafka
Themenstruktur KafkaThemenstruktur Kafka
Themenstruktur Kafkaschoolmeester
 
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr) ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr) Andreas Chatzakis
 
Verwandlung Familienstruktur Schaubild
Verwandlung Familienstruktur SchaubildVerwandlung Familienstruktur Schaubild
Verwandlung Familienstruktur Schaubildschoolmeester
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextVerverica
 

Viewers also liked (20)

Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
 
Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®Kostas Tzoumas - Stream Processing with Apache Flink®
Kostas Tzoumas - Stream Processing with Apache Flink®
 
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
 
Click-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer CheckpointingClick-Through Example for Flink’s KafkaConsumer Checkpointing
Click-Through Example for Flink’s KafkaConsumer Checkpointing
 
Ansible pill09wp
Ansible pill09wpAnsible pill09wp
Ansible pill09wp
 
Apache Nifi - Custom Processor
Apache Nifi - Custom Processor Apache Nifi - Custom Processor
Apache Nifi - Custom Processor
 
January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016January 2016 Flink Community Update & Roadmap 2016
January 2016 Flink Community Update & Roadmap 2016
 
The Business Event Bus
The Business Event BusThe Business Event Bus
The Business Event Bus
 
Flink 1.0-slides
Flink 1.0-slidesFlink 1.0-slides
Flink 1.0-slides
 
Kafka Utrecht Meetup
Kafka Utrecht MeetupKafka Utrecht Meetup
Kafka Utrecht Meetup
 
Flume Case Study
Flume Case StudyFlume Case Study
Flume Case Study
 
Machine Learning and Causal Inference
Machine Learning and Causal InferenceMachine Learning and Causal Inference
Machine Learning and Causal Inference
 
Themenstruktur Kafka
Themenstruktur KafkaThemenstruktur Kafka
Themenstruktur Kafka
 
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr) ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
ElasticSearch on AWS - Real Estate portal case study (Spitogatos.gr)
 
Verwandlung Familienstruktur Schaubild
Verwandlung Familienstruktur SchaubildVerwandlung Familienstruktur Schaubild
Verwandlung Familienstruktur Schaubild
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
 

Similar to Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pre-Hadoop Summit Meetups)

Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonStephan Ewen
 
Flink history, roadmap and vision
Flink history, roadmap and visionFlink history, roadmap and vision
Flink history, roadmap and visionStephan Ewen
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureGyula Fóra
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System OverviewFlink Forward
 
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data ProcessingApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data ProcessingFabian Hueske
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestDataGyula Fóra
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorAljoscha Krettek
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkSlim Baltagi
 
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015Till Rohrmann
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Robert Metzger
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkGOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkRobert Metzger
 
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink:  Fast and reliable large-scale data processingJanuary 2015 HUG: Apache Flink:  Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processingYahoo Developer Network
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkRobert Metzger
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Apache Flink Taiwan User Group
 
Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureRobert Metzger
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 
Data Analysis With Apache Flink
Data Analysis With Apache FlinkData Analysis With Apache Flink
Data Analysis With Apache FlinkDataWorks Summit
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Aljoscha Krettek
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the unionDatabricks
 

Similar to Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pre-Hadoop Summit Meetups) (20)

Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
 
Flink history, roadmap and vision
Flink history, roadmap and visionFlink history, roadmap and vision
Flink history, roadmap and vision
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
 
Apache Flink Training: System Overview
Apache Flink Training: System OverviewApache Flink Training: System Overview
Apache Flink Training: System Overview
 
Apache Flink Deep Dive
Apache Flink Deep DiveApache Flink Deep Dive
Apache Flink Deep Dive
 
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data ProcessingApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
ApacheCon: Apache Flink - Fast and Reliable Large-Scale Data Processing
 
Flink Streaming @BudapestData
Flink Streaming @BudapestDataFlink Streaming @BudapestData
Flink Streaming @BudapestData
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream Processor
 
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics FrameworkOverview of Apache Flink: Next-Gen Big Data Analytics Framework
Overview of Apache Flink: Next-Gen Big Data Analytics Framework
 
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
 
GOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache FlinkGOTO Night Amsterdam - Stream processing with Apache Flink
GOTO Night Amsterdam - Stream processing with Apache Flink
 
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink:  Fast and reliable large-scale data processingJanuary 2015 HUG: Apache Flink:  Fast and reliable large-scale data processing
January 2015 HUG: Apache Flink: Fast and reliable large-scale data processing
 
QCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache FlinkQCon London - Stream Processing with Apache Flink
QCon London - Stream Processing with Apache Flink
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
 
Chicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architectureChicago Flink Meetup: Flink's streaming architecture
Chicago Flink Meetup: Flink's streaming architecture
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Data Analysis With Apache Flink
Data Analysis With Apache FlinkData Analysis With Apache Flink
Data Analysis With Apache Flink
 
Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)Data Analysis with Apache Flink (Hadoop Summit, 2015)
Data Analysis with Apache Flink (Hadoop Summit, 2015)
 
Spark streaming state of the union
Spark streaming state of the unionSpark streaming state of the union
Spark streaming state of the union
 

More from Stephan Ewen

Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Stephan Ewen
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords Stephan Ewen
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Stephan Ewen
 
Apache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink MeetupApache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink MeetupStephan Ewen
 
Flink use cases @ bay area meetup (october 2015)
Flink use cases @ bay area meetup (october 2015)Flink use cases @ bay area meetup (october 2015)
Flink use cases @ bay area meetup (october 2015)Stephan Ewen
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Stephan Ewen
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsStephan Ewen
 

More from Stephan Ewen (7)

Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
 
Apache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink MeetupApache Flink @ NYC Flink Meetup
Apache Flink @ NYC Flink Meetup
 
Flink use cases @ bay area meetup (october 2015)
Flink use cases @ bay area meetup (october 2015)Flink use cases @ bay area meetup (october 2015)
Flink use cases @ bay area meetup (october 2015)
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Apache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and FriendsApache Flink Overview at SF Spark and Friends
Apache Flink Overview at SF Spark and Friends
 

Recently uploaded

%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburgmasabamasaba
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benonimasabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationJuha-Pekka Tolvanen
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 

Recently uploaded (20)

%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
%in Rustenburg+277-882-255-28 abortion pills for sale in Rustenburg
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
Abortion Pills In Pretoria ](+27832195400*)[ 🏥 Women's Abortion Clinic In Pre...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 

Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pre-Hadoop Summit Meetups)

  • 1. Stephan Ewen Flink committer co-founder / CTO @ data Artisans @StephanEwen Apache Flink
  • 4. Stratosphere 0.4 4 Stratosphere Optimizer Pact API (Java) Stratosphere Runtime DataSet API (Scala) Local Remote Batch processing on a pipelining engine, with iterations …
  • 6. Flink Historic data Kafka, RabbitMQ, ... HDFS, JDBC, ... ETL, Graphs, Machine Learning Relational, … Low latency, windowing, aggregations, ... Event logs Real-time data streams What is Apache Flink? (master)
  • 7. What is Apache Flink? 7 Python Gelly Table ML SAMOA Flink Optimizer DataSet (Java/Scala) DataStream (Java/Scala) Stream Builder Hadoop M/R Local Remote Yarn Tez Embedded Dataflow Dataflow Flink Dataflow Runtime HDFS HBase Kafka RabbitMQ Flume HCatalog JDBC
  • 8. Batch / Steaming APIs 8 case class Word (word: String, frequency: Int) val lines: DataStream[String] = env.fromSocketStream(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .window(Count.of(1000)).every(Count.of(100)) .groupBy("word").sum("frequency") .print() val lines: DataSet[String] = env.readTextFile(...) lines.flatMap {line => line.split(" ") .map(word => Word(word,1))} .groupBy("word").sum("frequency") .print() DataSet API (batch): DataStream API (streaming):
  • 9. Technology inside Flink case class Path (from: Long, to: Long) val tc = edges.iterate(10) { paths: DataSet[Path] => val next = paths .join(edges) .where("to") .equalTo("from") { (path, edge) => Path(path.from, edge.to) } .union(paths) .distinct() next } Cost-based optimizer Type extraction stack Task scheduling Recovery metadata Pre-flight (Client) Master Workers DataSourc e orders.tbl Filter Map DataSourc e lineitem.tbl Join Hybrid Hash build HT probe hash-part [0] hash-part [0] GroupRed sort forward Program Dataflow Graph Memory manager Out-of-core algos Batch & Streaming State & Checkpoints deploy operators track intermediate results
  • 10. Flink by Feature / Use Case 10
  • 12. Life of data streams  Create: create streams from event sources (machines, databases, logs, sensors, …)  Collect: collect and make streams available for consumption (e.g., Apache Kafka)  Process: process streams, possibly generating derived streams (e.g., Apache Flink) 12
  • 13. Stream Analysis in Flink 13 More at: http://flink.apache.org/news/2015/02/09/streaming-example.html
  • 14. Defining windows in Flink  Trigger policy • When to trigger the computation on current window  Eviction policy • When data points should leave the window • Defines window width/size  E.g., count-based policy • evict when #elements > n • start a new window every n-th element  Built-in: Count, Time, Delta policies 14
  • 15. Checkpointing / Recovery  Flink acknowledges batches of records • Less overhead in failure-free case • Currently tied to fault tolerant data sources (e.g., Kafka)  Flink operators can keep state • State is checkpointed • Checkpointing and record acks go together  Exactly one semantics for state 15
  • 16. Checkpointing / Recovery 16 Chandy-Lamport Algorithm for consistent asynchronous distributed snapshots Pushes checkpoint barriers through the data flow Operator checkpoint starting Checkpoint done Data Stream barrier Before barrier = part of the snapshot After barrier = Not in snapshot Checkpoint done checkpoint in progress (backup till next snapshot)
  • 18. Heavy Data Pipelines 18 Complex ETL programs Apology: Graph had to be blurred for online slides, due to confidentiality
  • 19. Memory Management public class WC { public String word; public int count; } empty page Pool of Memory Pages Sorting, hashing, caching Shuffling, broadcasts User code objects ManagedUnmanaged 19 Flink contains its own memory management stack. Memory is allocated, de-allocated, and used strictly using an internal buffer pool implementation. To do that, Flink contains its own type extraction and serialization components. More at: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741525
  • 20. Smooth out-of-core performance 20 More at: http://flink.apache.org/news/2015/03/13/peeking-into-Apache-Flinks-Engine-Room.html Single-core join of 1KB Java objects beyond memory (4 GB) Blue bars are in-memory, orange bars (partially) out-of-core
  • 21. Benefits of managed memory  More reliable and stable performance (less GC effects, easy to go to disk) 21
  • 22. Table API 22 val customers = envreadCsvFile(…).as('id, 'mktSegment) .filter( 'mktSegment === "AUTOMOBILE" ) val orders = env.readCsvFile(…) .filter( o => dateFormat.parse(o.orderDate).before(date) ) .as('orderId, 'custId, 'orderDate, 'shipPrio) val items = orders .join(customers).where('custId === 'id) .join(lineitems).where('orderId === 'id) .select('orderId,'orderDate,'shipPrio, 'extdPrice * (Literal(1.0f) - 'discount) as 'revenue) val result = items .groupBy('orderId, 'orderDate, 'shipPrio) .select('orderId, 'revenue.sum, 'orderDate, 'shipPrio)
  • 23. Iterations in Data Flows  Machine Learning Algorithms 23
  • 24. Iterate by looping  for/while loop in client submits one job per iteration step  Data reuse by caching in memory and/or disk Step Step Step Step Step Client 24
  • 25. Iterate in the Dataflow 25 partial solution partial solutionX other datasets Y initial solution iteration result Replace Step function
  • 26. Large-Scale Machine Learning 26 Factorizing a matrix with 28 billion ratings for recommendations (Scale of Netflix or Spotify) More at: http://data-artisans.com/computing-recommendations-with-flink.html
  • 27. State in Iterations  Graphs and Machine Learning 27
  • 28. Iterate natively with deltas 28 partial solution delta setX other datasets Y initial solution iteration result workset A B workset Merge deltas Replace initial workset
  • 29. Effect of delta iterations… 0 5000000 10000000 15000000 20000000 25000000 30000000 35000000 40000000 45000000 1 6 11 16 21 26 31 36 41 46 51 56 61 #ofelementsupdated iteration
  • 30. … very fast graph analysis 30 … and mix and match ETL-style and graph analysis in one program Performance competitive with dedicated graph analysis systems More at: http://data-artisans.com/data-analysis-with-flink.html
  • 32. Flink Roadmap for 2015  Out-of-core state in Streaming  Monitoring and scaling for streaming  Streaming Machine Learning with SAMOA  More additions to the libraries • Batch Machine Learning • Graph library additions (more algorithms)  SQL on top of expression language  Master failover 32
  • 33. Flink community 0 20 40 60 80 100 120 Aug-10 Feb-11 Sep-11 Apr-12 Oct-12 May-13 Nov-13 Jun-14 Dec-14 Jul-15 #unique contributor ids by git commits
  • 36. Cornerpoints of Flink Design 36 Robust Algorithms on Managed Memory Pipelined Execution of Batch Programs  Better shuffle performance  No OutOfMemory Errors  Scales to very large JVMs  Efficient an robust processing Flexible Data Streaming Engine  Low Latency Steam Proc.  Highly flexible windows Native Iterations  Very fast Graph Processing  Stateful Iterations for ML High-level APIs, beyond key/value pairs  Java/Scala/Python (upcoming)  Relational-style optimizer  Graphs / Machine Learning  Streaming ML (coming)  Scales to very large groups Active Library Development
  • 38. A simple program 38 val orders = … val lineitems = … val filteredOrders = orders .filter(o => dataFormat.parse(l.shipDate).after(date)) .filter(o => o.shipPrio > 2) val lineitemsOfOrders = filteredOrders .join(lineitems) .where(“orderId”).equalTo(“orderId”) .apply((o,l) => new SelectedItem(o.orderDate, l.extdPrice)) val priceSums = lineitemsOfOrders .groupBy(“orderDate”).sum(“l.extdPrice”);
  • 39. Two execution plans 39 DataSource orders.tbl Filter Map DataSource lineitem.tbl Join Hybrid Hash buildHT probe broadcast forward Combine GroupRed sort DataSource orders.tbl Filter Map DataSource lineitem.tbl Join Hybrid Hash buildHT probe hash-part [0] hash-part [0] hash-part [0,1] GroupRed sort forwardBest plan depends on relative sizes of input files
  • 40. Examples of optimization  Task chaining • Coalesce map/filter/etc tasks  Join optimizations • Broadcast/partition, build/probe side, hash or sort- merge  Interesting properties • Re-use partitioning and sorting for later operations  Automatic caching • E.g., for iterations 40

Editor's Notes

  1. Iterations, Yarn support, Local execution, accummulators, web frontend, HBase, JDBC, Windows compatibility, mvn central,
  2. dev list: 300-400 messages/month. record 1000 messages on