Kostas Tzoumas - Apache Flink®: State of the Union and What's Next

Kostas Tzoumas
@kostas_tzoumas
Strata + Hadoop World NYC 2016
September 29, 2016

What I'd like to talk about
• Some highlights from Flink Forward 2016
• Streaming ecosystem evolution and Flink
• What's coming up in Flink

data Artisans
• Original creators of Apache Flink®
• Providers of the dA Platform, the supported Flink distribution

Retail, e-commerce
• Better product recommendations
• Process monitoring
• Inventory management
Finance
• Differentiation via tech
• Push-based products
• Fraud detection
Telco, IoT, Infrastructure
• Infrastructure monitoring
• Anomaly detection
Internet & mobile
• Personalization
• User behavior monitoring
• Analytics

• 30 Flink applications in production for more than one year. 10 billion events (2 TB) processed daily.
• Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees.
• Largest job has > 20 operators, runs on > 5000 vCores in a 1000-node cluster, processes millions of events per second.

Streaming ecosystem and Flink

Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
Hint: you already have streaming data

[Figure labels: collect, log, analyze, query; app state; history log]

(Aside: streaming and "batch")
[Figure: a log split into hourly partitions, running from 2016-3-1 12:00 am through 2016-3-12 3:00 am and onward. Labels: Stream (low latency), Stream (high latency), Batch (bounded stream)]
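
The aside treats a batch as a bounded stream. A minimal Python sketch of that idea, not Flink code: the function name `hourly_counts` and the integer "hour" events are assumptions made for illustration. The same operator runs unchanged over a finite list and over an endless generator.

```python
from itertools import count, islice


def hourly_counts(events):
    """Yield (hour, count) pairs for an in-order sequence of event hours.

    Hypothetical helper: a result is emitted as soon as the next hour starts,
    so it works on bounded and unbounded input alike.
    """
    current, n = None, 0
    for hour in events:
        if current is not None and hour != current:
            yield current, n
            n = 0
        current = hour
        n += 1
    if current is not None:
        # Only reached when the input is bounded (the "batch" case).
        yield current, n


# Batch: a bounded stream, so the operator terminates with complete results.
batch_result = list(hourly_counts([0, 0, 1, 1, 1, 2]))

# Stream: three events per hour, forever; take only the first two results.
unbounded = (i // 3 for i in count())
stream_result = list(islice(hourly_counts(unbounded), 2))

print(batch_result)
print(stream_result)
```

The only difference between the two runs is whether the input ever ends.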

What is Flink's unique contribution in the streaming data ecosystem?

Before Flink, users had to make hard choices between volume, latency, and accuracy

Flink eliminates these tradeoffs
• 10s of millions of events per second for stateful applications
• Sub-second latency, as low as single-digit milliseconds
• Accurate computation results

A broader definition of accuracy: the results that I want when I want them
1. Accurate under failures and downtime
2. Accurate under out-of-order data
3. Results when you need them
4. Accurate modeling of the world

1. Failures and downtime
• Checkpoints & savepoints
• Exactly-once guarantees
2. Out-of-order and late data
• Event time support
• Watermarks
3. Results when you need them
• Low latency
• Triggers
4. Accurate modeling
• True streaming engine
• Sessions and flexible windows
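
To make item 2 concrete, here is a simplified Python sketch of event-time windows driven by a watermark. It is an illustration under stated assumptions, not Flink's implementation: the function `event_time_windows`, its parameters, and the rule "watermark = highest event time seen minus a fixed out-of-orderness bound" are all chosen for this example.

```python
from collections import defaultdict


def event_time_windows(events, window_size, max_out_of_orderness):
    """Sum values per tumbling event-time window.

    events: iterable of (event_time, value) in ARRIVAL order, which may
    differ from event-time order. Returns (fired, late): fired windows as
    (window_start, total) in firing order, and events that arrived after
    their window had already fired.
    """
    open_windows = defaultdict(int)
    fired, late = [], []
    watermark = float("-inf")
    for event_time, value in events:
        start = event_time - event_time % window_size
        if start + window_size <= watermark:
            late.append((event_time, value))  # window already closed
        else:
            open_windows[start] += value
        # Assumed watermark rule: trail the highest event time by a fixed bound.
        watermark = max(watermark, event_time - max_out_of_orderness)
        ready = sorted(s for s in open_windows if s + window_size <= watermark)
        for s in ready:
            fired.append((s, open_windows.pop(s)))
    for s in sorted(open_windows):  # end of input: flush what is left
        fired.append((s, open_windows.pop(s)))
    return fired, late


arrivals = [(1, 1), (4, 1), (3, 1), (12, 1), (7, 1), (25, 1), (2, 1)]
fired, late = event_time_windows(arrivals, window_size=10, max_out_of_orderness=5)
print(fired)
print(late)
```

The event at time 7 arrives after the event at time 12 but is still counted in its own window, because the watermark has not yet passed that window's end. The event at time 2 arrives after the window has fired and is reported as late.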

5. Batch + streaming
• One engine
• Dedicated APIs
6. Reprocessing
• High throughput, event time support, and savepoints
7. Ecosystem
• Rich connector ecosystem and 3rd party packages
8. Community support
• One of the most active projects with over 200 contributors
flink run -s <savepoint> <job>

Having a dependable framework enables more stateful applications to run as streaming applications

• Provide state of the art streaming capabilities (✔)
• Operate in the largest infrastructures of the world
• Open up to a wider set of enterprise users
• Broaden the scope of stream processing

Flink's unique combination of features
[Figure: feature map. Area labels: Performance, Consistency, Event Time, APIs, Libraries, Stateful Streaming]
• Low latency
• High throughput
• Well-behaved flow control (back pressure)
• Works on real-time and historic data
• Savepoints (replays, A/B testing, upgrades, versioning)
• Exactly-once semantics for fault tolerance
• Windows & user-defined state
• Flexible windows (time, count, session, roll-your-own)
• Complex Event Processing
• Fluent API
• Out-of-order events
• Fast and large out-of-core state

Flink v1.1
• Connectors
• Metric System
• (Stream) SQL
• Session Windows
• Library enhancements

Flink v1.1 + current threads
Flink v1.1:
• Connectors
• Session Windows
• (Stream) SQL
• Library enhancements
• Metric System
Current threads:
• Metrics & Visualization
• Dynamic Scaling
• Savepoint compatibility
• Checkpoints to savepoints
• More connectors
• Stream SQL Windows
• Large state maintenance
• Fine-grained recovery
• Side in-/outputs
• Window DSL
• Security
• Mesos & others
• Dynamic Resource Management
• Authentication
• Queryable State

[Figure: the same feature map, with the threads grouped under four themes: Operations, Ecosystem, Application Features, Broader Audience]

Security / Authentication
No unauthorized data access
Secured clusters with Kerberos-based authentication
• Kafka, ZooKeeper, HDFS, YARN, HBase, …
No unencrypted traffic between Flink processes
• RPC, Data Exchange, Web UI
Largely contributed by
Prevent malicious users from hooking into Flink jobs
Checkpoints / Savepoints
Recover a running job into a new job
Recover a running job onto a new cluster
Application state backwards compatibility
• Flink 1.0 made the APIs backwards compatible
• Now making the savepoints backwards compatible
• Applications can be moved to newer versions of Flink even when state backends or internals change
[Figure: version timeline with v1.x, v1.y, v2.0]
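
Recovering a running job into a new job can be pictured as follows. This is a toy Python sketch, not Flink's mechanism: the class `CountingJob` and its methods are hypothetical, and the only assumption is that a savepoint captures operator state together with the input position, so that a restored job can replay from that position.

```python
import copy


class CountingJob:
    """Hypothetical word-count job over a replayable log."""

    def __init__(self):
        self.offset = 0   # position in the input log
        self.counts = {}  # operator state

    def process(self, log, upto):
        while self.offset < upto:
            word = log[self.offset]
            self.counts[word] = self.counts.get(word, 0) + 1
            self.offset += 1

    def savepoint(self):
        # State and input position are captured together, as one snapshot.
        return copy.deepcopy({"offset": self.offset, "counts": self.counts})

    @classmethod
    def restore(cls, snapshot):
        job = cls()
        job.offset = snapshot["offset"]
        job.counts = copy.deepcopy(snapshot["counts"])
        return job


log = ["a", "b", "a", "c", "b", "a"]

old_job = CountingJob()
old_job.process(log, 3)
snapshot = old_job.savepoint()
old_job.process(log, 5)  # progress made after the savepoint is discarded

new_job = CountingJob.restore(snapshot)  # could run on a different cluster
new_job.process(log, len(log))           # replays the log from offset 3
print(new_job.counts)
```

Because the offset and the counts are restored together, each log entry affects the final counts exactly once, even though entries 3 and 4 were processed twice overall.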

Dynamic scaling
Changing load brings changing resource requirements
• Need to adjust parallelism of running streaming jobs
Re-scaling stateless operators is trivial
Re-scaling stateful operators is hard (windows, user state)
• Efficiently re-shard state
[Figure: workload and resources over time]
Re-scaling Flink jobs preserves exactly-once guarantees
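
One way to picture re-sharding keyed state is sketched below in Python. It assumes a scheme in which keys hash into a fixed number of key groups and each parallel instance owns a contiguous range of groups; the slide does not say how Flink does it, and the names `NUM_KEY_GROUPS`, `key_group`, `owner`, `shard`, and `rescale` are made up for this example.

```python
import zlib

NUM_KEY_GROUPS = 128  # assumed fixed upper bound on parallelism


def key_group(key):
    """Map a key to a stable group, independent of the current parallelism."""
    return zlib.crc32(key.encode("utf-8")) % NUM_KEY_GROUPS


def owner(group, parallelism):
    """Assign contiguous ranges of key groups to parallel instances."""
    return group * parallelism // NUM_KEY_GROUPS


def shard(state, parallelism):
    shards = [dict() for _ in range(parallelism)]
    for key, value in state.items():
        shards[owner(key_group(key), parallelism)][key] = value
    return shards


def rescale(shards, new_parallelism):
    """Redistribute existing shards across a new number of instances."""
    merged = {}
    for s in shards:
        merged.update(s)
    return shard(merged, new_parallelism)


state = {f"user-{i}": i for i in range(1000)}
four_way = shard(state, 4)
six_way = rescale(four_way, 6)
print([len(s) for s in four_way])
print([len(s) for s in six_way])
```

Since a key's group never changes, scaling from 4 to 6 instances only moves whole groups between instances, and no key is lost or duplicated along the way.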

Cluster management
Series of improvements to seamlessly interoperate with various cluster managers
• YARN, Mesos, Docker, Standalone, …
Driven by
Mesos integration contributed by
and

Stream SQL
SQL is the standard high-level query language
A natural way to open up streaming to more people
Problem: There is no Streaming SQL standard
• At least beyond the basic operations
• Challenging: Incorporate windows and time semantics
Flink community working with Apache Calcite to draft a new model

State in stream processing
Stateless Streaming (Apache Storm)
Stateful Streaming (Apache Samza)
Accurate Stateful Streaming (Apache Flink)
State sizes in Flink today: 10s of gigabytes per operator
How to scale this to many terabytes?
• Queryable State
• Data-driven triggers over large state

Large-state streaming
How to scale the stream processor state?
… and maintain fast checkpoint intervals?
… and have very fast recovery on machine failures?
More and more database techniques coming into Flink

I wrote a book!
Get it at mapr.com/introduction-to-apache-flink

@kostas_tzoumas | @ApacheFlink | @dataArtisans
Thank you!
We are hiring!
