5. BigData success story
MapReduce (OSDI '04) -> Hadoop 1
  Small recoverable tasks
  Sequential code
Dryad (EuroSys '07) -> Tez
  Map/Reduce extended to DAG
  Backtracking recovery
RDDs (HotCloud '10, NSDI '12) -> Spark
  Functional implementation of Dryad recovery
PACTs (SoCC '10, VLDB '12) -> Flink
  Cyclic graph (and incremental construction)
  Query processing runtime embedded in DAG engine
Stonebraker / Cetintemel / Zdonik, 2005
6. Criteria for stream processing (1/2)
• Keep data moving
  - Low latency on critical path
• Query on stream
  - High-level language
• Handle stream imperfection
  - Timeout (e.g., average of the last 25 securities)
  - Out of order (must leave window open)
• Generate predictable outcomes
  - Time ordered
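To make the "average of the last 25 securities" example concrete, here is a minimal sketch of a count-based sliding window in plain Java. It is not Flink code: the class name `SlidingAverage` and its method are hypothetical, and it only covers the windowed average, not the timeout or out-of-order handling mentioned on the slide.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Count-based sliding window: average of the last `size` values seen. (Illustrative name.) */
class SlidingAverage {
    private final int size;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    SlidingAverage(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
    }

    /** Adds one value and returns the average over the current window. */
    double add(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > size) {
            // Evict the oldest value so that at most `size` values are kept.
            sum -= window.removeFirst();
        }
        return sum / window.size();
    }

    public static void main(String[] args) {
        SlidingAverage avg = new SlidingAverage(25);
        for (int price = 1; price <= 30; price++) {
            System.out.println(avg.add(price));
        }
    }
}
```

Each incoming value produces a result immediately, which is the "keep data moving" idea: nothing waits for a full batch.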
7. Criteria for stream processing (2/2)
• Integrate stored / streaming data
  - Uniform language for both stored and streamed data
  - Combine streamed and stored data
• Data safety / availability
  - Resistant to failure
• Partition and scale automatically
• Process and respond instantaneously
  - 100,000 msg/s
9. The stack
Data processing engine
User requirement
App and resource management
Storage / stream
13. Word count
• The "hello world"
// read a text file, or use in-memory data, and produce a DataSet of String
DataSet<String> text = getTextDataSet(env);
DataSet<Tuple2<String, Integer>> counts =
    // split up the lines in pairs (2-tuples) containing: (word,1)
    text.flatMap(new Tokenizer())
    // group by the tuple field "0" and sum up tuple field "1"
    .groupBy(0)
    .sum(1);
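The three steps above (flatMap, groupBy, sum) can be mimicked with plain Java collections to see what each one contributes. This is only a sketch and does not use the Flink API: `WordCountSketch`, `tokenize` and `count` are hypothetical names, and splitting on non-word characters is an assumption about what the slide's `Tokenizer` does.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    /** flatMap step: one line in, zero or more (word, 1) pairs out. */
    static List<Map.Entry<String, Integer>> tokenize(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    /** groupBy(0) + sum(1): group the pairs by word and add up the counts. */
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : tokenize(line)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "To be, or not to be,--that is the question:--",
            "Whether 'tis nobler in the mind to suffer");
        System.out.println(count(lines));
    }
}
```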
14. Word count
Input lines:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
flatMap(Tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
15. Data in memory
public static final String[] WORDS = new String[] {
    "To be, or not to be,--that is the question:--",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune",
    "Or to take arms against a sea of troubles,",
    "And by opposing end them?--To die,--to sleep,--",
    "No more; and by a sleep to say we end",
    "The heartache, and the thousand natural shocks",
    "That flesh is heir to,--'tis a consummation",
    "Devoutly to be wish'd. To die,--to sleep;--",
    ...
17. With POJO
public static class Word {
    // fields
    private String word;
    private Integer frequency;

    // constructors
    public Word() { }

    public Word(String word, int i) {
        this.word = word;
        this.frequency = i;
    }

    // getters and setters

    // toString
    @Override
    public String toString() {
        return "Word=" + word + " freq=" + frequency;
    }
}
18. Pojo
Input lines:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
flatMap(Tokenizer): Word 1 {to,1}  Word 2 {be,1}  Word 3 {or,1} ...
groupBy: Word 1 {to,1}  Word 5 {to,1} | Word 2 {be,1}  Word 6 {be,1} | Word 3 {or,1}
sum: Word 7 {to,2}  Word 8 {be,2}  Word 9 {or,1}
19. JDBC
Table "hamlet" (one row per line):
("To be, or not to be,--that is the question:--")
("Whether 'tis nobler in the mind to suffer")
map + flatMap(Tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
20. Stream
Lines received so far:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
flatMap(Tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
21. Stream
A new line arrives on the stream:
"Or to take arms against a sea of troubles,"
flatMap(Tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
22. Stream
The new line reaches flatMap(Tokenizer):
"Or to take arms against a sea of troubles,"
flatMap(Tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
23. Stream
flatMap(Tokenizer) emits the pairs of the new line:
"Or to take arms against a sea of troubles,"
flatMap(Tokenizer): (to,1) (be,1) (or,1) ... (or,1)
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
24. Stream
The new (or,1) pair joins its group:
"Or to take arms against a sea of troubles,"
flatMap(Tokenizer): (to,1) (be,1) (or,1) ... (or,1)
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1) (or,1)
sum: (to,2) (be,2) (or,1)
25. Stream
sum emits the updated count for "or":
"Or to take arms against a sea of troubles,"
flatMap(Tokenizer): (to,1) (be,1) (or,1) ... (or,1)
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1) (or,1)
sum: (to,2) (be,2) (or,2)
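The stream frames above show counts being updated as each new line arrives, rather than recomputed from scratch. A minimal plain-Java sketch of that idea keeps a running count per word and emits the updated value for every token. It is not the Flink streaming API: `RunningWordCount` and `onLine` are hypothetical names, and the tokenization rule is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RunningWordCount {
    // Keyed state: the count seen so far for each word.
    private final Map<String, Integer> state = new HashMap<>();

    /** Processes one incoming line and returns the updates it emits, e.g. "or=2". */
    List<String> onLine(String line) {
        List<String> emitted = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            // merge() returns the new count after adding 1 for this token.
            int updated = state.merge(token, 1, Integer::sum);
            emitted.add(token + "=" + updated);
        }
        return emitted;
    }

    public static void main(String[] args) {
        RunningWordCount counter = new RunningWordCount();
        System.out.println(counter.onLine("To be, or not to be,--that is the question:--"));
        System.out.println(counter.onLine("Or to take arms against a sea of troubles,"));
    }
}
```

The second call only touches the state of the words it contains, so "or" moves from 1 to 2 without revisiting earlier lines.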
26. Multiple
Three parallel inputs, each holding the same lines:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
flatMap(Tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1) (to,1) | (be,1) (be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
27. Multiple
Three parallel inputs, each holding the same lines:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
flatMap(Tokenizer), one instance per input: (to,1) (be,1) (or,1) ...
groupBy + sum: (to,6) (be,6) (or,3) ...
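One way to picture how several parallel instances still produce a single count per word is hash partitioning: every occurrence of a word is routed to the same instance, which then sums it locally. The sketch below is plain Java, not Flink's runtime; `PartitionedWordCount`, `partitionOf` and the use of `hashCode` for routing are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PartitionedWordCount {
    /** Routes a word to one of `parallelism` instances; the same word always gets the same index. */
    static int partitionOf(String word, int parallelism) {
        return Math.floorMod(word.hashCode(), parallelism);
    }

    /** Returns one count map per parallel instance. */
    static List<Map<String, Integer>> count(List<String> lines, int parallelism) {
        List<Map<String, Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            partitions.add(new HashMap<>());
        }
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    partitions.get(partitionOf(token, parallelism)).merge(token, 1, Integer::sum);
                }
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        String line = "To be, or not to be,--that is the question:--";
        // Three identical inputs, as on the slide.
        System.out.println(count(Arrays.asList(line, line, line), 2));
    }
}
```

Because a word never appears in two partitions, the union of the per-instance maps is the final result and no further merge step is needed.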
33. Data
• Tuples with primitive types
DataSet<Tuple2<String, Integer>> wordCounts = env.fromElements(
    new Tuple2<String, Integer>("hello", 1),
    new Tuple2<String, Integer>("world", 2));
• POJO (constructor + get/set)
public class WordWithCount {
    public String word;
    public int count;
    public WordWithCount() {}
    public WordWithCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}
• Hadoop org.apache.hadoop.io.Writable interface
34. Data sources: file based
// read a text file from the local file system
DataSet<String> localLines =
    env.readTextFile("file:///path/to/my/textfile");
// read a text file from an HDFS running at nnHost:nnPort
DataSet<String> hdfsLines =
    env.readTextFile("hdfs://nnHost:nnPort/path/to/my/textfile");
// read a CSV file with three fields
DataSet<Tuple3<Integer, String, Double>> csvInput =
    env.readCsvFile("hdfs:///the/CSV/file")
       .types(Integer.class, String.class, Double.class);
// create a set from some given elements
DataSet<String> value = env.fromElements("Foo", "bar", "foobar", "fubar");
35. Data sources
// read data from a relational database using the JDBC input format
DataSet<Tuple2<String, Integer>> dbData =
    env.createInput(
        // create and configure the input format
        JDBCInputFormat.buildJDBCInputFormat()
            .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
            .setDBUrl("jdbc:derby:memory:persons")
            .setQuery("select name, age from persons")
            .finish(),
        // specify type information for the DataSet
        new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, INT_TYPE_INFO));
36. Data Sinks
// text data
DataSet<String> textData = // [...]
// write DataSet to a file on the local file system
textData.writeAsText("file:///my/result/on/localFS");
// write DataSet to a file on an HDFS with a namenode running at nnHost:nnPort
textData.writeAsText("hdfs://nnHost:nnPort/my/result/on/localFS");
// write DataSet to a file and overwrite the file if it exists
textData.writeAsText("file:///my/result/on/localFS", WriteMode.OVERWRITE);
// tuples as lines with pipe as the separator "a|b|c"
DataSet<Tuple3<String, Integer, Double>> values = // [...]
values.writeAsCsv("file:///path/to/the/result/file", "\n", "|");
37. Variable and storage
DataSet<Tuple...> large = env.readCsvFile(...);
DataSet<Tuple...> medium = env.readCsvFile(...);
DataSet<Tuple...> small = env.readCsvFile(...);
DataSet<Tuple...> largeAndMedium = large.join(medium)
    .where(3).equalTo(1)
    .with(new JoinFunction() { ... });
DataSet<Tuple...> largeMediumAndSmall = small.join(largeAndMedium)
    .where(0).equalTo(2)
    .with(new JoinFunction() { ... });
DataSet<Tuple...> result = largeMediumAndSmall.groupBy(3).aggregate(MAX, 2);
DataSet<Tuple...> otherResult = largeAndMedium.groupBy(3).aggregate(MAX, 2);
DataSet<Tuple...> oneMoreResult = large.groupBy(3).aggregate(MAX, 2);
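A join such as `where(3).equalTo(1)` matches field 3 of the left input with field 1 of the right input. The following plain-Java sketch shows one common way such an equi-join can be evaluated, a hash join that indexes one side by its key. It is an illustration only: `HashJoinSketch` is a hypothetical name, rows are modeled as `String[]`, and nothing here claims to be how Flink chooses or implements its join strategy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashJoinSketch {
    /** Equi-join: pairs every left row with every right row where left[leftKey] equals right[rightKey]. */
    static List<String[]> join(List<String[]> left, int leftKey,
                               List<String[]> right, int rightKey) {
        // Build phase: index the right input by its join key.
        Map<String, List<String[]>> index = new HashMap<>();
        for (String[] row : right) {
            index.computeIfAbsent(row[rightKey], k -> new ArrayList<>()).add(row);
        }
        // Probe phase: look up each left row and concatenate the matching rows.
        List<String[]> out = new ArrayList<>();
        for (String[] l : left) {
            List<String[]> matches = index.getOrDefault(l[leftKey], Collections.<String[]>emptyList());
            for (String[] r : matches) {
                String[] joined = new String[l.length + r.length];
                System.arraycopy(l, 0, joined, 0, l.length);
                System.arraycopy(r, 0, joined, l.length, r.length);
                out.add(joined);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> left = new ArrayList<>();
        left.add(new String[] {"a", "1"});
        left.add(new String[] {"b", "2"});
        List<String[]> right = new ArrayList<>();
        right.add(new String[] {"1", "x"});
        for (String[] row : join(left, 1, right, 0)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Indexing the smaller input keeps the hash table small, which is one reason the order in which `large`, `medium` and `small` are joined matters.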
48. We have resources, let's optimize it!
Code -> Flink Job Manager -> Execution Plan
Data -> Result (four parallel instances)
49. Distributed Runtime
• Master (Job Manager) handles job submission, scheduling, and metadata
• Workers (Task Managers) execute operations
• Data can be streamed between nodes
• All operators start in-memory and gradually go out-of-core
50. How the magic happens
- Flink Runtime
- Flink Optimizer
51. Flink Optimizer
• The optimizer is the component that selects an execution plan for a Common API program
• Think of an AI system manipulating your program for you
• But don't be scared, it works
  - Relational databases have been doing this for decades; Flink ports the technology to API-based systems
53. Some fancy stuff to help the optimizer
• Forwarded fields
@ForwardedFields("f0->f2")
public class MyMap implements MapFunction<Tuple2<…>, Tuple3<…>> {
    @Override
    public Tuple3<…> map(Tuple2<…> val) {
        return new Tuple3<…>("foo", val.f1 / 2, val.f0);
    }
}
54. Some fancy stuff to help the optimizer
• Partitioning
• Partitioning controls how the individual data points of a stream are distributed and ordered among the parallel instances of the transformation operators. Flink Streaming supports several partitioning types, for example:
  - Forward (default): directs the output data to the next operator on the same machine (if possible), avoiding expensive network I/O
  - Shuffle: randomly partitions the output data stream to the next operator using a uniform distribution
  - Rebalance: directs the output data stream to the next operator in a round-robin fashion
  - Broadcast: sends the output data stream to all parallel instances of the next operator. Usage: dataStream.broadcast()
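The four partitioning types differ only in which downstream instance(s) receive each record. The plain-Java sketch below expresses each strategy as a function returning target instance indices. It is a simplified model, not Flink's partitioner classes: `PartitionerSketch` and its method signatures are hypothetical, and "same machine" for forward is modeled simply as "same index".

```java
import java.util.Random;

class PartitionerSketch {
    private final int parallelism;
    private final Random random;
    private int nextRoundRobin = 0;

    PartitionerSketch(int parallelism, long seed) {
        this.parallelism = parallelism;
        this.random = new Random(seed);
    }

    /** Forward: stay on the instance with the same index as the sender. */
    int[] forward(int senderIndex) {
        return new int[] {senderIndex};
    }

    /** Shuffle: pick one target uniformly at random. */
    int[] shuffle() {
        return new int[] {random.nextInt(parallelism)};
    }

    /** Rebalance: cycle through the targets in round-robin order. */
    int[] rebalance() {
        int target = nextRoundRobin;
        nextRoundRobin = (nextRoundRobin + 1) % parallelism;
        return new int[] {target};
    }

    /** Broadcast: send to every parallel instance. */
    int[] broadcast() {
        int[] all = new int[parallelism];
        for (int i = 0; i < parallelism; i++) {
            all[i] = i;
        }
        return all;
    }

    public static void main(String[] args) {
        PartitionerSketch p = new PartitionerSketch(3, 42L);
        for (int i = 0; i < 4; i++) {
            System.out.println("rebalance -> " + p.rebalance()[0]);
        }
        System.out.println("broadcast targets: " + p.broadcast().length);
    }
}
```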
60. The growing Flink stack
APIs: Scala API, Java API, Python API (upcoming), Graph API, Apache MRQL
Common API
Flink Optimizer / Flink Stream Builder
Runtime: Flink Local Runtime, Apache Tez
Environments: Embedded environment (Java collections), Local environment (for debugging), Remote environment (regular cluster execution)
Deployment: single node execution, standalone or YARN cluster
Data storage: HDFS, Files, S3, JDBC, Redis, RabbitMQ, Kafka, Azure tables, ...
62. Flink Roadmap
• Currently being discussed by the Flink community
• Flink has a major release every 3 months, and one or more bug-fixing releases between major releases
• Caveat: rough roadmap, depends on volunteer work, outcome of community discussion, and Apache open source processes
63. Roadmap for 2015 (highlights)
Items in each row are listed in order from Q1 to Q3.
APIs: logical query integration, additional operators, interactive programs, interactive Scala shell, SQL-on-Flink
Optimizer: semantic annotations, HCatalog integration, optimizer hints
Runtime: dual engine (blocking & pipelining), fine-grained fault tolerance, dynamic memory allocation
Streaming: better memory management, more operators in API, at-least-once processing guarantees, unify batch and streaming, exactly-once processing guarantees
ML library: first version, additional algorithms, Mahout integration
Graph library: first version
Integration: Tez, Samoa, Mahout
64. Integration with other projects
• Machine Learning
  - Samoa (incubating): distributed streaming machine learning (ML) framework
• Apache Tez (runs complex directed acyclic graphs of tasks for processing data; simplifies Pig and Hive task definition)
• Storage
  - Tachyon (a memory-centric distributed storage system)
• Mahout (data analytics)
  - H2O (distributed scalable machine learning system)
• Apache Hive (high-level language for data processing)
  - Expected Q3/Q4 2015
• Apache Zeppelin (incubating): a web-based notebook that enables interactive data analytics
65. And many more...
• Runtime: even better performance and robustness
• Using off-heap memory, dynamic memory allocation
• Improvements to the Flink optimizer
• Integration with HCatalog, better statistics
• Runtime optimization
• Streaming graph and ML pipeline libraries
67. Flink in one slide
• Flink is optimized for cyclic or iterative processes by using iterative transformations on collections.
• Flink streaming processes data streams as true streams, i.e., data elements are immediately "pipelined" through a streaming program as soon as they arrive. This enables flexible window operations on streams.
• Built-in optimizer