SlideShare uma empresa Scribd logo
1 de 23
Baixar para ler offline
Apache Flink 
Fast and reliable big data processing 
Till Rohrmann 
trohrmann@apache.org
What is Apache Flink? 
• Project undergoing incubation in the Apache 
Software Foundation 
• Originating from the Stratosphere research project 
started at TU Berlin in 2009 
• http://flink.incubator.apache.org 
• 59 contributors (doubled in ~4 months) 
• Has awesome squirrel logo
What is Apache Flink? 
Flink Client Master' 
Worker' 
Worker'
Current state 
• Fast - much faster than Hadoop, faster than Spark 
in many cases 
• Reliable - does not suffer from memory problems
Outline of this talk 
• Introduction to Flink 
• Distributed PageRank with Flink 
• Other Flink features 
• Flink roadmap and closing
Where to locate Flink in the 
Open Source landscape? 
Crunch" 
5" 5 
Hive" 
Mahout" 
MapReduce" 
Flink" 
Spark" Storm" 
Yarn" Mesos" 
HDFS" 
Cascading" 
Tez" 
Pig" 
Applica3ons$ 
Data$processing$ 
engines$ 
App$and$resource$ 
management$ 
Storage,$streams$ HBase" KaAa" 
…"
Distributed data sets 
DataSet 
A 
DataSet 
B 
DataSet 
C 
A (1) 
A (2) 
B (1) 
B (2) 
C (1) 
C (2) 
X 
X 
Y 
Y 
Program 
Parallel Execution 
X Y 
Operator X Operator Y
Log Analysis 
LogFile 
Filter 
Users Join 
Result 
1 ExecutionEnvironment env = 
ExecutionEnvironment.getExecutionEnvironment(); 
2 
3 DataSet<Tuple2<Integer, String>> log = env.readCsvFile(logInput) 
.types(Integer.class, String.class); 
4 DataSet<Tuple2<String, Integer>> users = env.readCsvFile(userInput) 
.types(String.class, Integer.class); 
5 
6 DataSet<String> usersWhoDownloadedFlink = log 
7 .filter( 
8 (msg) -> msg.f1.contains(“flink.jar”) 
9 ) 
10 .join(users).where(0).equalTo(1) 
11 .with( 
12 (msg,user,Collector<String> out) -> { 
14 out.collect(user.f0); 
15 } 
16 ); 
17 
18 usersWhoDownloadedFlink.print(); 
19 
20 env.execute(“Log filter example”);
PageRank 
• Algorithm which made Google 
a multi billion dollar business 
• Ranking of search results 
• Model: Random surfer 
• Follows links 
• Randomly selects arbitrary 
website
How can we solve the 
problem? 
PageRankDS = { 
(1, 0.3) 
(2, 0.5) 
(3, 0.2) 
} 
AdjacencyDS = { 
(1, [2, 3]) 
(2, [1]) 
(3, [1, 2]) 
}
PageRank{ 
node: Int 
rank: Double 
} 
Adjacency{ 
node: Int 
neighbours: List[Int] 
}
case class Pagerank(node: Int, rank: Double) 
case class Adjacency(node: Int, neighbours: Array[Int]) 
! 
def main(args: Array[String]): Unit = { 
val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment 
! 
val initialPagerank:DataSet[Pagerank] = createInitialPagerank(numVertices, env) 
val adjacency: DataSet[Adjacency] = createRandomAdjacency(numVertices, sparsity, env) 
! 
val solution = initialPagerank.iterate(100){ 
pagerank => 
val partialPagerank = pagerank.join(adjacency). 
where(“node”). 
equalTo(“node”). 
flatMap{ 
// generating the partial pageranks 
pair => { 
val (Pagerank(node, rank), Adjacency(_, neighbours)) = pair 
val length = neighbours.length 
neighbours.map{ 
neighbour=> 
Pagerank(neighbour, dampingFactor*rank/length) 
} :+ Pagerank(node, (1-dampingFactor)/numVertices) 
} 
} 
! 
// adding the partial pageranks up 
partialPagerank. 
groupBy(“node”). 
reduce{ 
(left, right) => Pagerank(left.node, left.rank + right.rank) 
} 
! 
} 
solution.print() 
env.execute("Flink pagerank.") 
}
Common%API% 
Storage% 
Streams% 
Flink%Op;mizer% 
Hybrid%Batch/Streaming%Run;me% 
Files%! HDFS%! S3! 
Cluster% 
Manager%! Na;ve YARN%! EC2%! ! 
Scala%API% 
(batch)% 
Graph%API% 
(„Spargel“)% 
JDBC! Rabbit% Redis%! 
Azure! KaRa! MQ! …% 
Java% 
Collec;ons% 
Streams%Builder% 
Apache%Tez% 
Python%API% 
Java%API% 
(streaming)% 
Apache% 
MRQL% 
Batch! 
Streaming! 
Java%API% 
(batch)% 
Local% 
Execu;on%
Memory management 
• Flink manages its own memory on the heap 
• Caching and data processing happens in managed memory 
• Allows graceful spilling, never out of memory exceptions 
JVM$Heap) 
Unmanaged& 
Heap& 
Flink&Managed& 
Heap& 
Network&Buffers& 
User Code 
Flink Runtime 
Shuffles/Broadcasts
Hadoop compatibility 
• Flink supports out of the box 
• Hadoop data types (Writables) 
• Hadoop input/output formats 
• Hadoop functions and object model 
Input& Map& Reduce& Output& 
S DataSet Red DataSet Join DataSet 
Output& 
DataSet Map DataSet 
Input&
Flink Streaming 
Word count with Java API 
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); 
! 
DataSet<Tuple2<String, Integer>> result = env 
.readTextFile(input) 
.flatMap(sentence -> asList(sentence.split(“ “))) 
.map(word -> new Tuple2<>(word, 1)) 
.groupBy(0) 
.aggregate(SUM, 1); 
Word count with Flink Streaming 
StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment(); 
! 
DataStream<Tuple2<String, Integer>> result = env 
.readTextFile(input) 
.flatMap(sentence -> asList(sentence.split(“ “))) 
.map(word -> new Tuple2<>(word, 1)) 
.groupBy(0) 
.sum(1);
Streaming throughput
Write once, run everywhere! 
Cluster((Batch)( 
Local( 
Debugging( 
Cluster((Streaming)( 
Flink&Run)me&or&Apache&Tez& 
As(Java(Collec;on( 
Programs( 
Embedded( 
(e.g.,(Web(Container)(
Write once, run with any 
data! 
Execution$ Reusing%partition/sort, 
Run$on$a$sample$ 
on$the$laptop. 
Hash%vs.%Sort, 
Partition%vs.%Broadcast, 
Caching, 
Run$a$month$later$ 
after$the$data$evolved$. 
Plan$A. 
Execution$ 
Plan$B. 
Run$on$large$files$ 
on$the$cluster. 
Execution$ 
Plan$C.
Little tuning required 
• Requires no memory thresholds to configure 
• Requires no complicated network configs 
• Requires no serializers to be configured 
• Programs adjust to data automatically
Flink roadmap 
• Flink has a major release every 3 months 
• Finer grained fault-tolerance 
• Logical (SQL-like) field addressing 
• Python API 
• Flink Streaming, Lambda architecture 
support 
• Flink on Tez 
• ML on Flink (Mahout DSL) 
• Graph DSL on Flink 
• … and much more
http://flink.incubator.apache.org 
! 
github.com/apache/incubator-flink 
! 
@ApacheFlink

Mais conteúdo relacionado

Mais procurados

Introduction To Flink
Introduction To FlinkIntroduction To Flink
Introduction To FlinkKnoldus Inc.
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsSlim Baltagi
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudNoritaka Sekiyama
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured StreamingKnoldus Inc.
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path ForwardAlluxio, Inc.
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Spark streaming
Spark streamingSpark streaming
Spark streamingWhiteklay
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Databricks
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekVenkata Naga Ravi
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Databricks
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangDatabricks
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDatabricks
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapKostas Tzoumas
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark FundamentalsZahra Eskandari
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaDatabricks
 

Mais procurados (20)

Introduction To Flink
Introduction To FlinkIntroduction To Flink
Introduction To Flink
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the CloudAmazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
Apache flink
Apache flinkApache flink
Apache flink
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Processing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeekProcessing Large Data with Apache Spark -- HasGeek
Processing Large Data with Apache Spark -- HasGeek
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang WangApache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
Apache Spark Data Source V2 with Wenchen Fan and Gengliang Wang
 
Deep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache SparkDeep Dive: Memory Management in Apache Spark
Deep Dive: Memory Management in Apache Spark
 
Apache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmapApache Flink: API, runtime, and project roadmap
Apache Flink: API, runtime, and project roadmap
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
 

Destaque

Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingFlink Forward
 
Aljoscha Krettek – Notions of Time
Aljoscha Krettek – Notions of TimeAljoscha Krettek – Notions of Time
Aljoscha Krettek – Notions of TimeFlink Forward
 
Flink Apachecon Presentation
Flink Apachecon PresentationFlink Apachecon Presentation
Flink Apachecon PresentationGyula Fóra
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemFlink Forward
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Flink Forward
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFlink Forward
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityFabian Hueske
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkFlink Forward
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkFlink Forward
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleFlink Forward
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeFlink Forward
 
Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFlink Forward
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinFlink Forward
 
Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming Flink Forward
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Flink Forward
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionFlink Forward
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsFlink Forward
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Stephan Ewen
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaFlink Forward
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLFlink Forward
 

Destaque (20)

Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream Processing
 
Aljoscha Krettek – Notions of Time
Aljoscha Krettek – Notions of TimeAljoscha Krettek – Notions of Time
Aljoscha Krettek – Notions of Time
 
Flink Apachecon Presentation
Flink Apachecon PresentationFlink Apachecon Presentation
Flink Apachecon Presentation
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
 
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
Vyacheslav Zholudev – Flink, a Convenient Abstraction Layer for Yarn?
 
Fabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on FlinkFabian Hueske – Cascading on Flink
Fabian Hueske – Cascading on Flink
 
Apache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce CompatibilityApache Flink - Hadoop MapReduce Compatibility
Apache Flink - Hadoop MapReduce Compatibility
 
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
 
Assaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at ScaleAssaf Araki – Real Time Analytics at Scale
Assaf Araki – Real Time Analytics at Scale
 
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-ComposeSimon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
Simon Laws – Apache Flink Cluster Deployment on Docker and Docker-Compose
 
Fabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and BytesFabian Hueske – Juggling with Bits and Bytes
Fabian Hueske – Juggling with Bits and Bytes
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming Mikio Braun – Data flow vs. procedural programming
Mikio Braun – Data flow vs. procedural programming
 
Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced Apache Flink Training: DataStream API Part 2 Advanced
Apache Flink Training: DataStream API Part 2 Advanced
 
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data CompanionS. Bartoli & F. Pompermaier – A Semantic Big Data Companion
S. Bartoli & F. Pompermaier – A Semantic Big Data Companion
 
Kamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed ExperimentsKamal Hakimzadeh – Reproducible Distributed Experiments
Kamal Hakimzadeh – Reproducible Distributed Experiments
 
Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)Flink 0.10 @ Bay Area Meetup (October 2015)
Flink 0.10 @ Bay Area Meetup (October 2015)
 
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & KafkaMohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
Mohamed Amine Abdessemed – Real-time Data Integration with Apache Flink & Kafka
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
 

Semelhante a Introduction to Apache Flink - Fast and reliable big data processing

Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)Uwe Printz
 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Uwe Printz
 
Introduction to the hadoop ecosystem by Uwe Seiler
Introduction to the hadoop ecosystem by Uwe SeilerIntroduction to the hadoop ecosystem by Uwe Seiler
Introduction to the hadoop ecosystem by Uwe SeilerCodemotion
 
OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"Giivee The
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsDataStax Academy
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesCorley S.r.l.
 
Hadoop with Python
Hadoop with PythonHadoop with Python
Hadoop with PythonDonald Miner
 
Hadoop trainingin bangalore
Hadoop trainingin bangaloreHadoop trainingin bangalore
Hadoop trainingin bangaloreappaji intelhunt
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasMapR Technologies
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataRahul Jain
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedwhoschek
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupNed Shawa
 
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyRohit Kulkarni
 
Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8Speedment, Inc.
 

Semelhante a Introduction to Apache Flink - Fast and reliable big data processing (20)

Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)Introduction to the Hadoop Ecosystem (SEACON Edition)
Introduction to the Hadoop Ecosystem (SEACON Edition)
 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)
 
Introduction to the hadoop ecosystem by Uwe Seiler
Introduction to the hadoop ecosystem by Uwe SeilerIntroduction to the hadoop ecosystem by Uwe Seiler
Introduction to the hadoop ecosystem by Uwe Seiler
 
OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"OCF.tw's talk about "Introduction to spark"
OCF.tw's talk about "Introduction to spark"
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
 
20170126 big data processing
20170126 big data processing20170126 big data processing
20170126 big data processing
 
Big data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting LanguagesBig data, just an introduction to Hadoop and Scripting Languages
Big data, just an introduction to Hadoop and Scripting Languages
 
Hadoop with Python
Hadoop with PythonHadoop with Python
Hadoop with Python
 
Hadoop trainingin bangalore
Hadoop trainingin bangaloreHadoop trainingin bangalore
Hadoop trainingin bangalore
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetup
 
Java days gbg online
Java days gbg onlineJava days gbg online
Java days gbg online
 
Apache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real Time
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Ingesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmedIngesting hdfs intosolrusingsparktrimmed
Ingesting hdfs intosolrusingsparktrimmed
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, GuindyScaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
Scaling up with hadoop and banyan at ITRIX-2015, College of Engineering, Guindy
 
Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8Speedment - Reactive programming for Java8
Speedment - Reactive programming for Java8
 

Mais de Till Rohrmann

Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...Till Rohrmann
 
Apache flink 1.7 and Beyond
Apache flink 1.7 and BeyondApache flink 1.7 and Beyond
Apache flink 1.7 and BeyondTill Rohrmann
 
Elastic Streams at Scale @ Flink Forward 2018 Berlin
Elastic Streams at Scale @ Flink Forward 2018 BerlinElastic Streams at Scale @ Flink Forward 2018 Berlin
Elastic Streams at Scale @ Flink Forward 2018 BerlinTill Rohrmann
 
Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkTill Rohrmann
 
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Till Rohrmann
 
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup BerlinApache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup BerlinTill Rohrmann
 
Apache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OSApache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OSTill Rohrmann
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4Till Rohrmann
 
Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017Till Rohrmann
 
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017Till Rohrmann
 
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...Till Rohrmann
 
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Till Rohrmann
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Till Rohrmann
 
Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Till Rohrmann
 
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Till Rohrmann
 
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Till Rohrmann
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinTill Rohrmann
 
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015Till Rohrmann
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupTill Rohrmann
 

Mais de Till Rohrmann (19)

Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
Future of Apache Flink Deployments: Containers, Kubernetes and More - Flink F...
 
Apache flink 1.7 and Beyond
Apache flink 1.7 and BeyondApache flink 1.7 and Beyond
Apache flink 1.7 and Beyond
 
Elastic Streams at Scale @ Flink Forward 2018 Berlin
Elastic Streams at Scale @ Flink Forward 2018 BerlinElastic Streams at Scale @ Flink Forward 2018 Berlin
Elastic Streams at Scale @ Flink Forward 2018 Berlin
 
Scaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache FlinkScaling stream data pipelines with Pravega and Apache Flink
Scaling stream data pipelines with Pravega and Apache Flink
 
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
 
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup BerlinApache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
Apache Flink Meets Apache Mesos And DC/OS @ Mesos Meetup Berlin
 
Apache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OSApache Flink® Meets Apache Mesos® and DC/OS
Apache Flink® Meets Apache Mesos® and DC/OS
 
From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4From Apache Flink® 1.3 to 1.4
From Apache Flink® 1.3 to 1.4
 
Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017Apache Flink and More @ MesosCon Asia 2017
Apache Flink and More @ MesosCon Asia 2017
 
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
Redesigning Apache Flink's Distributed Architecture @ Flink Forward 2017
 
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Sys...
 
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
Dynamic Scaling: How Apache Flink Adapts to Changing Workloads (at FlinkForwa...
 
Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?Streaming Analytics & CEP - Two sides of the same coin?
Streaming Analytics & CEP - Two sides of the same coin?
 
Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016Apache Flink: Streaming Done Right @ FOSDEM 2016
Apache Flink: Streaming Done Right @ FOSDEM 2016
 
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
Streaming Data Flow with Apache Flink @ Paris Flink Meetup 2015
 
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
Fault Tolerance and Job Recovery in Apache Flink @ FlinkForward 2015
 
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in BerlinInteractive Data Analysis with Apache Flink @ Flink Meetup in Berlin
Interactive Data Analysis with Apache Flink @ Flink Meetup in Berlin
 
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
Computing recommendations at extreme scale with Apache Flink @Buzzwords 2015
 
Machine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning GroupMachine Learning with Apache Flink at Stockholm Machine Learning Group
Machine Learning with Apache Flink at Stockholm Machine Learning Group
 

Último

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Intelisync
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 

Último (20)

Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)Introduction to Decentralized Applications (dApps)
Introduction to Decentralized Applications (dApps)
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 

Introduction to Apache Flink - Fast and reliable big data processing

  • 1. Apache Flink Fast and reliable big data processing Till Rohrmann trohrmann@apache.org
  • 2. What is Apache Flink? • Project undergoing incubation in the Apache Software Foundation • Originating from the Stratosphere research project started at TU Berlin in 2009 • http://flink.incubator.apache.org • 59 contributors (doubled in ~4 months) • Has awesome squirrel logo
  • 3. What is Apache Flink? Flink Client Master' Worker' Worker'
  • 4. Current state • Fast - much faster than Hadoop, faster than Spark in many cases • Reliable - does not suffer from memory problems
  • 5. Outline of this talk • Introduction to Flink • Distributed PageRank with Flink • Other Flink features • Flink roadmap and closing
  • 6. Where to locate Flink in the Open Source landscape? Crunch" 5" 5 Hive" Mahout" MapReduce" Flink" Spark" Storm" Yarn" Mesos" HDFS" Cascading" Tez" Pig" Applica3ons$ Data$processing$ engines$ App$and$resource$ management$ Storage,$streams$ HBase" KaAa" …"
  • 7. Distributed data sets DataSet A DataSet B DataSet C A (1) A (2) B (1) B (2) C (1) C (2) X X Y Y Program Parallel Execution X Y Operator X Operator Y
  • 8. Log Analysis LogFile Filter Users Join Result 1 ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); 2 3 DataSet<Tuple2<Integer, String>> log = env.readCsvFile(logInput) .types(Integer.class, String.class); 4 DataSet<Tuple2<String, Integer>> users = env.readCsvFile(userInput) .types(String.class, Integer.class); 5 6 DataSet<String> usersWhoDownloadedFlink = log 7 .filter( 8 (msg) -> msg.f1.contains(“flink.jar”) 9 ) 10 .join(users).where(0).equalTo(1) 11 .with( 12 (msg,user,Collector<String> out) -> { 14 out.collect(user.f0); 15 } 16 ); 17 18 usersWhoDownloadedFlink.print(); 19 20 env.execute(“Log filter example”);
  • 9. PageRank • Algorithm which made Google a multi billion dollar business • Ranking of search results • Model: Random surfer • Follows links • Randomly selects arbitrary website
  • 10. How can we solve the problem? PageRankDS = { (1, 0.3) (2, 0.5) (3, 0.2) } AdjacencyDS = { (1, [2, 3]) (2, [1]) (3, [1, 2]) }
  • 11. PageRank{ node: Int rank: Double } Adjacency{ node: Int neighbours: List[Int] }
  • 12.
  • 13. case class Pagerank(node: Int, rank: Double) case class Adjacency(node: Int, neighbours: Array[Int]) ! def main(args: Array[String]): Unit = { val env: ExecutionEnvironment = ExecutionEnvironment.getExecutionEnvironment ! val initialPagerank:DataSet[Pagerank] = createInitialPagerank(numVertices, env) val adjacency: DataSet[Adjacency] = createRandomAdjacency(numVertices, sparsity, env) ! val solution = initialPagerank.iterate(100){ pagerank => val partialPagerank = pagerank.join(adjacency). where(“node”). equalTo(“node”). flatMap{ // generating the partial pageranks pair => { val (Pagerank(node, rank), Adjacency(_, neighbours)) = pair val length = neighbours.length neighbours.map{ neighbour=> Pagerank(neighbour, dampingFactor*rank/length) } :+ Pagerank(node, (1-dampingFactor)/numVertices) } } ! // adding the partial pageranks up partialPagerank. groupBy(“node”). reduce{ (left, right) => Pagerank(left.node, left.rank + right.rank) } ! } solution.print() env.execute("Flink pagerank.") }
  • 14. Common%API% Storage% Streams% Flink%Op;mizer% Hybrid%Batch/Streaming%Run;me% Files%! HDFS%! S3! Cluster% Manager%! Na;ve YARN%! EC2%! ! Scala%API% (batch)% Graph%API% („Spargel“)% JDBC! Rabbit% Redis%! Azure! KaRa! MQ! …% Java% Collec;ons% Streams%Builder% Apache%Tez% Python%API% Java%API% (streaming)% Apache% MRQL% Batch! Streaming! Java%API% (batch)% Local% Execu;on%
  • 15. Memory management • Flink manages its own memory on the heap • Caching and data processing happens in managed memory • Allows graceful spilling, never out of memory exceptions JVM$Heap) Unmanaged& Heap& Flink&Managed& Heap& Network&Buffers& User Code Flink Runtime Shuffles/Broadcasts
  • 16. Hadoop compatibility • Flink supports out of the box • Hadoop data types (Writables) • Hadoop input/output formats • Hadoop functions and object model Input& Map& Reduce& Output& S DataSet Red DataSet Join DataSet Output& DataSet Map DataSet Input&
  • 17. Flink Streaming Word count with Java API ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); ! DataSet<Tuple2<String, Integer>> result = env .readTextFile(input) .flatMap(sentence -> asList(sentence.split(“ “))) .map(word -> new Tuple2<>(word, 1)) .groupBy(0) .aggregate(SUM, 1); Word count with Flink Streaming StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); ! DataStream<Tuple2<String, Integer>> result = env .readTextFile(input) .flatMap(sentence -> asList(sentence.split(“ “))) .map(word -> new Tuple2<>(word, 1)) .groupBy(0) .sum(1);
  • 19. Write once, run everywhere! Cluster((Batch)( Local( Debugging( Cluster((Streaming)( Flink&Run)me&or&Apache&Tez& As(Java(Collec;on( Programs( Embedded( (e.g.,(Web(Container)(
  • 20. Write once, run with any data! Execution$ Reusing%partition/sort, Run$on$a$sample$ on$the$laptop. Hash%vs.%Sort, Partition%vs.%Broadcast, Caching, Run$a$month$later$ after$the$data$evolved$. Plan$A. Execution$ Plan$B. Run$on$large$files$ on$the$cluster. Execution$ Plan$C.
  • 21. Little tuning required • Requires no memory thresholds to configure • Requires no complicated network configs • Requires no serializers to be configured • Programs adjust to data automatically
  • 22. Flink roadmap • Flink has a major release every 3 months • Finer grained fault-tolerance • Logical (SQL-like) field addressing • Python API • Flink Streaming, Lambda architecture support • Flink on Tez • ML on Flink (Mahout DSL) • Graph DSL on Flink • … and much more