Stream processing systems comparison
Yangjun Wang
Department of Information and Computer Science
Aalto University, School of Science
yangjun.wang@aalto.fi
January 20, 2016
Introduction
The processing model of many big data applications is shifting from batch processing to stream processing
Batch processing has the advantage in throughput, while stream processing has much lower latency
Stream processing can also achieve very high throughput
Widely used stream processing systems: Storm, Spark Streaming, Flink, Samza
Comparison
Processing model
Storm and Flink are true stream processors: they process records one at a time
Spark Streaming is micro-batch: it continuously processes very small batches
Storm's Trident also provides a micro-batch API
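The difference between the two models can be sketched in plain Python, without any of the frameworks. This is only an illustration: the function names `process_record_by_record` and `process_micro_batches` and the `batch_size` parameter are invented for the example and are not part of Storm, Flink, or Spark. Real micro-batch engines cut batches by time interval; a fixed record count is assumed here to keep the sketch deterministic.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def process_record_by_record(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Record-at-a-time model: update state and emit a result for every record."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def process_micro_batches(words: Iterable[str], batch_size: int) -> Iterator[Dict[str, int]]:
    """Micro-batch model: buffer records, then process each small batch at once.

    Assumption: batches are cut by record count, not by time interval.
    """
    counts: Counter = Counter()
    batch: List[str] = []
    for word in words:
        batch.append(word)
        if len(batch) == batch_size:
            counts.update(batch)
            yield dict(counts)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        counts.update(batch)
        yield dict(counts)


stream = ["a", "b", "a", "c", "a"]
per_record = list(process_record_by_record(stream))           # one output per record
per_batch = list(process_micro_batches(stream, batch_size=2))  # one output per batch
```

The record-at-a-time version produces a result as soon as each record arrives, while the micro-batch version produces nothing until a batch is complete, which is where the extra latency comes from.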
Throughput of WordCount – skewed data
Flink – 300K/s (4 cores, 15 GB RAM)
Storm (ack enabled) – 5K/s per node
Spark Streaming – 250K/s ∼ 2500K/s (depending on batch)
Comparison (cont.)
Latency of WordCount – skewed data
Flink: around 50 ms (90%)
Storm: around 55 ms (90%)
Spark: 1 s ∼ ... (depends on the batch interval)
Comparison (cont.)
Usage of Spark and Flink
Flink and Spark provide many high-level operations that are easy to chain, for example:
    stream1.flatMap(...)
           .mapToPair(...)
           .reduceByKey(...)
Usage of Storm
In Storm applications, we need to define the stream sources (spouts) and all the processing logic (bolts) ourselves.
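To show what such a chain of high-level operations computes, here is a plain-Python sketch of the same three steps applied to WordCount. The helpers `flat_map`, `map_to_pair`, and `reduce_by_key` are hypothetical stand-ins written for this example; they only mimic the semantics of the operators and are not the Spark or Flink API.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def flat_map(records: Iterable[str], fn: Callable[[str], Iterable[str]]) -> List[str]:
    """Apply fn to every record and flatten the results into one list."""
    return [out for record in records for out in fn(record)]


def map_to_pair(records: Iterable[str]) -> List[Tuple[str, int]]:
    """Turn each word into a (word, 1) pair."""
    return [(record, 1) for record in records]


def reduce_by_key(pairs: Iterable[Tuple[str, int]],
                  fn: Callable[[int, int], int]) -> Dict[str, int]:
    """Combine all values that share a key using fn."""
    result: Dict[str, int] = {}
    for key, value in pairs:
        result[key] = fn(result[key], value) if key in result else value
    return result


lines = ["to be or", "not to be"]
words = flat_map(lines, str.split)                 # split lines into words
pairs = map_to_pair(words)                         # (word, 1) pairs
counts = reduce_by_key(pairs, lambda a, b: a + b)  # sum the 1s per word
```

In Storm, each of these steps would typically be written by hand as a bolt, with a spout feeding the lines in.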
Comparison (cont.)
Usage
Storm applications require more work, but offer more flexibility.
Flink provides low-level operators similar to Storm bolts, such as OneInputStreamOperator and TwoInputStreamOperator. These operators are not too complex to use.
Spark Streaming's low-level operators are somewhat hard to use.
Spark Streaming can also lose some capabilities because of its micro-batch processing model.
Example
Problem
There are two streams: advertisement(advId, shownTime) and click(advId, clickTime). How can we get a stream of all clicked advertisements (advId, shownTime, clickTime) that were clicked within 10 minutes of being shown?
Solution with Storm
Implement a bolt that receives records from two spouts, caches them, and performs the join
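The logic of such a bolt might look roughly like the following plain-Python sketch. It does not use the Storm API: the class `ClickJoinBolt`, its method names, and the assumption that timestamps are integer seconds are all made up for illustration. A real bolt would also need to ack tuples and expire cached clicks, which is omitted here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

WINDOW_SECONDS = 10 * 60  # a click counts if it happens within 10 minutes of the showing

Joined = Tuple[str, int, int]  # (advId, shownTime, clickTime)


class ClickJoinBolt:
    """Caches records from both streams and joins them on advId."""

    def __init__(self) -> None:
        self.shown: Dict[str, List[int]] = defaultdict(list)
        self.clicks: Dict[str, List[int]] = defaultdict(list)

    @staticmethod
    def _matches(shown_time: int, click_time: int) -> bool:
        return 0 <= click_time - shown_time <= WINDOW_SECONDS

    def on_advertisement(self, adv_id: str, shown_time: int) -> List[Joined]:
        """Cache the advertisement and join it with clicks that arrived earlier."""
        self.shown[adv_id].append(shown_time)
        return [(adv_id, shown_time, c)
                for c in self.clicks[adv_id] if self._matches(shown_time, c)]

    def on_click(self, adv_id: str, click_time: int) -> List[Joined]:
        """Cache the click and join it with advertisements that arrived earlier."""
        self.clicks[adv_id].append(click_time)
        return [(adv_id, s, click_time)
                for s in self.shown[adv_id] if self._matches(s, click_time)]

    def expire(self, now: int) -> None:
        """Drop advertisements too old to match any click at or after `now`."""
        for adv_id in list(self.shown):
            self.shown[adv_id] = [s for s in self.shown[adv_id]
                                  if now - s <= WINDOW_SECONDS]
            if not self.shown[adv_id]:
                del self.shown[adv_id]


bolt = ClickJoinBolt()
out: List[Joined] = []
out += bolt.on_advertisement("ad1", 0)
out += bolt.on_click("ad1", 300)         # 5 minutes after showing: joined
out += bolt.on_click("ad1", 900)         # 15 minutes after showing: too late
out += bolt.on_click("ad2", 100)         # click arrives before its advertisement
out += bolt.on_advertisement("ad2", 50)  # joined once the advertisement arrives
```

Because both sides are cached, the join works whichever record of a pair arrives first, and each matching pair is emitted exactly once.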
Example (cont.)
Problems with Flink
1. Flink only provides a join operation on records in the same window
2. Windows without a slide (tumbling windows) miss matches that span a window boundary
3. Windows with a slide (sliding windows) can produce duplicate results
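Problems 2 and 3 can be reproduced with a small simulation of a same-window join. This is a plain-Python sketch, not Flink code: the helpers `window_starts` and `windowed_join` are hypothetical, times are assumed to be integer minutes, and windows are assumed to be aligned to multiples of the slide.

```python
from typing import List, Tuple

Record = Tuple[str, int]  # (advId, event time in minutes)


def window_starts(t: int, size: int, slide: int) -> List[int]:
    """Start times of every window [start, start + size) that contains t."""
    last = (t // slide) * slide  # latest window start that is <= t
    return list(range(last, t - size, -slide))


def windowed_join(ads: List[Record], clicks: List[Record],
                  size: int, slide: int) -> List[Tuple[str, int, int]]:
    """Join ads and clicks that fall into the same window, once per shared window."""
    results = []
    for adv_id, shown in ads:
        for click_id, clicked in clicks:
            if adv_id != click_id or clicked < shown:
                continue
            shared = set(window_starts(shown, size, slide)) & \
                     set(window_starts(clicked, size, slide))
            results.extend((adv_id, shown, clicked) for _ in shared)
    return results


ads = [("ad1", 9), ("ad2", 6)]
clicks = [("ad1", 11), ("ad2", 9)]

# Tumbling 10-minute windows: ad1 (shown at 9) and its click (at 11) fall into
# different windows, so the match is missed even though only 2 minutes passed.
tumbling = windowed_join(ads, clicks, size=10, slide=10)

# Sliding 10-minute windows with a 5-minute slide: ad1 is now found, but ad2 and
# its click share two windows, so that match is emitted twice.
sliding = windowed_join(ads, clicks, size=10, slide=5)
```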
Solution with Flink
Implement a join operator built on TwoInputStreamOperator, similar to WindowOperator.
The self-implemented operator resembles the Storm solution in some respects.
Example (cont.)
Problems with Spark
1. Spark doesn't support event-time joins or watermarks
2. It has the same problems as Flink (2, 3)
Solution with Spark
    advertisement.window(11 mins, 1 min)
                 .join(click.window(1 min, 1 min))
                 .filter(...)
Issues
Spark only supports joins on processing time
The filter operation is based on event time
Matches are missed if advertisement records arrive late (delayed)
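The last issue can be illustrated with a small simulation of the windowed join above. This is a plain-Python sketch under stated assumptions, not Spark code: the function `simulate` is hypothetical, all times are integer minutes, and each record carries both its event time and the time it arrives at the system (its processing time).

```python
from typing import List, Tuple

Record = Tuple[str, int, int]  # (advId, event time, arrival time), all in minutes


def simulate(ads: List[Record], clicks: List[Record],
             ticks: int) -> List[Tuple[str, int, int]]:
    """Every minute, join the last 11 minutes of ads with the last minute of clicks.

    Windows are cut on arrival (processing) time; the filter uses event time.
    """
    results = []
    for now in range(1, ticks + 1):
        ad_window = [a for a in ads if now - 11 <= a[2] < now]
        click_window = [c for c in clicks if now - 1 <= c[2] < now]
        for adv_id, shown, _ in ad_window:
            for click_id, clicked, _ in click_window:
                # Event-time filter: clicked within 10 minutes after being shown.
                if adv_id == click_id and 0 <= clicked - shown <= 10:
                    results.append((adv_id, shown, clicked))
    return results


ads = [
    ("ad1", 0, 0),  # arrives on time
    ("ad2", 0, 5),  # shown at minute 0, but the record arrives 5 minutes late
]
clicks = [
    ("ad1", 3, 3),
    ("ad2", 2, 2),  # arrives before the delayed ad2 record
]
joined = simulate(ads, clicks, ticks=20)
```

The ad2 click is only present in the one-minute click window that closes at minute 3, when the delayed ad2 record has not yet arrived, so that pair is never joined even though the click happened 2 minutes after the showing.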
Summary
Comparison summary table:
              Storm          Spark        Flink
Model         stream         micro-batch  stream
Throughput    low            high         high
Latency       low            high         low
Usage         complex        easy         easy
Flexibility   very flexible  flexible     inflexible
Thanks
