The document evaluates and compares several streaming frameworks, including SQLStream, Pulsar, SPQR, Apache Spark, and Apache Flink. It assesses the frameworks based on usability, functionality, architecture, support, and non-functional requirements. For each framework, it provides information on architectural diagrams, window aggregation examples, and scores the frameworks in various categories. It concludes that Apache Spark and Apache Flink received the highest overall scores based on the evaluation.
17. Evaluation Streaming Frameworks
Alexander Kolb, Otto Group BI, Hamburg, Germany, 2015
Pulsar
Window Aggregation
create context MCContext start @now end after 60 seconds;

context MCContext
insert into ViewAgg select count(*) as views, prid
from PageView group by prid output snapshot when terminated;
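For readers unfamiliar with EPL, the query above can be read as "count page views per prid and emit the counts once per 60-second context". The following Python sketch reproduces that idea on an in-memory list. It is an illustration only, not Pulsar code: the function name, the (timestamp, prid) tuple layout, and the alignment of windows to multiples of 60 seconds (rather than to the moment the context starts) are assumptions made for the example.

```python
from collections import Counter

WINDOW_SECONDS = 60  # mirrors "end after 60 seconds" in the context definition


def count_views_per_window(page_views, window_seconds=WINDOW_SECONDS):
    """Count page views per prid in consecutive fixed-length windows.

    page_views: iterable of (timestamp_seconds, prid) tuples (assumed layout).
    Returns a dict mapping window index -> Counter of views per prid.
    """
    windows = {}
    for timestamp, prid in page_views:
        # Assumption: windows are aligned to multiples of window_seconds.
        index = int(timestamp // window_seconds)
        windows.setdefault(index, Counter())[prid] += 1
    return windows


# Hypothetical sample events: (timestamp in seconds, prid)
events = [(0, "p1"), (10, "p2"), (59, "p1"), (60, "p1"), (125, "p3")]
result = count_views_per_window(events)
for index in sorted(result):
    print(index, dict(result[index]))
```

Each printed line corresponds to one snapshot that the EPL statement would write into ViewAgg when a context partition terminates.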
SPQR
Window Aggregation
select productid, ecid, sum(quantity)
from views.win:time_batch(5 min)
group by productid, ecid
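The time_batch(5 min) window collects events for five minutes and releases them as one batch, which is then grouped and summed. A minimal Python sketch of that behaviour on an ordered event stream is shown below. It is not SPQR code: the function name, the tuple layout, the assumption that events arrive in timestamp order, and the choice to start the first batch at the first event are all made up for illustration.

```python
from collections import defaultdict

BATCH_SECONDS = 5 * 60  # mirrors win:time_batch(5 min)


def time_batch_sums(views, batch_seconds=BATCH_SECONDS):
    """Yield (batch_start, {(productid, ecid): total_quantity}) per batch.

    views: iterable of (timestamp_seconds, productid, ecid, quantity),
    assumed to arrive in timestamp order.
    """
    batch_start = None
    totals = defaultdict(int)
    for timestamp, productid, ecid, quantity in views:
        if batch_start is None:
            batch_start = timestamp  # assumption: first event opens the first batch
        # Close every batch that ended before this event arrived.
        while timestamp >= batch_start + batch_seconds:
            yield batch_start, dict(totals)
            totals = defaultdict(int)
            batch_start += batch_seconds
        totals[(productid, ecid)] += quantity
    # Flush the last, possibly incomplete, batch at end of input.
    if batch_start is not None:
        yield batch_start, dict(totals)


# Hypothetical sample events: (timestamp, productid, ecid, quantity)
views = [(0, "a", "x", 2), (100, "a", "x", 3), (200, "b", "y", 1), (310, "a", "x", 4)]
batches = list(time_batch_sums(views))
for start, sums in batches:
    print(start, sums)
```

Unlike a sliding window, each event contributes to exactly one batch, so the sums of consecutive batches never overlap.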
Summary
Use-case

                                           Framework
Topic                              Unit    Pulsar.io  SQLStream  SPQR      Flink  Spark
Time for building the stream       hours   40         35+ (POC)  8+ (POC)  13     4
Time for adding missing connector  hours   3          8          1         3      0.5
Points                                     3.14       2.06       3.44      4.16   4.45
List of Rating Aspects
DSL/DDL/UI for creating pipelines / Required know-how to define new
pipelines / Project documentation / Workflow / Testing workflows / Hot
deploying / redeploying of pipelines / Dynamic topology changes /
Monitoring / Deployment / Dashboard for data visualization / Ease of
defining UDFs / Merge / Sum / Count / Min/max/avg / Aggregate /
Transform / Parsing (XML/JSON/CSV) / Group-by / Join / Ease of defining new
connectors / Kafka / WebSocket / JDBC / JMS / HDFS / File / Effort for
cluster deployment / Configuration effort / Supports YARN / Supports
Mesos / Scalability / Resilience / Predefined communication framework /
Dependencies / Flexibility / Expandability / Buffering/Pressure handling /
Partitioning/Parallelism / Strategy for Partitioning/Parallelism? / Ordering /
Guarantees / State management / Fault tolerance / Licensing model /
Professional support available / Community activity / License / Maturity /
Manageable code base / Community size