The paper "Efficient Window Aggregation with General Stream Slicing" by Jonas Traub, Philipp M. Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, and Volker Markl was selected as the best paper of the International Conference on Extending Database Technology (EDBT) 2019.
Abstract:
Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, and minimizing memory usage. However, each technique operates under different assumptions with respect to workload characteristics such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. Violating the assumptions of a technique can deem it unusable or drastically reduce its performance.
In this paper, we present the first general stream slicing technique for window aggregation. General stream slicing automatically adapts to workload characteristics to improve performance without sacrificing its general applicability. As a prerequisite, we identify workload characteristics which affect the performance and applicability of aggregation techniques. Our experiments show that general stream slicing outperforms alternative concepts by up to one order of magnitude.
1. Efficient Window Aggregation with General Stream Slicing
Jonas Traub, Philipp M. Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
22nd International Conference on Extending Database Technology
March 26-29, 2019, Lisbon, Portugal
2. Stream Processing Pipelines
27.03.2019 Efficient Window Aggregation with General Stream Slicing 2
A stream processing pipeline is a series of concurrently running operators.
[Figure: stream processing pipeline containing a Window Aggregation operator]
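The idea of a pipeline as a series of concurrently running operators can be illustrated with a small Python sketch. It is only an illustration under assumptions that are not in the slides: each operator runs in its own thread, operators are connected by queues, and the function names (source, window_sum, sink, run_pipeline) are hypothetical.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def source(out_q, values):
    """Operator 1: emit the input values one by one."""
    for v in values:
        out_q.put(v)
    out_q.put(SENTINEL)


def window_sum(in_q, out_q, length):
    """Operator 2: count-based tumbling window, emits the sum of every `length` tuples."""
    total, count = 0, 0
    while True:
        v = in_q.get()
        if v is SENTINEL:
            out_q.put(SENTINEL)  # an incomplete last window is dropped
            return
        total += v
        count += 1
        if count == length:
            out_q.put(total)
            total, count = 0, 0


def sink(in_q, results):
    """Operator 3: collect the window results."""
    while True:
        v = in_q.get()
        if v is SENTINEL:
            return
        results.append(v)


def run_pipeline(values, length):
    q1, q2 = queue.Queue(), queue.Queue()
    results = []
    threads = [
        threading.Thread(target=source, args=(q1, values)),
        threading.Thread(target=window_sum, args=(q1, q2, length)),
        threading.Thread(target=sink, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

All three operators run at the same time; the window aggregation operator is the one the rest of the talk focuses on.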
15. Stream Slicing Example
We store partial aggregates instead of all tuples. Small memory footprint.
17. Stream Slicing Example
We assign each tuple to exactly one slice. O(1) per-tuple complexity.
19. Stream Slicing Example
We require just a few computation steps to calculate final aggregates. Low latency.
21. Stream Slicing Example
We share partial aggregations among all users and queries. Efficiency by preventing redundancy.
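The four properties of stream slicing listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation, and it rests on assumptions that are not in the slides: an in-order stream, time-based windows, a sum aggregate, and fixed-size slices whose length is the greatest common divisor of all window lengths and slides. The class name SlicedSum and its methods are hypothetical.

```python
from functools import reduce
from math import gcd


class SlicedSum:
    """Shares per-slice partial sums among several sliding-window queries."""

    def __init__(self, queries):
        # queries: list of (window_length, slide) pairs in time units
        self.queries = queries
        # Assumption: one fixed slice length that divides every length and slide.
        self.slice_len = reduce(gcd, [x for q in queries for x in q])
        self.partials = {}  # slice index -> partial sum (no tuples are stored)

    def add(self, ts, value):
        """Assign the tuple to exactly one slice: O(1) per tuple."""
        idx = ts // self.slice_len
        self.partials[idx] = self.partials.get(idx, 0) + value

    def window(self, start, length):
        """Final aggregate: combine the few partial sums the window covers."""
        assert start % self.slice_len == 0 and length % self.slice_len == 0
        first = start // self.slice_len
        n = length // self.slice_len
        return sum(self.partials.get(i, 0) for i in range(first, first + n))


# Two queries share the same slices: (length 10, slide 5) and (length 20, slide 10).
agg = SlicedSum([(10, 5), (20, 10)])
for ts, value in [(1, 3), (4, 2), (7, 5), (12, 1), (18, 4)]:
    agg.add(ts, value)

print(agg.window(0, 10))   # 10
print(agg.window(5, 10))   # 6
print(agg.window(0, 20))   # 15
```

A window of length 10 touches two slices and a window of length 20 touches four, regardless of how many tuples arrived. The sketch never evicts old slices; a real system would drop slices once no window needs them.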
31. General Slicing Core
The General Slicing Core adapts to workload characteristics and provides extension points for user-defined window types and aggregation functions.
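One way to picture an extension point for user-defined aggregation functions is the sketch below. It is an assumption for illustration only: the names lift, combine, lower, and invert follow a decomposition commonly used for window aggregation, and the AggregateFunction type is hypothetical; none of this is taken from the slides or from the Scotty API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class AggregateFunction:
    lift: Callable[[Any], Any]          # tuple value -> partial aggregate
    combine: Callable[[Any, Any], Any]  # partial, partial -> partial
    lower: Callable[[Any], Any]         # partial -> final result
    invert: Optional[Callable[[Any, Any], Any]] = None  # remove b from a, if possible

    @property
    def invertible(self):
        return self.invert is not None


SUM = AggregateFunction(
    lift=lambda v: v,
    combine=lambda a, b: a + b,
    lower=lambda p: p,
    invert=lambda a, b: a - b,
)

# max has no inverse: a removed maximum cannot be "subtracted" from the partial.
MAX = AggregateFunction(lift=lambda v: v, combine=max, lower=lambda p: p)

AVG = AggregateFunction(
    lift=lambda v: (v, 1),                            # (sum, count)
    combine=lambda a, b: (a[0] + b[0], a[1] + b[1]),
    lower=lambda p: p[0] / p[1],
    invert=lambda a, b: (a[0] - b[0], a[1] - b[1]),
)


def aggregate(fn, values):
    """Fold values into one partial aggregate, then produce the final result."""
    partial = None
    for v in values:
        lifted = fn.lift(v)
        partial = lifted if partial is None else fn.combine(partial, lifted)
    return None if partial is None else fn.lower(partial)
```

Whether a function declares an inverse is one of the properties a slicing core could inspect when deciding how to handle tuple removals.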
42. General Stream Slicing Internals
Part 1: Three Fundamental Operations on Slices
• Merge Slices
• Split Slices
• Update Slices
Part 2: Adapt to Workload Characteristics
• Do we need to store original tuples?
• Do we potentially need to split slices?
• Do we potentially need to remove tuples from slices?
General Stream Slicing adapts to current workload characteristics.
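The three slice operations can be sketched as methods of a small Python class. This is a simplified illustration and not the authors' data structure. Assumptions: slices cover half-open event-time ranges, the aggregation function is associative and commutative, and the Slice class with its parameters (combine, invert, keep_tuples) is hypothetical. The sketch also shows why the questions in Part 2 matter: splitting and removing only work when the original tuples are stored, and a removal is incremental only when an inverse exists.

```python
from functools import reduce


class Slice:
    """Slice [start, end) over event time with one partial aggregate."""

    def __init__(self, start, end, combine, invert=None, keep_tuples=False):
        self.start, self.end = start, end
        self.combine = combine   # (partial, partial) -> partial
        self.invert = invert     # optional: (partial, value) -> partial
        self.partial = None      # None means the slice is empty
        self.tuples = [] if keep_tuples else None  # (ts, value) pairs

    def _like(self, start, end):
        return Slice(start, end, self.combine, self.invert, self.tuples is not None)

    def update(self, ts, value):
        """Update: add one tuple (in-order or out-of-order) to this slice."""
        assert self.start <= ts < self.end
        self.partial = value if self.partial is None else self.combine(self.partial, value)
        if self.tuples is not None:
            self.tuples.append((ts, value))

    def remove(self, ts, value):
        """Update: remove one tuple; needs the stored tuples."""
        if self.tuples is None:
            raise ValueError("removing a tuple requires stored tuples")
        self.tuples.remove((ts, value))
        if not self.tuples:
            self.partial = None
        elif self.invert is not None:
            self.partial = self.invert(self.partial, value)  # incremental update
        else:
            self.partial = reduce(self.combine, [v for _, v in self.tuples])  # recompute

    def merge(self, other):
        """Merge: combine two adjacent slices into one."""
        assert self.end == other.start
        assert (self.tuples is None) == (other.tuples is None)
        merged = self._like(self.start, other.end)
        parts = [p for p in (self.partial, other.partial) if p is not None]
        merged.partial = reduce(self.combine, parts) if parts else None
        if merged.tuples is not None:
            merged.tuples = self.tuples + other.tuples
        return merged

    def split(self, ts):
        """Split: cut the slice at ts; both halves are recomputed from tuples."""
        if self.tuples is None:
            raise ValueError("splitting a slice requires stored tuples")
        assert self.start < ts < self.end
        left, right = self._like(self.start, ts), self._like(ts, self.end)
        for t, v in self.tuples:
            (left if t < ts else right).update(t, v)
        return left, right
```

When a workload never splits slices and never removes tuples, keep_tuples can stay False and each slice holds a single partial aggregate.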
58. Impact of Workload Characteristics (Example)
Count-based tumbling window with a length of 5 tuples.
Tuple count: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Event time: 5 12 13 20 35 37 42 46 48 51 52 57 63 64 65
Value: 1 2 1 4 3 1 5 2 2 3 6 1 2 2 1
Window sums: 11 13 12
What if the stream is out-of-order?
Out-of-order tuple: value 5, event time 49.
Updated sums: 13 + 5 - 3 = 15 and 12 + 3 - 1 = 14.
What if the aggregation function is not invertible?
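The example can be replayed with a short Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the aggregate is a sum (which is invertible), slices coincide with the count-based tumbling windows, tuples are kept so that the last tuple of a slice can be shifted to the next one, and the class name CountSlices is hypothetical. For a function without an inverse, the subtraction step would have to be replaced by recomputing the slice from its stored tuples.

```python
import bisect


class CountSlices:
    """Count-based tumbling windows over a stream ordered by event time."""

    def __init__(self, length):
        self.length = length
        self.slices = []  # per slice: list of (event_time, value), sorted by time
        self.sums = []    # per slice: partial sum

    def insert(self, ts, value):
        # Find the slice the tuple belongs to according to its event time.
        i = 0
        while i < len(self.slices) - 1 and ts > self.slices[i][-1][0]:
            i += 1
        carry = (ts, value)
        while carry is not None:
            if i == len(self.slices):
                self.slices.append([])
                self.sums.append(0)
            bisect.insort(self.slices[i], carry)
            self.sums[i] += carry[1]          # add the tuple shifted in
            carry = None
            if len(self.slices[i]) > self.length:
                carry = self.slices[i].pop()  # last tuple moves to the next slice
                self.sums[i] -= carry[1]      # invertible: subtract it
            i += 1


values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]

cs = CountSlices(5)
for ts, v in zip(times, values):
    cs.insert(ts, v)
print(cs.sums)   # [11, 13, 12]

cs.insert(49, 5)  # the out-of-order tuple
print(cs.sums)   # [11, 15, 14, 1]
```

The out-of-order tuple lands in the second window, pushes the tuple with event time 51 into the third window, and pushes the tuple with event time 65 into a new fourth window. This is why a count-based measure combined with out-of-order tuples forces the slicing technique to remove tuples from slices.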
60. In-order Processing with Context Free Windows
Slicing techniques scale to large numbers of concurrent windows.
62. Impact of Stream Order
Slicing techniques are robust against out-of-order tuples.
64. Impact of Aggregation Functions (20% out-of-order)
Stream Slicing performs well on many different kinds of aggregation functions.
69. Efficient Window Aggregation with General Stream Slicing
• We identify workload characteristics which impact applicability and performance of window aggregation techniques.
• We present a generally applicable and highly efficient solution for streaming window aggregation.
• We show that general stream slicing is generally applicable and offers better performance than alternative approaches.
Open Source Repository: tu-berlin-dima.github.io/scotty-window-processor