Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers

  1. 1. Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers Licentiate Seminar November 14, 2016, Kista, Sweden Hooman Peiro Sajjad shps@kth.se
  2. 2. Outline • Introduction and Research Objectives • Background • Contributions • Conclusions and Future Work 2
  3. 3. Introduction 3
  4. 4. Real-time Analytics Examples: ● Server logs ● User clicks ● Social network interactions 4
  5. 5. Real-time Analytics Examples: ● Server logs ● User clicks ● Social network interactions 5
  6. 6. Examples: ● Server logs ● User clicks ● Social network interactions New consumers of data analytics are joining the Cloud that require low- latency results Real-time Analytics 6
  7. 7. Internet Geo-Distributed Data 7
  8. 8. Geo-Distributed Infrastructure • Multiple central data centers • Several near-the-edge resources: cloudlets, telecom clouds, and Fog Internet Data Center Data Center Data Center Near-the-edge resources Near-the-edge resources Near-the-edge resources 8
  9. 9. Thesis Goal To enable effective and efficient stream processing in geo-distributed settings Internet Data Center Data Center Data Center Near-the-edge resources Near-the-edge resources Near-the-edge resources 9
  10. 10. Research Hypothesis By placing stream processing applications closer to data sources and sinks, we can improve the response time and reduce the network overhead. 10
  11. 11. Research Objectives 1. Design decentralized methods for stream processing 2. Provide network-aware placement of distributed stream processing applications across geo-distributed infrastructures. 11
  12. 12. 1. Decentralized Methods Identify limitations and possible improvements in the existing systems and algorithms 12
  13. 13. 1. Decentralized Methods Identify limitations and possible improvements in the existing systems and algorithms System Apache Storm 13
  14. 14. 1. Decentralized Methods Identify limitations and possible improvements in the existing systems and algorithms System Apache Storm SpanEdge 14
  15. 15. 1. Decentralized Methods Identify limitations and possible improvements in the existing systems and algorithms System Algorithm Apache Storm Streaming Graph Partitioning SpanEdge 15
  16. 16. 1. Decentralized Methods Identify limitations and possible improvements in the existing systems and algorithms System Algorithm Apache Storm Streaming Graph Partitioning SpanEdge HoVerCut 16
  17. 17. 2. Network-aware Placement Utilize Network-awareness 17
  18. 18. 2. Network-aware Placement Utilize Network-awareness Edge resources Micro Data Centers 18
  19. 19. 2. Network-aware Placement Utilize Network-awareness Edge resources Central and micro data centers Micro Data Centers SpanEdge 19
  20. 20. Thesis Contributions 1. Stream Processing in Community Network Clouds (FiCloud ’15) 2. Smart Partitioning of Geo-Distributed Resources to Improve Cloud Network Performance (CloudNet ‘15) 3. Boosting Vertex-Cut Streaming Graph Partitioning (BigData Congress ‘16), Best Paper Award 4. SpanEdge: Towards Unifying Stream Processing over Central and Near the Edge Data Centers (SEC ‘16) 20
  21. 21. Background 21
  22. 22. Stream Processing • Processing data as they are being streamed. 22
  23. 23. Stream Processing • Processing data as they are being streamed. • Continuous flow of data items, i.e., tuples • Examples: • temperature values • motion detection data • traffic information 23
  24. 24. Stream Processing • Processing data as they are being streamed. • Continuous flow of data items, i.e., tuples • Examples: • temperature values • motion detection data • traffic information • Stream processing application: a graph of operators (e.g., aggregations or filters) 24
  25. 25. Streaming Graphs streaming edges Examples: • Social Networks • Internet of Things: the connection between devices and other entities • The graph elements are streamed continuously over time 25
  26. 26. • Partition large graphs for distributing them across disks, machines, or data centers • Graph elements are assigned to partitions as they are being streamed • No global knowledge Partitioner P1 P2 Pp streaming edges Streaming Graph Partitioning 26
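As a concrete illustration of this setting (and not the algorithm proposed in the thesis), the following minimal Java sketch assigns each arriving edge using only the state built up so far: it prefers a partition that already hosts one of the edge's endpoints and otherwise falls back to the least-loaded partition. Class and method names are made up for the example.

    import java.util.*;

    // A minimal greedy streaming vertex-cut partitioner (illustrative only).
    // Edges arrive one by one; the partitioner uses no global knowledge of the graph.
    public class GreedyStreamingPartitioner {
        private final int numPartitions;
        private final int[] edgesPerPartition;                                   // load per partition
        private final Map<Long, Set<Integer>> vertexReplicas = new HashMap<>();  // vertex -> partitions holding it

        public GreedyStreamingPartitioner(int numPartitions) {
            this.numPartitions = numPartitions;
            this.edgesPerPartition = new int[numPartitions];
        }

        // Assign one streamed edge (u, v) to a partition and return the partition id.
        public int assign(long u, long v) {
            Set<Integer> pu = vertexReplicas.getOrDefault(u, Collections.emptySet());
            Set<Integer> pv = vertexReplicas.getOrDefault(v, Collections.emptySet());
            int best = -1;
            // Prefer the least-loaded partition that already hosts u or v (creates fewer new replicas).
            for (int p = 0; p < numPartitions; p++) {
                if ((pu.contains(p) || pv.contains(p))
                        && (best == -1 || edgesPerPartition[p] < edgesPerPartition[best])) {
                    best = p;
                }
            }
            // Otherwise fall back to the least-loaded partition overall.
            if (best == -1) {
                best = 0;
                for (int p = 1; p < numPartitions; p++) {
                    if (edgesPerPartition[p] < edgesPerPartition[best]) best = p;
                }
            }
            edgesPerPartition[best]++;
            vertexReplicas.computeIfAbsent(u, k -> new HashSet<>()).add(best);
            vertexReplicas.computeIfAbsent(v, k -> new HashSet<>()).add(best);
            return best;
        }

        public static void main(String[] args) {
            GreedyStreamingPartitioner part = new GreedyStreamingPartitioner(2);
            long[][] edges = {{1, 2}, {2, 3}, {3, 4}, {1, 4}};
            for (long[] e : edges) {
                System.out.println("edge (" + e[0] + "," + e[1] + ") -> partition " + part.assign(e[0], e[1]));
            }
        }
    }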
  27. 27. Apache Storm • Master-workers architecture • Spout: source • Bolt: operator/sink • Parallelism: number of parallel tasks per component 27
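For readers unfamiliar with Storm's programming model, the sketch below wires up a toy topology against the Storm 1.x Java API and shows the three notions on this slide: a spout as the source, a bolt as an operator/sink, and per-component parallelism hints. The spout, bolt, and parallelism numbers are placeholders, not the setup used in the thesis experiments.

    import java.util.Map;
    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;
    import org.apache.storm.utils.Utils;

    public class MinimalStormTopology {

        // Spout = source: emits a toy tuple roughly ten times per second.
        public static class EventSpout extends BaseRichSpout {
            private SpoutOutputCollector out;
            public void open(Map conf, TopologyContext ctx, SpoutOutputCollector out) { this.out = out; }
            public void nextTuple() { Utils.sleep(100); out.emit(new Values("event", System.currentTimeMillis())); }
            public void declareOutputFields(OutputFieldsDeclarer d) { d.declare(new Fields("type", "ts")); }
        }

        // Bolt = operator/sink: here it only counts what it receives.
        public static class CountBolt extends BaseRichBolt {
            private long count;
            public void prepare(Map conf, TopologyContext ctx, OutputCollector out) { }
            public void execute(Tuple t) { if (++count % 10 == 0) System.out.println("seen " + count + " tuples"); }
            public void declareOutputFields(OutputFieldsDeclarer d) { }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("source", new EventSpout(), 2);     // parallelism: 2 spout tasks
            builder.setBolt("counter", new CountBolt(), 4)       // parallelism: 4 bolt tasks
                   .shuffleGrouping("source");                   // stream grouping between the components
            Config conf = new Config();
            conf.setNumWorkers(2);                                // worker processes, assigned by the master (Nimbus)
            LocalCluster cluster = new LocalCluster();            // in-process cluster for local testing
            cluster.submitTopology("minimal-topology", conf, builder.createTopology());
            Utils.sleep(10_000);
            cluster.shutdown();
        }
    }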
  28. 28. Contributions 28
  29. 29. Thesis Contributions 1. Stream Processing in Community Network Clouds (FiCloud ’15) 2. Smart Partitioning of Geo-Distributed Resources to Improve Cloud Network Performance (CloudNet ‘15) 3. Boosting Vertex-Cut Streaming Graph Partitioning (BigData Congress ‘16), Best Paper Award 4. SpanEdge: Towards Unifying Stream Processing over Central and Near the Edge Data Centers (SEC ‘16) 29
  30. 30. Contribution 1: Stream Processing in Community Network Clouds Ken Danniswara, Hooman Peiro Sajjad, Ahmad Al-Shishtawy, Vladimir Vlassov 3rd IEEE International Conference on Future Internet of Things and Cloud (FiCloud), 2015
  31. 31. Summary • Objective: to find limitations of Storm for running in a geo-distributed environment • We evaluate Apache Storm, a widely used open-source stream processing system • We emulate a Community Network Cloud environment • The Community Network Cloud hosts applications on edge resources 31
  32. 32. Limitations of Storm • Inefficient scheduling of stream processing applications across geo-distributed resources • Network communication among Storm’s components: • Actual data streams • Stream groupings: data transfer among tasks • Maintenance overhead (workers-manager): • Scheduling time • Failure detection 32
  33. 33. Smart Partitioning of Geo-Distributed Resources to Improve Cloud Network Performance Hooman Peiro Sajjad, Fatemeh Rahimian, Vladimir Vlassov 4th IEEE International Conference on Cloud Networking (CloudNet), 2015 Contribution 2: 33
  34. 34. Edge resources are connected through/co-located with the network devices, e.g., routers, base-stations, or the community network nodes. Geo-Distributed Resources 34
  35. 35. Problem • High variance in network performance and network cost: • link heterogeneity • over-utilization of links • varying numbers of hops between communicating nodes • The network topology is not optimal for hosting distributed data-intensive applications 35
  36. 36. Problem Definition Different placements of the application components on the resources affect the performance of the whole network. 36
  37. 37. Problem Definition Different placements of the application components on the resources affect the performance of the whole network. 37
  38. 38. Network-aware grouping of geo-distributed resources into a set of computing clusters, each called a micro data center. Our Solution: Micro Data Centers 38
  39. 39. Intra-Micro Data Centers: Network overhead and latency Inter-Micro Data Centers: Network overhead and latency Our Solution: Micro Data Centers 39
  40. 40. • Based on Random Walk • Decentralized • No global knowledge of the topology • High quality results Diffusion Based Community Detection 40
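The following toy sketch is not the diffusion-based algorithm evaluated in the thesis; it is a generic random-walk label-propagation heuristic that only illustrates the decentralized flavor of the approach: each node acts using nothing but its local adjacency list, repeatedly sampling short walks and adopting the most frequent community label it observes. All names and parameters are illustrative.

    import java.util.*;

    // A centralized simulation of a decentralized idea: nodes refine community labels
    // using only local neighbor lists and short random walks. Generic heuristic only,
    // not the diffusion-based community detection from the paper.
    public class RandomWalkCommunities {
        public static Map<Integer, Integer> detect(Map<Integer, List<Integer>> adj,
                                                   int rounds, int walkLength, long seed) {
            Random rnd = new Random(seed);
            Map<Integer, Integer> label = new HashMap<>();
            for (int v : adj.keySet()) label.put(v, v);            // every node starts in its own community
            for (int r = 0; r < rounds; r++) {
                for (int v : adj.keySet()) {
                    Map<Integer, Integer> votes = new HashMap<>();
                    int cur = v;
                    for (int s = 0; s < walkLength; s++) {          // the walk uses only local neighbor lists
                        List<Integer> nbrs = adj.get(cur);
                        if (nbrs == null || nbrs.isEmpty()) break;
                        cur = nbrs.get(rnd.nextInt(nbrs.size()));
                        votes.merge(label.get(cur), 1, Integer::sum);
                    }
                    // Adopt the most frequently observed label (ties broken arbitrarily).
                    label.put(v, votes.entrySet().stream()
                            .max(Map.Entry.comparingByValue())
                            .map(Map.Entry::getKey).orElse(label.get(v)));
                }
            }
            return label;
        }

        public static void main(String[] args) {
            // Two triangles joined by a single bridge edge.
            Map<Integer, List<Integer>> adj = new HashMap<>();
            int[][] edges = {{0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3}};
            for (int[] e : edges) {
                adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
                adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
            }
            System.out.println(detect(adj, 5, 4, 42L));
        }
    }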
  41. 41. ● Geolocation based (KMeans) ● Modularity based community detection (Centralized) ● Diffusion based community detection (Decentralized) ● Real data set from a community network: 52 nodes and 224 links Evaluation: Clustering Methods 41
  42. 42. Evaluation: Number of Links and Bandwidth (Single, KMeans, Centralized, Decentralized). Minimum available bandwidth between each pair of nodes inside the micro data centers. 42
  43. 43. Evaluation: Number of Links and Bandwidth (Single, KMeans, Centralized, Decentralized). Minimum available bandwidth between each pair of nodes inside the micro data centers; number of intra-micro data center links. 43
  44. 44. Evaluation: Latency. Latency between each pair of nodes inside the micro data centers. 44
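As a side note on how such metrics can be computed, the sketch below takes a made-up latency matrix and a node-to-micro-data-center assignment and reports the number of intra-cluster links and the average pairwise latency inside the clusters (via all-pairs shortest paths). It mirrors the kind of measurement behind these plots; it is not the evaluation code or the community network data set.

    import java.util.*;

    // Clustering metrics for a given grouping of nodes into micro data centers:
    // intra-cluster link count and average pairwise latency inside each cluster.
    public class MicroDcMetrics {

        // latency[i][j] = direct-link latency in ms, or +Infinity if there is no direct link.
        public static void report(double[][] latency, int[] clusterOf) {
            int n = latency.length;
            double[][] d = new double[n][n];
            for (int i = 0; i < n; i++) d[i] = latency[i].clone();
            // Floyd-Warshall: pairwise latency over the whole topology.
            for (int k = 0; k < n; k++)
                for (int i = 0; i < n; i++)
                    for (int j = 0; j < n; j++)
                        d[i][j] = Math.min(d[i][j], d[i][k] + d[k][j]);
            int intraLinks = 0, pairs = 0;
            double latencySum = 0;
            for (int i = 0; i < n; i++)
                for (int j = i + 1; j < n; j++) {
                    if (clusterOf[i] != clusterOf[j]) continue;        // only pairs inside the same micro DC
                    if (Double.isFinite(latency[i][j])) intraLinks++;  // direct link kept inside a cluster
                    latencySum += d[i][j];
                    pairs++;
                }
            System.out.printf("intra-cluster links = %d, avg intra-cluster latency = %.1f ms%n",
                    intraLinks, latencySum / Math.max(pairs, 1));
        }

        public static void main(String[] args) {
            double INF = Double.POSITIVE_INFINITY;
            // Nodes 0,1 and nodes 2,3 are tightly connected; the clustering reflects that.
            double[][] latency = {
                    {0,    5,  40, INF},
                    {5,    0, INF,  60},
                    {40, INF,   0,   8},
                    {INF, 60,   8,   0}};
            report(latency, new int[]{0, 0, 1, 1});
        }
    }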
  45. 45. • Placing distributed applications inside micro data centers reduces the network overhead and the network latency. • Our proposed decentralized community detection solution finds clusters whose quality is competitive with the centralized community detection method. Summary 45
  46. 46. Boosting Vertex-Cut Partitioning For Streaming Graphs Hooman Peiro Sajjad, Amir H. Payberah, Fatemeh Rahimian, Vladimir Vlassov, and Seif Haridi 5th IEEE International Congress on Big Data (IEEE BigData Congress), 2016 Contribution 3: 46
  47. 47. Vertex-Cut Partitioning P1 P2 Efficient for power-law graphs 47
  48. 48. • Low replication factor: the average number of replicas for each vertex • Balanced partitions with respect to the number of edges A Good Vertex-Cut Partitioning 48
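The two measures on this slide can be computed directly from an edge-to-partition assignment. The small sketch below (illustrative names; the assignment in main is made up) computes the replication factor as the average number of partitions each vertex appears in, and the balance as the largest partition's edge count divided by the ideal size |E|/p.

    import java.util.*;

    // Vertex-cut quality metrics for a given edge-to-partition assignment:
    // replication factor and edge-load balance.
    public class VertexCutMetrics {

        // edges[i] = {u, v}; partitionOf[i] = partition id of edge i; p = number of partitions.
        public static void report(long[][] edges, int[] partitionOf, int p) {
            Map<Long, Set<Integer>> replicas = new HashMap<>();
            int[] edgesPerPartition = new int[p];
            for (int i = 0; i < edges.length; i++) {
                int part = partitionOf[i];
                edgesPerPartition[part]++;
                replicas.computeIfAbsent(edges[i][0], k -> new HashSet<>()).add(part);
                replicas.computeIfAbsent(edges[i][1], k -> new HashSet<>()).add(part);
            }
            // Replication factor: average number of partitions (replicas) per vertex.
            double totalReplicas = replicas.values().stream().mapToInt(Set::size).sum();
            double replicationFactor = totalReplicas / replicas.size();
            // Balance: largest partition over the ideal size |E| / p (1.0 = perfectly balanced).
            int maxLoad = Arrays.stream(edgesPerPartition).max().orElse(0);
            double balance = maxLoad / ((double) edges.length / p);
            System.out.printf("replication factor = %.2f, balance = %.2f%n", replicationFactor, balance);
        }

        public static void main(String[] args) {
            long[][] edges = {{1, 2}, {2, 3}, {3, 4}, {4, 1}};
            int[] partitionOf = {0, 0, 1, 1};   // e.g., the output of some streaming partitioner
            report(edges, partitionOf, 2);
        }
    }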
  49. 49. Partitioning Time vs. Partition Quality: centralized partitioners give a low replication factor but slow partitioning time; distributed partitioners give fast partitioning time but a high replication factor; HoVerCut aims for fast partitioning with a low replication factor. 49
  50. 50. • Streaming Vertex-Cut partitioner • Parallel and Distributed • Multiple streaming data sources • Scales without degrading the quality of partitions • Employs different partitioning policies HoVerCut ... 50
  51. 51. Core Partitioning Policy Tumbling Window Local State Subpartitioner 1 Edge stream Core Partitioning Policy Tumbling Window Local State Subpartitioner n Edge stream Shared State Async Async Architecture Overview 51
  52. 52. Core Partitioning Policy Tumbling Window Local State Subpartitioner 1 Core Partitioning Policy Tumbling Window Local State Subpartitioner n Edge stream Async Async • Input graphs are streamed by their edges • Each subpartitioner receives an exclusive subset of the edges Shared State Architecture: Input 52
  53. 53. Partitioning Policy Local State Subpartitioner 1 Edge stream Core Partitioning Policy Tumbling Window Local State Subpartitioner n Edge stream Async Async Subpartitioners collect a number of incoming edges in a window of a certain size Tumbling Window Core Shared State Architecture: Configurable Window 53
  54. 54. Local State Subpartitioner 1 Edge stream Core Partitioning Policy Tumbling Window Local State Subpartitioner n Edge stream Async Async Each subpartitioner assigns the edges to the partitions based on a given policy Partitioning Policy Tumbling Window Shared State Core Architecture: Partitioning Policy 54
  55. 55. Each subpartitioner has a local state, which includes information about the edges processed locally Partitioning Policy Subpartitioner 1 Edge stream Core Partitioning Policy Tumbling Window Local State Subpartitioner n Edge stream Async Async Local State Tumbling Window Shared State Core Architecture: Local State 55
  56. 56. Shared-state is the global state accessible by all subpartitioners Partitioning Policy Subpartitioner 1 Edge stream Core Partitioning Policy Tumbling Window Local State Subpartitioner n Edge stream Async Async Local State Tumbling Window Shared State Core Architecture: Local State 56
  57. 57. Partitioning Policy Local State Subpartitioner 1 Edge stream Core Partitioning Policy Tumbling Window Local State Subpartitioner n Edge stream Async Async The core is HoVerCut’s main algorithm parametrised with partitioning policy and the window size Core Shared State Tumbling Window Architecture: Core 57
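Slides 51 to 57 can be condensed into one processing loop. The following single-threaded sketch approximates one subpartitioner: edges are buffered in a tumbling window and flushed through a pluggable partitioning policy that consults the local state, which is then merged into a shared state. The names, the greedy example policy, and the synchronous state merge are simplifications; HoVerCut's shared-state synchronization is asynchronous and distributed.

    import java.util.*;

    // A single-threaded sketch of one HoVerCut-style subpartitioner: tumbling window,
    // pluggable policy, local state, and a (here: trivial, synchronous) shared state.
    // Names and the state layout are illustrative, not HoVerCut's implementation.
    public class SubpartitionerSketch {

        interface PartitioningPolicy {
            int assign(long u, long v, Map<Long, Set<Integer>> state, int[] load, int numPartitions);
        }

        // Example policy: reuse a partition that already holds an endpoint, else pick the least loaded.
        static final PartitioningPolicy GREEDY = (u, v, state, load, p) -> {
            int best = -1;
            for (int i = 0; i < p; i++) {
                boolean hosts = state.getOrDefault(u, Collections.emptySet()).contains(i)
                             || state.getOrDefault(v, Collections.emptySet()).contains(i);
                if (hosts && (best == -1 || load[i] < load[best])) best = i;
            }
            if (best == -1) { best = 0; for (int i = 1; i < p; i++) if (load[i] < load[best]) best = i; }
            return best;
        };

        private final PartitioningPolicy policy;
        private final int numPartitions, windowSize;
        private final List<long[]> window = new ArrayList<>();              // tumbling window buffer
        private final Map<Long, Set<Integer>> localState = new HashMap<>(); // edges processed locally
        private final Map<Long, Set<Integer>> sharedState;                  // state shared by all subpartitioners
        private final int[] load;

        SubpartitionerSketch(PartitioningPolicy policy, int numPartitions, int windowSize,
                             Map<Long, Set<Integer>> sharedState) {
            this.policy = policy; this.numPartitions = numPartitions;
            this.windowSize = windowSize; this.sharedState = sharedState;
            this.load = new int[numPartitions];
        }

        void onEdge(long u, long v) {
            window.add(new long[]{u, v});
            if (window.size() == windowSize) flush();   // tumbling: process and clear the whole window
        }

        void flush() {
            for (long[] e : window) {
                int part = policy.assign(e[0], e[1], localState, load, numPartitions);
                load[part]++;
                localState.computeIfAbsent(e[0], k -> new HashSet<>()).add(part);
                localState.computeIfAbsent(e[1], k -> new HashSet<>()).add(part);
                System.out.println("edge (" + e[0] + "," + e[1] + ") -> partition " + part);
            }
            // Merge local knowledge into the shared state (in HoVerCut this happens asynchronously).
            localState.forEach((v, parts) ->
                    sharedState.computeIfAbsent(v, k -> new HashSet<>()).addAll(parts));
            window.clear();
        }

        public static void main(String[] args) {
            Map<Long, Set<Integer>> shared = new HashMap<>();
            SubpartitionerSketch sp = new SubpartitionerSketch(GREEDY, 2, 2, shared);
            long[][] edges = {{1, 2}, {2, 3}, {3, 4}, {4, 1}};
            for (long[] e : edges) sp.onEdge(e[0], e[1]);
        }
    }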
  58. 58. Evaluation: Distributed Configuration Internet topology |V|=1.7M |E|=11M 58
  59. 59. Evaluation: Distributed Configuration Internet topology |V|=1.7M |E|=11M Orkut Social Network |V|=3.1M |E|=117M 59
  60. 60. • HoVerCut is a parallel and distributed partitioner • We can employ different partitioning policies in a scalable fashion • We can scale HoVerCut to partition larger graphs without degrading the quality of partitions • https://github.com/shps/HoVerCut Conclusion 60
  61. 61. SpanEdge: Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers Hooman Peiro Sajjad, Ken Danniswara, Ahmad Al-Shishtawy, and Vladimir Vlassov The First IEEE/ACM Symposium on Edge Computing (SEC), 2016 Contribution 4: 61
  62. 62. How to enable and achieve effective and efficient stream processing given the following: • Multiple central and near-the-edge DCs • Multiple data sources and sinks • Multiple stream processing applications Problem Definition Internet Data Center Data Center Data Center Micro Data Center Micro Data Center Micro Data Center 62
  63. 63. How to enable and achieve effective and efficient stream processing given the following: • Multiple central and near-the-edge DCs • Multiple data sources and sinks • Multiple stream processing applications Problem Definition Internet Data Center Data Center Data Center Micro Data Center Micro Data Center Micro Data Center And: • Data is streamed from sources to their closest near-the-edge DC • DCs are connected through a heterogeneous network 63
  64. 64. It is hard to program and maintain stream processing applications both for the edge and for central data centers Data Center Data Center Data Center Micro Data Center Micro Data Center Monitor Traffic Monitor Traffic Aggregate Anomaly statistics Hard to Program 64
  65. 65. A multi-data center stream processing solution that provides: • an expressive programming model to unify programming on a geo- distributed infrastructure. • a run-time system to manage (schedule and execute) stream processing applications across the DCs. SpanEdge 65
  66. 66. Internet Data Center Data Center Data Center Micro Data Center Micro Data Center Micro Data Center SpanEdge Architecture 66
  67. 67. Two tiers: • First tier includes central data centers • Second tier includes near-the-edge data centers 1st tier 1st tier 1st tier 2nd tier 2nd tier 2nd tier 2nd tier 2nd tier SpanEdge Architecture 67
  68. 68. Two types of workers: • Hub-worker • Spoke-worker 1st tier 1st tier 1st tier 2nd tier 2nd tier 2nd tier 2nd tier 2nd tier Hub-Worker Hub-Worker Spoke-Worker Spoke-Worker Spoke-Worker Spoke-Worker Spoke-Worker Hub-Worker Manager SpanEdge Architecture 68
  69. 69. 1st tier 1st tier 1st tier 2nd tier 2nd tier 2nd tier 2nd tier 2nd tier Hub-Worker Hub-Worker Spoke-Worker Spoke-Worker Spoke-Worker Spoke-Worker Spoke-Worker Hub-Worker SpanEdge Architecture Manager 69
  70. 70. 1st tier 1st tier 1st tier 2nd tier 2nd tier 2nd tier 2nd tier 2nd tier Hub-Worker Hub-Worker Spoke-Worker Spoke-Worker Spoke-Worker Spoke-Worker Spoke-Worker Hub-Worker SpanEdge Architecture Manager 70
  71. 71. Task Groupings 71
  72. 72. S1 OP1 OP2 OP3 R1 OP4 R2 Output based on aggregation of the locally processed data Output based on the analysis of the local data (fast) Task Groupings 72
  73. 73. S1 OP1 OP2 OP3 R1 OP4 R2 Output based on aggregation of the locally processed data Output based on the analysis of the local data (fast) • Fast results based on the data available near-the-edge • Avoid sending unnecessary tuples over the WAN Task Groupings 73
  74. 74. • Local-Task: close to the data source on spoke-workers • Global-Task: for processing data generated from local-tasks, placed on a hub-worker S1 OP1 OP2 OP3 R1 OP4 R2 Output based on aggregation of the locally processed data Output based on the analysis of the local data (fast) L1 G1 Task Groupings 74
  75. 75. Converts a stream processing graph to an execution graph and assigns the created tasks to workers. 1st tier 1st tier 1st tier 2nd tier 2nd tier 2nd tier 2nd tier 2nd tier Hub-Worker Hub-Worker Spoke-Worker Spoke-Worker Spoke-Worker Spoke-Worker Spoke-Worker Hub-Worker Manager The manager runs the scheduler. Scheduler 75
  76. 76. Scheduler inputs: 1. A stream processing graph 2. A map of streaming data sources 3. The network topology between workers. Output: a map of tasks to workers. Source map example (source type: spoke-workers): src1: {sw1, sw2, sw3}; src2: {sw2, sw4}. Scheduler 76
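To make the scheduling step tangible, here is a deliberately naive sketch, not SpanEdge's actual scheduler: local-tasks are placed on a spoke-worker that receives their source's stream, and global-tasks are spread over hub-workers. All task, source, and worker names are invented, except for the source map shown on the slide.

    import java.util.*;

    // A naive illustration of SpanEdge-style scheduling: local-tasks go to a spoke-worker
    // close to their data source, global-tasks go to a hub-worker. Not the scheduler
    // from the paper; inputs and names are illustrative only.
    public class NaiveGeoScheduler {

        static final class Task {
            final String name; final boolean local; final String source; // source is used only for local tasks
            Task(String name, boolean local, String source) {
                this.name = name; this.local = local; this.source = source;
            }
        }

        public static Map<String, String> schedule(List<Task> tasks,
                                                   Map<String, List<String>> sourceToSpokeWorkers,
                                                   List<String> hubWorkers) {
            Map<String, String> placement = new LinkedHashMap<>();
            int nextHub = 0;
            for (Task t : tasks) {
                if (t.local) {
                    // Place a local-task on a spoke-worker that receives its source's stream.
                    List<String> candidates =
                            sourceToSpokeWorkers.getOrDefault(t.source, Collections.emptyList());
                    placement.put(t.name, candidates.isEmpty() ? hubWorkers.get(0) : candidates.get(0));
                } else {
                    // Spread global-tasks over the hub-workers in the central data centers.
                    placement.put(t.name, hubWorkers.get(nextHub++ % hubWorkers.size()));
                }
            }
            return placement;
        }

        public static void main(String[] args) {
            Map<String, List<String>> sources = new HashMap<>();
            sources.put("src1", Arrays.asList("sw1", "sw2", "sw3"));   // the source map shown on the slide
            sources.put("src2", Arrays.asList("sw2", "sw4"));
            List<Task> tasks = Arrays.asList(
                    new Task("L1", true, "src1"),
                    new Task("L2", true, "src2"),
                    new Task("G1", false, null));
            System.out.println(schedule(tasks, sources, Arrays.asList("hub1", "hub2")));
        }
    }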
  77. 77. Evaluation Setup CORE Network Emulator: • 2 central and 9 near-the-edge data centers (figure labels: A, A1, A2, GA, GAB, L1, G1, B, B1, B2, L2, RA, RB, GB, RAB) 77
  78. 78. Bandwidth Consumption Evaluation: Bandwidth 78
  79. 79. Partial results Aggregated results Evaluation: Latency 79
  80. 80. SpanEdge: • facilitates programming on a geo-distributed infrastructure including central and near-the-edge data centers • provides a run-time system to manage stream processing applications across the DCs • https://github.com/Telolets/StormOnEdge Conclusions 80
  81. 81. Conclusions and Future Work 81
  82. 82. Conclusions To enable effective and efficient stream processing in geo-distributed settings, we proposed solutions at both the system level and the algorithm level to fill the gaps for: • A multi-data center stream processing system that utilizes both central and near-the-edge data centers • Network-aware placement of stream processing components and network-aware structuring of edge resources • Efficient state-sharing in distributed streaming graph partitioning 82
  83. 83. Scheduling of stream processing applications with respect to: • Dynamic network conditions • Resource heterogeneity on the edge • Mission-critical applications and applications with different priorities Reducing the latency for: • Scheduling • Failure detection and failure recovery Potential Future Work 83
  84. 84. Acknowledgements CLOMMUNITY FP-7 EU-Project http://clommunity-project.eu E2E Clouds Research Project http://e2e-clouds.org My advisor, Vladimir Vlassov My secondary advisors, Fatemeh Rahimian and Seif Haridi My co-authors My colleagues at KTH and SICS 84
  85. 85. Thank You! 85
