SlideShare a Scribd company logo
1 of 50
Data Streaming in
IoT and Big Data Analytics
Vincenzo Gulisano, Ph.D.
Agenda
• Motivation
• The data streaming processing paradigm
• Challenges and research questions
• Conclusions
• Bibliography
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 3
Agenda
• Motivation
• The data streaming processing paradigm
• Challenges and research questions
• Conclusions
• Bibliography
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 4
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 5
IoT enables for increased awareness, security, power-efficiency, ...
large IoT systems are complex
traditional data analysis techniques alone are not adequate!
AMIs [1,2,3,4] VNs [5,6]
• demand-response
• scheduling [7]
• micro-grids
• detection of medium size blackouts [8]
• detection of non technical losses
• ...
6
• autonomous driving
• platooning
• accident detection [9]
• variable tolls [9]
• congestion monitoring [10]
• ...
IoT enables for increased awareness, security, power-efficiency, ...
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
AMIs VNs
7
large IoT systems are complex
The data streaming paradigm and its use in Fog architectures
Characteristics [15]:
1. edge location
2. location awareness
3. low latency
4. geographical distribution
5. large-scale
6. support for mobility
7. real-time interactions
8. predominance of wireless
9. heterogeneous
10. interoperability / federation
11. interaction with the cloud
Vincenzo Gulisano
8
traditional data analysis techniques alone are not adequate! [13,14]
The data streaming paradigm and its use in Fog architectures
1. does the infrastructure allow for billions of
readings per day to be transferred continuously?
2. the latency incurred while transferring data, does
that undermine the utility of the analysis?
3. is it secure to concentrate all the data in a single
place? [11]
4. is it smart to give away fine-grained data? [12]
Vincenzo Gulisano
Agenda
• Motivation
• The data streaming processing paradigm
• Challenges and research questions
• Conclusions
• Bibliography
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 9
Main Memory
Motivation
DBMS vs. DSMS
Disk
1 Data
Query Processing
3 Query
results
2 Query
Main Memory
Query Processing
Continuous
Query
Data
Query
results
10The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
Before we start... about data streaming and Stream Processing Engines (SPEs)
11
An incomplete list of SPEs (cf. related work in [16]):
time
Borealis
The Aurora Project
STanfordstREamdatAManager
NiagaraCQ
COUGAR
StreamCloud
Covering all of them / discussing which use cases are best for each one out of scope...
the following show connection between what is being presented and a certain SPE
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
data stream: unbounded sequence of tuples sharing the same schema
12
Example: vehicles’ speed reports
time
Field Field
vehicle id text
time (secs) text
speed (Km/h) double
X coordinate double
Y coordinate double
A 8:00 55.5 X1 Y1
Let’s assume each source
(e.g., vehicle) produces
and delivers a timestamp
sorted stream
A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
continuous query (or simply query): Directed Acyclic Graph (DAG) of
streams and operators
13
OP
OP
OP
OP OP
OP OP
source op
(1+ out streams)
sink op
(1+ in streams)
stream
op
(1+ in, 1+ out streams)
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
data streaming operators
Two main types:
• Stateless operators
• do not maintain any state
• one-by-one processing
• if they maintain some state, such state does not evolve depending
on the tuples being processed
• Stateful operators
• maintain a state that evolves depending on the tuples being
processed
• produce output tuples that depend on multiple input tuples
14
OP
OP
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
stateless operators
15
Filter
...
Map
Union
...
Filter / route tuples based on one (or more) conditions
Transform each tuple
Merge multiple streams (with the same schema) into one
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
stateless operators
16
Filter
...
Map
Union
...
source: http://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
stateless operators
17
Filter
...
Map
Union
...
source: https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/index.html
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
stateless operators
18
Filter
...
Map
Union
...
source: http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
stateful operators
19
Aggregate information from multiple tuples
(e.g., max, min, sum, ...)
Join tuples coming from 2 streams given a certain predicate
Aggregate
Join
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
stateful operators
20
source: http://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html
source: http://spark.apache.org/docs/latest/streaming-programming-
guide.html#transformations-on-dstreams
source: http://spark.apache.org/docs/latest/streaming-programming-
guide.html#transformations-on-dstreams
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
Wait a moment!
if streams are unbounded, how can we aggregate or join?
21The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
windows and stateful analysis [16]
Stateful operations are done over windows:
• Time-based (e.g., tuples in the last 10 minutes)
• Tuple-based (e.g., given the last 50 tuples)
22
time
[8:00,9:00)
[8:20,9:20)
[8:40,9:40)
Usually applications rely on time-based sliding windows
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
time-based sliding window aggregation (count)
23
Counter: 4
time
[8:00,9:00)
8:05 8:15 8:22 8:45 9:05
Output: 4
Counter: 1
Counter: 2
Counter: 3
Counter: 3
time
8:05 8:15 8:22 8:45 9:05
[8:20,9:20)
we assumed each source
produces and delivers a
timestamp sorted stream!
What happens if this is not
the case?
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
24
windows and stateful analysis
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
sample query
For each vehicle, raise an alert if the speed of the latest report is more
than 2 times higher than its average speed in the last 30 days.
26
time
A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
27
Field
vehicle id
time (secs)
speed (Km/h)
X coordinate
Y coordinate
Compute average
speed for each
vehicle during the
last 30 days
Aggregate
Field
vehicle id
time (secs)
avg speed (Km/h)
Join
Check
condition
Filter
Field
vehicle id
time (secs)
speed (Km/h)
Join on
vehicle id
Field
vehicle id
time (secs)
avg speed (Km/h)
speed (Km/h)
sample query
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
Why data streaming, then?
Expressive
Online
Parallel &
Distributed
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 29
Expressive
Online
Parallel &
Distributed
A F M
A F M
A F M
A F M
F
M
A
A
A
F
F
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 30
Sample query that
e.g. validates data /
raises alarms...
Agenda
• Motivation
• The data streaming processing paradigm
• Challenges and research questions
• Conclusions
• Bibliography
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 31
1. Distributed deployment
2. Parallel deployment
3. Ordering and determinism
4. Shared-nothing vs shared-memory parallelism
5. Load balancing
6. Elasticity
7. Fault tolerance
8. Data sharing
32
Challenges and research questions
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
Before we start...
33
Following examples are from vehicular networks
Road-side unit
RSU
Vehicle
Server
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
1 - Distributed deployment – where to place a given operator? [17,4,18,19]
34
M
?
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
2 - Parallel deployment – how do we parallelize the analysis? [20,21]
35
M
M A
A
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
36
M
M A
A
A 8:00 55.5 X1 Y1
A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
A 8:00 55.5 X1 Y1
A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
What if tuple with
timestamp 8:00
arrives after tuple
with timestamp 8:07?
3 – Ordering and determinism [22,23,24]
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
4 – shared-nothing vs. shared-memory parallelism [25]
37
M
M
...
A
A
J
...
How to take advantage of multi-core
architectures?
How to boost inter-node parallelism
and intra-node parallelism?
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
5 – load balancing & state transfer [20,26]
38
If we shift the processing of a certain subset of
tuples from node A to node B, how do transfer
its previous state?
M
A
M
A
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
6 – elasticity [20,27]
39
How / when to provision or
decommission new resources depending
on the analysis’ costs fluctuations?
J
The data streaming paradigm and its use in Fog architectures
J J
J
Vincenzo Gulisano
7 – fault tolerance [16, 28, 29]
40
How to replace a failed node minimizing
recovery time (making it transparent to
the end user)?
The data streaming paradigm and its use in Fog architectures
J
J
J
J
J
J
Vincenzo Gulisano
8 – data sharing (differential privacy) [2,30,31,32]
41
How to prevent privacy leaks?
Suppose we are interested in publishing vehicles’
average speed over a window of one hour...
We could aggregate by RSU!
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
8 – data sharing (differential privacy)
42
Wait a moment!
what if a single vehicle is
connected to a certain RSU?
Whether a certain mechanism preserves or not the privacy
of the underlying data depends on the knowledge of the
adversary
Differential privacy assumes the worst case scenario!
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
Agenda
• Motivation
• The data streaming processing paradigm
• Challenges and research questions
• Conclusions
• Bibliography
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 43
Computers and humans evolution are not that different...
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 44
Millions of years
of evolution
Millions of
sensors
• Store information
• Iterate multiple times over data
• Think, do not rush through decisions
• ”Hard-wired” routines
• Real-time decisions
• High-throughput / low-latency
Should I (really)
have an extra
piece of cake?
Danger!!!
Run!!!
Humans
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 45
Years / Decades
of evolution
Millions of
sensors
What traffic
congestion
patterns can I
observe
frequently?
Don’t take
over, car in
opposite lane!
• Store information
• Iterate multiple times over data
• Think, do not rush through decisions
Databases, data mining
techniques...
Data streaming, distributed
and parallel analysis
• Continuous analysis
• Real-time decisions
• High-throughput / low-latency
Computers
(cyber-physical / IoT systems)
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 46
Agenda
• Motivation
• The data streaming processing paradigm
• Challenges and research questions
• Conclusions
• Bibliography
Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 47
Bibliography
1. Zhou, Jiazhen, Rose Qingyang Hu, and Yi Qian. "Scalable distributed communication architectures to support advanced
metering infrastructure in smart grid." IEEE Transactions on Parallel and Distributed Systems 23.9 (2012): 1632-1642.
2. Gulisano, Vincenzo, et al. "BES: Differentially Private and Distributed Event Aggregation in Advanced Metering Infrastructures."
Proceedings of the 2nd ACM International Workshop on Cyber-Physical System Security. ACM, 2016.
3. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "Online and scalable data validation in advanced metering
infrastructures." IEEE PES Innovative Smart Grid Technologies, Europe. IEEE, 2014.
4. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "METIS: a two-tier intrusion detection system for advanced
metering infrastructures." International Conference on Security and Privacy in Communication Systems. Springer International
Publishing, 2014.
5. Yousefi, Saleh, Mahmoud Siadat Mousavi, and Mahmood Fathy. "Vehicular ad hoc networks (VANETs): challenges and
perspectives." 2006 6th International Conference on ITS Telecommunications. IEEE, 2006.
6. El Zarki, Magda, et al. "Security issues in a future vehicular network." European Wireless. Vol. 2. 2002.
7. Georgiadis, Giorgos, and Marina Papatriantafilou. "Dealing with storage without forecasts in smart grids: Problem
transformation and online scheduling algorithm." Proceedings of the 29th Annual ACM Symposium on Applied Computing.
ACM, 2014.
8. Fu, Zhang, et al. "Online temporal-spatial analysis for detection of critical events in Cyber-Physical Systems." Big Data (Big
Data), 2014 IEEE International Conference on. IEEE, 2014.
The data streaming paradigm and its use in Fog architectures 48Vincenzo Gulisano
Bibliography
9. Arasu, Arvind, et al. "Linear road: a stream data management benchmark." Proceedings of the
Thirtieth international conference on Very large data bases-Volume 30. VLDB Endowment,
2004.
10. Lv, Yisheng, et al. "Traffic flow prediction with big data: a deep learning approach." IEEE
Transactions on Intelligent Transportation Systems 16.2 (2015): 865-873.
11. Grochocki, David, et al. "AMI threats, intrusion detection requirements and deployment
recommendations." Smart Grid Communications (SmartGridComm), 2012 IEEE Third
International Conference on. IEEE, 2012.
12. Molina-Markham, Andrés, et al. "Private memoirs of a smart meter." Proceedings of the 2nd
ACM workshop on embedded sensing systems for energy-efficiency in building. ACM, 2010.
13. Gulisano, Vincenzo, et al. "Streamcloud: A large scale data streaming system." Distributed
Computing Systems (ICDCS), 2010 IEEE 30th International Conference on. IEEE, 2010.
14. Stonebraker, Michael, Uǧur Çetintemel, and Stan Zdonik. "The 8 requirements of real-time
stream processing." ACM SIGMOD Record 34.4 (2005): 42-47.
15. Bonomi, Flavio, et al. "Fog computing and its role in the internet of things." Proceedings of the
first edition of the MCC workshop on Mobile cloud computing. ACM, 2012.
The data streaming paradigm and its use in Fog architectures 49Vincenzo Gulisano
Bibliography
16. Gulisano, Vincenzo Massimiliano. StreamCloud: An Elastic Parallel-Distributed Stream
Processing Engine. Diss. Informatica, 2012.
17. Cardellini, Valeria, et al. "Optimal operator placement for distributed stream processing
applications." Proceedings of the 10th ACM International Conference on Distributed and Event-
based Systems. ACM, 2016.
18. Costache, Stefania, et al. "Understanding the Data-Processing Challenges in Intelligent
Vehicular Systems." Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV16).
19. Giatrakos, Nikos, Antonios Deligiannakis, and Minos Garofalakis. "Scalable Approximate Query
Tracking over Highly Distributed Data Streams." Proceedings of the 2016 International
Conference on Management of Data. ACM, 2016.
20. Gulisano, Vincenzo, et al. "Streamcloud: An elastic and scalable data streaming system." IEEE
Transactions on Parallel and Distributed Systems 23.12 (2012): 2351-2365.
21. Shah, Mehul A., et al. "Flux: An adaptive partitioning operator for continuous query systems."
Data Engineering, 2003. Proceedings. 19th International Conference on. IEEE, 2003.
The data streaming paradigm and its use in Fog architectures 50Vincenzo Gulisano
Bibliography
22. Cederman, Daniel, et al. "Brief announcement: concurrent data structures for efficient streaming aggregation." Proceedings of
the 26th ACM symposium on Parallelism in algorithms and architectures. ACM, 2014.
23. Ji, Yuanzhen, et al. "Quality-driven processing of sliding window aggregates over out-of-order data streams." Proceedings of
the 9th ACM International Conference on Distributed Event-Based Systems. ACM, 2015.
24. Ji, Yuanzhen, et al. "Quality-driven disorder handling for concurrent windowed stream queries with shared operators."
Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, 2016.
25. Gulisano, Vincenzo, et al. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." Big Data (Big Data), 2015
IEEE International Conference on. IEEE, 2015.
26. Ottenwälder, Beate, et al. "MigCEP: operator migration for mobility driven distributed complex event processing." Proceedings
of the 7th ACM international conference on Distributed event-based systems. ACM, 2013.
27. De Matteis, Tiziano, and Gabriele Mencagli. "Keep calm and react with foresight: strategies for low-latency and energy-efficient
elastic data stream processing." Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel
Programming. ACM, 2016.
28. Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." ACM Transactions on
Database Systems (TODS) 33.1 (2008): 3.
29. Castro Fernandez, Raul, et al. "Integrating scale out and fault tolerance in stream processing using operator state
management." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013.
The data streaming paradigm and its use in Fog architectures 51Vincenzo Gulisano
Bibliography
30. Dwork, Cynthia. "Differential privacy: A survey of results." International Conference on Theory and Applications of
Models of Computation. Springer Berlin Heidelberg, 2008.
31. Dwork, Cynthia, et al. "Differential privacy under continual observation." Proceedings of the forty-second ACM
symposium on Theory of computing. ACM, 2010.
32. Kargl, Frank, Arik Friedman, and Roksana Boreli. "Differential privacy in intelligent transportation systems." Proceedings
of the sixth ACM conference on Security and privacy in wireless and mobile networks. ACM, 2013.
The data streaming paradigm and its use in Fog architectures 52Vincenzo Gulisano
Thank you! Questions?
53The data streaming paradigm and its use in Fog architectures

More Related Content

What's hot

IoT ( M2M) - Big Data - Analytics: Emulation and Demonstration
IoT ( M2M) - Big Data - Analytics: Emulation and DemonstrationIoT ( M2M) - Big Data - Analytics: Emulation and Demonstration
IoT ( M2M) - Big Data - Analytics: Emulation and DemonstrationCHAKER ALLAOUI
 
MongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big DataMongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big DataStefano Dindo
 
FIWARE Global Summit - Big Data and Machine Learning with FIWARE
FIWARE Global Summit - Big Data and Machine Learning with FIWAREFIWARE Global Summit - Big Data and Machine Learning with FIWARE
FIWARE Global Summit - Big Data and Machine Learning with FIWAREFIWARE
 
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...
Stephen Cantrell, kdb+ Developer at Kx Systems  “Kdb+: How Wall Street Tech c...Stephen Cantrell, kdb+ Developer at Kx Systems  “Kdb+: How Wall Street Tech c...
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...Dataconomy Media
 
Introduction to edge analytics- Intelligent IoT
Introduction to edge analytics- Intelligent IoTIntroduction to edge analytics- Intelligent IoT
Introduction to edge analytics- Intelligent IoTShreya Mukhopadhyay
 
The future of big data analytics
The future of big data analyticsThe future of big data analytics
The future of big data analyticsAhmed Banafa
 
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...Hong-Linh Truong
 
Building the Smart City Platform on FIWARE Lab
Building the Smart City Platform on FIWARE LabBuilding the Smart City Platform on FIWARE Lab
Building the Smart City Platform on FIWARE LabFernando Lopez Aguilar
 
Momentum in Big Data, IoT and Machine Intelligence
Momentum in Big Data, IoT and Machine IntelligenceMomentum in Big Data, IoT and Machine Intelligence
Momentum in Big Data, IoT and Machine IntelligenceShamshad Ansari
 
Apache Spark and future of advanced analytics
Apache Spark and future of advanced analyticsApache Spark and future of advanced analytics
Apache Spark and future of advanced analyticsMuralidhar Somisetty
 
Making Smarter Systems with IoT and Analytics
Making Smarter Systems with IoT and AnalyticsMaking Smarter Systems with IoT and Analytics
Making Smarter Systems with IoT and AnalyticsWSO2
 
Green Compute and Storage - Why does it Matter and What is in Scope
Green Compute and Storage - Why does it Matter and What is in ScopeGreen Compute and Storage - Why does it Matter and What is in Scope
Green Compute and Storage - Why does it Matter and What is in ScopeNarayanan Subramaniam
 
Xanadu Based Big Data Deep Learning for Medical Data Analysis
Xanadu Based Big Data Deep Learning for Medical Data AnalysisXanadu Based Big Data Deep Learning for Medical Data Analysis
Xanadu Based Big Data Deep Learning for Medical Data AnalysisAlex G. Lee, Ph.D. Esq. CLP
 
Xanadu for Big Data + Deep Learning + Cloud + IoT Integration Strategy
Xanadu for Big Data + Deep Learning + Cloud + IoT Integration StrategyXanadu for Big Data + Deep Learning + Cloud + IoT Integration Strategy
Xanadu for Big Data + Deep Learning + Cloud + IoT Integration StrategyAlex G. Lee, Ph.D. Esq. CLP
 
Edge Intelligence: The Convergence of Humans, Things and AI
Edge Intelligence: The Convergence of Humans, Things and AIEdge Intelligence: The Convergence of Humans, Things and AI
Edge Intelligence: The Convergence of Humans, Things and AIThomas Rausch
 
Big Data - Umesh Bellur
Big Data - Umesh BellurBig Data - Umesh Bellur
Big Data - Umesh BellurSTS FORUM 2016
 
Solving the weak spots of serverless with directed acyclic graph model
Solving the weak spots of serverless with directed acyclic graph modelSolving the weak spots of serverless with directed acyclic graph model
Solving the weak spots of serverless with directed acyclic graph modelVeselin Pizurica
 
Industry of Things World - Berlin 19-09-16
Industry of Things World - Berlin 19-09-16Industry of Things World - Berlin 19-09-16
Industry of Things World - Berlin 19-09-16Boris Adryan
 

What's hot (20)

IoT ( M2M) - Big Data - Analytics: Emulation and Demonstration
IoT ( M2M) - Big Data - Analytics: Emulation and DemonstrationIoT ( M2M) - Big Data - Analytics: Emulation and Demonstration
IoT ( M2M) - Big Data - Analytics: Emulation and Demonstration
 
MongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big DataMongoDB Solution for Internet of Things and Big Data
MongoDB Solution for Internet of Things and Big Data
 
FIWARE Global Summit - Big Data and Machine Learning with FIWARE
FIWARE Global Summit - Big Data and Machine Learning with FIWAREFIWARE Global Summit - Big Data and Machine Learning with FIWARE
FIWARE Global Summit - Big Data and Machine Learning with FIWARE
 
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...
Stephen Cantrell, kdb+ Developer at Kx Systems  “Kdb+: How Wall Street Tech c...Stephen Cantrell, kdb+ Developer at Kx Systems  “Kdb+: How Wall Street Tech c...
Stephen Cantrell, kdb+ Developer at Kx Systems “Kdb+: How Wall Street Tech c...
 
Introduction to edge analytics- Intelligent IoT
Introduction to edge analytics- Intelligent IoTIntroduction to edge analytics- Intelligent IoT
Introduction to edge analytics- Intelligent IoT
 
The future of big data analytics
The future of big data analyticsThe future of big data analytics
The future of big data analytics
 
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...
Emerging Dynamic TUW-ASE Summer 2015 - Distributed Systems and Challenges for...
 
Building the Smart City Platform on FIWARE Lab
Building the Smart City Platform on FIWARE LabBuilding the Smart City Platform on FIWARE Lab
Building the Smart City Platform on FIWARE Lab
 
Momentum in Big Data, IoT and Machine Intelligence
Momentum in Big Data, IoT and Machine IntelligenceMomentum in Big Data, IoT and Machine Intelligence
Momentum in Big Data, IoT and Machine Intelligence
 
Apache Spark and future of advanced analytics
Apache Spark and future of advanced analyticsApache Spark and future of advanced analytics
Apache Spark and future of advanced analytics
 
Making Smarter Systems with IoT and Analytics
Making Smarter Systems with IoT and AnalyticsMaking Smarter Systems with IoT and Analytics
Making Smarter Systems with IoT and Analytics
 
Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?Big Data and Fast Data combined – is it possible?
Big Data and Fast Data combined – is it possible?
 
Green Compute and Storage - Why does it Matter and What is in Scope
Green Compute and Storage - Why does it Matter and What is in ScopeGreen Compute and Storage - Why does it Matter and What is in Scope
Green Compute and Storage - Why does it Matter and What is in Scope
 
Xanadu Based Big Data Deep Learning for Medical Data Analysis
Xanadu Based Big Data Deep Learning for Medical Data AnalysisXanadu Based Big Data Deep Learning for Medical Data Analysis
Xanadu Based Big Data Deep Learning for Medical Data Analysis
 
Xanadu for Big Data + Deep Learning + Cloud + IoT Integration Strategy
Xanadu for Big Data + Deep Learning + Cloud + IoT Integration StrategyXanadu for Big Data + Deep Learning + Cloud + IoT Integration Strategy
Xanadu for Big Data + Deep Learning + Cloud + IoT Integration Strategy
 
Edge Intelligence: The Convergence of Humans, Things and AI
Edge Intelligence: The Convergence of Humans, Things and AIEdge Intelligence: The Convergence of Humans, Things and AI
Edge Intelligence: The Convergence of Humans, Things and AI
 
Data Activities in Austria
Data Activities in AustriaData Activities in Austria
Data Activities in Austria
 
Big Data - Umesh Bellur
Big Data - Umesh BellurBig Data - Umesh Bellur
Big Data - Umesh Bellur
 
Solving the weak spots of serverless with directed acyclic graph model
Solving the weak spots of serverless with directed acyclic graph modelSolving the weak spots of serverless with directed acyclic graph model
Solving the weak spots of serverless with directed acyclic graph model
 
Industry of Things World - Berlin 19-09-16
Industry of Things World - Berlin 19-09-16Industry of Things World - Berlin 19-09-16
Industry of Things World - Berlin 19-09-16
 

Similar to Data Streaming in IoT and Big Data Analytics

Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisVincenzo Gulisano
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsVincenzo Gulisano
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesVincenzo Gulisano
 
User-Driven Cloud Transportation System for Smart Driving
User-Driven Cloud Transportation System for Smart DrivingUser-Driven Cloud Transportation System for Smart Driving
User-Driven Cloud Transportation System for Smart Drivingamg93
 
From Cloud to Fog: the Tao of IT Infrastructure Decentralization
From Cloud to Fog: the Tao of IT Infrastructure DecentralizationFrom Cloud to Fog: the Tao of IT Infrastructure Decentralization
From Cloud to Fog: the Tao of IT Infrastructure DecentralizationFogGuru MSCA Project
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataStavros Kontopoulos
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thessaloniki
 
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...IsabellaFilippo
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the artStavros Kontopoulos
 
QCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theoryQCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theoryAysylu Greenberg
 
Review of implementing fog computing
Review of implementing fog computingReview of implementing fog computing
Review of implementing fog computingeSAT Journals
 
Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017Boris Adryan
 
slides_itc30_2018_Morichetta_v2.pdf
slides_itc30_2018_Morichetta_v2.pdfslides_itc30_2018_Morichetta_v2.pdf
slides_itc30_2018_Morichetta_v2.pdfAndrea Morichetta
 
Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba...
Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba...Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba...
Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba...IRJET Journal
 
The data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architecturesThe data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
 
Distributed systems in practice, in theory (JAX London)
Distributed systems in practice, in theory (JAX London)Distributed systems in practice, in theory (JAX London)
Distributed systems in practice, in theory (JAX London)Aysylu Greenberg
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Vincenzo Gulisano
 
Study of Internet Traffic to Analyze and Predict Traffic
Study of Internet Traffic to Analyze and Predict TrafficStudy of Internet Traffic to Analyze and Predict Traffic
Study of Internet Traffic to Analyze and Predict TrafficAmit Arora
 
IRJET- Traffic Prediction Techniques: Comprehensive analysis
IRJET- Traffic Prediction Techniques: Comprehensive analysisIRJET- Traffic Prediction Techniques: Comprehensive analysis
IRJET- Traffic Prediction Techniques: Comprehensive analysisIRJET Journal
 

Similar to Data Streaming in IoT and Big Data Analytics (20)

Data Streaming in Big Data Analysis
Data Streaming in Big Data AnalysisData Streaming in Big Data Analysis
Data Streaming in Big Data Analysis
 
Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architectures
 
User-Driven Cloud Transportation System for Smart Driving
User-Driven Cloud Transportation System for Smart DrivingUser-Driven Cloud Transportation System for Smart Driving
User-Driven Cloud Transportation System for Smart Driving
 
From Cloud to Fog: the Tao of IT Infrastructure Decentralization
From Cloud to Fog: the Tao of IT Infrastructure DecentralizationFrom Cloud to Fog: the Tao of IT Infrastructure Decentralization
From Cloud to Fog: the Tao of IT Infrastructure Decentralization
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
 
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...
Extended summary of "Cloudy with a chance of short RTTs Analyzing Cloud Conne...
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the art
 
QCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theoryQCon NYC: Distributed systems in practice, in theory
QCon NYC: Distributed systems in practice, in theory
 
Review of implementing fog computing
Review of implementing fog computingReview of implementing fog computing
Review of implementing fog computing
 
Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017Zühlke Meetup - Mai 2017
Zühlke Meetup - Mai 2017
 
slides_itc30_2018_Morichetta_v2.pdf
slides_itc30_2018_Morichetta_v2.pdfslides_itc30_2018_Morichetta_v2.pdf
slides_itc30_2018_Morichetta_v2.pdf
 
Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba...
Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba...Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba...
Smart Traffic Congestion Control System: Leveraging Machine Learning for Urba...
 
The data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architecturesThe data streaming paradigm and its use in Fog architectures
The data streaming paradigm and its use in Fog architectures
 
Distributed systems in practice, in theory (JAX London)
Distributed systems in practice, in theory (JAX London)Distributed systems in practice, in theory (JAX London)
Distributed systems in practice, in theory (JAX London)
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
 
Study of Internet Traffic to Analyze and Predict Traffic
Study of Internet Traffic to Analyze and Predict TrafficStudy of Internet Traffic to Analyze and Predict Traffic
Study of Internet Traffic to Analyze and Predict Traffic
 
SSG4Env EGU2010
SSG4Env EGU2010SSG4Env EGU2010
SSG4Env EGU2010
 
IRJET- Traffic Prediction Techniques: Comprehensive analysis
IRJET- Traffic Prediction Techniques: Comprehensive analysisIRJET- Traffic Prediction Techniques: Comprehensive analysis
IRJET- Traffic Prediction Techniques: Comprehensive analysis
 

Recently uploaded

9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 

Recently uploaded (20)

9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 

Data Streaming in IoT and Big Data Analytics

  • 1. Data Streaming in IoT and Big Data Analytics Vincenzo Gulisano, Ph.D.
  • 2. Agenda • Motivation • The data streaming processing paradigm • Challenges and research questions • Conclusions • Bibliography Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 3
  • 3. Agenda • Motivation • The data streaming processing paradigm • Challenges and research questions • Conclusions • Bibliography Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 4
  • 4. Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 5 IoT enables for increased awareness, security, power-efficiency, ... large IoT systems are complex traditional data analysis techniques alone are not adequate!
  • 5. AMIs [1,2,3,4] VNs [5,6] • demand-response • scheduling [7] • micro-grids • detection of medium size blackouts [8] • detection of non technical losses • ... 6 • autonomous driving • platooning • accident detection [9] • variable tolls [9] • congestion monitoring [10] • ... IoT enables for increased awareness, security, power-efficiency, ... The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 6. AMIs VNs 7 large IoT systems are complex The data streaming paradigm and its use in Fog architectures Characteristics [15]: 1. edge location 2. location awareness 3. low latency 4. geographical distribution 5. large-scale 6. support for mobility 7. real-time interactions 8. predominance of wireless 9. heterogeneous 10. interoperability / federation 11. interaction with the cloud Vincenzo Gulisano
  • 7. 8 traditional data analysis techniques alone are not adequate! [13,14] The data streaming paradigm and its use in Fog architectures 1. does the infrastructure allow for billions of readings per day to be transferred continuously? 2. the latency incurred while transferring data, does that undermine the utility of the analysis? 3. is it secure to concentrate all the data in a single place? [11] 4. is it smart to give away fine-grained data? [12] Vincenzo Gulisano
  • 8. Agenda • Motivation • The data streaming processing paradigm • Challenges and research questions • Conclusions • Bibliography Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 9
  • 9. Main Memory Motivation DBMS vs. DSMS Disk 1 Data Query Processing 3 Query results 2 Query Main Memory Query Processing Continuous Query Data Query results 10The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 10. Before we start... about data streaming and Stream Processing Engines (SPEs) 11 An incomplete list of SPEs (cf. related work in [16]): time Borealis The Aurora Project STanfordstREamdatAManager NiagaraCQ COUGAR StreamCloud Covering all of them / discussing which use cases are best for each one out of scope... the following show connection between what is being presented and a certain SPE The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 11. data stream: unbounded sequence of tuples sharing the same schema 12 Example: vehicles’ speed reports time Field Field vehicle id text time (secs) text speed (Km/h) double X coordinate double Y coordinate double A 8:00 55.5 X1 Y1 Let’s assume each source (e.g., vehicle) produces and delivers a timestamp sorted stream A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2 The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 12. continuous query (or simply query): Directed Acyclic Graph (DAG) of streams and operators 13 OP OP OP OP OP OP OP source op (1+ out streams) sink op (1+ in streams) stream op (1+ in, 1+ out streams) The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 13. data streaming operators Two main types: • Stateless operators • do not maintain any state • one-by-one processing • if they maintain some state, such state does not evolve depending on the tuples being processed • Stateful operators • maintain a state that evolves depending on the tuples being processed • produce output tuples that depend on multiple input tuples 14 OP OP The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 14. stateless operators 15 Filter ... Map Union ... Filter / route tuples based on one (or more) conditions Transform each tuple Merge multiple streams (with the same schema) into one The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 18. stateful operators 19 Aggregate information from multiple tuples (e.g., max, min, sum, ...) Join tuples coming from 2 streams given a certain predicate Aggregate Join The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 19. stateful operators 20 source: http://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html source: http://spark.apache.org/docs/latest/streaming-programming- guide.html#transformations-on-dstreams source: http://spark.apache.org/docs/latest/streaming-programming- guide.html#transformations-on-dstreams The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 20. Wait a moment! if streams are unbounded, how can we aggregate or join? 21The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 21. windows and stateful analysis [16] Stateful operations are done over windows: • Time-based (e.g., tuples in the last 10 minutes) • Tuple-based (e.g., given the last 50 tuples) 22 time [8:00,9:00) [8:20,9:20) [8:40,9:40) Usually applications rely on time-based sliding windows The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 22. time-based sliding window aggregation (count) 23 Counter: 4 time [8:00,9:00) 8:05 8:15 8:22 8:45 9:05 Output: 4 Counter: 1 Counter: 2 Counter: 3 Counter: 3 time 8:05 8:15 8:22 8:45 9:05 [8:20,9:20) we assumed each source produces and delivers a timestamp sorted stream! What happens if this is not the case? The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 23. 24 windows and stateful analysis The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 24. sample query For each vehicle, raise an alert if the speed of the latest report is more than 2 times higher than its average speed in the last 30 days. 26 time A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2 The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 25. 27 Field vehicle id time (secs) speed (Km/h) X coordinate Y coordinate Compute average speed for each vehicle during the last 30 days Aggregate Field vehicle id time (secs) avg speed (Km/h) Join Check condition Filter Field vehicle id time (secs) speed (Km/h) Join on vehicle id Field vehicle id time (secs) avg speed (Km/h) speed (Km/h) sample query The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 26. Why data streaming, then? Expressive Online Parallel & Distributed Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 29
  • 27. Expressive Online Parallel & Distributed A F M A F M A F M A F M F M A A A F F Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 30 Sample query that e.g. validates data / raises alarms...
  • 28. Agenda • Motivation • The data streaming processing paradigm • Challenges and research questions • Conclusions • Bibliography Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 31
  • 29. 1. Distributed deployment 2. Parallel deployment 3. Ordering and determinism 4. Shared-nothing vs shared-memory parallelism 5. Load balancing 6. Elasticity 7. Fault tolerance 8. Data sharing 32 Challenges and research questions The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 30. Before we start... 33 Following examples are from vehicular networks Road-side unit RSU Vehicle Server The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 31. 1 - Distributed deployment – where to place a given operator? [17,4,18,19] 34 M ? The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 32. 2 - Parallel deployment – how do we parallelize the analysis? [20,21] 35 M M A A The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 33. 36 M M A A A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2 A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3 A 8:03 70.3 X2 Y2 What if tuple with timestamp 8:00 arrives after tuple with timestamp 8:07? 3 – Ordering and determinism [22,23,24] The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 34. 4 – shared-nothing vs. shared-memory parallelism [25] 37 M M ... A A J ... How to take advantage of multi-core architectures? How to boost inter-node parallelism and intra-node parallelism? The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 35. 5 – load balancing & state transfer [20,26] 38 If we shift the processing of a certain subset of tuples from node A to node B, how do transfer its previous state? M A M A The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 36. 6 – elasticity [20,27] 39 How / when to provision or decommission new resources depending on the analysis’ costs fluctuations? J The data streaming paradigm and its use in Fog architectures J J J Vincenzo Gulisano
  • 37. 7 – fault tolerance [16, 28, 29] 40 How to replace a failed node minimizing recovery time (making it transparent to the end user)? The data streaming paradigm and its use in Fog architectures J J J J J J Vincenzo Gulisano
  • 38. 8 – data sharing (differential privacy) [2,30,31,32] 41 How to prevent privacy leaks? Suppose we are interested in publishing vehicles’ average speed over a window of one hour... We could aggregate by RSU! The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 39. 8 – data sharing (differential privacy) 42 Wait a moment! what if a single vehicle is connected to a certain RSU? Whether a certain mechanism preserves or not the privacy of the underlying data depends on the knowledge of the adversary Differential privacy assumes the worst case scenario! The data streaming paradigm and its use in Fog architecturesVincenzo Gulisano
  • 40. Agenda • Motivation • The data streaming processing paradigm • Challenges and research questions • Conclusions • Bibliography Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 43
  • 41. Computers and humans evolution are not that different... Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 44
  • 42. Millions of years of evolution Millions of sensors • Store information • Iterate multiple times over data • Think, do not rush through decisions • ”Hard-wired” routines • Real-time decisions • High-throughput / low-latency Should I (really) have an extra piece of cake? Danger!!! Run!!! Humans Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 45
  • 43. Years / Decades of evolution Millions of sensors What traffic congestion patterns can I observe frequently? Don’t take over, car in opposite lane! • Store information • Iterate multiple times over data • Think, do not rush through decisions Databases, data mining techniques... Data streaming, distributed and parallel analysis • Continuous analysis • Real-time decisions • High-throughput / low-latency Computers (cyber-physical / IoT systems) Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 46
  • 44. Agenda • Motivation • The data streaming processing paradigm • Challenges and research questions • Conclusions • Bibliography Vincenzo Gulisano The data streaming paradigm and its use in Fog architectures 47
  • 45. Bibliography 1. Zhou, Jiazhen, Rose Qingyang Hu, and Yi Qian. "Scalable distributed communication architectures to support advanced metering infrastructure in smart grid." IEEE Transactions on Parallel and Distributed Systems 23.9 (2012): 1632-1642. 2. Gulisano, Vincenzo, et al. "BES: Differentially Private and Distributed Event Aggregation in Advanced Metering Infrastructures." Proceedings of the 2nd ACM International Workshop on Cyber-Physical System Security. ACM, 2016. 3. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "Online and scalable data validation in advanced metering infrastructures." IEEE PES Innovative Smart Grid Technologies, Europe. IEEE, 2014. 4. Gulisano, Vincenzo, Magnus Almgren, and Marina Papatriantafilou. "METIS: a two-tier intrusion detection system for advanced metering infrastructures." International Conference on Security and Privacy in Communication Systems. Springer International Publishing, 2014. 5. Yousefi, Saleh, Mahmoud Siadat Mousavi, and Mahmood Fathy. "Vehicular ad hoc networks (VANETs): challenges and perspectives." 2006 6th International Conference on ITS Telecommunications. IEEE, 2006. 6. El Zarki, Magda, et al. "Security issues in a future vehicular network." European Wireless. Vol. 2. 2002. 7. Georgiadis, Giorgos, and Marina Papatriantafilou. "Dealing with storage without forecasts in smart grids: Problem transformation and online scheduling algorithm." Proceedings of the 29th Annual ACM Symposium on Applied Computing. ACM, 2014. 8. Fu, Zhang, et al. "Online temporal-spatial analysis for detection of critical events in Cyber-Physical Systems." Big Data (Big Data), 2014 IEEE International Conference on. IEEE, 2014. The data streaming paradigm and its use in Fog architectures 48Vincenzo Gulisano
  • 46. Bibliography 9. Arasu, Arvind, et al. "Linear road: a stream data management benchmark." Proceedings of the Thirtieth international conference on Very large data bases-Volume 30. VLDB Endowment, 2004. 10. Lv, Yisheng, et al. "Traffic flow prediction with big data: a deep learning approach." IEEE Transactions on Intelligent Transportation Systems 16.2 (2015): 865-873. 11. Grochocki, David, et al. "AMI threats, intrusion detection requirements and deployment recommendations." Smart Grid Communications (SmartGridComm), 2012 IEEE Third International Conference on. IEEE, 2012. 12. Molina-Markham, Andrés, et al. "Private memoirs of a smart meter." Proceedings of the 2nd ACM workshop on embedded sensing systems for energy-efficiency in building. ACM, 2010. 13. Gulisano, Vincenzo, et al. "Streamcloud: A large scale data streaming system." Distributed Computing Systems (ICDCS), 2010 IEEE 30th International Conference on. IEEE, 2010. 14. Stonebraker, Michael, Uǧur Çetintemel, and Stan Zdonik. "The 8 requirements of real-time stream processing." ACM SIGMOD Record 34.4 (2005): 42-47. 15. Bonomi, Flavio, et al. "Fog computing and its role in the internet of things." Proceedings of the first edition of the MCC workshop on Mobile cloud computing. ACM, 2012. The data streaming paradigm and its use in Fog architectures 49Vincenzo Gulisano
  • 47. Bibliography 16. Gulisano, Vincenzo Massimiliano. StreamCloud: An Elastic Parallel-Distributed Stream Processing Engine. Diss. Informatica, 2012. 17. Cardellini, Valeria, et al. "Optimal operator placement for distributed stream processing applications." Proceedings of the 10th ACM International Conference on Distributed and Event- based Systems. ACM, 2016. 18. Costache, Stefania, et al. "Understanding the Data-Processing Challenges in Intelligent Vehicular Systems." Proceedings of the 2016 IEEE Intelligent Vehicles Symposium (IV16). 19. Giatrakos, Nikos, Antonios Deligiannakis, and Minos Garofalakis. "Scalable Approximate Query Tracking over Highly Distributed Data Streams." Proceedings of the 2016 International Conference on Management of Data. ACM, 2016. 20. Gulisano, Vincenzo, et al. "Streamcloud: An elastic and scalable data streaming system." IEEE Transactions on Parallel and Distributed Systems 23.12 (2012): 2351-2365. 21. Shah, Mehul A., et al. "Flux: An adaptive partitioning operator for continuous query systems." Data Engineering, 2003. Proceedings. 19th International Conference on. IEEE, 2003. The data streaming paradigm and its use in Fog architectures 50Vincenzo Gulisano
  • 48. Bibliography 22. Cederman, Daniel, et al. "Brief announcement: concurrent data structures for efficient streaming aggregation." Proceedings of the 26th ACM symposium on Parallelism in algorithms and architectures. ACM, 2014. 23. Ji, Yuanzhen, et al. "Quality-driven processing of sliding window aggregates over out-of-order data streams." Proceedings of the 9th ACM International Conference on Distributed Event-Based Systems. ACM, 2015. 24. Ji, Yuanzhen, et al. "Quality-driven disorder handling for concurrent windowed stream queries with shared operators." Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. ACM, 2016. 25. Gulisano, Vincenzo, et al. "Scalejoin: A deterministic, disjoint-parallel and skew-resilient stream join." Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 2015. 26. Ottenwälder, Beate, et al. "MigCEP: operator migration for mobility driven distributed complex event processing." Proceedings of the 7th ACM international conference on Distributed event-based systems. ACM, 2013. 27. De Matteis, Tiziano, and Gabriele Mencagli. "Keep calm and react with foresight: strategies for low-latency and energy-efficient elastic data stream processing." Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM, 2016. 28. Balazinska, Magdalena, et al. "Fault-tolerance in the Borealis distributed stream processing system." ACM Transactions on Database Systems (TODS) 33.1 (2008): 3. 29. Castro Fernandez, Raul, et al. "Integrating scale out and fault tolerance in stream processing using operator state management." Proceedings of the 2013 ACM SIGMOD international conference on Management of data. ACM, 2013. The data streaming paradigm and its use in Fog architectures 51Vincenzo Gulisano
  • 49. Bibliography 30. Dwork, Cynthia. "Differential privacy: A survey of results." International Conference on Theory and Applications of Models of Computation. Springer Berlin Heidelberg, 2008. 31. Dwork, Cynthia, et al. "Differential privacy under continual observation." Proceedings of the forty-second ACM symposium on Theory of computing. ACM, 2010. 32. Kargl, Frank, Arik Friedman, and Roksana Boreli. "Differential privacy in intelligent transportation systems." Proceedings of the sixth ACM conference on Security and privacy in wireless and mobile networks. ACM, 2013. The data streaming paradigm and its use in Fog architectures 52Vincenzo Gulisano
  • 50. Thank you! Questions? 53The data streaming paradigm and its use in Fog architectures

Editor's Notes

  1. A like to be hands-on...
  2. say these are some examples...
  3. say these are some examples...
  4. say these are some examples...
  5. About the ordering we will come back later
  6. Need to explain a bit what is the approach I am following here
  7. here mention latency encryption for security
  8. Present the deployment first Say what a parallel operator is!
  9. Say we focus on the join here, there are other operators too
  10. Mention elasticity goes together with load balancing
  11. Specify this is really an high level overview
  12. Here we could answer to the question: what if 1 is connected with ”we do not publish the result” and then ask again, what if 2 are connected (and I am one of the two, driving on purpose so that I get the average of the other)?
  13. maybe say the DB part is a bit of an oversimplifaction