StreamSight - Query-Driven Descriptive Analytics for IoT and Edge Computing
Moysis Symeonides*, Demetris Trihinas✝, Zacharias Georgiou*,
George Pallis*, Marios D. Dikaiakos*
IEEE International Conference on Cloud Engineering (IC2E 2019)
*Department of Computer Science
University of Cyprus
✝Department of Computer Science
University of Nicosia
2. D. Trihinas
trihinas@cs.ucy.ac.cy
Laboratory for
Internet Computing
StreamSight - IC2E 2019
Distributed Data Processing Engines
● Frameworks like Hadoop and Spark are contributing to the democratization
of big data analytics by hiding the complexity related to:
○ Machine communication and resource management -> dealing with the
underlying infrastructure.
○ Task scheduling and supervision for analytic jobs.
○ Fault tolerance for both the infrastructure and execution state.
○ Monitoring and logging.
○ ...
The Internet of Things
● Transforming the physical world into an information system.
● 3.6 billion IoT devices are in use daily [1], and these devices are projected to generate 500 ZB of data [2] by the end of 2019.
● It only seems “natural” that IoT services offload analytic jobs to the cloud for data processing.
● But… IoT services usually come with near real-time requirements, and moving data “centrally” for processing penalizes the timeliness of analytics.
[Figure: IoT services; Analytic Insights]
[1] Next big things in IoT predictions for 2020, ITPro, 2018
[2] Global Cloud Index, Cisco, 2018
Edge Computing… Saving IoT Analytics
[Figure: IoT services; the “Edge”; Cloud; Analytic Insights]
● Data processing is now possible in place, or within the local network.
○ Shorter response times for latency-critical IoT services.
○ More efficient processing by offloading “centralized” components.
● Possible because hardware for mobile/fog/edge is scaling up [1].
● But… bandwidth and battery capacity are NOT scaling at the same rate [2].
[1] EdgeIoT: Mobile Edge Computing for the Internet of Things, X. Sun et al., IEEE Communications, 2016.
[2] Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of Things, D. Trihinas et al., IEEE Trans. on Services Computing, 2018.
IoT Analytics Over the Edge
[Figure: IoT services; the “Edge”; Cloud; Analytic Insights]
How to process enormous volumes of streaming data at
the edge to provide query-driven analytic insights while
also minimizing response times?
Query-Driven Analytics
Abstractions required for modelling knowledge extraction from data streams
Challenge 1: Expressing (ad-hoc) analytic queries
● One must have specific knowledge of the programming model of the
underlying processing engine.
Example: compute the average of a metric using a 60s sliding window.
● Queries are bound to the underlying processing engine, which limits query portability.
Query-Driven Analytics
Challenge 2: Geo-distributed deployments are the norm for IoT services, not the exception
● A naive “edge” deployment can impose compute and communication penalties for intermediate recomputations and data exchange.
[Figure: naive deployment vs. re-using intermediate results; legend: data exchange and computation, result exchange]
● Network bandwidth between geo-distributed entities is far from uniform.
Mechanisms to avoid data movement and recomputations are needed.
Pixida: Optimizing data parallel jobs in wide-area data analytics, K. Kloudas, VLDB, 2015.
Outline of Today’s Talk
● IoT analytics over geo-distributed topologies.
● Abstract query model for query-driven IoT analytics.
● The StreamSight Framework
○ Query plan compilation.
○ Edge computing improvements.
○ Experimentation.
● Future research directions and open research questions.
Abstract Query Model
● Queries are applied on metric streams with the
intent to derive insights.
● Insights can be reused, transformed, and composed with other metric streams to create new insights.
Metric record (one element of a metric stream):
<bus_id, bus99>,
<bus_delay, 5>,
<bus_region, NW>,
...
Abstract Query Model
Insight = COMPUTE <Expression> EVERY <Interval> [WITH <Optimizations>]
COMPUTE
➢ The composition, transformation and aggregation of multiple metric streams (e.g., expression, composite, aggregate).
EVERY
➢ Denotes the interval at which the expression is evaluated; it can be time-based (e.g., every 1min) or tuple-based (e.g., every 1000 records).
WITH
➢ Optional statement for capturing user-defined optimizations and constraints for data streams and edge topologies.
[Figure: metric streams feed an Expression that is evaluated EVERY interval to produce an Insight Stream]
Smart City Bus Network
[Figure: edge server]
● Buses are equipped with GPS tracking devices that emit updates to the local edge server of the region each bus is currently navigating through.
● Bus updates include: bus id, location coordinates, operating city region, an estimation of the current bus route delay, etc.
● Inspired by the Dublin smart city bus network.
Insight Operations
1. Window Operations: Aggregation of values within a time period
COMPUTE
ARITHMETIC_MEAN(bus_delay, 10 MINUTES)
EVERY 5 SECONDS
(ARITHMETIC_MEAN: aggregate; bus_delay: raw metric stream; 10 MINUTES: time period; EVERY 5 SECONDS: time interval. Apache Spark: 14 ops.)
COMPUTE
ARITHMETIC_MEAN(bus_delay, 10 MINUTES)
BY city_segment EVERY 5 SECONDS
(BY city_segment: group by a metric key. Apache Spark: 15 ops.)
Examples of aggregates: sum, count, sdev, median, percentile, etc.
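To make the semantics of a grouped window operation concrete, here is a minimal Python sketch of what one evaluation of ARITHMETIC_MEAN(bus_delay, 10 MINUTES) BY city_segment could compute. It is an illustration only, not StreamSight's implementation: the function name `windowed_mean_by`, the `ts` timestamp field (seconds), and the half-open window convention are all assumptions.

```python
from collections import defaultdict


def windowed_mean_by(records, now, window_s, metric, key):
    """Mean of `metric` per `key` over records with now - window_s < ts <= now.

    `records` is a list of dicts; the "ts" field (seconds) is a hypothetical
    timestamp attribute that the slides do not name.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in records:
        if now - window_s < rec["ts"] <= now:
            sums[rec[key]] += rec[metric]
            counts[rec[key]] += 1
    return {group: sums[group] / counts[group] for group in sums}


records = [
    {"ts": 0, "city_segment": "NW", "bus_delay": 4.0},    # outside the window
    {"ts": 350, "city_segment": "NW", "bus_delay": 6.0},
    {"ts": 400, "city_segment": "SE", "bus_delay": 2.0},
    {"ts": 900, "city_segment": "NW", "bus_delay": 10.0},
]

# One evaluation at t = 900 s over a 10-minute (600 s) window.
result = windowed_mean_by(records, now=900, window_s=600,
                          metric="bus_delay", key="city_segment")
```

An EVERY 5 SECONDS clause would correspond to calling this function again with `now` advanced by five seconds.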
Insight Operations
2. Temporal Compositions: Compositions with different time windows
COMPUTE (
ARITHMETIC_MEAN(bus_delay, 10 MINUTES)
/
ARITHMETIC_MEAN(bus_delay, 60 MINUTES)
) EVERY 5 SECONDS
(Apache Spark: 32 ops.)
3. Accumulated Compositions: Updates on previously computed data
COMPUTE EWMA[0.85](passengers) BY bus_stop EVERY 1 TUPLE
(Apache Spark: 24 ops.)
Examples: running_mean, running_max, running_sdev, etc.
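An accumulated composition updates a previously computed value instead of rescanning a window. The sketch below shows one common form of the exponentially weighted moving average, kept per key and updated on every tuple. The class and function names are hypothetical, and the weighting convention (the bracketed parameter is the weight of the newest sample) is an assumption, since the slides do not define it.

```python
class Ewma:
    """Exponentially weighted moving average updated one sample at a time."""

    def __init__(self, alpha):
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        if self.value is None:
            # The first sample initializes the average.
            self.value = float(sample)
        else:
            # Assumed convention: alpha weights the newest sample.
            self.value = self.alpha * sample + (1 - self.alpha) * self.value
        return self.value


def ewma_by_key(tuples, alpha):
    """Emit (key, ewma) after every incoming (key, sample) tuple."""
    state = {}
    emitted = []
    for key, sample in tuples:
        if key not in state:
            state[key] = Ewma(alpha)
        emitted.append((key, state[key].update(sample)))
    return emitted


updates = ewma_by_key([("stop_A", 10), ("stop_B", 4), ("stop_A", 20)], alpha=0.5)
```

Only one number per key is kept as state, which is why such operators suit memory-constrained edge nodes.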
Insight Operations
4. Hybrid Compositions: Combining window and accumulated operations
COMPUTE (
ARITHMETIC_MEAN( bus_delay, 10 MINUTES)
-
EWMA[0.65]( bus_delay)
) BY city_segment EVERY 5 SECONDS
(ARITHMETIC_MEAN: window operation; EWMA: accumulated operation. Apache Spark: 34 ops.)
5. Filtered Compositions: Filter input and output streams
COMPUTE bus_delay
WHEN > ( RUNNING_MEAN(bus_delay) + 3 * RUNNING_SDEV(bus_delay) )
BY city_segment EVERY 5 SECONDS;
(WHEN clause: filter predicate. Apache Spark: 41 ops.)
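The filtered composition keeps only values that exceed the running mean by three running standard deviations. A minimal sketch of that predicate follows, using Welford's online algorithm for the running statistics. Several details are assumptions not stated in the slides: the population (not sample) standard deviation, comparing each value against the statistics of the values seen before it, and the `warmup` parameter. Grouping by city_segment is omitted for brevity.

```python
import math


class RunningStats:
    """Running mean and standard deviation (Welford's online algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def sdev(self):
        # Population standard deviation (an assumption).
        return math.sqrt(self._m2 / self.n) if self.n else 0.0


def filter_outliers(values, k=3.0, warmup=2):
    """Yield values greater than running_mean + k * running_sdev."""
    stats = RunningStats()
    for x in values:
        if stats.n >= warmup and x > stats.mean + k * stats.sdev:
            yield x
        stats.update(x)


delays = [5, 6, 5, 6, 5, 6, 40, 5]
flagged = list(filter_outliers(delays))
```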
Collaborative Edge Services
● Infrastructures of multiple stakeholders that are geographically distributed
● Inspired by publicly available data from:
○ the New York transportation authority,
○ the Dublin smart city bus network and
○ Uber
● Enriched with real-time weather data from open-access meteorological stations
● Companies, employees and clients can easily submit their queries
Collaborative Edge Services
Geo-analytic queries over multiple sources: a travel app user interested in finding the closest taxis or car-sharing vehicles.
COMPUTE vehicleID
FROM (taxis, car_sharing)
WHEN GEOHASH[10](cusLoc) == GEOHASH[10](vehLoc)
EVERY 1 MINUTES
The city segment with the least number of vehicles in a 15min sliding window when the temperature drops below 10°C:
COMPUTE MIN(
COUNT(buses, 15 MINUTES) BY city_segment +
COUNT(taxis, 15 MINUTES) BY city_segment +
COUNT(sharing, 15 MINUTES) BY city_segment
) WHEN temperature <= 10
EVERY 10 MINUTES
Data-driven suggestions: the top-5 city areas based on the current and previous month average amount (the third argument of MEAN is a 1 MONTH offset).
COMPUTE TOP_K[5] (
MEAN(total_amount, 1 MONTH)-
MEAN(total_amount, 1 MONTH, 1 MONTH )
) BY city_segment EVERY 1 HOURS
StreamSight Framework
Specification, compilation, and execution of streaming IoT analytic queries on distributed processing engines, optimized for edge computing environments.
[Figure: currently supported processing engines and future adapters]
StreamSight: A Query-Driven Framework for Streaming Analytics in Edge Computing. Z. Georgiou et al., IEEE/ACM UCC, 2018.
Query Model Translation
Insight Description:
COMPUTE
ARITHMETIC_MEAN( bus_delay, 10 MINUTES)
BY city_segment EVERY 5 SECONDS
Abstract Syntax Tree:
● Nodes correspond to a grammar rule of the language.
● Leaves are the tokens and symbols of the language.
● The parser performs early validation to verify the syntactic correctness of the query.
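As a rough illustration of translating an insight description into a tree and rejecting malformed input early, the sketch below handles just one query shape (a single windowed aggregate with an optional BY clause). It is not StreamSight's parser: a single regular expression stands in for a real grammar, and the node layout (`rule`, `function`, `window`, ...) is a hypothetical choice.

```python
import re

# Matches: COMPUTE FUNC(metric, N UNIT) [BY key] EVERY N UNIT
QUERY = re.compile(
    r"^\s*COMPUTE\s+(?P<func>[A-Z_]+)\(\s*(?P<metric>\w+)\s*,\s*"
    r"(?P<win>\d+)\s+(?P<win_unit>[A-Z]+)\s*\)"
    r"(?:\s+BY\s+(?P<key>\w+))?"
    r"\s+EVERY\s+(?P<every>\d+)\s+(?P<every_unit>[A-Z]+)\s*;?\s*$"
)


def parse(query):
    """Return a small AST (nested dicts) or raise ValueError on bad syntax."""
    match = QUERY.match(query)
    if match is None:
        raise ValueError("syntax error in insight description")
    g = match.groupdict()
    return {
        "rule": "insight",
        "compute": {
            "rule": "aggregate",
            "function": g["func"],
            "metric": g["metric"],
            "window": {"rule": "interval", "value": int(g["win"]),
                       "unit": g["win_unit"]},
        },
        "by": g["key"],  # None when the BY clause is absent
        "every": {"rule": "interval", "value": int(g["every"]),
                  "unit": g["every_unit"]},
    }


ast = parse("""COMPUTE
ARITHMETIC_MEAN( bus_delay, 10 MINUTES)
BY city_segment EVERY 5 SECONDS""")
```

Inner dicts play the role of grammar-rule nodes, while strings and integers are the leaves (tokens).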
Compilation Phase
● Constructs the Query Execution Plan, assembling the pipeline of stream operations from the AST representation.
● A recursive algorithm traverses the AST.
● Each node is mapped to a stream operation of the underlying processing engine.
● Naive AST mapping is extremely inefficient because it ignores the geo-distributed nature of edge realms:
○ Unnecessary intermediate re-computations
○ Increased data movement
● The AST must acknowledge these.
[Figure: Abstract Syntax Tree]
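The recursive mapping from AST nodes to operations can be sketched as follows. This is a toy, naive version under stated assumptions: the tuple-based node encoding, the node kinds (`metric`, `mean`, `sub`, `div`), and the `expected_delay` metric are hypothetical, and plain Python callables stand in for the stream operations of a real processing engine.

```python
from statistics import mean


def compile_node(node):
    """Recursively map an AST node to a callable over a window of records."""
    kind = node[0]
    if kind == "metric":
        name = node[1]
        return lambda window: [rec[name] for rec in window]
    if kind == "mean":
        child = compile_node(node[1])
        return lambda window: mean(child(window))
    if kind in ("sub", "div"):
        left = compile_node(node[1])
        right = compile_node(node[2])
        if kind == "sub":
            return lambda window: left(window) - right(window)
        return lambda window: left(window) / right(window)
    raise ValueError(f"unknown AST node: {kind!r}")


# Hypothetical AST: mean(bus_delay) - mean(expected_delay)
tree = ("sub",
        ("mean", ("metric", "bus_delay")),
        ("mean", ("metric", "expected_delay")))
plan = compile_node(tree)

window = [
    {"bus_delay": 4, "expected_delay": 1},
    {"bus_delay": 6, "expected_delay": 3},
]
value = plan(window)
```

Because every subtree is compiled and evaluated independently, a subexpression that appears twice is also computed twice, which is exactly the kind of re-computation a naive mapping suffers from.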
System Optimizations
Reusing intermediate results
● StreamSight caches and broadcasts expressions, composites and results across worker nodes to reduce unnecessary re-computations.
Insight 1: Calculate the current average bus_delay.
Insight 2: Calculate the ratio between the current and last-hour bus_delay.
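The two insights above share the current average, so it only needs to be computed once per evaluation. The single-process sketch below illustrates that idea with a keyed cache; it does not model broadcasting across worker nodes, and the class name `SharedResults`, the key layout, and the sample values are all hypothetical.

```python
class SharedResults:
    """Cache of intermediate results, keyed by (operator, metric, window)."""

    def __init__(self):
        self._cache = {}
        self.computations = 0  # how many results were actually computed

    def get(self, key, compute):
        if key not in self._cache:
            self._cache[key] = compute()
            self.computations += 1
        return self._cache[key]


def mean(values):
    return sum(values) / len(values)


last_10_min = [4.0, 6.0]            # illustrative bus_delay samples
last_60_min = [2.0, 2.0, 4.0, 6.0]

shared = SharedResults()
current_key = ("ARITHMETIC_MEAN", "bus_delay", "10 MINUTES")
hourly_key = ("ARITHMETIC_MEAN", "bus_delay", "60 MINUTES")

# Insight 1: current average bus_delay.
insight_1 = shared.get(current_key, lambda: mean(last_10_min))

# Insight 2: ratio between current and last-hour bus_delay.
# The numerator is served from the cache instead of being recomputed.
insight_2 = (shared.get(current_key, lambda: mean(last_10_min))
             / shared.get(hourly_key, lambda: mean(last_60_min)))
```

Three results are requested but only two are computed; a real deployment would also need to invalidate the cache when the window slides.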
User Optimizations
Sampling enables the execution of an insight description on a portion of the streamed measurements for approximate but timely answers (k << N)
● Uniform Sampling
● Weighted Hierarchical Reservoir Sampling (WHRS) [1]
● Applies on-the-fly reservoir + stratified sampling
StreamSight allows the user to prioritize insights
● Under high-load influx or network uncertainties, critical queries are not delayed while less important ones are queued.
[1] ApproxIoT: Approximate Analytics for Edge Computing, Z. Wen et al., ICDCS, 2018
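For readers unfamiliar with reservoir sampling, the sketch below shows the classic single-reservoir variant (Algorithm R), which keeps a uniform sample of k items from a stream of unknown length N. It is deliberately not WHRS: the weighted, hierarchical and stratified parts described in the ApproxIoT paper are not reproduced here.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Uniform sample of up to k items from an iterable of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i survives with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1000), k=10, rng=random.Random(42))
```

The memory footprint is bounded by k regardless of how many measurements arrive, which is what makes the approach attractive on edge servers.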
User Optimizations
Prioritization: under high-load influx, critical queries are not delayed.
COMPUTE MAX(taxis_fare_amount, 60 MINUTES)
BY city_segment EVERY 1 MINUTES
WITH SALIENCE 1
(SALIENCE: priority; higher is better.)
Uniform Sampling: query execution on a portion of the data stream.
COMPUTE ARITHMETIC_MEAN(bus_delay, 60 MINUTES)
BY stop_id EVERY 5 MINUTES
WITH SALIENCE 1 AND SAMPLE 0.2
Sampling with Error Margin & Confidence: query execution with bounded error guarantees for sampling.
COMPUTE
ARITHMETIC_MEAN(taxi_passengers, 10 MINUTES)
EVERY 30 SECONDS
WITH MAX_ERROR 0.05 AND CONFIDENCE 0.95
(MAX_ERROR: error upper bound; CONFIDENCE: confidence level.)
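The slides do not say how an error bound and a confidence level are turned into a sampling rate. As a rough intuition only, the textbook normal-approximation formula for estimating a mean, n = (z * sdev / error)^2, can be sketched as below. The function name, the assumption of a known standard deviation, and the reading of the error as an absolute margin are all assumptions and may differ from what StreamSight actually does.

```python
import math
from statistics import NormalDist


def required_sample_size(sdev, max_error, confidence):
    """Samples needed so the mean estimate is within max_error at `confidence`.

    Textbook normal approximation with a known standard deviation.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return math.ceil((z * sdev / max_error) ** 2)


n_tight = required_sample_size(sdev=1.0, max_error=0.05, confidence=0.95)
n_loose = required_sample_size(sdev=1.0, max_error=0.10, confidence=0.95)
```

Halving the tolerated error roughly quadruples the number of samples required, which is the trade-off a user makes when tightening the error bound.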
User Optimizations
Dedicated Execution: execution of crucial queries on dedicated nodes.
COMPUTE COUNT(taxis)
BY city_segment
EVERY 1 SECONDS
WITH ALLOW ON DEDICATED[5]
(DEDICATED[5]: number of dedicated nodes.)
Awareness on Computations: minimize the computation footprint of execution for less significant queries while keeping the error below 5%.
COMPUTE
PEWMA[0.5](bus_delay) BY bus_id
EVERY 30 SECONDS
WITH MAX_ERROR 0.05 AND CONFIDENCE 0.95
AND AWARENESS ON COMPUTATIONS
(AWARENESS ON COMPUTATIONS: try to minimize the computations.)
Accuracy Aware Execution: only in high-influx periods sacrifice a portion of the accuracy, but keep the error below 5%.
COMPUTE
PEWMA[0.5](bus_delay) BY bus_id
EVERY 30 SECONDS
WITH MAX_ERROR 0.05 AND CONFIDENCE 0.95
AND AWARENESS ON ACCURACY
(AWARENESS ON ACCURACY: try to maximize the accuracy.)
Dublin Bus Workload
Real-World Datasets
● Dublin Smart City Buses Network [1]
○ 968 buses (Jan 2014)
○ 16 metrics/record, including: bus_id, bus_delay, city_segment
○ Used 7 insights of actual interest for bus operators
16 Edge servers
● 1 vCPU, 1GB MEM, 2↑ 16↓ Mbps
Evaluation Metric
● Batch Processing Time
[Figure: batch processing time, with regions labelled “Unstable System” and “Stable System”]
➢ StreamSight achieved a 1.4x speedup over the baseline
➢ StreamSight+WHRS achieved a 4.3x speedup over the baseline
[1] Dublin, “Smart City ITS,” https://data.smartdublin.ie/, 2018
Re-usage of Intermediate Results
● Dublin Bus Workload
● Average Processing Time (fixed input rate of 700 req/s)
[Figure annotations: StreamSight DOES NOT incur a performance overhead; the baseline configuration failed]
Re-usage of Intermediate Results
● Same composition across different insights: different queries but with common operators.
● Same operators across different compositions: e.g., MEAN is composed from a SUM divided by a COUNT. If either SUM or COUNT is available, reuse it.
● Same composition across different offsets [1]:
COMPUTE
ARITHMETIC_MEAN(consumption, 10 MINUTES)/
ARITHMETIC_MEAN(consumption, 10 MINUTES, 10 MINUTES)
EVERY 15 MINUTES
(We can cache and reuse the composition for 10 minutes.)
● Re-use insights across users: involves tracking shared results across deployments and users, privacy protection, etc. (possibly using blockchain?)
[1] SlickDeque: High Throughput and Low Latency Incremental Sliding-Window Aggregation. A. Shein et al., EDBT, 2018.
Query Execution Placement
● Query model operators: DEDICATED, SALIENCE, ALLOW ON, AWARENESS, etc.
● Still… fog-device-user mobility and network uncertainties affect the QoS, cost, and energy consumption of IoT services.
● Analytics job scheduling requires “intelligent” consideration of data placement when orchestrating dynamic IoT services.
● Ignoring this can result in IoT services placed for optimal responsiveness but failing to guarantee timely insight refreshment.
ADMin: Adaptive Monitoring Dissemination for the Internet of Things, D. Trihinas et al., IEEE INFOCOM, 2017.
Multiple and Heterogeneous Data Processing Engines
● Moving to the “edge” means that not only the data sources but possibly even the data processing engines are diverse.
● These engines must “speak” the same language.
● Open specification vs federation layer?
OpenFog Consortium and OpenEdge Initiative
Data-less Query Execution
● Do we always need to actually compute the answer on the entire data?
○ Sampling…
○ Yes, but we need bounded approximations… and these approximations must be computed efficiently across geo-distributed environments.
■ Beware… substituting one computation with another must be beneficial in terms of performance (e.g., multivariate and dependent metrics) [1].
● Do we always need to actually compute the answer?
○ Or… can a bounded approximation based on recent history be satisfactory [2]?
[1] ATMoN: Adapting the “Temporality” in Large-Scale Dynamic Networks, D. Trihinas et al., IEEE ICDCS, 2018.
[2] Towards intelligent distributed data systems for scalable efficient and accurate analytics, P. Triantafyllou et al., IEEE ICDCS, 2018.
Security & Privacy
● Query model provides provisions for data confidentiality, restricted access control and data movement constraints across geo-locations.
● Offloading sensitive data to the cloud hinders man-in-the-middle attacks… on the other hand… processing “in place” hinders attacks (e.g., DDoS) on “easier” attacking planes (e.g., low-power IoT devices).
● Query model NOT enough… geo-distributed analytics requires task scheduling algorithms to acknowledge privacy-aware compute… How to do this efficiently?
COMPUTE patient_stream
EVERY 5 MINUTES
WITH ALLOW
WHEN MEAN( heart_beat, 1 MINUTES ) >= 190
AND doctor_id IN (doctor_ids)
AND region == clinic_region
(WITH ALLOW WHEN ...: evaluation rule.)
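Read as a predicate, the evaluation rule above releases the patient stream only when all three conditions hold. The sketch below expresses that reading in Python; the function signature and the treatment of an empty one-minute window (deny access) are assumptions, and the sketch says nothing about where or how StreamSight enforces such rules.

```python
from statistics import mean


def allow(heart_beats_last_minute, doctor_id, doctor_ids, region, clinic_region):
    """True only if every condition of the ALLOW WHEN rule holds."""
    return (
        bool(heart_beats_last_minute)  # assumed: no samples means no access
        and mean(heart_beats_last_minute) >= 190
        and doctor_id in doctor_ids
        and region == clinic_region
    )


authorized = {"doc_1", "doc_2"}  # illustrative identifiers

granted = allow([195, 200], "doc_1", authorized, "NW", "NW")
wrong_region = allow([195, 200], "doc_1", authorized, "SE", "NW")
normal_pulse = allow([70, 80], "doc_1", authorized, "NW", "NW")
```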
Conclusion
● Abstract query model for query-driven IoT analytics
○ Use cases (smart city, energy, health, microservices) illustrating the value of the query model.
● A prototype framework called StreamSight
○ A framework for the specification, compilation, and execution of streaming analytic queries on the “Edge”.
○ Optimizations to reduce compute and network load on the Edge:
■ Intermediate results
■ User-optimizations
○ StreamSight can achieve up to a 4.3x speedup compared to a naive deployment.
● Many open research challenges for geo-distributed and query-driven analytics in edge/fog topologies.
THANK YOU
This work is partially supported by the European Commission in terms of the Unicorn 731846 H2020 project (H2020-ICT-2016-1).
Download StreamSight at: https://github.com/UCY-LINC-LAB/StreamSight.git
Energy Consumption in Micro-DCs
● Micro-DCs, also denoted as Green-DCs, are powered by:
○ National electricity providers and
○ Photovoltaic power harvesting stations placed near the DCs
● A wide range of sensors are placed in all datacenter racks and the photovoltaic stations, generating measurements like:
○ Temperature and energy consumption per data center, per rack or per node
○ Energy generation per photovoltaic panel
○ Weather data from stations, such as humidity, wind, temperature, etc.
● Inspired by the ENEDI project http://enedi.eu
ENEDI: Energy Saving in Datacenters, Tryfonos et al., IEEE Global IoT, 2018.
Insight Prioritization
● Dublin Bus Workload
● Average Processing Time (fixed workload)
● 1 insight with high priority and 3 insights with low priority
● Introduced artificial latency (x2) between worker nodes
[Figure: processing time (s); annotations: the prioritized insight experiences no delay; non-prioritized queries are queued]
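The behaviour reported here, where the prioritized insight runs first while the others wait, can be pictured with a simple priority queue. The sketch below uses Python's heapq and treats a higher salience value as more urgent; the class name `InsightQueue` and the first-come-first-served tie-breaking are assumptions, not a description of StreamSight's scheduler.

```python
import heapq
import itertools


class InsightQueue:
    """Pending insights ordered by salience (higher first), then arrival."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def submit(self, name, salience):
        # heapq is a min-heap, so the salience is negated.
        heapq.heappush(self._heap, (-salience, next(self._arrival), name))

    def next_insight(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


queue = InsightQueue()
queue.submit("low_a", salience=0)
queue.submit("critical", salience=1)
queue.submit("low_b", salience=0)
queue.submit("low_c", salience=0)

order = [queue.next_insight() for _ in range(len(queue))]
```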