More and more applications require real-time processing of heterogeneous data streams. In terms of the "Vs" of Big Data (volume, velocity, variety and veracity), they must address velocity and variety at the same time. Big Data solutions that handle velocity and variety separately have been around for a while, but only Stream Reasoning approaches address both dimensions at once. Current results in the Stream Reasoning field are relevant for application areas that need to handle massive datasets, process data streams on the fly, cope with heterogeneous, incomplete and noisy data, provide reactive answers, support fine-grained information access, and integrate complex domain models. Starting from those requirements, this talk frames the problem addressed by Stream Reasoning. It poses the research question and operationalises it with four simpler sub-questions. It describes how the database group of Politecnico di Milano positively answered those sub-questions over the last 7 years of research. It briefly surveys alternative approaches investigated by other research groups worldwide and elaborates on current limitations and open challenges.
Stream reasoning: mastering the velocity and the variety dimensions of Big Data at once
1. Stream Reasoning: mastering the velocity and the variety dimensions of Big Data at once
Emanuele Della Valle
DEIB - Politecnico di Milano
@manudellavalle
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org
University of Oslo, Norway - 3.11.2015
2. It's a streaming world …
• Off-shore oil operations
• Smart Cities
• Global Contact Center
• Social networks
• Generate data streams!
E. Della Valle, S. Ceri, F. van Harmelen, D. Fensel: It's a Streaming World! Reasoning upon Rapidly Changing Information. IEEE Intelligent Systems 24(6): 83-89 (2009)
UiO, Norway - 3.11.2015 - @manudellavalle - http://emanueledellavalle.org
3. … looking for reactive answers …
• What is the expected time to failure when that turbine's bearing starts to vibrate as detected in the last 10 minutes?
• Is public transportation where the people are?
• Who are the best available agents to route all these unexpected contacts about the tariff plan launched yesterday?
• Who is driving the discussion about the top 10 emerging topics?
• Require continuous processing and reactive answers
4. … with conflicting requirements 1/8
A system able to answer those queries must be able to
• handle massive datasets
  – A typical oil production platform is equipped with about 400,000 sensors
  – Telecom data is the most pervasive data source in urban areas; in Milano there are 1.8 million mobile users
  – A global contact centre of a Telecom operator counts 500 million clients
  – Facebook alone has 1.1 billion active users
5. … with conflicting requirements 2/8
A system able to answer those queries must be able to
• process data streams on the fly
  – The sensors on a typical oil production platform generate 10,000 observations per minute with peaks of 100,000 o/m
  – The mobile users in Milano generate 20,000 call/sms/data connections per minute with peaks of 80,000 c/m
  – A global contact centre receives 10,000 contacts per minute with peaks of 30,000 c/m
  – Facebook, as of May 2013, observes 3 million "I like" per minute
6. … with conflicting requirements 3/8
A system able to answer those queries must be able to
• cope with heterogeneous datasets
  – The sensors on a typical oil production platform have been deployed over 10 years by tens of different producers
  – Tens of data sources are normally needed to make sense of an urban phenomenon
  – A global contact centre consists of hundreds of offices owned by different subsidiary companies engaged yearly
  – Each social network has its own data model, APIs, …
7. … with conflicting requirements 4/8
A system able to answer those queries must be able to
• cope with incomplete data
  – Tens of sensors and networking links break down daily
  – Coverage is incomplete
  – Only standard cases are covered by fully machine-processable data records; hundreds of contacts per minute are managed ad hoc
  – Conversations happen outside the social networks, too!
8. … with conflicting requirements 5/8
A system able to answer those queries must be able to
• cope with noisy data
  – Sensors out of operating range
  – Faulty sensors
  – Agents misunderstand, get tired, …
  – Irony, sarcasm, …
9. … with conflicting requirements 6/8
A system able to answer those queries must be able to
• provide reactive answers
  – detection of dangerous situations must occur within minutes
  – recommendations to citizens must be computed in a few seconds
  – routing a contact through each step of the decision tree must take less than a second
  – search autocompletion may need to be updated every few minutes
10. … with conflicting requirements 7/8
A system able to answer those queries must be able to
• support fine-grained information access
  – Identify a turbine among thousands
  – Locate a bus among thousands
  – Contact an agent among thousands
  – Identify an opinion maker among thousands of influencers for a topic
11. … with conflicting requirements 8/8
A system able to answer those queries must be able to
• integrate complex domain models of
  – operational and control processes
  – various city aspects
  – contact management, contract types, agent skills, contactor profiles, …
  – topics, user profiles, …
12. Challenges
A system able to answer those queries must be able to
• handle massive datasets: x
• process data streams on the fly: x
• cope with heterogeneous datasets: x
• cope with incomplete data: x x
• cope with noisy data: x
• provide reactive answers: x
• support fine-grained access: x x
• integrate complex domain models: x
In Big Data terms, the x marks fall under the columns Volume, Velocity, Variety, Veracity.
13. Grand challenge
• Volume + Velocity + Variety = hard deal
[Figure: three axes. Volume (KB, MB, GB, TB, PB, EB, ZB), Velocity (months, days, hours, min., sec., ms.) and Variety]
14. A good reason to embrace it!
• ++ Variety → ++ value
[Figure: Value plotted against Velocity (ms., sec., min., hours, days, months, years) and Variety]
15. From challenges to opportunities
• Formally, data streams are:
  – unbounded sequences of time-varying data elements
• Less formally, in many application domains, they are:
  – a "continuous" flow of information
  – where recent information is more relevant as it describes the current state of a dynamic system
• Opportunities
  – Forget old enough information
  – Exploit the implicit ordering (by recency) in the data
16. State-of-the-art: DSMS and CEP
• A paradigmatic change!
• Continuous queries registered over streams that are observed through windows
[Figure: a Dynamic System produces input streams; a window feeds a Registered Continuous Query, which produces streams of answers]
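The window idea above can be illustrated with a minimal Python sketch. It is not taken from any DSMS or CEP product: the `TimeWindow` class, the `continuous_average` query and the sample readings are all hypothetical names chosen for illustration. The query is registered once and emits a new answer every time an item arrives, while items older than the window width are forgotten.

```python
from collections import deque


class TimeWindow:
    """Keeps only the items whose timestamp lies in (now - width, now]."""

    def __init__(self, width):
        self.width = width
        self.items = deque()  # (timestamp, value) pairs, oldest first

    def push(self, timestamp, value):
        self.items.append((timestamp, value))
        # Forget everything that is old enough to have left the window.
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()

    def values(self):
        return [value for _, value in self.items]


def continuous_average(stream, width):
    """A continuous query: registered once, yields one answer per arrival."""
    window = TimeWindow(width)
    for timestamp, value in stream:
        window.push(timestamp, value)
        current = window.values()
        yield timestamp, sum(current) / len(current)


# Hypothetical sensor readings as (timestamp, value) pairs.
readings = [(1, 10.0), (2, 20.0), (3, 30.0), (7, 50.0)]
answers = list(continuous_average(readings, width=3))
```

Because the stream is ordered by time, eviction only ever touches the oldest end of the deque, which is the kind of opportunity the previous slide refers to.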
17. DSMS and CEP vs. requirements
[Table: Requirement vs. DSMS/CEP, over the rows massive datasets, data streams, heterogeneous dataset, incomplete data, noisy data, reactive answers, fine-grained information access, complex domain models; three cells are marked ✗]
18. State of the art: OBDA
• Given ontology O and query Q, use O to rewrite Q as Q' so that, for any set of ground facts A contained in multiple databases:
  – answer(Q, O, A) = answer(Q', ∅, A)
  – The answer to the query Q using the ontology O for any set of ground facts A is equal to the answer to a query Q' without considering the ontology O
• Use mapping M to map Q' to multiple SQL queries to the various databases
[Figure: O and Q enter Rewrite, which produces Q'; Q' and M enter Map, which produces SQL; the SQL is evaluated over A to give the answer]
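A toy Python sketch can make the rewriting step concrete. It assumes a deliberately tiny setting that is not in the slides: the ontology contains only subclass axioms, a query asks for the instances of one class, and the class names and facts are invented. The mapping M to SQL is left out. The sketch checks that answering the rewritten query without the ontology matches answering the original query with it.

```python
def rewrite(query_class, ontology):
    """Q -> Q': the query class plus every class the ontology places below it."""
    found, frontier = {query_class}, [query_class]
    while frontier:
        current = frontier.pop()
        for sub, sup in ontology:
            if sup == current and sub not in found:
                found.add(sub)
                frontier.append(sub)
    return found


def answer_without_ontology(classes, facts):
    """Evaluate Q' as a plain union of class lookups over the ground facts."""
    return {individual for individual, cls in facts if cls in classes}


def answer_with_ontology(query_class, ontology, facts):
    """Reference semantics: saturate the facts with the ontology, then query."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for individual, cls in list(closed):
            for sub, sup in ontology:
                if sub == cls and (individual, sup) not in closed:
                    closed.add((individual, sup))
                    changed = True
    return {individual for individual, cls in closed if cls == query_class}


# Hypothetical ontology O (subclass, superclass) and ground facts A.
ontology = {("Turbine", "Equipment"), ("GasTurbine", "Turbine")}
facts = {("t1", "GasTurbine"), ("t2", "Turbine"), ("p1", "Pump")}

q_prime = rewrite("Equipment", ontology)
```

The rewritten query touches only the stored facts, which is why it can then be translated into SQL over the source databases.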
19. DSMS/CEP, OBDA vs. requirements
[Table: Requirement vs. DSMS/CEP and OBDA, over the rows massive datasets, data streams, heterogeneous dataset, incomplete data, noisy data, reactive answers, fine-grained information access, complex domain models; six cells are marked ✗]
20. Stream Reasoning
• Research question
  – is it possible to make sense in real time of multiple, heterogeneous, gigantic and inevitably noisy and incomplete data streams in order to support the decision processes of extremely large numbers of concurrent users?
• Proposed approach
[Figure: Complexity vs. Dynamics. Layers: Raw Stream Processing, Semantic Streams, DL-Lite, DL. Steps: Abstraction, Selection, Interpretation, Reasoning, Querying, Re-writing. Complexity axis: AC0, PTIME, NEXPTIME. Change Frequency axis: 10^4 Hz to 1 Hz]
H. Stuckenschmidt, S. Ceri, E. Della Valle, F. van Harmelen: Towards Expressive Stream Reasoning. Proceedings of the Dagstuhl Seminar on Semantic Aspects of Sensor Networks, 2010.
21. Sub-research questions
1. Is it possible to extend the Semantic Web stack in order to represent heterogeneous data streams, continuous queries, and continuous reasoning tasks?
2. Do the ordered nature of data streams and the possibility to forget old enough information allow optimizing continuous querying and continuous reasoning tasks, so as to provide reactive answers to large numbers of concurrent users without forsaking correctness or completeness?
3. Can Semantic Web and Machine Learning technologies be jointly employed to cope with the noisy and incomplete nature of data streams?
4. Are there practical cases where processing data streams at the semantic level is the best choice?
23. State-of-the-art: RDF model
• RDF: Resource Description Framework
  – It allows making statements about resources in the form of subject-predicate-object expressions
    • In RDF terminology: triples
    • E.g. @BarakObama posts "Four more years" (subject, predicate, object)
  – A collection of RDF statements represents a labelled, directed graph
    • In RDF terminology: a graph
    • E.g., the tweet above by Barack Obama is connected to
      – 800,000+ Twitter user profiles via retweets
      – 300,000+ Twitter user profiles via favorites
      – …
24. Contribution: RDF stream Models
• RDF Stream (the C-SPARQL way)
  – Unbounded sequence of time-varying triples
  – each represented by a pair made of an RDF triple and its timestamp
  – Timestamps are non-decreasing (allowing for simultaneity)
Example (subject, predicate, object, timestamp):
  …
  @BarakObama posts "Four more years", 8:16PM 6 Nov 2012
  @Alice posts "RT: Four more years", 8:17PM 6 Nov 2012
  …
D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010)
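As a rough illustration of this data model, the Python sketch below represents each stream element as a triple paired with a timestamp and enforces the non-decreasing order. The names `TimestampedTriple` and `rdf_stream` are hypothetical and are not part of C-SPARQL. The two elements are the ones from the slide.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple


class TimestampedTriple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    timestamp: datetime


def rdf_stream(items: Iterable[TimestampedTriple]) -> Iterator[TimestampedTriple]:
    """Yield items, enforcing non-decreasing timestamps (ties are allowed)."""
    previous = None
    for item in items:
        if previous is not None and item.timestamp < previous:
            raise ValueError(f"timestamp goes backwards at {item}")
        previous = item.timestamp
        yield item


tweets = [
    TimestampedTriple("@BarakObama", "posts", '"Four more years"',
                      datetime(2012, 11, 6, 20, 16)),
    TimestampedTriple("@Alice", "posts", '"RT: Four more years"',
                      datetime(2012, 11, 6, 20, 17)),
]
accepted = list(rdf_stream(tweets))
```

Two elements with the same timestamp pass the check, which matches the "allowing for simultaneity" remark; only a timestamp that moves backwards is rejected.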
25. Contribution: RDF stream Models
• RDF Stream (the Streaming Linked Data way)
  – Unbounded sequence of time-varying graphs
  – each represented by a pair made of an RDF graph and its timestamp
  – Timestamps (if present) are monotonically increasing
  – Graphs act as a form of punctuation (all triples in a graph are simultaneous)
D.F. Barbieri, E. Della Valle: A Proposal for Publishing Data Streams as Linked Data - A Position Paper. LDOW (2010)
26. RDF streams time semantics 1/3
• An RDF stream without timestamps is an ordered sequence of data items
• The order can be exploited to perform queries
  – Does Alice meet Bob before Carl?
  – Who does Carl meet first?
Stream S:
  e1  :alice :isWith :bob
  e2  :alice :isWith :carl
  e3  :bob :isWith :diana
  e4  :diana :isWith :carl
27. RDF streams time semantics 2/3
• One timestamp: the time instant at which the data item occurs
• We can start to compose queries taking time into account
  – How many people has Alice met in the last 5m?
  – Does Diana meet Bob and then Carl within 5m?
[Figure: stream S on a time axis t with ticks at 1, 3, 6 and 9, carrying e1 (:alice :isWith :bob), e2 (:alice :isWith :carl), e3 (:bob :isWith :diana) and e4 (:diana :isWith :carl)]
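The two example queries can be sketched in Python over a small timestamped stream. Several things here are assumptions made for illustration: e1 to e4 are placed at t = 1, 3, 6 and 9 (the figure shows these ticks, but the pairing is a guess), `:isWith` is treated as symmetric, and the function names are hypothetical.

```python
# (timestamp, subject, predicate, object); the timestamps are an assumed pairing.
stream = [
    (1, ":alice", ":isWith", ":bob"),
    (3, ":alice", ":isWith", ":carl"),
    (6, ":bob", ":isWith", ":diana"),
    (9, ":diana", ":isWith", ":carl"),
]


def met_in_last(stream, person, now, minutes):
    """People that `person` has been with during (now - minutes, now]."""
    met = set()
    for t, s, p, o in stream:
        if p == ":isWith" and now - minutes < t <= now:
            if s == person:
                met.add(o)
            elif o == person:
                met.add(s)
    return met


def meets_then_within(stream, person, first, second, minutes):
    """True if `person` meets `first` and, later but within `minutes`, `second`."""
    def meeting_times(other):
        return [t for t, s, p, o in stream
                if p == ":isWith" and {s, o} == {person, other}]

    return any(t1 < t2 <= t1 + minutes
               for t1 in meeting_times(first)
               for t2 in meeting_times(second))


alice_contacts = met_in_last(stream, ":alice", now=5, minutes=5)
diana_sequence = meets_then_within(stream, ":diana", ":bob", ":carl", minutes=5)
```

Neither query can be expressed on the untimestamped stream of the previous slide, where only the relative order of e1 to e4 is known.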
28. RDF streams time semantics 3/3
• Two timestamps: the time range in which the data item is valid (from, to]
• It is possible to write even more complex constraints:
  – Which are the meetings that last less than 5m?
  – Which are the meetings with conflicts?
[Figure: stream S on a time axis t with ticks at 1, 3, 6 and 9, carrying e1 (:alice :isWith :bob), e2 (:alice :isWith :carl), e3 (:bob :isWith :diana) and e4 (:diana :isWith :carl) as intervals]
D. Anicic, P. Fodor, S. Rudolph, & N. Stojanovic. EP-SPARQL: a unified language for event processing and stream reasoning. In WWW 2011, pages 635–644
29. Finding
• The Semantic Web stack can be extended so as to incorporate streaming data as a first class citizen
  – RDF stream data model(s)
  – Continuous SPARQL syntax and semantics
  – Continuous deductive reasoning semantics
30. Work in progress
• In 2013, an RDF Stream Processing (RSP) community group was created at W3C: http://www.w3.org/community/rsp/
• RSP data model and serialization
  – https://github.com/streamreasoning/RSP-QL/blob/master/Serialization.md
34. Contribution: Continuous-SPARQL
Who are the opinion makers? i.e., the users who are likely to influence the behavior of their followers
REGISTER STREAM OpinionMakers COMPUTED EVERY 5m AS
CONSTRUCT { ?opinionMaker sd:about ?res }
FROM STREAM <http://…> [RANGE 30m STEP 5m]
WHERE {
  ?opinionMaker ?opinion ?res .
  ?follower sioc:follows ?opinionMaker .
  ?follower ?opinion ?res .
  FILTER (cs:timestamp(?follower ?opinion ?res) >
          cs:timestamp(?opinionMaker ?opinion ?res) )
}
HAVING ( COUNT(DISTINCT ?follower) > 3 )
Annotations on the query:
• Query registration (for continuous execution)
• FROM STREAM clause
• WINDOW: [RANGE 30m STEP 5m]
• RDF Stream added as new output format
• Builtin to access timestamps: cs:timestamp
D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Querying RDF streams with C-SPARQL. SIGMOD Record 39(1): 20-26 (2010)
SR 2015, Austria - 3.11.2015 - @manudellavalle - http://emanueledellavalle.org
36. Alternatives to C-SPARQL
• CQELS
  – What: STREAM clause, focus on new answers
  – Ref: Le-Phuoc, D., Dao-Tran, M., Xavier Parreira, J., & Hauswirth, M. A native and adaptive approach for unified processing of linked streams and linked data. In ISWC 2011, pages 370–388.
• SPARQLStream
  – What: window in the past, focus on RDF to Stream operators
  – Ref: Calbimonte, J.-P., Corcho, O., & Gray, A. J. G. Enabling ontology-based access to streaming data sources. In ISWC, 2010, pages 96–111.
• EP-SPARQL
  – What: focus on event specific operators
  – Ref: Anicic, D., Fodor, P., Rudolph, S., & Stojanovic, N. EP-SPARQL: a unified language for event processing and stream reasoning. In WWW 2011, pages 635–644.
• TEF-SPARQL
  – What: adds "facts" as first class elements
  – Ref: https://www.merlin.uzh.ch/publication/show/8467
37. Alternatives to C-SPARQL
• Comparison between existing approaches
System | S2R | R2R | Time-aware | R2S
C-SPARQL Engine | Logical and triple-based | SPARQL 1.1 query | timestamp function | Batch only
Streaming Linked Data Framework | Logical and graph-based | SPARQL 1.1 | no | Batch only
SPARQLstream | Logical and triple-based | SPARQL 1.1 query | no | Ins, batch, del
CQELS | Logical and triple-based | SPARQL 1.1 query | no | Ins only
TEF-SPARQL | no | SPARQL-like | Temporarily Facts, BEFORE, SINCE, UNTIL, DURING | Batch only
EP-SPARQL | no | SPARQL 1.0 | SEQ, PAR, AND, OR, DURING, STARTS, EQUALS, NOT, MEETS, FINISHES | Ins only
38. Work in progress at RSP@W3C
• RSP-QL
  – Syntax
    • https://github.com/streamreasoning/RSP-QL/blob/master/RSP-QL%20Sample%20Queries.md
  – Proposed semantics
    • D. Dell'Aglio, E. Della Valle, J.-P. Calbimonte, Ó. Corcho: RSP-QL Semantics: A Unifying Query Model to Explain Heterogeneity of RDF Stream Processing Systems. Int. J. Semantic Web Inf. Syst. 10(4): 17-44 (2014)
  – Semantics (work in progress)
    • https://github.com/streamreasoning/RSP-QL/blob/master/Semantics.md
  – Quick ref.
    • D. Dell'Aglio, J.-P. Calbimonte, E. Della Valle, Ó. Corcho: Towards a Unified Language for RDF Stream Query Processing. ESWC (Satellite Events) 2015: 353-363
39. Contribution: continuous deductive reasoning
• DL Ontology Stream ST
  – An ontology stream with respect to a static Tbox T is a sequence of Abox axioms ST(i)
• A Windowed Ontology Stream ST(o,c]
  – A windowed ontology stream with respect to a static Tbox T is the union of the Abox axioms ST(i) where o < i ≤ c
• Reasoning on a Windowed Ontology Stream ST(o,c] is the same as reasoning on a static DL KB
Emanuele Della Valle, Stefano Ceri, Davide Francesco Barbieri, Daniele Braga, Alessandro Campi: A First Step Towards Stream Reasoning. FIS 2008: 72-81
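The windowed ontology stream definition translates almost directly into code. The Python sketch below is a simplification under stated assumptions: the stream is a plain dictionary from index i to a set of Abox axioms, axioms are written as tuples, and the sample content is invented. The function name `windowed_abox` is hypothetical.

```python
def windowed_abox(stream, o, c):
    """Union of the Abox axiom sets stream[i] for every index i with o < i <= c."""
    window = set()
    for i, axioms in stream.items():
        if o < i <= c:
            window |= axioms
    return window


# Hypothetical ontology stream: index -> set of Abox axioms.
stream = {
    1: {("alice", "posts", "p1")},
    2: {("p2", "discusses", "p1")},
    3: {("p3", "discusses", "p2")},
    4: {("p4", "discusses", "p3")},
}

window_1_3 = windowed_abox(stream, o=1, c=3)
```

The result is an ordinary finite set of axioms, so any static DL reasoner could in principle be run over it together with the Tbox T.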
40. Example of continuous deductive reasoning
What impact has my micropost p1 had in the last hour?
Let's count the number of microposts that discuss it …
REGISTER STREAM ImpactMeter AS
SELECT (count(?p) AS ?impact)
FROM STREAM <http://…/fb> [RANGE 60m STEP 10m]
WHERE {
  :Alice :posts [ sr:discusses ?p ]
}
[Figure: microposts p1 to p8 linked by seven "discusses" edges; Alice posts p1; discusses is a transitive property, so the answer is 7]
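The effect of the transitive property on the count can be sketched in Python. The edge layout below is hypothetical, since the exact shape of the graph in the figure is not recoverable. It only preserves what the slide states: eight microposts, seven `discusses` edges, and an expected impact of 7 for p1.

```python
def discussing(target, edges):
    """Posts that reach `target` through one or more discusses edges."""
    found, frontier = set(), [target]
    while frontier:
        current = frontier.pop()
        for source, destination in edges:
            if destination == current and source not in found:
                found.add(source)
                frontier.append(source)
    return found


# Hypothetical (source, destination) pairs: source discusses destination.
edges = {
    ("p2", "p1"), ("p3", "p1"), ("p4", "p2"), ("p5", "p3"),
    ("p6", "p4"), ("p7", "p4"), ("p8", "p5"),
}

direct = {source for source, destination in edges if destination == "p1"}
impact = len(discussing("p1", edges))
```

Without reasoning, only the two posts that discuss p1 directly would be counted; with transitivity, every post in the chain contributes.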
42. Alternatives to continuous deductive (RDFS++) reasoning
• ETALIS
  – What: RDFS + Allen Algebra
  – Ref: Anicic, D., Rudolph, S., Fodor, P., & Stojanovic, N. Stream reasoning and complex event processing in ETALIS. Semantic Web, 3(4), 2012, 397–407.
• STARQL
  – What:
    • DL-Lite + Conjunctive Query + time-series
    • SHI + Grounded Conjunctive Queries + time-series
  – Ref: Ö.L. Özçep, R. Möller. Ontology Based Data Access on Temporal and Streaming Data. Reasoning Web, 2014
• ASP-based
  – What: time-decaying ASP
  – Ref: http://arxiv.org/abs/1301.1392
• LARS
  – What: high-level unified formal foundation for stream reasoning
  – Ref: H. Beck, M. Dao-Tran, T. Eiter, M. Fink: LARS: A Logic-Based Framework for Analyzing Reasoning over Streams. AAAI 2015: 1431-1438
44. Contribution: optimize querying for reactive answers
• C-SPARQL engine time window-based selection outperforms SPARQL filter-based selection (Jena-ARQ)
[Figure label: Our in-memory RDF stream processing engine]
D. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems, 30 Aug. 2010.
45. Finding
• The Stream Reasoning task is feasible, and the very nature of streaming data offers opportunities to optimise reasoning tasks where data is ordered by recency and can be forgotten after a while
  – C-SPARQL Engine prototype
  – IMaRS continuous incremental reasoning algorithm
46. Work in progress
• When volume also matters …
[Figure labels: Data Stream, Window, Join, Local View, Maintenance Policy, RSP engine, SPARQL endpoint, Web]
Soheila Dehghanzadeh, Daniele Dell'Aglio, Shen Gao, Emanuele Della Valle, Alessandra Mileo, Abraham Bernstein: Approximate Continuous Query Answering over Streams and Dynamic Linked Data Sets. ICWE 2015: 307-325
47. State-of-the-art deductive reasoning
• Data-driven (a.k.a. forward reasoning)
• Query-driven – backward reasoning
• Query-driven – query rewriting (a.k.a. ontology based data access)
[Figure: one diagram per approach, built from the elements RDF data, ontology, Reasoner, SPARQL, inferred data and rewritten query]
48. Naïve approaches to Stream Reasoning: windowing then reasoning
• Data-driven (a.k.a. forward reasoning)
• Query-driven – backward reasoning
• Query-driven – query rewriting (a.k.a. ontology based data access)
[Figure: the same three diagrams, each with a Window placed in front of the RDF data; elements: Window, RDF data, ontology, Reasoner, SPARQL, inferred data, rewritten query]
49. Not so naïve approach to stream reasoning
• The problem is that materializations (the results of data-driven processing) are very difficult to decrement efficiently.
  – State-of-the-art: DRed algorithm
    • Over delete
    • Re-derive
    • Insert
[Figure labels: window, insertions, deletions, Incremental !!! Reasoner, ontology, Inferred data, SPARQL]
Y. Ren, J. Z. Pan. Optimising ontology stream reasoning with truth maintenance system. In CIKM (2011)
50. Is DRed needed?
• DRed works with random insertions and deletions
• In a streaming setting, when a triple enters the window, given the size of the window, the reasoner knows already when it will be deleted!
• E.g.,
  – if the window is 40 minutes long, and,
  – it is 10:00, the triple(s) entering now
  – will exit on 10:40.
• Conclusion
  – deletions are predictable
Time | Enter window | Exit window
10:00 | A→B |
10:10 | B→C |
10:20 | A→E |
10:30 | E→C |
10:40 | | A→B
10:50 | | B→C
11:00 | | A→E
[Figure columns: "Explicitly in window" and "Infer win", drawn as small graphs over the nodes A, B, C and E for each row]
51. Contribution: IMaRS algorithm
• Idea:
  – add an expiration time to each triple and
  – use a hash table to index triples by their expiration time
• The algorithm
  1. deletes expired triples,
  2. adds the new derivations that are consequences of insertions, annotating each inferred triple with an expiration time (the min of those of the triples it is derived from), and
  3. when multiple derivations occur, for each multiple derivation, it keeps the max expiration time.
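The three steps can be sketched in Python for the simplest possible rule set. This is an illustrative sketch in the spirit of the slide, not the IMaRS implementation: it assumes a single transitive property, represents a triple as a (subject, object) pair, measures time in minutes, and uses hypothetical class and method names.

```python
from collections import defaultdict


class ExpiringMaterialization:
    """Transitive closure of one property; every triple carries an expiration time."""

    def __init__(self, window):
        self.window = window
        self.expires = {}                      # (subject, object) -> expiration time
        self.by_expiration = defaultdict(set)  # expiration time -> triples (the index)

    def _raise_expiration(self, triple, expiration):
        """Store the triple, keeping the max expiration; report whether it changed."""
        old = self.expires.get(triple)
        if old is not None:
            if old >= expiration:
                return False
            self.by_expiration[old].discard(triple)
        self.expires[triple] = expiration
        self.by_expiration[expiration].add(triple)
        return True

    def advance(self, now, inserted=()):
        # Step 1: delete expired triples, located through the expiration index.
        for expiration in [e for e in self.by_expiration if e <= now]:
            for triple in self.by_expiration.pop(expiration):
                del self.expires[triple]
        # Step 2: derive consequences of the insertions; a derived triple
        # expires when the first of its premises does (min).
        # Step 3: if a triple is derived again, keep the later expiration (max).
        agenda = [t for t in inserted
                  if self._raise_expiration(t, now + self.window)]
        while agenda:
            s, o = agenda.pop()
            expiration = self.expires[(s, o)]
            for (s2, o2), expiration2 in list(self.expires.items()):
                candidates = []
                if s2 == o:
                    candidates.append((s, o2))   # s -> o and o -> o2 give s -> o2
                if o2 == s:
                    candidates.append((s2, o))   # s2 -> s and s -> o give s2 -> o
                for derived in candidates:
                    if self._raise_expiration(derived, min(expiration, expiration2)):
                        agenda.append(derived)


# Replay of the 40-minute window example, with 10:00 written as minute 600.
m = ExpiringMaterialization(window=40)
m.advance(600, [("A", "B")])
m.advance(610, [("B", "C")])
first_ac = m.expires[("A", "C")]    # derived from A->B and B->C
m.advance(620, [("A", "E")])
m.advance(630, [("E", "C")])
second_ac = m.expires[("A", "C")]   # derived again from A->E and E->C
m.advance(640)
```

No over-deletion or re-derivation pass is needed when the window slides: at 10:40 the triple A→B is dropped by looking up its bucket, while A→C survives because its second derivation expires later.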
52. Contribution: IMaRS algorithm
• Incremental Reasoning on RDF streams (IMaRS): new reasoning algorithm optimized for reactive query answering
[Figure: performance plotted against the % of deletions w.r.t. the content of the window, for three strategies: re-materialize after each window slide, use DRed, IMaRS]
D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, M. Grossniklaus: Incremental Reasoning on Streams and Rich Background Knowledge. ESWC (1) 2010: 1-15
D. Dell'Aglio, E. Della Valle: Incremental Reasoning on RDF Streams. In A. Harth, K. Hose, R. Schenkel (Eds.) Linked Data Management, CRC Press 2014, ISBN 9781466582408
53. Contribution: IMaRS algorithm
• comparison of the average time needed to answer a C-SPARQL query, when 2% of the content exits the window each time it slides, using
  – a backward reasoner on the window content
  – DRed + standard SPARQL on the materialization
  – IMaRS + standard SPARQL on the materialization
55. Optimizing for stream reasoning: alternative approaches
• DyKnow
  – How: logical models of an observed dynamic system + metric temporal logics
  – Ref: Fredrik Heintz, Jonas Kvarnström, Patrick Doherty: Bridging the sense-reasoning gap: DyKnow - Stream-based middleware for knowledge processing. Advanced Engineering Informatics 24(1): 14-26 (2010)
• MorphStream
  – How: rewriting in DSMS languages (one at a time)
  – Ref: Calbimonte, J.-P., Corcho, O., & Gray, A. J. G. Enabling ontology-based access to streaming data sources. In ISWC, 2010, pages 96–111.
• TR-OWL
  – How: truth maintenance for EL++ with syntactic approximations
  – Ref: Y. Ren, J. Z. Pan. Optimising ontology stream reasoning with truth maintenance system. In CIKM (2011)
• ETALIS
  – How: rewriting in Prolog
  – Ref: Anicic, D., Rudolph, S., Fodor, P., & Stojanovic, N. Stream reasoning and complex event processing in ETALIS. Semantic Web, 3(4), 2012, 397–407.
(continues in the next slide)
56. Optimizing for stream reasoning: alternative approaches
• Sparkwave
  – How: extended RETE algorithm for windows and RDFS
  – Ref: Komazec S., Cerri D.: Sparkwave: Continuous Schema-Enhanced Pattern Matching over RDF Data Streams. DEBS 2012
• DynamiTE
  – How: truth maintenance for ρDF (a fragment of RDFS)
  – Ref: J. Urbani, A. Margara, C. J. H. Jacobs, F. van Harmelen, H.E. Bal: DynamiTE: Parallel Materialization of Dynamic RDF Data. ISWC (1) 2013: 657-672
• STARQL
  – How: rewriting on a scalable DSMS with time-series support
  – Ref: Ö.L. Özçep, R. Möller. Ontology Based Data Access on Temporal and Streaming Data. Reasoning Web, 2014
• ASP-based
  – How: optimizing ASP for incremental and time-decaying programs
  – Ref: http://arxiv.org/abs/1301.1392
• The Backward/Forward Algorithm
  – How: optimizing DRed
  – Ref: B. Motik, Y. Nenov, R.E.F. Piro, I. Horrocks: Incremental Update of Datalog Materialisation: the Backward/Forward Algorithm. AAAI 2015: 1560-1568
58. Cope with noisy and incomplete data
• "Noise" is reduced using DSMS techniques
• Deductive stream reasoning copes with incompleteness by deducing implicit facts
• Inductive stream reasoning copes with "irreparable" incompleteness by inducing missing facts
D.F. Barbieri, D. Braga, S. Ceri, E. Della Valle, Y. Huang, V. Tresp, A. Rettinger, H. Wermser: Deductive and Inductive Stream Reasoning for Semantic Social Media Analytics. IEEE Intelligent Systems 25(6): 32-41 (2010)
59. Findings
• A combination of deductive and inductive stream reasoning techniques can cope with incomplete and noisy data
60. Alternative approaches
• Stream Reasoning with Probabilistic Answer Set Programming
  – Matthias Nickles, Alessandra Mileo: Web Stream Reasoning Using Probabilistic Answer Set Programming. RR 2014: 197-205
  – Anastasios Skarlatidis, Georgios Paliouras, Alexander Artikis, George A. Vouros: Probabilistic Event Calculus for Event Recognition. ACM Trans. Comput. Log. 16(2): 11:1-11:37 (2015)
  – Anni-Yasmin Turhan, Erik Zenker: Towards Temporal Fuzzy Query Answering on Stream-based Data. HiDeSt@KI 2015: 56-69
62. Contribution: Streaming Linked Data Framework
[Figure: architecture. A Data Source and an HTML5 Browser each connect over HTTP to the Streaming Linked Data Server; components shown: Stream Bus, Adapter, Recorder, Re-player, Analyser, Decorator, Publisher, Visualizer]
Marco Balduini, Emanuele Della Valle, Daniele Dell'Aglio, Mikalai Tsytsarau, Themis Palpanas, Cristian Confalonieri: Social Listening of City Scale Events Using the Streaming Linked Data Framework. International Semantic Web Conference (2) 2013: 1-16
64. Practical cases
• 10+ deployments in Sensor Networks & Social media analytics, e.g.
  – BOTTARI: winner of the Semantic Web Challenge 2011
  – City Data Fusion: winner of an IBM faculty award 2013
  – Social Listener
M. Balduini, I. Celino, D. Dell'Aglio, E. Della Valle, Y. Huang, T. Lee, S.-H. Kim, V. Tresp: BOTTARI: An augmented reality mobile application to deliver personalized and location-based recommendations by continuous analysis of social media streams. J. Web Sem. 16: 33-41 (2012)
M. Balduini, E. Della Valle, M. Azzi, R. Larcher, F. Antonelli, and P. Ciuccarelli: CitySensing: Fusing City Data for Visual Storytelling. IEEE MultiMedia 22(3): 44-53 (2015)
65. Findings
1. The Semantic Web stack can be extended so as to incorporate streaming data as a first class citizen
  – RDF stream data model
  – Continuous SPARQL syntax and semantics
  – Continuous deductive reasoning semantics
2. The Stream Reasoning task is feasible, and the very nature of streaming data offers opportunities to optimise reasoning tasks where data is ordered by recency and can be forgotten after a while
  – IMaRS continuous incremental reasoning algorithm
  – C-SPARQL Engine prototype
3. A combination of deductive and inductive stream reasoning techniques can cope with incomplete and noisy data
4. There are application domains where Stream Reasoning offers an adequate solution
66. Open issues
1. The Semantic Web stack can be extended
  – "Navigating the Chasm between the Scylla of Practical Applications and the Charybdis of Theoretical Approaches", A. Bernstein, 2015
2. The Stream Reasoning task is feasible
  – It's time to start removing assumptions
    • knowledge does not change
    • background data does not change
  – OBDA for SQL ≠ OBDA for continuous querying
3. Stream reasoning can cope with incomplete and noisy data
  – Theory is needed!
4. There are application domains where Stream Reasoning offers an adequate solution
  – Rigorous quantitative comparative research is needed
67. Advertisements :-P
• Check out my PhD thesis
  – http://dare.ubvu.vu.nl/handle/1871/53293
  – Chapter 1: Introduction
    • The content of this presentation
  – Chapter 8: Conclusions
    • A review of stream reasoning approaches updated in spring 2015
• Put an "I like" to Stream Reasoning on Facebook
  – https://www.facebook.com/streamreasoning
68. Thank you! Any questions?
Emanuele Della Valle
DEIB - Politecnico di Milano
emanuele.dellavalle@polimi.it
http://emanueledellavalle.org
University of Oslo, Norway - 3.11.2015