RSP-QL*: Querying Data-Level Annotations in RDF Streams
1. RSP-QL*: Querying Data-Level Annotations in RDF Streams
Robin Keskisärkkä
April 17, 2019
Department of Computer and Information Science
Linköping University
3. Semantic Web
• Linked Data
• Builds upon standardized web technologies, e.g., RDF, SPARQL, and OWL
• RDF Stream Processing
• A growing interest in processing streaming RDF data
• RSP W3C Community Group
5. Resource Description Framework (RDF)
An RDF triple is a tuple ⟨s, p, o⟩ ∈ (I ∪ B) × I × (I ∪ B ∪ L), where I
(IRIs), B (blank nodes), and L (literals) are pairwise disjoint sets. An
RDF graph is a set of RDF triples.
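The position constraints in this definition can be checked mechanically. The following is a minimal Python sketch under our own assumptions: the term kinds are modeled as hypothetical classes `IRI`, `BNode`, and `Literal` (one class per set, so the sets are pairwise disjoint by construction), and literal datatypes and language tags are ignored.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical term classes: one class per set keeps I, B and L disjoint.
# Literal datatypes and language tags are deliberately ignored.
@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


Term = Union[IRI, BNode, Literal]


def is_rdf_triple(s: Term, p: Term, o: Term) -> bool:
    """True iff <s, p, o> is in (I ∪ B) × I × (I ∪ B ∪ L)."""
    return (
        isinstance(s, (IRI, BNode))
        and isinstance(p, IRI)
        and isinstance(o, (IRI, BNode, Literal))
    )


# An RDF graph is a set of triples (frozen dataclasses are hashable).
graph = {
    (IRI("Jack"), IRI("knows"), IRI("Joe")),
    (IRI("Jack"), IRI("hasIncome"), Literal("High")),
}
```

Literals are rejected in subject position and blank nodes in predicate position, exactly as the Cartesian product prescribes.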
6. RDF: Example
<Jack> a <Person> .
<Jack> <knows> <Joe> .
<Joe> a <Person> .
<Joe> <knows> <Jane> .
<Jane> a <Person> .
<Jack> <hasIncome> "High" .
<Jane> <hasIncome> "High" .
7. RDF: Named Graphs
A named RDF graph is a pair (n, G), where n is the graph name and G
is an RDF graph. An RDF dataset consists of a default graph and zero
or more named graphs: $D = \{G_0, (n_1, G_1), (n_2, G_2), \ldots, (n_i, G_i)\}$
8. SPARQL
• SPARQL 1.1 is the standard query language for RDF data
• Provides an SQL-like syntax
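• Based on basic graph patterns (BGPs): sets of triple patterns that
look like Turtle statements, except that any RDF term can be replaced
by a variable. A BGP is matched against an RDF graph by finding the
variable bindings under which every triple pattern matches a triple
in the graph.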
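Matching a BGP amounts to extending variable bindings one triple pattern at a time. The sketch below is a naive nested-loop matcher for illustration only, not how a SPARQL engine is implemented; terms are plain strings, and the convention that variables start with `?`, as well as the helper names `match_triple` and `eval_bgp`, are our own assumptions.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumption: variables are written as strings starting with "?".
    return term.startswith("?")


def match_triple(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # setdefault binds a fresh variable; an already bound one must agree.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def eval_bgp(bgp: List[Triple], graph: Set[Triple]) -> List[Binding]:
    """All bindings under which every triple pattern of the BGP matches `graph`."""
    solutions: List[Binding] = [{}]
    for pattern in bgp:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match_triple(pattern, triple, binding)) is not None
        ]
    return solutions


graph = {
    ("Jack", "a", "Person"), ("Jack", "knows", "Joe"),
    ("Joe", "a", "Person"), ("Joe", "knows", "Jane"),
    ("Jane", "a", "Person"),
}
# Which persons know someone, and whom?
solutions = eval_bgp([("?x", "a", "Person"), ("?x", "knows", "?y")], graph)
```

The shared variable `?x` acts as a join between the two triple patterns: a binding survives only if the same value satisfies both.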
9. SPARQL: Example
PREFIX : <http://example.org#>
SELECT ?person
FROM <http://my/base/data>
FROM NAMED <http://my/namedgraph/>
WHERE {
?person a :Person .
GRAPH <http://my/namedgraph/> {
?person :knows :Jack .
}
}
10. RDF Stream Processing (RSP)
• Growing interest in processing and querying streaming RDF data
• Several RSP models have been proposed over the last decade
• CQELS
• C-SPARQL
• SPARQLstream
• ...
• Temporal windows over streams
• RSP-QL is a community effort toward standardizing the semantics
of RDF streams as well as the query model and syntax¹
¹ https://w3id.org/rsp
12. Statement-Level Metadata in RDF
• We often need to attach some metadata to our data to capture
aspects such as:
• uncertainty, probability, errors, distribution parameters,
provenance information, descriptions, classifications, temporal
information, ...
• This can be managed on the statement level in RDF (e.g., using
RDF reification), but not in a straightforward way
• For streaming scenarios, the representation needs to be efficient
15. An alternative approach: RDF* and SPARQL*
An RDF* triple is recursively defined as follows:
• any RDF triple is an RDF* triple; and
• a triple with an RDF* triple in the subject or object position is an
RDF* triple.
A BGP* extends the notion of BGPs by adding the possibility to
recursively nest triple patterns in the subject or object position.
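The recursive definition of an RDF* triple translates directly into a recursive check. In this minimal sketch the encoding is our own assumption, not part of RDF*: terms are strings, literals start with a double quote, blank nodes start with `_:`, everything else counts as an IRI, and an embedded triple is a Python 3-tuple.

```python
from typing import Union

# Hypothetical encoding: every term is a string, except an embedded
# (nested) triple, which is a Python 3-tuple. Literals start with a double
# quote and blank nodes with "_:"; any other string is treated as an IRI.
Term = Union[str, tuple]


def is_literal(term: Term) -> bool:
    return isinstance(term, str) and term.startswith('"')


def is_iri(term: Term) -> bool:
    return isinstance(term, str) and not is_literal(term) and not term.startswith("_:")


def is_rdf_star_triple(t: Term) -> bool:
    """An RDF triple, or a triple with an RDF* triple as subject and/or object."""
    if not (isinstance(t, tuple) and len(t) == 3):
        return False
    s, p, o = t
    if not is_iri(p):  # nesting is only allowed in subject or object position
        return False
    if isinstance(s, tuple):
        subject_ok = is_rdf_star_triple(s)
    else:
        subject_ok = isinstance(s, str) and not is_literal(s)
    object_ok = is_rdf_star_triple(o) if isinstance(o, tuple) else isinstance(o, str)
    return subject_ok and object_ok


# <<:bob foaf:age 23>> dc:creator :crawler1
annotated = ((":bob", "foaf:age", '"23"'), "dc:creator", ":crawler1")
```

Because the check recurses on nested tuples, arbitrarily deep nesting is accepted, while a nested triple in predicate position is rejected.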
16. An alternative approach: RDF* and SPARQL*
<<:bob foaf:age 23>> dc:creator :crawler1 ;
dc:source <http://wiki/text.html> .
SELECT ?person ?age ?src
WHERE {
<<?person foaf:age ?age>> dc:source ?src .
}
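The SPARQL* query above can be answered by matching its nested triple pattern recursively against each asserted triple. The sketch below assumes a hypothetical encoding (terms as strings, embedded triples as 3-tuples, variables starting with `?`) and a hypothetical `unify` helper; it handles a single triple* pattern, whereas a full BGP* would chain bindings across several patterns.

```python
from typing import Dict, List, Optional, Union

# Hypothetical encoding: terms are strings, embedded triples are 3-tuples,
# and variables are strings starting with "?".
Term = Union[str, tuple]
Binding = Dict[str, Term]


def unify(pattern: Term, term: Term, binding: Binding) -> Optional[Binding]:
    """Match a (possibly nested) triple* pattern against a (possibly nested) term."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in binding:
            return binding if binding[pattern] == term else None
        return {**binding, pattern: term}  # a variable may bind to a whole triple
    if isinstance(pattern, tuple):
        if not (isinstance(term, tuple) and len(term) == len(pattern)):
            return None
        for p_part, t_part in zip(pattern, term):
            binding = unify(p_part, t_part, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == term else None  # constants must be equal


# The two asserted triples from the RDF* example.
graph = {
    ((":bob", "foaf:age", '"23"'), "dc:creator", ":crawler1"),
    ((":bob", "foaf:age", '"23"'), "dc:source", "<http://wiki/text.html>"),
}
query = (("?person", "foaf:age", "?age"), "dc:source", "?src")
solutions: List[Binding] = [
    b for t in graph if (b := unify(query, t, {})) is not None
]
```

Only the `dc:source` annotation matches, binding `?person`, `?age`, and `?src` from inside and outside the embedded triple in a single step.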
19. RDF* Streams
A timestamped RDF* graph $TG^*$ is an RDF* dataset that contains:
(i) exactly one named graph $\langle n, G^* \rangle$, and
(ii) at least one regular RDF triple $\langle n, p, t \rangle$ in the default graph, where
$p$ is a timestamp predicate and $t$ is a timestamp.
20. RDF* Stream: Example of an element in a stream
_:g1 {
<<:bob foaf:age 23>> dc:creator :crawler1 ;
dc:source <http://wiki/text.html> .
}
_:g1 prov:generatedAtTime "2019-04-17T09:10:01Z"^^xsd:dateTime .
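The two conditions of the definition can be checked on a single stream element such as the one above. This is a minimal sketch under stated assumptions: the `Dataset` class and `is_timestamped_graph` function are hypothetical, terms use a string/tuple encoding, and the set of timestamp predicates is treated as a parameter (here only `prov:generatedAtTime`, as in the example).

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Assumption: which predicates count as timestamp predicates is a parameter;
# prov:generatedAtTime is the one used in the example element.
TIMESTAMP_PREDICATES = {"prov:generatedAtTime"}


@dataclass
class Dataset:
    default: Set[tuple] = field(default_factory=set)
    named: Dict[str, Set[tuple]] = field(default_factory=dict)  # name -> graph


def is_timestamped_graph(ds: Dataset) -> bool:
    """(i) exactly one named graph <n, G*>, and (ii) at least one regular
    (non-nested) triple <n, p, t> in the default graph, p a timestamp predicate."""
    if len(ds.named) != 1:
        return False
    (n,) = ds.named  # the single graph name
    return any(
        s == n and p in TIMESTAMP_PREDICATES and not isinstance(t, tuple)
        for (s, p, t) in ds.default
    )


# The example stream element in a string/tuple encoding.
element = Dataset(
    default={("_:g1", "prov:generatedAtTime", '"2019-04-17T09:10:01Z"^^xsd:dateTime')},
    named={"_:g1": {
        ((":bob", "foaf:age", '"23"'), "dc:creator", ":crawler1"),
        ((":bob", "foaf:age", '"23"'), "dc:source", "<http://wiki/text.html>"),
    }},
)
```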
21. RDF* Streams
An RDF* stream is a potentially unbounded sequence of timestamped
RDF* graphs, ordered by their associated timestamps in non-decreasing order:
$S = ((t_1, TG^*_1), (t_2, TG^*_2), (t_3, TG^*_3), \ldots)$, such that $t_i \le t_{i+1}$.
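Since a stream may be unbounded, the ordering constraint can only be enforced incrementally. A minimal Python sketch with a hypothetical generator `check_order`: it passes elements through lazily, remembers only the previous timestamp, and allows equal timestamps, matching $t_i \le t_{i+1}$.

```python
from datetime import datetime
from typing import Any, Iterable, Iterator, Optional, Tuple

# (t_i, TG*_i); the timestamped graph itself is treated as opaque here.
StreamElement = Tuple[datetime, Any]


def check_order(stream: Iterable[StreamElement]) -> Iterator[StreamElement]:
    """Lazily pass elements through, enforcing t_i <= t_{i+1}.

    Only the previous timestamp is remembered, so this also works on
    unbounded (infinite) iterables.
    """
    previous: Optional[datetime] = None
    for t, graph in stream:
        if previous is not None and t < previous:
            raise ValueError(f"out-of-order element: {t} < {previous}")
        previous = t
        yield t, graph
```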
22. Windows over streams
A time-based window $W$ is a function that returns a contiguous set
of timestamped RDF* graphs from an RDF* stream $S$ in a time interval
$(l, u]$ such that: $W(S, l, u) = \{TG^* \mid (t, TG^*) \in S \wedge t \in (l, u]\}$.
A time-varying dataset $TVD$ is a function that takes as input an RDF*
stream $S$, a window width $\alpha$, a slide parameter $\beta$, a start time $t_0$,
and a time instant $t$, and returns the time-based window $W(S, l, u)$, where
$u = \max\{t_0 + \alpha + \beta \times n \mid n \in \mathbb{N} \wedge t_0 + \alpha + \beta \times n < t\}$ and
$l = u - \alpha$.
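Both definitions can be sketched directly in Python. The function names `window` and `tvd_bounds` are hypothetical; we assume $\mathbb{N}$ includes 0 (so the first window closes at $t_0 + \alpha$) and compute the largest $n$ with exact integer `timedelta` arithmetic rather than floating point.

```python
from datetime import datetime, timedelta
from typing import Any, Iterable, List, Tuple

StreamElement = Tuple[datetime, Any]


def window(stream: Iterable[StreamElement], lower: datetime, upper: datetime) -> List[Any]:
    """W(S, l, u): the graphs whose timestamp t lies in (l, u]."""
    result = []
    for t, graph in stream:
        if t > upper:
            break  # S is ordered by timestamp, so no later element can qualify
        if t > lower:
            result.append(graph)
    return result


def tvd_bounds(alpha: timedelta, beta: timedelta, t0: datetime, t: datetime) -> Tuple[datetime, datetime]:
    """Return (l, u) with u = max{t0 + alpha + beta*n | n in N, t0 + alpha + beta*n < t}
    and l = u - alpha. Assumes N includes 0."""
    first_close = t0 + alpha
    if first_close >= t:
        raise ValueError("no window closes before t")
    # Largest integer n with n * beta < t - first_close (exact integer arithmetic).
    n, remainder = divmod(t - first_close, beta)
    if remainder == timedelta(0):
        n -= 1
    upper = first_close + beta * n
    return upper - alpha, upper
```

For example, with `[RANGE PT30S STEP PT5S]` ($\alpha$ = 30 s, $\beta$ = 5 s) and $t = t_0 + 47$ s, the most recent window is $(t_0 + 15\,\text{s}, t_0 + 45\,\text{s}]$.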
23. Core Fragment of the Semantics of RSP-QL*
RSP-QL* patterns are defined recursively as follows²:
1. Any BGP* is an RSP-QL* pattern.
2. If n ∈ (V ∪ I) and P is an RSP-QL* pattern, then (GRAPH n P) is an
RSP-QL* pattern.
3. If n ∈ (V ∪ I) and P is an RSP-QL* pattern, then (WINDOW n P) is
an RSP-QL* pattern.
4. If P₁ and P₂ are RSP-QL* patterns, then the expressions (P₁ AND
P₂), (P₁ OPT P₂), and (P₁ UNION P₂) are RSP-QL* patterns.
5. If P is a BGP* and R is an RSP-QL* built-in condition, then the
expression (P FILTER R) is an RSP-QL* pattern.
6. If tp is a triple* pattern and ?v is a variable, then (tp AS ?v) is an
RSP-QL* pattern.
² Pérez, J., Arenas, M. & Gutierrez, C. (2009). Semantics and Complexity of SPARQL. ACM
Trans. Database Syst., 34, 16:1–16:45. doi: 10.1145/1567274.1567278
24. Semantics of RSP-QL*
For every window $w$, we write $D(w)$ to denote the dataset that is the
merge of all timestamped graphs in the window:
$D(w) = (G, (n_1, G'_1), \ldots, (n_i, G'_i))$, assuming
$w = \{(G_1, (n_1, G'_1)), \ldots, (G_i, (n_i, G'_i))\}$ and
$G = \bigcup_{(G_{def}, (u, G')) \in w} G_{def}$
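$D(w)$ can be sketched as a fold over the window. In this hypothetical `dataset_of_window` helper each window item is `(G_def, (n, G'))`; as a stated simplification, plain set union stands in for an RDF merge, which is only equivalent when graph names and blank node labels are already distinct across the window's elements.

```python
from typing import Dict, List, Set, Tuple

Graph = Set[tuple]
# One timestamped graph in a window: (G_def, (n, G')).
WindowItem = Tuple[Graph, Tuple[str, Graph]]


def dataset_of_window(w: List[WindowItem]) -> Tuple[Graph, Dict[str, Graph]]:
    """D(w) = (G, (n_1, G'_1), ..., (n_i, G'_i)), with G the union of all G_def.

    Assumption: graph names and blank node labels are already distinct across
    the elements of w, so plain set union stands in for an RDF merge.
    """
    default: Graph = set()
    named: Dict[str, Graph] = {}
    for g_def, (name, g_named) in w:
        default |= g_def
        named[name] = g_named
    return default, named
```

Because timestamp triples live in each element's default graph, they all end up in $G$, which is what lets a query read `?g prov:generatedAtTime ?time` inside a window.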
25. Semantics of RSP-QL*
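The evaluation of a pattern $P$ over the dataset $D$ and a set of named
windows $W = \{(u_1, w_1), \ldots, (u_n, w_n)\}$, denoted by $\llbracket P \rrbracket^{D,W}_{G}$, can now be
defined as follows:
• If $P$ is (WINDOW u P′) and $(u, w) \in W$, then
$\llbracket P \rrbracket^{D,W}_{G} = \llbracket P' \rrbracket^{D',\emptyset}_{G'}$, where $D' = D(w)$ and $G'$ is the default graph of $D'$
• If $P$ is (WINDOW ?x P′), then
$\llbracket P \rrbracket^{D,W}_{G} = \bigcup_{(u,w) \in W} \llbracket (\text{WINDOW } u\ P') \rrbracket^{D,W}_{G}$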
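The two WINDOW rules can be mirrored in a small sketch. Everything here is an assumption for illustration: the evaluation of the inner pattern P′ is passed in as a hypothetical callback (`Evaluator`), solution mappings are frozensets so they can be collected with set semantics, and the named windows $W$ are a dict from window name to window contents. The outer $D$ and $G$ are not parameters, because the first rule replaces them with $D'$ and $G'$.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Graph = Set[tuple]
Dataset = Tuple[Graph, Dict[str, Graph]]        # (default graph, named graphs)
Window = List[Tuple[Graph, Tuple[str, Graph]]]  # timestamped graphs (G_def, (n, G'))
Mapping = FrozenSet[Tuple[str, str]]            # a solution mapping, hashable
# Hypothetical callback standing in for [[P']]^{D', ∅}_{G'}; not implemented here.
Evaluator = Callable[[Dataset, Graph], Set[Mapping]]


def dataset_of_window(w: Window) -> Dataset:
    """D(w): union of the default graphs plus every named graph (no blank-node renaming)."""
    default: Graph = set()
    named: Dict[str, Graph] = {}
    for g_def, (name, g_named) in w:
        default |= g_def
        named[name] = g_named
    return default, named


def eval_window_iri(u: str, inner: Evaluator, windows: Dict[str, Window]) -> Set[Mapping]:
    """(WINDOW u P'): evaluate P' over D' = D(w) with G' = default graph of D'.
    The inner evaluation receives no named windows (the empty set in the rule)."""
    d_prime = dataset_of_window(windows[u])
    return inner(d_prime, d_prime[0])


def eval_window_var(inner: Evaluator, windows: Dict[str, Window]) -> Set[Mapping]:
    """(WINDOW ?x P'): union of (WINDOW u P') over all (u, w) in W."""
    result: Set[Mapping] = set()
    for u in windows:
        result |= eval_window_iri(u, inner, windows)
    return result


# Example P': match ?s :p ?o in the active graph, keeping only ?s.
def subjects_with_p(dataset: Dataset, graph: Graph) -> Set[Mapping]:
    return {frozenset({("?s", s)}) for (s, p, _o) in graph if p == ":p"}
```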
26. RSP-QL*: Example
REGISTER STREAM <http://my/trusted/temp-stream>
SELECT ?age ?time ?src
FROM <http://my/base/data>
FROM NAMED WINDOW :w ON <http://temp-stream> [RANGE PT30S STEP PT5S]
WHERE {
?src a :ValidatedSource .
WINDOW :w {
GRAPH ?g {
<<:bob foaf:age ?age>> dc:source ?src .
}
?g prov:generatedAtTime ?time .
}
}
27. Summary
• Provides an efficient and easy-to-read alternative for representing
statement-level metadata
• Leverages the concepts introduced in RDF* and SPARQL* for
RSP-QL
• The semantics of RSP-QL can be extended to support RSP-QL* in
a straightforward way
28. Future Work
• Provide additional functionality on top of the RDF* model
(e.g., define operators for combining values under a specific
uncertainty model)
• Prototype based on RSP-QL* semantics
• Evaluation of performance