1. Managing Completeness of Web Data
Fariz Darari
PhD Supervisor: Werner Nutt
Supported by the project MAGIC, funded by the province of Bolzano
Fariz Darari (unibz) Managing Completeness of Web Data Oct 20, 2015 1 / 38
6. Completeness statements are already there
7. However ...
Completeness statements are available
but only in natural language
Unclear what data completeness & query completeness mean
No techniques to check whether data completeness entails
query completeness
10. Solution Ideas
Completeness statements are available
but only in natural language
Solution: RDF-ize completeness statements
Unclear what data completeness & query completeness mean
Solution: Formalize data completeness & query completeness
No techniques to check whether data completeness entails
query completeness
Solution: Develop techniques to check whether data completeness
entails query completeness
14. Story: Incomplete Data Source
An incomplete data source of Reservoir Dogs,
$G_{dbp} = (G^a_{dbp}, G^i_{dbp})$:
$G^a_{dbp} = \{(resDogs, dir, tarantino)\}$
15. Story: Incomplete Data Source
An incomplete data source of Reservoir Dogs,
$G_{dbp} = (G^a_{dbp}, G^i_{dbp})$:
$G^i_{dbp} = \{(resDogs, dir, tarantino), (resDogs, act, tarantino)\}$
17. Story: Completeness Statement
$G^a_{dbp} = \{(resDogs, dir, tarantino)\}$
$G^i_{dbp} = \{(resDogs, dir, tarantino), (resDogs, act, tarantino)\}$
From $(G^a_{dbp}, G^i_{dbp})$, we can say that DBpedia is complete
for movies directed by Tarantino:
$C_{dir} = Compl((?m, dir, tarantino) \mid \emptyset)$
However, it is not complete for actors in movies directed by Tarantino,
so the following statement does not hold:
$C_{act} = Compl((?m, act, ?a) \mid (?m, dir, tarantino))$
18. Story: Query Completeness
$G^a_{dbp} = \{(resDogs, dir, tarantino)\}$
$G^i_{dbp} = \{(resDogs, dir, tarantino), (resDogs, act, tarantino)\}$
Consequently, when we ask for all movies directed by Tarantino
over DBpedia:
$Q_{dir} = (\{?m\}, \{(?m, dir, tarantino)\})$
the query completeness $Compl(Q_{dir})$ holds.
19. Story: Query Completeness
$G^a_{dbp} = \{(resDogs, dir, tarantino)\}$
$G^i_{dbp} = \{(resDogs, dir, tarantino), (resDogs, act, tarantino)\}$
However, if we ask for all movies directed by and starring Tarantino:
$Q_{dir+act} = (\{?m\}, \{(?m, dir, tarantino), (?m, act, tarantino)\})$
the query completeness $Compl(Q_{dir+act})$ does not hold.
20. Incomplete Data Source
Definition (Incomplete Data Source)
An incomplete data source is a pair of graphs
$G = (G^a, G^i)$, where $G^a \subseteq G^i$.
We call $G^a$ the available graph and $G^i$ the ideal graph.
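
The definition can be sketched in a few lines of Python. This is only an illustration: the names `IncompleteDataSource` and `make_source` are hypothetical, and triples are modelled as plain string tuples rather than real RDF terms.

```python
from typing import FrozenSet, NamedTuple, Tuple

Triple = Tuple[str, str, str]


class IncompleteDataSource(NamedTuple):
    """A pair (available graph, ideal graph); hypothetical helper type."""
    available: FrozenSet[Triple]  # G^a
    ideal: FrozenSet[Triple]      # G^i


def make_source(available, ideal):
    """Build an incomplete data source, enforcing G^a ⊆ G^i."""
    available, ideal = frozenset(available), frozenset(ideal)
    if not available <= ideal:
        raise ValueError("the available graph must be a subset of the ideal graph")
    return IncompleteDataSource(available, ideal)


# The Reservoir Dogs example from the story slides.
g_dbp = make_source(
    available={("resDogs", "dir", "tarantino")},
    ideal={("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")},
)

# Triples that exist in the ideal world but are not in the available data.
missing = g_dbp.ideal - g_dbp.available
print(missing)  # {('resDogs', 'act', 'tarantino')}
```

In practice the ideal graph is not materialized anywhere; it is a conceptual device used to define what completeness means.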
21. Completeness Statement
Definition (Completeness Statement)
Let $P_1$ be a non-empty basic graph pattern (BGP) and $P_2$ a BGP.
A completeness statement is defined as
$Compl(P_1 \mid P_2)$
where we call $P_1$ the pattern and $P_2$ the condition of the statement.
22. Satisfaction of Completeness Statements
To a statement
$C = Compl(P_1 \mid P_2)$,
we associate the CONSTRUCT query
$Q_C = (P_1, P_1 \cup P_2)$.
Then, we say:
$C$ is satisfied by an incomplete data source $G = (G^a, G^i)$,
written $G \models C$, if
$\llbracket Q_C \rrbracket_{G^i} \subseteq G^a$.
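
The satisfaction check can be tried out with a small Python sketch. It is a toy under stated assumptions, not the authors' implementation: triples are string tuples, variables are strings starting with `?`, a statement is a pair of lists `(pattern, condition)`, and the helper names `unify`, `match`, `construct`, and `satisfies` are made up for this example.

```python
def unify(triple_pattern, triple, binding):
    """Extend `binding` so that `triple_pattern` maps onto `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(triple_pattern, triple):
        if term.startswith("?"):  # variable: bind it or check consistency
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:       # constant: must match exactly
            return None
    return extended


def match(pattern, graph):
    """Return every binding that maps all triple patterns of a BGP into `graph`."""
    bindings = [{}]
    for triple_pattern in pattern:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = unify(triple_pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def construct(template, body, graph):
    """Evaluate the CONSTRUCT query (template, body) over `graph`."""
    result = set()
    for binding in match(body, graph):
        for triple_pattern in template:
            result.add(tuple(binding.get(term, term) for term in triple_pattern))
    return result


def satisfies(statement, available, ideal):
    """Check whether Q_C evaluated over the ideal graph is contained in the available graph."""
    pattern, condition = statement
    return construct(pattern, pattern + condition, ideal) <= available


available = {("resDogs", "dir", "tarantino")}
ideal = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}

c_dir = ([("?m", "dir", "tarantino")], [])
c_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])

print(satisfies(c_dir, available, ideal))  # True
print(satisfies(c_act, available, ideal))  # False
```

The second check fails because the CONSTRUCT query for the actor statement produces (resDogs, act, tarantino) over the ideal graph, and that triple is absent from the available graph.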
23. Completeness Statements in RDF
$C_{act} = Compl((?m, act, ?a) \mid (?m, dir, tarantino))$
lv:dataset a void:Dataset ;
    c:hasComplStmt lv:csAct .
lv:csAct c:hasPattern [ c:subject [ c:varName "m" ] ;
                        c:predicate s:actor ;
                        c:object [ c:varName "a" ] ] ;
         c:hasCondition [ c:subject [ c:varName "m" ] ;
                          c:predicate s:director ;
                          c:object lmdb:Quentin_Tarantino ] .
24. Query Completeness
Definition (Query Completeness)
Let $Q$ be a query. We write
$Compl(Q)$
to say that $Q$ is complete.
An incomplete data source $G = (G^a, G^i)$ satisfies $Compl(Q)$,
written $G \models Compl(Q)$, if
$\llbracket Q \rrbracket_{G^i} = \llbracket Q \rrbracket_{G^a}$.
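
As a rough illustration of this definition, the sketch below evaluates a query over both graphs and compares the answers. Several things here are assumptions made for the example: a query is a pair (tuple of projected variables, list of triple patterns), answers are compared as multisets using `Counter`, and the helper names are hypothetical.

```python
from collections import Counter


def unify(triple_pattern, triple, binding):
    """Extend `binding` so that `triple_pattern` maps onto `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(triple_pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match(pattern, graph):
    """Return every binding that maps all triple patterns of a BGP into `graph`."""
    bindings = [{}]
    for triple_pattern in pattern:
        bindings = [
            extended
            for extended in (
                unify(triple_pattern, triple, binding)
                for binding in bindings
                for triple in graph
            )
            if extended is not None
        ]
    return bindings


def evaluate(query, graph):
    """Project each matching binding onto the query's variables (multiset of answers)."""
    variables, pattern = query
    return Counter(tuple(b[v] for v in variables) for b in match(pattern, graph))


def is_complete(query, available, ideal):
    """Compl(Q) holds if the answers over the ideal and the available graph coincide."""
    return evaluate(query, ideal) == evaluate(query, available)


available = {("resDogs", "dir", "tarantino")}
ideal = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}

q_dir = (("?m",), [("?m", "dir", "tarantino")])
q_dir_act = (("?m",), [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")])

print(is_complete(q_dir, available, ideal))      # True
print(is_complete(q_dir_act, available, ideal))  # False
```

This direct check needs the ideal graph, which is normally unknown; that is why the entailment problem is stated in terms of completeness statements instead.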
25. Completeness Entailment
Problem Definition (Completeness Entailment)
Let $\mathcal{C}$ be a set of completeness statements and $Q$ a query.
We say that $\mathcal{C}$ entails the completeness of $Q$, written
$\mathcal{C} \models Compl(Q)$,
if any incomplete data source satisfying $\mathcal{C}$ also satisfies $Compl(Q)$.
26. Intuition: Completeness Entailment
Consider the set $\mathcal{C}_{dir,act} = \{C_{dir}, C_{act}\}$ of completeness statements
and the query $Q_{dir+act} = (\{?m\}, P_{dir+act})$ where
$P_{dir+act} = \{(?m, dir, tarantino), (?m, act, tarantino)\}$.
30. Intuition: Completeness Entailment
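Consider the set $\mathcal{C}_{dir,act} = \{C_{dir}, C_{act}\}$ of completeness statements
and the query $Q_{dir+act} = (\{?m\}, P_{dir+act})$.
$\tilde{P}_{dir+act} = \{(\tilde{m}, dir, tarantino), (\tilde{m}, act, tarantino)\}$
Therefore,
$\llbracket Q_{C_{dir}} \rrbracket_{\tilde{P}_{dir+act}} \cup \llbracket Q_{C_{act}} \rrbracket_{\tilde{P}_{dir+act}}
= \{(\tilde{m}, dir, tarantino), (\tilde{m}, act, tarantino)\}
= \tilde{P}_{dir+act}$.
Thus,
$\mathcal{C}_{dir,act} \models Compl(Q_{dir+act})$.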
31. Prototypical Graph
$\tilde{P}_{dir+act} = \{(\tilde{m}, dir, tarantino), (\tilde{m}, act, tarantino)\}$
Definition (Prototypical Graph)
Let $Q = (W, P)$ be a query.
The freeze mapping $\tilde{id}$ is defined as a mapping
from each variable $?v$ in $P$ to a new IRI $\tilde{v}$.
Instantiating the graph pattern $P$ with $\tilde{id}$ yields the graph
$\tilde{P} := \tilde{id}\,P$,
which we call the prototypical graph of $Q$.
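
A minimal sketch of the freeze mapping in Python follows. The encoding is an assumption made for illustration: variables are strings starting with `?`, and the fresh IRI for `?v` is written as `~v`, which presumes that no constant in the data already starts with `~`. The function names are hypothetical.

```python
def freeze_term(term):
    """Map a variable ?v to the fresh constant ~v; leave constants untouched."""
    return "~" + term[1:] if term.startswith("?") else term


def prototypical_graph(pattern):
    """Instantiate every triple pattern of the BGP with the freeze mapping."""
    return {tuple(freeze_term(term) for term in triple) for triple in pattern}


p_dir_act = [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")]
frozen = prototypical_graph(p_dir_act)
print(sorted(frozen))
# [('~m', 'act', 'tarantino'), ('~m', 'dir', 'tarantino')]
```

The result is an ordinary graph without variables, so queries can be evaluated over it like over any other data.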
32. Transfer Operator
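$\llbracket Q_{C_{dir}} \rrbracket_{\tilde{P}_{dir+act}} \cup \llbracket Q_{C_{act}} \rrbracket_{\tilde{P}_{dir+act}}$
Definition (Transfer Operator)
For any set $\mathcal{C}$ of completeness statements and a graph $G$,
we define the transfer operator $T_{\mathcal{C}}$ that computes the union
of the evaluation over $G$ of all CONSTRUCT queries
of the statements in $\mathcal{C}$:
$T_{\mathcal{C}}(G) = \bigcup_{C \in \mathcal{C}} \llbracket Q_C \rrbracket_G$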
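
The transfer operator can be sketched as a union over CONSTRUCT evaluations. The code below is a toy version with assumed encodings (string-tuple triples, `?`-prefixed variables, statements as `(pattern, condition)` pairs of lists) and hypothetical helper names. It applies the operator to the ideal graph of the Reservoir Dogs story, which yields the triples that the statements require to be present in the available graph.

```python
def unify(triple_pattern, triple, binding):
    """Extend `binding` so that `triple_pattern` maps onto `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(triple_pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match(pattern, graph):
    """Return every binding that maps all triple patterns of a BGP into `graph`."""
    bindings = [{}]
    for triple_pattern in pattern:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = unify(triple_pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def construct(template, body, graph):
    """Evaluate the CONSTRUCT query (template, body) over `graph`."""
    return {
        tuple(binding.get(term, term) for term in triple_pattern)
        for binding in match(body, graph)
        for triple_pattern in template
    }


def transfer(statements, graph):
    """T_C(G): union of Q_C evaluated over G, for every statement C."""
    result = set()
    for pattern, condition in statements:
        result |= construct(pattern, pattern + condition, graph)
    return result


c_dir = ([("?m", "dir", "tarantino")], [])
c_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])

available = {("resDogs", "dir", "tarantino")}
ideal = {("resDogs", "dir", "tarantino"), ("resDogs", "act", "tarantino")}

required = transfer([c_dir, c_act], ideal)
print(required - available)  # {('resDogs', 'act', 'tarantino')}
```

The non-empty difference shows that the story's data source does not satisfy both statements at once: the actor triple would have to be available.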
33. Completeness Entailment Theorem
$\tilde{P}_{dir+act} = T_{\mathcal{C}_{dir,act}}(\tilde{P}_{dir+act})$
Theorem (Completeness of Basic Queries)
Let $\mathcal{C}$ be a set of completeness statements and
$Q = (W, P)$ a basic query. Then,
$\mathcal{C} \models Compl(Q)$ if and only if $\tilde{P} = T_{\mathcal{C}}(\tilde{P})$.
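
The theorem translates into a short decision procedure: freeze the query pattern, apply the transfer operator, and compare. The following Python sketch illustrates it for basic queries only. The encodings are assumptions (string-tuple triples, `?`-prefixed variables, `~`-prefixed frozen constants, statements as `(pattern, condition)` pairs of lists), and all function names are hypothetical.

```python
def unify(triple_pattern, triple, binding):
    """Extend `binding` so that `triple_pattern` maps onto `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(triple_pattern, triple):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match(pattern, graph):
    """Return every binding that maps all triple patterns of a BGP into `graph`."""
    bindings = [{}]
    for triple_pattern in pattern:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = unify(triple_pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def transfer(statements, graph):
    """T_C(G): union of the CONSTRUCT queries (P1, P1 ∪ P2) evaluated over G."""
    result = set()
    for pattern, condition in statements:
        for binding in match(pattern + condition, graph):
            for triple_pattern in pattern:
                result.add(tuple(binding.get(t, t) for t in triple_pattern))
    return result


def prototypical_graph(pattern):
    """Freeze each variable ?v into the fresh constant ~v."""
    return {
        tuple("~" + t[1:] if t.startswith("?") else t for t in triple)
        for triple in pattern
    }


def entails(statements, query_pattern):
    """Check the theorem's condition: the frozen pattern equals its transfer image."""
    frozen = prototypical_graph(query_pattern)
    return frozen == transfer(statements, frozen)


c_dir = ([("?m", "dir", "tarantino")], [])
c_act = ([("?m", "act", "?a")], [("?m", "dir", "tarantino")])
p_dir_act = [("?m", "dir", "tarantino"), ("?m", "act", "tarantino")]

print(entails([c_dir, c_act], p_dir_act))  # True
print(entails([c_dir], p_dir_act))         # False
```

With only the director statement, the transfer image lacks the frozen actor triple, so completeness of the query is not entailed.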
34. Query Class: DISTINCT Queries
Give us all Oscar-winning things:
$Q_{awd} = (W_{awd}, P_{awd})^d =
(\{?m\}, \{(?m, award, oscar), (?m, award, ?aw)\})^d$
Complete for all Oscar-winning things:
$C_{os} = Compl((?m, award, oscar) \mid \emptyset)$
Does $\{C_{os}\} \models Compl(Q_{awd})$ hold?
35. Query Class: OPT Queries
Give us all movies, and their awards, if any:
$Q_{maw} = (\{?m, ?aw\}, ((?m, a, Movie) \text{ OPT } (?m, award, ?aw)))$
Complete for all movies and their awards:
$C_{aw} = Compl((?m, a, Movie), (?m, award, ?aw) \mid \emptyset)$
Does $\{C_{aw}\} \models Compl(Q_{maw})$ hold?
36. Query Class: Queries under RDFS Semantics
Give us all films:
$Q_{film} = (\{?m\}, \{(?m, a, Film)\})$
Complete for all movies:
$C_{movie} = Compl((?m, a, Movie) \mid \emptyset)$
Films are the same as movies:
$S_{fm} = \{(Film, subclass, Movie), (Movie, subclass, Film)\}$
Does $\{C_{movie}\} \models Compl(Q_{film})$ hold with respect to $S_{fm}$?
40. Conclusions
Completeness statements can now be represented in RDF
We know how completeness statements can entail query
completeness in different query classes and
different settings of completeness statements
41. Future Work
Completeness statements for queries with negation
Completeness statements as session annotations
for RDF streams
Statistical completeness reasoning
42. Publications
Fariz Darari, Werner Nutt, Giuseppe Pirrò, Simon Razniewski: Completeness
Statements about RDF Data Sources and Their Use for Query Answering.
ISWC 2013.
Fariz Darari, Radityo Eko Prasojo, Werner Nutt: CORNER: A Completeness
Reasoner for SPARQL Queries Over RDF Data Sources. ESWC Posters and
Demos 2014.
Fariz Darari, Simon Razniewski, Werner Nutt: Bridging the Semantic Gap
between RDF and SPARQL using Completeness Statements. ISWC Posters
and Demos 2014.
Fariz Darari, Radityo Eko Prasojo, Werner Nutt: Expressing No-Value
Information in RDF. ISWC Posters and Demos 2015.
The latest results (timestamped statements and efficient completeness
reasoning with 1 million statements) have been submitted to a journal.