OXFORD 2013, Presentation on the query rewriting approach taken in ontop/Quest. Separating reasoning with respect to hierarchies and existential constants using mapping transformation techniques and a specialised query rewriting algorithm
OXFORD'13 Optimising OWL 2 QL query rewriting

Ontop at Work
Mariano Rodríguez-Muro (1), Roman Kontchakov (2), Michael Zakharyaschev (2)
(1) Faculty of Computer Science, Free University of Bozen-Bolzano, Italy
(2) Department of Computer Science and Information Systems,
Birkbeck, University of London, U.K.
May 22nd, 2013

OBDA: What is it?
Loosely speaking: using ontologies to access data.
[Figure: OBDA system overview with the labels User, Query, Ontology (TBox),
Mappings, (Virtual) ABox, OBDA System, RDBMS, Data source]
Our focus is on OWL 2 QL ontologies, since they are tailored to
handle very large amounts of data by means of query rewriting
techniques.

Query Answering by Query Rewriting
Objective: given a query Q over the ontology T, derive a query Q′
over the database D that preserves the semantics of T.
Consider a TBox T:
Movie ≡ ∃title, Movie ⊑ ∃year,
Movie ≡ ∃cast, ∃cast⁻ ⊑ Person,
Actor ⊑ Person, Actress ⊑ Person,
Producer ⊑ Person, Director ⊑ Person,
Writer ⊑ Person, Editor ⊑ Person.

Example
The database D: two DB relations title[m, t, y] and
castinfo[p, m, r].
The mapping M (logical form, think R2RML):
Movie(m) ← title(m, t, y), title(m, t) ← title(m, t, y),
year(m, y) ← title(m, t, y), cast(m, p) ← castinfo(p, m, r),
Person(p) ← castinfo(p, m, r),
Actor(p) ← castinfo(p, m, "c1"), · · ·
Editor(p) ← castinfo(p, m, "c6").

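To make the mapping M concrete, here is a minimal Python sketch that applies its rules to a toy instance of D. The sample rows and the helper name virtual_abox are assumptions made for illustration; only the shapes title[m, t, y] and castinfo[p, m, r] and the codes "c1" (Actor) and "c6" (Editor) come from the slide, and the rules elided there are left out. In an OBDA system the ABox stays virtual; it is materialised here only to show what the mapping means.

```python
# Toy instance of the two relations (rows are invented for illustration).
title = [(1, "Alien", 1979), (2, "Heat", 1995)]   # title[m, t, y]
castinfo = [(10, 1, "c1"), (11, 2, "c6")]         # castinfo[p, m, r]

def virtual_abox(title, castinfo):
    """Apply the mapping rules of M and return the induced ABox assertions."""
    abox = set()
    for m, t, y in title:
        abox.add(("Movie", m))        # Movie(m)    <- title(m, t, y)
        abox.add(("title", m, t))     # title(m, t) <- title(m, t, y)
        abox.add(("year", m, y))      # year(m, y)  <- title(m, t, y)
    for p, m, r in castinfo:
        abox.add(("cast", m, p))      # cast(m, p)  <- castinfo(p, m, r)
        abox.add(("Person", p))       # Person(p)   <- castinfo(p, m, r)
        if r == "c1":
            abox.add(("Actor", p))    # Actor(p)    <- castinfo(p, m, "c1")
        if r == "c6":
            abox.add(("Editor", p))   # Editor(p)   <- castinfo(p, m, "c6")
    return abox

abox = virtual_abox(title, castinfo)
```

Each ontology assertion is derived from exactly one source row, which is what later allows queries over the ontology to be unfolded into SQL over title and castinfo.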
The classic OBDA architecture
[Figure: pipeline diagram with the nodes CQ q, ontology T, FO q′, mapping,
SQL, data D and ABox A, connected by the steps rewriting, unfolding and
ABox virtualisation]
Stages in the classic OBDA approach:
– Rewriting w.r.t. T,
– Unfolding w.r.t. M,
– Execution over D.
Unfolding and mappings are ignored in most OBDA literature.

Example: Rewriting
Rewriting the query Q
q(x) ← Person(x)
w.r.t. T gives the rewriting
q(x) ← Person(x)
q(x) ← cast(z, x)
q(x) ← Actor(x)
. . .
q(x) ← Editor(x)

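The rewriting above can be reproduced with a short sketch. It is not the algorithm used by any particular rewriter: it only enumerates the inclusions of the example TBox whose right-hand side is Person and emits one CQ per inclusion. The list name SUBSUMED_BY_PERSON and the function name are assumptions made for this illustration.

```python
# Inclusions of T that entail Person, as (predicate, kind) pairs.
# "class" stands for A ⊑ Person, "range" for ∃R⁻ ⊑ Person.
SUBSUMED_BY_PERSON = [
    ("cast", "range"),
    ("Actor", "class"), ("Actress", "class"), ("Producer", "class"),
    ("Director", "class"), ("Writer", "class"), ("Editor", "class"),
]

def rewrite_person_query():
    """Return the UCQ rewriting of q(x) <- Person(x) as a list of strings."""
    ucq = ["q(x) <- Person(x)"]
    for pred, kind in SUBSUMED_BY_PERSON:
        if kind == "class":
            ucq.append(f"q(x) <- {pred}(x)")
        else:
            # Range restriction: x is the second argument of the role.
            ucq.append(f"q(x) <- {pred}(z, x)")
    return ucq

rewriting = rewrite_person_query()
```

The result has eight CQs, one for Person itself and one for each of the seven inclusions, which matches the eight queries that appear after unfolding.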
Example: Unfolding
Unfolding the rewriting of the query Q
q(x) ← Person(x)
w.r.t. M gives the unfolding
q(x1) ← castinfo(x1, m, r)
q(x2) ← castinfo(x2, m, r)
q(x3) ← castinfo(x3, m, "c1")
. . .
q(x8) ← castinfo(x8, m, "c6")

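As a rough illustration of the unfolding step, the sketch below replaces each atomic query by the SQL of its mapping rule and glues the results together with UNION. The dictionary MAPPING, the function name unfold and the sample rows are assumptions for this example; only the Actor ("c1") and Editor ("c6") codes are taken from the slides, so the elided rules are omitted. The standard sqlite3 module is used just to check that the generated SQL runs.

```python
import sqlite3

# Mapping M as one SQL fragment per ontology predicate (answer column x).
MAPPING = {
    "Person": "SELECT p AS x FROM castinfo",
    "cast":   "SELECT p AS x FROM castinfo",  # x is the range of cast
    "Actor":  "SELECT p AS x FROM castinfo WHERE r = 'c1'",
    "Editor": "SELECT p AS x FROM castinfo WHERE r = 'c6'",
}

def unfold(predicates):
    """Unfold a union of atomic queries q(x) <- P(x) into one SQL string."""
    return "\nUNION\n".join(MAPPING[p] for p in predicates)

sql = unfold(["Person", "cast", "Actor", "Editor"])

# Run the unfolded query over a toy castinfo table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE castinfo (p INTEGER, m INTEGER, r TEXT)")
conn.executemany(
    "INSERT INTO castinfo VALUES (?, ?, ?)",
    [(10, 1, "c1"), (11, 2, "c6"), (12, 2, "c3")],
)
rows = sorted(conn.execute(sql).fetchall())
```

Every branch of the UNION scans castinfo again, although the first branch alone already returns all answers.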
Issues
The issues with these rewritings are:
– Large size (n1 ∗ … ∗ nk)
– Largely redundant (w.r.t. query containment)
In the literature we find two solutions:
– Encoding the rewriting as a Datalog program. For example,
given the query:
q(x, y) ← Person(x), Person(y), cast(m, x), cast(m, z)
we generate the rewriting:
q(x, y) ← Person(x), Person(y), cast(m, x), cast(m, z)
Person(x) ← cast(m, x)
Person(x) ← Actor(x)
. . .
Person(x) ← Editor(x)

But...
The Datalog program still needs to be unfolded into an SQL query.
There are two choices here:
– Generate SQL queries with nested UNIONs. Very bad for
performance.
– Expand into a UCQ. Back to square one.

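The "back to square one" remark can be checked by counting. The sketch below expands the Datalog program for the two-Person query into a UCQ by choosing, independently for each Person atom, one of its definitions. It assumes eight alternatives per Person atom (the atom itself plus the seven Datalog rules of the example); the function names are made up for this illustration.

```python
from itertools import product

def person_alternatives(v):
    """Ways to derive Person(v): the atom itself plus the seven rules."""
    subclasses = ["Actor", "Actress", "Producer",
                  "Director", "Writer", "Editor"]
    return [f"Person({v})", f"cast(_, {v})"] + [f"{c}({v})" for c in subclasses]

def expand_to_ucq():
    """Expand q(x, y) <- Person(x), Person(y), cast(m, x), cast(m, z)."""
    fixed = "cast(m, x), cast(m, z)"
    return [
        f"q(x, y) <- {px}, {py}, {fixed}"
        for px, py in product(person_alternatives("x"),
                              person_alternatives("y"))
    ]

ucq = expand_to_ucq()
```

With two Person atoms and eight alternatives each, the expansion contains 8 × 8 = 64 CQs, which is the product-shaped growth listed under "Large size".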
Issues (cont.)
– Using query containment to clean the output. For example,
to detect that this:
q(x1) ← castinfo(x1, m, r)
q(x2) ← castinfo(x2, m, r)
q(x3) ← castinfo(x3, m, "c1")
. . .
q(x8) ← castinfo(x8, m, "c6")
can be simplified to
q(x1) ← castinfo(x1, m, r)

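The simplification above relies on the classical homomorphism test for CQ containment: q1 is contained in q2 if q2 can be mapped into q1 so that heads and atoms are preserved. The following is a minimal sketch of that test and of a pruning loop built on it. The data representation (a CQ as a pair of head variables and body atoms, constants written with double quotes) and the function names are assumptions for this illustration, and the elided queries are omitted.

```python
def is_var(term):
    """Terms written with double quotes are constants, the rest variables."""
    return not term.startswith('"')

def contained_in(q1, q2):
    """True if CQ q1 is contained in CQ q2 (homomorphism from q2 into q1)."""
    head1, body1 = q1
    head2, body2 = q2

    def extend(h, src, dst):
        # Try to map atom src onto atom dst, extending substitution h.
        if src[0] != dst[0] or len(src[1]) != len(dst[1]):
            return None
        h = dict(h)
        for s, d in zip(src[1], dst[1]):
            if is_var(s):
                if h.setdefault(s, d) != d:
                    return None
            elif s != d:
                return None
        return h

    def search(h, atoms):
        # Backtracking search over the target atoms of body1.
        if not atoms:
            return True
        for target in body1:
            h2 = extend(h, atoms[0], target)
            if h2 is not None and search(h2, atoms[1:]):
                return True
        return False

    start = extend({}, ("head", head2), ("head", head1))
    return start is not None and search(start, body2)

def prune(ucq):
    """Drop every CQ that is contained in another CQ of the union."""
    kept = []
    for q in ucq:
        if any(contained_in(q, k) for k in kept):
            continue
        kept = [k for k in kept if not contained_in(k, q)] + [q]
    return kept

ucq = [
    (("x1",), [("castinfo", ("x1", "m", "r"))]),
    (("x2",), [("castinfo", ("x2", "m", "r"))]),
    (("x3",), [("castinfo", ("x3", "m", '"c1"'))]),
    (("x8",), [("castinfo", ("x8", "m", '"c6"'))]),
]
minimal = prune(ucq)
```

The search is exponential in the worst case and prune performs a quadratic number of containment checks.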
But...
– Query containment is an extremely expensive operation.
– We are working with large sets of queries.

Roots of the problem
There are three main reasons for large CQ rewritings and unfoldings:
(E) Sub-queries of q with existentially quantified variables
can be folded in many different ways to match the
canonical model (existential trees), e.g.,
Person ⊑ ∃hasFather.Person
and the query
q(x) ← hasFather(x, y), hasFather(y, z)
(H) The concepts and roles for atoms in q can have many
sub-concepts and sub-roles according to T.
(M) The mapping M can have multiple definitions of the
ontology terms.
Most of the proposed rewriting techniques try to tame (E).

More about (E)
– It is in theory incurable.
– It is independent of (H) and (M).
However:
– Rewriting algorithms deal with (E) and (H) at the same time.
– Real-world queries and TBoxes generate few queries when dealing
with (E) in isolation.
– Even artificially constructed queries and TBoxes become simple.
The strongest issues in query rewriting are (H) and (M).
In Ontop we deal with (H) and (M) separately from (E). We do it
through T-mappings and tree-witness rewritings.

Dealing with (H) and (M): T-Mappings
A T-mapping MT is a transformation of M that enforces all (H)
entailments (H-completeness). Formally,
if M |= A(c) and T |= A ⊑ B, then MT |= B(c).
T-mapping example 1
Consider two DB relations title[m, t, y] and castinfo[p, m, r] and an
ontology MO describing the film domain as follows:
Movie ≡ ∃cast
Let M be the following mapping:
Movie(m) ← title(m, t, y),
cast(m, p) ← castinfo(p, m, r).

T-mapping example 1 (domain/range), continued
Since MO entails ∃cast ⊑ Movie, the T-mapping MT consists of the
rules of M together with one additional rule for Movie:
Movie(m) ← title(m, t, y),
cast(m, p) ← castinfo(p, m, r),
Movie(m) ← castinfo(p, m, r).

T-Mappings: Example 2
T-mapping example 2 (hierarchies)
Consider a TBox T:
Actor ⊑ Person, Actress ⊑ Person,
Producer ⊑ Person, Director ⊑ Person,
Writer ⊑ Person, Editor ⊑ Person.
The mapping M:
Actor(p) ← castinfo(p, m, "c1"), · · ·
Editor(p) ← castinfo(p, m, "c6").

T-mapping example 2 (hierarchies), continued
For the same TBox T, the T-mapping additionally contains one rule
for Person per sub-concept:
Person(p) ← castinfo(p, m, "c1"), · · ·
Person(p) ← castinfo(p, m, "c6").

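Both T-mapping examples follow the same recipe: whenever T entails that one predicate is included in another, the mapping rules of the smaller predicate are copied to the larger one. The sketch below implements that saturation in a deliberately simplified form. It assumes that mapping heads use consistent variable names (m for movies, p for persons), so an inclusion can be applied by copying rule bodies verbatim; the names M, T, t_mapping and the string encoding are choices made for this illustration, and the rules elided in the slides are omitted.

```python
# Mapping M: ontology atom -> bodies of its mapping rules.
M = {
    "Movie(m)":   ["title(m, t, y)"],
    "cast(m, p)": ["castinfo(p, m, r)"],
    "Actor(p)":   ['castinfo(p, m, "c1")'],
    "Editor(p)":  ['castinfo(p, m, "c6")'],
}

# Inclusions entailed by T as (sub, super) pairs over mapping heads.
# The pair ("cast(m, p)", "Movie(m)") encodes ∃cast ⊑ Movie.
T = [
    ("cast(m, p)", "Movie(m)"),
    ("Actor(p)", "Person(p)"),
    ("Editor(p)", "Person(p)"),
]

def t_mapping(mapping, tbox):
    """For every inclusion sub ⊑ super, add the rules of sub to super."""
    result = {head: list(bodies) for head, bodies in mapping.items()}
    changed = True
    while changed:                      # repeat to cover chains of inclusions
        changed = False
        for sub, sup in tbox:
            for body in list(result.get(sub, [])):
                if body not in result.setdefault(sup, []):
                    result[sup].append(body)
                    changed = True
    return result

MT = t_mapping(M, T)
```

After saturation, MT holds two rules for Movie (from title and from castinfo) and one rule for Person per mapped sub-concept, so a query over Person can be unfolded directly without first being rewritten w.r.t. the hierarchy.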
Optimising T-mappings
T-mappings allow us to deal with hierarchical reasoning (H) at the
level of the unfolding. At this point, we can exploit
– DB dependencies and
– SQL expressivity
to reduce, and often eliminate, the exponential growth coming from
(H) and (M).

Optimising with Dependencies
A first optimisation is query containment (w.r.t. dependencies).
Example
Consider the previous example. Since T |= ∃cast ⊑ Movie, the
T-mapping contains:
Movie(m) ← title(m, t, y),
Movie(m) ← castinfo(p, m, r).
The latter rule is redundant since IMDb contains the foreign key
castinfo(p, m, r) ⇝ title(m, t, y)
This step is crucial to reduce the growth due to inferences related to
domain and range.

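The foreign-key argument can be sketched as a small pruning function. This is a strong simplification of containment under dependencies: each rule is summarised only by the (table, column) pair that supplies the individual, rules are assumed to have no filter conditions, and the foreign keys are assumed to be acyclic. All names below are assumptions made for this illustration.

```python
# Each T-mapping rule for Movie, summarised by the (table, column)
# that supplies the movie identifier m.
movie_rules = [("title", "m"), ("castinfo", "m")]

# Foreign keys as (referencing, referenced) pairs: every value of
# castinfo.m also occurs in title.m.
foreign_keys = {(("castinfo", "m"), ("title", "m"))}

def drop_redundant(rules, fks):
    """Remove a rule when a foreign key makes another rule cover it."""
    return [
        r for r in rules
        if not any((r, other) in fks for other in rules if other != r)
    ]

optimised = drop_redundant(movie_rules, foreign_keys)
unoptimised = drop_redundant(movie_rules, set())
```

With the foreign key declared, only the rule over title survives; without it, both rules have to be kept.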
Optimising with SQL expressivity
Observation. The only means for perfect reformulations to deal
with (H) is through disjunction (UNION). DBMSs are not good at
planning UNIONs.
However, at the level of the unfolding and mappings, we have full
SQL expressivity (e.g., disjunction (OR), inequalities, etc.).
Objective
Given a T-mapping, define mapping transformations that
entail the same ABox using fewer mappings while ensuring
that the encoding used is efficient during execution.

Optimising with SQL expressivity (cont.)
Use OR and inequalities to re-express mappings for hierarchies and
discriminant columns.
Dealing with discriminant columns
For example, the T-mapping for IMDb and MO contains six rules for
Person, one for each sub-concept of Person:
Person(p) ← castinfo(p, m, "c1")
· · ·
Person(p) ← castinfo(p, m, "c6")
These can be reduced to a single rule:
Person(p) ← castinfo(p, m, r), (r = "c1") ∨ · · · ∨ (r = "c6").

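The effect of merging the six rules can be tried out on a toy table. The sketch below builds both encodings as SQL, a UNION of six selections and a single selection with an OR condition, and runs them with the standard sqlite3 module. The intermediate codes "c2" to "c5" are an assumption read off the "c1" · · · "c6" ellipsis, and the function names and sample rows are invented for this example.

```python
import sqlite3

# Assumed discriminant values; only "c1" and "c6" are spelled out above.
ROLE_CODES = ["c1", "c2", "c3", "c4", "c5", "c6"]

def union_sql(codes):
    """One SELECT per rule, combined with UNION (six scans of castinfo)."""
    return " UNION ".join(
        f"SELECT p FROM castinfo WHERE r = '{c}'" for c in codes
    )

def merged_sql(codes):
    """A single SELECT with a disjunctive filter (one scan of castinfo)."""
    condition = " OR ".join(f"r = '{c}'" for c in codes)
    return f"SELECT DISTINCT p FROM castinfo WHERE {condition}"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE castinfo (p INTEGER, m INTEGER, r TEXT)")
conn.executemany(
    "INSERT INTO castinfo VALUES (?, ?, ?)",
    [(10, 1, "c1"), (10, 2, "c2"), (11, 2, "c6"), (12, 3, "other")],
)
via_union = sorted(conn.execute(union_sql(ROLE_CODES)).fetchall())
via_or = sorted(conn.execute(merged_sql(ROLE_CODES)).fetchall())
```

Both queries return the same persons, but the merged form mentions castinfo once instead of six times, which leaves the planner a single scan with a filter.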
The architecture of Ontop
[Figure: pipeline diagram with the nodes CQ q, ontology T, UCQ qtw,
T-mapping, mapping M, dependencies Σ, SQL, data D, ABox A and
H-complete ABox A, connected by the steps tw-rewriting, unfolding,
ABox virtualisation, ABox completion, completion and SQO]
Highlights: (H) and (M) are dealt with by T-mappings, the rewriting
assumes (H)-complete ABoxes, and SQO is used extensively over the
unfolding.

Other Optimisations in Ontop
We also apply other important optimisations during system setup
and at query time. The most important are:
– Equivalence simplification: simplify the ontology vocabulary w.r.t.
equivalence (keep one representative of each equivalence class).
– Semantic Query Optimisation: optimise each generated query
individually (see the slides on Semantic Query Optimisation).
– Emptiness indexes: keep track of empty predicates.

Results
A summary of the results we have observed using this architecture:
– Mappings per class/property are few.
– Query rewritings are small.
– The SQL queries generated this way often correspond to what a
human expert would have written.
– Execution of SPARQL queries with entailments is fast, often
much faster than in triple stores.
Query rewriting can be done efficiently.

Summary
Results so far
– Efficiently dealt with exponential growth from (H) and (M).
– Use of dependencies and CQC/SQO to minimise and optimise
mapping rules.
– We exploit SQL expressivity to transform mappings and minimise
the number of mappings.
OWL 2 QL query answering with query rewriting is efficient and
materialisation is not required.
Ontop is available as a SPARQL endpoint, as an OWLAPI and
Sesame library, and as a Protégé 4 plugin. Many more features
(SPARQL, R2RML). Permanently under development, but
stable enough to be used seriously in many projects, incl. Optique.
Current work is applying these techniques to more expressive
settings, e.g., OWL + Rules, OWL 2 EL, OWL 2 RL, through a
hybrid approach.

Semantic Query Optimisation
Consider the query
q(t, y) ← Movie(m), title(m, t), year(m, y), (y > 2010)
By straightforwardly applying the unfolding to qtw and the
T-mapping M above, we obtain the query
q′tw(t, y) ← title(m, t0, y0), title(m, t, y1), title(m, t2, y), (y > 2010),
which requires two (potentially) expensive Join operations.
However, by using the primary key m of title we obtain:
q′′tw(t, y) ← title(m, t, y), (y > 2010).
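The step from the three-atom query to the single-atom query is a self-join elimination: atoms over the same relation that agree on the primary key must denote the same row, so they can be merged. A minimal sketch follows. It assumes that every non-answer variable occurs only once and that no column carries two different answer variables or constants, which holds for the query above but not in general; the function name merge_on_primary_key is hypothetical.

```python
def merge_on_primary_key(atoms, answer_vars, key_pos=0):
    """Merge atoms over the same relation that share the primary-key term."""
    merged = {}                          # (relation, key term) -> arguments
    for rel, args in atoms:
        key = (rel, args[key_pos])
        if key not in merged:
            merged[key] = list(args)
            continue
        kept = merged[key]
        for i, term in enumerate(args):
            # Same key means same row, so the columns coincide;
            # keep the answer variable when this atom provides one.
            if term in answer_vars:
                kept[i] = term
    return [(rel, tuple(args)) for (rel, _), args in merged.items()]

unfolded = [
    ("title", ("m", "t0", "y0")),
    ("title", ("m", "t", "y1")),
    ("title", ("m", "t2", "y")),
]
optimised = merge_on_primary_key(unfolded, answer_vars={"t", "y"})
```

The three title atoms collapse into title(m, t, y); the filter (y > 2010) is unaffected and is simply carried over.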

Semantic Query Optimisation (cont.)
Semantic Query Optimisation (SQO) is a field from DB theory
focused on optimisation of queries w.r.t. dependencies.
Semantic Query Optimisation in DB and OBDA
– While some SQO techniques reached industrial RDBMSs,
SQO never had a strong impact on the database community.
– In OBDA, in contrast, SQL queries are generated
automatically, and so SQO is the only tool to reach optimal
queries.
In practice, an OBDA system must implement at least SQO
w.r.t. primary keys and foreign keys to deal with the disparities
between the RDF and relational models.

Why does it work?
DBs are created through standard practices that generate features
that are the focus of the previous optimisations.
Starting from a rich conceptual schema, we encode it in a relational
schema by:
– amalgamating N-to-1 and 1-to-1 attributes of an entity to a
single n-ary relation with a primary key identifying the entity
(e.g., title with title and year),
– using foreign keys over attribute columns when a column refers
to the entity (e.g., name and castinfo),
– using type-discriminant columns to encode hierarchical
information (e.g., castinfo).
As this process is universal, the T-mappings created for the resulting
databases are dramatically simplified by the Ontop optimisations.