1. HPPS project work: Semantic Data Box
G. Orsi
Computing Laboratory
Oxford University
March 2, 2011
2. Introduction
Ontology-Based Data Access (OBDA)
• New type of database management systems (DBMS) combining classic
databases with advanced reasoning and query processing capabilities.
• Marriage of ontologies and databases is a rich commercial opportunity:
• Oracle Spatial 11g Semantic Technologies
• IBM IODT
• Data-Grid Inc.
• Ontotext BigOWLLim
• and source of many research papers and systems:
• QuOnto
• Pellet
• Presto
• Requiem
• IRIS±
3. Databases and constraints
Query answering under constraints
• In OBDA, an extensional relational database D is combined with a set of
ontological constraints Σ.
• A query Q is then answered against D ∪ Σ instead of D only.
The chase
• Answering a query Q over D ∪ Σ has been shown to be equivalent to answering Q
over the chase expansion of D w.r.t. Σ.
• The expansion chase(D, Σ) can be obtained through the well-known chase
procedure.
• Whenever the chase procedure does not fail, chase(D, Σ) is a universal
model of D w.r.t. Σ and can be used to answer Q.
5. Databases and constraints (cont.)
So... where is the problem?
Critical issue: chase(D, Σ) might be infinite, even for a very simple Σ.
R: {person, father}
Σ: σ1 : person(X) → ∃Y father(Y,X), person(Y)
Q: q(A) ← person(A)
D: person(santa)
   father(roberto,giorgio)
Chase expansion
D: person(santa)
   father(roberto,giorgio)
Q = person(santa)
apply σ1: person(roberto)
          father(z0,santa)
          father(z1,roberto)
          person(z0)
          person(z1)
          father(z2,z0)
          father(z3,z1)
          ...
Q = ?
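To make the growth of the chase concrete, the following is a minimal sketch of a bounded chase for the example above. It hard-codes σ1 exactly as written and applies it for at most k rounds. The fact encoding (tuples), the `_:z` prefix for labelled nulls, and the names `chase_k` and `answers` are assumptions made for illustration, not part of the original slides.

```python
from itertools import count

NULL_PREFIX = "_:z"  # assumed naming convention for labelled nulls


def is_null(term):
    return term.startswith(NULL_PREFIX)


def chase_k(db, k):
    """Apply sigma1: person(X) -> exists Y. father(Y, X), person(Y)
    for at most k rounds, inventing a fresh null when it is violated."""
    facts = set(db)
    fresh = count()
    for _ in range(k):
        new_facts = set()
        for fact in sorted(facts):
            if fact[0] != "person":
                continue
            x = fact[1]
            # sigma1 already holds for x if some Y has father(Y, x) and person(Y).
            satisfied = any(
                g[0] == "father" and g[2] == x and ("person", g[1]) in facts
                for g in facts
            )
            if not satisfied:
                z = f"{NULL_PREFIX}{next(fresh)}"
                new_facts.update({("father", z, x), ("person", z)})
        if not new_facts:
            break  # the chase terminated before k rounds
        facts |= new_facts
    return facts


def answers(facts):
    """Evaluate q(A) <- person(A), keeping only constants (no nulls)."""
    return {f[1] for f in facts if f[0] == "person" and not is_null(f[1])}


D = {("person", "santa"), ("father", "roberto", "giorgio")}
```

In this sketch every round adds a new null and two new facts, so `len(chase_k(D, k))` keeps growing with k, while `answers(chase_k(D, k))` stops changing after the first round.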
6. Answering queries when the chase is infinite
The bounded derivation depth (BDD) property
• Johnson and Klug proved that it is possible to answer queries even when the
chase is infinite, provided that all the needed tuples are produced in a finite
initial fragment of chase(D, Σ).
• Query answering under a set Σ enjoys the BDD property if, given a query Q, it is
possible to compute the chase up to k steps (chase_k(D, Σ)) such that:
• chase_k(D, Σ) |= Q iff chase(D, Σ) |= Q iff D ∪ Σ |= Q
• k depends only on Q and Σ.
• Every Σ with the BDD property also enjoys FO-rewritability.
The first-order rewritability property
• A set of constraints Σ is first-order rewritable (FO-rewritable) if, given a query
Q, there exists a first-order query QΣ (the reformulation of Q) such that
D ∪ Σ |= Q iff D |= QΣ for any database D.
• The reformulated query does not depend on the size of D!
• Answer QΣ over D, instead of Q over chase(D, Σ).
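As a small illustration of answering QΣ over D, the sketch below loads the example database into SQLite and compares a query with its reformulation. The query q(A) ← father(B,A), person(A) is a hypothetical one chosen for this example (it is not in the slides), and the table and column names are assumptions. Under σ1, every person has a father in the chase, so the father atom can be dropped and the reformulation is the union of the original query with q(A) ← person(A).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person(name TEXT);
    CREATE TABLE father(parent TEXT, child TEXT);
    INSERT INTO person VALUES ('santa');
    INSERT INTO father VALUES ('roberto', 'giorgio');
""")

# Hypothetical query q(A) <- father(B, A), person(A), evaluated on D only.
ORIGINAL = "SELECT p.name FROM father f, person p WHERE f.child = p.name"

# Assumed reformulation under sigma1: a UCQ that adds q(A) <- person(A).
REWRITTEN = ORIGINAL + " UNION SELECT p.name FROM person p"

original_answers = conn.execute(ORIGINAL).fetchall()
rewritten_answers = conn.execute(REWRITTEN).fetchall()
```

Over D alone the original query finds nothing, because santa has no recorded father. The reformulated query returns santa without ever materialising the chase.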
8. The hard(ware) part...
Size of the rewriting
• QΣ has the form of a union of conjunctive queries (UCQ), i.e.,
select a,b from R1,R2 where cond1
UNION
select a,b from R1,R2 where cond2
UNION
...
UNION
select a,b from R1,R2 where condn
• Johnson and Klug forgot to mention that the number of queries in QΣ is
exponential in the number of atoms of Q and in the size of Σ.
• We have two ways to go:
1. generate another form of rewriting instead of UCQs (e.g., Datalog
queries) to eliminate at least the exponentiality in Σ → no more good
properties of UCQs :-(
2. go brute-force in hardware :-)
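The blow-up can be seen in a deliberately simplified setting. The sketch below assumes a hypothetical Σ made only of rules of the form s(X) → p(X), so that each query atom can be independently replaced by any of its specialisations; the predicate names and the helper `expand` are illustrative assumptions. With n atoms and m specialisations per atom, the UCQ contains (m+1)^n conjunctive queries.

```python
from itertools import product


def expand(query_atoms, alternatives):
    """Return every conjunctive query obtained by replacing each atom
    with itself or one of its specialisations."""
    options = [[atom] + alternatives.get(atom, []) for atom in query_atoms]
    return [tuple(choice) for choice in product(*options)]


# Hypothetical query with n = 3 atoms and m = 2 specialisations per atom.
query = ["p1(X)", "p2(X)", "p3(X)"]
sigma = {
    "p1(X)": ["s11(X)", "s12(X)"],
    "p2(X)": ["s21(X)", "s22(X)"],
    "p3(X)": ["s31(X)", "s32(X)"],
}

ucq = expand(query, sigma)
```

Here `len(ucq)` is 27, and adding one more atom with two specialisations triples it. Real rewriting algorithms prune redundant disjuncts, so this only illustrates the worst-case trend.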
9. The hard(ware) part... (cont.)
In-hardware databases (e.g., Glacier)
• Starting from a query Q and a set of constraints Σ
• reformulate the query in software and generate a query plan
• synthesize the query plan in hardware
• execute the queries in hardware
Figure: Glacier: A query-to-hardware compiler (SIGMOD 2010)
10. Problems
Reconfigurability
• Current approaches implement the general SQL operators in hardware.
• If we synthesize the reformulation itself, we go faster than ever!
• but...
• every time the query changes, the reformulation changes
• does reconfigurability help?