Semantic Data Box

HPPS project work: Semantic Data Box

G. Orsi

Computing Laboratory
Oxford University

March 2, 2011

Introduction
Ontology-Based Data Access (OBDA)
• New type of database management systems (DBMS) combining classic databases with advanced reasoning and query processing capabilities.
• The marriage of ontologies and databases is a rich commercial opportunity:
  • Oracle Spatial 11g Semantic Technologies
  • IBM IODT
  • Data-Grid Inc.
  • Ontotext BigOWLIM
• It is also the source of many research papers and systems:
  • QuOnto
  • Pellet
  • Presto
  • Requiem
  • IRIS±

Databases and constraints
Query answering under constraints
• In OBDA, an extensional relational database D is combined with a set of
ontological constraints Σ.
• A query Q is then answered against D ∪ Σ instead of D only.

The chase
• Answering a query Q over D ∪ Σ has been shown to be equivalent to answering Q over the chase expansion of D w.r.t. Σ.
• The expansion chase(D, Σ) can be obtained through the well-known chase procedure.
• Whenever the chase procedure does not fail, chase(D, Σ) is a universal model of D ∪ Σ and can be used to answer Q.

Example: the chase

Weakly-acyclic constraints

R: {teaches, professor, student}

Σ: σ1 : professor(X) → ∃Z teaches(X,Z)
   σ2 : teaches(X,Y) → student(Y)

Q: q(A) ← teaches(A,B), student(B)

D: professor(santa)
   teaches(giorgio,cheng)

Chase expansion

D: professor(santa)
   teaches(giorgio,cheng)

Q = ∅

apply σ1 : teaches(santa, z0)
apply σ2 : student(z0)
           student(cheng)

Q = {santa, giorgio}
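
The chase steps of this example can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the project: the tuple encoding of facts, the names SIGMA, match, satisfied, chase and answers, and the restriction to rules with one body atom and one head atom are all assumptions made to keep the example short.

```python
from itertools import count

# A fact is a tuple (predicate, term, ...). Uppercase strings are variables.
D = {("professor", "santa"), ("teaches", "giorgio", "cheng")}

# Each rule is (body_atom, head_atom); head variables absent from the body
# are existential. Single-atom bodies and heads are enough for this example.
SIGMA = [
    (("professor", "X"), ("teaches", "X", "Z")),  # sigma1
    (("teaches", "X", "Y"), ("student", "Y")),    # sigma2
]


def match(atom, fact):
    """Map the variables of `atom` onto the terms of `fact`, or return None."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    sub = {}
    for var, term in zip(atom[1:], fact[1:]):
        if sub.setdefault(var, term) != term:
            return None
    return sub


def satisfied(head, sub, facts):
    """True if some fact already witnesses `head` under the body match `sub`."""
    for fact in facts:
        m = match(head, fact)
        if m is not None and all(m[v] == t for v, t in sub.items() if v in m):
            return True
    return False


def chase(facts, rules):
    """Restricted chase; it terminates here because the rules are weakly acyclic."""
    facts = set(facts)
    fresh = (f"z{i}" for i in count())  # labelled nulls z0, z1, ...
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for fact in sorted(facts):
                sub = match(body, fact)
                if sub is None or satisfied(head, sub, facts):
                    continue
                for var in head[1:]:
                    if var not in sub:
                        sub[var] = next(fresh)  # invent a null for an existential
                facts.add((head[0], *(sub[v] for v in head[1:])))
                changed = True
    return facts


def answers(facts):
    """q(A) <- teaches(A, B), student(B)"""
    return {f[1] for f in facts if f[0] == "teaches" and ("student", f[2]) in facts}


expanded = chase(D, SIGMA)
print(answers(D))         # D has no student facts, so there are no answers
print(answers(expanded))  # santa and giorgio, as on the slide
```

The sketch checks whether a rule head is already satisfied before inventing a new null, which is why professor(santa) triggers σ1 only once.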

Databases and constraints (cont.)

So... where is the problem?
Critical issue: chase(D, Σ) might be infinite, even for a very simple Σ.

R: {person, father}

Σ: σ1 : person(X) → ∃Y father(Y,X), person(Y)

Q: q(A) ← person(A)

D: person(santa)
   father(roberto,giorgio)

Chase expansion

D: person(santa)
   father(roberto,giorgio)

Q = person(santa)

apply σ1 : person(roberto)
           father(z0, santa)
           father(z1, roberto)
           person(z0)
           person(z1)
           father(z2, z0)
           father(z3, z1)
           ...

Q = ?

Answering queries when the chase is infinite
The bounded derivation depth (BDD) property
• Johnson and Klug proved that queries can be answered even when the chase is infinite, provided that all the needed tuples are produced in a finite initial fragment of chase(D, Σ).
• Query answering under a set Σ enjoys the BDD property if, given a query Q, it is possible to compute the chase up to k steps (chase_k(D, Σ)) such that:
  • chase_k(D, Σ) |= Q iff chase(D, Σ) |= Q iff D ∪ Σ |= Q
  • k depends only on Q and Σ.
• Every Σ with the BDD property also enjoys FO-rewritability.
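
To make the idea of chasing only up to k steps concrete, here is a hedged Python sketch on the person/father example. Two things are assumptions and not part of the slides: the function and variable names, and the extra rule father(X,Y) → person(X). That rule is not listed in Σ, but the slide's chase listing derives person(roberto), which σ1 alone would not produce, so the sketch adds it explicitly.

```python
from itertools import count

PERSON = {"santa"}
FATHER = {("roberto", "giorgio")}  # father(parent, child)
CONSTANTS = {"santa", "roberto", "giorgio"}


def chase_k(person, father, k):
    """Run at most k breadth-first rounds of the chase and return the facts."""
    person, father = set(person), set(father)
    fresh = (f"z{i}" for i in count())  # labelled nulls
    for _ in range(k):
        new_person, new_father = set(), set()
        # sigma1: person(X) -> exists Y. father(Y, X), person(Y)
        for x in sorted(person):
            if not any(c == x and p in person for p, c in father):
                z = next(fresh)
                new_father.add((z, x))
                new_person.add(z)
        # ASSUMED extra rule, not part of Sigma on the slide:
        # father(X, Y) -> person(X)
        new_person |= {p for p, _ in father if p not in person}
        person |= new_person
        father |= new_father
    return person, father


def answers(person):
    """q(A) <- person(A), keeping constants only (nulls are not answers)."""
    return person & CONSTANTS


for k in range(5):
    person, father = chase_k(PERSON, FATHER, k)
    print(k, len(person) + len(father), sorted(answers(person)))
```

Under these assumptions the number of facts keeps growing with k, while the answers to q stop changing after the first round: a bound k that depends on Q and Σ, but not on D, is enough.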

The first-order rewritability property
• A set of constraints Σ is first-order rewritable (FO-rewritable) if, given a query
Q, there exists a first-order query QΣ (the reformulation of Q) such that
D ∪ Σ |= Q iff D |= QΣ for any database D.
• The reformulated query does not depend on the size of D!
• Answer QΣ over D, instead of Q over chase(D, Σ).

Chase vs Rewriting

Example

R: {person, father}

Σ: σ1 : person(X) → ∃Y father(Y,X), person(Y)

Q: q(A) ← person(A)

D: person(santa)
   father(roberto,giorgio)

Rewriting

D: person(santa)
   father(roberto,giorgio)

Q = person(santa)

use σ1 : q(A) ← person(A)
         q(A) ← father(A,B)

Q = {santa, roberto}
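
The point of the rewriting approach is that the reformulated query runs directly over D, with no chase and no nulls. The following Python sketch evaluates the two conjunctive queries shown above over the original database. The tuple encoding and the helper names homomorphisms and answer_ucq are assumptions made for illustration, and the rewriting itself is copied from the slide, not computed.

```python
D = {("person", "santa"), ("father", "roberto", "giorgio")}

# The rewriting shown on the slide: a union of two conjunctive queries.
# Each CQ is (answer_variables, body_atoms); all atom arguments are variables.
UCQ = [
    (("A",), [("person", "A")]),
    (("A",), [("father", "A", "B")]),
]


def homomorphisms(atoms, facts, sub=None):
    """Yield every substitution that maps all `atoms` onto `facts`."""
    sub = sub or {}
    if not atoms:
        yield sub
        return
    first, rest = atoms[0], atoms[1:]
    for fact in facts:
        if fact[0] != first[0] or len(fact) != len(first):
            continue
        ext = dict(sub)  # extend a copy so failed attempts leave `sub` intact
        if all(ext.setdefault(v, t) == t for v, t in zip(first[1:], fact[1:])):
            yield from homomorphisms(rest, facts, ext)


def answer_ucq(ucq, facts):
    """Union of the answers of every conjunctive query in `ucq`."""
    return {
        tuple(s[v] for v in head)
        for head, body in ucq
        for s in homomorphisms(body, facts)
    }


print(answer_ucq(UCQ, D))  # the tuples for santa and roberto
```

Only D is read here, so the cost of evaluation depends on the size of the rewriting and of D, not on the (possibly infinite) chase.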

The hard(ware) part...
Size of the rewriting
• QΣ has the form of a union of conjunctive queries (UCQ), i.e.,

select a,b from R1,R2 where cond1
UNION
select a,b from R1,R2 where cond2
UNION
...
UNION
select a,b from R1,R2 where condn

• Johnson and Klug forgot to mention that the number of queries in QΣ is exponential in the number of atoms of Q and in the size of Σ.
• We have two ways to go:
  1. generate another form of rewriting instead of UCQs (e.g., Datalog queries) to eliminate at least the exponentiality in Σ → no more good properties of UCQs :-(
  2. go brute-force in hardware :-)
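
As a rough illustration of both points, the Python sketch below runs a UCQ as a SQL UNION and then counts how quickly a rewriting can grow. The table and column names (person.name, father.parent, father.child) are hypothetical, and the size model is a deliberate simplification: it assumes every atom of Q can be rewritten independently into the same number of alternatives.

```python
import sqlite3
from itertools import product

# Hypothetical relational encoding of the person/father example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person(name TEXT);
    CREATE TABLE father(parent TEXT, child TEXT);
    INSERT INTO person VALUES ('santa');
    INSERT INTO father VALUES ('roberto', 'giorgio');
    """
)

# One SELECT per conjunctive query of the rewriting, glued together with UNION.
cqs = [
    "SELECT name AS a FROM person",
    "SELECT parent AS a FROM father",
]
sql = "\nUNION\n".join(cqs)
answers = {row[0] for row in conn.execute(sql)}
print(answers)  # santa and roberto


def ucq_size(alternatives_per_atom):
    """Count the CQs obtained by picking one alternative for every atom."""
    return sum(1 for _ in product(*alternatives_per_atom))


# Simplified model: n atoms with 2 alternatives each give 2**n queries.
sizes = [ucq_size([["person", "father"]] * n) for n in range(1, 6)]
print(sizes)  # [2, 4, 8, 16, 32]
```

Even in this toy model the number of SELECT blocks doubles with every atom added to Q, which is the blow-up that motivates either a more compact rewriting or brute force in hardware.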

The hard(ware) part... (cont.)
In-hardware databases (e.g., Glacier)
• Starting from a query Q and a set of constraints Σ:
  • reformulate the query in software and generate a query plan
  • synthesize the query plan in hardware
  • execute the queries in hardware

Figure: Glacier: A query-to-hardware compiler (SIGMOD 2010)

Problems
Reconfigurability
• Current approaches implement the general SQL operators in hardware.
• If we synthesize the reformulation, we go faster than ever!
• But...
  • every time the query changes, the reformulation changes
  • does reconfigurability help?
