1. Introduction Ontology Extraction Query Answering Applications References
Accessing and Documenting Relational
Databases through OWL ontologies
C. Curino, G. Orsi, E. Panigati and L. Tanca
Dipartimento di Elettronica e Informazione (DEI)
Politecnico di Milano
(Italy)
Intl Conference on Flexible Query Answering Systems - Roskilde (Denmark)
October 27th, 2009
Introduction
• Ontologies are one of the major accomplishments of the AI and KR
communities in data and metadata representation,
• later they also became appealing to the DB community, since they:
• naturally extend many other data models (although integrity constraints (ICs) remain problematic),
• provide a conceptual and uniform view of data and metadata.
• Target: extend data sources with ontologies
Motivations
• seamless access to heterogeneous data sources → query answering,
• representation of heterogeneous data in a common language → publishing,
• deep annotation of both data and data structures → documentation.
however...
• two major issues must be addressed:
• automatic semantic annotation of data sources [1, 7],
• scalable query answering [3].
Introduction
What do we need?
• a mapping strategy for heterogeneous data models,
• automated ontology extraction from data source schemas,
• a query rewriting technology to translate queries between data models.
Contributions:
• general approach to ontology-based annotation of data sources,
• extension of the Relational.OWL ontology,
• automatic extraction of ontologies from relational data sources,
• demonstration of how the presented framework can be used in practical applications.
Infrastructure for Ontology Extraction
[Figure: Architecture]
Data Model Ontology (DMO)
• structure of the data model in use,
• does not vary with the schema.
Data Source Ontology (DSO)
• intensional knowledge described by the schema,
• no individual names (instances).
Schema Design Ontology (SDO)
• maps the DSO to the DMO,
• describes how concepts and roles in the ontology are rendered in a particular data model,
• separates (and stores) the logical organization of the schema from its semantics.
The Relational Case: The DMO
• we adopt the Relational.OWL ontology [6],
• we modify it to model composite foreign keys,
• we render foreign-keys as first-class citizens.
Relational.OWL (modified) Structure
Relational.OWL Classes
rdf:ID              rdfs:subClassOf
dbs:Database        rdf:Bag
dbs:Table           rdf:Seq
dbs:Column          rdfs:Resource
dbs:PrimaryKey      rdf:Bag
dbs:ForeignKey      rdf:Bag
Relational.OWL Properties
rdf:ID              rdfs:domain                                rdfs:range
dbs:has             owl:Thing                                  owl:Thing
dbs:hasTable        dbs:Database                               dbs:Table
dbs:hasColumn       dbs:Table, dbs:PrimaryKey, dbs:ForeignKey  dbs:Column
dbs:isIdentifiedBy  dbs:Table                                  dbs:PrimaryKey
dbs:hasForeignKey   dbs:Table                                  dbs:ForeignKey
dbs:references      dbs:Column                                 dbs:Column
Each instance of the DMO represents the structure of a given RDB
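As a rough illustration of what such a DMO instance might contain, the sketch below encodes a two-table schema as triples, using the property names listed above (dbs:hasTable, dbs:hasColumn, dbs:isIdentifiedBy, dbs:hasForeignKey, dbs:references). The schema, the node identifiers (e.g. grade.fk0) and the helper dmo_triples are hypothetical and only meant to show how a composite foreign key can be kept together as one first-class object.

```python
from dataclasses import dataclass, field


@dataclass
class ForeignKey:
    # (local column, referenced "table.column") pairs.
    # More than one pair means the key is composite.
    references: list


@dataclass
class Table:
    name: str
    columns: list
    primary_key: list
    foreign_keys: list = field(default_factory=list)


def dmo_triples(db_name, tables):
    """Describe a relational schema as (subject, property, object) triples."""
    triples = []
    for t in tables:
        triples.append((db_name, "dbs:hasTable", t.name))
        for c in t.columns:
            triples.append((t.name, "dbs:hasColumn", f"{t.name}.{c}"))
        pk_id = f"{t.name}.pk"  # hypothetical identifier for the key node
        triples.append((t.name, "dbs:isIdentifiedBy", pk_id))
        for c in t.primary_key:
            triples.append((pk_id, "dbs:hasColumn", f"{t.name}.{c}"))
        for i, fk in enumerate(t.foreign_keys):
            fk_id = f"{t.name}.fk{i}"  # one node per (possibly composite) key
            triples.append((t.name, "dbs:hasForeignKey", fk_id))
            for local, target in fk.references:
                triples.append((fk_id, "dbs:hasColumn", f"{t.name}.{local}"))
                triples.append((f"{t.name}.{local}", "dbs:references", target))
    return triples


# Hypothetical schema: grade references enrollment through a two-column key.
enrollment = Table("enrollment", ["student_id", "course_id"],
                   ["student_id", "course_id"])
grade = Table("grade", ["student_id", "course_id", "mark"],
              ["student_id", "course_id"],
              [ForeignKey([("student_id", "enrollment.student_id"),
                           ("course_id", "enrollment.course_id")])])
TRIPLES = dmo_triples("school_db", [enrollment, grade])
```

Grouping both columns under the single node grade.fk0 is what distinguishes a composite key from two unrelated single-column references.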
The relational case: DSO Extraction
Metadata Extraction
• RDB catalog inspection,
• Relational.OWL instance generation.
Schema Analysis
• DSO Generation (by logical to conceptual reverse engineering)
• SDO Generation
Reverse Engineering Rules (Informal)
• a concept for each table with a proper primary key,
• a concept for each table representing an n-ary relationship or a binary relationship with attributes,
• a role for each table representing a binary relationship without attributes,
• an attribute for each attribute in the table that is not a FK,
• proper existential restrictions to force some attributes to exist (e.g., primary keys, min cardinalities).
(see the paper for formal definitions)
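The two phases above can be sketched in a few lines of Python on top of SQLite's catalog pragmas. This is only an approximation of the informal rules, not the formal definitions from the paper: a table is treated as a relationship when its primary key is made up entirely of foreign-key columns, and as a role when it additionally has exactly two foreign keys and no other attributes. The function names and the toy schema are assumptions introduced for illustration.

```python
import sqlite3


def read_catalog(conn):
    """Catalog inspection: collect columns, primary key and foreign keys per table."""
    catalog = {}
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for name in names:
        # table_info rows: (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        columns = [r[1] for r in info]
        pk = [r[1] for r in sorted(info, key=lambda r: r[5]) if r[5] > 0]
        fks = {}
        # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for r in conn.execute(f'PRAGMA foreign_key_list("{name}")'):
            fks.setdefault(r[0], []).append((r[3], r[2], r[4]))
        catalog[name] = {"columns": columns, "pk": pk, "fks": fks}
    return catalog


def classify(catalog):
    """Schema analysis: decide whether each table becomes a concept or a role."""
    result = {}
    for name, t in catalog.items():
        fk_cols = {col for fk in t["fks"].values() for col, _, _ in fk}
        # Rule: one attribute per column that is not part of a foreign key.
        attributes = [c for c in t["columns"] if c not in fk_cols]
        is_relationship = bool(t["pk"]) and set(t["pk"]) <= fk_cols
        if is_relationship and len(t["fks"]) == 2 and not attributes:
            kind = "role"       # binary relationship without attributes
        else:
            kind = "concept"    # entity, n-ary or attributed relationship
        result[name] = {"kind": kind, "attributes": attributes}
    return result


# Toy schema; table names are illustrative, not an actual production schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, biotype TEXT);
CREATE TABLE exon (exon_id INTEGER PRIMARY KEY);
CREATE TABLE gene_exon (
    gene_id INTEGER REFERENCES gene(gene_id),
    exon_id INTEGER REFERENCES exon(exon_id),
    PRIMARY KEY (gene_id, exon_id));
""")
KINDS = classify(read_catalog(conn))
```

Existential restrictions (the last rule) are left out of this sketch; they could plausibly be derived from the notnull flag that table_info also returns.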
Running Example
• Ensembl multi-species genome database,
• over 100 tables in the backend database,
• open source database schema, data and software.
• ...
• sometimes the designer forgets what a good DB design is...
The Ensembl genetic DB (excerpt)
Running Example
The Extracted Ontology (sketch)
The Schema Design Ontology (SDO)
The SDO contains a set of assertions of the form:
dmo:rel_entity sdo:representedBy dso:onto_entity
that maps the DSO to a given DMO instance
Example
dmo:gene.r_start sdo:representedBy dso:gene.r_start (Attributes)
dmo:exon sdo:representedBy dso:exon (Tables)
• If a (non-conceptual) change occurs in the relational schema, only the SDO changes,
• no re-extraction is needed.
What if a conceptual change occurs?
• the SDO and the DSO can be locally adapted.
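The decoupling described on this slide can be illustrated with a small Python sketch that stores sdo:representedBy assertions and updates them when a column is renamed. The class SchemaDesignOntology, its methods and the attribute names exon.start and exon.seq_start are hypothetical; only dmo:exon, dso:exon and sdo:representedBy come from the example above.

```python
class SchemaDesignOntology:
    """Stores (dmo entity, sdo:representedBy, dso entity) assertions."""

    def __init__(self):
        self._dmo_to_dso = {}
        self._dso_to_dmo = {}

    def add(self, dmo_entity, dso_entity):
        self._dmo_to_dso[dmo_entity] = dso_entity
        self._dso_to_dmo[dso_entity] = dmo_entity

    def relational_for(self, dso_entity):
        """Return the relational element that renders an ontology entity."""
        return self._dso_to_dmo[dso_entity]

    def rename_relational(self, old_dmo, new_dmo):
        """Non-conceptual change: the DSO side of the assertion is untouched."""
        dso_entity = self._dmo_to_dso.pop(old_dmo)
        self.add(new_dmo, dso_entity)

    def triples(self):
        return sorted((dmo, "sdo:representedBy", dso)
                      for dmo, dso in self._dmo_to_dso.items())


sdo = SchemaDesignOntology()
sdo.add("dmo:exon", "dso:exon")
sdo.add("dmo:exon.start", "dso:exon.start")

# A column is renamed in the database: only the mapping is rewritten.
sdo.rename_relational("dmo:exon.start", "dmo:exon.seq_start")
```

After the rename, queries phrased against dso:exon.start still resolve, because the ontology-level name never changed.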
Accessing the datasource through rewriting
In order to access the content of the data source using SPARQL we need to:
• chase the original query with the axioms in the TBox,
• translate the result into SQL.
but...
• the generated ontology is in EL,
• QA in EL is PTIME-hard in data complexity → not FOL-rewritable.
however...
• Lutz et al. [8] showed that, given a CQ q, it is still possible to obtain a perfect rewriting q_rew by chasing q against an EL TBox after a pre-processing of the DB,
• the pre-processing is guaranteed to terminate in quadratic time.
then...
• the obtained rewritings can be translated into SQL in linear time,
• the queries are executed on the native RDB engine,
• the results are rendered according to the mapping stored in the SDO.
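The last step only (translating an already rewritten conjunctive query into SQL) might look like the Python sketch below. It does not perform the chase or the database pre-processing of [8]. The atom encoding, the CONCEPTS and ATTRIBUTES dictionaries (standing in for mappings read from the SDO) and the gene table are all assumptions made for illustration. Each atom is visited once, so the translation is linear in the size of the query.

```python
import sqlite3

# Hypothetical SDO-style mappings: ontology term -> relational rendering.
CONCEPTS = {"Gene": ("gene", "gene_id")}                 # concept -> (table, key)
ATTRIBUTES = {"biotype": ("gene", "gene_id", "biotype")}  # attr -> (table, key, value)


def cq_to_sql(head, atoms):
    """Translate a conjunctive query into one SELECT statement.

    head  : list of answer variables, e.g. ["?g"]
    atoms : ("Concept", term) or ("attribute", subject_term, value_term);
            terms starting with "?" are variables, all others are constants.
    """
    froms, wheres, binding = [], [], {}

    def bind(term, column):
        if term.startswith("?"):
            if term in binding:
                # Repeated variable: becomes a join condition.
                wheres.append(f"{binding[term]} = {column}")
            else:
                binding[term] = column
        else:
            literal = term.replace("'", "''")
            wheres.append(f"{column} = '{literal}'")

    for i, atom in enumerate(atoms):
        alias = f"t{i}"
        if len(atom) == 2:
            table, key = CONCEPTS[atom[0]]
            froms.append(f"{table} AS {alias}")
            bind(atom[1], f"{alias}.{key}")
        else:
            table, key, value = ATTRIBUTES[atom[0]]
            froms.append(f"{table} AS {alias}")
            bind(atom[1], f"{alias}.{key}")
            bind(atom[2], f"{alias}.{value}")

    sql = (f"SELECT DISTINCT {', '.join(binding[v] for v in head)} "
           f"FROM {', '.join(froms)}")
    if wheres:
        sql += " WHERE " + " AND ".join(wheres)
    return sql


# q(?g) <- Gene(?g), biotype(?g, "protein_coding")
SQL = cq_to_sql(["?g"], [("Gene", "?g"), ("biotype", "?g", "protein_coding")])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, biotype TEXT)")
conn.executemany("INSERT INTO gene VALUES (?, ?)",
                 [(1, "protein_coding"), (2, "pseudogene")])
ROWS = conn.execute(SQL).fetchall()
```

The generated statement is executed by the relational engine itself, which is the point of the approach: no reasoning happens at query time beyond what the SQL expresses.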
On Integrity Constraints
Representation
• Integrity constraints represented in the DMO instance,
• not (completely) represented at DSO-level,
• this is a difference w.r.t. works such as DL-Lite [3],
• in any case not representable in OWL syntax; we would have to resort to SWRL syntax.
Enforcement
• ICs cannot be enforced in the DSO,
• this is not a major problem as long as we do not update,
• ICs are already enforced by the underlying RDB engine.
Applications:
Ok... and all this machinery can be used for...?
Data Integration
• schema integration can be done as usual at the DSO level [9],
• the SDO can be used to explicitly represent reconciliation functions,
• from these, the SQL functions that must be applied at the DB level can be derived,
• moreover, the SDO can be extended to represent other metadata, e.g., provenance, location dependencies, etc.
Schema/Ontology Evolution
• Zaniolo et al. defined a set of operators (SMOs) describing the evolution of relational schemas [4],
• Question: how do these operators affect the conceptual level [5]?
• Ongoing Work: is it possible to automatically derive the conceptual changes through the SDO?
Future Work
• Apply this approach to XML (ready) and Web Pages (ongoing),
• Ontology support for schema evolution based on this work (ongoing),
• More expressive language for the DSO → Datalog± [2].
References I
[1] C. Bizer and R. Cyganiak
D2R Server: Publishing relational databases on the Semantic Web.
In Proc. of 5th Intl Semantic Web Conference (ISWC), 2006
[2] A. Calì, G. Gottlob and T. Lukasiewicz
A general datalog-based framework for tractable query answering over ontologies.
In Proc. of Intl Symp. on Principles of Database Systems (PODS), 2009
[3] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini and R. Rosati
Tractable Reasoning and Efficient Query Answering in Description Logics: The DL-Lite family.
Journal of Automated Reasoning, 2007
[4] C. A. Curino, H. J. Moon and C. Zaniolo
Graceful database schema evolution: the PRISM workbench.
In Proc. of the 34th Intl Conf. on Very Large Databases (VLDB), 2008
[5] C. A. Curino, H. J. Moon and C. Zaniolo
Managing the history of metadata in support for db archiving and schema evolution.
In ER Workshop on Evolution and Change in Data Management (ECDM), 2008
References II
[6] C. P. de Laborda and S. Conrad
Relational.OWL: a data and schema representation format based on OWL.
In Proc. of the 2nd Asia-Pacific Conf. on Conceptual Modelling (APCM), 2005
[7] L. Lubyte and S. Tessaris
Automatic extraction of ontologies wrapping relational data sources.
In Proc. of 20th Intl Conf. on Database and Expert Systems Applications (DEXA), 2009
[8] C. Lutz, D. Toman and F. Wolter
Conjunctive Query Answering in EL using a Database System.
In Proc. of OWL Experiences and Directions Intl Workshop (OWLED), 2008
[9] N. F. Noy
Semantic integration: a survey of ontology-based approaches.
ACM SIGMOD Record, 33(4), 2004