This document summarizes a research paper on automatically discovering schema-level mappings between the ontologies of linked data sources. The method is extensional: it aligns concepts based on the overlap of the instances that belong to them. It discovers alignments between atomic and conjunctive restriction classes, and it detects concept coverings using disjunctive restriction classes. It finds rich alignments even when the ontologies are rudimentary, and it can flag outliers that may require correction.
Discovering Alignments in Ontologies of Linked Data
1. Discovering Concept Coverings in
Ontologies of Linked Data Sources
Best Paper: Research Track – The 11th International Semantic
Web Conference, 2012
Rahul Parundekar, Craig A. Knoblock and Jose-Luis Ambite
{parundek,knoblock}@usc.edu, ambite@isi.edu
University of Southern California
3. Data Integration on the Web
Goal: Integrate and Query Data across the Web
Two challenges in data integration:
• Object-level (aka record linkage): When objects at different
sites are the same real-world object
• Linked Data movement addresses object-level integration
• objects have unique URIs,
• equivalent objects linked with owl:sameAs statements
• Schema-level (aka schema/ontology mapping): align the
semantics of different sources
• Define mappings to relate source schemas to common schema
• Use schema mappings to query across multiple sources
Our work: Exploit linked data to automatically
discover expressive schema/ontology mappings
4. The Web of Linked Data integrates data at the object level
[Figure: example from the geospatial domain, showing the instances "Los Angeles" and "City of Los Angeles"]
5. Equivalent instances in different sources are connected with owl:sameAs
[Figure: Source 1 and Source 2, each with an ontology level (Populated Place, City) and an instance level (Los Angeles, City of Los Angeles); the two instances are connected with owl:sameAs]
6. Links are absent at the ontology level
Only 15 out of the 190 ontologies are connected
[Figure: Source 1 and Source 2; Los Angeles and City of Los Angeles are connected with owl:sameAs at the instance level, but there are NO LINKS between Populated Place and City at the ontology level]
7. Alignments are necessary for interoperability of the sources
We need to find links at the Ontology Level
[Figure: Source 1 and Source 2; Los Angeles and City of Los Angeles are connected with owl:sameAs at the instance level, but there are NO LINKS between Populated Place and City at the ontology level]
8. How can we find ontology alignments?
[Figure: Source 1 and Source 2; Populated Place = City is proposed at the ontology level, with Los Angeles and City of Los Angeles connected with owl:sameAs at the instance level]
10. Use an extensional approach to align concepts
Represents set of instances belonging to ClassA
Represents set of instances belonging to ClassB
ClassA is disjoint from ClassB
ClassA is equivalent to ClassB
ClassA is subset of ClassB
ClassB is subset of ClassA
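The four cases above can be illustrated with plain set operations. The following is a minimal sketch, assuming that equivalent instances from both sources have already been resolved to shared identifiers (for example by following owl:sameAs links); the function name and the toy IDs are hypothetical, and the comparison is strict, with no tolerance threshold.

```python
def classify_relation(ext_a, ext_b):
    """Name the set relation between the instance sets of ClassA and ClassB."""
    if not (ext_a & ext_b):
        return "disjoint"
    if ext_a == ext_b:
        return "equivalent"
    if ext_a < ext_b:
        return "ClassA subset of ClassB"
    if ext_b < ext_a:
        return "ClassB subset of ClassA"
    return "overlapping"  # partial overlap, none of the four cases on the slide


# Hypothetical instance IDs shared by both sources.
print(classify_relation({"la", "nyc"}, {"la", "nyc"}))          # equivalent
print(classify_relation({"la"}, {"la", "nyc"}))                 # ClassA subset of ClassB
print(classify_relation({"la", "nyc"}, {"dublin"}))             # disjoint
```

A strict comparison like this is brittle on real data, where a single missing or wrong link would break an equivalence.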
11. Align concepts when supported by evidence at the instance level
[Figure: Source 1 and Source 2; Populated Place = City at the ontology level, supported by linked instance pairs at the instance level: New York and NYC, Los Angeles and City of Los Angeles, Dublin and City of Dublin, Kuala Lumpur and Kolumpo, … and more]
12. However, ontologies of many sources are rudimentary
DBpedia Ontology (Rich, Descriptive Ontology)
• Description: Semantic Web version of Wikipedia (For Example: Places, People, Music, Topics, etc.)
• # of Instances: 3.77 million
• # of Classes: 359 (Well-defined hierarchy)
• # of Properties: 1775
GeoNames Ontology (Impoverished Ontology)
• Description: Geographical Database
• # of Instances: 8 million
• # of Classes: 1 (rdf:type=Feature); 9 feature classes, 645 feature codes
• # of Properties: 29
Finding Alignments is Non-Trivial
13. Create NEW concepts by restricting values of properties
Atomic Restriction Classes*
• GeoNames: within the set of all instances in GeoNames, the set of all instances with featureClass=P
• DBpedia: within the set of all instances in DBpedia, the set of all instances with rdf:type=PopulatedPlace
* Value Restrictions in OWL-DL
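An atomic restriction class can be pictured as a filter on one property value. The following is a minimal sketch over a hypothetical in-memory dictionary of instances; the function name, the instance IDs, and the single-valued property layout are assumptions made for illustration.

```python
# Hypothetical toy data: instance ID -> {property: value}.
geonames = {
    "gn:1": {"featureClass": "P", "countryCode": "US"},
    "gn:2": {"featureClass": "S", "countryCode": "US"},
    "gn:3": {"featureClass": "P", "countryCode": "ES"},
}


def atomic_restriction_class(instances, prop, value):
    """Return the IDs of all instances whose property `prop` equals `value`."""
    return {iid for iid, props in instances.items() if props.get(prop) == value}


populated = atomic_restriction_class(geonames, "featureClass", "P")
print(sorted(populated))  # ['gn:1', 'gn:3']
```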
16. Create specialized concepts using conjunctive restriction classes
For Example: Creating the concept for “Schools in the US”
Conjunctive Restriction Classes
• Set of all instances in GeoNames
• Set of all instances with countryCode=US, i.e. features in the US
• Set of all instances with featureCode=S.SCH, i.e. Schools
• Set of all instances with featureCode=S.SCH & countryCode=US, i.e. Schools in the US
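A conjunctive restriction class is the intersection of the atomic restriction classes it combines. The following is a minimal sketch of the “Schools in the US” example over hypothetical toy data; the function names and instance IDs are assumptions, and the code is not the exploration algorithm itself.

```python
# Hypothetical toy data: instance ID -> {property: value}.
geonames = {
    "gn:10": {"featureCode": "S.SCH", "countryCode": "US"},
    "gn:11": {"featureCode": "S.SCH", "countryCode": "FR"},
    "gn:12": {"featureCode": "P.PPL", "countryCode": "US"},
}


def restriction_class(instances, prop, value):
    """Atomic restriction class: IDs of instances with prop == value."""
    return {iid for iid, props in instances.items() if props.get(prop) == value}


def conjunctive_restriction_class(instances, restrictions):
    """Intersect the atomic restriction classes of every (property, value) pair."""
    result = set(instances)
    for prop, value in restrictions.items():
        result &= restriction_class(instances, prop, value)
    return result


schools_in_us = conjunctive_restriction_class(
    geonames, {"featureCode": "S.SCH", "countryCode": "US"}
)
print(sorted(schools_in_us))  # ['gn:10']
```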
17. An ordered top-down exploration algorithm to
align Atomic & Conjunctive Restriction Classes
18. Detect rich alignments even when ontologies are rudimentary
GeoNames-DBpedia: # Alignments Found with Atomic and Conjunctive Restriction Classes
• Equivalent: 31
• Subset: 2193
Can we find more meaningful alignments?
20. 1) Schools in GeoNames are Educational Institutions in DBpedia
[Figure: featureCode=S.SCH within rdf:type=EducationalInstitution]
21. 2) Colleges in GeoNames are Educational Institutions in DBpedia
[Figure: featureCode=S.SCH and featureCode=S.SCHC within rdf:type=EducationalInstitution]
22. 3) Universities in GeoNames are Educational Institutions in DBpedia
[Figure: featureCode=S.SCH, featureCode=S.SCHC and featureCode=S.UNIV within rdf:type=EducationalInstitution]
23. Using featureCode property as a hint, create a Union of concepts
[Figure: featureCode=S.SCH, featureCode=S.SCHC and featureCode=S.UNIV within rdf:type=EducationalInstitution]
featureCode=S.SCH ∪ featureCode=S.SCHC ∪ featureCode=S.UNIV
24. Detect a Concept Covering by extensional comparison
rdf:type=EducationalInstitution = featureCode=S.SCH ∪ featureCode=S.SCHC ∪ featureCode=S.UNIV
25. Compare the overlap of the extension sets to determine equivalence
US: featureCode={S.SCH, S.SCHC, S.UNIV}
UL: rdf:type=EducationalInstitution, |UL| = 404 Educational Institutions
UA = US ∩ UL
|UA| / |US| > 0.9, by definition (= 1 when every smaller class is a strict subset of UL)
|UA| / |UL| = 396 / 404 = 0.98 > 0.9
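The two ratios can be checked with a few lines of set arithmetic. The following is a minimal sketch, assuming instances from both sources share identifiers; the function name and the toy sets are hypothetical, while the counts 396 and 404 and the 0.9 threshold come from the slide.

```python
THRESHOLD = 0.9  # threshold shown on the slide


def covering_ratios(larger, smaller_classes):
    """Return (|UA|/|US|, |UA|/|UL|) for a larger class and its smaller classes.

    Assumes the larger class and the union of the smaller classes are non-empty.
    """
    union_smaller = set().union(*smaller_classes)  # US
    overlap = union_smaller & larger               # UA = US ∩ UL
    return len(overlap) / len(union_smaller), len(overlap) / len(larger)


# Hypothetical toy sets of instance IDs.
larger = set(range(10))
smaller = [{0, 1, 2}, {3, 4}, {5, 6, 7, 8}]
ratio_s, ratio_l = covering_ratios(larger, smaller)
print(ratio_s, ratio_l)  # 1.0 0.9

# The slide's own counts: 396 of the 404 Educational Institutions are covered.
slide_ratio = 396 / 404
print(f"{slide_ratio:.2f}", slide_ratio > THRESHOLD)  # 0.98 True
```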
27. Example: Am I in Spain … or Italy?
• We align dbpedia:country=dbpedia:Spain with
geonames:countryCode=ES
• 3917 out of 3918 instances in GeoNames agree
with this
• ONE instance had its country code as Italy.
• Because this instance contradicts
overwhelming evidence, we can flag it as an
outlier
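The outlier check can be sketched as a set difference that is only trusted when the agreement is overwhelming. This is a minimal illustration with hypothetical names and toy IDs, assuming both classes use shared instance identifiers; the 0.9 default is borrowed from the threshold used elsewhere in the slides.

```python
def flag_outliers(class_a, class_b, threshold=0.9):
    """If most of class_a also lies in class_b, return the instances that do not.

    Returns (support, outliers); outliers is empty when support is below threshold.
    """
    support = len(class_a & class_b) / len(class_a)
    if support < threshold:
        return support, set()
    return support, class_a - class_b


# Hypothetical toy IDs: 19 of 20 instances agree, one does not.
spain_dbpedia = {f"id:{i}" for i in range(20)}
es_geonames = {f"id:{i}" for i in range(19)}
support, outliers = flag_outliers(spain_dbpedia, es_geonames)
print(support, outliers)  # 0.95 {'id:19'}
```

With the slide's counts, the agreement is 3917 / 3918, so the single disagreeing instance would be the one flagged.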
28. Concept Covering of Educational Institutions: What are the other 8 instances?
featureCode={S.SCH, S.SCHC, S.UNIV} against rdf:type=EducationalInstitution: |UA| / |UL| = 396 / 404
• 1 with featureCode=S.HSP (Hospitals)
• There are 31 instances with S.HSP, which is why Hospitals are not a subset
• 3 with featureCode=S.BLDG (Buildings)
• 1 with featureCode=S.EST (Establishment)
• 1 with featureCode=S.LIBR (Library)
• 1 with featureCode=S.MUS (Museum)
• 1 doesn’t have a featureCode property
30. Example alignments of Atomic Restriction Classes
Alignments with concepts that were not explicit, e.g. Concept of Places
• GeoNames: featureClass=P; DBpedia: rdf:type=PopulatedPlace; Rel: =; P: 99.6; R: 90.5; |Img(r1) ∩ r2|: 70658
Alignments with geographical regions like countries, administrative divisions, etc.
• GeoNames: countryCode=ES; DBpedia: country=Spain; Rel: =; P: 94.5; R: 99.9; |Img(r1) ∩ r2|: 3917
Find the actual relationship between concepts as opposed to the perceived one
31. Example alignments of Conjunctive Restriction Classes
Find alignments with conjunctive restriction classes, e.g. Concepts of ‘Places in the US’ are equal
• GeoNames: featureClass=P & countryCode=US; DBpedia: rdf:type=PopulatedPlace & country=United_States; Rel: =; P: 97.2; R: 96.7; |Img(r1) ∩ r2|: 26061
Find alignments with conjunctive restriction classes that have related properties, e.g. Places in North Dakota have 701 area code for phone numbers
• GeoNames: featureClass=P & parentADM1=North_Dakota; DBpedia: areaCode=701; Rel: =; P: 98.1; R: 96.5; |Img(r1) ∩ r2|: 361
In some cases the meaning of a concept shifts slightly, e.g. Populated Places in Senegal are aligned to Towns rather than PopulatedPlaces
• GeoNames: featureClass=P & countryCode=SN; DBpedia: rdf:type=Town & country=Senegal; Rel: =; P: 92.6; R: 100; |Img(r1) ∩ r2|: 25
32. Example Alignments of Disjunctive Restriction Classes
Find concept coverings with disjunctive restriction classes, e.g. Educational Institution concept in DBpedia covers concepts of Schools, Colleges and Universities
• Larger Restriction Class: rdf:type=dbpedia:Educational_Institution; Union of Smaller Restriction Classes: geonames:featureCode={S.SCH, S.SCHC, S.UNIV}; Rel: =; R: 98.0; Overlap: 396/404; Outliers: S.BLDG, S.HSP, S.MUS, etc.
System can flag outliers that may need to be corrected
• Larger Restriction Class: rdf:type=dbpedia:Airport; Union of Smaller Restriction Classes: geonames:featureCode={S.AIRB, S.AIRP}; Rel: =; R: 99.2; Overlap: 1981/1996; Outliers: S.AIRF, S.FRMT, S.SCH, T.HLL, etc.
System is able to find all terms used for the country Netherlands
• Larger Restriction Class: geonames:countryCode=NL; Union of Smaller Restriction Classes: dbpedia:country={dbpedia:The_Netherlands, dbpedia:Flag_of_the_Netherlands.svg, dbpedia:Netherlands}; Rel: =; R: 98.0; Overlap: 1939/1978; Outliers: dbpedia:Kingdom_of_the_Netherlands
33. Related Work
• Other Ontology Alignment efforts in the Web of
Linked Data
• BLOOMS, BLOOMS+ [Jain et al. ISWC 2010, 2011]
• Linked Open Data ontologies aligned with central
ontology called ‘Proton’ using structural similarity
• AgreementMaker [Cruz et al. 2011]
• Similarity Metrics on labels of classes
• Statistical schema induction [Völker et al. ISWC 2011]
• Mines association rules from intermediate ‘transaction data sets’ -> OWL 2 axioms.
• Formalization of Ontology Mappings [Atencia et al.
ISWC 2012]
• A related work that provides a formalization of weighted
ontology mappings
34. Conclusion and Future Work
• Conclusion
• Our approach is able to find alignments
• Automatically, across any two linked sources
• Even in the case of a rudimentary ontology
• Types of Alignments
• Atomic Restriction Classes
• Conjunctive Restriction Classes
• Disjunctive Restriction Classes i.e. Concept Coverings
• And detect Outliers that help identify inconsistencies in the
data
• Future work
• Add support for negation
• Build complete descriptions of sources
• Use algorithms to negotiate meaning between agents on the fly
35. References for Additional Detail:
• Rahul Parundekar, Craig A. Knoblock, and Jose Luis Ambite. Linking and building ontologies of linked data. The Semantic Web, ISWC 2010.
• Rahul Parundekar, Craig A. Knoblock, and Jose Luis Ambite. Discovering concept coverings in ontologies of linked data sources. The Semantic Web, ISWC 2012.
• Rahul Parundekar, Craig A. Knoblock, and Jose Luis Ambite. Discovering alignments in ontologies of linked data. IJCAI-2013.
Any questions?
THANK YOU
Editor's Notes
Instances are linked across multiple sources. Equivalent instances in the different domains are connected with owl:sameAs. Different sources with different schemas.
We need to find links at the Ontology Level
Replacing with conjunctive
Put slide before 25
Put short citations in
Conclusion should contain conjunctive and disjunctive. Create an impactful conclusion: we are able to create ontology where no class exists.