2. Public linked data. “Linking” in the Linked Data cloud: references to instance URIs described in external sources. Special case: identity links between equivalent resources. [Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]
3. Motivation. Schema heterogeneity is an obstacle both for creating and for utilising these links: extracting information on the same topic from different repositories; discovering equivalence links between individuals. Motivation for our work: discovering instance-level links. How to choose the repositories to connect a new one to? Which subsets of repositories contain co-referring instances? [Figure: example repositories LinkedMDB, DBPedia, Freebase and MusicBrainz, with the topics TV programs, movies and pieces of music]
5. Matching approaches. “Top-down”: analyzing schema ontologies and generating alignments (manually or automatically). UMBEL: using CYC as a “backbone”; mapping commonly used schema ontologies. “Bottom-up”: inferring schema mappings based on instance-level information.
6. Our approach. Constructing a large-scale network of schema mappings: applying a light-weight instance-based matcher. Analysing the resulting network: what does it tell us about the use of ontologies?
7. Motivating factors. Potential use case scenarios: discovering relevant sources for connection; discovering relevant subsets of comparable instances. Tolerance to the quality of mappings: a mapping between “strongly overlapping” classes is still useful even if there is no strict equivalence/subsumption.
8. Instance-based matching. Use of instance-based matching: some implicit schema-level assumptions cannot be captured using only schema-level evidence. Interpretation mismatches: dbpedia:Actor = professional actor (film or stage); movie:actor = anybody who participated in a movie. Class interpretation “as used” vs “as designed”: in FOAF, foaf:Person = any person; in DBLP, foaf:Person = computer scientist.
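The idea of comparing classes "as used" can be sketched with a simple overlap computation over instance sets. This is only an illustration: the slides do not say which overlap measure the matcher uses, so the Jaccard and containment scores below, the function name `overlap_scores`, and the sample identifiers are all assumptions. It also assumes that co-referring instances have already been resolved to shared identifiers.

```python
def overlap_scores(instances_a, instances_b):
    """Compare two classes by the instances they share (hypothetical helper)."""
    a, b = set(instances_a), set(instances_b)
    if not a or not b:
        # An empty class cannot overlap with anything.
        return {"shared": 0, "jaccard": 0.0,
                "containment_a_in_b": 0.0, "containment_b_in_a": 0.0}
    shared = a & b
    return {
        "shared": len(shared),
        # Symmetric score: high only when the classes are close to equivalent.
        "jaccard": len(shared) / len(a | b),
        # Asymmetric scores: a value near 1.0 hints at subsumption.
        "containment_a_in_b": len(shared) / len(a),
        "containment_b_in_a": len(shared) / len(b),
    }


# Made-up instance identifiers, for illustration only.
dbpedia_actor = {"p1", "p2", "p3"}
movie_actor = {"p2", "p3", "p4", "p5"}
print(overlap_scores(dbpedia_actor, movie_actor))
```

Keeping both symmetric and asymmetric scores fits the stated tolerance for "strongly overlapping" classes that are neither strictly equivalent nor in a subsumption relation.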
12. Test. Training: training set of 6000 overlapping pairs of classes; test: 10-fold cross-validation. Applying: 2 networks of class mappings.
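For readers unfamiliar with 10-fold cross-validation, the splitting step can be sketched as follows. This shows only how 6000 labelled pairs would be partitioned into folds; the classifier and features used in the original work are not described on the slide, and the function name `k_fold_indices` and the fixed seed are assumptions.

```python
import random


def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# With 6000 pairs, each round trains on 5400 pairs and tests on 600.
for train, test in k_fold_indices(6000):
    print(len(train), len(test))
```

Every pair is used for testing exactly once, so the averaged score reflects the whole training set rather than a single held-out sample.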
13. Observations: class mappings. Association-based network: classes involved in the largest number of mappings. High-level classes representing concepts covered in many repositories … and describing categories with very fine-grained class decomposition. Usually also the most populated ones. Classes: geonames:Feature, freebase:people.person, yago:PhysicalEntity, linkedmdb:film, umbel:Person, akt:Person, akt:ArticleReference. … “under-linked” ones?
14. Observations: class mappings. Co-typing-based network: classes involved in the largest number of mappings. Popular classes reused in many repositories … or in DBPedia … and describing categories with fine-grained class decomposition. Usually also the most populated ones. Classes: foaf:Person, umbel:Person, dbpedia:Person, dbpedia:FootballPlayer, wordnet:Person, dbpedia:Album, sioc:WikiArticle, geonames:Feature, …
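One plausible reading of "co-typing" is that the same instance carries rdf:type assertions for several classes, possibly from different ontologies. The slides do not define the term, so the sketch below rests on that assumption; the function name `co_typing_counts` and the `ex:` instances are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_typing_counts(type_assertions):
    """Count how often two classes are asserted as types of the same instance."""
    classes_by_instance = defaultdict(set)
    for instance, cls in type_assertions:
        classes_by_instance[instance].add(cls)

    counts = Counter()
    for classes in classes_by_instance.values():
        # Sorting makes (A, B) and (B, A) map to the same key.
        for pair in combinations(sorted(classes), 2):
            counts[pair] += 1
    return counts


# Made-up (instance, class) assertions, for illustration only.
assertions = [
    ("ex:alice", "foaf:Person"),
    ("ex:alice", "dbpedia:Person"),
    ("ex:bob", "foaf:Person"),
    ("ex:bob", "dbpedia:Person"),
    ("ex:bob", "dbpedia:FootballPlayer"),
]
for pair, n in co_typing_counts(assertions).most_common():
    print(pair, n)
```

Under this reading, widely reused classes such as foaf:Person would naturally accumulate the most co-typing pairs.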
15. Links between ontologies. Aggregated network: connections between ontologies. Mapping-based links between ontologies: at least 1 mapping between corresponding classes must exist.
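The aggregation rule (two ontologies are linked if at least one mapping exists between their classes) can be sketched as below. The representation of class names as `prefix:LocalName` strings, the helper names, and the sample mappings are assumptions made for illustration, not data from the study.

```python
from collections import Counter


def ontology_of(class_name):
    """Take the namespace prefix as the ontology identifier (an assumption)."""
    return class_name.split(":", 1)[0]


def aggregate_ontology_links(class_mappings, min_mappings=1):
    """Collapse class-level mappings into undirected ontology-level links."""
    counts = Counter()
    for class_a, class_b in class_mappings:
        onto_a, onto_b = ontology_of(class_a), ontology_of(class_b)
        if onto_a == onto_b:
            continue  # mappings inside one ontology do not link ontologies
        counts[frozenset((onto_a, onto_b))] += 1
    # Keep a link only if enough class mappings support it (1 by default).
    return {pair: n for pair, n in counts.items() if n >= min_mappings}


# Made-up class mappings, for illustration only.
mappings = [
    ("dbpedia:Person", "foaf:Person"),
    ("dbpedia:Actor", "linkedmdb:actor"),
    ("dbpedia:Film", "linkedmdb:film"),
    ("dbpedia:Person", "dbpedia:Agent"),
]
for pair, n in aggregate_ontology_links(mappings).items():
    print(sorted(pair), n)
```

Keeping the mapping count on each link makes it possible to raise `min_mappings` later if single-mapping links prove too noisy.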
26. Co-typing-based network. Main factor: popularity for reuse. FOAF and WordNet: the most popular. DBPedia, YAGO, OpenCYC, UMBEL: reused for DBPedia instances.
27. Outcomes. Possible usage scenarios for mappings. Selecting suitable sources to connect: “LinkedMDB contains more movies than DBPedia, so it is more likely to cover all my instances”. Selecting an ontology to reuse to structure new instances: which sources use this ontology? Do I want my data to be integrated with them? Other data-driven tasks, e.g., exploratory search. Generic challenges. How to take into account task requirements in ontology matching? Recall vs precision, fuzzy vs exact. How to capture changes in the data? BTC 2009 is almost obsolete by now.
28. Limitations and future work. Limitations: a light-weight matcher can lead to lower-quality mappings (OK for our scenario but not for others); pre-existing instance-level mappings are not always available. Future work: combining with schema-based ontology matching techniques; taking into account properties and complex correspondences.
30. Disjoint but overlapping. Spurious owl:sameAs link: dbpedia:Hippocrates (Hippocrates) = bookmashup:9004095748 (Hippocratic Lives and Legends (Studies in Ancient Medicine, Vol 4)). Spurious rdf:type assignment: dbpedia:Celtic_Frost (band) defined as Person in DBPedia (fixed in the current version of DBPedia). Modelling assumptions: dbpedia:Masada describes both the geographical place and the battle.