03 interlinking-dass

Linked Data
Interlinking
Diego Pessoa
derp@cin.ufpe.br

REVIEWING…
WEB AND LINKED DA

“These data are formated
for human consumption!”
And the machine

Interlinking - Definition
Interlinking refers to the degree to which entities that
represent the same concept are linked to each other.
Introduction to linked data and its lifecycle on the web
(Auer, Sören Lehmann, Jens Ngomo, CAN Zaveri, Amrapali)
“connecting things that are somehow related”

Interlinking - Definition
Metrics. Interlinking can be measured by
- Using network measures that calculate the interlinking degree
- Cluster coefficient
- SameAs chains
- Centrality and description richness through sameAs links.
Airline dataset Spatial Dataset
URI: americaairlines.com/country/America URI: dbpedia.org/page/United_StatesSameAs

Why Interlinking?
“Include links to other URIs, so that they can discover
more things” 4th principle of LD (most important)
The goal of linking is to transform the Web into a
platform for data and information integration as well
as for search and querying.
Triples in Linked Data sources > 31 billions
-> Links consititute less than 5% of these triples

LINK DISCOVERY
Two categories of frameworks:
Linking on the Web of Data is a more generic and thus more
complex task, as it is not limited to finding equivalent entities
in two knowledge bases
Frameworks have been developed to address the lack of
links between the different knowledge bases on the web.
1) Ontology matching:
establish links between
ontologies underlying two
data sources.
2) Instance matching (link
discovery):
discover links between instances
contained in two data sources.

LINK DISCOVERY
Formally…
Given Two sets S (source); T (target) of instances,
a (complex) semantic similarity measure
σ : S × T → [0, 1] and a threshold θ ∈ [0, 1]
The goal of link discovery task is to compute the set
M = {(s, t), σ(s, t) ≥ θ}.
In general, the similarity function used to carry out a link
discovery task is described by using a link specification
(sometimes called linkage decision rule).

CHALLENGES
Two key challenges arise when trying to
discover links between two sets of instances:
1) computational complexity of the matching task
2) selection of an appropriate link specification.

CHALLENGES
1) Computational complexity of the matching task
• The time complexity of a matching task can be measured
by the number of comparisons necessary to complete
this task
• Reduction of the time complexity of link discovery is a
key requirement to instance linking frameworks for
Linked Data.
Ex.: discovering duplicate cities in Dbpedia would necessitate
approximately 0.15 × 109 similarity computations.

CHALLENGES
2) Selection of an appropriate link specification.
• The configuration of link discovery frameworks is
usually carried out manually, in most cases simply
by guessing
• Methods such as supervised and active learning
can be used to guide the user in need of mapping
to a suitable linking configuration for his matching
task

APPROACHES TO LINK
DISCOVERYCurrent frameworks for link discovery
can be subdivided into two main categories:
Domain-specific
Universal
• RKBExplorer (academic purposes)
• GNAT (music)
• RDF-AI (not time optimized)
• LIMES (time optimized)
• SILK

ACTIVE LEARNING OF LINK
SPECIFICATIONSThe second challenge of Link Discovery is the time-efficient
discovery of link specifications for a particular linking tasks.
Several approaches have been proposed to achieve this
goal, of which most rely on genetic programming
COALA (Correlation-Aware Active Learning) approach
was implemented on top of the genetic programming

CONCLUSIONS
• First works on running link discovery in parallel have
shown that using massively parallel hardware such
as GPUs can lead to better results that using cloud
implementations even on considerably large
datasets.
• Detecting the right resources for linking
automatically given a hardware landscape is yet still
a dream to achieve.

CURRENT CHALLENGES
• Authoring
• Extraction from structured fonts (RDBS, XML)
• Natural Language Queries
• Automatic Management of Resources for Linking
• Linked Data Visualization
• Linked Data Quality/Reliability

Main
References[Book] Linked Data – Structured data
on the web. David Wood, Marsha
Zaidman, Luke Ruth. 2014
[Paper] Linked Data – The story so
far. Berners Lee.

Linked Data
Interlinking
Diego Pessoa
derp@cin.ufpe.br
Thanks!

03 interlinking-dass

Recomendados

Recomendados

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Destaque

Destaque (8)

Semelhante a 03 interlinking-dass

Semelhante a 03 interlinking-dass (20)

03 interlinking-dass

Notas do Editor