Linked Data
Interlinking
Diego Pessoa
derp@cin.ufpe.br
REVIEWING…
WEB AND LINKED DATA
World Wide Web is ful
“These data are formatted
for human consumption!”
And the machine
Linked Data mug
Linked Data Lifecycle
INTERLINKING
Interlinking - Definition
Interlinking refers to the degree to which entities that
represent the same concept are linked to each other.
Introduction to Linked Data and Its Lifecycle on the Web
(Sören Auer, Jens Lehmann, Axel-Cyrille Ngonga Ngomo, Amrapali Zaveri)
“connecting things that are somehow related”
Interlinking - Definition
Metrics. Interlinking can be measured by:
- Network measures that calculate the interlinking degree
- Clustering coefficient
- sameAs chains
- Centrality and description richness through sameAs links
Example of a sameAs link between two datasets:
Airline dataset URI: americaairlines.com/country/America
sameAs
Spatial dataset URI: dbpedia.org/page/United_States
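As a rough illustration of these metrics, the sketch below treats sameAs links as an undirected graph and computes the interlinking degree of each entity, the local clustering coefficient, and sameAs chains (approximated here as connected components). Only the first link comes from the slide; the other URIs and all function names are made up for the example.

```python
from collections import defaultdict

# Toy sameAs links. Only the first pair appears on the slide;
# the example.org URIs are hypothetical.
SAME_AS = [
    ("americaairlines.com/country/America", "dbpedia.org/page/United_States"),
    ("dbpedia.org/page/United_States", "example.org/geo/USA"),
    ("example.org/geo/USA", "americaairlines.com/country/America"),
    ("example.org/geo/USA", "example.org/stats/US"),
]


def build_graph(links):
    """Build an undirected adjacency map from (uri, uri) sameAs pairs."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def interlinking_degree(graph):
    """Number of sameAs neighbours per entity."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2.0 * linked / (k * (k - 1))


def same_as_chains(graph):
    """Group entities reachable from one another through sameAs links."""
    seen, chains = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        chains.append(component)
    return chains
```

Calling `build_graph(SAME_AS)` and passing the result to the three functions gives one number per entity for the first two metrics and a list of URI sets for the third.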
Why Interlinking?
“Include links to other URIs, so that they can discover
more things.” This is the 4th principle of Linked Data (the most important).
The goal of linking is to transform the Web into a
platform for data and information integration as well
as for search and querying.
Triples in Linked Data sources: more than 31 billion
-> Links constitute less than 5% of these triples
LINK DISCOVERY
Linking on the Web of Data is a more generic and thus more
complex task, as it is not limited to finding equivalent entities
in two knowledge bases.
Frameworks have been developed to address the lack of
links between the different knowledge bases on the web.
Two categories of frameworks:
1) Ontology matching: establish links between the
ontologies underlying two data sources.
2) Instance matching (link discovery): discover links
between instances contained in two data sources.
LINK DISCOVERY
Formally…
Given two sets of instances, S (source) and T (target),
a (complex) semantic similarity measure
σ : S × T → [0, 1], and a threshold θ ∈ [0, 1],
the goal of the link discovery task is to compute the set
M = {(s, t) ∈ S × T : σ(s, t) ≥ θ}.
In general, the similarity function used to carry out a link
discovery task is described by a link specification
(sometimes called a linkage decision rule).
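The definition can be sketched directly in code. The example below is a minimal brute-force version that assumes a simple string similarity (Python's `difflib.SequenceMatcher`) as a stand-in for σ. Real link specifications usually combine several measures over several properties, and the instance labels and function names here are illustrative only.

```python
from difflib import SequenceMatcher


def sigma(s, t):
    """Stand-in similarity measure with values in [0, 1]."""
    return SequenceMatcher(None, s.lower(), t.lower()).ratio()


def discover_links(source, target, theta):
    """Compute M = {(s, t) : sigma(s, t) >= theta} by comparing every pair."""
    return {(s, t) for s in source for t in target if sigma(s, t) >= theta}


# Hypothetical instance labels for the source and target sets.
S = ["United States", "Brazil", "Germany"]
T = ["united states", "Brasil", "France"]

M = discover_links(S, T, theta=0.8)
```

With θ = 0.8, `M` holds the pairs for the United States and for Brazil/Brasil, while Germany and France stay unlinked. Lowering θ admits more pairs at the cost of precision.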
CHALLENGES
Two key challenges arise when trying to
discover links between two sets of instances:
1) computational complexity of the matching task
2) selection of an appropriate link specification.
CHALLENGES
1) Computational complexity of the matching task
• The time complexity of a matching task can be measured
by the number of comparisons necessary to complete
this task
• Reducing the time complexity of link discovery is a
key requirement for instance linking frameworks for
Linked Data.
Ex.: discovering duplicate cities in DBpedia would require
approximately 0.15 × 10⁹ similarity computations.
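To make the comparison counts concrete, the sketch below counts brute-force comparisons and inverts the formula to see how many instances lead to roughly 0.15 × 10⁹ of them. The resulting instance count is a back-of-the-envelope inference, not a figure from the slides. The last function illustrates blocking, a common technique (not named in the slides) that compares only instances sharing a key; the first-letter key and all function names are assumptions.

```python
import math
from collections import defaultdict


def dedup_comparisons(n):
    """Brute-force deduplication compares each unordered pair once."""
    return n * (n - 1) // 2


def linking_comparisons(size_s, size_t):
    """Brute-force linking compares every source with every target."""
    return size_s * size_t


def instances_for(comparisons):
    """Smallest n whose brute-force deduplication needs at least `comparisons`."""
    n = (1 + math.isqrt(1 + 8 * comparisons)) // 2
    while dedup_comparisons(n) < comparisons:
        n += 1
    return n


def blocked_comparisons(labels, key=lambda label: label[:1].lower()):
    """Comparisons left when only labels with the same blocking key are compared."""
    block_sizes = defaultdict(int)
    for label in labels:
        block_sizes[key(label)] += 1
    return sum(dedup_comparisons(size) for size in block_sizes.values())
```

For example, `instances_for(150_000_000)` is a little over 17,000, so even a modest dataset produces a large number of comparisons. Blocking trades some recall for speed, since matches that fall into different blocks are never compared.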
CHALLENGES
2) Selection of an appropriate link specification.
• The configuration of link discovery frameworks is
usually carried out manually, in most cases simply
by guessing
• Methods such as supervised and active learning
can be used to guide a user who needs a mapping
toward a suitable linking configuration for their
matching task.
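A supervised approach can be sketched as follows: given a few labelled pairs, try candidate thresholds and keep the one with the best F1 score. This is a deliberately small illustration that learns only the threshold θ for a fixed string similarity. The labelled pairs, the candidate grid and the function names are assumptions, and real frameworks learn much richer specifications.

```python
from difflib import SequenceMatcher


def sigma(s, t):
    """Stand-in similarity measure with values in [0, 1]."""
    return SequenceMatcher(None, s.lower(), t.lower()).ratio()


def f1_at(theta, labelled):
    """F1 score of the rule sigma(s, t) >= theta on labelled pairs."""
    tp = fp = fn = 0
    for s, t, is_match in labelled:
        predicted = sigma(s, t) >= theta
        if predicted and is_match:
            tp += 1
        elif predicted:
            fp += 1
        elif is_match:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def learn_threshold(labelled, candidates=None):
    """Return the candidate threshold with the highest F1 (first one on ties)."""
    if candidates is None:
        candidates = [i / 20 for i in range(1, 20)]
    return max(candidates, key=lambda theta: f1_at(theta, labelled))


# Hypothetical training pairs: (source label, target label, is it a match?).
LABELLED = [
    ("Brazil", "Brasil", True),
    ("United States", "united states", True),
    ("Germany", "France", False),
    ("Italy", "India", False),
]

theta = learn_threshold(LABELLED)
```

The learned θ separates the matches from the non-matches in this toy set. With noisier data no single threshold may do so, which is one reason richer link specifications are learned in practice.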
APPROACHES TO LINK DISCOVERY
Current frameworks for link discovery
can be subdivided into two main categories:
Domain-specific
• RKBExplorer (academic purposes)
• GNAT (music)
Universal
• RDF-AI (not time optimized)
• LIMES (time optimized)
• SILK
ACTIVE LEARNING OF LINK SPECIFICATIONS
The second challenge of link discovery is the time-efficient
discovery of link specifications for a particular linking task.
Several approaches have been proposed to achieve this
goal, most of which rely on genetic programming.
The COALA (Correlation-Aware Active Learning) approach
was implemented on top of genetic programming.
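The general idea of active learning can be sketched with plain uncertainty sampling: ask the user to label the pair whose similarity is closest to the current threshold, then move the threshold. This is not COALA and does not use genetic programming. It only refines a single threshold, and the pool, the oracle and the function names are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def sigma(s, t):
    """Stand-in similarity measure with values in [0, 1]."""
    return SequenceMatcher(None, s.lower(), t.lower()).ratio()


def most_informative(pairs, theta):
    """Pick the pair whose score is closest to the threshold (most uncertain)."""
    return min(pairs, key=lambda pair: abs(sigma(*pair) - theta))


def active_learning(pool, oracle, theta=0.5, rounds=3):
    """Refine theta by asking the oracle about one uncertain pair per round."""
    pool = list(pool)
    low, high = 0.0, 1.0  # highest non-match score and lowest match score seen
    for _ in range(min(rounds, len(pool))):
        pair = most_informative(pool, theta)
        pool.remove(pair)
        score = sigma(*pair)
        if oracle(pair):
            high = min(high, score)
        else:
            low = max(low, score)
        theta = (low + high) / 2
    return theta


# Hypothetical unlabelled pool and a simulated user who knows the true matches.
POOL = [
    ("Brazil", "Brasil"),
    ("Germany", "France"),
    ("United States", "united states"),
    ("Italy", "India"),
]
TRUE_MATCHES = {("Brazil", "Brasil"), ("United States", "united states")}

learned_theta = active_learning(POOL, oracle=lambda pair: pair in TRUE_MATCHES)
```

Each round costs the user one label, so choosing informative pairs keeps the number of questions small.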
CONCLUSIONS
• Early work on running link discovery in parallel has
shown that massively parallel hardware such as
GPUs can lead to better results than cloud
implementations, even on considerably large
datasets.
• Automatically detecting the right resources for
linking, given a hardware landscape, is still
a dream to achieve.
CURRENT CHALLENGES
• Authoring
• Extraction from structured sources (RDBMS, XML)
• Natural Language Queries
• Automatic Management of Resources for Linking
• Linked Data Visualization
• Linked Data Quality/Reliability
Main References
[Book] Linked Data: Structured Data on the Web.
David Wood, Marsha Zaidman, Luke Ruth. 2014.
[Paper] Linked Data: The Story So Far.
Christian Bizer, Tom Heath, Tim Berners-Lee.
Thanks!

Editor's Notes

  1. Files are linked from HTML files to other HTML files or to other documents. They are the data you can interlink.
  2. These data formats are designed for human consumption. A specialized tool is required to automate access, search and reuse. Additional processing is needed before these data can be incorporated into new projects.
  3. 1- Data available on the web (in any format, even a scanned image) 2- Available in a machine-readable format (e.g., Excel) 3- Available in a non-proprietary format (CSV) 4- Published using W3C standards 5- All of the above, with links to other data
  4. The fourth principle is the most important Linked Data principle, as it enables the paradigm change from data silos to interoperable data distributed across the Web. Furthermore, it plays a key role in important tasks such as cross-ontology question answering, large-scale inference and data integration.
  5. Note that ontology matching and instance matching are similar to schema matching [127,126] and record linkage [161,37,25], respectively, as known in the database research area.
  6. The time complexity of a matching task can be measured by the number of comparisons necessary to complete the task. Reducing the time complexity of link discovery is a key requirement for instance linking frameworks for Linked Data.
  7. RKB extracts RDF from heterogeneous data sources to populate its knowledge bases with instances according to the AKT ontology. Instances of persons, publications and institutions were retrieved from several major metadata websites such as ACM and DBLP. RDF-AI contains a series of modules that compute instance matches by comparing their properties. RDF-AI does not include means for querying distributed datasets via SPARQL. In addition, it is not time-optimized, so mapping with this tool can be very time-consuming. LIMES exploits the fact that the edit distance is a distance metric to approximate distances without having to compute them.
  7. RKB extract RDF from heterogeneous data source so as to populate its knowledge bases with instances according to the AKT ontology instances of persons, publications and institutions were retrieved from several major metadata websites such as ACM and DBLP RDF-AI contains a series of modules that allow for computing instances matches by comparing their properties. RDF-AI does not comprise means for querying distributed data sets via SPARQL15. In addition, it suffers from not being time-optimized. Thus, mapping by using this tool can be very time-consuming. LIMES can make use of the fact that the edit distance is a distance metric to approximate distances without having to compute them.