Merging controlled vocabularies
through semantic alignment
based on linked data
Authors: Konstantinos Kyprianos, Ioannis Papadakis

IONIAN UNIVERSITY
DEPARTMENT OF ARCHIVES, LIBRARY SCIENCE AND MUSEOLOGY
Ioannou Theotoki 72, 49100, Corfu

Presentation outline

• Introduction
• Proposed approach
• Proof of concept
• Deployment of the proposed approach
• Deployment results
• Comparative evaluation
• Conclusions
• Future work

Introduction (1/2)

• Controlled vocabularies are predefined lists of words for knowledge organization and the description of libraries’ collections
• Creation of semantically similar yet syntactically and linguistically heterogeneous controlled vocabularies with overlapping parts
• Matching tools and techniques: Lexical similarity
◦ Compares terms according to the order of their characters
◦ Edit distance, prefix/suffix variations, n-grams, etc.
• Matching tools and techniques: Semantic alignment
◦ Based on semantic techniques to identify similar terms between two structured vocabularies

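The two lexical similarity measures named above, edit distance and n-grams, can be sketched in a few lines of Python. This is only an illustration: the function names and the choice of a Jaccard overlap for comparing n-gram sets are assumptions, not something the slides specify.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character edits."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def char_ngrams(term: str, n: int = 3) -> set:
    """Set of lowercase character n-grams of a term."""
    text = term.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of character n-grams, between 0.0 and 1.0."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        # Both terms are shorter than n: fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(ngram_similarity("Economics", "Economic"))
```

Both measures look only at characters, which is why they cannot relate terms such as synonyms that share no spelling.
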
Introduction (2/2)

• Our approach: a methodology to bring together semantically similar yet different vocabularies through the semantic alignment of the underlying terms, employing Linked Open Data (LOD) technologies
◦ Semantic alignment is achieved through external linguistic datasets
◦ The compared datasets are not required to have any kind of structure (schema or ontology)

Proposed approach

• S is the set of terms in the Source dataset
• T is the set of terms in the Target dataset
• L is the set of terms in the Linguistic dataset
• L’ is the set of terms that are found to be linguistically associated with some terms of the Source dataset
• L’’ is the set of terms in L that are found to be semantically associated with some terms of L’
• T’ contains the terms in T that are linguistically associated with some terms of L’ and L’’

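The set definitions above can be traced on toy data with a minimal Python sketch. Several things here are assumptions made for illustration: the names `normalize` and `align`, the use of a dictionary as a stand-in for the linguistic dataset L, and the use of case-insensitive exact matching in place of a real lexical similarity measure.

```python
from typing import Dict, Iterable, Set, Tuple


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(term.lower().split())


def align(
    source: Iterable[str],
    target: Iterable[str],
    linguistic: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (L', L'', T') for toy inputs.

    `linguistic` is a hypothetical stand-in for L: it maps each term of L
    to the terms of L it is semantically associated with.
    """
    source_keys = {normalize(s) for s in source}

    # L': terms of L linguistically associated with some term of S.
    l_prime = {term for term in linguistic if normalize(term) in source_keys}

    # L'': terms of L semantically associated with some term of L'.
    # Assumption: L'' is kept disjoint from L'.
    l_double_prime: Set[str] = set()
    for term in l_prime:
        l_double_prime |= {t for t in linguistic[term] if t in linguistic}
    l_double_prime -= l_prime

    # T': terms of T linguistically associated with some term of L' or L''.
    candidate_keys = {normalize(t) for t in l_prime | l_double_prime}
    t_prime = {term for term in target if normalize(term) in candidate_keys}
    return l_prime, l_double_prime, t_prime


if __name__ == "__main__":
    toy_l = {
        "automobiles": {"cars"},
        "cars": {"automobiles"},
        "economics": set(),
        "weather": set(),
    }
    print(align(["Automobiles", "Economics"], ["Cars", "Economics", "Weather"], toy_l))
```

In the toy run, "Cars" reaches T’ only through L’’, which is the kind of pair a purely lexical comparison of S and T would miss.
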
Proof of concept (1/2)

• University of Piraeus digital library (Dione)
◦ Theses and dissertations
◦ 3,323 bilingual subject headings
◦ DSpace installation
• New York Times (NYT)
◦ Approximately 10,000 subject headings
◦ Journal articles
• DBpedia
◦ Extracts structured information from Wikipedia
◦ 3.5 million entities
• WordNet
◦ Lexical database
◦ Consists of synsets (~117,659 distinct concepts containing terms interlinked through conceptual-semantic relations)

Proof of concept (2/2)

1. Let the source dataset S be D (i.e. Dione)
2. Let the target dataset T be N (i.e. NYT)
3. Let the first linguistic dataset L be DB (i.e. DBpedia)
4. Let the second linguistic dataset L be W (i.e. WordNet)
5. D1’ corresponds to S’ (the terms of S that are linguistically matched against L), assuming that the linguistic dataset L is DB. In a similar manner, D2’ corresponds to S’, assuming that the linguistic dataset L is W.
6. DB’ and DB’’ correspond to L’ and L’’ respectively, assuming that the linguistic dataset L is DB. In a similar manner, W’ and W’’ correspond to L’ and L’’ respectively, assuming that the linguistic dataset L is W.
7. N1’ corresponds to T’, assuming that the linguistic dataset L is DB. In a similar manner, N2’ corresponds to T’, assuming that the linguistic dataset L is W.

Deployment of the proposed approach

• Google Refine
◦ Tool to manipulate tabular data
◦ Reconciliation of data with existing knowledge bases
◦ RDF extension
• Process
1. Subject headings from Dione are imported to Google Refine
2. DBpedia and WordNet endpoints are registered in Google Refine as SPARQL reconciliation services
3. The subject headings of Dione are linguistically matched (i.e. lexical similarity) against DBpedia’s and WordNet’s reconciliation services, creating the corresponding subsets
4. The terms in the subsets of step 3 are extended with semantically equivalent terms (i.e. semantic alignment) deriving from the rest of DBpedia and WordNet
5. Subject headings from NYT are imported to Google Refine
6. The subject headings of NYT are linguistically matched (i.e. lexical similarity) against the terms belonging to the subsets that are described in steps 3 and 4

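To make steps 3 and 4 more concrete, the sketch below builds the kind of SPARQL queries that could be sent to an endpoint such as DBpedia's. It is a hedged illustration only: the slides do not show the queries issued by Google Refine, an exact `rdfs:label` match is a simplification of what a reconciliation service does, and the choice of `owl:sameAs` as the "semantically equivalent" relation is an assumption. The function names are hypothetical, and no network request is made.

```python
def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def label_lookup_query(heading: str, lang: str = "en", limit: int = 10) -> str:
    """Step 3 (simplified): find resources whose label equals the heading."""
    literal = escape_literal(heading)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?resource WHERE {\n"
        f'  ?resource rdfs:label "{literal}"@{lang} .\n'
        "}\n"
        f"LIMIT {limit}"
    )


def equivalents_query(resource_iri: str, limit: int = 50) -> str:
    """Step 4 (assumed relation): list resources declared equivalent."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT DISTINCT ?equivalent WHERE {\n"
        f"  <{resource_iri}> owl:sameAs ?equivalent .\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(label_lookup_query("Economics"))
    print(equivalents_query("http://dbpedia.org/resource/Economics"))
```

The resulting strings could be pasted into any SPARQL endpoint's query form to inspect what a label match and its equivalents look like for a single subject heading.
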
Deployment results (1/2)

Linguistically matched terms, through lexical similarity techniques, between:
◦ Dione and DBpedia
◦ Dione and WordNet

Dione                               DBpedia      WordNet
One-word subject headings           331 (29%)    297 (65%)
Two-word subject headings           658 (59%)    128 (28%)
Subject headings with 3+ words      130 (12%)    30 (7%)
Subject headings with subdivisions  0            0
Sum (1,574)                         1,119        455

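Deployment results (2/2)

[Figure: diagram of the sets D, D1’, D2’, DB, DB’, DB’’, W, W’, W’’ and N, annotated with D = 3,323 terms, N = 10,000 terms, N1’ = 163 and N2’ = 117. Further counts shown in the figure: 1119, 455, 986, 5,700, 72, 86, 77, 45.]
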
Comparative evaluation (1/4)

• The proposed methodology is compared against an algorithm (introduced in a previous work*) that was applied to Dione and NYT and is based only on lexical similarity techniques
• Dione and NYT are not described by schemas. Thus, any attempt to merge their underlying terms cannot be based on traditional ontology-alignment techniques

*Papadakis, I., Kyprianos, K.: Merging Controlled Vocabularies for More Efficient Subject-based Search. International Journal of Knowledge Management. 7(3), 76-90, July-September (2011)

Comparative evaluation (2/4)

[Figure: List A contains 207 matched pairs; List B contains 280 matched pairs.]

List A. Previous work: only lexically matched pairs between Dione and NYT
List B. Proposed work: lexically AND semantically matched pairs between Dione and NYT

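Comparative evaluation (3/4)

[Figure: Venn diagram of List A and List B: 27 pairs appear only in List A, 180 pairs appear in both lists, and 100 pairs appear only in List B.]

List A ∩ List B = 180 terms
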
Comparative evaluation (4/4)

Matched pairs            List A        List B         No. of pairs
D1-NYT1 … D158-NYT158    ✓ (lexical)   ✓ (lexical)    158
… D180-NYT180            ✓ (lexical)   ✓ (semantic)   22
… D280-NYT280            -             ✓ (semantic)   100
… D307-NYT307            ✓ (lexical)   -              27
TOTAL:                                                307

Pairs present in both lists: 158 + 22 = 180

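The counts in the comparison above are mutually consistent, which a short arithmetic check makes explicit. The variable and function names are chosen here for illustration; the numbers are the ones reported on the evaluation slides.

```python
# Counts taken from the comparison of List A (previous work) and List B (proposed work).
BOTH_LEXICAL = 158         # lexical match in List A and in List B
LEXICAL_VS_SEMANTIC = 22   # lexical match in List A, semantic match in List B
ONLY_LIST_B = 100          # semantic match found only by the proposed work
ONLY_LIST_A = 27           # lexical match found only by the previous work


def summarize() -> dict:
    """Derive list sizes, their overlap and their union from the four row counts."""
    shared = BOTH_LEXICAL + LEXICAL_VS_SEMANTIC
    return {
        "list_a": shared + ONLY_LIST_A,
        "list_b": shared + ONLY_LIST_B,
        "shared": shared,
        "union": shared + ONLY_LIST_A + ONLY_LIST_B,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

The derived sizes agree with the 207 pairs of List A, the 280 pairs of List B, the 180 shared pairs and the total of 307 reported on the slides.
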
Conclusions

• A methodology was presented that is capable of finding equivalent terms between semantically similar controlled vocabularies
• Lexical similarities discovery and semantic alignment through external LOD datasets
• Google Refine renders the deployment of the proposed methodology a straightforward process that can be applied to other cases aiming at discovering equivalent terms in different yet semantically similar datasets
• The deployment of the proposed methodology is facilitated through the employment of linked data technologies

Future work

• Future work is targeted towards the reconciliation of Dione’s subject headings with linked data services such as those of the French National Library (RAMEAU), the German National Library (GND), the Biblioteca Nacional de España (BNE) and LIBRIS.

Thank you for your attention!
Questions?

17

Mais conteúdo relacionado

Mais procurados

IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...IRJET Journal
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - IntroductionChristian Perone
 
A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...ijnlc
 
Vectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchVectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchBhaskar Mitra
 
A Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia ArticlesA Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia Articlesijma
 
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...Sebastian Ruder
 
NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationEugene Nho
 
WCLOUDVIZ: Word Cloud Visualization of Indonesian News Articles Classificatio...
WCLOUDVIZ: Word Cloud Visualization of Indonesian News Articles Classificatio...WCLOUDVIZ: Word Cloud Visualization of Indonesian News Articles Classificatio...
WCLOUDVIZ: Word Cloud Visualization of Indonesian News Articles Classificatio...TELKOMNIKA JOURNAL
 
A Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic ModellingA Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic Modellingcsandit
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information RetrievalBhaskar Mitra
 
Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...Sebastian Ruder
 
AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYijnlc
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for RetrievalBhaskar Mitra
 

Mais procurados (20)

Tutorial on word2vec
Tutorial on word2vecTutorial on word2vec
Tutorial on word2vec
 
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
IRJET- Automatic Language Identification using Hybrid Approach and Classifica...
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
 
A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...A comparative analysis of particle swarm optimization and k means algorithm f...
A comparative analysis of particle swarm optimization and k means algorithm f...
 
Vectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchVectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for Search
 
What is word2vec?
What is word2vec?What is word2vec?
What is word2vec?
 
A Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia ArticlesA Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia Articles
 
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...
FaDA: Fast document aligner with word embedding - Pintu Lohar, Debasis Gangul...
 
NLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic ClassificationNLP Project: Paragraph Topic Classification
NLP Project: Paragraph Topic Classification
 
WCLOUDVIZ: Word Cloud Visualization of Indonesian News Articles Classificatio...
WCLOUDVIZ: Word Cloud Visualization of Indonesian News Articles Classificatio...WCLOUDVIZ: Word Cloud Visualization of Indonesian News Articles Classificatio...
WCLOUDVIZ: Word Cloud Visualization of Indonesian News Articles Classificatio...
 
A Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic ModellingA Text Mining Research Based on LDA Topic Modelling
A Text Mining Research Based on LDA Topic Modelling
 
Topics Modeling
Topics ModelingTopics Modeling
Topics Modeling
 
Does sizematter
Does sizematterDoes sizematter
Does sizematter
 
Ir models
Ir modelsIr models
Ir models
 
Canini09a
Canini09aCanini09a
Canini09a
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
Neural Models for Information Retrieval
Neural Models for Information RetrievalNeural Models for Information Retrieval
Neural Models for Information Retrieval
 
Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...Transformation Functions for Text Classification: A case study with StackOver...
Transformation Functions for Text Classification: A case study with StackOver...
 
AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITY
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for Retrieval
 

Semelhante a Merging controlled vocabularies through semantic alignment based on linked data

Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksLeonardo Di Donato
 
The Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for HindiThe Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for Hindisinghg77
 
An Improved Approach to Word Sense Disambiguation
An Improved Approach to Word Sense DisambiguationAn Improved Approach to Word Sense Disambiguation
An Improved Approach to Word Sense DisambiguationSurabhi Verma
 
Automatize Document Topic And Subtopic Detection With Support Of A Corpus
Automatize Document Topic And Subtopic Detection With Support Of A CorpusAutomatize Document Topic And Subtopic Detection With Support Of A Corpus
Automatize Document Topic And Subtopic Detection With Support Of A CorpusRichard Hogue
 
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...Kim Daniels
 
Improving Text Categorization with Semantic Knowledge in Wikipedia
Improving Text Categorization with Semantic Knowledge in WikipediaImproving Text Categorization with Semantic Knowledge in Wikipedia
Improving Text Categorization with Semantic Knowledge in Wikipediachjshan
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Saurabh Kaushik
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextFulvio Rotella
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextUniversity of Bari (Italy)
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...Hiroki Shimanaka
 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003Ajay Ohri
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 

Semelhante a Merging controlled vocabularies through semantic alignment based on linked data (20)

1 l5eng
1 l5eng1 l5eng
1 l5eng
 
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasksTopic Modeling for Information Retrieval and Word Sense Disambiguation tasks
Topic Modeling for Information Retrieval and Word Sense Disambiguation tasks
 
The Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for HindiThe Essay Scoring Tool (TEST) for Hindi
The Essay Scoring Tool (TEST) for Hindi
 
An Improved Approach to Word Sense Disambiguation
An Improved Approach to Word Sense DisambiguationAn Improved Approach to Word Sense Disambiguation
An Improved Approach to Word Sense Disambiguation
 
Automatize Document Topic And Subtopic Detection With Support Of A Corpus
Automatize Document Topic And Subtopic Detection With Support Of A CorpusAutomatize Document Topic And Subtopic Detection With Support Of A Corpus
Automatize Document Topic And Subtopic Detection With Support Of A Corpus
 
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
A Comparative Analysis Of The Entropy And Transition Point Approach In Repres...
 
Improving Text Categorization with Semantic Knowledge in Wikipedia
Improving Text Categorization with Semantic Knowledge in WikipediaImproving Text Categorization with Semantic Knowledge in Wikipedia
Improving Text Categorization with Semantic Knowledge in Wikipedia
 
AINL 2016: Eyecioglu
AINL 2016: EyeciogluAINL 2016: Eyecioglu
AINL 2016: Eyecioglu
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1Engineering Intelligent NLP Applications Using Deep Learning – Part 1
Engineering Intelligent NLP Applications Using Deep Learning – Part 1
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
 
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from TextCooperating Techniques for Extracting Conceptual Taxonomies from Text
Cooperating Techniques for Extracting Conceptual Taxonomies from Text
 
W17 5406
W17 5406W17 5406
W17 5406
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
Distributional semantics
Distributional semanticsDistributional semantics
Distributional semantics
 
Networks and Natural Language Processing
Networks and Natural Language ProcessingNetworks and Natural Language Processing
Networks and Natural Language Processing
 
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
[Paper Reading] Supervised Learning of Universal Sentence Representations fro...
 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003
 
E-text in EFL - Four flavours
E-text in EFL - Four flavoursE-text in EFL - Four flavours
E-text in EFL - Four flavours
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 

Último

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxCeline George
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxPooja Bhuva
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 

Último (20)

Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 

Merging controlled vocabularies through semantic alignment based on linked data

  • 1. Merging controlled vocabularies through semantic alignment based on linked data Authors: Konstantinos Kyprianos, Ioannis Papadakis IONIAN UNIVERSITY DEPARTMENT OF ARCHIVES, LIBRARY SCIENCE AND MUSEOLOGY Ioannou Theotoki 72, 49100, Corfu 1
  • 2. Presentation outline         Introduction Proposed approach Proof of concept Deployment of the proposed approach Deployment results Comparative evaluation Conclusions Future work 2
  • 3. Introduction (1/2)
      • Controlled vocabularies are predefined lists of words used for knowledge organization and for the description of libraries’ collections
      • Semantically similar yet syntactically and linguistically heterogeneous controlled vocabularies have been created, with overlapping parts
      • Matching tools and techniques: Lexical similarity
          ◦ Compares terms according to the order of their characters
          ◦ Edit distance, prefix/suffix variations, n-grams, etc.
      • Matching tools and techniques: Semantic alignment
          ◦ Based on semantic techniques to identify similar terms between two structured vocabularies
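The lexical similarity techniques named on this slide (edit distance and n-grams) can be illustrated with a minimal Python sketch. The function names, the choice of Levenshtein distance, and the Jaccard overlap of character trigrams are assumptions made for illustration; the slides do not say which exact measures the authors used.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def char_ngrams(term, n=3):
    """Set of character n-grams of the lower-cased term."""
    text = term.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap of the two n-gram sets (0.0 when both sets are empty)."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    union = grams_a | grams_b
    return len(grams_a & grams_b) / len(union) if union else 0.0
```

Both measures look only at characters, so they can pair "library" with "libraries" but cannot relate two differently spelled terms that have the same meaning. That gap is what semantic alignment is meant to cover.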
  • 4. Introduction (2/2)
      • Our approach: a methodology to bring together semantically similar yet different vocabularies through the semantic alignment of the underlying terms, employing LOD technologies
          ◦ Semantic alignment is achieved through external linguistic datasets
          ◦ The compared datasets are not required to have any kind of structure (schema or ontology)
  • 5. Proposed approach
      • S is the set of terms in the Source dataset
      • T is the set of terms in the Target dataset
      • L is the set of terms in the Linguistic dataset
      • L’ is the set of terms in L that are found to be linguistically associated with some terms of the Source dataset
      • L’’ is the set of terms in L that are found to be semantically associated with some terms of L’
      • T’ contains the terms in T that are linguistically associated with some terms of L’ and L’’
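The set definitions on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: "linguistically associated" is reduced to an exact match after case and whitespace normalization, "semantically associated" is read from a hypothetical `semantic_links` dictionary, T’ is matched against the union of L’ and L’’, and all terms are toy data rather than entries from the authors' datasets.

```python
def normalize(term):
    """Lower-case the term and collapse whitespace (assumed matching rule)."""
    return " ".join(term.lower().split())


def align(source, target, linguistic, semantic_links):
    """Return (L', L'', T') for the given S, T and L.

    semantic_links is a hypothetical mapping from a term in L to the
    terms in L that are considered semantically associated with it.
    """
    linguistic = set(linguistic)
    source_keys = {normalize(term) for term in source}

    # L': terms of L that lexically match some term of S.
    l_prime = {term for term in linguistic if normalize(term) in source_keys}

    # L'': terms of L semantically associated with some term of L'.
    l_double_prime = set()
    for term in l_prime:
        l_double_prime.update(semantic_links.get(term, ()))
    l_double_prime = (l_double_prime & linguistic) - l_prime

    # T': terms of T that lexically match a term of L' or L''.
    bridge_keys = {normalize(term) for term in l_prime | l_double_prime}
    t_prime = {term for term in target if normalize(term) in bridge_keys}
    return l_prime, l_double_prime, t_prime


# Toy data, invented for this example.
S = {"Automobiles", "Marketing"}
T = {"Cars", "Marketing", "Weather"}
L = {"automobiles", "cars", "marketing", "rain"}
links = {"automobiles": {"cars"}}

l_prime, l_double_prime, t_prime = align(S, T, L, links)
```

In this toy run, "Cars" reaches T’ only through the semantic link from "automobiles", while "Marketing" is found through the lexical match alone.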
  • 6. Proof of concept (1/2)
      • University of Piraeus digital library (Dione)
          ◦ Theses and dissertations
          ◦ 3,323 bilingual subject headings
          ◦ DSpace installation
      • New York Times (NYT)
          ◦ Approximately 10,000 subject headings
          ◦ Journal articles
      • DBpedia
          ◦ Extracts structured information from Wikipedia
          ◦ 3.5 million entities
      • WordNet
          ◦ Lexical database
          ◦ Consists of synsets (~117,659 distinct concepts containing terms interlinked through conceptual-semantic relations)
  • 7. Proof of concept (2/2)
      1. Let the source dataset S be D (i.e. Dione)
      2. Let the target dataset T be N (i.e. NYT)
      3. Let the first linguistic dataset L (dataset A) be DB (i.e. DBpedia)
      4. Let the second linguistic dataset L (dataset B) be W (i.e. WordNet)
      5. D1’ corresponds to S’, assuming that the linguistic dataset L is DB. In a similar manner, D2’ corresponds to S’, assuming that the linguistic dataset L is W.
      6. DB’ and DB’’ correspond to L’ and L’’ respectively, assuming that the linguistic dataset L is DB. In a similar manner, W’ and W’’ correspond to L’ and L’’ respectively, assuming that the linguistic dataset L is W.
      7. N1’ corresponds to T’, assuming that the linguistic dataset L is DB. In a similar manner, N2’ corresponds to T’, assuming that the linguistic dataset L is W.
  • 8. Deployment of the proposed approach
      • Google Refine
          ◦ Tool to manipulate tabular data
          ◦ Reconciliation of data with existing knowledge bases
          ◦ RDF extension
      • Process
          1. Subject headings from Dione are imported to Google Refine
          2. DBpedia and WordNet endpoints are registered in Google Refine as SPARQL reconciliation services
          3. The subject headings of Dione are linguistically matched (i.e. lexical similarity) against DBpedia’s and WordNet’s reconciliation services, creating the corresponding subsets
          4. The terms in the subsets of step 3 are extended with semantically equivalent terms (i.e. semantic alignment) deriving from the rest of DBpedia and WordNet
          5. Subject headings from NYT are imported to Google Refine
          6. The subject headings of NYT are linguistically matched (i.e. lexical similarity) against the terms belonging to the subsets that are described in steps 3 and 4
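To give a rough idea of what a label lookup against a SPARQL endpoint (steps 2 and 3) might involve, the Python sketch below only builds a query string. It is an assumption-laden illustration: the function names are hypothetical, the query matches an exact `rdfs:label` in one language, and it is not the query that Google Refine or its RDF extension actually generates, which the slides do not show.

```python
def escape_sparql_literal(text):
    """Escape backslashes and double quotes for a quoted SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def label_lookup_query(term, lang="en", limit=5):
    """Build a SPARQL query that finds resources whose rdfs:label equals term.

    Hypothetical helper: exact label match only, no fuzzy scoring.
    """
    literal = escape_sparql_literal(term)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?resource WHERE {\n"
        f'  ?resource rdfs:label "{literal}"@{lang} .\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


# Example: the query text for one subject heading (toy input).
query = label_lookup_query("Marketing")
```

A reconciliation service would typically go further than this, for example by ranking several candidate resources per heading, so the exact-match query should be read as the simplest possible case.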
  • 9. Deployment results (1/2)
      Terms linguistically matched, through lexical similarity techniques, between Dione and DBpedia and between Dione and WordNet:

      Dione                                 DBpedia       WordNet
      One-word subject headings             331 (29%)     297 (65%)
      Two-word subject headings             658 (59%)     128 (28%)
      Subject headings with 3+ words        130 (12%)     30 (7%)
      Subject headings with subdivisions    0             0
      Sum (1,574)                           1,119         455
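  • 10. Deployment results (2/2)
      [Figure: diagram of the sets D, D1’, D2’, DB, DB’, DB’’, W, W’, W’’ and N. Recoverable labels: D = 3,323 terms; 1,119 and 455 (the DBpedia and WordNet sums of the previous slide); N = 10,000 terms; N1’ = 163; N2’ = 117. Further values shown in the diagram: 986; 5,700; 72; 86; 77; 45.]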
  • 11. Comparative evaluation (1/4)
      • The proposed methodology is compared against an algorithm (introduced in a previous work*) that was applied to Dione and NYT and is based only on lexical similarity techniques
      • Dione and NYT are not described by schemas. Thus, any attempt to merge their underlying terms cannot be based on traditional ontology-alignment techniques
      * Papadakis, I., Kyprianos, K.: Merging Controlled Vocabularies for More Efficient Subject-based Search. International Journal of Knowledge Management. 7(3), 76-90, July-September (2011)
  • 12. Comparative evaluation (2/4)
      • List A (207 pairs). Previous work: only lexically matched pairs between Dione and NYT
      • List B (280 pairs). Proposed work: lexically AND semantically matched pairs between Dione and NYT
  • 13. Comparative evaluation (3/4)
      [Figure: overlap of List A and List B. In List A only: 27; in both lists: 180; in List B only: 100.]
      List A ∧ List B = 180 terms
  • 14. Comparative evaluation (4/4)

      Matched pairs                  List A          List B           No. of pairs
      D1-NYT1 … D158-NYT158          yes (lexical)   yes (lexical)    158
      … D180-NYT180                  yes (lexical)   yes (semantic)   22
      (subtotal, in both lists)                                       180
      … D280-NYT280                  no              yes (semantic)   100
      … D307-NYT307                  yes (lexical)   no               27
      TOTAL                                                           307
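The counts reported across the comparative evaluation slides are mutually consistent, which the following short Python check makes explicit. The variable names are chosen for this illustration; the numbers are the ones given on the slides.

```python
# Figures taken from the comparative evaluation slides.
list_a_total = 207          # previous work: lexically matched pairs
list_b_total = 280          # proposed work: lexically and semantically matched pairs
in_both = 180               # pairs that appear in both lists
both_lexical = 158          # pairs matched lexically in both lists

# Derived quantities.
only_in_a = list_a_total - in_both                     # pairs found only by previous work
only_in_b = list_b_total - in_both                     # pairs found only by proposed work
lexical_a_semantic_b = in_both - both_lexical          # lexical in A, semantic in B
distinct_pairs = only_in_a + in_both + only_in_b       # union of both lists
net_gain = list_b_total - list_a_total                 # extra pairs in List B
```

Read this way, the proposed approach contributes 100 pairs that the purely lexical algorithm did not find, while 27 pairs from the earlier list are not reproduced, for a net difference of 73 pairs.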
  • 15. Conclusions
      • A methodology was presented that is capable of finding equivalent terms between semantically similar controlled vocabularies
      • It combines the discovery of lexical similarities with semantic alignment through external LOD datasets
      • Google Refine makes the deployment of the proposed methodology a straightforward process that can be applied to other cases aiming at discovering equivalent terms in different yet semantically similar datasets
      • The deployment of the proposed methodology is facilitated through the employment of linked data technologies
  • 16. Future work
      • Future work is targeted towards the reconciliation of Dione’s subject headings with linked data services such as those of the French National Library (RAMEAU), the German National Library (GND), the Biblioteca Nacional de España (BNE) and LIBRIS.
  • 17. Thank you for your attention! Questions?