1. A Similarity Measure Based on Semantic and Linguistic Information
Nitish Aggarwal, DERI, NUI Galway (firstname.lastname@deri.org)
DERI Reading Group, Wednesday, 15 June 2011
2. Based on:
- "A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness". Authors: Giuseppe Pirrò and Jérôme Euzenat. Published: International Semantic Web Conference (ISWC), 2010.
- "SyMSS: A syntax-based measure for short-text semantic similarity". Authors: J. Oliva, J. I. Serrano, M. D. del Castillo, and Ángel Iglesias. Published: Data & Knowledge Engineering, Volume 70, Issue 4, April 2011.
3. Overview
- Introduction
- Classical approaches
- Ontology-based similarity: set of relations, information content
- SyMSS (syntax-based): deep parsing, influence of adjectives and adverbs
- Conclusion
4. Introduction & Motivation
- Short-text similarity: current measures lack semantic and linguistic information
- Applications: semantic annotation, semantic search, information retrieval and extraction
5. Classical Approaches
- String similarity: Levenshtein distance, Dice coefficient
- Corpus-based: ESA (Explicit Semantic Analysis), Google distance, vector-space model
- Ontology-based: path distance, information content
- Syntactic similarity: word order, part of speech
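The two string-similarity measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not code from either paper; the Dice coefficient is computed here over character bigrams, which is one common choice among several (sets of words are another).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        prev = curr
    return prev[-1]


def bigrams(s: str) -> set[str]:
    """Set of adjacent character pairs in s."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over the character-bigram sets X and Y."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))
```

For example, `levenshtein("kitten", "sitting")` is 3, and `dice("night", "nacht")` is 0.25 because the two words share only the bigram "ht" out of four bigrams each. Both measures look only at surface form, so synonyms such as "car" and "automobile" score poorly.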
6. First paper: "A Feature and Information Theoretic Framework for Semantic Similarity and Relatedness" (Giuseppe Pirrò and Jérôme Euzenat, ISWC 2010)
7. Ontology-based - Overview
- Features: the whole set of semantic relations defined in an ontology
- Resnik's information content: IC(c) = -log p(c), where p(c) is the probability of encountering an instance of concept c
- Intrinsic information content: avoids the analysis of large corpora
- Extended information content
- Maps the feature-based model into the information-theoretic domain
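To make Resnik's IC(c) = -log p(c) concrete, the sketch below estimates p(c) from corpus counts, where an occurrence of a concept also counts as an occurrence of every concept that subsumes it. The taxonomy and the counts are invented toy values for illustration only.

```python
import math

# Hypothetical toy taxonomy: concept -> parent (None marks the root).
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "bicycle": "vehicle",
    "animal": "entity",
    "dog": "animal",
}

# Hypothetical corpus counts of words mapped to each concept.
COUNTS = {"car": 40, "bicycle": 10, "dog": 50}


def ancestors(c):
    """Yield the concept itself and every concept above it, up to the root."""
    while c is not None:
        yield c
        c = PARENT[c]


def concept_frequencies(counts):
    """freq(c): occurrences of c plus occurrences of every concept it subsumes."""
    freq = {c: 0 for c in PARENT}
    for concept, n in counts.items():
        for a in ancestors(concept):
            freq[a] += n
    return freq


def resnik_ic(counts):
    """IC(c) = -log p(c), with p(c) = freq(c) / freq(root); unseen concepts are skipped."""
    freq = concept_frequencies(counts)
    total = freq["entity"]
    return {c: -math.log(f / total) for c, f in freq.items() if f > 0}
```

The root gets IC 0 because it subsumes every occurrence, while rarer, more specific concepts (here "bicycle") get the highest IC. The intrinsic IC on slide 13 aims to remove this dependence on corpus counts.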
8. Ontology-based - Why the whole set of relations?
[Figure: example of a part-of relation (eyes, ears)]
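9. Ontology-based - Model
- Tversky's feature-based similarity model: the more features two concepts share, the more similar they are; the more distinctive (non-shared) features they have, the less similar they are
- Ratio-based formulation of Tversky's model, where A and B are the feature sets of the two concepts: $S(A,B) = \frac{f(A \cap B)}{f(A \cap B) + \alpha f(A - B) + \beta f(B - A)}$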
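The ratio model can be sketched directly on feature sets. Here f is taken to be set cardinality, and the car and bicycle feature sets are invented for illustration; α and β weight the distinctive features of each concept.

```python
# Hypothetical feature sets, invented for illustration.
CAR = {"wheels", "seat", "engine", "steering_wheel"}
BICYCLE = {"wheels", "seat", "pedals"}


def tversky_ratio(a: set, b: set, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky ratio model f(A∩B) / (f(A∩B) + alpha*f(A−B) + beta*f(B−A)), with f = set size."""
    common = len(a & b)   # shared features raise similarity
    only_a = len(a - b)   # features distinctive to a lower it
    only_b = len(b - a)   # features distinctive to b lower it
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0
```

With α = β = 1 the ratio model reduces to the Jaccard index (2 shared features out of 5 distinct ones gives 0.4); with α = β = 0.5 it equals the Dice coefficient on the feature sets, here 4/7.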
10. Ontology-based - Mapping (1)
[Slide content not recoverable]
(1) MSCA: Most Specific Common Abstraction
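Since the slide content is missing, the sketch below rests on an assumption: shared features are measured by IC(MSCA), and each concept's distinctive features by IC(c) - IC(MSCA). Plugging these into the ratio model with α = β = 1 gives IC(MSCA) / (IC(c1) + IC(c2) - IC(MSCA)), which to our understanding corresponds to Pirrò and Euzenat's FaITH measure; treat the exact weighting as an assumption rather than the paper's definition.

```python
def faith(ic_c1: float, ic_c2: float, ic_msca: float) -> float:
    """Ratio model with features replaced by information content (alpha = beta = 1 assumed).

    common features      -> IC(msca)
    features only in c1  -> IC(c1) - IC(msca)
    features only in c2  -> IC(c2) - IC(msca)
    """
    common = ic_msca
    only_1 = ic_c1 - ic_msca
    only_2 = ic_c2 - ic_msca
    denom = common + only_1 + only_2  # equals IC(c1) + IC(c2) - IC(msca)
    # denom is 0 only when all three ICs are 0, i.e. both concepts are the root.
    return common / denom if denom > 0 else 1.0
```

Identical concepts (c1 = c2 = MSCA) score 1, and the score falls toward 0 as the MSCA becomes less informative relative to the two concepts.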
11. Ontology-based - Example: T1 = Car, T2 = Bicycle
[Figure: example of concept features]
12. Ontology-based - Example: T1 = Car, T2 = Bicycle
[Figure: example of concept features]
13. Ontology-based - Framework
- Intrinsic information content (iIC): [formula not reproduced], where sub(c) is the number of sub-concepts of concept c
- Extended information content (eIC): [formula not reproduced], where EIC(c) is a relatedness coefficient computed over all kinds of relations
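The slide's iIC formula did not survive extraction. A widely used intrinsic IC that depends only on sub(c) is the one proposed by Seco, Veale and Hayes, iIC(c) = 1 - log(sub(c) + 1) / log(|C|), where |C| is the number of concepts in the ontology; the sketch below assumes that formulation and uses an invented toy taxonomy. The extended IC (eIC) is not sketched, since its exact combination of relations is not recoverable from the slide.

```python
import math

# Hypothetical toy taxonomy: concept -> direct sub-concepts.
CHILDREN = {
    "entity": ["vehicle", "animal"],
    "vehicle": ["car", "bicycle"],
    "animal": ["dog"],
    "car": [],
    "bicycle": [],
    "dog": [],
}


def sub(c: str) -> int:
    """Number of direct and indirect sub-concepts of c."""
    return sum(1 + sub(child) for child in CHILDREN[c])


def intrinsic_ic(c: str) -> float:
    """Assumed iIC(c) = 1 - log(sub(c) + 1) / log(|C|): leaves get 1, the root gets 0."""
    n_concepts = len(CHILDREN)
    return 1 - math.log(sub(c) + 1) / math.log(n_concepts)
```

No corpus is needed: the ontology's own structure decides that a leaf such as "car" is maximally informative and the root "entity" carries no information.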
14. Ontology-based - Evaluation of Similarity
- Dataset: 65 word pairs rated by humans
- Correlation values: [table not reproduced]
15. Ontology-based - Evaluation of Relatedness
- Dataset: WordSim-353
- Correlation values: [table not reproduced]
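Both evaluations compare a measure's scores with human ratings through a correlation coefficient. The slides do not say which coefficient was used, so the sketch below shows Pearson's r as one common choice; the score lists are placeholders, not data from the papers.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between measure scores xs and human ratings ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)  # undefined (division by zero) if either list is constant


# Placeholder values: measure scores vs. human ratings for four word pairs.
measure_scores = [0.9, 0.7, 0.3, 0.1]
human_ratings = [3.9, 3.1, 1.2, 0.4]
r = pearson(measure_scores, human_ratings)
```

Spearman's rank correlation, which is Pearson's r computed on the ranks, is the other usual choice when only the ordering of pairs matters.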
16. Ontology-based - Summary
- Intrinsic, ontology-based similarity measure
- Outperforms corpus-based measures
- Limitations:
  - Does not handle short text
  - Model-based: only concepts in the ontology are considered (e.g. a phrase such as "car accident" is not covered)
17. Second paper: "SyMSS: A syntax-based measure for short-text semantic similarity" (J. Oliva, J. I. Serrano, M. D. del Castillo, and Ángel Iglesias, Data & Knowledge Engineering, 2011)
18. SyMSS - Overview
- SyMSS: a syntax-based measure for short-text semantic similarity
- Syntactic information: not only word order, but deep parsing and parts of speech
- Semantic information: WordNet-based word similarity, using different ontology-based measures
19. SyMSS - Semantic Information
- Path-based measures: shortest path, Hirst and St-Onge (HSO)
- Information-content measures: Resnik, Jiang and Conrath, Lin
- Gloss-based measures: gloss overlap, gloss vector
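The information-content measures listed above can all be written in terms of IC values, with the least common subsumer (lcs) playing the role that the MSCA plays in the first paper. The sketch below uses the textbook formulas; turning the Jiang-Conrath distance into a similarity as 1/distance is a common convention, not something stated on the slide. The gloss overlap is a deliberately simplified word-overlap count, not the exact gloss-overlap or gloss-vector measure, and HSO and shortest-path measures are omitted because they need the WordNet graph itself.

```python
def resnik(ic_lcs: float) -> float:
    """Resnik: similarity is the IC of the least common subsumer."""
    return ic_lcs


def lin(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Lin: 2 * IC(lcs) / (IC(c1) + IC(c2))."""
    total = ic_c1 + ic_c2
    return 2 * ic_lcs / total if total > 0 else 1.0


def jiang_conrath(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Jiang-Conrath distance IC(c1) + IC(c2) - 2 * IC(lcs), returned as the similarity 1 / distance."""
    distance = ic_c1 + ic_c2 - 2 * ic_lcs
    # Zero distance means identical concepts; real implementations cap this value.
    return 1 / distance if distance > 0 else float("inf")


def gloss_overlap(gloss1: str, gloss2: str) -> int:
    """Simplified overlap: number of distinct lowercase words shared by two glosses."""
    return len(set(gloss1.lower().split()) & set(gloss2.lower().split()))
```

Real gloss-overlap measures usually drop stop words and reward multi-word overlaps; this version counts "a" and "with" like any other word.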
20. SyMSS - Syntactic Information
- Parse tree: phrases and the head of each phrase
- Head similarity: compares the heads of phrases that have the same syntactic function
- Penalization factor: applied for non-shared phrases
21. SyMSS - Model (example)
Sentence 1: "My brother has a dog with four legs"
Sentence 2: "My brother has four legs"
- Sim(has, has) = 1
- Sim(brother, brother) = 1
- Sim(dog, leg) = 0.1414
- PF (penalization factor) = 0.03
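The example on this slide can be reproduced with a simple combination rule under stated assumptions: heads are paired by syntactic function, the head similarities are averaged, and the penalization factor is subtracted once per non-shared phrase. This is a sketch of the idea, not the published SyMSS formula; the function labels ("subject", "verb", "object", "object_modifier") and the lookup table standing in for a WordNet-based word similarity are hypothetical.

```python
# Hypothetical stand-in for a WordNet-based word similarity: the slide's values.
SLIDE_SIMS = {("has", "has"): 1.0, ("brother", "brother"): 1.0, ("dog", "leg"): 0.1414}


def word_sim(w1: str, w2: str) -> float:
    """Look up the slide's value; otherwise 1 for identical words and 0 for anything else."""
    return SLIDE_SIMS.get((w1, w2), 1.0 if w1 == w2 else 0.0)


def symss_like(heads1: dict[str, str], heads2: dict[str, str], pf: float) -> float:
    """Mean similarity of heads sharing a syntactic function, minus pf per non-shared phrase (assumed rule)."""
    shared = sorted(heads1.keys() & heads2.keys())         # functions present in both sentences
    non_shared = len(heads1.keys() ^ heads2.keys())        # functions present in only one sentence
    if not shared:
        return 0.0
    mean = sum(word_sim(heads1[f], heads2[f]) for f in shared) / len(shared)
    return mean - non_shared * pf


# Phrase heads of the two example sentences, keyed by hypothetical syntactic-function labels.
s1 = {"subject": "brother", "verb": "has", "object": "dog", "object_modifier": "leg"}
s2 = {"subject": "brother", "verb": "has", "object": "leg"}
score = symss_like(s1, s2, pf=0.03)
```

Under these assumptions the example scores (1 + 1 + 0.1414) / 3 - 0.03 ≈ 0.684. With only one non-shared phrase ("with four legs"), it does not matter whether the slide's PF = 0.03 is meant per phrase or as a total.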
22. SyMSS - Evaluation
- Dataset: 30 sentence pairs out of the 65 human-rated pairs
- Correlation values: [table not reproduced]
23. SyMSS - Effect of Adverbs and Adjectives
Sentence 1: "I have a big dog"
Sentence 2: "I have a little dog"
- 8.68% gain for SyMSS with HSO
24. SyMSS - Summary
- Syntax-based similarity considers nouns and verbs, as well as the influence of adjectives and adverbs
- Limitations:
  - Depends on the parsed structure (e.g. input that is not grammatically correct)
  - Depends on the underlying word similarity
25. Conclusion
- There is no established method for short-text similarity
- Parsing phrases is difficult
- Concept similarity depends on the model, and a weak model gives misleading scores: for xebr: Extraordinary Income and xebr: Other Operating Income, path length gives 0.2 while an expert gives 0.8
- A syntactic similarity is needed for concept labels (words or phrases)

