Multilingual Fine-grained Entity Typing
Marieke van Erp
Piek Vossen
Take-home message
• Fine-grained entity typing is valuable for further
downstream NLP tasks
• Wikipedia text + DBpedia taxonomy + embeddings
enable distantly supervised fine-grained entity typing
• Experiments for Dutch and Spanish
• Code and experiments available at:
https://github.com/cltl/multilingual-finegrained-entity-typing
Why fine-grained entity typing?
• Traditional NERC approaches discern a limited number of
types:
• CoNLL: Person, Organisation, Location, Miscellaneous
• ACE: Person, Organisation, Location, Facility, Weapon,
Vehicle and Geo-Political Entity
• Downstream NLP tasks may benefit from more specific
entity types, e.g.:
• relation extraction, coreference resolution, entity linking
Why fine-grained entity typing?
[Figure: two different people named Paul Noonan, one a singer/songwriter, the other a failure analysis engineer.]
Approach
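[Figure: pipeline overview. Labelled text from Wikipedia ("El rodaje tuvo lugar en varios lugares, entre ellos Londres y Cardiff.", in English: "Filming took place in several locations, including London and Cardiff.") is combined with entity types from DBpedia to produce feature vectors ("place, londres, Xxxxxxx"; "place, cardiff, Xxxxxxx"; ...). The feature vectors train a model, which assigns the predicted label "place" to the test instance "?, swansea, Xxxxxxx".]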
Approach
• Wikipedia links provide entity mentions
• Surrounding text provides context to
these entity mentions
• DBpedia provides type information to
entity mentions
• FIGER (Ling & Weld, 2012) and GFT (Gillick
et al., 2014) map types to Freebase via
Wikipedia categories, which is error-prone
[Figure: labelled text from Wikipedia ("El rodaje tuvo lugar en varios lugares, entre ellos Londres y Cardiff.") annotated with entity types from DBpedia.]
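A minimal sketch of how linked mentions could be turned into labelled examples. The wikitext link syntax is real; the `DBPEDIA_TYPES` dictionary and the `labelled_mentions` helper are hypothetical stand-ins for the DBpedia type file and the extraction code, which the slides do not show.

```python
import re

# Hypothetical stand-in for a DBpedia type file: page title -> type.
DBPEDIA_TYPES = {
    "Londres": "place",
    "Cardiff": "place",
}

# Matches [[Target]] and [[Target|anchor text]] links in wikitext.
WIKILINK = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def labelled_mentions(wikitext, types):
    """Return (type, mention, plain-text context) for every typed link."""
    # Strip the link markup to get the plain sentence used as context.
    context = WIKILINK.sub(lambda m: m.group(2) or m.group(1), wikitext)
    examples = []
    for match in WIKILINK.finditer(wikitext):
        target, anchor = match.group(1), match.group(2)
        entity_type = types.get(target)
        if entity_type is None:
            continue  # no type known for this page, so skip the mention
        examples.append((entity_type, anchor or target, context))
    return examples


sentence = ("El rodaje tuvo lugar en varios lugares, "
            "entre ellos [[Londres]] y [[Cardiff]].")
examples = labelled_mentions(sentence, DBPEDIA_TYPES)
```

Mentions whose target page has no entry in the type table are dropped, so no manual labelling is involved.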
Approach
• Context + entity mention + type information are used to
generate feature vectors
• Features based on previous work for English
[Figure: feature vectors, e.g. "place, londres, Xxxxxxx" and "place, cardiff, Xxxxxxx".]
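The three fields visible in the example vectors (type, lower-cased mention, word shape) could be produced roughly as follows. This is only a sketch: the full feature set used in the experiments is not listed on the slide, and `word_shape` and `feature_line` are illustrative names.

```python
def word_shape(token):
    """Map upper-case letters to X, lower-case letters to x, digits to d."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)  # keep punctuation and spaces unchanged
    return "".join(shape)


def feature_line(entity_type, mention):
    """Join type, lower-cased mention and word shape as on the slide."""
    return ", ".join([entity_type, mention.lower(), word_shape(mention)])


train_line = feature_line("place", "Londres")
test_line = feature_line("?", "Swansea")  # type unknown at test time
```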
Approach
• A model is trained using the Facebook
fastText algorithm
• Inspired by the word2vec CBOW model
• Incorporates character n-grams:
useful for morphologically rich
languages (such as Dutch and
Spanish)
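To illustrate the character n-gram idea, the sketch below wraps a word in boundary markers "<" and ">" and lists its substrings, which is how fastText represents subwords. It does not call the fastText library; the bounds of 3 to 6 characters are chosen here for illustration only.

```python
def char_ngrams(word, minn=3, maxn=6):
    """List character n-grams of a word wrapped in < and > markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


trigrams = char_ngrams("londres", 3, 3)

# Related word forms share many n-grams, so they share parameters.
shared = set(char_ngrams("municipio")) & set(char_ngrams("municipal"))
```

Because inflected or derived forms overlap in their n-grams, a form unseen in training can still receive a useful representation.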
Approach
• The model is tested using a held-out
dataset
• 1/3 of all generated data
[Figure: the model assigns the predicted label "place" to the test instance "?, swansea, Xxxxxxx".]
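A held-out split of one third could look like the sketch below. The slide gives only the proportion; the random shuffle, the fixed seed and the function name `split_held_out` are assumptions.

```python
import random


def split_held_out(instances, test_fraction=1 / 3, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


instances = [f"instance-{i}" for i in range(9)]
train, test = split_held_out(instances)
```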
Results
Comparative results on English GFT dataset
← Gillick et al. 2014 & ↑Yogatama et al. 2015
Fine-grained entity types
• Ling & Weld, 2012: 113 types listed, 45 present in test data, including
/livingthing/animal, /living_thing and /transportation/road
• Gillick et al., 2014: 89 types listed, 39 present in test data
Results
Comparative results on English GFT dataset
← Gillick et al. 2014 & ↑Yogatama et al. 2015
• Sample errors
Mention               Gold Standard       Prediction
Kaeso Fabius          Person              OfficeHolder
Lebeuville            Municipality        Settlement
Lodewijk Bruckman     Artist              Writer
Hyde Park Corner      MetroStation        Park
K.I.M.                University          Organisation
Francine Van Assche   Athlete             Actor
Haackgreerius miopus  Reptile             Insect
Wahab Akbar           Politician          Family
Baureihe 211          Locomotive          Train
Congrier              Municipality        Settlement
SPA-Viberti AS.42     MilitaryVehicle     Automobile
Moissey Kogan         Artist              FictionalCharacter
aluminiumplaat        ChemicalElement     ChemicalCompound
Jacob Black           FictionalCharacter  MusicalArtist
Sotla                 River               Mountain
Nowa Deba             PopulatedPlace      TelevisionShow
Dean Woods            Cyclist             Actor
Abdullah the Butcher  Wrestler            FictionalCharacter
Ratislav Mores        SoccerPlayer        MusicalArtist
Christophe Laborie    Cyclist             PoliticalParty
DBpedia Type Coverage
• Not all 685 DBpedia classes are present in the type file:
• 269 in Dutch DBpedia
• 143 in Spanish DBpedia
• Type file only contains most specific class:
• http://nl.dbpedia.org/resource/Cheddar_(kaas) has type
“dbo:Cheese” in type file, “dbo:Food” needs to be inferred
(work in progress)
• Cultural differences:
• College sports are almost entirely absent in the Netherlands,
thus unlikely to find mentions of type
“dbo:NationalCollegiateAthleticAssociationAthlete”
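Inferring the more general classes from the most specific one, as in the Cheddar example above, amounts to walking up a subclass hierarchy. The sketch below uses a tiny hand-written fragment as an assumption; a real run would read the subclass statements from the DBpedia ontology, and `with_ancestors` is an illustrative name.

```python
# Hand-written fragment of a class hierarchy (child -> parent), assumed
# here for illustration rather than loaded from the ontology.
PARENT = {
    "dbo:Cheese": "dbo:Food",
    "dbo:Food": "owl:Thing",
}


def with_ancestors(most_specific, parent=PARENT):
    """Return the class followed by all of its ancestors, nearest first."""
    chain = [most_specific]
    seen = {most_specific}
    while chain[-1] in parent:
        nxt = parent[chain[-1]]
        if nxt in seen:
            break  # guard against cycles in a malformed hierarchy
        chain.append(nxt)
        seen.add(nxt)
    return chain


cheddar_types = with_ancestors("dbo:Cheese")
```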
Types and Roles
• The DBpedia ontology adheres to a single type per entity
• dbpedia:Arnold_Schwarzenegger is
dbo:OfficeHolder
• yago:Actor, yago:BodyBuilder, yago:Emigrant
• Trade-off:
• multiple types/roles can facilitate contextual typing
• may also introduce noise in the training data
Conclusions and future work
• Despite incomplete type coverage, Wikipedia + DBpedia
form a good basis for fine-grained entity typing
• Links between English and Dutch and Spanish DBpedia
versions may be leveraged to increase coverage
• The DBpedia hierarchy is useful in a generic setting
• But it still has coverage gaps such as ‘cuisine’ and
‘education’
• Explore other hierarchies
https://github.com/cltl/multilingual-finegrained-entity-typing
References
• Gillick, D., Lazic, N., Ganchev, K., Kirchner, J., Huynh, D.:
Context-dependent fine-grained entity type tagging. arXiv (2014)
• Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI
(2012)
• Yogatama, D., Gillick, D., Lazic, N.: Embedding methods for fine
grained entity type classification. In: Proceedings of the 53rd
Annual Meeting of the Association for Computational Linguistics
and the 7th International Joint Conference on Natural Language
Processing (ACL-IJCNLP 2015), Short papers, Beijing, China,
26–31 July 2015, pp. 291–296. Association for Computational
Linguistics (2015)
Image sources
• Tree of Life: https://upload.wikimedia.org/wikipedia/commons/b/bc/Haeckel_arbol_bn.png
• Paul Noonan (singer/songwriter): http://images.entertainment.ie/images_content/rectangle/620x372/paulnoonan.jpg
• Paul Noonan (failure analysis engineer): http://www.iopireland.org/careers/life/page_49377.html
• SPA-Viberti AS.42: https://upload.wikimedia.org/wikipedia/commons/thumb/2/23/AS42-1.gif/250px-AS42-1.gif
• Abdullah the Butcher: http://i.ebayimg.com/thumbs/images/g/H1kAAOSwbopZPupi/s-l200.jpg

Mais conteúdo relacionado

Semelhante a Multilingual Fine-grained Entity Typing

This talk lasts 三十分钟
This talk lasts 三十分钟This talk lasts 三十分钟
This talk lasts 三十分钟thepilif
 
Multilingualism ifla 2014 08
Multilingualism ifla 2014 08Multilingualism ifla 2014 08
Multilingualism ifla 2014 08Janifer Gatenby
 
Controlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for StandardizationControlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for StandardizationTobias Kuhn
 
Laura Dent: Single-Source and Localization
Laura Dent: Single-Source and LocalizationLaura Dent: Single-Source and Localization
Laura Dent: Single-Source and LocalizationJack Molisani
 
Best Practices for Software Localization
Best Practices for Software LocalizationBest Practices for Software Localization
Best Practices for Software LocalizationLionbridge
 
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Project
 
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...David Beazley (Dabeaz LLC)
 
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana
 
apidays LIVE India 2022_Creating API documentation for international communit...
apidays LIVE India 2022_Creating API documentation for international communit...apidays LIVE India 2022_Creating API documentation for international communit...
apidays LIVE India 2022_Creating API documentation for international communit...apidays
 
Search-Driven Programming
Search-Driven ProgrammingSearch-Driven Programming
Search-Driven ProgrammingEthan Herdrick
 
Preparing an Open Source Documentation Repository for Translations
Preparing an Open Source Documentation Repository for TranslationsPreparing an Open Source Documentation Repository for Translations
Preparing an Open Source Documentation Repository for TranslationsHPCC Systems
 
Single-Sourcing and Localization
Single-Sourcing and LocalizationSingle-Sourcing and Localization
Single-Sourcing and LocalizationLaura Dent
 
Linked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesLinked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesChristophe Guéret
 
Linked Open Data Cloud
Linked Open Data CloudLinked Open Data Cloud
Linked Open Data CloudPretaLLOD
 
Babak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesBabak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesZoltan Varju
 
Introductiontogooglehacking part1
Introductiontogooglehacking part1Introductiontogooglehacking part1
Introductiontogooglehacking part1hacklessons
 
What if-your-application-could-speak, by Marcos Silveira
What if-your-application-could-speak, by Marcos SilveiraWhat if-your-application-could-speak, by Marcos Silveira
What if-your-application-could-speak, by Marcos SilveiraThoughtworks
 
What if-your-application-could-speak
What if-your-application-could-speakWhat if-your-application-could-speak
What if-your-application-could-speakMarcos Vinícius
 

Semelhante a Multilingual Fine-grained Entity Typing (20)

This talk lasts 三十分钟
This talk lasts 三十分钟This talk lasts 三十分钟
This talk lasts 三十分钟
 
Multilingualism ifla 2014 08
Multilingualism ifla 2014 08Multilingualism ifla 2014 08
Multilingualism ifla 2014 08
 
Controlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for StandardizationControlled Natural Language and Opportunities for Standardization
Controlled Natural Language and Opportunities for Standardization
 
Laura Dent: Single-Source and Localization
Laura Dent: Single-Source and LocalizationLaura Dent: Single-Source and Localization
Laura Dent: Single-Source and Localization
 
Best Practices for Software Localization
Best Practices for Software LocalizationBest Practices for Software Localization
Best Practices for Software Localization
 
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The ServicesLynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
Lynx Webinar #4: Lynx Services Platform (LySP) - Part 2 - The Services
 
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
Why Extension Programmers Should Stop Worrying About Parsing and Start Thinki...
 
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
Europeana meeting under Finland’s Presidency of the Council of the EU - Day 2...
 
apidays LIVE India 2022_Creating API documentation for international communit...
apidays LIVE India 2022_Creating API documentation for international communit...apidays LIVE India 2022_Creating API documentation for international communit...
apidays LIVE India 2022_Creating API documentation for international communit...
 
Galichet XML Workflow Brief History NISO
Galichet XML Workflow Brief History NISOGalichet XML Workflow Brief History NISO
Galichet XML Workflow Brief History NISO
 
First Stages and challenges of LibreOffice Translation in Hausa Language
First Stages and challenges  of LibreOffice Translation  in Hausa LanguageFirst Stages and challenges  of LibreOffice Translation  in Hausa Language
First Stages and challenges of LibreOffice Translation in Hausa Language
 
Search-Driven Programming
Search-Driven ProgrammingSearch-Driven Programming
Search-Driven Programming
 
Preparing an Open Source Documentation Repository for Translations
Preparing an Open Source Documentation Repository for TranslationsPreparing an Open Source Documentation Repository for Translations
Preparing an Open Source Documentation Repository for Translations
 
Single-Sourcing and Localization
Single-Sourcing and LocalizationSingle-Sourcing and Localization
Single-Sourcing and Localization
 
Linked Open Data for Digital Humanities
Linked Open Data for Digital HumanitiesLinked Open Data for Digital Humanities
Linked Open Data for Digital Humanities
 
Linked Open Data Cloud
Linked Open Data CloudLinked Open Data Cloud
Linked Open Data Cloud
 
Babak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entitiesBabak Rasolzadeh: The importance of entities
Babak Rasolzadeh: The importance of entities
 
Introductiontogooglehacking part1
Introductiontogooglehacking part1Introductiontogooglehacking part1
Introductiontogooglehacking part1
 
What if-your-application-could-speak, by Marcos Silveira
What if-your-application-could-speak, by Marcos SilveiraWhat if-your-application-could-speak, by Marcos Silveira
What if-your-application-could-speak, by Marcos Silveira
 
What if-your-application-could-speak
What if-your-application-could-speakWhat if-your-application-could-speak
What if-your-application-could-speak
 

Mais de Marieke van Erp

Towards Culturally Aware AI Systems - TSDH Symposium
Towards Culturally Aware AI Systems - TSDH SymposiumTowards Culturally Aware AI Systems - TSDH Symposium
Towards Culturally Aware AI Systems - TSDH SymposiumMarieke van Erp
 
A Polyvocal and Contextualised Semantic Web
A Polyvocal and Contextualised Semantic WebA Polyvocal and Contextualised Semantic Web
A Polyvocal and Contextualised Semantic WebMarieke van Erp
 
AI x Digital Humanities = > Inclusiviteit
AI x Digital Humanities = > Inclusiviteit AI x Digital Humanities = > Inclusiviteit
AI x Digital Humanities = > Inclusiviteit Marieke van Erp
 
Computationally Tracing Concepts Through Time and Space
Computationally Tracing Concepts Through Time and SpaceComputationally Tracing Concepts Through Time and Space
Computationally Tracing Concepts Through Time and SpaceMarieke van Erp
 
The Hitchhiker's Guide to the Future of Digital Humanities
The Hitchhiker's Guide to the Future of Digital HumanitiesThe Hitchhiker's Guide to the Future of Digital Humanities
The Hitchhiker's Guide to the Future of Digital HumanitiesMarieke van Erp
 
Why language technology can’t handle Game of Thrones (yet)
Why language technology can’t handle Game of Thrones (yet)Why language technology can’t handle Game of Thrones (yet)
Why language technology can’t handle Game of Thrones (yet)Marieke van Erp
 
(Beyond) Combining Text and Tables for qualitative and quantitative research
(Beyond) Combining Text and Tables for qualitative and quantitative research (Beyond) Combining Text and Tables for qualitative and quantitative research
(Beyond) Combining Text and Tables for qualitative and quantitative research Marieke van Erp
 
Finding common ground between text, maps, and tables for quantitative and qua...
Finding common ground between text, maps, and tables for quantitative and qua...Finding common ground between text, maps, and tables for quantitative and qua...
Finding common ground between text, maps, and tables for quantitative and qua...Marieke van Erp
 
Slicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology ResearchSlicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology ResearchMarieke van Erp
 
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...Marieke van Erp
 
Good Lynx, bad Lynx: Document enrichment for historical ecologists
Good Lynx, bad Lynx: Document enrichment for historical ecologistsGood Lynx, bad Lynx: Document enrichment for historical ecologists
Good Lynx, bad Lynx: Document enrichment for historical ecologistsMarieke van Erp
 
Towards Semantic Enrichment of Newspapers: a historical ecology use case
Towards Semantic Enrichment of Newspapers: a historical ecology use case Towards Semantic Enrichment of Newspapers: a historical ecology use case
Towards Semantic Enrichment of Newspapers: a historical ecology use case Marieke van Erp
 
Natural Language Processing en Named Entity Recognition
Natural Language Processing en Named Entity Recognition Natural Language Processing en Named Entity Recognition
Natural Language Processing en Named Entity Recognition Marieke van Erp
 
HuC lecture - Digital and Humanities: Continuing the Conversation
HuC lecture - Digital and Humanities: Continuing the ConversationHuC lecture - Digital and Humanities: Continuing the Conversation
HuC lecture - Digital and Humanities: Continuing the ConversationMarieke van Erp
 
Entity Typing Using Distributional Semantics and DBpedia
Entity Typing Using Distributional Semantics and DBpedia Entity Typing Using Distributional Semantics and DBpedia
Entity Typing Using Distributional Semantics and DBpedia Marieke van Erp
 
Entity Typing and Event Extraction
Entity Typing and Event Extraction Entity Typing and Event Extraction
Entity Typing and Event Extraction Marieke van Erp
 
The domain as unifier, how focusing on social history can bring technical fie...
The domain as unifier, how focusing on social history can bring technical fie...The domain as unifier, how focusing on social history can bring technical fie...
The domain as unifier, how focusing on social history can bring technical fie...Marieke van Erp
 
Evaluating entity linking an analysis of current benchmark datasets and a ro...
Evaluating entity linking  an analysis of current benchmark datasets and a ro...Evaluating entity linking  an analysis of current benchmark datasets and a ro...
Evaluating entity linking an analysis of current benchmark datasets and a ro...Marieke van Erp
 
Finding Stories in 1,784,532 Events: Scaling up computational models of narr...
Finding Stories in 1,784,532 Events:  Scaling up computational models of narr...Finding Stories in 1,784,532 Events:  Scaling up computational models of narr...
Finding Stories in 1,784,532 Events: Scaling up computational models of narr...Marieke van Erp
 
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsEvaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsMarieke van Erp
 

Mais de Marieke van Erp (20)

Towards Culturally Aware AI Systems - TSDH Symposium
Towards Culturally Aware AI Systems - TSDH SymposiumTowards Culturally Aware AI Systems - TSDH Symposium
Towards Culturally Aware AI Systems - TSDH Symposium
 
A Polyvocal and Contextualised Semantic Web
A Polyvocal and Contextualised Semantic WebA Polyvocal and Contextualised Semantic Web
A Polyvocal and Contextualised Semantic Web
 
AI x Digital Humanities = > Inclusiviteit
AI x Digital Humanities = > Inclusiviteit AI x Digital Humanities = > Inclusiviteit
AI x Digital Humanities = > Inclusiviteit
 
Computationally Tracing Concepts Through Time and Space
Computationally Tracing Concepts Through Time and SpaceComputationally Tracing Concepts Through Time and Space
Computationally Tracing Concepts Through Time and Space
 
The Hitchhiker's Guide to the Future of Digital Humanities
The Hitchhiker's Guide to the Future of Digital HumanitiesThe Hitchhiker's Guide to the Future of Digital Humanities
The Hitchhiker's Guide to the Future of Digital Humanities
 
Why language technology can’t handle Game of Thrones (yet)
Why language technology can’t handle Game of Thrones (yet)Why language technology can’t handle Game of Thrones (yet)
Why language technology can’t handle Game of Thrones (yet)
 
(Beyond) Combining Text and Tables for qualitative and quantitative research
(Beyond) Combining Text and Tables for qualitative and quantitative research (Beyond) Combining Text and Tables for qualitative and quantitative research
(Beyond) Combining Text and Tables for qualitative and quantitative research
 
Finding common ground between text, maps, and tables for quantitative and qua...
Finding common ground between text, maps, and tables for quantitative and qua...Finding common ground between text, maps, and tables for quantitative and qua...
Finding common ground between text, maps, and tables for quantitative and qua...
 
Slicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology ResearchSlicing and Dicing a Newspaper Corpus for Historical Ecology Research
Slicing and Dicing a Newspaper Corpus for Historical Ecology Research
 
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge...
 
Good Lynx, bad Lynx: Document enrichment for historical ecologists
Good Lynx, bad Lynx: Document enrichment for historical ecologistsGood Lynx, bad Lynx: Document enrichment for historical ecologists
Good Lynx, bad Lynx: Document enrichment for historical ecologists
 
Towards Semantic Enrichment of Newspapers: a historical ecology use case
Towards Semantic Enrichment of Newspapers: a historical ecology use case Towards Semantic Enrichment of Newspapers: a historical ecology use case
Towards Semantic Enrichment of Newspapers: a historical ecology use case
 
Natural Language Processing en Named Entity Recognition
Natural Language Processing en Named Entity Recognition Natural Language Processing en Named Entity Recognition
Natural Language Processing en Named Entity Recognition
 
HuC lecture - Digital and Humanities: Continuing the Conversation
HuC lecture - Digital and Humanities: Continuing the ConversationHuC lecture - Digital and Humanities: Continuing the Conversation
HuC lecture - Digital and Humanities: Continuing the Conversation
 
Entity Typing Using Distributional Semantics and DBpedia
Entity Typing Using Distributional Semantics and DBpedia Entity Typing Using Distributional Semantics and DBpedia
Entity Typing Using Distributional Semantics and DBpedia
 
Entity Typing and Event Extraction
Entity Typing and Event Extraction Entity Typing and Event Extraction
Entity Typing and Event Extraction
 
The domain as unifier, how focusing on social history can bring technical fie...
The domain as unifier, how focusing on social history can bring technical fie...The domain as unifier, how focusing on social history can bring technical fie...
The domain as unifier, how focusing on social history can bring technical fie...
 
Evaluating entity linking an analysis of current benchmark datasets and a ro...
Evaluating entity linking  an analysis of current benchmark datasets and a ro...Evaluating entity linking  an analysis of current benchmark datasets and a ro...
Evaluating entity linking an analysis of current benchmark datasets and a ro...
 
Finding Stories in 1,784,532 Events: Scaling up computational models of narr...
Finding Stories in 1,784,532 Events:  Scaling up computational models of narr...Finding Stories in 1,784,532 Events:  Scaling up computational models of narr...
Finding Stories in 1,784,532 Events: Scaling up computational models of narr...
 
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and TweetsEvaluating Named Entity Recognition and Disambiguation in News and Tweets
Evaluating Named Entity Recognition and Disambiguation in News and Tweets
 

Último

Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxCherry
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptxArvind Kumar
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsTransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRings
TransientOffsetin14CAftertheCarringtonEventRecordedbyPolarTreeRingsSérgio Sacani
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 
Site specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdfSite specific recombination and transposition.........pdf
Site specific recombination and transposition.........pdfCherry
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCherry
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLkantirani197
 
Cot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNACot curve, melting temperature, unique and repetitive DNA
Cot curve, melting temperature, unique and repetitive DNACherry
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
Genome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxGenome sequencing,shotgun sequencing.pptx
Genome sequencing,shotgun sequencing.pptxCherry
 
Kanchipuram Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Kanchipuram Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsKanchipuram Escorts 🥰 8617370543 Call Girls Offer VIP Hot Girls
Kanchipuram Escorts 🥰 8617370543 Call Girls Offer VIP Hot GirlsDeepika Singh
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCherry
 

Último (20)

Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptx
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
development of diagnostic enzyme assay to detect leuser virus

Multilingual Fine-grained Entity Typing

  • 2. Take-home message
      • Fine-grained entity typing is valuable for downstream NLP tasks
      • Wikipedia text + DBpedia taxonomy + embeddings enable distantly supervised fine-grained entity typing
      • Experiments for Dutch and Spanish
      • Code and experiments available at: https://github.com/cltl/multilingual-finegrained-entity-typing
  • 3. Why fine-grained entity typing?
      • Traditional NERC approaches discern only a limited number of types:
          • CoNLL: Person, Organisation, Location, Miscellaneous
          • ACE: Person, Organisation, Location, Facility, Weapon, Vehicle and Geo-Political Entity
      • Downstream NLP tasks may benefit from more specific entity types, e.g.:
          • relation extraction, coreference resolution, entity linking
  • 4. Why fine-grained entity typing?
      • Paul Noonan (Singer/songwriter)
      • Paul Noonan (Failure Analysis Engineer)
  • 5. Approach (pipeline diagram)
      • Labelled text from Wikipedia: "El rodaje tuvo lugar en varios lugares, entre ellos Londres y Cardiff." (English: "Filming took place in several locations, including London and Cardiff.")
      • Entity types from DBpedia
      • Feature vectors: place, londres, Xxxxxxx; place, cardiff, Xxxxxxx; ...
      • Model
      • Test instance: ?, swansea, Xxxxxxx
      • Predicted label: place
  • 6. Approach
      • Wikipedia links provide entity mentions
      • Surrounding text provides context to these entity mentions
      • DBpedia provides type information to entity mentions
      • FIGER (Ling & Weld, 2012) & GFT (Gillick et al. 2014) map types to Freebase via Wikipedia categories: error prone
      • Diagram: labelled text from Wikipedia ("El rodaje tuvo lugar en varios lugares, entre ellos Londres y Cardiff.") combined with entity types from DBpedia
  • 7. Approach
      • Context + entity mention + type information are used to generate feature vectors
      • Features based on previous work for English
      • Diagram: feature vectors, e.g. place, londres, Xxxxxxx; place, cardiff, Xxxxxxx; ...
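The feature vectors on this slide pair a type label with the lowercased mention and a word-shape pattern such as Xxxxxxx. A minimal sketch of how such a triple could be derived follows. The function names `word_shape` and `make_instance` are hypothetical, and mapping digits to `d` is an assumption, since the slide only shows upper- and lowercase shapes.

```python
from typing import Tuple


def word_shape(token: str) -> str:
    """Replace each character with a shape symbol.

    Uppercase letters become 'X' and lowercase letters become 'x', as on
    the slide. Mapping digits to 'd' and leaving other characters
    unchanged is an assumption made for this sketch.
    """
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def make_instance(entity_type: str, mention: str) -> Tuple[str, str, str]:
    """Build a (type, lowercased mention, word shape) triple."""
    return (entity_type, mention.lower(), word_shape(mention))


print(make_instance("place", "Londres"))
```

For a test instance the type is unknown, so the first slot would hold a placeholder such as `?` until the model predicts it.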
  • 8. Approach
      • A model is trained using the Facebook fastText algorithm
      • Inspired by word2vec cbow model
      • Incorporates character n-grams: useful for morphologically rich languages (such as Dutch and Spanish)
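To illustrate what "character n-grams" means here, the sketch below extracts n-grams from a word wrapped in the `<` and `>` boundary markers that fastText uses. It is a simplified illustration, not the fastText implementation: the real library also hashes n-grams into buckets, and the `min_n`/`max_n` defaults below are illustrative parameters, not the settings used in these experiments.

```python
from typing import List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return all character n-grams of the boundary-marked word.

    The word is wrapped in '<' and '>' so that prefixes and suffixes
    get distinct n-grams from word-internal sequences.
    """
    marked = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the marked word.
        for start in range(len(marked) - n + 1):
            grams.append(marked[start:start + n])
    return grams


# Trigrams only, to keep the output short.
print(char_ngrams("londres", min_n=3, max_n=3))
```

Because related word forms share many of these sub-word units, a rare or unseen inflected form can still receive a useful representation.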
  • 9. Approach
      • The model is tested using a held-out dataset
      • 1/3 of all generated data
      • Diagram: the model takes a test instance (?, swansea, Xxxxxxx) and outputs a predicted label (place)
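A held-out split of one third of the generated data could be sketched as below. The slides do not say how the held-out portion was selected, so the seeded random shuffle and the function name `split_held_out` are assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_held_out(
    instances: Sequence[T], test_fraction: float = 1 / 3, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the instances and return (train, test).

    The random shuffle is an assumption of this sketch; a fixed seed
    keeps the split reproducible between runs.
    """
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_held_out(list(range(9)))
print(len(train), len(test))
```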
  • 10. Results: comparative results on the English GFT dataset (arrows on the slide point to the results tables from Gillick et al. 2014, left, and Yogatama et al. 2015, above)
  • 12. Fine-grained entity types
      • Ling & Weld, 2012 (left): 113 types listed, 45 present in test data, including /livingthing/animal, /living_thing and /transportation/road
      • Gillick et al., 2014 (right): 89 types listed, 39 present in test data
  • 16. Sample errors
      Mention               Gold Standard       Prediction
      Kaeso Fabius          Person              OfficeHolder
      Lebeuville            Municipality        Settlement
      Lodewijk Bruckman     Artist              Writer
      Hyde Park Corner      MetroStation        Park
      K.I.M.                University          Organisation
      Francine Van Assche   Athlete             Actor
      Haackgreerius miopus  Reptile             Insect
      Wahab Akbar           Politician          Family
      Baureihe 211          Locomotive          Train
      Congrier              Municipality        Settlement
      SPA-Viberti AS.42     MilitaryVehicle     Automobile
      Moissey Kogan         Artist              FictionalCharacter
      aluminiumplaat        ChemicalElement     ChemicalCompound
      Jacob Black           FictionalCharacter  MusicalArtist
      Sotla                 River               Mountain
      Nowa Deba             PopulatedPlace      TelevisionShow
      Dean Woods            Cyclist             Actor
      Abdullah the Butcher  Wrestler            FictionalCharacter
      Ratislav Mores        SoccerPlayer        MusicalArtist
      Christophe Laborie    Cyclist             PoliticalParty
  • 17. DBpedia Type Coverage
      • Not all 685 DBpedia classes are present in the type file:
          • 269 in Dutch DBpedia
          • 143 in Spanish DBpedia
      • The type file only contains the most specific class:
          • http://nl.dbpedia.org/resource/Cheddar_(kaas) has type “dbo:Cheese” in the type file; “dbo:Food” needs to be inferred (work in progress)
      • Cultural differences:
          • College sports are almost entirely absent in the Netherlands, so mentions of type “dbo:NationalCollegiateAthleticAssociationAthlete” are unlikely to be found
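Inferring the more general classes from the most specific one amounts to walking up a subclass hierarchy. The sketch below shows the idea on a one-entry mapping taken from the slide's example. The `SUBCLASS_OF` dictionary and the function name are hypothetical; in practice the mapping would have to be loaded from the DBpedia ontology, which is not shown here.

```python
from typing import Dict, List

# Illustrative fragment only: the single pair mentioned on the slide.
SUBCLASS_OF: Dict[str, str] = {"dbo:Cheese": "dbo:Food"}


def with_superclasses(most_specific: str, subclass_of: Dict[str, str]) -> List[str]:
    """Return the most specific class followed by its ancestors.

    Follows parent links until a class has no recorded parent. The
    'seen' set guards against cycles in a malformed hierarchy.
    """
    chain = [most_specific]
    seen = {most_specific}
    current = most_specific
    while current in subclass_of:
        current = subclass_of[current]
        if current in seen:
            break
        chain.append(current)
        seen.add(current)
    return chain


print(with_superclasses("dbo:Cheese", SUBCLASS_OF))
```

Each class in the returned chain could then be attached to the mention as an additional label, at whatever granularity the evaluation requires.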
  • 18. Types and Roles
      • DBpedia ontology adheres to a single type per entity
          • dbpedia:Arnold_Schwarzenegger is dbo:OfficeHolder
          • yago:Actor, yago:BodyBuilder, yago:Emigrant
      • Trade-off:
          • multiple types/roles can facilitate contextual typing
          • may also introduce noise in the training data
  • 19. Conclusions and future work
      • Despite incomplete type coverage, Wikipedia + DBpedia form a good basis for fine-grained entity typing
      • Links between the English, Dutch and Spanish DBpedia versions may be leveraged to increase coverage
      • DBpedia hierarchy is useful in a generic setting
          • But still has coverage gaps such as ‘cuisine’ and ‘education’
          • Explore other hierarchies
  • 21. References
      • Gillick, D., Lazic, N., Ganchev, K., Kirchner, J., Huynh, D.: Context-dependent fine-grained entity type tagging. arXiv (2014)
      • Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI (2012)
      • Yogatama, D., Gillick, D., Lazic, N.: Embedding methods for fine grained entity type classification. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), Short papers, Beijing, China, 26–31 July 2015, pp. 291–296. Association for Computational Linguistics (2015)
  • 22. Image sources
      • Tree of Life: https://upload.wikimedia.org/wikipedia/commons/b/bc/Haeckel_arbol_bn.png
      • Paul Noonan (singer/songwriter): http://images.entertainment.ie/images_content/rectangle/620x372/paulnoonan.jpg
      • Paul Noonan (failure analysis engineer): http://www.iopireland.org/careers/life/page_49377.html
      • SPA-Viberti AS.42: https://upload.wikimedia.org/wikipedia/commons/thumb/2/23/AS42-1.gif/250px-AS42-1.gif
      • Abdullah the Butcher: http://i.ebayimg.com/thumbs/images/g/H1kAAOSwbopZPupi/s-l200.jpg