This document presents an approach for multilingual fine-grained entity typing using Wikipedia text and DBpedia taxonomy. Feature vectors are generated from entity mentions, surrounding text, and type information to train a model using fastText. The approach is tested on Dutch and Spanish, achieving results comparable to prior work on English datasets. Challenges include incomplete type coverage in DBpedia and cultural differences in entity types between languages. Code and experiments are available online.
2. Take-home message
• Fine-grained entity typing is valuable for downstream NLP tasks
• Wikipedia text + DBpedia taxonomy + embeddings enable distantly supervised fine-grained entity typing
• Experiments for Dutch and Spanish
• Code and experiments available at: https://github.com/cltl/multilingual-finegrained-entity-typing
3. Why fine-grained entity typing?
• Traditional NERC approaches distinguish only a limited number of types:
• CoNLL: Person, Organisation, Location, Miscellaneous
• ACE: Person, Organisation, Location, Facility, Weapon, Vehicle and Geo-Political Entity
• Downstream NLP tasks may benefit from more specific entity types, e.g.:
• relation extraction, coreference resolution, entity linking
4. Why fine-grained entity typing?
[Images: Paul Noonan (singer/songwriter) and Paul Noonan (failure analysis engineer), two different people with the same name]
5. Approach
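Pipeline overview:
• Labelled text from Wikipedia: “El rodaje tuvo lugar en varios lugares, entre ellos Londres y Cardiff.” (Spanish: “Filming took place at several locations, among them London and Cardiff.”)
• Entity types from DBpedia
• Feature vectors: “place, londres, Xxxxxxx”, “place, cardiff, Xxxxxxx”, ...
• Model
• Test instance: “?, swansea, Xxxxxxx”
• Predicted label: “place”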
6. Approach
• Wikipedia links provide entity mentions
• The surrounding text provides context for these entity mentions
• DBpedia provides type information for the linked entities
• FIGER (Ling & Weld, 2012) and GFT (Gillick et al., 2014) instead map types to Freebase via Wikipedia categories, which is error-prone
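A minimal sketch of these first two steps, assuming the input is raw wikitext in which internal links are written as [[Target]] or [[Target|anchor text]]. The dictionary TYPES is a hypothetical toy stand-in for the DBpedia type file (it uses the coarse label “place” from the example above), and the function names are illustrative, not taken from the project's code.

```python
import re
from typing import Dict, List, Tuple

# Internal wikitext links: [[Target]] or [[Target|anchor text]].
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")


def strip_links(wikitext: str) -> str:
    """Replace every link by its surface form, leaving the plain context sentence."""
    return LINK_RE.sub(lambda m: m.group(2) or m.group(1), wikitext)


def typed_mentions(wikitext: str, type_of: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (mention, linked entity, type) for each link whose target has a known type."""
    found = []
    for m in LINK_RE.finditer(wikitext):
        target = m.group(1).strip()
        mention = (m.group(2) or m.group(1)).strip()
        if target in type_of:  # links to untyped entities yield no training instance
            found.append((mention, target, type_of[target]))
    return found


# Hypothetical toy stand-in for the DBpedia type file (entity -> label).
TYPES = {"Londres": "place", "Cardiff": "place"}

wikitext = "El rodaje tuvo lugar en varios lugares, entre ellos [[Londres]] y [[Cardiff]]."
print(strip_links(wikitext))
print(typed_mentions(wikitext, TYPES))
# [('Londres', 'Londres', 'place'), ('Cardiff', 'Cardiff', 'place')]
```

Links whose target has no entry in the type file are simply skipped, which is one way incomplete DBpedia coverage translates directly into fewer training instances.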
7. Approach
• Context, entity mention and type information are used to generate feature vectors
• The features are based on previous work for English
[Figure: feature vectors “place, londres, Xxxxxxx”, “place, cardiff, Xxxxxxx”, ...]
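The example vectors appear to consist of the type label, the lowercased mention and the mention's word shape (uppercase letters become X, lowercase letters x). The sketch below reproduces that pattern; the digit mapping ('d') and the optional context words are assumptions, since the “...” leaves the remaining features unspecified, and the function names are hypothetical.

```python
from typing import List, Sequence


def word_shape(token: str) -> str:
    """Map uppercase letters to 'X', lowercase to 'x', digits to 'd'; keep other characters."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def feature_vector(label: str, mention: str, context: Sequence[str] = ()) -> List[str]:
    """Type label, lowercased mention and word shape, optionally followed by context words."""
    return [label, mention.lower(), word_shape(mention)] + [w.lower() for w in context]


print(feature_vector("place", "Londres"))  # ['place', 'londres', 'Xxxxxxx']
print(feature_vector("?", "Swansea"))      # ['?', 'swansea', 'Xxxxxxx']
```

A test instance is built the same way, with “?” in place of the unknown label.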
8. Approach
• A model is trained using Facebook's fastText algorithm
• Its architecture is inspired by the word2vec CBOW model
• It can incorporate character n-grams, which are useful for morphologically rich languages such as Dutch and Spanish
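To illustrate why character n-grams help with inflected forms, here is a sketch of the subword scheme fastText uses: each word is wrapped in the boundary symbols '<' and '>', and its character n-grams contribute to the word's representation alongside the word itself. This is an illustration only; fastText's own implementation orders and hashes the n-grams differently. The 3 to 6 range is fastText's default for unsupervised training; in supervised mode character n-grams are off by default (minn = maxn = 0) and must be enabled explicitly.

```python
from typing import List


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """All character n-grams with minn <= n <= maxn of the word wrapped in '<' and '>'."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams


# 'lugar' and 'lugares' (both in the example sentence) share 4 of the 5 trigrams of
# 'lugar', so their subword-based representations overlap.
shared = set(char_ngrams("lugar", 3, 3)) & set(char_ngrams("lugares", 3, 3))
print(sorted(shared))  # ['<lu', 'gar', 'lug', 'uga']
```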
9. Approach
• The model is tested on a held-out dataset consisting of 1/3 of all generated data
[Figure: test instance “?, swansea, Xxxxxxx” → model → predicted label “place”]
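A sketch of preparing the generated instances for fastText and holding out a third of them, assuming the label is the first feature. fastText's supervised input format expects one instance per line with the label prefixed by __label__; the helper names and the fixed shuffle seed are hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple


def to_fasttext_line(features: Sequence[str]) -> str:
    """Turn [label, feature, ...] into fastText's supervised input format."""
    label, *rest = features
    return " ".join([f"__label__{label}", *rest])


def split_held_out(lines: Sequence[str], test_fraction: float = 1 / 3,
                   seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly and hold out test_fraction of the data for testing."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


data = [to_fasttext_line(["place", "londres", "Xxxxxxx"]),
        to_fasttext_line(["place", "cardiff", "Xxxxxxx"]),
        to_fasttext_line(["place", "swansea", "Xxxxxxx"])]
train, test = split_held_out(data)
print(len(train), len(test))  # 2 1
```

With the fasttext Python package installed, the two lists could be written to files (say train.txt and test.txt, one line per instance) and passed to fasttext.train_supervised(input="train.txt", minn=3, maxn=6); model.test("test.txt") then returns the number of test examples together with precision and recall at 1. The file names and n-gram range are illustrative, not the settings used in the experiments.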
12. Fine-grained entity types
• Ling & Weld (2012): 113 types listed, 45 of them present in the test data, including /livingthing/animal, /living_thing and /transportation/road
• Gillick et al. (2014): 89 types listed, 39 of them present in the test data
16. Sample errors
Mention | Gold standard | Prediction
Kaeso Fabius | Person | OfficeHolder
Lebeuville | Municipality | Settlement
Lodewijk Bruckman | Artist | Writer
Hyde Park Corner | MetroStation | Park
K.I.M. | University | Organisation
Francine Van Assche | Athlete | Actor
Haackgreerius miopus | Reptile | Insect
Wahab Akbar | Politician | Family
Baureihe 211 | Locomotive | Train
Congrier | Municipality | Settlement
SPA-Viberti AS.42 | MilitaryVehicle | Automobile
Moissey Kogan | Artist | FictionalCharacter
aluminiumplaat | ChemicalElement | ChemicalCompound
Jacob Black | FictionalCharacter | MusicalArtist
Sotla | River | Mountain
Nowa Deba | PopulatedPlace | TelevisionShow
Dean Woods | Cyclist | Actor
Abdullah the Butcher | Wrestler | FictionalCharacter
Ratislav Mores | SoccerPlayer | MusicalArtist
Christophe Laborie | Cyclist | PoliticalParty
17. DBpedia Type Coverage
• Not all 685 DBpedia classes are present in the type file:
• 269 in the Dutch DBpedia
• 143 in the Spanish DBpedia
• The type file only contains the most specific class:
• http://nl.dbpedia.org/resource/Cheddar_(kaas) has type “dbo:Cheese” in the type file; “dbo:Food” needs to be inferred (work in progress)
• Cultural differences:
• College sports are almost entirely absent in the Netherlands, so mentions of type “dbo:NationalCollegiateAthleticAssociationAthlete” are unlikely to be found
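The missing supertypes in the Cheddar example can be recovered by walking up the class hierarchy. A minimal sketch, assuming each class has a single parent and that the subclass links have been loaded into a dictionary; SUBCLASS_OF below is a hypothetical one-entry excerpt, not the real DBpedia ontology.

```python
from typing import Dict, List

# Hypothetical excerpt of the ontology's subclass relation (child -> parent),
# following the example that dbo:Cheese should also yield dbo:Food.
SUBCLASS_OF: Dict[str, str] = {"dbo:Cheese": "dbo:Food"}


def with_ancestors(most_specific: str, subclass_of: Dict[str, str]) -> List[str]:
    """Return the most specific type followed by all its superclasses, nearest first."""
    chain = [most_specific]
    while chain[-1] in subclass_of:
        parent = subclass_of[chain[-1]]
        if parent in chain:  # stop on a cycle instead of looping forever
            break
        chain.append(parent)
    return chain


print(with_ancestors("dbo:Cheese", SUBCLASS_OF))  # ['dbo:Cheese', 'dbo:Food']
```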
18. Types and Roles
• The DBpedia ontology adheres to a single type per entity
• dbpedia:Arnold_Schwarzenegger is dbo:OfficeHolder
• whereas YAGO also assigns yago:Actor, yago:BodyBuilder, yago:Emigrant
• Trade-off:
• multiple types/roles can facilitate contextual typing
• may also introduce noise in the training data
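To make the trade-off concrete: fastText's input format allows several __label__ tokens on one line, so an entity's multiple types or roles could be kept in a single training instance. A sketch with feature tokens chosen for illustration; whether mixing dbo: and yago: labels helps or merely adds noise is exactly the open question above.

```python
from typing import Sequence


def multilabel_line(types: Sequence[str], features: Sequence[str]) -> str:
    """One fastText training line that carries every type/role of the entity as a label."""
    return " ".join([f"__label__{t}" for t in types] + list(features))


print(multilabel_line(
    ["dbo:OfficeHolder", "yago:Actor", "yago:BodyBuilder", "yago:Emigrant"],
    ["arnold", "schwarzenegger"],
))
```

For genuinely multi-label training, fastText offers a one-vs-all loss (loss="ova" in train_supervised), and predict accepts k=-1 together with a probability threshold to return every label above it.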
19. Conclusions and future work
• Despite incomplete type coverage, Wikipedia + DBpedia form a good basis for fine-grained entity typing
• Links between the English DBpedia and the Dutch and Spanish versions may be leveraged to increase coverage
• The DBpedia hierarchy is useful in a generic setting
• but still has coverage gaps, e.g. ‘cuisine’ and ‘education’
• Explore other hierarchies
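The idea of leveraging cross-language links could work as a fallback: when a Dutch or Spanish entity has no type in its own DBpedia edition, take the type of the English entity it is linked to. A minimal sketch under that assumption; the dictionaries are hypothetical toy data (in practice the links would come from DBpedia's interlanguage link data), and lookup_type is an illustrative name.

```python
from typing import Dict, Optional


def lookup_type(entity: str,
                local_types: Dict[str, str],
                to_english: Dict[str, str],
                english_types: Dict[str, str]) -> Optional[str]:
    """Prefer the local-language type; otherwise borrow the type of the linked English entity."""
    if entity in local_types:
        return local_types[entity]
    english = to_english.get(entity)
    return english_types.get(english) if english is not None else None


# Hypothetical toy data: Dutch entity -> type, Dutch -> English links, English entity -> type.
NL_TYPES = {"Cheddar_(kaas)": "dbo:Cheese"}
NL_TO_EN = {"Parijs": "Paris"}
EN_TYPES = {"Paris": "dbo:City"}

print(lookup_type("Parijs", NL_TYPES, NL_TO_EN, EN_TYPES))  # dbo:City
```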
21. References
• Gillick, D., Lazic, N., Ganchev, K., Kirchner, J., Huynh, D.: Context-dependent fine-grained entity type tagging. arXiv (2014)
• Ling, X., Weld, D.S.: Fine-grained entity recognition. In: AAAI (2012)
• Yogatama, D., Gillick, D., Lazic, N.: Embedding methods for fine grained entity type classification. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2015), Short papers, Beijing, China, 26–31 July 2015, pp. 291–296. Association for Computational Linguistics (2015)
22. Image sources
• Tree of Life: https://upload.wikimedia.org/wikipedia/commons/b/bc/Haeckel_arbol_bn.png
• Paul Noonan (singer/songwriter): http://images.entertainment.ie/images_content/rectangle/620x372/paulnoonan.jpg
• Paul Noonan (failure analysis engineer): http://www.iopireland.org/careers/life/page_49377.html
• SPA-Viberti AS.42: https://upload.wikimedia.org/wikipedia/commons/thumb/2/23/AS42-1.gif/250px-AS42-1.gif
• Abdullah the Butcher: http://i.ebayimg.com/thumbs/images/g/H1kAAOSwbopZPupi/s-l200.jpg