This document summarizes a survey on unsupervised graph-based word sense disambiguation. It introduces word sense disambiguation and the difference between supervised and unsupervised approaches. It then discusses the state of the art in graph-based word sense disambiguation, including methods like semantic graph construction, PageRank, HITS, and P-Rank. The document describes experiments on Senseval 2 and 3 data sets comparing different graph-based methods. It concludes that recent unsupervised systems minimize the gap with supervised approaches and future work could implement P-Rank using different graph representations.
3. Introduction
● WSD = automatically assign the most
appropriate meaning to a polysemous word
within a given context (Sinha et al., 2007)
● Use Cases:
● Machine translation
● Speech processing
● Boosting the performance of tasks like text retrieval, document
classification and document clustering
Elena-Oana Tabaranu 3
4. State of the Art
● Supervised WSD vs Unsupervised WSD
● GWSD and Semantic Graph Construction
● SAN Method
● Page-Rank Method
● HITS Method
● P-Rank Method
5. Supervised WSD vs Unsupervised WSD
Supervised WSD
● Most approaches transform the sense of the word into a
feature vector
● Low execution time
● Accuracy of 60%-70%
● Major disadvantage: knowledge acquisition bottleneck
(accuracy connected to the amount of manually annotated data)
Unsupervised WSD
● Identify the best sense candidate for a model of the
word sense dependency in text
● Ranking algorithm to choose their most likely combination
● Window, graph based representation of the model
● Fast execution time
● Accuracy of 40%-60%
6. Graph-based WSD
● GWSD = graph representation used to model
word sense dependencies in text (WSD with
graphs, not just word window)
● Goal: identify the most probable sense (label)
for each word
● Advantage: takes into account information
drawn from the entire graph
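A minimal sketch of how such a graph could be built, assuming each word comes with a small dictionary of candidate senses and their glosses. The function names, the toy glosses, and the gloss-overlap weight are illustrative assumptions, not the similarity measures used in the surveyed systems.

```python
from itertools import combinations


def gloss_overlap(gloss_a, gloss_b):
    """Hypothetical similarity: number of distinct words shared by two glosses."""
    return len(set(gloss_a.lower().split()) & set(gloss_b.lower().split()))


def build_sense_graph(candidates):
    """Build a weighted undirected sense graph.

    candidates maps word -> {sense label -> gloss}.
    Nodes are (word, sense) pairs; an edge links senses of two different
    words whenever their glosses share at least one word.
    """
    edges = {}
    for word_a, word_b in combinations(candidates, 2):
        for sense_a, gloss_a in candidates[word_a].items():
            for sense_b, gloss_b in candidates[word_b].items():
                weight = gloss_overlap(gloss_a, gloss_b)
                if weight > 0:
                    edges[((word_a, sense_a), (word_b, sense_b))] = weight
    return edges


# Toy input (invented glosses, for illustration only).
candidates = {
    "bank": {
        "finance": "institution that accepts money deposits",
        "river": "sloping land beside a body of water",
    },
    "deposit": {
        "payment": "money placed in an institution account",
        "sediment": "matter left by water on land",
    },
}
sense_graph = build_sense_graph(candidates)
```

Senses of the same word are never linked to each other, so a ranking algorithm run over this graph scores each sense only through its connections to senses of the surrounding words.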
9. The Page-Rank Method (Brin and Page, 1998)
● Ranking algorithm based on the idea of voting:
when one node links to another it offers a vote
to that other node
● The higher the number of votes for a node, the
higher the importance of the node
● Recursively score the candidate nodes for a
weighted undirected graph
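The voting idea can be sketched as a short power iteration over a weighted undirected graph. This is an illustrative sketch, not the implementation evaluated in the survey: the damping factor of 0.85, the fixed iteration count, the edge weights, and the node labels are all assumptions, and every edge weight is assumed to be positive.

```python
def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Score nodes of a weighted undirected graph.

    edges maps (node_u, node_v) -> positive weight. Each node splits its
    current score among its neighbours in proportion to the edge weights.
    """
    neighbors = {}
    for (u, v), weight in edges.items():
        neighbors.setdefault(u, {})[v] = weight
        neighbors.setdefault(v, {})[u] = weight

    nodes = list(neighbors)
    n = len(nodes)
    # Total edge weight attached to each node, used to normalise its votes.
    strength = {node: sum(neighbors[node].values()) for node in nodes}
    scores = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        new_scores = {}
        for node in nodes:
            votes = sum(
                scores[nb] * weight / strength[nb]
                for nb, weight in neighbors[node].items()
            )
            new_scores[node] = (1 - damping) / n + damping * votes
        scores = new_scores
    return scores


# Toy sense graph (labels and weights invented for illustration).
toy_edges = {
    ("bank#finance", "money#currency"): 0.9,
    ("bank#finance", "deposit#payment"): 0.8,
    ("bank#river", "money#currency"): 0.1,
    ("money#currency", "deposit#payment"): 0.7,
}
pagerank_scores = weighted_pagerank(toy_edges)
best_bank_sense = max(["bank#finance", "bank#river"], key=pagerank_scores.get)
```

For disambiguation, the highest-scoring sense among the candidates of each word is selected, as in the last line of the sketch.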
10. The P-Rank Method (Zhao et al., 2009)
● Check the structural similarity of nodes in an
information network
● Based on the idea that two nodes are similar if
they are referenced by and also reference similar nodes
● Represents a generalization of other state-of-the-art
measures like CoCitation, Coupling,
Amsler, SimRank
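A minimal sketch of the P-Rank recursion as it is commonly stated: the similarity of two distinct nodes is a decayed mix of the average similarity of their in-neighbours and of their out-neighbours. The parameter names and defaults (`lam`, `decay`, `iterations`) and the toy link structure are assumptions made for illustration.

```python
def p_rank(out_links, lam=0.5, decay=0.8, iterations=10):
    """Iteratively compute pairwise structural similarity.

    out_links maps node -> list of nodes it references.
    lam weights in-link evidence against out-link evidence;
    lam=1 uses only in-links (the SimRank-style case).
    """
    nodes = sorted(set(out_links) | {t for ts in out_links.values() for t in ts})
    outs = {n: list(out_links.get(n, [])) for n in nodes}
    ins = {n: [] for n in nodes}
    for src, targets in outs.items():
        for target in targets:
            ins[target].append(src)

    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    def average(neigh_a, neigh_b, prev):
        """Mean similarity over all neighbour pairs (0 if either set is empty)."""
        if not neigh_a or not neigh_b:
            return 0.0
        total = sum(prev[(i, j)] for i in neigh_a for j in neigh_b)
        return total / (len(neigh_a) * len(neigh_b))

    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new_sim[(a, b)] = 1.0
                else:
                    in_part = average(ins[a], ins[b], sim)
                    out_part = average(outs[a], outs[b], sim)
                    new_sim[(a, b)] = decay * (lam * in_part + (1 - lam) * out_part)
        sim = new_sim
    return sim


# Toy network: p1 and p2 reference the same nodes, p3 references another one.
toy_links = {"p1": ["r1", "r2"], "p2": ["r1", "r2"], "p3": ["r3"]}
similarity = p_rank(toy_links)
```

In this toy network `p1` and `p2` come out as more similar to each other than either is to `p3`, because they reference the same nodes.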
11. The HITS Method (Kleinberg, 1999)
● Identify authorities = the most important nodes
in the graph
● Identify hubs = the nodes which point to
authorities
● The sense with the highest authority is chosen
as the most likely one for each word
● Major disadvantage: densely connected nodes
can attract the highest score (clique attack)
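The hub and authority scores can be sketched with the usual mutually reinforcing updates. This is an illustrative sketch on a tiny directed graph; the node names, the iteration count, and the Euclidean normalisation are assumptions rather than details taken from the survey.

```python
import math


def hits(links, iterations=50):
    """Compute hub and authority scores.

    links maps node -> list of nodes it points to.
    Authority of a node = sum of hub scores of nodes pointing to it.
    Hub score of a node = sum of authority scores of nodes it points to.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    def normalise(scores):
        norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
        return {n: s / norm for n, s in scores.items()}

    for _ in range(iterations):
        # Authority update from the current hub scores.
        auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for target in targets:
                auth[target] += hub[src]
        auth = normalise(auth)
        # Hub update from the fresh authority scores.
        hub = normalise(
            {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        )
    return hub, auth


# Toy graph: three nodes point to x, only one of them also points to y.
toy_links = {"a": ["x", "y"], "b": ["x"], "c": ["x"]}
hub_scores, authority_scores = hits(toy_links)
top_authority = max(authority_scores, key=authority_scores.get)
```

In a sense graph, the same selection rule as on the slide applies: among the candidate senses of a word, the one with the highest authority score is kept.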
12. Experiments and Results (I)
● Senseval 2 and 3 data sets often used for testing
● Occurrences for Senseval 2 using WordNet 2
● Occurrences for Senseval 3 using WordNet 2
13. Experiments and Results (II)
● Accuracies on the Senseval 2 and 3 English All
Words Task data sets (Tsatsaronis et al.)
14. Conclusions
● Recent systems minimise the gap between supervised
and unsupervised approaches.
● The graph-based methods make the most of the rich
semantic model they employ.
● Unsupervised approaches seek the optimal value for
the parameters using as little training data as possible
and testing on as large a dataset as possible.
● Future work: implement P-Rank using a different graph
representation, for example that of Sinha et al. (2007).
15. References
1. Tsatsaronis, G., Varlamis, I., Norvag, K.: An Experimental
Study on Unsupervised Graph-based Word Sense
Disambiguation. In Proc. of CICLing (2010).
2. Sinha, R., Mihalcea, R.: Unsupervised Graph-based Word
Sense Disambiguation Using Measures of Semantic Similarity. In
Proc. of ICSC (2007).
3. Mihalcea, R., Csomai, A.: SenseLearner: Word Sense
Disambiguation for All Words in Unrestricted Text. In Proc. of
ACL, pages 53-56 (2005).
4. Tsatsaronis, G., Vazirgiannis, M., Androutsopoulos, I.: Word
Sense Disambiguation with Spreading Activation Networks
Generated from Thesauri. In Proc. of IJCAI (2007).