A Survey on Unsupervised Graph-based Word Sense Disambiguation

Elena-Oana Tabaranu
elena.tabaranu@info.uaic.ro
UAIC, Iasi

Plan
1. Introduction
2. State of the Art
3. Experiments and Results
4. Conclusions
5. References

Introduction
● WSD = automatically assign the most
appropriate meaning to a polysemous word
within a given context (Sinha and Mihalcea, 2007)
● Use Cases:
● Machine translation
● Speech processing
● Boosting the performance of tasks like text retrieval, document
classification and document clustering


State of the Art
● Supervised WSD vs Unsupervised WSD
● GWSD and Semantic Graph Construction
● SAN Method
● Page-Rank Method
● HITS Method
● P-Rank Method


Supervised WSD vs Unsupervised WSD
Supervised WSD:
● Most approaches transform the sense of the word into a
feature vector
● Low execution time
● Accuracy of 60%-70%
● Major disadvantage: knowledge acquisition bottleneck
(accuracy connected to the amount of manually annotated data)

Unsupervised WSD:
● Identify the best sense candidate for a model of the
word sense dependency in text
● Ranking algorithm to choose their most likely combination
● Window-based or graph-based representation of the model
● Fast execution time
● Accuracy of 40%-60%


Graph-based WSD
● GWSD = graph representation used to model
word sense dependencies in text (WSD with
graphs, not just a word window)
● Goal: identify the most probable sense (label)
for each word
● Advantage: takes into account information
drawn from the entire graph
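
A minimal sketch of how such a sense graph could be put together, assuming one node per candidate sense and an edge between senses of different words whenever a similarity score is positive. The sense inventory, the stopword list and the gloss-overlap score are hypothetical stand-ins chosen for illustration; they are not taken from WordNet or from the surveyed papers.

```python
from itertools import combinations

# Hypothetical toy inventory: word -> {sense label: gloss}. Not real WordNet data.
SENSES = {
    "bank": {
        "bank#1": "financial institution that accepts deposits of money",
        "bank#2": "sloping land beside a river",
    },
    "deposit": {
        "deposit#1": "money placed in a financial institution",
        "deposit#2": "layer of sediment left by a river",
    },
}

# Assumed stopword list so that function words do not create edges.
STOPWORDS = {"a", "of", "in", "by", "that"}


def gloss_overlap(gloss_a, gloss_b):
    """Toy similarity: number of shared non-stopword tokens in two glosses."""
    tokens_a = set(gloss_a.split()) - STOPWORDS
    tokens_b = set(gloss_b.split()) - STOPWORDS
    return len(tokens_a & tokens_b)


def build_sense_graph(senses):
    """Return {(sense_u, sense_v): weight} linking senses of different words."""
    edges = {}
    for word_a, word_b in combinations(senses, 2):
        for sense_a, gloss_a in senses[word_a].items():
            for sense_b, gloss_b in senses[word_b].items():
                weight = gloss_overlap(gloss_a, gloss_b)
                if weight > 0:
                    edges[(sense_a, sense_b)] = weight
    return edges


graph = build_sense_graph(SENSES)
```

Senses of the same word are never linked to each other here, so the labels of one word compete only through their connections to the senses of the other words in the context.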


Semantic Graph Construction (I)
● Example (Sinha and Mihalcea, 2007)


Semantic Graph Construction (II)
● Example (Tsatsaronis et al., 2010)


The Page-Rank Method (Brin and Page, 1998)
● Ranking algorithm based on the idea of voting:
when one node links to another it offers a vote
to that other node
● The higher the number of votes for a node, the
higher the importance of the node
● Recursively score the candidate nodes for a
weighted undirected graph
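
The voting idea can be sketched as an iterative update on a weighted undirected graph. This is a simplified illustration, not the exact setup of any surveyed system: the damping factor of 0.85, the fixed iteration count, the edge weights and the `word#n` sense labels are all assumptions.

```python
def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Score nodes of a weighted undirected graph given as {(u, v): weight}."""
    neighbours = {}
    for (u, v), weight in edges.items():
        neighbours.setdefault(u, {})[v] = weight
        neighbours.setdefault(v, {})[u] = weight
    # Total edge weight attached to each node, used to split its vote.
    strength = {n: sum(ws.values()) for n, ws in neighbours.items()}
    scores = {n: 1.0 for n in neighbours}
    for _ in range(iterations):
        scores = {
            n: (1 - damping) + damping * sum(
                weight / strength[m] * scores[m]
                for m, weight in neighbours[n].items()
            )
            for n in neighbours
        }
    return scores


def best_sense_per_word(scores):
    """Keep the top-scoring sense for each word (labels look like 'word#n')."""
    best = {}
    for sense, score in scores.items():
        word = sense.split("#")[0]
        if word not in best or score > scores[best[word]]:
            best[word] = sense
    return best


# Hypothetical sense graph: edge weights stand in for similarity scores.
toy_edges = {
    ("bank#1", "deposit#1"): 3,
    ("bank#1", "money#1"): 2,
    ("deposit#1", "money#1"): 2,
    ("bank#2", "deposit#2"): 1,
}
scores = weighted_pagerank(toy_edges)
chosen = best_sense_per_word(scores)
```

Each node passes its score to its neighbours in proportion to the edge weights, so a sense that is strongly tied to several other well-supported senses collects more votes than an isolated one.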


The P-Rank Method (Zhao et al., 2009)
● Check the structural similarity of nodes in an
information network
● Based on the idea that two nodes are similar if
they are referenced by and also reference similar nodes
● Represents a generalization of other state-of-
the-art measures like CoCitation, Coupling,
Amsler, SimRank
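
A rough sketch of the recursion behind this idea: the similarity of two nodes is a weighted mix of the average similarity of their in-neighbours and of their out-neighbours. The parameter names `lam` and `decay`, their default values, the use of a single decay factor for both directions, and the toy graph are assumptions made for illustration.

```python
def p_rank(out_links, lam=0.5, decay=0.8, iterations=10):
    """Pairwise structural similarity from in-links and out-links.

    out_links maps each node to the nodes it references.
    lam weights the in-link term against the out-link term.
    """
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    outs = {n: list(out_links.get(n, [])) for n in nodes}
    ins = {n: [m for m in nodes if n in outs[m]] for n in nodes}
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}

    def average(neigh_a, neigh_b, current):
        """Mean similarity over all neighbour pairs, 0.0 if either side is empty."""
        if not neigh_a or not neigh_b:
            return 0.0
        total = sum(current[(x, y)] for x in neigh_a for y in neigh_b)
        return total / (len(neigh_a) * len(neigh_b))

    for _ in range(iterations):
        updated = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    updated[(a, b)] = 1.0
                else:
                    updated[(a, b)] = decay * (
                        lam * average(ins[a], ins[b], sim)
                        + (1 - lam) * average(outs[a], outs[b], sim)
                    )
        sim = updated
    return sim


# Hypothetical network: "a" and "b" both reference "c" and "d".
toy_links = {"a": ["c", "d"], "b": ["c", "d"], "c": [], "d": []}
similarity = p_rank(toy_links)
```

In this toy network "a" and "b" come out as similar because they reference the same nodes, while "c" and "d" come out as similar because they are referenced by the same nodes. Setting `lam` to 1.0 or 0.0 keeps only one of the two directions, which is how the measure can cover purely in-link or purely out-link based variants.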


The HITS Method (Kleinberg, 1999)
● Identify authorities = the most important nodes
in the graph
● Identify hubs = the nodes which point to
authorities
● The sense with the highest authority is chosen
as the most likely one for each word
● Major disadvantage: densely connected nodes
can attract the highest score (clique attack)
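
The hub and authority updates can be sketched as two alternating steps with normalisation. The directed toy edges, the `word#n` labels and the fixed iteration count are assumptions for illustration only.

```python
import math


def hits(edges, iterations=50):
    """edges: list of directed (source, target) pairs.

    Returns (hubs, authorities), each normalised to unit Euclidean length.
    """
    nodes = {n for edge in edges for n in edge}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a node: sum of the hub scores of nodes pointing to it.
        auths = {n: sum(hubs[s] for s, t in edges if t == n) for n in nodes}
        norm = math.sqrt(sum(v * v for v in auths.values())) or 1.0
        auths = {n: v / norm for n, v in auths.items()}
        # Hub score of a node: sum of the authority scores it points to.
        hubs = {n: sum(auths[t] for s, t in edges if s == n) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hubs.values())) or 1.0
        hubs = {n: v / norm for n, v in hubs.items()}
    return hubs, auths


# Hypothetical directed sense graph with one denser and one sparser component.
toy_edges = [
    ("bank#1", "deposit#1"),
    ("money#1", "deposit#1"),
    ("bank#2", "deposit#2"),
]
hubs, auths = hits(toy_edges)
```

In this toy graph the more densely linked component ends up with nearly all of the authority mass after a few iterations, which is a small-scale view of the effect described in the last bullet.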


Experiments and Results (I)
● Senseval 2 and 3 data sets are often used for testing
● Occurrences for Senseval 2 using WordNet 2

● Occurrences for Senseval 3 using WordNet 2


Experiments and Results (II)
● Accuracies on the Senseval 2 and 3 English All
Words Task data sets (Tsatsaronis et al., 2010)


Conclusions
● Recent systems minimise the gap between supervised
and unsupervised approaches.
● The graph-based methods make the most of the rich
semantic model they employ.
● Unsupervised approaches seek the optimal value for
the parameters using as little training data as possible
and testing on as large a dataset as possible.
● Future work: implement P-Rank using a different
graph representation, for example that of Sinha and Mihalcea (2007).


References
1. Tsatsaronis, G., Varlamis, I., Norvag, K.: An Experimental
Study on Unsupervised Graph-based Word Sense
Disambiguation. In Proc. of CICLing (2010).
2. Sinha, R., Mihalcea, R.: Unsupervised Graph-based Word
Sense Disambiguation Using Measures of Semantic Similarity. In
Proc. of ICSC (2007).
3. Mihalcea, R., Csomai, A.: SenseLearner: Word Sense
Disambiguation for All Words in Unrestricted Text. In Proc. of
ACL, pages 53-56 (2005).
4. Tsatsaronis, G., Vazirgiannis, M., Androutsopoulos, I.: Word
Sense Disambiguation with Spreading Activation Networks
Generated from Thesauri. In Proc. of IJCAI (2007).


Questions?

