WSD for SENSEVAL-3 dataset using TiMBL
Milind Gokhale
Indiana University
Bloomington
Indiana
USA
mgokhale@indiana.edu
Renuka Deshmukh
Indiana University
Bloomington
Indiana
USA
renudesh@umail.iu.edu
Abstract
Word Sense Disambiguation is a challenge faced in multiple fields involving natural language processing. We present a word sense disambiguation approach that uses memory-based learning to train and test the machine learner. We use TiMBL as our machine learner and the dataset provided in SENSEVAL-3 as the test and train sets. Using simple features such as context words, word collocations, part-of-speech tags and keyword information, we achieve an average accuracy of 66.69%.
1 Introduction
The task of Word Sense Disambiguation entails assigning a polysemous word the most appropriate meaning based on its usage in the current context. A large number of approaches, such as Naïve Bayesian models, parallel corpora, SVD, combinations of knowledge sources [3] and hierarchical decision lists [2], have been proposed for Word Sense Disambiguation. Recent trends in the Natural Language Processing community have shown a tremendous interest in Word Sense Disambiguation, especially since SENSEVAL-1 in 1999.
Word Sense Disambiguation has since become a motivation for more and more researchers to come up with approaches to solve this problem. SENSEVAL is a word sense evaluation series organized by ACL-SIGLEX. SENSEVAL-1, 2 and 3 focused on word sense disambiguation for a growing number of languages. Since 2007, the SemEval evaluation series has been run in its place, which also includes semantic analysis in addition to WSD.
The most common approaches to Word Sense Disambiguation are discussed below. The knowledge-based approach uses information from an explicit lexicon or knowledge source, such as WordNet; it is one of the most common methodologies used in WSD. The corpus-based approach does not use an explicit knowledge source; instead, it uses information gained from a disambiguated or raw training corpus. The disambiguated-corpora-based approach applies a machine learning algorithm to a set of features extracted from a disambiguated training set; the trained model is then applied to new examples to disambiguate them. The raw-corpora-based approach uses a raw dataset rather than a disambiguated one, since annotated datasets are difficult to obtain in most cases. This is an unsupervised model and is used for word sense discrimination rather than disambiguation. The last approach is the hybrid approach, which is a combination of the knowledge-based and corpus-based approaches [4].
In this paper we describe a memory-based approach for word sense disambiguation (WSD) as defined in the SENSEVAL task: the association of a word in context with its contextually appropriate sense tag. We propose a WSD technique based on the disambiguated corpora provided in SENSEVAL-3. We also incorporate features such as context words, word collocations, POS tags and a count of keywords to improve accuracy. We discuss our approach in more detail in the following sections.
The rest of the paper is organized as follows. Section 2 gives an overview of related work. Section 3 describes our methodology and feature selection process. Section 4 gives the accuracy obtained by our approach, followed by the Discussion in Section 5, Future Work in Section 6 and the Conclusion in Section 7.
2 Related Work
One of the first attempts at word sense disambiguation was made by Lesk in 1986 using dictionary definitions. It consisted of counting the number of overlaps between the dictionary definitions of the target word and those of the words in its immediate context [6]. Banerjee, Patwardhan and Pedersen later extended this approach in 2003 by including the dictionary definitions of words related to the target word, where the related words were determined according to the structure of WordNet. They called this the determination of semantic relatedness; determining the most related sense of a target word is in fact disambiguation of the target word. They implemented their idea in a freely available software package called WordNet::Similarity.
[5] presents a corpus-based approach that achieves high accuracy by combining a number of very simple classifiers into an ensemble that performs disambiguation via a majority vote. The motivation is that enhancing the feature set or learning algorithm used in a corpus-based approach does not usually improve disambiguation accuracy beyond what can be attained with shallow lexical features and a simple algorithm.
In [7], Veenstra, Van den Bosch et al. follow a memory-based approach to word sense disambiguation which relies on only a small number of annotated examples for each sense of the target word. It is based on examples from a POS-tagged corpus and selected dictionary entries. Since it depends on few annotated examples, it is portable and easily adaptable. They also constructed a separate classifier, called a word expert, for each of the target words.
There have also been supervised methods for word sense disambiguation. [8] argues that the typical supervised approach relies on feature vectors constructed from the sense-tagged occurrences of the target word; a shortcoming of supervised approaches is that they are only applicable to words for which sense-tagged data is available, so their accuracy depends on the amount of labeled data available for the target words. Rada Mihalcea therefore focuses on an approach that can work when only a few annotated examples are available for the target word, using co-training and self-training methods.
3 Methodology
3.1 Classifier
Memory-based learning is a supervised approach to machine learning. A memory-based learning system has two components: a memory-based learning component and a similarity-based performance component. The learning component is used to train the learner; it stores the training instances in memory along with their class labels. The storage is done without any data processing or analysis, which is why memory-based learning is also known as lazy learning. The performance component takes as input new instances that need to be classified, and classifies each one based on its similarity to the instances stored in memory.
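As a concrete illustration, the two components can be sketched as follows. This is a minimal toy in the spirit of IB1, not TiMBL itself; the helper names and the instance data are ours:

```python
# Toy memory-based classifier: "learning" is storage, classification is a
# vote among the k stored instances most similar under the overlap metric.
from collections import Counter

def train(instances):
    # Learning component: store each (features, sense_tag) pair verbatim.
    return list(instances)

def overlap(a, b):
    # Overlap distance: number of feature positions whose values differ.
    return sum(1 for x, y in zip(a, b) if x != y)

def classify(memory, features, k=1):
    # Performance component: majority vote among the k nearest neighbors.
    neighbors = sorted(memory, key=lambda inst: overlap(inst[0], features))[:k]
    votes = Counter(sense for _, sense in neighbors)
    return votes.most_common(1)[0][0]

memory = train([
    (["other", "plans", "not", "now", "were", "also"], "38201"),
    (["no", "plans", "not", "yet", "were", "made"], "38203"),
])
print(classify(memory, ["other", "plans", "not", "now", "was", "also"], k=1))  # → 38201
```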
Between two popular learners, TiMBL and SVM, we decided to use TiMBL due to its ease of use and the ample documentation available online. Additionally, TiMBL gives us the ability to vary algorithms and settings in order to get better accuracy. We ran a sample test using both TiMBL and SVM for three words, arm, difficulty and interest, and found that TiMBL performed better than SVM for all three. A probable cause for this could be the limited number of examples in the train set. Hence, given the positive results from TiMBL, we decided to use it as the machine learner for the given WSD task.
3.2 Dataset
Our dataset is taken from SENSEVAL-3, which was held in 2004. The dataset was part of the SENSEVAL project organized by ACL-SIGLEX as part of a competition for evaluating various WSD systems. These systems are trained and tested on the dataset provided in SENSEVAL for various words and languages.
In this paper, we present the best results obtained for disambiguation of every target word in the dataset, first using the words in a window of ±3 words as the feature set, and then adding more features to the feature set (as described in Section 3.3) for better accuracy.
3.3 Feature Set
TiMBL requires a set of features along with the class label as the last column in the training set. To disambiguate the 57 words given in the SENSEVAL-3 dataset for WSD, we performed our experiments in four stages. Each stage consists of adding new features to the existing feature set and then running this feature set through TiMBL to compare the accuracy obtained at each stage. Our feature types can be broadly classified into four main categories:
Context Words: Words occurring in the immediate context of the target word, also called the local context, are very helpful in discerning the sense of the target word. Most approaches use a ±2, ±3 or ±4 window size. The context words in our approach comprise the ±3-word window around the target word. They form the basic features of our WSD system; additional features are built on top of them.
Word Collocations: The word collocations are the bigrams and trigrams formed by concatenating the words around the target word. Each collocation is represented as one feature in the feature set. We consider word collocations because they often contribute substantially to disambiguation accuracy, as opposed to having only context words in the feature set [5].
POS Tags: Part-of-speech tagging and WSD are very closely related to each other. Part-of-speech tagging is carried out to add more meaning to the extracted feature set of words and collocations. [9] and [10] show that adding POS tags to the feature set improves the accuracy significantly. We used the Stanford POS tagger to tag our files with the correct POS, and then added the extracted POS tags to the feature set.
Count of Keywords: Considering the keywords occurring in the given example can also improve the accuracy of the WSD system. Since our machine learner needs a fixed number of columns in the feature set, listing the keywords of each example is not a workable solution. In our implementation, we instead use the total count of the keywords as the extended feature. These keywords are the nouns, verbs, adjectives and adverbs occurring in the given example, as flagged by the Stanford POS tagger.
3.4 Representation
We have divided our implementation into four stages. This gives us a better idea of the contribution each type of feature (context words, word collocations, POS tags and keywords) makes towards disambiguation of the target words, and an insight into the effect each kind of feature has on the accuracy of the learner.
The baseline setting for our approach comprises the words in the ±3-word window around the target word. Whenever a word in the window falls outside the boundaries of the sentence for an example, we assign it a default value of '_' [11]. Word collocations in the form of bigrams and trigrams are added after the context words. The last column is the class label, i.e. the sense id of the target word. In our approach we trained a different learner for each of the 57 words in the SENSEVAL-3 dataset, to be able to examine the results obtained for individual words.
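The assembly of one baseline instance can be sketched as follows (a hypothetical helper of our own, not part of TiMBL; the tokens mirror the running example used in the tables below):

```python
# Build one baseline training instance: a ±3-word context window padded
# with '_', followed by bigram and trigram collocations and the sense id.
def make_instance(tokens, target_idx, sense_id, window=3, pad="_"):
    # Context words: ±3 around the target; '_' where the window leaves the sentence.
    left = tokens[max(0, target_idx - window):target_idx]
    left = [pad] * (window - len(left)) + left
    right = tokens[target_idx + 1:target_idx + 1 + window]
    right = right + [pad] * (window - len(right))
    context = left + right
    # Word collocations: bigrams and trigrams over the context window.
    bigrams = [context[i] + context[i + 1] for i in range(len(context) - 1)]
    trigrams = [context[i] + context[i + 1] + context[i + 2]
                for i in range(len(context) - 2)]
    return context + bigrams + trigrams + [sense_id]

instance = make_instance(["other", "plans", "not", "plan", "now", "were", "also"], 3, "38201")
print(" ".join(instance))
```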
One of the earliest approaches to WSD involved directly assigning the most frequent sense to the ambiguous word, with the understanding that there would be errors, but that they would be well under the acceptable limit. In the first stage, called the basic setting, we test the accuracy of this baseline feature set with the class label set to the most frequent sense. The accuracy obtained in this step also serves as the benchmark against which to evaluate the results of the extended feature sets.
other plans not now were also otherplans plansnot notnow nowwere werealso otherplansnot plansnotnow notnowwere nowwerealso 38201
Table 1: Basic Feature Representation
In the second stage, the most frequent sense is replaced by the actual sense of the target word as provided in the train set of the SENSEVAL-3 dataset. TiMBL is again trained with this updated train set and the obtained accuracy is compared with the accuracy from the previous stage. This stage serves as the baseline setting.
other plans not now were also otherplans plansnot notnow nowwere werealso otherplansnot plansnotnow notnowwere nowwerealso 38203
Table 2: Baseline Feature Representation
In the third stage, called the POS stage, POS tags generated by the Stanford POS tagger are also incorporated into the feature set. Each context word is represented with its POS tag appended, separated by '/'. For the word collocations, collocations of the POS tags of the respective words are formed and included in the feature set in the same fashion as for the context words. TiMBL is trained on this feature set and the obtained accuracy is compared with the baseline accuracy.
other/JJ plans/NNS not/RB now/RB were/VBD also/RB otherplans/JJNNS plansnot/NNSRB notnow/RBRB nowwere/RBVBD werealso/VBDRB otherplansnot/JJNNSRB plansnotnow/NNSRBRB notnowwere/RBRBVBD nowwerealso/RBVBDRB 38203
Table 3: POS Feature Representation
In the fourth and final stage, the count of keywords is added as the last column of the feature set, preceding the class label column. The count here is the total number of nouns, verbs, adjectives and adverbs occurring in the context. This information is also obtained by POS tagging the test and train sets and then counting the keywords for each example. TiMBL is trained and the obtained results are compared with the accuracy of the baseline setting.
other/JJ plans/NNS not/RB now/RB were/VBD also/RB otherplans/JJNNS plansnot/NNSRB notnow/RBRB nowwere/RBVBD werealso/VBDRB otherplansnot/JJNNSRB plansnotnow/NNSRBRB notnowwere/RBRBVBD nowwerealso/RBVBDRB 6 38203
Table 4: Keyword Count Feature Representation
We tabulate our findings in Section 4.
3.5 Metrics
We use the various algorithms, distance metrics and feature weighting schemes provided by TiMBL. For algorithms we use IB1, IGTREE and TRIBL. For distance metrics we use Overlap (O), MVDM and Jeffrey Divergence (JD). For feature weighting we use No Weight (NW), Information Gain (IG), Chi Squared (CS) and Gain Ratio (GR). We vary k (the number of nearest neighbors) from 1 to 5 for each setting. We tabulate our findings in the following section.
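For concreteness, the per-word space of settings we sweep can be enumerated as follows (an illustrative sketch of the grid; the actual experiments invoke TiMBL with the corresponding settings):

```python
# Enumerate the per-word parameter grid described above: every combination
# of algorithm, distance metric, feature weighting scheme and k.
from itertools import product

algorithms = ["IB1", "IGTREE", "TRIBL"]
metrics = ["O", "MVDM", "JD"]          # Overlap, MVDM, Jeffrey Divergence
weightings = ["NW", "IG", "CS", "GR"]  # No Weight, Information Gain, Chi Squared, Gain Ratio
ks = [1, 2, 3, 4, 5]                   # number of nearest neighbors

grid = list(product(algorithms, metrics, weightings, ks))
print(len(grid))  # 3 * 3 * 4 * 5 = 180 candidate settings per target word
```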
4 Results
We tested the data with the classification algorithms IB1, IGTREE and TRIBL, the distance metrics Overlap, MVDM and Jeffrey Divergence, and the feature weighting schemes No Weight, Gain Ratio, Information Gain and Chi Squared. We train the classifiers on the train files, test them against the test files, and try to optimize classifier accuracy by adjusting the parameters of the classifiers. Out of the derived outputs we selected the TiMBL setting with maximum accuracy and report it in Table 5. The average accuracy obtained by our approach is 66.69%, which is an improvement over the baseline average of 62.41%. Thus we show that a combination of context words, word collocations, POS tags and keyword information is a good approach to word sense disambiguation. Figure 1 shows the average accuracy obtained by our approach at each stage.
Figure 1: Average Accuracy
5 Discussion
We notice that the IB1 algorithm gives the best accuracy for most of the words. The addition of POS tags did not have a significant effect on the overall accuracy, whereas introducing the keyword count increased the overall accuracy of the classification: we noticed significant improvements in accuracy for some words and slight deterioration for others. The default setting also performs better than the basic setting, which was followed in older approaches. The baseline setting shows significant improvement over the default setting for most of the words, because the classification algorithm, distance metric and weighting scheme can be varied to achieve better accuracy.
Words like use, remain, provide and receive recorded their best accuracy when the POS tags were added to the feature set. The same words showed their best accuracy in the baseline, which indicates that POS tags do not change the accuracy much; in fact the accuracy deteriorated in some cases on introducing POS tags into the feature set. Interestingly, for the word eat, the accuracy improved significantly just by adding POS tags. Although some of the words were outliers and showed a fall in accuracy on adding new features to the feature set, we saw an overall accuracy improvement for most of the words as we added more features. For example, words like difficulty, paper, important, performance and solid showed significant improvement, while words like eat, wash and expect showed a severe drop in accuracy. On analysis we found that the increase or decrease in accuracy from adding the keyword count does not depend on the number of test or train examples; it depends more on the coverage of the train set. Including features like the actual keywords in the feature set to improve the results can be considered as future work.
Table 5: Best accuracy obtained during the four-stage evaluation of our WSD approach.

| Word | Most Freq | Default | Baseline | POS Tagging | POS + Keywords |
| activate | 69.12% | 62.50% | 69.12% (IB1 O k3 default) | 69.12% (IB1 O k5 GR) | 69.12% (IB1 O k5 GR) |
| add | 41.50% | 70.07% | 71.26% (IB1 O k3 IG) | 70.75% (IB1 O k1 NW) | 54.74% (IB1 O k3 CS) |
| appear | 40.97% | 59.72% | 65.97% (IB1 MVDM k5 CS) | 65.97% (IB1 JD k5 GR) | 67.65% (IGTREE O GR) |
| argument | 44.19% | 44.96% | 45.74% (IB1 JD k5 GR) | 47.29% (IB1 MVDM k5 GR) | 77.78% (IB1 O k3 CS) |
| arm | 80.74% | 85.93% | 85.93% (IB1 O k1 NW) | 85.93% (IB1 O k1 NW) | 69.29% (IB1 JD k1 NW) |
| ask | 22.98% | 50.93% | 53.42% (IB1 JD k3 IG) | 48.45% (IB1 MVDM k3 NW) | 45.38% (IB1 O k3 CS) |
| atmosphere | 52.94% | 46.08% | 52.94% (IB1 O k3 NW) | 52.94% (IB1 O k5 GR) | 53.85% (IB1 MVDM k3 NW) |
| audience | 62.62% | 71.03% | 78.50% (IB1 MVDM k3 CS) | 77.57% (IB1 MVDM k3 CS) | 39.13% (IB1 MVDM k3 NW) |
| bank | 64.49% | 71.01% | 75.36% (IB1 MVDM k3 GR) | 75.36% (IB1 MVDM k3 NW) | 42.20% (IB1 MVDM k1 NW) |
| begin | 50.00% | 51.06% | 55.32% (IB1 O k3 GR) | 54.74% (IB1 O k3 CS) | 87.64% (IGTREE O NW) |
| climb | 54.41% | 66.18% | 70.59% (IB1 JD k3 GR) | 67.65% (IB1 MVDM k3 GR) | 69.23% (IB1 MVDM k3 IG) |
| decide | 66.67% | 73.02% | 79.37% (IB1 O k3 IG) | 77.78% (IB1 MVDM k5 NW) | 70.07% (IB1 O k1 CS) |
| degree | 55.71% | 65.00% | 71.43% (IB1 MVDM k3 GR) | 70.00% (IB1 JD k1 NW) | 71.26% (IB1 JD k5 GR) |
| difference | 35.38% | 41.54% | 44.62% (IB1 JD k5 GR) | 45.38% (IB1 O k1 CS) | 66.67% (IB1 O k5 GR) |
| different | 48.08% | 48.08% | 53.85% (IB1 MVDM k3 NW) | 55.77% (IB1 MVDM k3 NW) | 62.50% (IB1 MVDM k3 NW) |
| difficulty | 17.39% | 34.78% | 34.78% (IB1 MVDM k3 NW) | 39.13% (IB1 MVDM k3 NW) | 79.55% (IGTREE O IG) |
| disc | 34.86% | 36.70% | 42.20% (IB1 O k3 CS) | 40.37% (IB1 O k3 CS) | 60.00% (IB1 MVDM k1 GR) |
| eat | 86.52% | 82.02% | 66.15% (IB1 MVDM k5 NW) | 87.64% (IB1 O k1 CS) | 37.04% (IB1 JD k1 IG) |
| encounter | 36.92% | 60.00% | 70.07% (IB1 MVDM k5 NW) | 66.15% (IB1 MVDM k5 NW) | 62.11% (IB1 O k5 GR) |
| expect | 66.67% | 68.97% | 66.67% (IB1 O NW) | 70.11% (IB1 O k1 NW) | 43.75% (IB1 O k1 NW) |
| express | 66.67% | 61.40% | 66.67% (IB1 O NW) | 66.67% (IB1 O k1 NW) | 55.56% (IB1 O k3 NW) |
| hear | 46.88% | 56.25% | 59.38% (IB1 O k3 CS) | 62.50% (IB1 MVDM k5 CS) | 67.50% (IB1 MVDM k3 GR) |
| hot | 77.27% | 75.00% | 79.55% (TRIBL O k1 CS) | 79.55% (IGTREE O IG) | 65.97% (IB1 MVDM k3 GR) |
| image | 36.00% | 52.00% | 53.33% (IB1 O k1 GR) | 53.33% (IB1 O k1 CS) | 50.00% (IB1 MVDM k3 IG) |
| important | 22.22% | 25.93% | 29.63% (IB1 O k1 GR) | 33.33% (IB1 MVDM k1 NW) | 63.24% (IB1 O k5 GR) |
| interest | 61.05% | 62.11% | 62.11% (IB1 O k1 GR) | 63.16% (TRIBL MVDM k3 IG) | 61.11% (IB1 JD k5 GR) |
| judgement | 28.13% | 43.75% | 50.00% (IB1 O k1 CS) | 50.00% (IB1 MVDM k3 NW) | 73.68% (IB1 O k1 CS) |
| lose | 52.78% | 44.44% | 58.33% (IB1 JD k3 IG) | 55.56% (IB1 JD k3 GR) | 47.06% (IB1 JD k5 CS) |
| mean | 52.50% | 65.00% | 65.00% (IB1 O k1 NW) | 65.00% (IB1 JD k5 GR) | 64.80% (IB1 MVDM k3 NW) |
| miss | 33.33% | 46.67% | 53.33% (IB1 MVDM k5 GR) | 53.33% (IB1 MVDM k5 GR) | 66.00% (IB1 O k1 NW) |
| note | 55.88% | 57.35% | 67.65% (IB1 O k3 GR) | 64.71% (IB1 O k3 CS) | 70.00% (IB1 O k3 CS) |
| operate | 38.89% | 61.11% | 66.67% (IB1 JD k5 NW) | 61.11% (IB1 JD k5 GR) | 48.42% (IB1 O k1 NW) |
| organization | 71.93% | 70.18% | 77.19% (IB1 MVDM k5 IG) | 77.19% (IB1 MVDM k5 GR) | 58.95% (IB1 MVDM k3 GR) |
| paper | 22.06% | 44.12% | 48.53% (IB1 JD k5 NW) | 46.32% (IB1 MVDM k5 NW) | 84.51% (IB1 O k5 GR) |
| party | 57.60% | 59.20% | 65.60% (IB1 MVDM k5 GR) | 66.00% (IB1 MVDM k5 GR) | 88.89% (IB1 O k5 GR) |
| performance | 25.56% | 31.11% | 32.22% (IB1 O k3 GR) | 33.33% (IB1 O k3 CS) | 88.89% (IB1 O k5 CS) |
| plan | 69.00% | 65.00% | 71.00% (IB1 O k3 GR) | 71.00% (IB1 O k3 CS) | 90.14% (IB1 JD k1 GR) |
| play | 46.15% | 44.23% | 51.92% (IB1 MVDM k5 NW) | 46.15% (IB1 O k5 GR) | 70.00% (IB1 MVDM k1 NW) |
| produce | 51.58% | 56.84% | 62.11% (IB1 JD k5 GR) | 57.89% (IB1 JD k5 NW) | 54.78% (IB1 JD k5 NW) |
| provide | 83.10% | 88.73% | 91.55% (IB1 O k1 GR) | 93.32% (IB1 MVDM k3 NW) | 66.07% (IB1 O k1 NW) |
| receive | 88.89% | 74.07% | 88.89% (IB1 O k3 NW) | 88.89% (IB1 O k5 GR) | 75.00% (IB1 MVDM k1 GR) |
| remain | 77.46% | 83.10% | 87.32% (IB1 O k3 GR) | 87.32% (IB1 MVDM k1 GR) | 66.97% (IB1 O k1 NW) |
| rule | 40.00% | 66.67% | 66.67% (IB1 O k1 NW) | 66.67% (IB1 O k1 NW) | 66.97% (TRIBL O k1 GR) |
| shelter | 38.26% | 52.17% | 53.04% (IB1 JD k3 CS) | 53.04% (IB1 O k1 NW) | 85.93% (IB1 O k1 NW) |
| simple | 25.00% | 25.00% | 35.00% (IB1 MVDM k1 NW) | 38.00% (IB1 MVDM k1 NW) | 85.93% (IB1 MVDM k1 NW) |
| smell | 39.29% | 69.64% | 76.79% (IB1 MVDM k3 NW) | 78.57% (IB1 JD k5 CS) | 72.60% (IB1 O k5 GR) |
| solid | 26.47% | 23.53% | 29.41% (IB1 O k5 IG) | 33.41% (IB1 MVDM k3 GR) | 75.34% (IB1 MVDM k1 GR) |
| sort | 57.80% | 66.06% | 66.06% (IB1 O k1 NW) | 66.97% (TRIBL O k1 GR) | 66.67% (IB1 O k1 NW) |
| source | 60.00% | 48.57% | 60.00% (IB1 O k3 NW) | 62.74% (IB1 O k5 GR) | 93.33% (IGTREE O GR) |
| suspend | 35.94% | 45.31% | 50.00% (IB1 MVDM k5 NW) | 54.56% (IB1 MVDM k5 GR) | 76.47% (IB1 O k1 NW) |
| talk | 72.60% | 72.60% | 76.71% (IB1 JD k3 GR) | 76.71% (IB1 MVDM k3 NW) | 78.43% (IB1 MVDM k3 NW) |
| treat | 28.07% | 36.84% | 47.37% (IB1 MVDM k3 NW) | 48.86% (IB1 MVDM k3 NW) | 56.52% (IB1 O k1 NW) |
| use | 66.67% | 66.67% | 93.33% (IGTREE GR) | 93.33% (IGTREE O GR) | 60.87% (IB1 JD k3 GR) |
| wash | 69.70% | 60.61% | 72.73% (TRIBL O k1 NW) | 78.73% (TRIBL O k1 NW) | 52.94% (IB1 O k5 GR) |
| watch | 74.51% | 78.43% | 80.39% (IB1 MVDM k1 NW) | 78.43% (IB1 O k1 NW) | 71.03% (IB1 O k1 NW) |
| win | 43.59% | 53.85% | 56.41% (IB1 JD k3 NW) | 56.41% (IB1 O k3 CS) | 75.70% (IB1 MVDM k3 CS) |
| write | 34.78% | 52.17% | 52.17% (IB1 O k1 NW) | 58.52% (IB1 O k1 CS) | 76.09% (IB1 MVDM k3 NW) |
| Average | 51% | 57% | 62.41% | 62.96% | 66.69% |
6 Future Work
There are a number of possible extensions to our work. Implementing forward or backward feature selection can be considered for future work; since the SENSEVAL-3 dataset for English is not exhaustive, we believe implementing backward selection would be a good approach to further improve the accuracy. In the current approach, we use the count of the keywords as a feature value. Going one step further and finding the actual keywords related to the target word, using the TF-IDF approach, can be worked on in the future.
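The TF-IDF idea could be sketched roughly as follows (our illustration only; the helper name and toy examples are hypothetical): words that occur in many examples of a target word score low, while words concentrated in few examples score high and become candidate keywords.

```python
# Minimal TF-IDF scoring: term frequency within one example times the
# log of inverse document frequency across all examples of a target word.
import math
from collections import Counter

def tf_idf(example, all_examples):
    n_docs = len(all_examples)
    df = Counter(word for doc in all_examples for word in set(doc))
    tf = Counter(example)
    return {w: (tf[w] / len(example)) * math.log(n_docs / df[w])
            for w in tf}

docs = [["bank", "river", "water"], ["bank", "money", "loan"], ["bank", "account"]]
scores = tf_idf(docs[0], docs)
# "bank" appears in every example, so its score is 0; "river" scores higher.
```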
Publicly available lexical data sources, like WordNet, can be leveraged to extract additional contextual information about the target word. Online dictionaries and thesauri can be used to glean alternative words for the target and context words, and these words can be added as features. The same word might appear in different forms in the test and train files, misleading TiMBL into believing that they are different words; this compromises the accuracy of the WSD system. As future work, we plan to substitute the context and target words with their morphological root words. We believe that this will have a positive effect on the results.
Lastly, we would like to add senses of words to the feature set. Each sense of the target word can be added as a separate feature, where the feature value is the total number of keywords present in the example which correspond to the given sense. The feature value with the highest number would then indicate the most probable sense of the target word.
7 Conclusion
We believe that the memory-based learning methodology is well suited to the problem of Word Sense Disambiguation, as it can work well with both dense and sparse data. Memory-based learning can also easily handle exceptions and corner cases.
In this paper, we presented our approach to WSD using memory-based learning. We argued that using TiMBL over SVM was the better choice for the current dataset. We also showed how the choice of feature set affects the accuracy of the learner, and we proposed and evaluated the improvements made by 1) adding POS tags to the feature set and 2) adding keyword information to the feature set. Lastly, we are confident that having the senses of the target word as part of the feature set, with weights as feature values, will improve the results significantly.
8 Acknowledgement
We would like to extend our sincere thanks to Prof. Sandra Kuebler for her guidance and direction, without which this paper would not have been a success. We would also like to thank Prof. Kuebler for all the study material and encouragement that led to the completion of this project.
References
[1] Daelemans, W., Zavrel, J., Van der Sloot, K., and Van den Bosch, A. "TiMBL: Tilburg Memory-Based Learner, version 5.0." ILK Technical Report ILK 04-02, Tilburg University, 2004.
[2] Yarowsky, D. "Hierarchical decision lists for word sense disambiguation." Computers and the Humanities 34, no. 1-2 (2000).
[3] Wilks, Y., and Stevenson, M. "Word sense disambiguation using optimised combinations of knowledge sources." In Proceedings of COLING/ACL-98. 1998.
[4] "Word Sense Disambiguation." ACL Wiki. Accessed May 5, 2015. http://aclweb.org/aclwiki/index.php?title=Word_sense_disambiguation.
[5] Pedersen, Ted. "A simple approach to building ensembles of Naive Bayesian classifiers for word sense disambiguation." In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference, pp. 63-69. Association for Computational Linguistics, 2000.
[6] Patwardhan, Siddharth, Satanjeev Banerjee, and Ted Pedersen. "SenseRelate::TargetWord: a generalized framework for word sense disambiguation." In Proceedings of the ACL 2005 Interactive Poster and Demonstration Sessions, pp. 73-76. Association for Computational Linguistics, 2005.
[7] Veenstra, Jorn, Antal Van den Bosch, Sabine Buchholz, and Walter Daelemans. "Memory-based word sense disambiguation." Computers and the Humanities 34, no. 1-2 (2000): 171-177.
[8] Mihalcea, Rada. "Co-training and self-training for word sense disambiguation." In Proceedings of the Conference on Computational Natural Language Learning (CoNLL-2004). 2004.
[9] Gaustad, Tanja. "The importance of high quality input for WSD: An application-oriented comparison of part-of-speech taggers." In Proceedings of the Australasian Language Technology Workshop (ALTW 2003), pp. 65-72. 2003.
[10] Moreno-Monteagudo, Lorenza, Rubén Izquierdo-Beviá, Patricio Martínez-Barco, and Armando Suárez. "A study of the influence of PoS tagging on WSD." In Text, Speech and Dialogue, pp. 173-179. Springer Berlin Heidelberg, 2006.
[11] Escudero, Gerard, Lluís Màrquez, and German Rigau. "Naive Bayes and exemplar-based approaches to word sense disambiguation revisited." arXiv preprint cs/0007011 (2000).

Improvement wsd dictionary using annotated corpus and testing it with simplif...
 
The Duet model
The Duet modelThe Duet model
The Duet model
 
feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015feras_kalita_mcgrory_2015
feras_kalita_mcgrory_2015
 
Y24168171
Y24168171Y24168171
Y24168171
 
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATIONAN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
AN EMPIRICAL STUDY OF WORD SENSE DISAMBIGUATION
 

Viewers also liked

Viewers also liked (20)

Introduction to Kernel Functions
Introduction to Kernel FunctionsIntroduction to Kernel Functions
Introduction to Kernel Functions
 
Viva
VivaViva
Viva
 
Supervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting TechniqueSupervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting Technique
 
Advances In Wsd Aaai 2005
Advances In Wsd Aaai 2005Advances In Wsd Aaai 2005
Advances In Wsd Aaai 2005
 
Svm and kernel machines
Svm and kernel machinesSvm and kernel machines
Svm and kernel machines
 
CMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics ICMSC 723: Computational Linguistics I
CMSC 723: Computational Linguistics I
 
Kernel Configuration and Compilation
Kernel Configuration and CompilationKernel Configuration and Compilation
Kernel Configuration and Compilation
 
Lecture 5: Bayesian Classification
Lecture 5: Bayesian ClassificationLecture 5: Bayesian Classification
Lecture 5: Bayesian Classification
 
Naive Bayes
Naive Bayes Naive Bayes
Naive Bayes
 
Intelligence Artificielle - Algorithmes de recherche
Intelligence Artificielle - Algorithmes de rechercheIntelligence Artificielle - Algorithmes de recherche
Intelligence Artificielle - Algorithmes de recherche
 
K-Nearest Neighbor
K-Nearest NeighborK-Nearest Neighbor
K-Nearest Neighbor
 
Some problems of ambiguity in translation with reference to english and arabic
Some problems of ambiguity in translation with reference to english and arabicSome problems of ambiguity in translation with reference to english and arabic
Some problems of ambiguity in translation with reference to english and arabic
 
AMBIGUITY IN A LANGUAGE
AMBIGUITY IN A LANGUAGEAMBIGUITY IN A LANGUAGE
AMBIGUITY IN A LANGUAGE
 
Ambiguity
AmbiguityAmbiguity
Ambiguity
 
Lecture10 - Naïve Bayes
Lecture10 - Naïve BayesLecture10 - Naïve Bayes
Lecture10 - Naïve Bayes
 
KNN
KNN KNN
KNN
 
Ambiguity
AmbiguityAmbiguity
Ambiguity
 
k Nearest Neighbor
k Nearest Neighbork Nearest Neighbor
k Nearest Neighbor
 
Structural ambiguity
Structural ambiguityStructural ambiguity
Structural ambiguity
 
Naive Bayes Presentation
Naive Bayes PresentationNaive Bayes Presentation
Naive Bayes Presentation
 

Similar to Wsd final paper

THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESkevig
 
A supervised word sense disambiguation method using ontology and context know...
A supervised word sense disambiguation method using ontology and context know...A supervised word sense disambiguation method using ontology and context know...
A supervised word sense disambiguation method using ontology and context know...Alexander Decker
 
A Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationA Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationIOSR Journals
 
Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...ijctcm
 
Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learningcsandit
 
Text Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewText Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewINFOGAIN PUBLICATION
 
A hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerA hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerIAESIJAI
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET Journal
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHijwscjournal
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHijwscjournal
 
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESGENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESijnlc
 
IRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical ChainsIRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical ChainsIRJET Journal
 
Semantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsSemantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsIRJET Journal
 
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsIJCSIS Research Publications
 
A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...IJECEIAES
 

Similar to Wsd final paper (20)

THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIESTHE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
THE ABILITY OF WORD EMBEDDINGS TO CAPTURE WORD SIMILARITIES
 
A supervised word sense disambiguation method using ontology and context know...
A supervised word sense disambiguation method using ontology and context know...A supervised word sense disambiguation method using ontology and context know...
A supervised word sense disambiguation method using ontology and context know...
 
A Survey on Word Sense Disambiguation
A Survey on Word Sense DisambiguationA Survey on Word Sense Disambiguation
A Survey on Word Sense Disambiguation
 
Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...Automatic classification of bengali sentences based on sense definitions pres...
Automatic classification of bengali sentences based on sense definitions pres...
 
Natural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine LearningNatural Language Processing Through Different Classes of Machine Learning
Natural Language Processing Through Different Classes of Machine Learning
 
Text Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewText Mining at Feature Level: A Review
Text Mining at Feature Level: A Review
 
A hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzerA hybrid composite features based sentence level sentiment analyzer
A hybrid composite features based sentence level sentiment analyzer
 
IJET-V3I1P1
IJET-V3I1P1IJET-V3I1P1
IJET-V3I1P1
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
 
Hindi/Bengali Sentiment Analysis using Transfer Learning and Joint Dual Input...
Hindi/Bengali Sentiment Analysis using Transfer Learning and Joint Dual Input...Hindi/Bengali Sentiment Analysis using Transfer Learning and Joint Dual Input...
Hindi/Bengali Sentiment Analysis using Transfer Learning and Joint Dual Input...
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
 
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCHSENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
SENSE DISAMBIGUATION TECHNIQUE FOR PROVIDING MORE ACCURATE RESULTS IN WEB SEARCH
 
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESGENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
 
IRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical ChainsIRJET-Semantic Based Document Clustering Using Lexical Chains
IRJET-Semantic Based Document Clustering Using Lexical Chains
 
Semantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsSemantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical Chains
 
K0936266
K0936266K0936266
K0936266
 
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word Pairs
 
J017256674
J017256674J017256674
J017256674
 
A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...
 
Aq35241246
Aq35241246Aq35241246
Aq35241246
 

More from Milind Gokhale

Yelp Dataset Challenge 2015
Yelp Dataset Challenge 2015Yelp Dataset Challenge 2015
Yelp Dataset Challenge 2015Milind Gokhale
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemMilind Gokhale
 
Technology Survey and Design
Technology Survey and DesignTechnology Survey and Design
Technology Survey and DesignMilind Gokhale
 
Epics and User Stories
Epics and User StoriesEpics and User Stories
Epics and User StoriesMilind Gokhale
 
Aloha Social Networking Portal - SRS
Aloha Social Networking Portal - SRSAloha Social Networking Portal - SRS
Aloha Social Networking Portal - SRSMilind Gokhale
 
Aloha Social Networking Portal - Design Document
Aloha Social Networking Portal - Design DocumentAloha Social Networking Portal - Design Document
Aloha Social Networking Portal - Design DocumentMilind Gokhale
 
Android games analysis final presentation
Android games analysis final presentationAndroid games analysis final presentation
Android games analysis final presentationMilind Gokhale
 
Android gamesanalysis hunger-gamesfinal
Android gamesanalysis hunger-gamesfinalAndroid gamesanalysis hunger-gamesfinal
Android gamesanalysis hunger-gamesfinalMilind Gokhale
 
Buffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data ProcessingBuffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data ProcessingMilind Gokhale
 
Algorithms for External Memory Sorting
Algorithms for External Memory SortingAlgorithms for External Memory Sorting
Algorithms for External Memory SortingMilind Gokhale
 
Building effective teams in Amdocs-TECC - project report
Building effective teams in Amdocs-TECC - project reportBuilding effective teams in Amdocs-TECC - project report
Building effective teams in Amdocs-TECC - project reportMilind Gokhale
 
Building effective teams in Amdocs TECC - Presentation
Building effective teams in Amdocs TECC - PresentationBuilding effective teams in Amdocs TECC - Presentation
Building effective teams in Amdocs TECC - PresentationMilind Gokhale
 
Internet marketing report
Internet marketing reportInternet marketing report
Internet marketing reportMilind Gokhale
 
Change: to be or not to be
Change: to be or not to beChange: to be or not to be
Change: to be or not to beMilind Gokhale
 

More from Milind Gokhale (20)

Yelp Dataset Challenge 2015
Yelp Dataset Challenge 2015Yelp Dataset Challenge 2015
Yelp Dataset Challenge 2015
 
Collaborative Filtering Recommendation System
Collaborative Filtering Recommendation SystemCollaborative Filtering Recommendation System
Collaborative Filtering Recommendation System
 
Sprint Plan1
Sprint Plan1Sprint Plan1
Sprint Plan1
 
Technology Survey and Design
Technology Survey and DesignTechnology Survey and Design
Technology Survey and Design
 
Market Survey Report
Market Survey ReportMarket Survey Report
Market Survey Report
 
Epics and User Stories
Epics and User StoriesEpics and User Stories
Epics and User Stories
 
Visualforce
VisualforceVisualforce
Visualforce
 
Aloha Social Networking Portal - SRS
Aloha Social Networking Portal - SRSAloha Social Networking Portal - SRS
Aloha Social Networking Portal - SRS
 
Aloha Social Networking Portal - Design Document
Aloha Social Networking Portal - Design DocumentAloha Social Networking Portal - Design Document
Aloha Social Networking Portal - Design Document
 
Android games analysis final presentation
Android games analysis final presentationAndroid games analysis final presentation
Android games analysis final presentation
 
Android gamesanalysis hunger-gamesfinal
Android gamesanalysis hunger-gamesfinalAndroid gamesanalysis hunger-gamesfinal
Android gamesanalysis hunger-gamesfinal
 
Buffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data ProcessingBuffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data Processing
 
Algorithms for External Memory Sorting
Algorithms for External Memory SortingAlgorithms for External Memory Sorting
Algorithms for External Memory Sorting
 
One sample runs test
One sample runs testOne sample runs test
One sample runs test
 
Building effective teams in Amdocs-TECC - project report
Building effective teams in Amdocs-TECC - project reportBuilding effective teams in Amdocs-TECC - project report
Building effective teams in Amdocs-TECC - project report
 
Building effective teams in Amdocs TECC - Presentation
Building effective teams in Amdocs TECC - PresentationBuilding effective teams in Amdocs TECC - Presentation
Building effective teams in Amdocs TECC - Presentation
 
Internet marketing report
Internet marketing reportInternet marketing report
Internet marketing report
 
Internet marketing
Internet marketingInternet marketing
Internet marketing
 
Indian it industry
Indian it industryIndian it industry
Indian it industry
 
Change: to be or not to be
Change: to be or not to beChange: to be or not to be
Change: to be or not to be
 

Recently uploaded

Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 

Recently uploaded (20)

Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 

Wsd final paper

The Corpus-based approach does not use an explicit knowledge source. Instead, it uses information gained from a disambiguated or raw training corpus. The Disambiguated-Corpora-based approach uses a disambiguated training set. It applies a machine learning algorithm to a set of features extracted from the training set; the trained model is then applied to new examples to disambiguate them. The Raw-Corpora-based approach uses a raw dataset, as opposed to the disambiguated dataset used in the Disambiguated-Corpora-based approach, since annotated datasets are difficult to obtain in most cases. This is an unsupervised model and is used for word sense discrimination rather than disambiguation. The last approach is the hybrid approach, which is a combination of the knowledge-based and corpus-based approaches. [4]

In this paper we describe a memory-based approach for word sense disambiguation (WSD) as defined in the SENSEVAL task: the association of a word in context with its contextually appropriate sense tag. We propose a WSD technique based on the disambiguated corpora provided in SENSEVAL-3. We also incorporate features such as context words, word collocations, POS tags, and keyword counts to improve accuracy. We discuss our approach in more detail in the following sections.

The rest of the paper is organized as follows. Section 2 gives an overview of related work. Section 3 describes our methodology and feature selection process. Section 4 gives the accuracy obtained by our approach, followed by Discussion in Section 5, Future Work in Section 6, and Conclusion in Section 7.

2 Related Work

One of the first attempts at word sense disambiguation was made by Lesk in 1986 using dictionary definitions. This first attempt consisted of counting the number of overlaps between the dictionary definitions of the target word and the words near the target word in context [6]. Banerjee, Patwardhan, and Pedersen later extended this approach in 2003 by including the dictionary definitions of words related to the target word, where the related words were determined according to the structure of WordNet. They called this the determination of semantic relatedness; determining the most related sense of a target word is, in effect, disambiguation of the target word. They implemented their idea in a freely available software package called WordNet::Similarity.

[5] presents a corpus-based approach that achieves high accuracy by combining a number of very simple classifiers into an ensemble that performs disambiguation via a majority vote, since enhancing the feature set or the learning algorithm used in a corpus-based approach does not usually improve disambiguation accuracy beyond what can be attained with shallow lexical features and a simple algorithm.

In [7], Veenstra, Bosch et al. follow a memory-based approach to word sense disambiguation that relies on only a small number of annotated examples for each sense of the target word.
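Lesk's overlap counting, mentioned above, reduces to a few lines. The sketch below is a minimal illustration; the sense tags and glosses are hypothetical data, not taken from any real dictionary:

```python
def lesk_overlap(gloss, context):
    """Count distinct words shared between a sense's dictionary gloss and the context."""
    return len(set(gloss.lower().split()) & set(w.lower() for w in context))

def simplified_lesk(senses, context):
    """Pick the sense whose gloss overlaps most with the context words.

    `senses` maps a sense tag to its dictionary gloss (hypothetical data).
    """
    return max(senses, key=lambda s: lesk_overlap(senses[s], context))

# Hypothetical glosses for "bank" and a context sentence:
senses = {
    "bank%finance": "a financial institution that accepts deposits and lends money",
    "bank%river": "sloping land beside a body of water such as a river",
}
context = "he deposited the money at the bank on Friday".split()
print(simplified_lesk(senses, context))  # the finance sense wins on gloss overlap
```

Banerjee, Patwardhan, and Pedersen's extension amounts to augmenting each gloss with the glosses of WordNet-related senses before counting overlaps.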
Their system is based on a POS-tagged corpus of examples and selected dictionary entries. Since it depended on a small number of annotated examples, it was portable and easily adaptable. They also constructed a separate classifier, called a word expert, for each of the target words.

There have been supervised methods for word sense disambiguation, and [8] argues that the typical supervised approach relies on feature vectors constructed from the sense-tagged occurrences of the target word. However, supervised approaches have a shortcoming: they are only applicable to words for which sense-tagged data is available, so their accuracy depends on the amount of labeled data available for the target words. Rada Mihalcea therefore focuses on an approach that can work when only a few annotated examples are available for the target word, using co-training and self-training methods.

3 Methodology

3.1 Classifier

Memory-based learning is a supervised approach to machine learning. A memory-based learning system has two components: a memory-based learning component and a similarity-based performance component. The learning component is used to train the learner and involves storing the training instances in memory along with their class information. The instances are stored without any further processing or analysis; hence, memory-based learning is also known as lazy learning. The performance component takes new instances that need to be classified as input and classifies them based on their similarity to the instances stored in memory.

Between two popular learner choices, TiMBL and SVM, we decided to use TiMBL due to its ease of use and the ample documentation available online. Additionally, TiMBL gives us the ability to vary algorithms and settings in order to get better accuracy.
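The lazy-learning scheme described above can be sketched as a plain k-nearest-neighbour classifier using the overlap distance, which is roughly what TiMBL's IB1 algorithm does in its simplest configuration (no feature weighting, no tie handling). This is a minimal illustrative sketch, not TiMBL's actual implementation:

```python
from collections import Counter

def overlap_distance(a, b):
    # Overlap metric: number of feature positions where two instances differ.
    return sum(1 for x, y in zip(a, b) if x != y)

class MemoryBasedLearner:
    """IB1-style lazy learner: 'training' is just storing instances verbatim."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (feature_vector, sense_label)

    def train(self, instances, labels):
        # No abstraction or model fitting -- simply remember the training data.
        self.memory = list(zip(instances, labels))

    def classify(self, instance):
        # Rank stored instances by overlap distance and vote among the k nearest.
        # (Real TiMBL uses distance-tied neighbour sets; we take exactly k here.)
        nearest = sorted(self.memory,
                         key=lambda m: overlap_distance(m[0], instance))[:self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
```

The separation between the two components is visible here: `train` is the memory-based learning component (pure storage), while `classify` is the similarity-based performance component.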
We did a sample test using both TiMBL and SVM for three words, arm, difficulty and interest, and found that TiMBL performed better than SVM for all three words. A probable cause for this could be the limited number of examples in the train set. Hence, looking at the positive results from TiMBL, we decided to use TiMBL as the machine learner for the given task of WSD.

3.2 Dataset

Our dataset is taken from SENSEVAL-3, which was held in 2004. The dataset was part of the SENSEVAL project organized by ACL-SIGLEX as part of a competition for evaluating various WSD systems. These systems are trained and tested on the dataset provided in SENSEVAL for various words and languages.

In this paper, we present the best results obtained for disambiguation of every target word in the dataset, by first using the words in a window of ±3 words as the feature set, and then adding more features to the feature set (as described in Section 3.3) for better accuracy.

3.3 Feature Set

TiMBL requires a set of features along with the class label as the last column in the training set. To disambiguate the 57 words given in the SENSEVAL-3 dataset for WSD, we performed our experiments in four stages. Each stage consists of adding new features to the existing feature set, and then running this feature set through TiMBL to compare the accuracy obtained in each setting. Our feature types can be broadly classified into four main categories.

Context Words: Words occurring in the immediate context of the target word, also called the local context, are very helpful in discerning the sense of the target word. Most approaches use a ±2, ±3 or ±4 window size. The context words in our approach comprise the ±3-word window around the target word. The context words form the basic features for our WSD system; additional features are built on top of them.

Word Collocations: Word collocations are the bigrams and trigrams formed by concatenating the words around the target word. Each collocation is represented as one feature in the feature set. We consider word collocations because they often contribute substantially to disambiguation accuracy, as opposed to having only context words in the feature set [5].

POS Tags: Part-of-speech tagging and WSD are closely related. Part-of-speech tagging is carried out to add more meaning to the extracted feature set of words and collocations. [9] and [10] show that adding POS tags to the feature set improves accuracy significantly. We used the Stanford POS tagger to tag our files with the correct POS, and then added these extracted POS tags to the feature set.

Count of Keywords: Considering the keywords occurring in a given example can also improve the accuracy of the WSD system.
Since our machine learner needs a fixed number of columns in the feature set, using the keywords of each example directly is not practical. In our implementation, we instead use the total count of keywords as the extended feature. These keywords are the nouns, verbs, adjectives and adverbs occurring in the given example, as flagged by the Stanford POS tagger.

3.4 Representation

We have divided our implementation into four stages. This gives a better idea of the contribution each type of feature (POS tags, context words, keywords and word collocations) makes towards disambiguation of the target words, and also gives insight into the effect each kind of feature has on the accuracy of the learner.

The baseline setting for our approach comprises the words in the ±3-word window around the target word. Whenever a word falls outside the boundaries of the sentence for an example, we assign it a default value of '_' [11]. Word collocations in the form of bigrams and trigrams are added after the context words. The last column is the class label, i.e. the sense id of the target word. In our approach we trained a different learner for each of the 57 words in the SENSEVAL-3 dataset, so that we could inspect the results for individual words.

One of the earliest approaches to WSD involved directly assigning the most frequent sense to the ambiguous word, with the understanding that there would be errors, but well under an acceptable limit. In the first stage, called the basic setting, we test the accuracy of the baseline feature set with the most frequent sense as the class label. The accuracy obtained in this step also serves as the benchmark against which to evaluate the results with the extended feature set.
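The representation described above, a ±3-word window padded with '_' at sentence boundaries, followed by the concatenated bigram and trigram collocations, can be sketched as follows. The function names are ours; this is an illustration of the feature rows shown in Tables 1 and 2, not the authors' actual code:

```python
def context_features(tokens, target_idx, window=3, pad="_"):
    """Build the context-word and collocation features for one example."""
    n = len(tokens)
    # +/-3 context words, padded with '_' beyond the sentence boundaries.
    left = [tokens[i] if i >= 0 else pad
            for i in range(target_idx - window, target_idx)]
    right = [tokens[i] if i < n else pad
             for i in range(target_idx + 1, target_idx + 1 + window)]
    ctx = left + right
    # Collocations: bigrams and trigrams formed by concatenating context words.
    bigrams = [a + b for a, b in zip(ctx, ctx[1:])]
    trigrams = [a + b + c for a, b, c in zip(ctx, ctx[1:], ctx[2:])]
    return ctx + bigrams + trigrams

def keyword_count(pos_tags):
    """Count nouns, verbs, adjectives and adverbs (Penn Treebank tag prefixes)."""
    return sum(1 for t in pos_tags if t.startswith(("NN", "VB", "JJ", "RB")))
```

For a target word preceded by "other plans not" and followed by "now were also", this yields the 15 features of the example row (6 context words, 5 bigrams, 4 trigrams), to which the sense id is appended as the final column.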
    other plans not now were also otherplans plansnot notnow nowwere werealso otherplansnot plansnotnow notnowwere nowwerealso 38201

Table 1: Basic Feature Representation

In the second stage, the most frequent sense is replaced by the actual sense of the target word as provided in the train set of the SENSEVAL-3 dataset. TiMBL is again trained with this updated train set and the obtained accuracy is compared with the accuracy obtained in the previous stage. This stage also serves as the baseline setting.

    other plans not now were also otherplans plansnot notnow nowwere werealso otherplansnot plansnotnow notnowwere nowwerealso 38203

Table 2: Baseline Feature Representation

In the third stage, called the POS stage, POS tags generated by the Stanford POS tagger are also incorporated in the feature set. The feature set is represented by adding a POS tag to each of the context words, separated by '/'. For the word collocations, collocations of the POS tags of the respective words are formed and included in the feature set in the same fashion as the context-word POS tagging. TiMBL is trained on this feature set and the obtained accuracy is compared with the baseline accuracy.

    other/JJ plans/NNS not/RB now/RB were/VBD also/RB otherplans/JJNNS plansnot/NNSRB notnow/RBRB nowwere/RBVBD werealso/VBDRB otherplansnot/JJNNSRB plansnotnow/NNSRBRB notnowwere/RBRBVBD nowwerealso/RBVBDRB 38203

Table 3: POS Feature Representation

In the fourth and final stage, the count of keywords is added as the last column of the feature set, preceding the class label column. The count here is the total number of nouns, verbs, adjectives and adverbs occurring in the context. This information is also obtained by POS tagging the test and train sets and then counting the keywords for each example. TiMBL is trained and the obtained results are compared with the accuracy of the baseline setting.

    other/JJ plans/NNS not/RB now/RB were/VBD also/RB otherplans/JJNNS plansnot/NNSRB notnow/RBRB nowwere/RBVBD werealso/VBDRB otherplansnot/JJNNSRB plansnotnow/NNSRBRB notnowwere/RBRBVBD nowwerealso/RBVBDRB 6 38203

Table 4: Keyword Count Feature Representation

We tabulate our findings in Section 4.

3.5 Metrics

We use the various algorithms, feature weightings and distance metrics provided by TiMBL. For algorithms we use IB1, IGTREE and TRIBL. For distance metrics we use Overlap (O), MVDM and Jeffrey Divergence (JD). For feature weighting we use No Weight (NW), Information Gain (IG), Chi-Squared (CS) and Gain Ratio (GR). We vary k (the number of nearest neighbours) from 1 to 5 for each setting. We tabulate our findings in the following section.

4 Results

We tested the data with the classification algorithms IB1, IGTREE and TRIBL, along with the distance metrics Overlap, MVDM and Jeffrey Divergence, and the feature weightings No Weight, Gain Ratio, Information Gain and Chi-Squared.
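The parameter sweep over algorithms, distance metrics, feature weightings and k can be sketched as the following grid of TiMBL invocations. The flag codes assumed here (-a for algorithm, -m for metric, -w for weighting, -k for neighbour count, with the numeric and letter codes shown) follow our reading of the TiMBL manual and should be verified against the installed version; the grid itself mirrors the settings listed above:

```python
from itertools import product

# Settings explored: 3 algorithms x 3 distance metrics x 4 weightings x k in 1..5.
ALGORITHMS = {"IB1": 0, "IGTREE": 1, "TRIBL": 2}  # assumed TiMBL -a codes
METRICS = {"O": "O", "mvdm": "M", "JD": "J"}      # assumed TiMBL -m codes
WEIGHTS = {"NW": 0, "GR": 1, "IG": 2, "CS": 3}    # assumed TiMBL -w codes

def timbl_commands(train_file, test_file):
    """Yield (setting name, command line) for every parameter combination."""
    for (alg, a), (met, m), (wt, w), k in product(
            ALGORITHMS.items(), METRICS.items(), WEIGHTS.items(), range(1, 6)):
        # Note: IGTREE ignores k, so those runs are redundant in practice.
        name = f"{alg} {met} k{k} {wt}"
        cmd = f"timbl -a{a} -m{m} -w{w} -k {k} -f {train_file} -t {test_file}"
        yield name, cmd
```

Each target word's train/test pair is run through this grid, and the best-scoring setting per word is the one reported in Table 5 (e.g. "IB1 O k3 IG").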
We then train the classifiers using the train files and test them against the test files, tuning the parameters of the classifiers to optimize accuracy. From the derived outputs we selected the maximum-accuracy TiMBL setting for each word and report it in Table 5. The average accuracy obtained by our approach is 66.69%, an improvement over the baseline average of 62.41%. Thus we show that a combination of context words, word collocations, POS tags and keyword information is a good approach to word sense disambiguation. Figure 1 shows the average accuracy obtained by our approach at each stage.

[Figure 1: Average Accuracy]

5 Discussion

We notice that the IB1 algorithm gives the best accuracy for most of the words. Adding POS tags did not have a significant effect on the overall accuracy, while introducing the keyword count increased the overall accuracy of the classification; we noticed significant improvements in accuracy for some words and slight deterioration for others. The default setting also performs better than the basic setting followed in older approaches, and the baseline setting shows significant improvement over the default setting for most of the words. This is because the classification algorithm, distance metric and feature weighting can be varied to achieve better accuracy.

Words like use, remain, provide and receive recorded their best accuracy when POS tags were added to the feature set. The same words showed their best accuracy in the baseline, which indicates that POS tags do not change the accuracy much; in fact, the accuracy deteriorated in some cases when POS tags were introduced into the feature set. Interestingly, for the word eat, the accuracy improved significantly just by adding POS tags. Although some words were outliers and showed a fall in accuracy when new features were added to the feature set, we saw an overall accuracy improvement for most of the words as we added more features.
For example, words like difficulty, paper, important, performance and solid showed significant improvement, while words like eat, wash and expect showed a severe drop in accuracy. On analysis we found that the increase or decrease in accuracy when adding the keyword count does not depend on the number of test or train examples; it depends more on the coverage of the train set. However, we could include features such as the actual keywords in the feature set to improve the results, which we consider future work.

Table 5: Best accuracy obtained during the four-stage evaluation of our WSD approach.

Word | Most Freq | Default | Baseline | POS Tagging | POS + Keywords
activate | 69.12% | 62.50% | 69.12% (IB1 O k3 default) | 69.12% (IB1 O k5 GR) | 69.12% (IB1 O k5 GR)
add | 41.50% | 70.07% | 71.26% (IB1 O k3 IG) | 70.75% (IB1 O k1 NW) | 54.74% (IB1 O k3 CS)
appear | 40.97% | 59.72% | 65.97% (IB1 mvdm k5 CS) | 65.97% (IB1 JD k5 GR) | 67.65% (IGTREE O GR)
argument | 44.19% | 44.96% | 45.74% (IB1 JD k5 GR) | 47.29% (IB1 mvdm k5 GR) | 77.78% (IB1 O k3 CS)
arm | 80.74% | 85.93% | 85.93% (IB1 O k1 NW) | 85.93% (IB1 O k1 NW) | 69.29% (IB1 JD k1 NW)
ask | 22.98% | 50.93% | 53.42% (IB1 JD k3 IG) | 48.45% (IB1 mvdm k3 NW) | 45.38% (IB1 O k3 CS)
atmosphere | 52.94% | 46.08% | 52.94% (IB1 O k3 NW) | 52.94% (IB1 O k5 GR) | 53.85% (IB1 mvdm k3 NW)
audience | 62.62% | 71.03% | 78.50% (IB1 mvdm k3 CS) | 77.57% (IB1 mvdm k3 CS) | 39.13% (IB1 mvdm k3 NW)
bank | 64.49% | 71.01% | 75.36% (IB1 mvdm k3 GR) | 75.36% (IB1 mvdm k3 NW) | 42.20% (IB1 mvdm k1 NW)
begin | 50.00% | 51.06% | 55.32% (IB1 O k3 GR) | 54.74% (IB1 O k3 CS) | 87.64% (IGTREE O NW)
climb | 54.41% | 66.18% | 70.59% (IB1 JD k3 GR) | 67.65% (IB1 mvdm k3 GR) | 69.23% (IB1 mvdm k3 IG)
decide | 66.67% | 73.02% | 79.37% (IB1 O k3 IG) | 77.78% (IB1 mvdm k5 NW) | 70.07% (IB1 O k1 CS)
degree | 55.71% | 65.00% | 71.43% (IB1 mvdm k3 GR) | 70.00% (IB1 JD k1 NW) | 71.26% (IB1 JD k5 GR)
difference | 35.38% | 41.54% | 44.62% (IB1 JD k5 GR) | 45.38% (IB1 O k1 CS) | 66.67% (IB1 O k5 GR)
different | 48.08% | 48.08% | 53.85% (IB1 mvdm k3 NW) | 55.77% (IB1 mvdm k3 NW) | 62.50% (IB1 mvdm k3 NW)
difficulty | 17.39% | 34.78% | 34.78% (IB1 mvdm k3 NW) | 39.13% (IB1 mvdm k3 NW) | 79.55% (IGTREE O IG)
disc | 34.86% | 36.70% | 42.20% (IB1 O k3 CS) | 40.37% (IB1 O k3 CS) | 60.00% (IB1 mvdm k1 GR)
eat | 86.52% | 82.02% | 66.15% (IB1 mvdm k5 NW) | 87.64% (IB1 O k1 CS) | 37.04% (IB1 JD k1 IG)
encounter | 36.92% | 60.00% | 70.07% (IB1 mvdm k5 NW) | 66.15% (IB1 mvdm k5 NW) | 62.11% (IB1 O k5 GR)
expect | 66.67% | 68.97% | 66.67% (IB1 O NW) | 70.11% (IB1 O k1 NW) | 43.75% (IB1 O k1 NW)
express | 66.67% | 61.40% | 66.67% (IB1 O NW) | 66.67% (IB1 O k1 NW) | 55.56% (IB1 O k3 NW)
hear | 46.88% | 56.25% | 59.38% (IB1 O k3 CS) | 62.50% (IB1 mvdm k5 CS) | 67.50% (IB1 mvdm k3 GR)
hot | 77.27% | 75.00% | 79.55% (TRIBL O k1 CS) | 79.55% (IGTREE O IG) | 65.97% (IB1 mvdm k3 GR)
image | 36.00% | 52.00% | 53.33% (IB1 O k1 GR) | 53.33% (IB1 O k1 CS) | 50.00% (IB1 mvdm k3 IG)
important | 22.22% | 25.93% | 29.63% (IB1 O k1 GR) | 33.33% (IB1 mvdm k1 NW) | 63.24% (IB1 O k5 GR)
interest | 61.05% | 62.11% | 62.11% (IB1 O k1 GR) | 63.16% (TRIBL mvdm k3 IG) | 61.11% (IB1 JD k5 GR)
judgement | 28.13% | 43.75% | 50.00% (IB1 O k1 CS) | 50.00% (IB1 mvdm k3 NW) | 73.68% (IB1 O k1 CS)
lose | 52.78% | 44.44% | 58.33% (IB1 JD k3 IG) | 55.56% (IB1 JD k3 GR) | 47.06% (IB1 JD k5 CS)
mean | 52.50% | 65.00% | 65.00% (IB1 O k1 NW) | 65.00% (IB1 JD k5 GR) | 64.80% (IB1 mvdm k3 NW)
miss | 33.33% | 46.67% | 53.33% (IB1 mvdm k5 GR) | 53.33% (IB1 mvdm k5 GR) | 66.00% (IB1 O k1 NW)
note | 55.88% | 57.35% | 67.65% (IB1 O k3 GR) | 64.71% (IB1 O k3 CS) | 70.00% (IB1 O k3 CS)
operate | 38.89% | 61.11% | 66.67% (IB1 JD k5 NW) | 61.11% (IB1 JD k5 GR) | 48.42% (IB1 O k1 NW)
organization | 71.93% | 70.18% | 77.19% (IB1 mvdm k5 IG) | 77.19% (IB1 mvdm k5 GR) | 58.95% (IB1 mvdm k3 GR)
paper | 22.06% | 44.12% | 48.53% (IB1 JD k5 NW) | 46.32% (IB1 mvdm k5 NW) | 84.51% (IB1 O k5 GR)
party | 57.60% | 59.20% | 65.60% (IB1 mvdm k5 GR) | 66.00% (IB1 mvdm k5 GR) | 88.89% (IB1 O k5 GR)
performance | 25.56% | 31.11% | 32.22% (IB1 O k3 GR) | 33.33% (IB1 O k3 CS) | 88.89% (IB1 O k5 CS)
plan | 69.00% | 65.00% | 71.00% (IB1 O k3 GR) | 71.00% (IB1 O k3 CS) | 90.14% (IB1 JD k1 GR)
play | 46.15% | 44.23% | 51.92% (IB1 mvdm k5 NW) | 46.15% (IB1 O k5 GR) | 70.00% (IB1 mvdm k1 NW)
produce | 51.58% | 56.84% | 62.11% (IB1 JD k5 GR) | 57.89% (IB1 JD k5 NW) | 54.78% (IB1 JD k5 NW)
provide | 83.10% | 88.73% | 91.55% (IB1 O k1 GR) | 93.32% (IB1 mvdm k3 NW) | 66.07% (IB1 O k1 NW)
receive | 88.89% | 74.07% | 88.89% (IB1 O k3 NW) | 88.89% (IB1 O k5 GR) | 75.00% (IB1 mvdm k1 GR)
remain | 77.46% | 83.10% | 87.32% (IB1 O k3 GR) | 87.32% (IB1 mvdm k1 GR) | 66.97% (IB1 O k1 NW)
rule | 40.00% | 66.67% | 66.67% (IB1 O k1 NW) | 66.67% (IB1 O k1 NW) | 66.97% (TRIBL O k1 GR)
shelter | 38.26% | 52.17% | 53.04% (IB1 JD k3 CS) | 53.04% (IB1 O k1 NW) | 85.93% (IB1 O k1 NW)
simple | 25.00% | 25.00% | 35.00% (IB1 mvdm k1 NW) | 38.00% (IB1 mvdm k1 NW) | 85.93% (IB1 mvdm k1 NW)
smell | 39.29% | 69.64% | 76.79% (IB1 mvdm k3 NW) | 78.57% (IB1 JD k5 CS) | 72.60% (IB1 O k5 GR)
solid | 26.47% | 23.53% | 29.41% (IB1 O k5 IG) | 33.41% (IB1 mvdm k3 GR) | 75.34% (IB1 mvdm k1 GR)
sort | 57.80% | 66.06% | 66.06% (IB1 O k1 NW) | 66.97% (TRIBL O k1 GR) | 66.67% (IB1 O k1 NW)
source | 60.00% | 48.57% | 60.00% (IB1 O k3 NW) | 62.74% (IB1 O k5 GR) | 93.33% (IGTREE O GR)
suspend | 35.94% | 45.31% | 50.00% (IB1 mvdm k5 NW) | 54.56% (IB1 mvdm k5 GR) | 76.47% (IB1 O k1 NW)
talk | 72.60% | 72.60% | 76.71% (IB1 JD k3 GR) | 76.71% (IB1 mvdm k3 NW) | 78.43% (IB1 mvdm k3 NW)
treat | 28.07% | 36.84% | 47.37% (IB1 mvdm k3 NW) | 48.86% (IB1 mvdm k3 NW) | 56.52% (IB1 O k1 NW)
use | 66.67% | 66.67% | 93.33% (IGTREE GR) | 93.33% (IGTREE O GR) | 60.87% (IB1 JD k3 GR)
wash | 69.70% | 60.61% | 72.73% (TRIBL O q1 NW) | 78.73% (TRIBL O k1 NW) | 52.94% (IB1 O k5 GR)
watch | 74.51% | 78.43% | 80.39% (IB1 mvdm k1 NW) | 78.43% (IB1 O k1 NW) | 71.03% (IB1 O k1 NW)
win | 43.59% | 53.85% | 56.41% (IB1 JD k3 NW) | 56.41% (IB1 O k3 CS) | 75.70% (IB1 mvdm k3 CS)
write | 34.78% | 52.17% | 52.17% (IB1 O k1 NW) | 58.52% (IB1 O k1 CS) | 76.09% (IB1 mvdm k3 NW)
Average | 51% | 57% | 62.41% | 62.96% | 66.69%

6 Future Work

There are a number of possible extensions to our work. Implementing forward or backward selection can be considered for future work; since the SENSEVAL-3 dataset for the English language is not exhaustive, we believe implementing backward selection would be a good approach to further improve the accuracy. In the current approach, we use the count of keywords as a feature value. Going one step further and finding the actual keywords related to the target word, for example using the TF-IDF approach, can be worked on in the future.

Publicly available lexical data sources like WordNet can be leveraged to extract additional contextual information about the target word. Online dictionaries and thesauri can be used to glean alternative words for the target and context words, and these words can be added as features. Finally, the same word might appear in different forms in the test and train files, misleading TiMBL into believing they are different words.
This compromises the accuracy of the WSD system. As future work, we plan to substitute the context and target words with their morphological root words; we believe this will have a positive effect on the results.

Lastly, we would like to add the senses of words to the feature set. Each sense of the target word can be added as a separate feature, where the feature value is the total number of keywords present in the example that correspond to the given sense. The feature with the highest value would then indicate the most probable sense of the target word.

7 Conclusion

We believe that the memory-based learning methodology is well suited to the problem of word sense disambiguation, as it can work well with both dense and sparse data. Memory-based learning can also easily handle exceptions and corner cases.

In this paper, we presented our approach to WSD using memory-based learning. We argued that using TiMBL over SVM was a better choice for the current dataset. We also showed how the choice of feature set affects the accuracy of the learner. We proposed and subsequently evaluated the improvements made by: 1) adding POS tags to the feature set; and 2) adding keyword information to the feature set. Lastly, we are confident that having the senses of the target word as part of the feature set, with weights as feature values, will improve the results significantly.
8 Acknowledgement

We would like to extend our sincere thanks to Prof. Sandra Kuebler for her guidance and direction, without which this paper would not have been a success. We would also like to thank Prof. Kuebler for all the study material and encouragement that led to the completion of this project.

References

[1] Daelemans, W., Zavrel, J., Van der Sloot, K., and Van den Bosch, A. TiMBL: Tilburg Memory-Based Learner, version 5.0.1. ILK Technical Report ILK 04-02, Tilburg University, 2004.

[2] Yarowsky, D. Hierarchical decision lists for word sense disambiguation. Computers and the Humanities, 34(1-2), 2000.

[3] Wilks, Y., and Stevenson, M. Word sense disambiguation using optimised combinations of knowledge sources. In Proceedings of COLING/ACL-98, 1998.

[4] "Word Sense Disambiguation." ACL Wiki. Accessed May 5, 2015. http://aclweb.org/aclwiki/index.php?title=Word_sense_disambiguation.

[5] Pedersen, Ted. A simple approach to building ensembles of Naive Bayesian classifiers for word sense disambiguation. In Proceedings of the 1st North American Chapter of the Association for Computational Linguistics Conference, pp. 63-69, 2000.

[6] Patwardhan, Siddharth, Satanjeev Banerjee, and Ted Pedersen. SenseRelate::TargetWord: A generalized framework for word sense disambiguation. In Proceedings of the ACL 2005 Interactive Poster and Demonstration Sessions, pp. 73-76, 2005.

[7] Veenstra, Jorn, Antal Van den Bosch, Sabine Buchholz, and Walter Daelemans. Memory-based word sense disambiguation. Computers and the Humanities, 34(1-2):171-177, 2000.

[8] Mihalcea, Rada. Co-training and self-training for word sense disambiguation. In Proceedings of the Conference on Computational Natural Language Learning (CoNLL-2004), 2004.

[9] Gaustad, Tanja. The importance of high quality input for WSD: An application-oriented comparison of part-of-speech taggers. In Proceedings of the Australasian Language Technology Workshop (ALTW 2003), pp. 65-72, 2003.

[10] Moreno-Monteagudo, Lorenza, Rubén Izquierdo-Beviá, Patricio Martínez-Barco, and Armando Suárez. A study of the influence of PoS tagging on WSD. In Text, Speech and Dialogue, pp. 173-179. Springer Berlin Heidelberg, 2006.

[11] Escudero, Gerard, Lluís Màrquez, and German Rigau. Naive Bayes and exemplar-based approaches to word sense disambiguation revisited. arXiv preprint cs/0007011, 2000.