The document proposes a framework called Term Ambiguity Detection (TAD) to determine whether terms are ambiguous or unambiguous at the term level rather than at the instance level. The TAD framework uses a three-step process: (1) an n-gram method to check whether the term is a common word/phrase, (2) an ontology method using Wiktionary and Wikipedia to check for multiple senses or disambiguation pages, and (3) a clustering method using LDA to check whether category terms appear in document clusters. Evaluation on movie, video game, camera, and book terms showed the combined framework reached an F-measure of 0.96 for ambiguity detection, allowing information extraction systems to attain high precision by incorporating ambiguity information.
4. Find tweets about the movie Brave:
Movie night watching brave with Cammie n Isla n loads munchies
This brave girl deserves endless retweets!
Watching brave with the kiddos!
watching Bregor playing Civ 5: Brave New World and thinking of getting it
5. Skyfall 007 in class with @MariaWiheelste
So I was dead set on seeing skyfall 007 for like a year
NowWatching #skyFall 007!
What movie amazed u — skyfall 007
6. Existing Disambiguation Methods
Word Sense Disambiguation (WSD)
Which word sense does this instance refer to?
Named Entity Disambiguation (NED)
Which entity type is this instance associated with?
7. Existing Disambiguation Methods
Word Sense Disambiguation (WSD)
Which word sense does this specific instance refer to?
Named Entity Disambiguation (NED)
Which entity type is this individual instance associated with?
Limitations:
Assume the number of senses/entities is known
− Often not the case
Inefficient on very large data sets
− Attempt to disambiguate each instance
8. Term Ambiguity Detection (TAD)
Perform disambiguation at the term level, not the instance level
Given a term T and its category C, do all the mentions of the term reference a member of that category?
9. Term Ambiguity Detection (TAD)
Perform disambiguation at the term level, not the instance level
Given a term T and its category C, do all the mentions of the term reference a member of that category?
Level of ambiguity of the term
Hybrid information extraction (IE) systems
− Simpler model if the term is unambiguous
− More complex model otherwise
Potentially useful for other NLP tasks
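The term-level question above can be sketched as a small cascade of checks. This is a minimal Python sketch, assuming (as the combined results later in the deck suggest) that a term is declared ambiguous as soon as any module flags it; the module names and stand-in predicates are hypothetical.

```python
def tad_is_ambiguous(term, category, modules):
    """Return True if any TAD module flags the (term, category) pair.

    `modules` is a list of boolean predicates standing in for the
    n-gram, ontology, and clustering checks described on later slides.
    """
    return any(check(term, category) for check in modules)

# Hypothetical stand-in modules, for illustration only.
ngram = lambda t, c: t.lower() in {"brave", "a new beginning"}  # common phrase?
ontology = lambda t, c: False
clustering = lambda t, c: False

print(tad_is_ambiguous("Brave", "Movie", [ngram, ontology, clustering]))    # True
print(tad_is_ambiguous("EOS 5D", "Camera", [ngram, ontology, clustering]))  # False
```

An unambiguous verdict therefore requires every module to pass the term, mirroring the three-step flow on the following slides.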
10. Term Ambiguity Detection (TAD)
Input (Term / Category):
− EOS 5D / Camera
− A New Beginning / Video Game
− Skyfall 007 / Movie
− Brave / Movie
TAD output:
Ambiguous:
− A New Beginning / Video Game
− Brave / Movie
Unambiguous:
− EOS 5D / Camera
− Skyfall 007 / Movie
12. TAD Framework
Step 1: N-gram
Does the term share a name with a common word/phrase?
1. Normalize the input term t (stopword removal + lowercasing)
2. Calculate its unigram probability
3. Ambiguous if the probability is above an empirically determined threshold
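A minimal Python sketch of Step 1, assuming a multiword term's probability is taken as the product of its unigram probabilities; the stopword list, background corpus, and threshold here are illustrative only, not the paper's actual settings.

```python
from collections import Counter

def unigram_probability(term, corpus_tokens):
    """Normalize the term, then score it against a background corpus."""
    stopwords = {"the", "a", "an", "of"}  # illustrative stand-in list
    words = [w for w in term.lower().split() if w not in stopwords]
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    # One plausible choice for multiword terms: product of unigram probabilities.
    p = 1.0
    for w in words:
        p *= counts[w] / total
    return p

def ngram_module(term, corpus_tokens, threshold=1e-4):
    """Ambiguous if the term looks like common language."""
    return unigram_probability(term, corpus_tokens) > threshold

corpus = "the brave girl was brave and kind".split()
print(ngram_module("Brave", corpus))    # True: 'brave' is frequent here
print(ngram_module("EOS 5D", corpus))   # False: never seen in the corpus
```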
13. TAD Framework
Step 1: N-gram
Step 2: Ontology
• Wiktionary: ambiguous if the term has several senses in Wiktionary
• Wikipedia: ambiguous if the term has a Wikipedia disambiguation page
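Step 2 can be sketched with local stand-ins for the two lookups; `wiktionary_senses` and `wikipedia_disambig_pages` below are hypothetical pre-fetched data structures, not a real Wiktionary/Wikipedia API.

```python
def ontology_module(term, wiktionary_senses, wikipedia_disambig_pages):
    """Ambiguous if the term has multiple Wiktionary senses or a
    Wikipedia disambiguation page.

    `wiktionary_senses` maps lowercased terms to a sense count;
    `wikipedia_disambig_pages` is the set of lowercased titles that
    have a disambiguation page. Both are assumed inputs.
    """
    if wiktionary_senses.get(term.lower(), 0) > 1:
        return True  # several senses in Wiktionary
    if term.lower() in wikipedia_disambig_pages:
        return True  # Wikipedia disambiguation page exists
    return False

senses = {"brave": 4, "eos 5d": 1}        # illustrative counts
disambig = {"a new beginning"}
print(ontology_module("Brave", senses, disambig))            # True
print(ontology_module("A New Beginning", senses, disambig))  # True
print(ontology_module("EOS 5D", senses, disambig))           # False
```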
14. TAD Framework
Step 1: N-gram
Step 2: Ontology
Step 3: Clustering
Cluster the contexts in which the term appears
1. Remove stopwords and infrequent words from all documents containing the term
2. Cluster the documents using Latent Dirichlet Allocation (LDA)
3. Ambiguous if the category term or a WordNet synonym does not appear among the most heavily weighted terms of any cluster
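The decision rule of Step 3 can be sketched as follows, assuming the LDA clustering has already been run and each cluster is summarized by its most heavily weighted terms; the `synonyms` set stands in for a WordNet lookup.

```python
def clustering_module(category, synonyms, cluster_top_terms):
    """Ambiguous if neither the category term nor a synonym appears
    among the top-weighted terms of any cluster.

    `cluster_top_terms` is a list of lists: the most heavily weighted
    terms of each LDA cluster (the clustering itself is assumed done
    with an off-the-shelf LDA implementation).
    """
    targets = {category.lower()} | {s.lower() for s in synonyms}
    for top_terms in cluster_top_terms:
        if targets & {t.lower() for t in top_terms}:
            return False  # some cluster clearly matches the category
    return True           # no cluster mentions the category: ambiguous

clusters = [["movie", "watch", "night"], ["girl", "retweet", "endless"]]
print(clustering_module("Movie", {"film"}, clusters))    # False: cluster 1 matches
print(clustering_module("Camera", {"photo"}, clusters))  # True: no cluster matches
```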
15. Evaluation
Dataset: terms from 4 product domains: Movies, Video Games, Cameras, Books
− 100 terms per domain
− Extracted randomly from DBpedia and Flickr
Gold standard: ambiguity determined by examining usage in the TREC Tweets2011 corpus
10 tweets labeled per term
− Unambiguous only if all tweets reference the category
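The gold-standard labeling rule is easy to state in code; a minimal sketch:

```python
def gold_label(tweet_refers_to_category):
    """A term is unambiguous only if every one of its labeled tweets
    (10 per term in the evaluation) references the target category."""
    return "unambiguous" if all(tweet_refers_to_category) else "ambiguous"

print(gold_label([True] * 10))           # unambiguous
print(gold_label([True] * 9 + [False]))  # ambiguous
```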
17. Results - Effectiveness
Each module produced above baseline performance
Configuration Precision Recall F-measure
Majority Class 0.675 1.0 0.806
N-gram (NG) 0.979 0.848 0.909
Ontology (ON) 0.979 0.704 0.819
Clustering (CL) 0.946 0.848 0.895
NG + ON 0.980 0.919 0.948
NG + CL 0.942 0.963 0.952
ON + CL 0.945 0.956 0.950
All 0.943 0.978 0.960
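The F-measure column is the standard harmonic mean of precision and recall; for instance, the combined framework's 0.96 follows from P = 0.943 and R = 0.978:

```python
def f_measure(precision, recall):
    """Standard F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.943, 0.978), 3))  # 0.96, the combined-framework score
```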
18. Results - Effectiveness
The ontology method is of limited use on its own, as most of the terms cannot be found in the ontologies.
19. Results - Effectiveness
Combined framework produced a high F-measure of 0.96
20. Results - Usefulness
Integrated the TAD pipeline into a commercially available IE system
Extracted mentions of terms from the Camera and Video Game domains on Twitter data
Manually judged the relevance of the extracted tweets
21. Results - Usefulness
Using ambiguity detection hurt recall
− Only 57% of the relevant documents were returned with TAD
Ambiguity detection is necessary for high precision
− w/ ambiguity detection: Precision 0.96
− w/o ambiguity detection: Precision 0.16
22. Conclusion
Term ambiguity detection is helpful for large-scale information extraction
− Able to detect ambiguity when the number of senses is unknown
− Able to be applied to large datasets where instance-level interpretation is impractical
3-module TAD approach results in high performance
− Detects ambiguity with an F-measure of 0.96
− Allows the IE system to produce high precision