Introduction
Example Extraction
Example Selection
Results
Conclusion
Extracting
Sense-Disambiguated Example Sentences
From Parallel Corpora
Gerard de Melo and Gerhard Weikum
Max Planck Institute for Informatics
Saarbrücken, Germany
2009-09-18
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Outline
1 Introduction
2 Example Extraction
3 Example Selection
4 Results
5 Conclusion
Introduction
a thin, sharp-pointed metal pin with a raised spiral
thread running around it and a slotted head, used to join
things together by being rotated in under pressure
(Concise OED)
Use a crosshead screwdriver to tighten the screws inserted
in the pre-fixed connectors.
(Web)
Introduction
Example Sentences
any sentence that contains a word (being used in a specific
sense)
allow the user to grasp a word’s meaning
see circumstances a word would typically be used in
traditional intensional word definitions may be too confusing
humans used to deriving meaning from context
users can verify whether they have understood definition correctly
possible contexts, e.g. child vs. youngster
typical collocations, e.g. to give birth or birth rate
but not *to give nascence or *nascence rate
Introduction
Example Sentences
⇒ most modern dictionaries include example sentences
number, length often limited
digital dictionaries: tight space constraints of print media no
longer apply!
larger number of example sentences can be presented to the
user on demand
Introduction
Goals
automatically obtain example sentences for a specific sense of
a word
choose set of representative sentences to present to user
distinguish senses:
There were many bats flying out of the cave.
vs.
In professional baseball, only wooden bats
are permitted.
not all examples are equally useful
screen space may be limited (can still show more only on demand)
Example Extraction
Sense Index (Dictionary)
English: Princeton WordNet 3.0
Spanish: Spanish WordNet
σ(t): get senses for term t
σ(t, s): is sense s a sense of term t?
behind the scenes: morphological analysis, multi-word
expression detection
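The two lookups σ(t) and σ(t, s) can be sketched as follows. This is only an illustration under assumptions: the in-memory index, the sense identifiers, and the `lemmatize` helper are hypothetical stand-ins for a real WordNet and a real morphological analyser.

```python
# Toy sense index: maps a lemma to a set of sense identifiers.
# The identifiers are invented for illustration; they are not WordNet keys.
SENSE_INDEX = {
    "bat": {"bat#animal", "bat#club"},
    "screw": {"screw#fastener"},
}


def lemmatize(term: str) -> str:
    """Hypothetical stand-in for morphological analysis.

    Lowercases the term and strips a plural 's' if that yields a known lemma.
    """
    term = term.lower()
    if term not in SENSE_INDEX and term.endswith("s") and term[:-1] in SENSE_INDEX:
        return term[:-1]
    return term


def senses(term: str) -> set[str]:
    """sigma(t): all senses listed for term t (empty set if the term is unknown)."""
    return SENSE_INDEX.get(lemmatize(term), set())


def has_sense(term: str, sense: str) -> bool:
    """sigma(t, s): True iff s is a listed sense of t."""
    return sense in senses(term)
```

With this toy index, `senses("Bats")` yields both senses of bat, which is exactly the ambiguity the rest of the pipeline has to resolve.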
Example Extraction
Sense Disambiguation
disambiguate word occurrences, then use sentence as example
for those word senses
fine-grained WSD not reliable enough (cf. SemEval results)
idea: use parallel corpora, jointly look at both versions of text
⇒ greater accuracy can be achieved
Introduction
Sense Disambiguation
There were many bats flying out of the cave.
Aus der Höhle flogen viele Fledermäuse.
Example Extraction
Parallel Disambiguation
req.: text available in two languages a, b
good news: more and more parallel corpora available
word alignment
compute score
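$$\mathrm{wsd}(s_a \mid t_a, t_b) = \mathrm{wsd}(s_a \mid t_a)\,\frac{\sigma(t_a, s_a)\,\mathrm{csim}(t_b, s_a)}{\sum_{s' \in \sigma(t_a)} \sigma(t_a, s')\,\mathrm{csim}(t_b, s')}$$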
Example Extraction
Components
csim(tb, sa): cross-lingual similarity
monolingual WSD: compare bag-of-words vector for term
context with sense context (constructed from definitions, etc.)
Semantic Similarity sim(s1, s2)
identify only near-identical senses (e.g. of house and home),
not arbitrary associations (e.g. house and door)
$$\mathrm{csim}(t_b, s_a) = \sum_{s_b \in \sigma(t_b)} \mathrm{sim}(s_a, s_b)\,\mathrm{wsd}(s_b \mid t_b)$$
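A minimal sketch of the cross-lingual similarity sum, assuming the candidate senses of the aligned term t_b are already known. The sense identifiers, the identity-only `sim`, and the constant `wsd_b` are hypothetical placeholders chosen to keep the example small.

```python
from typing import Callable, Iterable


def csim(
    senses_tb: Iterable[str],
    s_a: str,
    sim: Callable[[str, str], float],
    wsd_b: Callable[[str], float],
) -> float:
    """Sum sim(s_a, s_b) * wsd(s_b | t_b) over the candidate senses s_b of t_b."""
    return sum(sim(s_a, s_b) * wsd_b(s_b) for s_b in senses_tb)


# Hypothetical example: the German "Fledermaus" has a single candidate sense,
# which a shared sense inventory identifies with the animal sense of "bat".
senses_fledermaus = ["bat#animal"]


def sim(s1: str, s2: str) -> float:
    """Identity-only similarity (assumption; relations are ignored here)."""
    return 1.0 if s1 == s2 else 0.0


def wsd_b(s_b: str) -> float:
    """Monolingual score on the t_b side; one sense, so full confidence."""
    return 1.0


score_animal = csim(senses_fledermaus, "bat#animal", sim, wsd_b)
score_club = csim(senses_fledermaus, "bat#club", sim, wsd_b)
```

Here the aligned German word supports only the animal sense, so the club sense receives no cross-lingual evidence.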
$$\mathrm{wsd}(s \mid t) = \sigma(t, s)\left(\alpha + \frac{v(s)^{T} v(t)}{\lVert v(s)\rVert\,\lVert v(t)\rVert}\right)$$
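The monolingual score can be sketched with plain bag-of-words counts. This reads the formula as σ(t, s) multiplied by the sum of α and the cosine. The value of `alpha`, the whitespace tokenisation, and the two short sense descriptions are assumptions made for the example only.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse bag-of-words vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def wsd(context_words, sense_words, is_sense: bool, alpha: float = 0.1) -> float:
    """wsd(s|t) = sigma(t, s) * (alpha + cos(v(s), v(t))).

    `is_sense` plays the role of sigma(t, s); alpha = 0.1 is an arbitrary choice.
    """
    if not is_sense:  # sigma(t, s) = 0
        return 0.0
    return alpha + cosine(Counter(sense_words), Counter(context_words))


# Term context taken from the bat example sentence, tokenised naively.
context = "many bats flying out of the cave".split()
# Hypothetical sense contexts, standing in for definitions and related words.
animal_context = "nocturnal flying mammal cave".split()
club_context = "wooden club used in baseball".split()

score_animal = wsd(context, animal_context, is_sense=True)
score_club = wsd(context, club_context, is_sense=True)
```

The smoothing constant keeps every listed sense at a non-zero score even when its sense context shares no words with the sentence.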
$$\mathrm{sim}(s_1, s_2) = \begin{cases} 1 & s_1 = s_2 \\ 1 & s_1, s_2 \text{ in near-synonymy relationship} \\ 1 & s_1, s_2 \text{ in hypernymy/hyponymy relationship} \\ 0 & \text{otherwise} \end{cases}$$
Example Extraction
Extraction
Use as example sentence iff score sufficiently high
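$$\mathrm{wsd}(s_a \mid t_a, t_b) = \mathrm{wsd}(s_a \mid t_a)\,\frac{\sigma(t_a, s_a)\,\mathrm{csim}(t_b, s_a)}{\sum_{s' \in \sigma(t_a)} \sigma(t_a, s')\,\mathrm{csim}(t_b, s')}$$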
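The extraction rule can be sketched as a joint score followed by a threshold test. The input scores and the threshold value below are hypothetical; the slides only say the score must be sufficiently high.

```python
def joint_wsd(s_a, candidates, wsd_mono, csim_scores):
    """Joint score wsd(s_a | t_a, t_b).

    `candidates` is sigma(t_a), so sigma(t_a, s') = 1 for every s' in it
    and sigma(t_a, s_a) = 0 for any sense outside it.
    """
    if s_a not in candidates:
        return 0.0
    denom = sum(csim_scores[s] for s in candidates)
    if denom == 0:
        return 0.0  # no cross-lingual evidence for any candidate sense
    return wsd_mono[s_a] * csim_scores[s_a] / denom


# Hypothetical scores for "bat" aligned with "Fledermaus".
candidates = ["bat#animal", "bat#club"]
wsd_mono = {"bat#animal": 0.48, "bat#club": 0.10}
csim_scores = {"bat#animal": 1.0, "bat#club": 0.0}

THRESHOLD = 0.3  # assumed cut-off, not taken from the slides

accepted = [
    s for s in candidates
    if joint_wsd(s, candidates, wsd_mono, csim_scores) >= THRESHOLD
]
```

Only senses that pass the threshold get the sentence attached as an example, which trades coverage for precision.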
Example Selection
Task Motivation
for computational applications: more data ⇒ better data
for human users: a good limited selection should be provided at first
assumption: space constraint k (number of sentences)
goal: choose a good set of k example sentences
Example Selection
What is a good example sentence?
one that showcases how a word is used in context
(typical prepositions, collocations, etc.)
one that helps in grasping the meaning
related work by Rychly et al. (2008):
one that is intelligible to learners
Example Selection
Assets
example sentences can be thought of as having certain assets
6 different classes of n-gram assets
1 class of assets for entire original sentence
e.g. for account:
containing frequent bigram bank account can be an asset
containing frequent trigram to account for can be an asset
1: original unigram
2-5: bigram with preceding/following, trigram, etc.
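One way to picture n-gram assets is to collect the n-grams that surround the target word. The slide names only the unigram, bigram and trigram cases, so the particular window classes and their labels below are a guess for illustration, not the authors' exact inventory.

```python
def ngram_assets(tokens: list[str], i: int) -> set[tuple]:
    """Return the n-gram assets around the target token tokens[i].

    Each asset is a tuple (class_name, word, ...). The class names and
    window shapes are assumptions made for this sketch.
    """
    assets = {("unigram", tokens[i])}
    # Window offsets (start, end) relative to i; end is exclusive.
    windows = {
        "bigram_prev": (-1, 1),
        "bigram_next": (0, 2),
        "trigram_prev": (-2, 1),
        "trigram_mid": (-1, 2),
        "trigram_next": (0, 3),
    }
    for name, (lo, hi) in windows.items():
        start, end = i + lo, i + hi
        # Skip windows that would run past either end of the sentence.
        if start >= 0 and end <= len(tokens):
            assets.add((name, *tokens[start:end]))
    return assets


tokens = "i want to open a bank account".split()
assets = ngram_assets(tokens, tokens.index("account"))
```

For a sentence-final target word only the backward-looking windows apply, so this sentence contributes the unigram, bank account, and a bank account.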
weights $w(a) = \frac{f(x, a)}{\sum_{i=1}^{n} f(x_i, a)}$, etc.
⇒ higher weight for open an account than for Peter’s chequing account
weight: cosine similarity with sense definition
⇒ bias towards explanatory sentences
Example Selection
What is a good set of example sentences?
Select sentences with greatest total asset weights?
No: most frequent expressions would dominate the result set. Need diversity!
Instead: maximize
$$\sum_{a \,\in\, \bigcup_{x \in C} A(x)} w(a)$$
Problem is NP-hard (proof via Vertex Cover reduction)
→ use greedy heuristic
each asset in result set only counted once
Example Selection
Greedy Heuristic to select set C
Greedily select highest-weighted sentence x:
$$x \leftarrow \operatorname*{arg\,max}_{x \in X \setminus C} \sum_{a \in A(x)} w(a)$$
Then set the weights w(a) of all assets a ∈ A(x) to zero
⇒ ranked list obtained (useful!)
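The greedy heuristic can be sketched as below. The sentence identifiers, asset strings, and weights are made up for the example, and ties are broken alphabetically, which is a choice of this sketch rather than something the slides specify.

```python
def greedy_select(assets_of: dict[str, set], weights: dict, k: int) -> list[str]:
    """Pick up to k sentences greedily, counting each asset only once.

    assets_of maps a sentence id to its set of assets A(x);
    weights maps an asset to its weight w(a).
    Returns the sentences in the order they were selected (a ranked list).
    """
    w = dict(weights)          # work on a copy so the caller's weights survive
    remaining = set(assets_of)
    ranked = []
    while remaining and len(ranked) < k:
        # Highest remaining total weight; sorted() makes tie-breaking deterministic.
        best = max(
            sorted(remaining),
            key=lambda x: sum(w.get(a, 0.0) for a in assets_of[x]),
        )
        ranked.append(best)
        remaining.remove(best)
        for a in assets_of[best]:
            w[a] = 0.0         # this asset is now covered and adds nothing further
    return ranked


assets_of = {
    "s1": {"open an account", "bank account"},
    "s2": {"open an account"},
    "s3": {"to account for"},
}
weights = {"open an account": 0.5, "bank account": 0.3, "to account for": 0.4}

top2 = greedy_select(assets_of, weights, k=2)
```

Ranking by raw totals would return s1 and s2, which both showcase open an account. Zeroing covered assets lets s3 overtake s2, which is the diversity effect the slides ask for.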
Results
Resources
OPUS corpora (OpenSubtitles, OpenOffice.org collections)
and Reuters RCV1 for non-disambiguated sentence selection
GIZA++/UPlug for lexical alignment
Princeton WordNet 3.0 and Spanish WordNet
(+ sense mappings for compatibility)
TreeTagger for morphological analysis
Results
Examples from OpenSubtitles Corpus
line (something, as a cord or rope, that is long and thin and flexible)
    I got some fishing line if you want me to stitch that.
    Von Sefelt, get the stern line.
line (the descendants of one individual)
    What line of kings do you descend from?
    My line has ended.
catch (catch up with and possibly overtake)
    He’s got 100 laps to catch Beau Brandenburg if he wants to become world champion.
    They won’t catch up.
catch (grasp with the mind or develop an understanding of)
    I didn’t catch your name.
    Sorry, I didn’t catch it.
Results
Examples from OpenSubtitles Corpus
talk (exchange thoughts, talk with)
    Why don’t we have a seat and talk it over.
    Okay, I’ll talk to you but one condition...
talk (use language)
    But we’ll be listening from the kitchen so talk loud.
    You spit when you talk.
opening (a ceremony accompanying the start of some enterprise)
    We don’t have much time until the opening day of Exhibition.
    What a disaster tomorrow is the opening ceremony!
opening (the first performance, as of a theatrical production)
    It will be rehearsed in the morning ready for the opening tomorrow night.
    You ready for our big opening night?
Results
Sense-disambiguated Example Sentences
Corpus                  Covered Senses   Example Sentences   Accuracy (Wilson interval)
OpenSubtitles en-es     13,559           117,078             0.815 ± 0.081
OpenSubtitles es-en     8,833            113,018             0.798 ± 0.090
OpenOffice.org en-es    1,341            13,295              0.803 ± 0.081
OpenOffice.org es-en    932              11,181              0.793 ± 0.087
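For readers unfamiliar with the Wilson score interval, the sketch below computes its centre and half-width for a binomial proportion. The evaluation sample sizes are not given in the slides, so the counts in the example are hypothetical and are not meant to reproduce the table.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Returns (centre, half_width); z = 1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre, half_width


# Hypothetical evaluation: 80 of 100 sampled sentences judged correct.
centre, half_width = wilson_interval(80, 100)
```

Unlike the plain normal approximation, the Wilson interval pulls the centre slightly towards 0.5 and never extends beyond the range from 0 to 1, which matters for small evaluation samples.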
Results
error analysis:
(1) incorrect lexical alignments unlikely to lead to incorrect
disambiguation
(2) incomplete sense inventories can lead to mistakes
(3) also, on a few occasions, the morphological analyser led to
wrong results
implications for future work:
(1) better monolingual WSD
(2) additional languages, especially phylogenetically unrelated
Results
Rankings for OpenSubtitles Corpus
being or located on or directed toward the side of the body to the east when facing north
    1. In America we drive on the right side of the road.
    2. I’ll tie down your right arm so you can learn to throw a left.
    3. If we wait from the right side, we have an advantage there.
put up with something or somebody unpleasant
    1. You can’t stand it, can you?
    2. You really think I can tolerate such an act?
    3. No one can stand that harmonica all day long.
Results
Rankings for OpenSubtitles Corpus
using or providing or producing or transmitting or operated by electricity
    1. Not the electric chair.
    2. Some electrical current circulating through my body.
    3. Near as I can tell it’s an electrical impulse.
take something or somebody with oneself somewhere
    1. And they were kind enough to take me in here.
    2. It conveys such a great feeling.
    3. We interrupt this program to bring you a special news bulletin.
Results
Rankings for long in RCV1 Corpus
1. In the long term interest rate market, the yield of the key
182nd 10 year Japanese government bond (JGB) fell to
2.060 percent early on Tuesday, a record low for any
benchmark 10-year JGB.
2. “The government and opposition have gambled away the last
chance for a long time to prove they recognise the country’s
problems, and that they put the national good above their
own power interests”, news weekly Der Spiegel said.
3. As long as the index keeps hovering between 957 and 995,
we will maintain our short term neutral recommendation.
Results
Rankings for purchase in RCV1 Corpus
1. Romania’s State Ownership Fund (FPS), the country’s main
privatisation body, said on Wednesday it had accepted five
bids for the purchase of a 50.98 percent stake in the largest
local cement maker Romcim.
2. Grand Hotel Group said on Wednesday it has agreed to
procure an option to purchase the remaining 50 percent of
the Grand Hyatt complex in Melbourne from hotel developer
and investor Lustig & Moar.
3. The purchase price for the business, which had 1996
calendar year sales of about $25 million, was not disclosed.
Results
Rankings for colonial in RCV1 Corpus
1. Hong Kong came to the end of 156 years of British colonial
rule on June 30 and is now an autonomous capitalist region
of China, running all its own affairs except defence and
diplomacy.
2. The letter was sent in error to the embassy of Portugal – the
former colonial power in East Timor – and was neither
returned nor forwarded to the Indonesian embassy.
3. Sino-British relations hit a snag when former Governor Chris
Patten launched electoral reforms in the twilight years of
colonial rule despite fierce opposition by Beijing.
Conclusion
Framework for:
extracting sense-disambiguated example sentences from
parallel corpora
selecting limited numbers of sentences given space constraints
Future Work:
better disambiguation, e.g. additional languages, better
techniques
additional input for selection: sentence length, definition
extraction techniques
integrated user interface for UWN (Universal Wordnet)
Contact: demelo@mpi-inf.mpg.de
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Conclusion
Framework for:
extracting sense-disambiguated example sentences from
parallel corpora
selecting limited numbers of sentences given space constraints
Future Work:
better disambiguation, e.g. additional languages, better
techniques
additional input for selection: sentence length, definition
extraction techniques
integrated user interface for UWN (Universal Wordnet)
Contact: demelo@mpi-inf.mpg.de
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences
Introduction
Example Extraction
Example Selection
Results
Conclusion
Conclusion
Framework for:
extracting sense-disambiguated example sentences from
parallel corpora
selecting limited numbers of sentences given space constraints
Future Work:
better disambiguation, e.g. additional languages, better
techniques
additional input for selection: sentence length, definition
extraction techniques
integrated user interface for UWN (Universal Wordnet)
Contact: demelo@mpi-inf.mpg.de
G. de Melo / G. Weikum, Max-Planck-Institut Informatik Sense-Disambiguated Example Sentences

Extracting Sense-Disambiguated Example Sentences From Parallel Corpora

  • 1. Extracting Sense-Disambiguated Example Sentences From Parallel Corpora. Gerard de Melo and Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany. 2009-09-18
  • 2. Outline: 1 Introduction, 2 Example Extraction, 3 Example Selection, 4 Results, 5 Conclusion
  • 4. Introduction. "a thin, sharp-pointed metal pin with a raised spiral thread running around it and a slotted head, used to join things together by being rotated in under pressure" (Concise OED). "Use a crosshead screwdriver to tighten the screws inserted in the pre-fixed connectors." (Web)
  • 6-8. Introduction: Example Sentences. An example sentence is any sentence that contains a word (being used in a specific sense). Example sentences allow the user to grasp a word's meaning: traditional intensional word definitions may be too confusing, humans are used to deriving meaning from context, and users can verify whether they have understood the definition correctly. They also let the user see the circumstances a word would typically be used in: possible contexts (e.g. child vs. youngster) and typical collocations (e.g. to give birth or birth rate, but not *to give nascence or *nascence rate).
  • 9. Introduction: Example Sentences. ⇒ Most modern dictionaries include example sentences, but their number and length are often limited. Digital dictionaries: the tight space constraints of print media no longer apply! A larger number of example sentences can be presented to the user on demand.
  • 13-14. Introduction: Goals. (1) Automatically obtain example sentences for a specific sense of a word, distinguishing senses: "There were many bats flying out of the cave." vs. "In professional baseball, only wooden bats are permitted." (2) Choose a set of representative sentences to present to the user: not all examples are equally useful, and screen space may be limited (more can still be shown on demand).
  • 15. Outline: 1 Introduction, 2 Example Extraction, 3 Example Selection, 4 Results, 5 Conclusion
  • 16. Example Extraction: Sense Index (Dictionary). English: Princeton WordNet 3.0. Spanish: Spanish WordNet. σ(t): get the senses for term t. σ(t, s): is sense s a sense of term t? Behind the scenes: morphological analysis, multi-word expression detection.
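The two sense-index lookups, σ(t) and σ(t, s), can be illustrated with a small in-memory sketch. This is only a toy stand-in, not the WordNet-backed index used in the talk: the `SenseIndex` class, the sense identifiers, and the crude plural-stripping `normalize` step (standing in for real morphological analysis) are all assumptions made for illustration.

```python
from collections import defaultdict


class SenseIndex:
    """Toy sense index mapping normalized terms to sets of sense identifiers."""

    def __init__(self):
        self._senses = defaultdict(set)

    @staticmethod
    def normalize(term):
        # Hypothetical stand-in for morphological analysis:
        # lowercase the term and strip a simple plural "s".
        term = term.lower().strip()
        if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
            term = term[:-1]
        return term

    def add(self, term, sense_id):
        self._senses[self.normalize(term)].add(sense_id)

    def senses(self, term):
        """sigma(t): the set of senses listed for term t."""
        return set(self._senses.get(self.normalize(term), set()))

    def has_sense(self, term, sense_id):
        """sigma(t, s): 1.0 if s is listed as a sense of t, otherwise 0.0."""
        return 1.0 if sense_id in self.senses(term) else 0.0


index = SenseIndex()
index.add("bat", "bat_animal")  # hypothetical sense identifier
index.add("bat", "bat_club")    # hypothetical sense identifier
index.add("cave", "cave_hollow")

bat_senses = index.senses("bats")
```

Returning σ(t, s) as a float rather than a boolean keeps it usable as a factor in the scoring formulas on the later slides.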
  • 19. Example Extraction: Sense Disambiguation. Disambiguate word occurrences, then use the sentence as an example for those word senses. Fine-grained WSD is not reliable enough (cf. SemEval results). Idea: use parallel corpora and jointly look at both versions of the text ⇒ greater accuracy can be achieved.
  • 23. Introduction: Sense Disambiguation. "There were many bats flying out of the cave." / "Aus der Höhle flogen viele Fledermäuse."
  • 25. Example Extraction: Parallel Disambiguation. Requirement: text available in two languages a, b (good news: more and more parallel corpora are available). Word alignment, then compute the score $\mathrm{wsd}(s_a \mid t_a, t_b) = \mathrm{wsd}(s_a \mid t_a) \cdot \frac{\sigma(t_a, s_a)\,\mathrm{csim}(t_b, s_a)}{\sum_{s' \in \sigma(t_a)} \sigma(t_a, s')\,\mathrm{csim}(t_b, s')}$
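A minimal sketch of the parallel disambiguation score follows. It assumes the flattened formula on the slide reads as the monolingual score wsd(s_a|t_a) multiplied by a normalized fraction, with a sum over all senses s' of t_a in the denominator (the summation sign appears to have been lost in extraction). The function name `joint_wsd`, the dictionary-based inputs, the zero-denominator fallback, and all numbers are illustrative assumptions.

```python
def joint_wsd(s_a, senses_a, wsd_mono, sigma, csim):
    """Score sense s_a of source term t_a, given the aligned target term t_b.

    senses_a : candidate senses sigma(t_a)
    wsd_mono : dict sense -> monolingual score wsd(s | t_a)
    sigma    : dict sense -> sense-index weight sigma(t_a, s)
    csim     : dict sense -> cross-lingual similarity csim(t_b, s)
    """
    denom = sum(sigma[s] * csim[s] for s in senses_a)
    if denom == 0.0:
        # Assumption: no cross-lingual support for any sense yields score 0.
        return 0.0
    return wsd_mono[s_a] * sigma[s_a] * csim[s_a] / denom


# Toy numbers for English "bats" aligned with German "Fledermäuse".
senses = ["bat_animal", "bat_club"]
wsd_mono = {"bat_animal": 0.6, "bat_club": 0.4}
sigma = {"bat_animal": 1.0, "bat_club": 1.0}
csim = {"bat_animal": 1.0, "bat_club": 0.0}  # the German word supports only the animal sense

score_animal = joint_wsd("bat_animal", senses, wsd_mono, sigma, csim)
score_club = joint_wsd("bat_club", senses, wsd_mono, sigma, csim)
```

In this toy case the translation rules out the club sense entirely, so its score drops to zero while the animal sense keeps its full monolingual score.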
  • 28. Example Extraction: Components
      ◦ csim(t_b, s_a): cross-lingual similarity
        $\mathrm{csim}(t_b, s_a) = \sum_{s_b \in \sigma(t_b)} \mathrm{sim}(s_a, s_b)\,\mathrm{wsd}(s_b \mid t_b)$
      ◦ Monolingual WSD: compare a bag-of-words vector for the term context with the sense context (constructed from definitions, etc.)
        $\mathrm{wsd}(s \mid t) = \sigma(t, s) \left( \alpha + \frac{v(s)^{T} v(t)}{\|v(s)\|\,\|v(t)\|} \right)$
      ◦ Semantic similarity sim(s_1, s_2): identify only near-identical senses (e.g. of house and home), not arbitrary associations (e.g. house and door)
        $\mathrm{sim}(s_1, s_2) = \begin{cases} 1 & s_1 = s_2 \\ 1 & s_1, s_2 \text{ in near-synonymy relationship} \\ 1 & s_1, s_2 \text{ in hypernymy/hyponymy relationship} \\ 0 & \text{otherwise} \end{cases}$
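The component scores on this slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it reads the flattened monolingual formula as σ(t, s) · (α + cosine similarity), and the function names, the value `alpha=0.1`, the sense identifiers, and the toy contexts are all hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity of two sparse bag-of-words vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def wsd(context_words, senses, alpha=0.1):
    """Monolingual score sigma(t, s) * (alpha + cos(v(s), v(t))) per sense s.

    senses maps a sense id to (sigma, sense_context_words); alpha is assumed.
    """
    v_t = Counter(context_words)
    return {s: sigma * (alpha + cosine(Counter(words), v_t))
            for s, (sigma, words) in senses.items()}

def csim(sense_a, wsd_b, sim):
    """Cross-lingual similarity: sum over s_b of sim(s_a, s_b) * wsd(s_b | t_b)."""
    return sum(sim(sense_a, s_b) * score for s_b, score in wsd_b.items())

# Toy data (hypothetical sense ids and sense contexts).
NEAR_IDENTICAL = {("house#1", "casa#1")}

def sim(s1, s2):
    """1 for identical or near-identical senses, 0 otherwise."""
    return 1.0 if s1 == s2 or (s1, s2) in NEAR_IDENTICAL else 0.0

casa_senses = {
    "casa#1": (1.0, ["edificio", "vivienda"]),
    "casa#2": (1.0, ["empresa", "comercial"]),
}
scores_es = wsd(["vivienda", "familia"], casa_senses)
house_csim = csim("house#1", scores_es, sim)
```

In the toy data, only the Spanish sense whose context overlaps with the observed words contributes to the cross-lingual similarity of `house#1`, because `sim` is zero for all other sense pairs.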
  • 31. Example Extraction: Extraction
      ◦ Use a sentence as an example sentence iff its score is sufficiently high
        $\mathrm{wsd}(s_a \mid t_a, t_b) = \mathrm{wsd}(s_a \mid t_a) \cdot \frac{\sigma(t_a, s_a)\,\mathrm{csim}(t_b, s_a)}{\sum_{s' \in \sigma(t_a)} \sigma(t_a, s')\,\mathrm{csim}(t_b, s')}$
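The extraction rule can be illustrated with a short Python sketch. It assumes the score is read as the monolingual score wsd(s_a | t_a) multiplied by the normalized product σ(t_a, s) · csim(t_b, s); the function names, the threshold of 0.5, and the toy numbers are hypothetical, since the slides do not state a threshold.

```python
def joint_wsd(wsd_a, sigma_a, csim_b):
    """Score wsd(s_a | t_a, t_b) for every candidate sense s_a of t_a.

    wsd_a:   monolingual scores wsd(s | t_a)
    sigma_a: weights sigma(t_a, s)
    csim_b:  cross-lingual similarities csim(t_b, s)
    """
    z = sum(sigma_a[s] * csim_b[s] for s in sigma_a)
    if z == 0:
        # No cross-lingual evidence for any sense: nothing can be extracted.
        return {s: 0.0 for s in sigma_a}
    return {s: wsd_a[s] * sigma_a[s] * csim_b[s] / z for s in sigma_a}

def extract(sentence, scores, threshold=0.5):
    """Return (sense, sentence) if the best score reaches the assumed threshold."""
    best = max(scores, key=scores.get)
    return (best, sentence) if scores[best] >= threshold else None

# Toy values (hypothetical).
scores = joint_wsd(
    wsd_a={"house#1": 0.6, "house#2": 0.1},
    sigma_a={"house#1": 1.0, "house#2": 1.0},
    csim_b={"house#1": 0.6, "house#2": 0.0},
)
example = extract("We bought a house last year.", scores)
```

Sentences whose best sense falls below the threshold are simply dropped, which trades coverage for accuracy.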
  • 33. Example Selection: Task
      ◦ Motivation
          ▪ for computational applications: more data ⇒ better data
          ▪ for human users: a good limited selection should be provided at first
      ◦ Assumption: space constraint k (number of sentences)
      ◦ Goal: choose a good set of k example sentences
  • 36. Example Selection: What is a good example sentence?
      ◦ one that showcases how a word is used in context (typical prepositions, collocations, etc.)
      ◦ one that helps in grasping the meaning
      ◦ related work by Rychly et al. (2008): one that is intelligible to learners
  • 39. Example Selection: Assets
      ◦ Example sentences can be thought of as having certain assets
      ◦ 6 different classes of n-gram assets
          ▪ e.g. for account: containing the frequent bigram "bank account" can be an asset; containing the frequent trigram "to account for" can be an asset
          ▪ 1: original unigram; 2-5: bigram with preceding/following word, trigram, etc.
          ▪ weights $w(a) = \frac{f(x, a)}{\sum_{i=1}^{n} f(x_i, a)}$, etc. ⇒ higher weight for "open an account" than for "Peter's chequing account"
      ◦ 1 class of assets for the entire original sentence
          ▪ weight: cosine similarity with the sense definition ⇒ bias towards explanatory sentences
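A rough Python sketch of how n-gram assets around a target word might be collected. The slides name only some of the six n-gram classes, so the four classes below, their labels, and the choice of a trigram centred on the target word are assumptions for illustration, not the authors' definitions.

```python
def ngram_assets(tokens, i):
    """Return (class_label, n-gram) assets for the target token at index i.

    Only four illustrative classes are produced; the class labels are hypothetical.
    """
    assets = [("unigram", (tokens[i],))]
    if i > 0:
        # Bigram formed with the preceding word, e.g. "bank account".
        assets.append(("bigram_prev", tuple(tokens[i - 1:i + 1])))
    if i + 1 < len(tokens):
        # Bigram formed with the following word.
        assets.append(("bigram_next", tuple(tokens[i:i + 2])))
    if 0 < i < len(tokens) - 1:
        # Trigram centred on the target, e.g. "to account for".
        assets.append(("trigram_mid", tuple(tokens[i - 1:i + 2])))
    return assets

tokens = "we have to account for this".split()
assets = ngram_assets(tokens, tokens.index("account"))
```

Each asset would then receive a weight, so that sentences containing frequent, typical word combinations score higher than sentences containing rare ones.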
  • 43. Example Selection: What is a good set of example sentences?
      ◦ Select the sentences with the greatest total asset weights?
      ◦ No: the most frequent expressions would dominate the result set. Need diversity!
      ◦ Instead: maximize $\sum_{a \in \bigcup_{x \in C} A(x)} w(a)$, so that each asset in the result set is counted only once
      ◦ The problem is NP-hard (proof via Vertex Cover reduction) → use a greedy heuristic
  • 47. Example Selection: Greedy Heuristic to Select the Set C
      ◦ Greedily select the highest-weighted sentence x: $x \leftarrow \operatorname{argmax}_{x \in X \setminus C} \sum_{a \in A(x)} w(a)$
      ◦ Then set the weights w(a) of all assets a ∈ A(x) to zero
      ◦ ⇒ a ranked list is obtained (useful!)
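The greedy heuristic can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function name, the data layout (a dict from sentence id to its set of assets), the alphabetical tie-breaking, and the toy weights are assumptions.

```python
def greedy_select(assets_by_sentence, weights, k):
    """Pick up to k sentences, counting each asset's weight only once.

    assets_by_sentence: dict mapping a sentence id to its set of assets A(x)
    weights:            dict mapping an asset a to its weight w(a)
    Returns the selected sentence ids in selection order (a ranked list).
    """
    w = dict(weights)                  # working copy; the input stays unchanged
    remaining = set(assets_by_sentence)
    ranked = []
    while remaining and len(ranked) < k:
        # Highest total remaining asset weight; ties broken alphabetically.
        best = max(sorted(remaining),
                   key=lambda x: sum(w.get(a, 0.0) for a in assets_by_sentence[x]))
        ranked.append(best)
        remaining.remove(best)
        for a in assets_by_sentence[best]:
            w[a] = 0.0                 # covered assets no longer add value
    return ranked

# Toy data (hypothetical sentence ids and weights).
assets_by_sentence = {
    "s1": {"bank account", "open an account"},
    "s2": {"bank account"},
    "s3": {"to account for"},
}
weights = {"bank account": 0.5, "open an account": 0.3, "to account for": 0.4}
ranking = greedy_select(assets_by_sentence, weights, k=2)
```

With these toy weights, ranking by total asset weight alone would pick s1 and s2, which both showcase "bank account"; zeroing covered assets makes the heuristic pick s3 second, which is the diversity effect the slide asks for.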
  • 51. Results: Resources
      ◦ OPUS corpora (OpenSubtitles, OpenOffice.org collections) and Reuters RCV1 for non-disambiguated sentence selection
      ◦ GIZA++/UPlug for lexical alignment
      ◦ Princeton WordNet 3.0 and Spanish WordNet (+ sense mappings for compatibility)
      ◦ TreeTagger for morphological analysis
  • 55. Results: Examples from OpenSubtitles Corpus
      ◦ line (something, as a cord or rope, that is long and thin and flexible)
          ▪ I got some fishing line if you want me to stitch that.
          ▪ Von Sefelt, get the stern line.
      ◦ line (the descendants of one individual)
          ▪ What line of kings do you descend from?
          ▪ My line has ended.
      ◦ catch (catch up with and possibly overtake)
          ▪ He's got 100 laps to catch Beau Brandenburg if he wants to become world champion.
          ▪ They won't catch up.
      ◦ catch (grasp with the mind or develop an understanding of)
          ▪ I didn't catch your name.
          ▪ Sorry, I didn't catch it.
  • 56. Results: Examples from OpenSubtitles Corpus
      ◦ talk (exchange thoughts, talk with)
          ▪ Why don't we have a seat and talk it over.
          ▪ Okay, I'll talk to you but one condition...
      ◦ talk (use language)
          ▪ But we'll be listening from the kitchen so talk loud.
          ▪ You spit when you talk.
      ◦ opening (a ceremony accompanying the start of some enterprise)
          ▪ We don't have much time until the opening day of Exhibition.
          ▪ What a disaster tomorrow is the opening ceremony!
      ◦ opening (the first performance, as of a theatrical production)
          ▪ It will be rehearsed in the morning ready for the opening tomorrow night.
          ▪ You ready for our big opening night?
  • 57. Results: Sense-disambiguated Example Sentences
        Corpus                 | Covered Senses | Example Sentences | Accuracy (Wilson interval)
        OpenSubtitles en-es    | 13,559         | 117,078           | 0.815 ± 0.081
        OpenSubtitles es-en    | 8,833          | 113,018           | 0.798 ± 0.090
        OpenOffice.org en-es   | 1,341          | 13,295            | 0.803 ± 0.081
        OpenOffice.org es-en   | 932            | 11,181            | 0.793 ± 0.087
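For readers unfamiliar with the Wilson score interval used for the accuracy column, the following Python sketch shows the standard formula for a binomial proportion. The sample counts in the usage lines are hypothetical, since the slides do not state how many sentences were manually assessed, so the sketch does not reproduce the table's figures.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    Returns (center, half_width); z=1.96 corresponds to about 95% confidence.
    """
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center, half_width

# Hypothetical sample: 80 of 100 assessed sentences judged correct.
center, half_width = wilson_interval(80, 100)
```

Unlike the naive normal approximation, the interval's center is pulled slightly from the raw proportion towards 0.5, which matters for small evaluation samples.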
  • 58. Results
      ◦ Error analysis:
          ▪ (1) incorrect lexical alignments are unlikely to lead to incorrect disambiguation
          ▪ (2) incomplete sense inventories can lead to mistakes
          ▪ (3) also, on a few occasions, the morphological analyser led to wrong results
      ◦ Implications for future work:
          ▪ (1) better monolingual WSD
          ▪ (2) additional languages, especially phylogenetically unrelated ones
  • 60. Results: Rankings for OpenSubtitles Corpus
      ◦ being or located on or directed toward the side of the body to the east when facing north
          1. In America we drive on the right side of the road.
          2. I'll tie down your right arm so you can learn to throw a left.
          3. If we wait from the right side, we have an advantage there.
      ◦ put up with something or somebody unpleasant
          1. You can't stand it, can you?
          2. You really think I can tolerate such an act?
          3. No one can stand that harmonica all day long.
  • 61. Results: Rankings for OpenSubtitles Corpus
      ◦ using or providing or producing or transmitting or operated by electricity
          1. Not the electric chair.
          2. Some electrical current circulating through my body.
          3. Near as I can tell it's an electrical impulse.
      ◦ take something or somebody with oneself somewhere
          1. And they were kind enough to take me in here.
          2. It conveys such a great feeling.
          3. We interrupt this program to bring you a special news bulletin.
  • 62. Results: Rankings for long in RCV1 Corpus
          1. In the long term interest rate market, the yield of the key 182nd 10 year Japanese government bond (JGB) fell to 2.060 percent early on Tuesday, a record low for any benchmark 10-year JGB.
          2. “The government and opposition have gambled away the last chance for a long time to prove they recognise the country’s problems, and that they put the national good above their own power interests”, news weekly Der Spiegel said.
          3. As long as the index keeps hovering between 957 and 995, we will maintain our short term neutral recommendation.
  • 63. Results: Rankings for purchase in RCV1 Corpus
          1. Romania’s State Ownership Fund (FPS), the country’s main privatisation body, said on Wednesday it had accepted five bids for the purchase of a 50.98 percent stake in the largest local cement maker Romcim.
          2. Grand Hotel Group said on Wednesday it has agreed to procure an option to purchase the remaining 50 percent of the Grand Hyatt complex in Melbourne from hotel developer and investor Lustig & Moar.
          3. The purchase price for the business, which had 1996 calendar year sales of about $25 million, was not disclosed.
  • 64. Results: Rankings for colonial in RCV1 Corpus
          1. Hong Kong came to the end of 156 years of British colonial rule on June 30 and is now an autonomous capitalist region of China, running all its own affairs except defence and diplomacy.
          2. The letter was sent in error to the embassy of Portugal – the former colonial power in East Timor – and was neither returned nor forwarded to the Indonesian embassy.
          3. Sino-British relations hit a snag when former Governor Chris Patten launched electoral reforms in the twilight years of colonial rule despite fierce opposition by Beijing.
  • 66. Conclusion
      ◦ Framework for:
          ▪ extracting sense-disambiguated example sentences from parallel corpora
          ▪ selecting limited numbers of sentences given space constraints
      ◦ Future work:
          ▪ better disambiguation, e.g. additional languages, better techniques
          ▪ additional input for selection: sentence length, definition extraction techniques
          ▪ integrated user interface for UWN (Universal Wordnet)
      ◦ Contact: demelo@mpi-inf.mpg.de