Introduction
Existing Lexical Knowledge Bases
Building a Multilingual Wordnet
Results and Experiments
Summary and Future Work
Towards a Universal Wordnet
by Learning from Combined Evidence
Gerard de Melo and Gerhard Weikum
Max Planck Institute for Informatics
Saarbrücken, Germany
2009-11-03
Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 1/29
Introduction
Lexical Knowledge
What meanings does a word have?
How do those meanings relate to the meanings of other words?
“speaker”: person who gives a talk; device that produces sounds
“board”: flat piece of wood; committee; panel for writing with chalk; to enter a transportation vehicle
“student”, “pupil”: someone who studies (two words, one meaning)
“faculty”, “professor”: meanings connected by relations such as member and part
class hierarchy: entity > institution > educational institution > university > ...
Many applications, for example:
NLP, AI
question answering
query expansion
human consultation
Introduction
Multilinguality
the world is multilingual
the Internet is also increasingly multilingual
[Figure: top 10 languages by approximate number of speakers. Source: Ethnologue 2005]
[Figure: Internet users by region. Source: Internet World Stats]
Introduction
Vision
universal index of word meanings
large-scale semantic network with class hierarchy
look up any word in any language, get a list of its meanings
meanings should be connected via semantic relations
[Figure: the meaning “person who gives a talk” with eng: “speaker”, jpn: “話者”, rus: “докладчик”, ces: “řečník”, ...]
[Figure: the hierarchy entity > institution > educational institution > university with terms such as por: “entidade” and heb: “ישות” (entity), cmn: “制度” (institution), deu: “Bildungseinrichtung” (educational institution), cym: “prifysgol” (university), ...]
Outline
1 Existing Lexical Knowledge Bases
2 Building a Multilingual Wordnet
3 Results and Experiments
4 Summary and Future Work
1 Existing Lexical Knowledge Bases
Existing Lexical Knowledge Bases
WordNet
lexical database created at Princeton
enumerates meanings of English words
meaning-to-meaning links: hypernym hierarchy, meronymy (part of), etc.
Miller, Fellbaum et al. (1990): among the most-cited papers in computer science (source: CiteSeerX)
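To make the structure concrete, here is a minimal Python sketch of a wordnet-style data structure: terms map to meaning nodes, and meaning nodes are connected by typed links such as hypernymy. The class, the meaning IDs and the glosses are hypothetical toy data, not WordNet's actual identifiers or any library's API.

```python
from collections import defaultdict


class ToyWordnet:
    """Hypothetical minimal wordnet: terms -> meaning IDs, meaning IDs -> typed links."""

    def __init__(self):
        self.meanings_of = defaultdict(set)   # term -> set of meaning IDs
        self.gloss = {}                        # meaning ID -> short definition
        self.links = defaultdict(set)          # (meaning ID, relation) -> target IDs

    def add_meaning(self, meaning, gloss, *terms):
        """Register a meaning node and the terms that express it."""
        self.gloss[meaning] = gloss
        for term in terms:
            self.meanings_of[term].add(meaning)

    def relate(self, source, relation, target):
        """Add a meaning-to-meaning link such as 'hypernym' or 'part_of'."""
        self.links[(source, relation)].add(target)

    def hypernym_path(self, meaning):
        """Walk up the (assumed acyclic) hypernym hierarchy, one parent per step."""
        path = [meaning]
        parents = self.links.get((meaning, "hypernym"))
        while parents:
            path.append(sorted(parents)[0])
            parents = self.links.get((path[-1], "hypernym"))
        return path


wn = ToyWordnet()
wn.add_meaning("speaker.person", "person who gives a talk", "speaker")
wn.add_meaning("speaker.device", "device that produces sounds", "speaker")
wn.add_meaning("entity.n", "that which exists", "entity")
wn.add_meaning("institution.n", "organization founded for a purpose", "institution")
wn.add_meaning("edu_institution.n", "institution for education", "educational institution")
wn.add_meaning("university.n", "institution of higher education", "university")
wn.relate("university.n", "hypernym", "edu_institution.n")
wn.relate("edu_institution.n", "hypernym", "institution.n")
wn.relate("institution.n", "hypernym", "entity.n")

print(sorted(wn.meanings_of["speaker"]))  # ['speaker.device', 'speaker.person']
print(wn.hypernym_path("university.n"))
# ['university.n', 'edu_institution.n', 'institution.n', 'entity.n']
```

Separating terms from meaning nodes is what lets one word (“speaker”) point to several meanings, and several words share one meaning.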
Existing Lexical Knowledge Bases
Non-English Wordnets
EuroWordNet, BalkaNet, Global WordNet Association
problem: many are small and incomplete
problem: different identifiers, formats, etc.
problem: only ∼10 languages with freely available wordnets
overall: not a single, coherent resource
Existing Lexical Knowledge Bases
Other Resources
PANGLOSS Ontology, Knight & Luk (1994): 2 languages, around 70,000 entities
TransGraph system, Etzioni et al. (2007): large translation graph, but limited structure (e.g. no semantic hierarchy)
DBpedia, YAGO, OpenCyc: class hierarchy not multilingual
2 Building a Multilingual Wordnet
Building a Multilingual Wordnet
Strategy
use existing wordnets as backbone
add new terms and link them to meaning nodes
[Figure: in the existing wordnets, eng: “course”, eng: “class” and spa: “trayectoria” are linked to meanings such as academic course, part of a meal, route of travel and series of events. In the desired output, new terms such as deu: “Reihe”, ita: “piatto”, fra: “suite” and deu: “Kurs” are linked to these meanings as well.]
Building a Multilingual Wordnet
Input Graph
use existing wordnets as backbone (mainly English, Spanish, Catalan)
add translations to the graph, taken from:
dictionaries (e.g. Wiktionary)
thesauri and ontologies
parallel corpora (word alignment)
also: predict new translations
[Figure: input graph G0 with existing terms (eng: “course”, eng: “class”, spa: “trayectoria”), their meanings, and newly added terms (deu: “Reihe”, ita: “piatto”, fra: “suite”, deu: “Kurs”)]
Building a Multilingual Wordnet
Approach: link new words to the meanings of their translations
Huge challenge: disambiguation!
[Figure: ita: “piatto” is a translation of eng: “course”, which has the meanings academic course, part of a meal, route of travel and series of events. Which of these meanings should “piatto” be linked to?]
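The disambiguation problem can be illustrated with a small Python sketch: a new term inherits as candidates every meaning of every term it translates to, so a single ambiguous translation such as eng “course” already yields four candidate links. The dictionaries and the `candidate_meanings` helper are hypothetical toy constructs, not the authors' code.

```python
def candidate_meanings(term, translations, meanings_of):
    """Collect candidate meanings for `term` from the meanings of its translations.

    translations: dict term -> set of translated terms (hypothetical toy data)
    meanings_of:  dict term -> set of meaning labels already in the wordnet
    Returns a dict meaning -> set of translations that proposed it.
    """
    candidates = {}
    for other in translations.get(term, set()):
        for meaning in meanings_of.get(other, set()):
            candidates.setdefault(meaning, set()).add(other)
    return candidates


meanings_of = {
    "eng:course": {"academic course", "part of a meal",
                   "route of travel", "series of events"},
}
translations = {"ita:piatto": {"eng:course"}}

cands = candidate_meanings("ita:piatto", translations, meanings_of)
print(sorted(cands))
# ['academic course', 'part of a meal', 'route of travel', 'series of events']
```

Only “part of a meal” is a correct link for “piatto”; the learned edge weights are what must separate it from the other three candidates.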
Building a Multilingual Wordnet
Approach
[Figure: candidate links from ita: “piatto” to the four meanings of its translation eng: “course”]
a variety of features analyse the previous graph $G_{i-1}$ and incorporate neighbourhood information into an edge's feature vector
supervised learning: new edge weights are determined using an RBF-kernel SVM with posterior probability estimation
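An RBF-kernel SVM produces a real-valued decision value; one common way to turn it into a posterior probability is Platt scaling, which passes the decision value through a fitted sigmoid. Whether the authors used exactly this method is an assumption here. The Python sketch below assumes an already trained model: the support vectors, dual coefficients, bias, gamma and the sigmoid parameters A and B are hypothetical values, not learned from the paper's data.

```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i alpha_i * y_i * K(sv_i, x) + b; dual_coefs hold alpha_i * y_i."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


def platt_probability(f, a, b):
    """Platt scaling: P(link | f) = 1 / (1 + exp(a * f + b)); a is typically negative."""
    return 1.0 / (1.0 + math.exp(a * f + b))


# Hypothetical trained model over 2-dimensional edge feature vectors.
support_vectors = [(0.9, 0.8), (0.1, 0.2)]
dual_coefs = [1.0, -1.0]   # positive-class support vector first, negative second
bias, gamma = 0.0, 2.0
A, B = -3.0, 0.0           # hypothetical sigmoid parameters

for features in [(0.85, 0.75), (0.15, 0.25)]:
    f = decision_value(features, support_vectors, dual_coefs, bias, gamma)
    print(features, round(platt_probability(f, A, B), 3))
```

The resulting probability, rather than a hard yes/no decision, is what serves as the new edge weight, so later iterations and final thresholding can use graded confidence.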
Building a Multilingual Wordnet
Features
Example feature:
given a term t (e.g. fra: “suite”) and a meaning m (e.g. academic course)
question: should they be linked?
look at the neighbours t′ ∈ Γ(t) (e.g. spa: “trayectoria”, eng: “course”) and at their meanings m′ ∈ Γ(t′) (e.g. part of a meal, academic course, route of travel, series of events)
$$\sum_{t' \in \Gamma(t)} \phi_1(t,t')\,\frac{\mathrm{sim}^*(t',m)}{\mathrm{sim}^*(t',m) + \mathrm{dissim}(t',m)}$$
$$\mathrm{sim}^*(t',m) = \max_{m' \in \Gamma(t')} \phi_2(t',m')\,\mathrm{sim}(m',m)$$
$$\mathrm{dissim}(t',m) = \sum_{m' \in \Gamma(t')} \phi_2(t',m')\,\bigl(1 - \mathrm{sim}(m',m)\bigr)$$
weights φ1, φ2 based on: part-of-speech, corpus frequency, ...
(the unweighted form of the feature corresponds to φ1 = φ2 = 1)
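The feature above can be computed directly once neighbourhoods and a meaning-to-meaning similarity are available. The Python sketch below follows the three formulas term by term; the toy neighbourhood, the identity similarity and the constant weights φ1 = φ2 = 1 are hypothetical placeholders (in the approach, the weights depend on part-of-speech, corpus frequency, etc.).

```python
def sim_star(t2, m, meanings, sim, phi2):
    """sim*(t', m) = max over m' in Γ(t') of φ2(t', m') · sim(m', m)."""
    return max((phi2(t2, m2) * sim(m2, m) for m2 in meanings[t2]), default=0.0)


def dissim(t2, m, meanings, sim, phi2):
    """dissim(t', m) = sum over m' in Γ(t') of φ2(t', m') · (1 − sim(m', m))."""
    return sum(phi2(t2, m2) * (1.0 - sim(m2, m)) for m2 in meanings[t2])


def neighbourhood_score(t, m, neighbours, meanings, sim, phi1, phi2):
    """Sum over t' in Γ(t) of φ1(t, t') · sim*(t', m) / (sim*(t', m) + dissim(t', m))."""
    total = 0.0
    for t2 in neighbours[t]:
        s = sim_star(t2, m, meanings, sim, phi2)
        d = dissim(t2, m, meanings, sim, phi2)
        if s + d > 0:  # skip neighbours that provide no evidence (e.g. no meanings)
            total += phi1(t, t2) * s / (s + d)
    return total


# Hypothetical toy neighbourhood of fra:"suite" (t' nodes) and their meanings (m' nodes).
neighbours = {"fra:suite": ["eng:course", "spa:trayectoria"]}
meanings = {
    "eng:course": ["academic course", "part of a meal", "route of travel", "series of events"],
    "spa:trayectoria": ["route of travel", "series of events"],
}


def identity_sim(m1, m2):
    """Placeholder meaning similarity: 1 for the same meaning, else 0."""
    return 1.0 if m1 == m2 else 0.0


def unit_weight(*_):
    """Placeholder for φ1 and φ2."""
    return 1.0


for m in ["series of events", "academic course"]:
    print(m, neighbourhood_score("fra:suite", m, neighbours, meanings,
                                 identity_sim, unit_weight, unit_weight))
# series of events 0.75
# academic course 0.25
```

With these placeholders, “series of events” scores higher than “academic course” because both neighbours support it, while only eng “course” supports “academic course”, and only weakly since it is four-way ambiguous.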
Building a Multilingual Wordnet
Other Features
cosine similarity of the translations with the meaning's gloss
scores assessing polysemy by looking at back-translations
many more (see paper for details)
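One of the listed features compares a term's translations with a candidate meaning's gloss. A minimal bag-of-words cosine similarity in Python is sketched below; the whitespace tokenisation, the absence of stemming or stop-word removal, and the example strings are simplifying assumptions, not details given in the slides.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased whitespace tokens with counts (deliberately simplistic)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


# Hypothetical English translations of a new term, and two candidate glosses.
translations = bag_of_words("course dish meal")
gloss_meal = bag_of_words("part of a meal served at one time")
gloss_class = bag_of_words("education imparted in a series of lessons")
print(round(cosine(translations, gloss_meal), 3), round(cosine(translations, gloss_class), 3))
```

A real system would likely add stop-word filtering and lemmatisation; the point here is only that overlap between translations and gloss (“meal”) favours the matching meaning.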
Building a Multilingual Wordnet
Approach
use the scores as features for an RBF-kernel SVM
multiple iterations: each graph $G_i$ is based on the previous graph $G_{i-1}$
stop when the F1 score on a validation set reaches a plateau
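The iteration scheme can be summarised as a loop: each round builds a new graph from features computed on the previous one, and the loop stops once validation F1 no longer improves. In the Python sketch below, `build_next_graph` and `validation_f1` are hypothetical callables standing in for feature extraction plus SVM scoring and for evaluation on the validation set; the tolerance `min_gain` and the iteration cap are assumptions.

```python
from typing import Callable, Tuple, TypeVar

Graph = TypeVar("Graph")


def iterate_until_plateau(g0: Graph,
                          build_next_graph: Callable[[Graph], Graph],
                          validation_f1: Callable[[Graph], float],
                          min_gain: float = 1e-3,
                          max_iterations: int = 10) -> Tuple[Graph, int]:
    """Compute G_i from G_{i-1} until validation F1 improves by less than min_gain."""
    current, best_f1 = g0, validation_f1(g0)
    for i in range(1, max_iterations + 1):
        candidate = build_next_graph(current)      # G_i based on G_{i-1}
        f1 = validation_f1(candidate)
        if f1 - best_f1 < min_gain:                # plateau reached: keep G_{i-1}
            return current, i - 1
        current, best_f1 = candidate, f1
    return current, max_iterations


# Toy run: the "graph" is just an iteration counter with made-up F1 values.
f1_by_round = [0.0, 0.70, 0.80, 0.81, 0.8105]
final, rounds = iterate_until_plateau(
    0,
    lambda g: g + 1,
    lambda g: f1_by_round[min(g, len(f1_by_round) - 1)],
    min_gain=0.005,
)
print(final, rounds)  # 3 3
```

Returning the last graph that still improved, rather than the one that triggered the stop, is a design choice of this sketch.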
3 Results and Experiments
Results
Setup
input graph G0:
448,069 pre-existing term-meaning links
10,805,400 translation edges
1.3 million term nodes with candidate meanings
on average 7.7 candidate meanings per new term
2,445 term-meaning links for training (French/German)
2,901 term-meaning links as validation set
Results
[Figure: excerpt from the final UWN graph G3 after 3 iterations, retaining only edges with sufficiently high weights (0.5 / 0.6). The meanings school (group of fish), school (institution) and school (building) are linked to terms such as deu: “Schulgebäude”, deu: “Schulhaus”, deu: “Fischschwarm”, ces: “hejno”, fra: “banc”, ind: “sekolah”, jpn: “学校”, kor: “학교”, lao: “ໂຮງຮຽນ”, kat: “სკოლა”.]
Evaluation
Relation                                 Precision (1)
Term-Meaning Links (French)              89.2% ± 3.4%
Term-Meaning Links (German)              85.9% ± 3.8%
Term-Meaning Links (Mandarin Chinese)    90.5% ± 3.3%
Generalization (Hypernymy)               87.1% ± 4.8%
Instance                                 89.3% ± 4.4%
Similarity                               92.0% ± 3.8%
Category                                 93.3% ± 4.5%
Part (Meronymy)                          94.4% ± 4.1%
Member (Meronymy)                        92.7% ± 4.0%
Substance (Meronymy)                     95.6% ± 3.5%
Opposite                                 94.3% ± 3.9%
(1) Wilson score intervals for random samples
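The ± values are Wilson score intervals for manually assessed random samples. The Python sketch below computes an interval's centre and half-width from a sample; the confidence level (z = 1.96, i.e. 95%) and the sample counts in the example are assumptions, since the slide states neither the level nor the sample sizes.

```python
import math


def wilson_interval(correct, n, z=1.96):
    """Return (centre, half_width) of the Wilson score interval for correct/n."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = correct / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre, half_width


# Hypothetical sample: 267 of 300 assessed links judged correct.
centre, half = wilson_interval(267, 300)
print(f"{centre:.1%} ± {half:.1%}")
```

Unlike the simple normal approximation, the Wilson interval never extends outside [0, 1] and is pulled slightly towards 50%, which makes it better behaved for precision values close to 100%.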
Coverage
Language      Term-Meaning Links    Distinct Terms
Overall       1,595,763             822,212
German        132,523               67,087
French        75,544                33,423
Esperanto     71,247                33,664
Dutch         68,792                30,154
Spanish       68,445                32,143
Turkish       67,641                31,553
Czech         59,268                33,067
Russian       57,929                26,293
Portuguese    55,569                23,499
Italian       52,008                24,974
Hungarian     46,492                28,324
Thai          44,523                30,815
Application: Semantic Relatedness
Experimental Setup
example: “curriculum” is considered closely related to “school”, but not to “water”
compute term relatedness using UWN:
$$\mathrm{sim}(t_1, t_2) = \max_{s_1 \in \sigma(t_1)} \max_{s_2 \in \sigma(t_2)} \mathrm{sim}(s_1, s_2)$$
where σ(t) is the set of meanings of term t and sim(s1, s2) is a combined graph-/gloss-based method
compare with assessments of relatedness made by human judges
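Term-level relatedness takes the best-matching pair of meanings. In the Python sketch below, the `meanings` dictionary plays the role of σ(t), and the meaning-level score table is a hypothetical stand-in for the combined graph-/gloss-based measure, which the slides do not specify in detail. Returning `None` for a term without meanings is an assumption that keeps uncovered pairs distinguishable.

```python
def term_relatedness(t1, t2, meanings, meaning_sim):
    """max over s1 in σ(t1), s2 in σ(t2) of sim(s1, s2); None if a term has no meanings."""
    senses1, senses2 = meanings.get(t1, []), meanings.get(t2, [])
    if not senses1 or not senses2:
        return None
    return max(meaning_sim(s1, s2) for s1 in senses1 for s2 in senses2)


meanings = {  # hypothetical σ(t)
    "curriculum": ["curriculum.plan"],
    "school": ["school.institution", "school.building", "school.fish"],
    "water": ["water.liquid"],
}
table = {  # hypothetical symmetric meaning-level scores
    frozenset({"curriculum.plan", "school.institution"}): 0.8,
    frozenset({"curriculum.plan", "school.building"}): 0.3,
    frozenset({"curriculum.plan", "water.liquid"}): 0.05,
}


def meaning_sim(s1, s2):
    return 1.0 if s1 == s2 else table.get(frozenset({s1, s2}), 0.0)


print(term_relatedness("curriculum", "school", meanings, meaning_sim))  # 0.8
print(term_relatedness("curriculum", "water", meanings, meaning_sim))   # 0.05
```

Taking the maximum means an unrelated sense (school as a group of fish) does not dilute the score when another sense matches well.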
Application: Semantic Relatedness
Results for 3 German Datasets
Dataset                    GUR65          GUR350         ZG222
                           r      Cov.    r      Cov.    r      Cov.
Inter-Annot. Agreement     0.81   (65)    0.69   (350)   0.49   (222)
Wikipedia (ESA*)           0.56   65      0.52   333     0.32   205
GermaNet (Lin*)            0.73   60      0.50   208     0.08   88
UWN                        0.80   60      0.68   242     0.51   106
r: Pearson product-moment correlation coefficient
Cov.: absolute coverage
*: scores by Gurevych et al. (2007)
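The r values are Pearson correlations between system scores and human judgements, and Cov. counts the word pairs a resource could score at all. A minimal Python sketch of such an evaluation follows; marking an uncovered pair as `None` and computing r only over covered pairs is an assumption about the protocol, and the ratings are made-up toy numbers.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def evaluate(system_scores, human_scores):
    """Return (r over covered pairs, absolute coverage); None marks an uncovered pair."""
    covered = [(s, h) for s, h in zip(system_scores, human_scores) if s is not None]
    xs, ys = zip(*covered)
    return pearson_r(xs, ys), len(covered)


system = [0.8, 0.05, None, 0.4, 0.6]   # hypothetical relatedness scores
human = [3.5, 0.2, 2.0, 1.9, 3.0]      # hypothetical mean human ratings
r, cov = evaluate(system, human)
print(f"r = {r:.2f}, Cov. = {cov}")
```

Because r is computed only over covered pairs, correlation and coverage should be read together, as in the table: a resource can reach a high r on few pairs.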
Application: Cross-Lingual Text Classification
cross-lingual TC: train using documents in one language, classify documents in another language
documents represented as bag-of-words / bag-of-meanings TF-IDF vectors
dataset: Reuters corpora (RCV1/RCV2)
for each language pair: 105 binary classification tasks, each using 200 training documents and 600 test documents
classifier: SVMlight
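Mapping words to language-independent meaning identifiers lets documents in different languages share features. The Python sketch below builds TF-IDF vectors over both words and meanings; the term-to-meaning dictionary is a hypothetical stand-in for UWN lookups, and the plain tf × log(N/df) weighting is one common TF-IDF variant, not necessarily the one used in the experiments.

```python
import math
from collections import Counter


def features(tokens, lang, term_meanings):
    """Words prefixed by language, plus the meaning IDs looked up for each word."""
    feats = [f"{lang}:{tok}" for tok in tokens]
    for tok in tokens:
        feats.extend(term_meanings.get((lang, tok), []))
    return feats


def tfidf_vectors(docs):
    """docs: list of feature lists -> list of {feature: tf * log(N / df)} dicts."""
    n = len(docs)
    df = Counter(f for doc in docs for f in set(doc))
    return [{f: tf * math.log(n / df[f]) for f, tf in Counter(doc).items()}
            for doc in docs]


term_meanings = {  # hypothetical UWN-style lookups
    ("eng", "school"): ["m:school_institution"],
    ("ita", "scuola"): ["m:school_institution"],
    ("eng", "water"): ["m:water"],
}
docs = [
    features(["school", "teacher"], "eng", term_meanings),
    features(["scuola", "maestro"], "ita", term_meanings),
    features(["water", "river"], "eng", term_meanings),
]
vectors = tfidf_vectors(docs)
shared = set(vectors[0]) & set(vectors[1])
print(shared)  # {'m:school_institution'}
```

With word features alone, the English and Italian documents here would share nothing; the meaning feature gives a classifier trained on one language something to transfer to the other.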
Application: Cross-Lingual Text Classification
Language Pair       Terms only    Terms + Meanings
English-Italian     68.3%         76.3%
English-Russian     51.7%         71.2%
Italian-English     74.4%         78.1%
Italian-Russian     58.4%         73.2%
Russian-English     67.3%         76.8%
Russian-Italian     62.2%         71.8%
(all values are F1 scores)
4 Summary and Future Work
Summary
large-scale multilingual wordnet: 85% accuracy, 800,000 terms, over 1.5 million links from terms to meanings
built by learning edge weights using graph-based evidence
useful for monolingual and cross-lingual tasks
Future Work
ongoing work: user interface, including user contributions
techniques to automatically discover new word meanings
word sense disambiguation and query expansion using UWN
Introduction
Existing Lexical Knowledge Bases
Building a Multilingual Wordnet
Results and Experiments
Summary and Future Work
Summary
Future Work
Thanks!
expression of gratitude
eng: “thank you”
yue: “唔該”
cmn: “谢谢”
jpn: “ありがとう”
spa: “gracias”
ara: “شكرًا”
Towards a Universal Wordnet by Learning from Combined Evidence

  • 1. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Towards a Universal Wordnet by Learning from Combined Evidence Gerard de Melo and Gerhard Weikum Max Planck Institute for Informatics Saarbr¨ucken, Germany 2009-11-03 Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 1/29
  • 2. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? person who gives a talk “speaker” device that produces sounds Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 3. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? flat piece of wood “board” committee panel for writing with chalk to enter a transportation vehicle Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 4. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? someone who studies “student” “pupil” Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 5. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? faculty professor member part Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 6. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? entity institution educational institution university ... Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 7. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Lexical Knowledge What meanings does a word have? How do those meanings relate to the meanings of other words? Many Applications examples: NLP, AI question answering query expansion human consultation entity institution educational institution university ... Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 2/29
  • 8. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Multilinguality the world is multilingual the Internet is also increasingly multilingual Top 10 Languages by Approx. No. of Speakers Source: Ethnologue 2005 Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 3/29
  • 9. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction Multilinguality the world is multilingual the Internet is also increasingly multilingual Internet users by Region Source: Internet World Stats Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 3/29
  • 10. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction person who gives a talk eng: “speaker” jpn: “ ”話者 rus: “докладчик” ces: “řečník” ... ...... Vision universal index of word meanings large-scale semantic network with class hierarchy look up any word in any language, get a list of its meanings Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 4/29
  • 11. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Introduction entitypor: “entidade” cmn: “ ”制度 institution educational institution university heb: “‫ישות‬.” deu: “Bildungs- einrichtung” cym: “prifysgol” ... Vision universal index of word meanings large-scale semantic network with class hierarchy meanings should be connected via semantic relations Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 4/29
  • 12. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work Lexical Knowledge Multilinguality Vision Outline 1 Existing Lexical Knowledge Bases 2 Building a Multilingual Wordnet 3 Results and Experiments 4 Summary and Future Work Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 5/29
  • 13. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Outline 1 Existing Lexical Knowledge Bases 2 Building a Multilingual Wordnet 3 Results and Experiments 4 Summary and Future Work Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 6/29
  • 14. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases WordNet lexical database created at Princeton enumerates meanings of English words meaning-to-meaning links Miller, Fellbaum et al. (1990) among most-cited papers in computer science (source: CiteseerX) Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 7/29
  • 15. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases WordNet lexical database created at Princeton enumerates meanings of English words meaning-to-meaning links Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 7/29
  • 16. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases WordNet lexical database created at Princeton enumerates meanings of English words meaning-to-meaning links hypernym hierarchy meronymy (part of) etc. Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 7/29
  • 17. Introduction Existing Lexical Knowledge Bases Building a Multilingual Wordnet Results and Experiments Summary and Future Work WordNet Non-English Wordnets Other Resources Existing Lexical Knowledge Bases Non-English Wordnets EuroWordNet, BalkaNet, Global WordNet Association problem: many are small, incomplete problem: different identifiers, formats, etc. problem: only ∼10 languages with freely available wordnets Gerard de Melo and Gerhard Weikum Towards a Universal Wordnet 8/29
  • 19. Existing Lexical Knowledge Bases: Non-English Wordnets. EuroWordNet, BalkaNet, Global WordNet Association. Problems: many are small and incomplete; they use different identifiers, formats, etc.; only ∼10 languages have freely available wordnets. Overall, they do not form a single, coherent resource.
  • 21. Existing Lexical Knowledge Bases: Other Resources. PANGLOSS Ontology: Knight & Luk (1994); TransGraph system: Etzioni et al. (2007); DBpedia, YAGO, OpenCyc. PANGLOSS: 2 languages, around 70,000 entities.
  • 22. Existing Lexical Knowledge Bases: Other Resources. PANGLOSS Ontology: Knight & Luk (1994); TransGraph system: Etzioni et al. (2007); DBpedia, YAGO, OpenCyc. TransGraph: a large translation graph, but with limited structure (e.g. no semantic hierarchy).
  • 23. Existing Lexical Knowledge Bases: Other Resources. PANGLOSS Ontology: Knight & Luk (1994); TransGraph system: Etzioni et al. (2007); DBpedia, YAGO, OpenCyc. DBpedia, YAGO, OpenCyc: a class hierarchy, but not multilingual.
  • 24. Outline: 1 Existing Lexical Knowledge Bases; 2 Building a Multilingual Wordnet (Strategy, Input Graph, Approach, Features); 3 Results and Experiments; 4 Summary and Future Work
  • 25. Building a Multilingual Wordnet: Strategy. Use existing wordnets as the backbone; add new terms and link them to meaning nodes. [Figure: Existing Wordnets. The terms eng: “course”, eng: “class” and spa: “trayectoria” are linked to the meaning nodes academic course, part of a meal, route of travel and series of events.]
  • 26. Building a Multilingual Wordnet: Strategy. Use existing wordnets as the backbone; add new terms and link them to meaning nodes. [Figure: Existing Wordnets → Desired Output. In the desired output, new terms such as deu: “Reihe”, deu: “Kurs”, ita: “piatto” and fra: “suite” are also linked to the meaning nodes academic course, part of a meal, route of travel and series of events.]
  • 27. Building a Multilingual Wordnet: Input Graph. Use existing wordnets (mainly English, Spanish, Catalan) as the backbone; add translations to the graph. [Figure: Input Graph G0 containing eng: “course”, eng: “class”, spa: “trayectoria” and the meanings academic course, part of a meal, route of travel, series of events.]
  • 28. Building a Multilingual Wordnet: Input Graph. Use existing wordnets as the backbone; add translations to the graph, taken from dictionaries (e.g. Wiktionary), thesauri and ontologies, and parallel corpora (word alignment); new translations are also predicted. [Figure: Input Graph G0, now also containing deu: “Reihe”, deu: “Kurs”, ita: “piatto” and fra: “suite”.]
  • 29. Building a Multilingual Wordnet. Approach: link new words to the meanings of their translations. Huge challenge: disambiguation! [Figure: ita: “piatto” has a translation edge to eng: “course”, whose meanings are academic course, part of a meal, route of travel and series of events.]
  • 30. Building a Multilingual Wordnet. Approach: link new words to the meanings of their translations. Huge challenge: disambiguation! [Figure: ita: “piatto” has a translation edge to eng: “course”; each of the four possible links from “piatto” to the meanings academic course, part of a meal, route of travel and series of events is marked “?”.]
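The linking step can be pictured as candidate generation: a new term inherits, as candidates, every meaning of every term it translates to. A minimal sketch, assuming plain dictionaries for translation edges and backbone term-meaning links (the function and variable names are hypothetical, not taken from the paper):

```python
from collections import defaultdict


def candidate_meanings(term, translations, term_meanings):
    """Return {meaning: set of translations proposing it} for a new term.

    translations:  term -> set of terms it translates to (translation edges)
    term_meanings: backbone term -> set of meanings it is linked to
    """
    candidates = defaultdict(set)
    for translation in translations.get(term, set()):
        for meaning in term_meanings.get(translation, set()):
            candidates[meaning].add(translation)
    return dict(candidates)


# Toy data mirroring the slide: ita "piatto" has one translation, eng "course".
translations = {("ita", "piatto"): {("eng", "course")}}
term_meanings = {
    ("eng", "course"): {"academic course", "part of a meal",
                        "route of travel", "series of events"},
}

cands = candidate_meanings(("ita", "piatto"), translations, term_meanings)
print(sorted(cands))  # every meaning of "course" becomes a candidate
```

Only one of the four candidates (part of a meal) fits “piatto”, which is why the candidates must be scored rather than accepted wholesale.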
  • 31. Building a Multilingual Wordnet: Approach. A variety of features analyse the previous graph $G_{i-1}$ and incorporate neighbourhood information into an edge's feature vector. Supervised learning: new edge weights are determined using an RBF-kernel SVM with posterior probability estimation. [Figure: which of the four meanings of eng: “course” should ita: “piatto” be linked to?]
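How an RBF-kernel SVM with posterior probability estimation could turn an edge's feature vector into a weight is sketched below. This is an illustration under assumptions, not the authors' setup: the support vectors, coefficients and sigmoid parameters are made up, training is omitted, and the posterior uses a Platt-style sigmoid, the kind of fit libsvm uses for its probability estimates.

```python
import math


def rbf_kernel(x, y, gamma):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def svm_decision(x, support_vectors, dual_coefs, bias, gamma):
    """Decision value f(x) = sum_i (alpha_i * y_i) * k(sv_i, x) + b."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


def platt_probability(f, sigmoid_a, sigmoid_b):
    """Platt-style posterior P(link | f) = 1 / (1 + exp(A * f + B))."""
    return 1.0 / (1.0 + math.exp(sigmoid_a * f + sigmoid_b))


# Hypothetical trained model over 2-dimensional edge feature vectors
# (e.g. a neighbourhood score and a gloss similarity).
support_vectors = [(0.9, 0.8), (0.1, 0.2)]
dual_coefs = [1.0, -1.0]        # alpha_i * y_i
bias, gamma = 0.0, 2.0
sigmoid_a, sigmoid_b = -3.0, 0.0  # normally fit on held-out decision values

for edge_features in [(0.85, 0.75), (0.15, 0.1)]:
    f = svm_decision(edge_features, support_vectors, dual_coefs, bias, gamma)
    print(edge_features, round(platt_probability(f, sigmoid_a, sigmoid_b), 3))
```

With a negative A, larger decision values map to higher link probabilities, so the output can be used directly as an edge weight in the next graph.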
  • 33. Building a Multilingual Wordnet: Example Feature. Given a term t (fra: “suite”) and a meaning m (academic course), should they be linked?
  • 34. Building a Multilingual Wordnet: Example Feature. Given a term t and a meaning m, should they be linked? Look at the neighbours t' ∈ Γ(t). [Figure: t = fra: “suite” has the neighbours t' = spa: “trayectoria” and eng: “course”, whose meanings m' include part of a meal, academic course, route of travel, ..., series of events.]
  • 35. Building a Multilingual Wordnet: Example Feature. For a term t (fra: “suite”) and a meaning m (academic course), aggregate the evidence from the neighbours t' ∈ Γ(t):
      $$\sum_{t' \in \Gamma(t)} \frac{\mathrm{sim}^*(t',m)}{\mathrm{sim}^*(t',m) + \mathrm{dissim}(t',m)}$$
  • 36. Building a Multilingual Wordnet: Example Feature. For a term t (fra: “suite”) and a meaning m (academic course), aggregate the evidence from the neighbours t' ∈ Γ(t):
      $$\sum_{t' \in \Gamma(t)} \frac{\mathrm{sim}^*(t',m)}{\mathrm{sim}^*(t',m) + \mathrm{dissim}(t',m)}$$
      where
      $$\mathrm{sim}^*(t',m) = \max_{m' \in \Gamma(t')} \mathrm{sim}(m',m), \qquad \mathrm{dissim}(t',m) = \sum_{m' \in \Gamma(t')} \bigl(1 - \mathrm{sim}(m',m)\bigr)$$
  • 37. Building a Multilingual Wordnet: Example Feature. For a term t (fra: “suite”) and a meaning m (academic course), aggregate the weighted evidence from the neighbours t' ∈ Γ(t):
      $$\sum_{t' \in \Gamma(t)} \phi_1(t,t')\,\frac{\mathrm{sim}^*(t',m)}{\mathrm{sim}^*(t',m) + \mathrm{dissim}(t',m)}$$
      where
      $$\mathrm{sim}^*(t',m) = \max_{m' \in \Gamma(t')} \phi_2(t',m')\,\mathrm{sim}(m',m), \qquad \mathrm{dissim}(t',m) = \sum_{m' \in \Gamma(t')} \phi_2(t',m')\,\bigl(1 - \mathrm{sim}(m',m)\bigr)$$
      and the weights φ1, φ2 are based on part-of-speech, corpus frequency, ...
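The weighted feature translates almost line by line into code. A minimal sketch, assuming the graph is given as dictionaries (`neighbours` for Γ(t), `meanings_of` for Γ(t')) and that sim, φ1 and φ2 are passed in as functions; the toy meanings of “trayectoria”, the exact-match similarity and the unit weights are hypothetical placeholders, not the measures used in the paper.

```python
def sim_star(t2, m, meanings_of, sim, phi2):
    """sim*(t', m) = max over m' in Gamma(t') of phi2(t', m') * sim(m', m)."""
    return max((phi2(t2, m2) * sim(m2, m) for m2 in meanings_of[t2]), default=0.0)


def dissim(t2, m, meanings_of, sim, phi2):
    """dissim(t', m) = sum over m' in Gamma(t') of phi2(t', m') * (1 - sim(m', m))."""
    return sum(phi2(t2, m2) * (1.0 - sim(m2, m)) for m2 in meanings_of[t2])


def neighbourhood_score(t, m, neighbours, meanings_of, sim, phi1, phi2):
    """Sum over t' in Gamma(t) of phi1(t, t') * sim* / (sim* + dissim)."""
    total = 0.0
    for t2 in neighbours[t]:
        s = sim_star(t2, m, meanings_of, sim, phi2)
        d = dissim(t2, m, meanings_of, sim, phi2)
        if s + d > 0:  # a neighbour without meanings contributes nothing
            total += phi1(t, t2) * s / (s + d)
    return total


# Hypothetical toy graph around fra "suite" (cf. the slide's figure).
neighbours = {"fra:suite": ["spa:trayectoria", "eng:course"]}
meanings_of = {
    "spa:trayectoria": ["route of travel", "series of events"],
    "eng:course": ["academic course", "part of a meal",
                   "route of travel", "series of events"],
}


def exact(m1, m2):
    """Placeholder meaning similarity: 1 for identical meanings, else 0."""
    return 1.0 if m1 == m2 else 0.0


def unit(x, y):
    """Placeholder weight used for both phi1 and phi2."""
    return 1.0


for meaning in ["academic course", "series of events"]:
    score = neighbourhood_score("fra:suite", meaning, neighbours,
                                meanings_of, exact, unit, unit)
    print(meaning, score)
```

In this toy graph “series of events” scores 0.75 (0.25 from “course” plus 0.5 from “trayectoria”) against 0.25 for “academic course”: a neighbour whose few meanings agree with m counts for more than a highly polysemous one.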
  • 38. Building a Multilingual Wordnet: Other Features. Cosine similarity of translations with the gloss; scores assessing polysemy by looking at back-translations; many more (see the paper for details). [Figure: multilingual graph linking deu: “Reihe”, deu: “Kurs”, ita: “piatto”, fra: “suite”, spa: “trayectoria”, eng: “course” and eng: “class” to the meanings academic course, part of a meal, route of travel, series of events.]
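The first of these features might look like the following: a cosine similarity between a bag of the term's (English) translations and the bag of words in a candidate meaning's gloss. A simplified sketch, assuming whitespace tokenization with no stop-word removal or stemming; the translations and glosses are made up for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gloss_feature(translations, gloss):
    """Compare a term's translations with the gloss of a candidate meaning."""
    return cosine(Counter(w.lower() for w in translations),
                  Counter(gloss.lower().split()))


# Hypothetical English translations of fra "suite" and two toy glosses.
translations = ["series", "sequence", "continuation"]
glosses = {
    "series of events": "a series of things happening one after another",
    "part of a meal": "a part of a meal served at one time",
}
for meaning, gloss in glosses.items():
    print(meaning, round(gloss_feature(translations, gloss), 3))
```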
  • 39. Building a Multilingual Wordnet: Approach. Use the scores as features for an RBF-kernel SVM. Run multiple iterations: each graph $G_i$ is based on the previous graph $G_{i-1}$. Stop when the F1 score on a validation set reaches a plateau. [Figure: multilingual term-meaning graph.]
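The iteration schedule can be sketched as a loop that rebuilds the graph from its predecessor and stops once validation F1 no longer improves by more than a small tolerance. Here `train_and_reweight` and `f1_on_validation` are hypothetical callables standing in for feature extraction plus SVM scoring and for evaluation on the validation links; the tolerance is an assumed parameter, since the slides do not define the plateau numerically.

```python
def iterate_until_plateau(g0, train_and_reweight, f1_on_validation,
                          max_iterations=10, tolerance=1e-3):
    """Build G_1, G_2, ... from G_0 until validation F1 stops improving.

    train_and_reweight(g_prev) -> g_next  (hypothetical: features from G_{i-1} + SVM)
    f1_on_validation(g) -> float          (hypothetical: F1 on the validation links)
    Returns (best graph, its F1, number of iterations run).
    """
    best_graph, best_f1 = g0, f1_on_validation(g0)
    current = g0
    for i in range(1, max_iterations + 1):
        current = train_and_reweight(current)
        f1 = f1_on_validation(current)
        improvement = f1 - best_f1
        if f1 > best_f1:
            best_graph, best_f1 = current, f1
        if improvement <= tolerance:  # plateau reached
            return best_graph, best_f1, i
    return best_graph, best_f1, max_iterations


# Toy stand-ins: a "graph" is just its iteration index; F1 values are made up.
made_up_f1 = [0.0, 0.50, 0.60, 0.65, 0.6502, 0.64]
result = iterate_until_plateau(0, lambda g: g + 1, lambda g: made_up_f1[g])
print(result)
```

Keeping the best graph seen so far, rather than the last one, guards against a final iteration that makes validation F1 slightly worse.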
  • 42. Outline: 1 Existing Lexical Knowledge Bases; 2 Building a Multilingual Wordnet; 3 Results and Experiments (Setup, Output, Evaluation, Application: Semantic Relatedness, Application: Cross-Lingual Text Classification); 4 Summary and Future Work
  • 43. Results: Setup
      Input graph G0: 448,069 pre-existing term-meaning links; 10,805,400 translation edges; 1.3 million term nodes with candidates; 7.7 candidate meanings per new term on average.
      Training: 2,445 term-meaning links (French/German). Validation set: 2,901 term-meaning links.
  • 46. Results: Output. Excerpt from the final UWN graph G3 after 3 iterations, retaining only edges with sufficiently high weights (0.5 / 0.6). [Figure: the meanings school (group of fish), school (institution) and school (building), linked to deu: “Schulgebäude”, deu: “Schulhaus”, deu: “Fischschwarm”, ces: “hejno”, fra: “banc”, ind: “sekolah”, jpn: “学校”, kor: “학교”, lao: “ໂຮງຮຽນ”, kat: “სკოლა”.]
  • 47. Results: Evaluation
      Relation                                Precision¹
      Term-Meaning Links (French)             89.2% ± 3.4%
      Term-Meaning Links (German)             85.9% ± 3.8%
      Term-Meaning Links (Mandarin Chinese)   90.5% ± 3.3%
      Generalization (Hypernymy)              87.1% ± 4.8%
      Instance                                89.3% ± 4.4%
      Similarity                              92.0% ± 3.8%
      Category                                93.3% ± 4.5%
      Part (Meronymy)                         94.4% ± 4.1%
      Member (Meronymy)                       92.7% ± 4.0%
      Substance (Meronymy)                    95.6% ± 3.5%
      Opposite                                94.3% ± 3.9%
      ¹ Wilson score intervals for random samples.
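The precision figures carry Wilson score intervals computed from random samples. The slides give neither the sample sizes nor the confidence level, so the sketch below only shows the standard formula, assuming a 95% level (z = 1.96); the counts in the example are made up.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 ~ 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Made-up sample: 50 of 100 judged links correct.
lo, hi = wilson_interval(50, 100)
print(f"{lo:.4f} {hi:.4f}")
```

The Wilson centre is pulled towards 0.5, so the interval stays inside [0, 1] even for precision values close to 100%.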
  • 48. Results: Coverage
      Language    Term-Meaning Links   Distinct Terms
      Overall              1,595,763          822,212
      German                 132,523           67,087
      French                  75,544           33,423
      Esperanto               71,247           33,664
      Dutch                   68,792           30,154
      Spanish                 68,445           32,143
      Turkish                 67,641           31,553
      Czech                   59,268           33,067
      Russian                 57,929           26,293
      Portuguese              55,569           23,499
      Italian                 52,008           24,974
      Hungarian               46,492           28,324
      Thai                    44,523           30,815
  • 49. Application: Semantic Relatedness. Experimental setup. Example: “curriculum” is considered closely related to “school”, but not to “water”. Compute term relatedness using UWN:
      $$\mathrm{sim}(t_1,t_2) = \max_{s_1 \in \sigma(t_1)} \max_{s_2 \in \sigma(t_2)} \mathrm{sim}(s_1,s_2)$$
      where σ(t) is the set of meanings of term t and sim(s1, s2) is a combined graph-/gloss-based method. Compare the results with assessments of relatedness made by human judges.
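The max-max formula lifts any meaning-level measure to terms. A minimal sketch, assuming a dictionary for σ(t) and a hypothetical toy table in place of the combined graph-/gloss-based meaning similarity; returning None for terms without meanings is an assumed convention for pairs the wordnet does not cover.

```python
def term_relatedness(t1, t2, senses, meaning_sim):
    """sim(t1, t2) = max over s1 in sigma(t1), s2 in sigma(t2) of sim(s1, s2).

    senses:      term -> set of meanings (sigma)
    meaning_sim: meaning-level relatedness, e.g. a graph-/gloss-based score
    Returns None if either term has no meanings, i.e. the pair is not covered.
    """
    senses1, senses2 = senses.get(t1, set()), senses.get(t2, set())
    if not senses1 or not senses2:
        return None
    return max(meaning_sim(s1, s2) for s1 in senses1 for s2 in senses2)


# Hypothetical meanings and meaning-level scores for the slide's example.
senses = {
    "curriculum": {"curriculum"},
    "school": {"school (institution)", "school (building)",
               "school (group of fish)"},
    "water": {"water"},
}
toy_scores = {
    frozenset({"curriculum", "school (institution)"}): 0.8,
    frozenset({"curriculum", "school (building)"}): 0.3,
}


def toy_meaning_sim(a, b):
    """Placeholder meaning similarity: table lookup, small default otherwise."""
    if a == b:
        return 1.0
    return toy_scores.get(frozenset({a, b}), 0.05)


print(term_relatedness("curriculum", "school", senses, toy_meaning_sim))
print(term_relatedness("curriculum", "water", senses, toy_meaning_sim))
```

Taking the maximum picks the most related pair of senses, so the unrelated “group of fish” sense of “school” does not drag the score down.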
  • 52. Application: Semantic Relatedness. Results for 3 German datasets
      Method                      GUR65          GUR350         ZG222
                                  r     Cov.     r     Cov.     r     Cov.
      Inter-annotator agreement   0.81  (65)     0.69  (350)    0.49  (222)
      Wikipedia (ESA*)            0.56  65       0.52  333      0.32  205
      GermaNet (Lin*)             0.73  60       0.50  208      0.08  88
      UWN                         0.80  60       0.68  242      0.51  106
      r: Pearson product-moment correlation coefficient; Cov.: absolute coverage; *: scores by Gurevych et al. (2007).
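Both columns of the table can be computed from a scorer and the human ratings: r is the Pearson correlation, and Cov. counts the pairs a method can score. A small sketch, assuming that r is computed only over the covered pairs (the differing Cov. values suggest this, but the slides do not say); the word pairs and ratings are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def evaluate(human, system_score):
    """Return (r, coverage): r over pairs the system can score, and their count."""
    covered = [pair for pair in human if system_score(pair) is not None]
    r = pearson_r([human[p] for p in covered], [system_score(p) for p in covered])
    return r, len(covered)


# Made-up human ratings and system scores; one pair is not covered.
human = {("Schule", "Lehrplan"): 3.5, ("Schule", "Wasser"): 0.5,
         ("Auto", "Fahrrad"): 2.5, ("Auto", "Gurke"): 0.0}
system = {("Schule", "Lehrplan"): 0.9, ("Schule", "Wasser"): 0.1,
          ("Auto", "Fahrrad"): 0.6}
print(evaluate(human, system.get))
```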
  • 53. Application: Cross-Lingual Text Classification. Cross-lingual TC: train using documents in one language, classify documents in another language. Used bag-of-words/meanings TF-IDF vectors. Dataset: Reuters corpora (RCV1/RCV2); for each language pair, 105 binary classification tasks, each using 200 training documents and 600 test documents. Classifier: SVMlight.
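A sketch of the bag-of-words/meanings idea, assuming (hypothetically) that each term contributes its own language-tagged feature plus the UWN meanings it is linked to, weighted by edge weight, and that plain tf·log(N/df) weighting is used; the slides do not specify these details, and the meaning labels below are made up. Because meaning identifiers are shared across languages, an English training document and an Italian test document can overlap even when no words do.

```python
import math
from collections import Counter


def bag_of_terms_and_meanings(tokens, lang, term_meanings):
    """Language-tagged term counts plus weighted, language-independent meanings."""
    bag = Counter()
    for token in tokens:
        bag[f"{lang}:{token}"] += 1.0
        for meaning, weight in term_meanings.get((lang, token), {}).items():
            bag[meaning] += weight
    return bag


def tfidf(bags):
    """TF-IDF with raw (weighted) term frequency and idf = log(N / df)."""
    n = len(bags)
    df = Counter(feature for bag in bags for feature in bag)
    return [{f: tf * math.log(n / df[f]) for f, tf in bag.items()} for bag in bags]


# Hypothetical term-meaning links with edge weights.
term_meanings = {
    ("eng", "bank"): {"bank (institution)": 0.7, "bank (river)": 0.3},
    ("eng", "rate"): {"rate (amount)": 0.9},
    ("ita", "banca"): {"bank (institution)": 0.9},
    ("ita", "tasso"): {"rate (amount)": 0.8},
}
docs = [
    bag_of_terms_and_meanings(["bank", "rate"], "eng", term_meanings),
    bag_of_terms_and_meanings(["tasso", "banca"], "ita", term_meanings),
    bag_of_terms_and_meanings(["football", "match"], "eng", term_meanings),
]
vectors = tfidf(docs)
print(sorted(set(vectors[0]) & set(vectors[1])))  # shared features: meanings only
```

A linear classifier such as SVMlight trained on English vectors can then score Italian ones through these shared meaning dimensions.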
  • 57. Application: Cross-Lingual Text Classification
      Language Pair      Terms only   Terms + Meanings
      English-Italian    68.3%        76.3%
      English-Russian    51.7%        71.2%
      Italian-English    74.4%        78.1%
      Italian-Russian    58.4%        73.2%
      Russian-English    67.3%        76.8%
      Russian-Italian    62.2%        71.8%
      (all values are F1 scores)
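As worked arithmetic on the table's own figures: averaged over the six language pairs, adding meanings raises F1 from about 63.7% to about 74.6%, a mean gain of 10.85 points, with the largest gain (19.5 points) for English-Russian. The snippet below only recomputes these numbers from the table.

```python
# F1 scores (%) copied from the table: (terms only, terms + meanings).
f1 = {
    "English-Italian": (68.3, 76.3),
    "English-Russian": (51.7, 71.2),
    "Italian-English": (74.4, 78.1),
    "Italian-Russian": (58.4, 73.2),
    "Russian-English": (67.3, 76.8),
    "Russian-Italian": (62.2, 71.8),
}

# Per-pair gain in F1 points from adding meaning features.
gains = {pair: round(with_meanings - terms_only, 1)
         for pair, (terms_only, with_meanings) in f1.items()}
mean_terms = sum(t for t, _ in f1.values()) / len(f1)
mean_both = sum(m for _, m in f1.values()) / len(f1)

print(gains)
print(f"mean F1: {mean_terms:.1f}% -> {mean_both:.1f}% "
      f"(+{mean_both - mean_terms:.2f} points)")
```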
  • 58. Outline: 1 Existing Lexical Knowledge Bases; 2 Building a Multilingual Wordnet; 3 Results and Experiments; 4 Summary and Future Work (Summary, Future Work)
  • 59. Summary. A large-scale multilingual wordnet (85% accuracy, 800,000 terms, over 1.5 million links from terms to meanings), built by learning edge weights using graph-based evidence; useful for monolingual and cross-lingual tasks.
  • 62. Future Work. Ongoing work: a user interface including user contributions; techniques to automatically discover new word meanings; word sense disambiguation and query expansion using UWN.
  • 65. Thanks! [Figure: the meaning “expression of gratitude” linked to eng: “thank you”, yue: “唔該”, cmn: “谢谢”, jpn: “ありがとう”, spa: “gracias”, ara: “شكراً”.]