Babelfy is a unified, multilingual, graph-based approach to Entity Linking and Word Sense Disambiguation. This presentation is an explanation of the algorithm used by Babelfy.
Babelfy: Entity Linking meets Word Sense Disambiguation.
1. Entity Linking meets Word Sense Disambiguation: a Unified Approach
Paper by: Andrea Moro, Alessandro Raganato, Roberto Navigli
Dipartimento di Informatica, Sapienza Università di Roma
Presentation by: Antonio Quirós
Grupo LaBDA (Laboratorio de Bases de Datos Avanzadas)
Universidad Carlos III de Madrid
2. Babelfy is a unified, multilingual, graph-based approach to Entity
Linking and Word Sense Disambiguation based on a loose
identification of candidate meanings coupled with a densest subgraph
heuristic which selects high-coherence semantic interpretations.
Babelfy is based on the BabelNet 3.0 multilingual semantic network
and jointly performs disambiguation and entity linking.
3. EL & WSD
Entity Linking: discovering mentions of entities within a text and
linking them to a Knowledge Base.
Word Sense Disambiguation: assigning meanings to word occurrences
within a text.
Babelfy combines Entity Linking and Word Sense Disambiguation.
4. - Unlike WSD, Babelfy allows overlapping fragments of text,
e.g. “Major League Baseball”.
It identifies and disambiguates several nominal and entity mentions:
“Major League Baseball” - “Major League” - “League” - “Baseball”
- Unlike EL, it links not only Named Entity Mentions (“Major League
Baseball”) but also nominal mentions (“Major League”) to their
corresponding meaning in the Knowledge Base.
5. Babelfy approach in three steps:
One: Associate each vertex of the Semantic Network with a Semantic
Signature.
Two: Given an input text, extract all the linkable fragments and for
each fragment list the possible meanings according to the Semantic
Network.
Three: Create a graph-based semantic interpretation of the whole text
by linking the candidate meanings of the fragments using the Semantic
Signatures created in the first step, and then, extract a dense subgraph
of this representation and select the best candidate meaning for each
fragment.
(Notes: Step One is performed only once; a Semantic Signature is a set
of highly related vertices. A candidate meaning can be either a concept
or a named entity. The densest-subgraph extraction in Step Three is the
novel approach!)
6. Step One: (Creating the Semantic Signatures)
Assign higher weight to edges which are involved in more densely
connected areas.
This is accomplished by using “directed triangles” (cycles of length 3),
weighting each edge by the number of triangles it occurs in.
7. Step One: (Creating the Semantic Signatures)
weight(v, v') := |{(v, v', v'') : (v, v'), (v', v''), (v'', v) ∈ E}| + 1
[Figure: an example subgraph with vertices Football, Ball, Basketball,
Field, Sports, Court.]
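As a rough illustration, the triangle-based weighting can be sketched in Python. This is a minimal sketch over a toy edge list; the function name and data layout are illustrative, not Babelfy's actual implementation:

```python
from collections import defaultdict

def triangle_weights(edges):
    """Weight each directed edge (v, v') by the number of directed
    triangles (length-3 cycles) it takes part in, plus one."""
    edge_set = set(edges)
    succ = defaultdict(set)  # v -> set of successors
    for v, w in edges:
        succ[v].add(w)
    weight = {}
    for v, v2 in edges:
        # a directed triangle through (v, v2) needs some v3 with
        # (v2, v3) and (v3, v) also in E
        triangles = sum(1 for v3 in succ[v2] if (v3, v) in edge_set)
        weight[(v, v2)] = triangles + 1
    return weight

# Edges inside a 3-cycle get weight 2; a dangling edge keeps weight 1.
w = triangle_weights([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")])
```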
9. Step One: (Creating the Semantic Signatures)
[Figure: the same example subgraph (Football, Ball, Basketball, Field,
Sports, Court), now with the computed edge weights of 2 and 3 shown.]
10. Step One: (Creating the Semantic Signatures)
After assigning weights to each edge, perform a Random Walk with
Restart (RWR) to create the Semantic Signature: a set of highly related
vertices.
For a fixed number of steps, run a RWR from every vertex v of the
Semantic Network and keep track of the encountered vertices; eliminate
weakly related vertices, keeping only those items that were hit at least
η times.
Finally, return the remaining vertices as semSign_v: the Semantic
Signature of v.
11. Step One: (Creating the Semantic Signatures)
1: input: v, the starting vertex; α, the restart probability;
n, the number of steps to be executed; P, the transition probabilities;
η, the frequency threshold.
2: output: semSign_v, set of related vertices for v.
3: function RWR(v, α, n, P, η)
4: v' := v
5: counts := new Map<Synset, Integer>
6: while n > 0 do
7:   if random() > α then
8:     given the transition probabilities P(·|v')
9:     of v', choose a random neighbor v''
10:    v' := v''
11:    counts[v']++
12:  else
13:    restart the walk
14:    v' := v
15:  n := n − 1
16: for each v' in counts.keys() do
17:   if counts[v'] < η then
18:     remove v' from counts.keys()
19: return semSign_v = counts.keys()

The transition probabilities are the normalized edge weights:
P(v' | v) = weight(v, v') / Σ_{v'' ∈ V} weight(v, v'')
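The RWR pseudocode above can be sketched in Python as follows. The adjacency and weight maps are toy stand-ins, and the dead-end check on `neighbors.get(cur)` is an added safeguard not present in the pseudocode:

```python
import random
from collections import Counter

def rwr(v, alpha, n, neighbors, weight, eta):
    """Random Walk with Restart, following the pseudocode above.
    neighbors[u] lists u's successors; weight[(u, u')] is the
    triangle-based edge weight. At each step, restart with
    probability alpha; otherwise move to a random neighbor chosen
    with probability proportional to the edge weight."""
    counts = Counter()
    cur = v
    while n > 0:
        if random.random() > alpha and neighbors.get(cur):
            nexts = neighbors[cur]
            ws = [weight[(cur, u)] for u in nexts]
            cur = random.choices(nexts, weights=ws)[0]
            counts[cur] += 1
        else:
            cur = v  # restart the walk
        n -= 1
    # keep only vertices hit at least eta times: this is semSign_v
    return {u for u, c in counts.items() if c >= eta}
```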
12. Step Two: (Candidate Identification)
Using part-of-speech tagging, identify the set F of all textual fragments
which contain at least one noun and are substrings of lexicalizations in
BabelNet.
For each f ∈ F, look for candidate meanings cand(f): vertices
containing f or, only for named entities, a superstring of f as their
lexicalization.
Babelfy uses a loose candidate identification based on superstring
matching, instead of exact matching.
13. Step Two: (Candidate Identification)
Example, for the word “Sports”:
Candidates:
- Sports (a vertex containing f)
- Water sports (a vertex having a superstring of f as one of its
lexicalizations)
- ...
- Skateboarding, with senses {…, Extreme Sports, …} (a vertex having a
superstring of f as one of its lexicalizations)
- ...
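The loose matching can be sketched in Python over a toy lexicon. The synset ids, surface forms, and the `is_named_entity` predicate are all made up for illustration:

```python
def candidates(fragment, lexicalizations, is_named_entity):
    """Loose candidate identification: a vertex is a candidate for a
    fragment f if one of its lexicalizations matches f exactly, or
    (for named entities only) is a superstring of f."""
    cand = set()
    for synset, forms in lexicalizations.items():
        for form in forms:
            if form.lower() == fragment.lower():
                cand.add(synset)  # exact match
            elif is_named_entity(synset) and fragment.lower() in form.lower():
                cand.add(synset)  # superstring match, named entities only
    return cand

# Toy lexicon for the "Sports" example (synset ids are made up):
lex = {
    "Sports": ["sports"],
    "Water_sports": ["water sports"],
    "Skateboarding": ["skateboarding", "extreme sports"],
}
cands = candidates("Sports", lex, lambda s: s != "Sports")
```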
14. Step Three: (Candidate Disambiguation)
Create a directed graph G_I = (V_I, E_I) of the Semantic Interpretations
of the input text.
V_I contains all candidate meanings of all fragments:
V_I := {(v, f) : v ∈ cand(f), f ∈ F}
E_I connects two candidate meanings of different fragments if one is in
the semantic signature of the other:
add an edge from (v, f) to (v', f') iff f ≠ f' and v' ∈ semSign_v
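The graph construction can be sketched directly from these two definitions. The function name and the toy fragments/signatures below are illustrative:

```python
def build_interpretation_graph(fragments, cand, sem_sign):
    """Build the semantic interpretation graph G_I = (V_I, E_I):
    V_I pairs every candidate meaning with its fragment, and an edge
    (v, f) -> (v', f') exists iff f != f' and v' is in semSign_v."""
    V = [(v, f) for f in fragments for v in cand[f]]
    E = [((v, f), (v2, f2))
         for (v, f) in V for (v2, f2) in V
         if f != f2 and v2 in sem_sign[v]]
    return V, E

# Tiny example: "Tree" is in the semantic signature of "Leaf".
V, E = build_interpretation_graph(
    ["leaf", "tree"],
    {"leaf": ["Leaf"], "tree": ["Tree"]},
    {"Leaf": {"Tree"}, "Tree": set()},
)
```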
15. Step Three: (Candidate Disambiguation)
Once G_I (the graph representation of all the possible interpretations)
is created, apply the densest subgraph heuristic.
The result is a subgraph which contains those semantic interpretations
that are most coherent to each other. But this subgraph might still
contain multiple interpretations for the same fragment.
So, the final step is to select the most suitable candidate meaning for
each fragment f, given a threshold to discard semantically unrelated
candidate meanings.
16. Step Three: (Candidate Disambiguation)
1: input: F, the fragments in the input text; semSign, the semantic signatures;
µ, the ambiguity level to be reached; cand, from fragments to candidate meanings.
2: output: selected, the disambiguated fragments.
3: function DISAMB(F, semSign, µ, cand)
4: V_I := ∅; E_I := ∅
5: G_I := (V_I, E_I)
6: for each fragment f ∈ F do
7:   for each candidate v ∈ cand(f) do
8:     V_I := V_I ∪ {(v, f)}
9: for each ((v, f), (v', f')) ∈ V_I × V_I do
10:   if f ≠ f' and v' ∈ semSign_v then
11:     E_I := E_I ∪ {((v, f), (v', f'))}
12: G*_I := DENSSUB(F, cand, G_I, µ)
13: selected := new Map<String, Synset>
14: for each f ∈ F s.t. ∃(v, f) ∈ V*_I do
15:   cand*(f) := {v : (v, f) ∈ V*_I}
16:   v* := argmax_{v ∈ cand*(f)} score((v, f))
17:   if score((v*, f)) ≥ θ then
18:     selected(f) := v*
19: return selected
(DENSSUB is the function with the novel approach!)
17. Step Three: (Candidate Disambiguation)
Let's see an example:
“The leaf is falling from the tree on my head”
- “leaf” has many candidate meanings.
- “falling” also has many candidate meanings.
- “tree” also has many candidate meanings.
And, as you might have guessed...
- “head” also has many candidate meanings.
18. Step Three: “The leaf is falling from the tree on my head”
Generate a graph representation with all possible meanings. Each
candidate meaning (v, fragment) in cand(f) is listed with its semantic
signature semSign_v:
- (Leaf, leaf): Fall, Woods, Tree, Forest, Flora
- (Leaf (Book), leaf): Text, Side, Right, Left, Book, Novel
- (Nissan Leaf, leaf): Car, Motor, Vehicle, Japan, Tree
- (Leaf (Japanese Co.), leaf): Games, Visual Novel, Publisher
- (Leaf (Band), leaf): Music, Pop, Dutch, Falling (Song)
- (Fall, falling): Physics, Descend, Sky, High
- (Falling (Song), falling): Music, Alicia Keys, Album
- (Falling (Accident), falling): Pain, Hit, Push, Trauma
- (Falling (Movie), falling): Action, Hollywood, Cinema
- (Tree, tree): Nature, Fall, Earth, Oxygen, Leaf
- (Tree (Data Structure), tree): Leaf, Storage, Father, Son, Binary
- (Tree (Graph Theory), tree): Node, Euler, Binary, Math, Path
- (Tree (Album), tree): Music, Disc, Record, Rock
- (Head, head): Body, Anatomy, Falling (Accident)
- (Mind, head): Thoughts, Feelings, Reason
- (Leader, head): Guide, Group, Team, Boss
- (Header, head): Book, Text, Paragraph, Novel
19. Step Three: (Candidate Disambiguation)
Following the algorithm, create an edge between two vertices if and only
if they do not belong to the same fragment and one is part of the
Semantic Signature of the other.
20. Step Three: “The leaf is falling from the tree on my head”
[Figure: the same candidate meanings and semantic signatures as in
slide 18, now connected by the edges defined above.]
21. Step Three:
Apply the densest subgraph heuristic to obtain a subgraph which contains
those semantic interpretations that are most coherent to each other:
DENSSUB(F, cand, G_I, µ)
We'll come back to it later...
22. Step Three: “The leaf is falling from the tree on my head”
Let's assume this is the output of the blackbox: each remaining
candidate meaning with its semantic signature.
- (Leaf, leaf): Fall, Woods, Tree, Forest, Flora
- (Leaf (Band), leaf): Music, Pop, Dutch, Falling (Song)
- (Fall, falling): Physics, Descend, Sky, High
- (Falling (Accident), falling): Pain, Hit, Push, Trauma, Tree
- (Tree, tree): Nature, Root, Earth, Oxygen, Fall
- (Tree (Data Structure), tree): Leaf, Storage, Father, Son, Binary
- (Head, head): Body, Anatomy, Falling (Accident)
- (Header, head): Book, Text, Paragraph, Novel
23. Step Three:
Then we have to select the most suitable candidate meaning for each
fragment f. We use a given threshold θ to discard semantically unrelated
candidates.
For each fragment f, we compute the score of each candidate for that
fragment and keep those candidates whose score is higher than θ:
score((v, f)) = w(v,f) · deg((v, f)) / Σ_{v' ∈ cand(f)} w(v',f) · deg((v', f))
w(v,f) := |{f' ∈ F : ∃v' s.t. ((v, f), (v', f')) or ((v', f'), (v, f)) ∈ E_I}| / (|F| − 1)
deg((v, f)) is the overall number of incoming and outgoing edges:
deg(v) := deg+(v) + deg−(v)
24. Step Three:
In other words: we compute the score for each meaning by calculating its
normalized weighted degree.
Calculate the weight for the meaning, multiply it by its degree, and
divide it by the sum of the weighted degrees of all the candidates for
that fragment.
The weight is calculated as the fraction of fragments the candidate
meaning v connects to. In other words, count the number of fragments the
vertex v connects to and divide it by the number of fragments minus one.
(Fragments, not vertices: if the vertex v connects to v' and v'' and
they both belong to the same fragment, they count as one.)
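The scoring described above can be sketched in Python. The edge list below is a hand-picked subset of the slides' example graph, and the function names are illustrative:

```python
def weighted_degree(node, fragments, edges):
    """w(v,f) * deg((v,f)): the weight is the fraction of *fragments*
    (not vertices) the node connects to, over |F| - 1."""
    linked_fragments = set()
    deg = 0
    for a, b in edges:
        if a == node:
            deg += 1
            linked_fragments.add(b[1])  # fragment of the other endpoint
        if b == node:
            deg += 1
            linked_fragments.add(a[1])
    w = len(linked_fragments) / (len(fragments) - 1)
    return w * deg

def score(node, fragments, cand, edges):
    """Normalized weighted degree of a candidate meaning (v, f)."""
    f = node[1]
    total = sum(weighted_degree((u, f), fragments, edges) for u in cand[f])
    return weighted_degree(node, fragments, edges) / total if total else 0.0

# (Leaf, leaf) from the slides: 3 edges, touching 2 of the other
# 3 fragments, so w = 2/3, deg = 3, and it takes the whole score mass
# because (Leaf (Band), leaf) is disconnected.
frags = ["leaf", "falling", "tree", "head"]
edges = [
    (("Leaf", "leaf"), ("Tree", "tree")),
    (("Leaf", "leaf"), ("Fall", "falling")),
    (("Tree (Data Structure)", "tree"), ("Leaf", "leaf")),
]
s = score(("Leaf", "leaf"), frags, {"leaf": ["Leaf", "Leaf (Band)"]}, edges)
```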
25. Step Three: “The leaf is falling from the tree on my head”
[Figure: the graph from slide 22.]
Let's compute the weight of (Leaf, leaf): the number of fragments
(Leaf, leaf) is linked to, divided by the number of fragments minus one.
It is linked to the fragments “falling” and “tree”, so:
w((Leaf, leaf)) = 2 / (4 − 1) = 2/3
26. Step Three: “The leaf is falling from the tree on my head”
[Figure: the graph from slide 22.]
And the degree of (Leaf, leaf) is the number of incoming and outgoing
edges:
deg((Leaf, leaf)) = 3
27. Step Three:
For our example, the computed weights and degrees are in the next table:
[Table: weight and degree of each remaining candidate meaning.]
28. Step Three:
Now we can calculate the score for every candidate meaning:
For each candidate, multiply its weight by its degree (w·d).
Then, again for each candidate, divide w·d by the sum of all w·d for
that fragment.
For example, for (Leaf, leaf):
weight((Leaf, leaf)) = 2/3
degree((Leaf, leaf)) = 3
w·d = 2
Sum of all w·d for that specific fragment (leaf) = 2
score((Leaf, leaf)) = 2 / 2 = 1.000
30. Step Three:
Finally, we link each fragment with the highest-ranking candidate
meaning v* if its score is higher than the fixed threshold.
For our example, with a threshold of 0.7, we keep:
Leaf (plant)
Fall
Tree
Head (as body part)
Which is correct.
32. Densest Sub-Graph
This is an approach to drastically reduce the level of ambiguity of the
initial semantic interpretation graph.
It is based on the assumption that the most suitable meanings of each
text fragment will belong to the densest area of the graph.
Identifying the densest subgraph of size at least k is NP-hard, so
Babelfy uses a heuristic for k-partite graphs inspired by a
2-approximation greedy algorithm for arbitrary graphs.
Babelfy's strategy is based on the iterative removal of low-coherence
vertices.
33. Densest Sub-Graph
First, start with the initial semantic interpretation graph G_I^(0) at
step 0.
At each step, identify the most ambiguous fragment f_max (the one with
the maximum number of candidate meanings).
Then, discard the weakest interpretation of f_max. This is done by
determining the lexical and semantic coherence of each candidate meaning
using the score formula shown before.
The vertex with the minimum score is removed from the graph.
34. Densest Sub-Graph
Then, in the next step, repeat the low-coherence removal, and stop when
the number of remaining candidates for each fragment is at most the
ambiguity threshold µ.
During each iteration, compute the average degree of the current step's
graph, and keep as the densest subgraph of the initial semantic
interpretation graph the one that maximizes the average degree.
35. Densest Sub-Graph
1: input: F, the set of all fragments in the input text;
cand, from fragments to candidate meanings;
G_I^(0), the full semantic interpretation graph; µ, the ambiguity level to be reached.
2: output: G*_I, a dense subgraph.
3: function DENSSUB(F, cand, G_I^(0), µ)
4: t := 0
5: G*_I := G_I^(0)
6: while true do
7:   f_max := argmax_{f ∈ F} |{v : ∃(v, f) ∈ V_I^(t)}|
8:   if |{v : ∃(v, f_max) ∈ V_I^(t)}| ≤ µ then
9:     break
10:  v_min := argmin_{v ∈ cand(f_max)} score((v, f_max))
11:  V_I^(t+1) := V_I^(t) \ {(v_min, f_max)}
12:  E_I^(t+1) := E_I^(t) ∩ (V_I^(t+1) × V_I^(t+1))
13:  G_I^(t+1) := (V_I^(t+1), E_I^(t+1))
14:  if avgdeg(G_I^(t+1)) > avgdeg(G*_I) then
15:    G*_I := G_I^(t+1)
16:  t := t + 1
17: return G*_I
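The heuristic above can be sketched in Python over toy data. The `score_fn` hook, the degree-based toy score, and the graph below are illustrative; `cand` is kept only to mirror the pseudocode's signature, as the remaining candidates are read from V directly:

```python
def denssub(fragments, cand, V0, E0, mu, score_fn):
    """Densest-subgraph heuristic: repeatedly remove the lowest-scoring
    candidate of the most ambiguous fragment until every fragment has
    at most mu candidates, and return the intermediate graph with the
    highest average degree."""
    def avgdeg(V, E):
        # average of in-degree + out-degree over all vertices
        return 2 * len(E) / len(V) if V else 0.0
    V, E = list(V0), list(E0)
    best = (V, E)
    while True:
        # the most ambiguous fragment and its remaining ambiguity
        fmax = max(fragments, key=lambda f: sum(1 for _, g in V if g == f))
        if sum(1 for _, g in V if g == fmax) <= mu:
            break
        # the weakest interpretation of fmax
        vmin = min((node for node in V if node[1] == fmax),
                   key=lambda node: score_fn(node, V, E))
        V = [node for node in V if node != vmin]
        E = [(a, b) for a, b in E if a != vmin and b != vmin]
        if avgdeg(V, E) > avgdeg(*best):
            best = (V, E)
    return best

# Toy score: just the vertex degree in the current graph.
def deg_score(n, V, E):
    return sum(1 for a, b in E if a == n or b == n)

# Fragment "a" is ambiguous (mu = 1); the disconnected candidate
# (2, "a") has the lower score and is dropped.
Vb, Eb = denssub(
    ["a", "b"],
    {"a": [1, 2], "b": [3]},
    [(1, "a"), (2, "a"), (3, "b")],
    [((1, "a"), (3, "b")), ((3, "b"), (1, "a"))],
    1,
    deg_score,
)
```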
36. Links
Reference paper about Babelfy:
A. Moro, A. Raganato, R. Navigli. Entity Linking meets Word Sense
Disambiguation: a Unified Approach. Transactions of the Association for
Computational Linguistics (TACL), 2, pp. 231-244, 2014.
http://wwwusers.di.uniroma1.it/~navigli/pubs/TACL_2014_Babelfy.pdf
Babelfy website
http://babelfy.org/
BabelNet website
http://babelnet.org/
Grupo LaBDA
http://labda.inf.uc3m.es/