
Babelfy: Entity Linking meets Word Sense Disambiguation.

Babelfy is a unified, multilingual, graph-based approach to Entity Linking and Word Sense Disambiguation. This presentation is an explanation of the algorithm used by Babelfy.



  1. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Paper by: Andrea Moro, Alessandro Raganato, Roberto Navigli, Dipartimento di Informatica, Sapienza Università di Roma. Presentation by: Antonio Quirós, Grupo LaBDA (Laboratorio de Bases de Datos Avanzadas), Universidad Carlos III de Madrid.
  2. Babelfy is a unified, multilingual, graph-based approach to Entity Linking and Word Sense Disambiguation, based on a loose identification of candidate meanings coupled with a densest-subgraph heuristic which selects high-coherence semantic interpretations. Babelfy is based on the BabelNet 3.0 multilingual semantic network and jointly performs disambiguation and entity linking.
  3. EL &amp; WSD. Entity Linking: discovering mentions of entities within a text and linking them to a Knowledge Base. Word Sense Disambiguation: assigning meanings to word occurrences within a text. Babelfy combines Entity Linking and Word Sense Disambiguation.
  4. Unlike WSD, Babelfy allows overlapping fragments of text, e.g. “Major League Baseball”: it identifies and disambiguates several nominal and entity mentions: “Major League Baseball”, “Major League”, “League”, “Baseball”. Unlike EL, it links not only named entity mentions (“Major League Baseball”) but also nominal mentions (“Major League”) to their corresponding meaning in the Knowledge Base.
  5. Babelfy's approach in three steps. One: associate each vertex of the Semantic Network with a Semantic Signature, a set of highly related vertices (performed only once). Two: given an input text, extract all the linkable fragments (each either a concept or a named entity) and, for each fragment, list the possible meanings according to the Semantic Network. Three: create a graph-based semantic interpretation of the whole text by linking the candidate meanings of the fragments using the Semantic Signatures created in the first step; then extract a dense subgraph of this representation (the novel approach!) and select the best candidate meaning for each fragment.
  6. Step One: (Creating the Semantic Signatures) Assign a higher weight to edges which are involved in more densely connected areas. This is accomplished by using “directed triangles” (cycles of length 3), weighting each edge by the number of triangles it occurs in.
  7. Step One: (Creating the Semantic Signatures) weight(v, v') := |{(v, v', v'') : (v, v'), (v', v''), (v'', v) ∈ E}| + 1 (Figure: an example graph with vertices Football, Ball, Basketball, Field, Sports, Court.)
  8. Step One: (Creating the Semantic Signatures) weight(Football, Sports) = |{(Football, Sports, Ball), (Football, Sports, Field)}| + 1 = 2 + 1 = 3 (Figure: the same example graph, highlighting the two triangles through Ball and Field.)
  9. Step One: (Creating the Semantic Signatures) (Figure: the example graph with vertices Football, Ball, Basketball, Field, Sports, Court and the computed edge weights of 2 and 3.)
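The triangle-based weighting above can be sanity-checked with a few lines of Python. This is only a sketch: the graph data is a made-up, bidirectional version of the slides' Football/Sports example, and `triangle_weight` is a hypothetical helper name.

```python
from itertools import chain

# Toy, bidirectional version of the slides' example graph (invented edges).
pairs = [("Football", "Sports"), ("Football", "Ball"), ("Sports", "Ball"),
         ("Football", "Field"), ("Sports", "Field"), ("Basketball", "Court"),
         ("Basketball", "Ball"), ("Basketball", "Sports")]
edges = set(chain.from_iterable(((a, b), (b, a)) for a, b in pairs))

def triangle_weight(edges, v, v2):
    """weight(v, v') = |{v'' : (v, v'), (v', v''), (v'', v) in E}| + 1,
    i.e. the number of directed triangles the edge (v, v') occurs in, plus one."""
    if (v, v2) not in edges:
        return 0
    nodes = {x for e in edges for x in e}
    triangles = sum(1 for w in nodes if (v2, w) in edges and (w, v) in edges)
    return triangles + 1

print(triangle_weight(edges, "Football", "Sports"))  # 2 triangles (Ball, Field) + 1 = 3
```

With this toy data the Football-Sports edge closes triangles through Ball and Field, reproducing the weight of 3 from the slide.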
  10. Step One: (Creating the Semantic Signatures) After assigning weights to each edge, perform a Random Walk with Restart (RWR) to create the Semantic Signature: a set of highly related vertices. For a fixed number of steps, run an RWR from every vertex v of the Semantic Network and keep track of the encountered vertices; eliminate weakly related vertices, keeping only those items that were hit at least η times. Finally, return the remaining vertices as SemSignv: the Semantic Signature of v.
  11. Step One: (Creating the Semantic Signatures)
      1: input: v, the starting vertex; α, the restart probability; n, the number of steps to be executed; P, the transition probabilities; η, the frequency threshold.
      2: output: semSignv, the set of related vertices for v.
      3: function RWR(v, α, n, P, η)
      4:   v' := v
      5:   counts := new Map&lt;Synset, Integer&gt;
      6:   while n &gt; 0 do
      7:     if random() &gt; α then
      8:       given the transition probabilities P(·|v') of v', choose a random neighbor v''
      9:       v' := v''
     10:       counts[v']++
     11:     else
     12:       restart the walk: v' := v
     13:     n := n − 1
     14:   for each v' in counts.keys() do
     15:     if counts[v'] &lt; η then remove v' from counts.keys()
     16:   return semSignv := counts.keys()
      where P(v' | v) = weight(v, v') / Σ_{v'' ∈ V} weight(v, v'')
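The RWR pseudocode translates almost line by line into Python. This is a minimal sketch: the function name and the representation of `P` (a dict mapping each vertex to its neighbors' transition probabilities) are my own choices.

```python
import random

def rwr(v, alpha, n, P, eta):
    """Random Walk with Restart: run n steps from v, restarting with
    probability alpha; return the set of vertices hit at least eta
    times (the semantic signature of v)."""
    counts = {}
    cur = v
    while n > 0:
        if random.random() > alpha:
            # Choose a random neighbor according to P(.|cur).
            neighbors = list(P[cur])
            weights = [P[cur][u] for u in neighbors]
            cur = random.choices(neighbors, weights=weights)[0]
            counts[cur] = counts.get(cur, 0) + 1
        else:
            cur = v  # restart the walk
        n -= 1
    return {u for u, c in counts.items() if c >= eta}

# Deterministic toy check: a two-node cycle, no restarts (alpha = 0),
# so in 10 steps each node is hit exactly 5 times.
P = {"a": {"b": 1.0}, "b": {"a": 1.0}}
print(rwr("a", 0.0, 10, P, 5))
```

In practice the walk is run from every vertex of the semantic network once, offline, which is why the slides stress that Step One is performed only once.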
  12. Step Two: (Candidate Identification) Using part-of-speech tagging, identify the set F of all textual fragments that contain at least one noun and are substrings of lexicalizations in BabelNet. For each f ∈ F, look for candidate meanings, cand(f): vertices containing f or, only for named entities, a superstring of f as their lexicalization. Babelfy uses a loose candidate identification based on superstring matching, instead of exact matching.
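The loose candidate identification can be sketched as a substring check over a sense inventory. The inventory below is a made-up mini-example mirroring the “Sports” case on the following slide; `candidates` and the data layout are assumptions, not Babelfy's actual API.

```python
def candidates(fragment, lexicalizations, named_entities):
    """Loose candidate identification: a sense is a candidate for a
    fragment if one of its lexicalizations equals the fragment or,
    for named entities only, contains the fragment as a substring."""
    cands = set()
    for sense, lexes in lexicalizations.items():
        for lex in lexes:
            if lex == fragment or (sense in named_entities and fragment in lex):
                cands.add(sense)
    return cands

# Invented mini-inventory: sense -> list of lexicalizations.
lexicalizations = {
    "Sports": ["sports"],
    "Water sports": ["water sports"],
    "Skateboarding": ["skateboarding", "extreme sports"],
}
named_entities = {"Water sports", "Skateboarding"}
print(candidates("sports", lexicalizations, named_entities))
```

Exact matching alone would return only the Sports sense; the superstring rule also pulls in Water sports and Skateboarding (via “extreme sports”), which is exactly the looseness the slide describes.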
  13. Step Two: (Candidate Identification) Example: for the word “Sports”, the candidates include Sports, Water sports, ... (vertices containing the fragment) and Skateboarding {…, Extreme Sports, …}, ... (vertices having a superstring of the fragment as one of their lexicalizations / senses).
  14. Step Three: (Candidate Disambiguation) Create a directed graph GI = (VI, EI) of the semantic interpretations of the input text. VI contains all candidate meanings of all fragments: VI := {(v, f) : v ∈ cand(f), f ∈ F}. EI connects two candidate meanings of different fragments if one is in the semantic signature of the other: add an edge from (v, f) to (v', f') iff f ≠ f' and v' ∈ semSignv.
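The construction of GI can be sketched directly from those two definitions. The toy fragments, candidates, and signatures below are invented for illustration.

```python
from itertools import product

def build_interpretation_graph(fragments, cand, sem_sign):
    """Build the semantic interpretation graph G_I: vertices are
    (meaning, fragment) pairs; there is an edge (v, f) -> (v', f')
    iff f != f' and v' belongs to semSign(v)."""
    V = [(v, f) for f in fragments for v in cand[f]]
    E = [(a, b) for a, b in product(V, V)
         if a[1] != b[1] and b[0] in sem_sign.get(a[0], set())]
    return V, E

# Invented toy data: two fragments; Leaf's signature contains Tree
# and vice versa, while Leaf (Band) is unrelated to everything.
fragments = ["leaf", "tree"]
cand = {"leaf": ["Leaf", "Leaf (Band)"], "tree": ["Tree"]}
sem_sign = {"Leaf": {"Tree"}, "Tree": {"Leaf"}}
V, E = build_interpretation_graph(fragments, cand, sem_sign)
print(E)  # edges in both directions between (Leaf, leaf) and (Tree, tree)
```

Note that the edge rule only links candidates of *different* fragments, so competing meanings of the same fragment never support each other.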
  15. Step Three: (Candidate Disambiguation) Once GI (the graph representation of all the possible interpretations) is created, apply the densest-subgraph heuristic. The result is a sub-graph which contains those semantic interpretations that are most coherent with each other. But this sub-graph might still contain multiple interpretations for the same fragment, so the final step is to select the most suitable candidate meaning for each fragment f, given a threshold to discard semantically unrelated candidate meanings.
  16. Step Three: (Candidate Disambiguation)
      1: input: F, the fragments in the input text; semSign, the semantic signatures; µ, the ambiguity level to be reached; cand, a mapping from fragments to candidate meanings.
      2: output: selected, the disambiguated fragments.
      3: function DISAMB(F, semSign, µ, cand)
      4:   VI := ∅; EI := ∅
      5:   GI := (VI, EI)
      6:   for each fragment f ∈ F do
      7:     for each candidate v ∈ cand(f) do
      8:       VI := VI ∪ {(v, f)}
      9:   for each ((v, f), (v', f')) ∈ VI × VI do
     10:     if f ≠ f' and v' ∈ semSignv then
     11:       EI := EI ∪ {((v, f), (v', f'))}
     12:   G*I := DENSSUB(F, cand, GI, µ)   ← the function with the novel approach!
     13:   selected := new Map&lt;String, Synset&gt;
     14:   for each f ∈ F s.t. ∃(v, f) ∈ V*I do
     15:     cand*(f) := {v : (v, f) ∈ V*I}
     16:     v* := argmax_{v ∈ cand*(f)} score((v, f))
     17:     if score((v*, f)) ≥ θ then
     18:       selected(f) := v*
     19:   return selected
  17. Step Three: (Candidate Disambiguation) Let's see an example: “The leaf is falling from the tree on my head”. Leaf has many candidate meanings; falling also has many candidate meanings; tree also has many candidate meanings. And, as you might have guessed, head also has many candidate meanings.
  18. Step Three: “The leaf is falling from the tree on my head” (Generate a graph representation with all possible meanings.) For each fragment, the candidate meanings cand(f) with their semantic signatures SemSignv:
      leaf: (Leaf, leaf) {Fall, Woods, Tree, Forest, Flora, Fall}; (Leaf (Book), leaf) {Text, Side, Right, Left, Book, Novel}; (Nissan Leaf, leaf) {Car, Motor, Vehicle, Japan, Tree}; (Leaf (Japanese Co.), leaf) {Games, Visual Novel, Publisher}; (Leaf (Band), leaf) {Music, Pop, Dutch, Falling (Song)}
      falling: (Fall, falling) {Physics, Descend, Sky, High}; (Falling (Song), falling) {Music, Alicia Keys, Album}; (Falling (Accident), falling) {Pain, Hit, Push, Trauma}; (Falling (Movie), falling) {Action, Hollywood, Cinema}
      tree: (Tree, tree) {Nature, Fall, Earth, Oxygen, Leaf}; (Tree (Data Structure), tree) {Leaf, Storage, Father, Son, Binary}; (Tree (Graph Theory), tree) {Node, Euler, Binary, Math, Path}; (Tree (Álbum), tree) {Music, Disc, Record, Rock}
      head: (Head, head) {Body, Anatomy, Falling (Accident)}; (Mind, head) {Thoughts, Feelings, Reason}; (Leader, head) {Guide, Group, Team, Boss}; (Header, head) {Book, Text, Paragraph, Novel}
  19. Step Three: (Candidate Disambiguation) Following the algorithm, create an edge between two vertices if and only if they do not belong to the same fragment and one is part of the Semantic Signature of the other.
  20. Step Three: “The leaf is falling from the tree on my head” (Figure: the same candidate meanings and semantic signatures as on slide 18, now drawn as a graph with the edges added: two candidates from different fragments are connected when one appears in the other's semantic signature.)
  21. Step Three: Apply the densest sub-graph heuristic to obtain a sub-graph which contains those semantic interpretations that are most coherent with each other: DENSSUB(F, cand, GI, µ). We'll come back to it later...
  22. Step Three: “The leaf is falling from the tree on my head” Let's assume this is the output of the black box (the graph representation with the surviving meanings):
      head: (Head, head) {Body, Anatomy, Falling (Accident)}; (Header, head) {Book, Text, Paragraph, Novel}
      falling: (Fall, falling) {Physics, Descend, Sky, High}; (Falling (Accident), falling) {Pain, Hit, Push, Trauma, Tree}
      tree: (Tree, tree) {Nature, Root, Earth, Oxygen, Fall}; (Tree (Data Structure), tree) {Leaf, Storage, Father, Son, Binary}
      leaf: (Leaf, leaf) {Fall, Woods, Tree, Forest, Flora, Fall}; (Leaf (Band), leaf) {Music, Pop, Dutch, Falling (Song)}
  23. Step Three: Then we have to select the most suitable candidate meaning for each fragment f. We use a given threshold θ to discard semantically unrelated candidates. For each fragment f, we compute the score of each candidate for that fragment and keep those candidates whose score is higher than θ.
      score((v, f)) = w(v,f) · deg((v, f)) / Σ_{v' ∈ cand(f)} w(v',f) · deg((v', f))
      w(v,f) := |{f' ∈ F : ∃v' s.t. ((v, f), (v', f')) ∈ EI or ((v', f'), (v, f)) ∈ EI}| / (|F| − 1)
      deg(v) := deg+(v) + deg−(v), the overall number of incoming and outgoing edges.
  24. Step Three: In other words: we compute the score for each meaning by calculating its normalized weighted degree. Calculate the weight for the meaning, multiply it by its degree, and divide it by the sum of the same quantity over all the candidates for that fragment. The weight is calculated as the fraction of fragments the candidate meaning v connects to: count the number of fragments the vertex v connects to and divide it by the number of fragments minus one. Fragments, not vertices: if the vertex v connects to v' and v'' and they both belong to the same fragment, they count as one.
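The scoring rule above can be written compactly in Python. This is a sketch under my own data layout (edges as directed pairs of (meaning, fragment) nodes); the toy graph below is invented and deliberately tiny, not the slides' full example.

```python
def score(v, f, cand, fragments, edges):
    """Normalized weighted degree of candidate (v, f): its weight times
    its degree, divided by the same quantity summed over all candidates
    of fragment f."""
    def deg(node):
        # Total number of incoming plus outgoing edges.
        return sum(1 for a, b in edges if node in (a, b))
    def weight(node):
        # Fraction of *other fragments* this node is connected to.
        linked = {b[1] for a, b in edges if a == node} | \
                 {a[1] for a, b in edges if b == node}
        return len(linked) / (len(fragments) - 1)
    wd = {u: weight((u, f)) * deg((u, f)) for u in cand[f]}
    total = sum(wd.values())
    return wd[v] / total if total else 0.0

# Invented toy graph: (Leaf, leaf) is linked to two of the three other
# fragments; its competitor (Leaf (Band), leaf) is linked to none.
fragments = ["leaf", "falling", "tree", "head"]
cand = {"leaf": ["Leaf", "Leaf (Band)"]}
edges = [(("Leaf", "leaf"), ("Tree", "tree")),
         (("Tree", "tree"), ("Leaf", "leaf")),
         (("Leaf", "leaf"), ("Fall", "falling"))]
print(score("Leaf", "leaf", cand, fragments, edges))  # 1.0
```

Here (Leaf, leaf) has weight 2/3 (it touches the tree and falling fragments out of three others) and degree 3, while its competitor scores zero, so after normalization (Leaf, leaf) takes the entire score mass for the leaf fragment.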
  25. Step Three: “The leaf is falling from the tree on my head” (using the graph above) Let's compute the weight of (Leaf, leaf): the number of fragments (Leaf, leaf) is linked to, divided by the number of fragments minus one: w((Leaf, leaf)) = |{Fall, Tree}| / (4 − 1) = 2/3
  26. Step Three: “The leaf is falling from the tree on my head” (using the same graph) And the degree of (Leaf, leaf) is the number of incoming and outgoing edges: deg((Leaf, leaf)) = 3
  27. Step Three: For our example, the computed weights and degrees are shown in a table on the slide (not reproduced in this transcript).
  28. Step Three: Now we can calculate the score for every candidate meaning: for each candidate, multiply its weight by its degree (w·d); then, again for each candidate, divide w·d by the sum of all w·d for that fragment. For example, (Leaf, leaf): weight((Leaf, leaf)) = 2/3 and degree((Leaf, leaf)) = 4, so w·d = 8/3. The sum of all w·d for that specific fragment (leaf) is 8/3, so score((Leaf, leaf)) = (8/3) / (8/3) = 1.000
  29. Step Three: For our example, the computed scores are shown in a table on the slide (not reproduced in this transcript).
  30. Step Three: Finally, we link each fragment with the highest-ranking candidate meaning v* if its score is higher than the fixed threshold. For our example, with a threshold of 0.7 we keep: Leaf (plant), Fall, Tree, Head (as body part). Which is correct.
  31. Densest Sub-Graph DENSSUB(F, cand, GI, µ) Back to the black box!
  32. Densest Sub-Graph This is an approach to drastically reduce the level of ambiguity of the initial semantic interpretation graph. It is based on the assumption that the most suitable meanings of each text fragment will belong to the densest area of the graph. Identifying the densest sub-graph of size at least k is NP-hard, so Babelfy uses a heuristic for k-partite graphs inspired by a 2-approximation greedy algorithm for arbitrary graphs. Babelfy's strategy is based on the iterative removal of low-coherence vertices.
  33. Densest Sub-Graph First, start with the initial semantic interpretation graph GI(0) at step 0. At each step, identify the most ambiguous fragment fmax (the one with the maximum number of candidate meanings). Then, discard the weakest interpretation of the current fragment fmax. This is done by determining the lexical and semantic coherence of each candidate meaning using the score formula shown before. The vertex with the minimum score is removed from the graph.
  34. Densest Sub-Graph Then, in the next step, repeat the low-coherence removal, and stop when the number of remaining candidates for each fragment is below a threshold. During each iteration, compute the average degree of the current graph, and keep as the densest subgraph of the initial semantic interpretation graph the one that maximizes the average degree.
  35. Densest Sub-Graph
      1: input: F, the set of all fragments in the input text; cand, a mapping from fragments to candidate meanings; G(0)I, the full semantic interpretation graph; µ, the ambiguity level to be reached.
      2: output: G*I, a dense subgraph.
      3: function DENSSUB(F, cand, G(0)I, µ)
      4:   t := 0
      5:   G*I := G(0)I
      6:   while true do
      7:     fmax := argmax_{f ∈ F} |{v : ∃(v, f) ∈ V(t)I}|
      8:     if |{v : ∃(v, fmax) ∈ V(t)I}| ≤ µ then
      9:       break
     10:     vmin := argmin_{v ∈ cand(fmax)} score((v, fmax))
     11:     V(t+1)I := V(t)I \ {(vmin, fmax)}
     12:     E(t+1)I := E(t)I ∩ (V(t+1)I × V(t+1)I)
     13:     G(t+1)I := (V(t+1)I, E(t+1)I)
     14:     if avgdeg(G(t+1)I) &gt; avgdeg(G*I) then
     15:       G*I := G(t+1)I
     16:     t := t + 1
     17:   return G*I
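The DENSSUB loop can be sketched in Python as follows. Assumptions: the graph is passed as (nodes, edges) with nodes being (meaning, fragment) pairs; the internal `coherence` helper uses the *unnormalized* weighted degree, since the per-fragment normalization in the score formula does not change which candidate is the argmin; the toy data is invented.

```python
def denssub(fragments, graph, mu):
    """Sketch of the DENSSUB heuristic: repeatedly drop the lowest-
    coherence candidate of the most ambiguous fragment until every
    fragment has at most mu candidates, keeping the intermediate
    subgraph with the highest average degree."""
    V, E = set(graph[0]), set(graph[1])

    def avgdeg(vs, es):
        # Each directed edge contributes one out-degree and one in-degree.
        return 2 * len(es) / len(vs) if vs else 0.0

    def coherence(node, es):
        deg = sum(1 for a, b in es if node in (a, b))
        linked = {b[1] for a, b in es if a == node} | \
                 {a[1] for a, b in es if b == node}
        return len(linked) * deg  # unnormalized weighted degree

    best_V, best_E = set(V), set(E)
    while True:
        counts = {f: sum(1 for _, fr in V if fr == f) for f in fragments}
        f_max = max(counts, key=counts.get)
        if counts[f_max] <= mu:
            break
        v_min = min((n for n in V if n[1] == f_max),
                    key=lambda n: coherence(n, E))
        V.discard(v_min)
        E = {(a, b) for a, b in E if v_min not in (a, b)}
        if avgdeg(V, E) > avgdeg(best_V, best_E):
            best_V, best_E = set(V), set(E)
    return best_V, best_E

# Invented toy run: the unconnected (Leaf (Band), leaf) gets pruned.
nodes = [("Leaf", "leaf"), ("Leaf (Band)", "leaf"), ("Tree", "tree")]
edges = [(("Leaf", "leaf"), ("Tree", "tree")),
         (("Tree", "tree"), ("Leaf", "leaf"))]
best_V, _ = denssub(["leaf", "tree"], (nodes, edges), mu=1)
print(best_V)
```

Removing the isolated candidate raises the average degree (two edges over two vertices instead of three), so the pruned graph is kept as the densest subgraph, matching the heuristic's intent.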
  36. Links
      Reference paper about Babelfy: A. Moro, A. Raganato, R. Navigli. Entity Linking meets Word Sense Disambiguation: a Unified Approach. Transactions of the Association for Computational Linguistics (TACL), 2, pp. 231-244, 2014. http://wwwusers.di.uniroma1.it/~navigli/pubs/TACL_2014_Babelfy.pdf
      Babelfy website: http://babelfy.org/
      BabelNet website: http://babelnet.org/
      Grupo LaBDA: http://labda.inf.uc3m.es/
