Broad-coverage knowledge bases (KBs) such as Wikipedia, Freebase, Microsoft's Satori and Google's Knowledge Graph contain structured data describing real-world entities.
These data sources have become increasingly important for a wide range of intelligent systems: from information retrieval and question answering, to Facebook's Graph Search, IBM's Watson, and more.
Previous work on learning to populate knowledge bases from text has, for the most part, made the simplifying assumption that facts remain constant over time.
But this assumption is inaccurate: we live in a rapidly changing world.
Knowledge should not be viewed as a static snapshot, but instead as a rapidly evolving set of facts that must change as the world changes.
In this paper we demonstrate the feasibility of accurately identifying entity transition events, from real-time news and social media text streams, that drive changes to a knowledge base.
We use Wikipedia's edit history as distant supervision to learn event extractors, and evaluate the extractors based on their ability to predict online updates.
Our weakly supervised event extractors are able to predict 10 KB revisions per month at 0.8 precision. By lowering our confidence threshold, we can suggest 34.3 correct edits per month at 0.4 precision.
64% of predicted edits were detected before they were added to Wikipedia. The average lead time of our forecasted knowledge revisions over Wikipedia's editors is 40 days, demonstrating the utility of our method for suggesting edits that can be quickly verified and added to the knowledge graph.
1. Learning to Extract Events from Knowledge Base Revisions
Alexander Konovalov
Ohio State University
konovalov.2@osu.edu
Benjamin Strauss
Ohio State University
strauss.105@osu.edu
Alan Ritter
Ohio State University
ritter.1492@osu.edu
Brendan O'Connor
University of Massachusetts Amherst
brenocon@cs.umass.edu
2. Knowledge Bases
Some notable examples:
● Google Knowledge Graph
● Microsoft Satori KB
● Wolfram Alpha
● Wikidata
● DBPedia
● Wikipedia Infoboxes
● Freebase
4. Extracting KBs from Text
Prior work assumes static text corpora and knowledge bases:
● NELL [AAAI 2015]
● Mintz et al. [ACL 2009]
● DeepDive [VLDS 2012]
● Knowledge Vault [KDD 2014]
12. 50% of deaths updated within a couple of days
... but for scientists it takes 31 days to reach 50% coverage!*
Manual editing cannot scale up!
* https://research.googleblog.com/2013/05/distributing-edit-history-of-wikipedia.html
Goal: extract KB revisions from text as soon as public knowledge is available
20. In a nutshell: formally.
● An event is a quadruple of a predicate, two entities, and a time (sketched below).
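A minimal sketch of this representation; the type and the example values are illustrative assumptions, not the authors' code:

```python
from collections import namedtuple
from datetime import date

# An event: (predicate, arg1, arg2, time).
Event = namedtuple("Event", ["predicate", "arg1", "arg2", "time"])

# Hypothetical example: a player joining a team.
e = Event(predicate="CurrentTeam",
          arg1="Jane Doe",        # subject entity
          arg2="Example FC",      # object entity
          time=date(2015, 4, 7))  # when the event happened
```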
21. In a nutshell: formally.
● Features are collected from tweets written before time t (see the sketch below).
[Timeline: the feature window precedes the prediction time t]
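A sketch of that feature window, assuming tweets carry a timestamp; the function name, field name, and window length are assumptions, not the paper's API:

```python
from datetime import timedelta

def tweets_in_feature_window(tweets, t, window=timedelta(days=10)):
    """Keep only tweets written before prediction time t and
    inside the feature window [t - window, t)."""
    return [tw for tw in tweets if t - window <= tw["created_at"] < t]
```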
22. In a nutshell: formally.
● A bag of Mintz et al.-style features with learned weights (sketch after the table):

Attribute    Feature                          Weight
State        arg1 wins in arg2                0.333
State        cnn projects arg1 wins IN arg2   0.115
CurrentTeam  arg1 to arg2                     1.291
CurrentTeam  arg1 joins arg2                  0.513
CurrentTeam  arg1 signs...with arg2           0.419
DeathPlace   arg1 found dead in arg2          0.269
DeathPlace   arg1 found JJ IN arg2            0.260
DeathPlace   arg1 killed in arg2              0.195
DeathPlace   arg1 memorial...in arg2          0.125
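To show how one such lexical feature can be read off a tokenized tweet, here is a simplified stand-in for Mintz-style feature extraction (the function is an assumption, not the authors' code):

```python
def between_pattern(tokens, i1, i2):
    """Build a feature like 'arg1 joins arg2' from the words between
    the two entity mentions at token indices i1 and i2."""
    lo, hi = sorted((i1, i2))
    middle = " ".join(tokens[lo + 1:hi])
    return f"arg1 {middle} arg2" if i1 < i2 else f"arg2 {middle} arg1"

# between_pattern(["Smith", "joins", "Arsenal"], 0, 2) -> "arg1 joins arg2"
```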
23. In a nutshell: formally.
● An L₂-regularized log-linear model; optimize the conditional likelihood with respect to θ (one standard form below).
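Assuming features f(x, y) and weights θ (the notation is mine, not copied from the paper), the objective takes the usual form:

```latex
\max_{\theta}\; \sum_{i} \log p(y_i \mid x_i; \theta) \;-\; \lambda \lVert \theta \rVert_2^2,
\qquad
p(y \mid x; \theta) = \frac{\exp\big(\theta^{\top} f(x, y)\big)}{\sum_{y'} \exp\big(\theta^{\top} f(x, y')\big)}
```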
24. Challenge: Non-Semantic Edits
| spouse = Soumaya Domit (1967–1999; her death); 6 children
| spouse = Soumaya Domit <small>(1967-1999; her death)</small>
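One plausible way to flag such formatting-only edits is to compare field values after stripping markup; this sketch is illustrative (the regexes and names are assumptions):

```python
import re

def normalize(value):
    value = re.sub(r"</?\w+[^>]*>", "", value)  # drop HTML tags such as <small>
    value = value.replace("\u2013", "-")        # unify en dashes and hyphens
    return re.sub(r"\s+", " ", value).strip()   # collapse whitespace

def is_non_semantic_edit(old_value, new_value):
    return normalize(old_value) == normalize(new_value)
```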
26. Challenge: Edits After Event
Revision as of 17:27, 7 April 2015:
➕ | death_date = {{death date and age|2015|02|03|1978|05|29}}
➕ | death_place = [[Valhalla, New York|Valhalla, NY]], U.S.
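The date of the underlying event can be read out of the template itself, so a revision's timestamp (7 April) can be compared against the actual death date (3 February). A sketch, with the regex and function name as assumptions:

```python
import re
from datetime import date

def death_date(infobox_value):
    """Extract the death date from a {{death date and age}} template."""
    m = re.search(r"\{\{death date and age\s*\|(\d{4})\|(\d{1,2})\|(\d{1,2})",
                  infobox_value)
    return date(*map(int, m.groups())) if m else None

# death_date("{{death date and age|2015|02|03|1978|05|29}}")
# -> date(2015, 2, 3), two months before the revision above.
```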
33. Experiments: datasets.
● Intuition: samples near the time of a revision are likely to mention an event that causes the change.
[Diagram: annotated samples; named-entity matching yields matched samples, which split into aligned and unaligned samples]
34. Experiments: datasets.
● Intuition: samples near the time… Use temporal alignment (-10...+3 days; see the sketch below).
[Diagram: as above, with matched samples split by temporal alignment into aligned and unaligned samples]
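A sketch of that alignment test, assuming the window is measured relative to the revision time (the helper name is an assumption):

```python
from datetime import timedelta

def temporally_aligned(sample_time, revision_time,
                       before=timedelta(days=10), after=timedelta(days=3)):
    """True if the sample falls within -10...+3 days of the revision."""
    return revision_time - before <= sample_time <= revision_time + after
```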
35. Experiments: baseline.
● Baseline: a traditional relation extraction system that uses no temporal information.
[Diagram: the same pipeline as the previous slide]
41. Experiments: evaluation.
● Inter-annotator agreement (Fleiss' kappa; formula below):
○ 0.64 on Twitter
○ 0.30 on Gigaword
Context matters:
Much of Wednesday was about pomp and circumstance for Rand and Ron Paul. They were sworn in, [...] But the newly minted senator from Kentucky also tended to some business [...]
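For reference, Fleiss' kappa measures agreement beyond chance across multiple annotators:

```latex
% \bar{P}: mean observed pairwise agreement across items;
% \bar{P}_e: agreement expected by chance.
\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}
```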
47. Conclusions
● Revisions are a source of distant supervision for event extraction.
● Use temporal and surface matching to identify reliable infobox edits corresponding to real-world events.
● Automate KB updates by learning predictors from past KB updates.
48. Conclusions
● We generate on average 34.3 edits per month with high precision for 6 attributes.
● We often beat human KB contributors on recall (64% of predicted edits detected before Wikipedia) and lead time (40 days).
50. Future Work
● Align edits to the actual events (which occur with an offset) using a latent-variable alignment model.
51. Q & A
Learning to Extract Events from Knowledge Base Revisions
You can reach me via konovalov.2@osu.edu.
Code and data will be available soon at github.com/alexknvl/dsup.