1. What knowledge bases know
(and what they don't)
Simon Razniewski
Free University of Bozen-Bolzano, Italy
Max Planck Institute for Informatics
(starting November 2017)
2. About myself
• Assistant professor at FU Bozen-Bolzano, South Tyrol, Italy (since 2014)
• PhD from FU Bozen-Bolzano (2014)
• Diplom from TU Dresden, Germany (2010)
• Research visits at UCSD (2012), AT&T Labs-Research (2013),
UQ (2015), MPII (2016)
(Slide images about South Tyrol: a trilingual region, site of the Alps’ oldest criminal case – Ötzi, and grower of 1/8th of the EU’s apples)
3. What do knowledge bases know?
What is a knowledge base?
A collection of general world knowledge
• Common sense:
• Apples are sweet or sour,
• Cats are smaller than cars
• Activities:
• “whisper” and “shout” are implementations of “talk”
• Facts:
• Saarbrücken is the capital of the Saarland
• Ötzi has blood type O
4. Factual KBs: An old dream of AI
• Early manual efforts (CYC, 1980s)
• Structured extraction (YAGO, DBpedia, 2000s)
• Text mining and extraction (NELL, Prospera,
TextRunner, 2000s)
• Back to the roots: Wikidata (2012)
6. KBs are useful (1/2): QA
Q: What is the capital of the Saarland?
Try yourself:
• When was Trump born?
• What is the nickname of Ronaldo?
• Who invented the light bulb?
7. KBs are useful (2/2): Language Generation
• Wikipedia in the world’s most spoken language:
1/10 as many articles as the English Wikipedia
• In the world’s fourth most spoken language: 1/100
Wikidata is intended to help
resource-poor languages
8. KB construction: Current state
• More than 2300 papers with titles containing
“information extraction” in the last 4 years [Google Scholar]
• Large KBs at Google, Microsoft, Alibaba, Bloomberg, …
• Progress visible downstream
• IBM Watson beats humans in trivia game in 2011
• Entity linking systems close to human performance on
popular news corpora
• Systems pass 8th grade science tests
in the AllenAI Science challenge in 2016
• But how good are the KBs themselves?
9. How good are the KBs that we build?
Is what they know true?
(precision or correctness)
Do they know what is true?
(recall or completeness)
10. KBs know much of what is true
• Google Knowledge Graph: 39 out of 48 Tarantino movies
• DBpedia: 167 out of 204 Nobel laureates in Physics
• Wikidata: 2 out of 2 children of Obama
12. KBs know little of what is true
• DBpedia: contains 6 out of 35 Dijkstra Prize winners
• Google Knowledge Graph: “Points of Interest” – completeness?
• Wikidata does not know much about the employees here
14. What previous work says
[Dong et al., KDD 2014]
There are known knowns; there are
things we know we know. We also
know there are known unknowns;
that is to say we know there are some
things we do not know. But there are
also unknown unknowns – the ones
we don't know we don't know.
KB engineers have only tried to
make KBs bigger. The point,
however, is to understand what
they are trying to approximate.
15. Outline – Assessing KB recall
1. Logical foundations
2. Rule mining
3. Information extraction
4. Data presence heuristic
16. Outline – Assessing KB recall
1. Logical foundations
2. Rule mining
3. Information extraction
4. Data presence heuristic
17. Closed- and open-world assumption

worksIn
Name   Department
John   D1
Mary   D2
Bob    D3

                      Closed-world assumption   Open-world assumption
worksIn(John, D1)?    Yes                       Yes
worksIn(Ellen, D3)?   No                        Maybe

• (Relational) databases traditionally employ the closed-world assumption
• KBs necessarily operate under the open-world assumption
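As a minimal illustration (using the worksIn table above, with invented names), the two assumptions differ only in how they treat facts that are absent from the data:

```python
# Minimal sketch: answering a membership query under the closed- vs. open-world
# assumption. The relation instance mirrors the worksIn table on this slide.
works_in = {("John", "D1"), ("Mary", "D2"), ("Bob", "D3")}

def answer_cwa(fact):
    # Closed world: everything not stated is false.
    return "Yes" if fact in works_in else "No"

def answer_owa(fact):
    # Open world: absence of a fact only means "unknown".
    return "Yes" if fact in works_in else "Maybe"

for fact in [("John", "D1"), ("Ellen", "D3")]:
    print(fact, "CWA:", answer_cwa(fact), "OWA:", answer_owa(fact))
```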
18. Open-world assumption
• Q: Hamlet written by Goethe?
KB: Maybe
• Q: Schwarzenegger lives in Dudweiler?
KB: Maybe
• Q: Trump brother of Kim Jong Un?
KB: Maybe
Open-world assumption often too cautious
19. Teaching KBs to say “no”
• Need the power to express both maybe and no
= Partially closed-world assumption
• Approach: Completeness statements [Motro 1989]

Completeness statement: worksIn is complete for employees of D1

worksIn
Name   Department
John   D1
Mary   D2
Bob    D3

worksIn(John, D1)?    Yes
worksIn(Ellen, D1)?   No
worksIn(Ellen, D3)?   Maybe
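A minimal sketch of such a partially closed world, reusing the toy worksIn data: the single completeness statement for D1 is what licenses the “No” answer, while everything else stays open-world.

```python
# Minimal sketch: a partially closed world. A completeness statement says that
# the KB lists *all* employees of a given department; for those departments we
# may answer "No", elsewhere only "Maybe".
works_in = {("John", "D1"), ("Mary", "D2"), ("Bob", "D3")}
complete_for = {"D1"}          # completeness statement: worksIn is complete for D1

def answer(person, dept):
    if (person, dept) in works_in:
        return "Yes"
    if dept in complete_for:   # closed-world reasoning is licensed here
        return "No"
    return "Maybe"             # open-world reasoning everywhere else

for query in [("John", "D1"), ("Ellen", "D1"), ("Ellen", "D3")]:
    print(query, "->", answer(*query))
```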
20. Completeness statements
• Assertions that the available database contains
all information on a certain topic
“worksIn is complete for employees of D1”
• Form constraints between an ideal database and
the available database
∀x: worksInⁱ(x, D1) → worksInᵃ(x, D1)   (i = ideal, a = available)
• Can have expressivity ranging from simple
selections up to first-order logic
21. If you have completeness statements
you can do wonderful things…
• Develop techniques for deciding whether a
conjunctive query answer is complete [VLDB 2011]
• Assign unambiguous semantics to SQL nulls
[CIKM 2012]
• Create an algebra for propagating completeness
[SIGMOD 2015]
• Ensure the soundness of queries with negation
[ICWE 2016]
• ….
22. Where would completeness
statements come from?
• Data creators should pass them along as metadata
• Or editors should add them in curation steps
• Developed plugin and external tool COOL-WD
(Completeness tool for Wikidata)
24. But…
• Requires human effort
• Editors are lazy
• Automatically created KBs do not even have editors
Remainder of this talk:
How to automatically acquire information
about KB completeness/recall
25. Outline – Assessing KB recall
1. Logical foundations
2. Rule mining
3. Information extraction
4. Data presence heuristic
26. Rule mining: Idea (1/2)
Certain patterns in data hint at completeness/incompleteness
• People with a death date but no death place are incomplete for death place
• Movies with a producer are complete for directors
• People with fewer than two parents are incomplete for parents
27. Rule mining: Idea (2/2)
• Examples can be expressed as Horn rules:
dateOfDeath(X, Y) ∧ lessThan1(X, placeOfDeath)
⇒ incomplete(X, placeOfDeath)
movie(X) ∧ producer(X, Z) ⇒ complete(X, director)
lessThan2(X, hasParent) ⇒ incomplete(X, hasParent)
Can such patterns be discovered
with association rule mining?
28. Rule mining: Implementation
• We extended the AMIE association rule mining system
with predicates on
• Complete/incomplete: complete(X, director)
• Object counts: lessThan2(X, hasParent)
• Popularity: popular(X)
• Negated classes: person(X) ∧ ¬adult(X)
• Then mined rules with complete/incomplete in the head
for 20 YAGO/Wikidata relations
• Result: Can predict (in-)completeness
with 46-100% F-score
[Galárraga et al., WSDM 2017]
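To make the idea concrete, here is a toy sketch of the mining step (not the AMIE extension of the paper): a candidate rule body such as movie(x) ∧ producer(x, ·) is scored by support and confidence against gold (in)completeness labels. All entities, labels, and numbers are invented.

```python
# Toy sketch of mining completeness rules (not the AMIE extension itself):
# score candidate rules "body(x) => complete(x, property)" by support and
# confidence over entities with gold completeness labels. Data is invented.
kb = {
    "e1": {"classes": {"movie"}, "producer": ["p1"], "director": ["d1"]},
    "e2": {"classes": {"movie"}, "producer": ["p2"], "director": ["d2"]},
    "e3": {"classes": {"movie"}, "producer": [],     "director": []},
}
gold_complete = {("e1", "director"): True, ("e2", "director"): True,
                 ("e3", "director"): False}

def rule_movie_with_producer(entity):
    # Candidate body: movie(x) AND producer(x, _)
    return "movie" in kb[entity]["classes"] and kb[entity]["producer"]

def score(body, prop):
    covered = [e for e in kb if body(e) and (e, prop) in gold_complete]
    support = len(covered)
    correct = sum(gold_complete[(e, prop)] for e in covered)
    confidence = correct / support if support else 0.0
    return support, confidence

print(score(rule_movie_with_producer, "director"))  # support and confidence
```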
29. Rule mining: Challenges
• Consensus among conflicting rules:
human(x) ⇒ Complete(x, graduatedFrom)
schoolteacher(x) ⇒ Incomplete(x, graduatedFrom)
professor(x) ⇒ Complete(x, graduatedFrom)
John ∈ {human, schoolteacher, professor}
⇒ Complete(John, graduatedFrom)?
• Rare properties require very large training data
• E.g., monks being complete for spouses
• Annotated ~3000 rows at 10 ct/row → 0 monks
30. Outline – Assessing KB recall
1. Logical foundations
2. Rule mining
3. Information extraction
4. Data presence heuristic
32. Information extraction: Implementation
• Developed a CRF-based classifier for identifying
numbers that express relation cardinalities
• Works for a variety of topics, such as
• Family relations: “has 2 siblings”
• Geopolitics: “is composed of seven boroughs”
• Artwork: “consists of three episodes”
• Finds the existence of 178% more children than
currently in Wikidata
[Mirza et al., ISWC 2016 + ACL 2017]
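As a rough illustration of the approach (not the feature set or model of the cited papers), a token-level CRF can be trained to tag numerals that act as relation cardinalities; the sklearn-crfsuite package is assumed, and the two training sentences and labels are invented.

```python
# Minimal sketch of a sequence tagger for cardinality-bearing numerals, using
# sklearn-crfsuite (pip install sklearn-crfsuite). This illustrates the idea
# only; training data, labels, and features are invented.
import sklearn_crfsuite

def features(tokens, i):
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_number": tok.isdigit() or tok.lower() in {"two", "three", "seven"},
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

sentences = [
    ("She has 2 siblings .".split(), ["O", "O", "CARD", "O", "O"]),
    ("The city is composed of seven boroughs .".split(),
     ["O", "O", "O", "O", "O", "CARD", "O", "O"]),
]
X = [[features(toks, i) for i in range(len(toks))] for toks, _ in sentences]
y = [labels for _, labels in sentences]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)

test = "He has 3 children .".split()
pred = crf.predict([[features(test, i) for i in range(len(test))]])[0]
print(list(zip(test, pred)))
```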
33. Information extraction: Challenges
• Cardinalities are frequently expressed non-numerically:
• Nouns: “has twins”, “is a trilogy”
• Indefinite articles: “They have a daughter”
• Negation/adjectives: “has no children”, “is childless”
• Often requires reasoning:
“Has 3 children from Ivana and one from Marla”
• Training (distant supervision) struggles with false positives
• KBs used for training are themselves incomplete
President Garfield: Wikidata knows of only 4 out of 7 children
34. Vision: Make IE recall-aware
Textual information extraction usually gives precision estimates:
“John was born in Malmö, Sweden.” → citizenship(John, Sweden) – precision 95%
“John grew up in Malmö, Sweden.” → citizenship(John, Sweden) – precision 70%
Can we also produce recall estimates?
“John has a son, Tom, and a daughter, Susan.”
→ child(John, Tom), child(John, Susan) – recall 90%
“John brought his children Susan and Tom to school.”
→ child(John, Tom), child(John, Susan) – recall 30%
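A hypothetical sketch of what such recall-aware output could look like: each extraction pattern carries a recall estimate alongside its precision estimate, and extracted triples inherit both. Patterns and numbers are invented for illustration.

```python
# Hypothetical sketch of recall-aware extraction output: each pattern carries
# a precision estimate and an estimate of how exhaustively it enumerates the
# target relation. All patterns and numbers are invented.
PATTERNS = {
    "has a son X and a daughter Y": (0.95, 0.90),            # sounds exhaustive
    "brought his children X and Y to school": (0.95, 0.30),  # likely partial
}

def extract(pattern_id, subject, children):
    precision, recall = PATTERNS[pattern_id]
    return [("child", subject, c, {"precision": precision, "recall": recall})
            for c in children]

print(extract("has a son X and a daughter Y", "John", ["Tom", "Susan"]))
```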
35. Outline – Assessing KB recall
1. Logical foundations
2. Rule mining
3. Information extraction
4. Data presence heuristic
36. Data presence heuristic: Idea
KB: dateOfBirth(John, 17.5.1983)
Q: dateOfBirth(John, 31.12.1999)?
A: Probably not
Single-value properties:
• Having one value ⇒ the property is complete
• Looking at the data alone suffices
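A minimal sketch of this heuristic, with an invented entity record: for functional (single-value) properties, the presence of any value licenses a “probably not” for other values, while non-functional properties stay open-world.

```python
# Minimal sketch of the data-presence heuristic for single-value (functional)
# properties such as dateOfBirth: one value present => treat the property as
# complete and reject other values. The entity record is invented.
FUNCTIONAL = {"dateOfBirth"}
entity = {"dateOfBirth": ["1983-05-17"], "child": ["Tom"]}

def answer(prop, value):
    values = entity.get(prop, [])
    if value in values:
        return "Yes"
    if prop in FUNCTIONAL and values:   # some value is already there
        return "Probably not"
    return "Maybe"                      # non-functional or empty: stay open-world

print(answer("dateOfBirth", "1999-12-31"))  # Probably not
print(answer("child", "Susan"))             # Maybe
```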
37. What are single-value properties?
Extreme case, but…
• Multiple citizenships
• More parents due to adoption
• Several Twitter accounts due to the presidency
38. All hope lost?
• Presence of a value is better than nothing
• Even better: For non-functional attributes,
data is still frequently added in batches
• All clubs Diego Maradona played for
• All ministers of Merkel’s new cabinet
• …
• Checking data presence is a common heuristic
among Wikidata editors
39. Value presence heuristic - example
[https://www.wikidata.org/wiki/Wikidata:Wikivoyage/Lists/Embassies]
40. Data presence heuristic: Challenges
4.1: Which properties to look at?
4.2: How to quantify data presence?
41. 4.1: Which properties to look at? (1/2)
• Complete(Wikidata for Putin)?
• There are more than 3000 properties one can assign to Putin…
• Not all properties are relevant to everyone.
(Think of goals scored or monastic order)
• Are at least all relevant properties there?
• What do you mean by relevant?
42. 4.1: Which properties to look at? (2/2)
• We used crowdsourcing to annotate 350 random
(person, property1, property2) triples with
the human perception of interestingness
• The state-of-the-art approach gets 61% of high-agreement triples right
• It mistakes frequency for interestingness
• Our method, which also uses linguistic similarity, achieves 75%
[Razniewski et al., ADMA 2017]
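As a loose illustration (not the model of the ADMA 2017 paper), property relevance can be scored by combining a global frequency signal with a crude linguistic-similarity signal; frequencies, labels, and the weighting below are invented.

```python
# Loose sketch of ranking candidate properties for an entity by combining a
# frequency signal with a crude linguistic-similarity signal (word overlap
# with the entity's existing facets). All data and weights are invented.
GLOBAL_FREQ = {"goals scored": 0.30, "monastic order": 0.01, "political party": 0.20}

def text_sim(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)          # Jaccard overlap as a stand-in

def relevance(candidate_property, entity_profile, alpha=0.5):
    freq = GLOBAL_FREQ.get(candidate_property, 0.0)
    sim = max(text_sim(candidate_property, p) for p in entity_profile)
    return alpha * freq + (1 - alpha) * sim

profile = ["politician", "political scientist"]   # invented facets of an entity
for prop in GLOBAL_FREQ:
    print(prop, round(relevance(prop, profile), 3))
```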
43. 4.2: How to quantify data presence?
• We have values for 46 out of 77 relevant properties for Putin
→ hard to interpret
• Proposal: Quantify based on comparison with other similar entities
• Ingredients:
• Similarity metric: Who is similar to Trump?
• Data quantification: How much data is good/bad?
• Deployed on Wikidata, but evaluation is difficult
[Ahmeti et al., ESWC 2017]
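A minimal sketch of such a relative quantification, with invented peers and properties: each property contributes to the expectation in proportion to how many similar entities have it, and the target entity is scored against that expectation.

```python
# Minimal sketch of quantifying data presence relative to similar entities:
# an entity's property coverage is compared against how often each property
# is populated among its peers. Entities, peers, and properties are invented.
peers = {
    "peer1": {"birthDate", "party", "spouse", "twitter"},
    "peer2": {"birthDate", "party", "spouse"},
    "peer3": {"birthDate", "party"},
}
target = {"birthDate", "party"}

def relative_completeness(target_props, peer_props):
    # Expected coverage of a property = fraction of peers that have it;
    # the score measures how much of that expectation the target fulfils.
    all_props = set().union(*peer_props.values())
    expected = {p: sum(p in v for v in peer_props.values()) / len(peer_props)
                for p in all_props}
    fulfilled = sum(expected[p] for p in target_props if p in expected)
    return fulfilled / sum(expected.values())

print(round(relative_completeness(target, peers), 2))
```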
47. Summary (1/3)
• Increasing KB quality can to some extent
be noticed downstream
• Precision easy to evaluate
• Recall largely unknown
48. Summary (2/3)
• Ideal is human-curated completeness information
• Created in conjunction with data (COOL-WD tool)
• Not really scalable
• Automated alternatives:
• Association rule mining
• Information extraction
• Looking at existence of data is a useful start
49. Summary (3/3)
• Recall-aware information extraction an open
challenge
• Concepts of relevance and relative completeness
in KBs little understood to date
• I look forward to fruitful collaborations with UdS,
MPI-SWS and MPI-INF
Editor's notes
O-like letter - otto
350 man years to complete, estimate 1986
Google launched 1998 (1995 other name)
First Chinese, fourth Hindi
Marx point: see what you are actually trying to approximate
-> rule mining with constraints?
Here multiple claims, but so when do we have all?
Sl – sitelink yes or no, www yes or no, img yes or no
Coordinate yes or no
Phone yes or no
What is good/bad: Problem could be that very few are good/bad
Question: What are/how to find interesting facets?
Much work on entity and fact ranking, little on predicate ranking