Unstructured and Structured KBs
Workshop at AKBC 2020
June 25, 2020
https://uskb-workshop.github.io/
Abstract:
Several approaches have been proposed to represent world knowledge. It can be unstructured in text corpora, organised in structured collections (e.g., KBs, key-value memories), or self-structured in the parameters of a neural model. However, it is still unclear how to compare these different solutions. Most existing NLP benchmarks focus on tasks that humans can solve by examining only local information. In this talk I will review some knowledge-intensive tasks that, even for humans, require seeking knowledge in a large body of documents to be solved. I will present some of the latest models proposed to solve them and the knowledge representation each one uses. Moreover, I will present some ideas for investigating models' explainability in this setting.
1. How can we compare unstructured, structured and self-structured knowledge representation?
Fabio Petroni
25 June 2020
6. Current NLP benchmarks
e.g., natural language inference
s1: At the other end of Pennsylvania Avenue, people began to line up for a White House tour.
s2: People formed a line at the end of Pennsylvania Avenue.
→ entailment
- focus on reading comprehension
- emergence of general architectures (e.g., BERT)
- local information is sufficient to solve the task
...
7. Knowledge Intensive NLP tasks
- require seeking knowledge in a large body of documents, even for humans, to be solved
Knowledge Source: Unstructured
8. Knowledge Intensive NLP tasks
- require seeking knowledge in a large body of documents, even for humans, to be solved
Knowledge Source: Unstructured + Structured
10. Knowledge Intensive NLP task 1 - Slot Filling
collect information on certain relations (or slots) of entities from large collections of natural language text
Giacomo Tedesco:
- date of birth
- place of birth
- occupation
- position played on team / speciality
TAC-KBP challenges (McNamee and Dang, 2009; Ji et al., 2010; Surdeanu, 2013; Surdeanu and Ji, 2014)
11. Knowledge Intensive NLP task 1 - Slot Filling
several ways to approach the problem (for Giacomo Tedesco, slot: position played on team / speciality):
- cloze-style question: Giacomo Tedesco plays in _____ position.
- structured query: <Giacomo Tedesco, position played on team>
- natural question: What position does Giacomo Tedesco play?
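The three query formats differ only in how the same <subject, relation> pair is surfaced. A minimal sketch, assuming hand-written per-relation templates (the function and template names here are illustrative, not from the talk's code):

```python
def make_queries(subject, relation):
    """Build the three query formats for one <subject, relation> slot."""
    # Hypothetical hand-written templates, in the spirit of manually
    # defined cloze templates; only one relation is covered here.
    cloze_templates = {
        "position played on team": "{} plays in _____ position.",
    }
    natural_templates = {
        "position played on team": "What position does {} play?",
    }
    return {
        "cloze-style question": cloze_templates[relation].format(subject),
        "structured query": (subject, relation),
        "natural question": natural_templates[relation].format(subject),
    }

queries = make_queries("Giacomo Tedesco", "position played on team")
print(queries["cloze-style question"])  # Giacomo Tedesco plays in _____ position.
```

A cloze-style question can be fed directly to a masked LM, the structured query to a KB, and the natural question to a QA system, which is what makes the three representations comparable on the same slot.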
12. Slot Filling
Petroni et al, 2019-2020
- single token answers
- T-REx (Elsahar et al, 2018)
- Google-RE (https://code.google.com/archive/p/relation-extraction-corpus/)
https://github.com/facebookresearch/LAMA
14. [Bar chart, accuracy 0-100: RE and RE-ora (automatic KG, structured; structured query), BERT (self-structured; cloze-style question), Wikidata (human curated, structured; structured query); unstructured solution = read all Wikipedia to predict]
Petroni et al, 2019-2020
15. [Bar chart, accuracy 0-100: RE and RE-ora (automatic KG, structured; structured query), BERT (self-structured; cloze-style question), DrQA (indexed unstructured; natural question), BERT-ret (enriched cloze-style question), BERT-ora, Wikidata (human curated, structured); unstructured solution = read all Wikipedia to predict]
Petroni et al, 2019-2020
16. [Same bar chart as slide 15, annotated with a soft-to-hard spectrum: from the KB stored in parameters (self-structured, soft) to hardcoded rules for what knowledge is (structured, hard)]
Petroni et al, 2019-2020
17. LAMA limitations
- single token answers might favour BERT
- Wikidata gets all answers right
- explainability not assessed
example: Giacomo Tedesco plays in _____ position. → answer: midfielder
provenance: evidence from the knowledge source
both open/closed book models can get the answer for the wrong reason
18. Knowledge Intensive NLP task 2 - Entity Linking
is the task of assigning a unique identity to entities mentioned in text
example: "The most comprehensive photographic handbook for mushrooms is authored by Michael Jordan." → wiki/Michael_Jordan_(mycologist)
19. dense retrieval with MIPS and bi-encoder
[Diagram: the mention "[...] authored by [SE] Michael Jordan [EE]." and short Wikipedia entity descriptions (e.g., "Michael Jordan (m): is an English mycologist"; "Michael Jordan: is former basketball player") are embedded by a BI-ENCODER into a shared dense space of ~5.9M entity points; the matching entity is retrieved via MIPS]
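The bi-encoder + MIPS pipeline above can be sketched with toy vectors standing in for the learned mention and entity encoders (all embeddings below are hand-made for illustration, not outputs of a real encoder):

```python
# Minimal sketch of bi-encoder retrieval with Maximum Inner Product Search.
import numpy as np

def mips(query_vec, index, k=1):
    """Brute-force MIPS: inner product of the query with every index row."""
    scores = index @ query_vec
    return np.argsort(-scores)[:k]  # indices of the top-k highest-scoring rows

# Toy "entity encoder" output: one row per Wikipedia entity.
entity_names = ["Michael Jordan (mycologist)", "Michael Jordan (basketball)", "Mount Elbrus"]
entity_index = np.array([
    [0.9, 0.1, 0.0],  # mycology-flavoured vector
    [0.1, 0.9, 0.0],  # basketball-flavoured vector
    [0.0, 0.0, 1.0],
])

# Toy "mention encoder" output for the mushroom-handbook mention.
mention_vec = np.array([0.8, 0.2, 0.0])

top = mips(mention_vec, entity_index, k=1)[0]
print(entity_names[top])  # Michael Jordan (mycologist)
```

In practice the brute-force scan is replaced by an approximate MIPS index (e.g., FAISS) so the search scales to millions of entity points.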
20. TAC-KBP 2010 (Ji et al. 2010)
He et al. (2013)           81.0
Sun et al. (2015)          83.9
Yamada et al. (2016)       85.5
Globerson et al. (2016)    87.2
Sil et al. (2018)          87.4
Nie et al. (2018)          89.1
Raiman and Raiman (2018)   90.9
Cao et al. (2018)          91.0
Gillick et al. (2019)      87.0   ← dense retrieval, bi-encoder
Wu et al. (2019)           94.5   ← dense retrieval, bi-encoder
Févry et al. (2020)        94.9
21. TAC-KBP 2010 (Ji et al. 2010)
[leaderboard as in slide 20; top systems use dense retrieval with a bi-encoder]
- each dataset defines a different set of candidate entities (TAC-KBP knowledge source: ~700K entities)
- real-world scenario: uniform candidate set - the whole Wikipedia
22. TAC-KBP 2010 (Ji et al. 2010)
candidate set: TAC-KBP knowledge source (~700K entities) vs all Wikipedia (~5.9M entities, real-world scenario)
Wu et al. (2019)      94.5 → 92.8
Févry et al. (2020)   94.9 → 91.4
23. Knowledge Intensive NLP task 3 - Open Domain QA
What's the highest mountain in Europe?
- TriviaQA (Joshi et al., 2017)
- HotpotQA (Yang et al., 2018)
- Natural Questions (Kwiatkowski et al., 2019)
- ELI5 (Fan et al., 2019)
- ...
24. dense retrieval with MIPS and bi-encoder
[Diagram: the question "What's the highest mountain in Europe?" and Wikipedia passages (e.g., "Mont Blanc p1: is the second-highest mountain"; "Mount Elbrus p1: Mount Elbrus is a dormant volcano") are embedded by a BI-ENCODER into a shared dense space of ~21M passage points; relevant passages are retrieved via MIPS]
25. Exact Match
                               Natural Questions   TriviaQA
                               (open dev)          (official test)
Roberts et al. 2020    T5      36.6                60.5
Guu et al. 2020        REALM   40.4                -
Karpukhin et al. 2020  DPR     41.5                -
Lewis et al. 2020      RAG     44.5                68
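Exact Match in these open-domain QA benchmarks is computed after normalising answers. A generic reconstruction of the usual SQuAD-style normalisation (lowercase, drop punctuation, articles and extra whitespace), not the talk's own evaluation code:

```python
import re
import string

def normalize(answer: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    answer = answer.lower()
    answer = "".join(ch for ch in answer if ch not in string.punctuation)
    answer = re.sub(r"\b(a|an|the)\b", " ", answer)
    return " ".join(answer.split())

def exact_match(prediction: str, gold_answers: list) -> bool:
    """A prediction scores if it matches any gold answer after normalisation."""
    return any(normalize(prediction) == normalize(g) for g in gold_answers)

print(exact_match("Mount Elbrus.", ["mount elbrus", "Elbrus"]))  # True
```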
26. Exact Match
[table as in slide 25, annotated by knowledge representation: Self-Structured (closed book) vs Unstructured + Structured + Self-Structured (retrieval-augmented)]
27. Limitations
- different split of the data
- explainability not assessed
example: What's the highest mountain in Europe? → answer: Mount Elbrus
provenance: evidence from the knowledge source (text, lists, tables, images)
both open/closed book models can get the answer for the wrong reason
28. Knowledge Intensive NLP task 4 - Fact Checking
FEVER (Thorne et al., 2018-2019)
claim: Lorelai Gilmore's father is named Robert.
answer: 3-way classification - SUPPORTS / REFUTES / NOT ENOUGH INFO
provenance
29. Label Accuracy
                              3-way   2-way
Zhong et al. 2020   DREAM     76.8    -
Thorne et al. 2020  RoBERTa   -       92.2*
Lewis et al. 2019   BART      64.0    81.1
Lewis et al. 2020   RAG       72.5    89.5
* with oracle evidence
30. Discussion
FEVER score - an elegant way to combine explainability and downstream performance:
only award points for label accuracy if the correct evidence is found
- requires a ton of manual annotations
- FEVER is an artificial task; fact-checking in the real world is another game
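The FEVER score idea, label correct AND evidence found, can be sketched as below. This is a simplified reconstruction, not the official FEVER scorer; field names and the evidence encoding (page, sentence-id pairs) are illustrative:

```python
def fever_point(pred_label, pred_evidence, gold_label, gold_evidence_sets):
    """Score one claim: 1 only if the label is right and, for verifiable
    claims, the prediction covers at least one full gold evidence set."""
    if pred_label != gold_label:
        return 0
    if gold_label == "NOT ENOUGH INFO":  # no evidence required for NEI
        return 1
    covered = any(set(gold) <= set(pred_evidence) for gold in gold_evidence_sets)
    return 1 if covered else 0

point = fever_point(
    pred_label="REFUTES",
    pred_evidence=[("Lorelai_Gilmore", 3)],
    gold_label="REFUTES",
    gold_evidence_sets=[[("Lorelai_Gilmore", 3)]],
)
print(point)  # 1
```

The same prediction with the right label but the wrong evidence scores 0, which is exactly how the metric forces models to be right for the right reason.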
31. Knowledge Intensive NLP task 5 - Factual Generations
Massarelli et al. (2019)
prompt: Why did Princess Margaret marry Antony Armstrong-Jones?
GPT2 generation (Delayed Beam Search):
Princess Margaret, Countess of Snowdon, (Margaret Rose 21 August 1930 - 9 February 2002) was the younger daughter of King George VI and Queen Elizabeth The Queen Mother and the only sibling of Queen Elizabeth II. She married Antony Armstrong-Jones, a photographer, in 1960. It was the first marriage for the Queen and the first for Prince Philip, Duke of Edinburgh. After divorcing Armstrong-Jones in 1978, she married Group Captain Peter Townsend in June that same year. She died at the age of 71 on 9 February 2002.
up to 64% of generated sentences with claims are SUPPORTED
32. Jeopardy Question Generation
Lewis et al. (2020)
Input: The Divine Comedy
BART: This epic poem by Dante is divided into three parts: the Inferno, The Purgatorio & the Purgatorio
RAG: This 14th Century work is divided into 3 sections: "inferno", "Purgatorio" & "Paradiso"
human evaluation:
              Factuality   Specificity
BART better   7.1%         16.8%
RAG better    42.7%        37.4%
both good     11.7%        11.8%
both poor     17.7%        6.9%
no majority   20.8%        20.1%
33. Jeopardy Question Generation
Lewis et al. (2020)
Input: Hemingway
RAG: "The Sun Also Rises" is a novel by this author of "A Farewell to Arms"
Document 1: his works are considered classics of American literature ... His wartime experiences formed the basis for his novel "A Farewell to Arms" (1929) ...
Document 2: ... artists of the 1920s "Lost Generation" expatriate community. His debut novel, "The Sun Also Rises", was published in 1926.
[Heatmap: posterior probability of each retrieved document (Doc 1-5) for every generated token of the RAG output]
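Per-token document posteriors like these arise from RAG's marginalisation over retrieved documents: the probability of each generated token is a mixture of per-document generator distributions, weighted by the document posteriors. A toy sketch with made-up numbers:

```python
# Toy RAG-Token-style marginalisation: p(token | x) = sum_d p(d | x) * p(token | x, d)
import numpy as np

p_doc = np.array([0.7, 0.3])  # posterior p(doc | input) for 2 retrieved docs

# p(token | input, doc) over 3 candidate tokens, one row per document
p_token_given_doc = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.5, 0.3],
])

p_token = p_doc @ p_token_given_doc  # marginalise out the document
print(p_token)  # [0.62 0.22 0.16]
```

Reading the mixture weights per token is what produces the heatmap: a document with high posterior at a given step is the one "responsible" for that token.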
34. Interaction Between Parametric / Non-Parametric Knowledge
Retrieved documents cue correct responses from BART:
- Feed BART with input Hemingway and partial decoding "The Sun":
  Completion: "The Sun Also Rises" is a novel by this author of "The Sun Also Rises"
- Feed BART with input Hemingway and partial decoding "The Sun Also Rises" is a novel by this author of "A":
  Completion: "The Sun Also Rises" is a novel by this author of "A Farewell to Arms"
Lewis et al. (2020)
35. Conclusion
- We should use a variegated set of knowledge intensive language tasks to evaluate knowledge representation
- Encoding units of text with a LM seems a really promising way to build knowledge bases
- The ultimate Knowledge Intensive task: can a model read the web and autonomously write an encyclopedia?