1. Comparing three models of scientific discourse annotation for enhanced knowledge extraction
Anita de Waard, Maria Liakata, Paul Thompson, Raheel Nawaz and Sophia Ananiadou
4. Accessing the knowledge in papers
- Papers are ‘Stories that persuade with data’
- So how is this persuasion done? Three ways of annotating key rhetorical moves:
  - Discourse segment types (de Waard, Elsevier/Utrecht)
  - Zones of conceptualisation using Core Scientific Concepts (Liakata, Aberystwyth/EBI)
  - Metaknowledge annotation of BioEvents (Thompson, Ananiadou et al., NACTeM/Manchester)
- Comparison of 3 methods on full-text paper
- What are overlaps/differences? Can we combine?
7. “Scientific articles are stories...
Story: The Story of Goldilocks and the Three Bears
Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
| Story | Story Grammar | Paper | Paper text |
|---|---|---|---|
| Once upon a time | Setting: Time | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged. |
| a little girl named Goldilocks | Setting: Characters | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract |
| She went for a walk in the forest. Pretty soon, she came upon a house. | Setting: Location | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein |
| She knocked and, when no one answered, | Theme: Goal | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood. |
| she walked right in. | Attempt | Hypothesis | Atx-1 may play a role in the regulation of gene expression |
| At the table in the kitchen, there were three bowls of porridge. | Episode 1: Name | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies |
| Goldilocks was hungry. | Subgoal | Subgoal | test the function of the AXH domain |
| She tasted the porridge from the first bowl. | Attempt | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1. |
| This porridge is too hot! she exclaimed. | Outcome | Results | Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells |
| So, she tasted the porridge from the second bowl. | Activity | Data | (data not shown), |
| This porridge is too cold, she said | Outcome | Results | both genotypes show many large holes and loss of cell integrity at 28 days |
| So, she tasted the last bowl of porridge. | Activity | Data | (Figures 1B-1D). |
| Ahhh, this porridge is just right, she said happily and | Outcome | Results | Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes |
10. “...that persuade (reviewers/readers)…”
| Aristotle | Quintilian | Description | Scientific Paper |
|---|---|---|---|
| prooimion | Introduction/exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning |
| prothesis | Statement of Facts/narratio | The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question |
| | Summary/propositio | The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents |
| pistis | Proof/confirmatio | The main body of the speech, where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results |
| | Refutation/refutatio | As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work |
| epilogos | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications |
13. Annotate: fine-grained models of argumentation
Method 1: Discourse Segment Types
Both seminomas and the EC component of
nonseminomas share features with ES cells. To
exclude that the detection of miR-371-3 merely
reflects its expression pattern in ES cells, we tested
by RPA miR-302a-d, another ES cells-specific
miRNA cluster (Suh et al, 2004). In many of the
miR-371-3 expressing seminomas and
nonseminomas, miR-302a-d was undetectable
(Figs S7 and S8), suggesting that miR-371-3
expression is a selective event during
tumorigenesis.
23. Annotate: fine-grained models of argumentation
Method 1: Discourse Segment Types
- Fact: Both seminomas and the EC component of nonseminomas share features with ES cells.
- Goal: To exclude that
- Hypothesis: the detection of miR-371-3 merely reflects its expression pattern in ES cells,
- Method: we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
- Result: In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
- Reg-Implication: suggesting that
- Implication: miR-371-3 expression is a selective event during tumorigenesis.
Conceptual knowledge is marked at the Fact segment; Experimental evidence is marked at the Method and Result segments.
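Because segments are clauses rather than sentences, a single sentence can carry several of them. One minimal way to store such an annotation, sketched below under our own assumptions (the list-of-pairs layout and the `to_standoff` helper are hypothetical, not the format of the original annotation tool), is to keep each segment's type and text and derive character offsets into the joined paragraph, so the labels can later be aligned with sentence- or event-level annotations.

```python
# Clause-level segments of the example paragraph, in reading order.
# Hypothetical representation: (segment type, segment text) pairs.
SEGMENTS = [
    ("Fact", "Both seminomas and the EC component of nonseminomas share features with ES cells."),
    ("Goal", "To exclude that"),
    ("Hypothesis", "the detection of miR-371-3 merely reflects its expression pattern in ES cells,"),
    ("Method", "we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004)."),
    ("Result", "In many of the miR-371-3 expressing seminomas and nonseminomas, "
               "miR-302a-d was undetectable (Figs S7 and S8),"),
    ("Reg-Implication", "suggesting that"),
    ("Implication", "miR-371-3 expression is a selective event during tumorigenesis."),
]


def to_standoff(segments):
    """Join segments with single spaces; return (text, [(type, start, end), ...])."""
    spans, pieces, offset = [], [], 0
    for seg_type, seg_text in segments:
        if pieces:
            offset += 1  # account for the separating space
        spans.append((seg_type, offset, offset + len(seg_text)))
        pieces.append(seg_text)
        offset += len(seg_text)
    return " ".join(pieces), spans


text, spans = to_standoff(SEGMENTS)
for seg_type, start, end in spans:
    print(f"{seg_type:16} {start:3}-{end:3} {text[start:end]}")
```

Stand-off offsets keep the original text untouched, which is what makes it possible to overlay several annotation schemes on the same paragraph.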
32. Segment types point to realms of discourse:
Concepts, models, ‘facts’: present tense
- Fact: (1) Both seminomas and the EC component of nonseminomas share features with ES cells.
- Problem: (2) b. the detection of miR-371-3 merely reflects its expression pattern in ES cells,
- Implication: (3) c. miR-371-3 expression is a selective event during tumorigenesis.
Transitions: present tense
- Goal: (2) a. To exclude that
- Regulatory-Implication: (3) b. suggesting that
Experiment: past tense
- Method: (2) c. we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
- Result: (3) a. In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
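The grouping above can be written down as two lookup tables: segment type to realm, and realm to the verb tense the slide associates with it. The sketch below does this in Python and flags segments whose tense differs from the expected one. The names are hypothetical, and the observed tense is supplied by hand rather than detected, since no tagger is assumed.

```python
# Segment type -> realm of discourse, following the grouping on the slide.
REALM = {
    "Fact": "concepts", "Problem": "concepts", "Implication": "concepts",
    "Goal": "transition", "Regulatory-Implication": "transition",
    "Method": "experiment", "Result": "experiment",
}

# Realm -> verb tense the slide associates with it.
EXPECTED_TENSE = {"concepts": "present", "transition": "present", "experiment": "past"}


def expected_tense(segment_type: str) -> str:
    return EXPECTED_TENSE[REALM[segment_type]]


def tense_mismatches(annotated):
    """Return the segment types whose hand-annotated tense differs from the expected one."""
    return [seg for seg, tense in annotated if expected_tense(seg) != tense]


# (1) "share", (2) c. "tested", (3) a. "was", (3) c. "is"
example = [("Fact", "present"), ("Method", "past"), ("Result", "past"), ("Implication", "present")]
print(tense_mismatches(example))  # []
```

With the example clauses, every segment's tense matches its realm, so nothing is reported.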
34. Method 2: Annotate with the Core Scientific Concepts (CoreSC) Annotation Scheme
A three-layer, ontology-motivated scheme for sentence annotation, which views a paper as the human-readable representation of a scientific investigation [Liakata et al. 2010], with 45-page guidelines [Liakata & Soldatova 2008]
- 1st layer: Core Scientific Concepts (CoreSCs): Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model, Observation, Result, Conclusion
- 2nd layer: properties of CoreSCs: Novelty (New/Old) and Advantage (Advantage/Disadvantage)
- 3rd layer: concept identifiers, linking together sentences that refer to the same instance of a CoreSC
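To illustrate how the three layers fit together, the sketch below models one CoreSC sentence annotation as a Python dataclass and uses the layer-3 concept identifier to group sentences that refer to the same CoreSC instance. The fields mirror the `type`, `conceptID`, `novelty` and `advantage` attributes of the corpus `annotationART` element, although the corpus abbreviates categories (e.g. `type="Res"`) where this sketch spells them out. The class, the helper and all sentences except the BOB.1/OBF.1 one are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Layer 1: the eleven CoreSC categories.
CORESC_CATEGORIES = {
    "Hypothesis", "Motivation", "Goal", "Object", "Background", "Method",
    "Experiment", "Model", "Observation", "Result", "Conclusion",
}


@dataclass
class CoreSCSentence:
    """Hypothetical container for one annotated sentence (not an official schema)."""
    text: str
    category: str                     # layer 1: one of CORESC_CATEGORIES
    novelty: Optional[str] = None     # layer 2: "New" or "Old"
    advantage: Optional[str] = None   # layer 2: "Advantage" or "Disadvantage"
    concept_id: Optional[str] = None  # layer 3: shared by sentences about the same instance

    def __post_init__(self) -> None:
        if self.category not in CORESC_CATEGORIES:
            raise ValueError(f"unknown CoreSC category: {self.category}")


def group_by_concept(sentences: list[CoreSCSentence]) -> dict[str, list[str]]:
    """Layer 3: collect the texts of sentences that share a concept identifier."""
    groups: dict[str, list[str]] = defaultdict(list)
    for s in sentences:
        if s.concept_id is not None:
            groups[s.concept_id].append(s.text)
    return dict(groups)


# The first sentence and "Res24" come from the corpus example; the rest is made up.
sentences = [
    CoreSCSentence("Here we show that BOB.1/OBF.1 regulates Btk gene expression.",
                   "Result", concept_id="Res24"),
    CoreSCSentence("A hypothetical sentence about the same result.",
                   "Result", concept_id="Res24"),
    CoreSCSentence("A hypothetical sentence about another result.",
                   "Result", concept_id="Res25"),
]
print(group_by_concept(sentences))
```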
36. CoreSC Annotation Scheme (layers 1&2)
| Category | Description |
|---|---|
| Hypothesis | A statement not yet confirmed rather than a fact |
| Motivation | The reasons behind an investigation |
| Background | Background knowledge & previous work |
| Goal | A target state of the investigation |
| Object-New | A main product or theme of the investigation |
| Object-New-Advantage | Advantage of an object |
| Object-New-Disadvantage | Disadvantage of an object |
| Method-New | Means by which the goals of the investigation are achieved |
| Method-New-Advantage | Advantage of a method |
| Method-New-Disadvantage | Disadvantage of a method |
| Method-Old | A method pertaining to previous work |
| Method-Old-Disadvantage | Disadvantage of a method in previous work |
| Method-Old-Advantage | Advantage of a method in previous work |
| Experiment | An experimental method |
| Model | A statement about a theoretical model, method or framework |
| Observation | Data/phenomena recorded in an investigation |
| Result | Factual statements about the results of an investigation |
| Conclusion | Statements inferred from observations and results |
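The category names in this table fold a layer-1 category and the layer-2 properties into one hyphenated label (e.g. Method-Old-Disadvantage). The sketch below splits such labels back into their layers. It assumes labels always follow the order Category[-Novelty[-Advantage]], as in the table; the function and tuple names are hypothetical.

```python
from typing import NamedTuple, Optional


class CoreSCLabel(NamedTuple):
    category: str             # layer 1, e.g. "Method"
    novelty: Optional[str]    # layer 2: "New" or "Old"
    advantage: Optional[str]  # layer 2: "Advantage" or "Disadvantage"


NOVELTY = {"New", "Old"}
ADVANTAGE = {"Advantage", "Disadvantage"}


def parse_label(label: str) -> CoreSCLabel:
    """Split e.g. 'Method-Old-Disadvantage' into (category, novelty, advantage)."""
    head, *rest = label.split("-")
    novelty = advantage = None
    for part in rest:
        # Novelty must come before Advantage, and each may appear at most once.
        if part in NOVELTY and novelty is None and advantage is None:
            novelty = part
        elif part in ADVANTAGE and advantage is None:
            advantage = part
        else:
            raise ValueError(f"unexpected component {part!r} in {label!r}")
    return CoreSCLabel(head, novelty, advantage)


print(parse_label("Method-Old-Disadvantage"))
# CoreSCLabel(category='Method', novelty='Old', advantage='Disadvantage')
```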
43. Method 3: Bio-event Annotation
- A dynamic biological relationship involving one or more participants
Example sentence: "We found that Y activates the expression of X"
| ID | Trigger | Type | Theme | Cause |
|---|---|---|---|---|
| E1 | expression | GENE_EXPRESSION | X: gene | none (empty) |
| E2 | activates | POSITIVE_REGULATION | E1: event | Y: protein |
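The nesting in this example (E2 takes the event E1 as its theme) is what distinguishes event annotation from flat entity-to-entity relations. A small Python sketch, with hypothetical class names, reproduces the two events and prints E2 with E1 expanded inside it.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Entity:
    text: str
    sem_type: str  # e.g. "gene", "protein"


@dataclass
class Event:
    event_id: str
    trigger: str
    event_type: str
    theme: Union[Entity, "Event"]                    # an entity or another event
    cause: Optional[Union[Entity, "Event"]] = None


def render(arg: Union[Entity, Event]) -> str:
    """Print an event as a nested expression, expanding events used as arguments."""
    if isinstance(arg, Entity):
        return arg.text
    parts = [f"theme={render(arg.theme)}"]
    if arg.cause is not None:
        parts.append(f"cause={render(arg.cause)}")
    return f"{arg.event_type}[{arg.trigger}]({', '.join(parts)})"


x = Entity("X", "gene")
y = Entity("Y", "protein")
e1 = Event("E1", "expression", "GENE_EXPRESSION", theme=x)
e2 = Event("E2", "activates", "POSITIVE_REGULATION", theme=e1, cause=y)
print(render(e2))
# POSITIVE_REGULATION[activates](theme=GENE_EXPRESSION[expression](theme=X), cause=Y)
```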
44. Meta-Knowledge annotation scheme for BioEvents
Bio-Event (centred on an event trigger)
- Participants: Theme(s), Actor(s)
- Class/Type (grounded to an event ontology)
Meta-knowledge dimensions:
- Knowledge Type: Investigation, Observation, Analysis, General
- Certainty Level: L3, L2, L1
- Source: Other, Current
- Manner: High, Low, Neutral
- Polarity: Negative, Positive
- Currently being applied to the entire GENIA event corpus (1000 MEDLINE abstracts)
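The five meta-knowledge dimensions and their values can be pinned down as enumerations, so that an annotation with an unknown value is rejected. This is a sketch under our own naming. The corpus XML uses the attribute names KT, CL, Manner, Polarity and Source, and finer knowledge-type values (for example Gen-Other), which this sketch does not model.

```python
from dataclasses import dataclass
from enum import Enum


class KnowledgeType(Enum):
    INVESTIGATION = "Investigation"
    OBSERVATION = "Observation"
    ANALYSIS = "Analysis"
    GENERAL = "General"


class CertaintyLevel(Enum):
    L1 = "L1"
    L2 = "L2"
    L3 = "L3"


class Polarity(Enum):
    POSITIVE = "Positive"
    NEGATIVE = "Negative"


class Manner(Enum):
    HIGH = "High"
    LOW = "Low"
    NEUTRAL = "Neutral"


class Source(Enum):
    CURRENT = "Current"
    OTHER = "Other"


@dataclass(frozen=True)
class MetaKnowledge:
    knowledge_type: KnowledgeType
    certainty: CertaintyLevel
    polarity: Polarity
    manner: Manner
    source: Source

    @classmethod
    def from_strings(cls, kt: str, cl: str, pol: str, man: str, src: str) -> "MetaKnowledge":
        """Build from the string values used on the slides; Enum lookup raises ValueError on unknown values."""
        return cls(KnowledgeType(kt), CertaintyLevel(cl), Polarity(pol), Manner(man), Source(src))


mk = MetaKnowledge.from_strings("Analysis", "L2", "Negative", "Neutral", "Current")
print(mk.knowledge_type, mk.certainty, mk.polarity)
```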
47. BioEvent/MetaKnowledge Annotation
S3 = "These results suggest that Y has no effect on expression of X"
| Event | Knowledge Type | Certainty Level | Lexical Polarity | Manner | Source |
|---|---|---|---|---|---|
| E1 | General | L3 | Positive | Neutral | Current |
| E2 | Analysis | L2 | Negative | Neutral | Current |
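This example shows why meta-knowledge matters for extraction: an extractor that ignored it could record that Y affects the expression of X, whereas S3 reports the opposite, and only tentatively. One possible policy is to keep only events with positive polarity and a sufficiently high certainty level. It is sketched below as our own assumption, not as part of the scheme. The sketch treats L3 as the most certain level, which is consistent with "show" being annotated L3 and "suggest" L2 in these examples.

```python
# Meta-knowledge of the two events in S3, copied from the table.
S3_EVENTS = {
    "E1": {"KT": "General", "CL": "L3", "Polarity": "Positive", "Manner": "Neutral", "Source": "Current"},
    "E2": {"KT": "Analysis", "CL": "L2", "Polarity": "Negative", "Manner": "Neutral", "Source": "Current"},
}

# Assumed ordering of certainty levels: L3 is the most certain.
RANK = {"L1": 1, "L2": 2, "L3": 3}


def assertable(events, min_level="L3"):
    """Return the IDs of events that are positive and at least as certain as min_level."""
    return sorted(
        eid for eid, mk in events.items()
        if mk["Polarity"] == "Positive" and RANK[mk["CL"]] >= RANK[min_level]
    )


print(assertable(S3_EVENTS))  # ['E1']
```

E2 is dropped because it is negated, independently of its certainty level; lowering `min_level` would admit hedged but positive events.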
50. Comparing 3 annotating systems
| Name | Purpose | Granularity | Manual/Automated |
|---|---|---|---|
| CoreSC | Identify main components of scientific investigation | Sentence | Manual corpus, automated annotation tools for machine learning |
| MetaKnowledge/BioEvents | Enhance information extraction for biomedical texts to enable metadiscourse annotation | Events (intra-sentential): can be several per sentence, or one in more sentences | Manual corpus; automated annotation in progress |
| Discourse Segment Types | Identify mechanisms of conveying (epistemic) knowledge in scientific discourse | Clause | Manual |
53. 3 Annotation Systems on the same paper:
CoreSC:
<annotationART atype="GSC" type="Res" conceptID="Res24"
novelty="None" advantage="None">
Here we show that BOB.1/OBF.1 regulates Btk gene expression.
</annotationART>
BioEvent/MetaKnowledge:
<sentence id="S6">Here we show that
<term id="T13" sem="Protein_family_or_group">
<gene-or-gene-product id="G9">BOB.1</gene-or-gene-product>/
<gene-or-gene-product id="G10">OBF.1</gene-or-gene-product>
</term> regulates
<term id="T14" sem="Biological_process">
<term id="T15" sem="DNA_domain_or_region">
<gene-or-gene-product id="G11">Btk
</gene-or-gene-product> gene
</term> expression
</term>.
</sentence>
Discourse Segments:
<segment segID ="286" section = "D" segtype = "RegImplication">
Here we show that
</segment>
<segment segID ="287" section = "D" segtype = "Implication">
BOB.1/OBF.1 regulates Btk gene expression.
</segment>
54. 3 Annotation Systems on the same paper:
BioEvent/MetaKnowledge events for sentence S6:
<event KT="Gen-Other" CL="L3" Manner="Neutral" Polarity="Positive" Source="Current" id="E16">
<type class="Gene_expression"/>
<theme idref="G11"/>
<clue>Here we show that BOB.1/OBF.1 regulates Btk gene <clueType>expression</clueType>. </clue>
</event>
<event KT="Analysis" CL="L3" Manner="Neutral" Polarity="Positive" Source="Current" id="E17">
<type class="Regulation"/>
<theme idref="E16"/>
<cause idref="T13"/>
<clue>Here we <clueKT>show</clueKT> that BOB.1/OBF.1 <clueType>regulates</clueType> Btk gene expression. </clue>
</event>
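Aligning the three layers requires a shared coordinate system, because they work at different granularities (sentence, clause, event). The naive sketch below locates the event cues from the `<clue>` markup inside the two discourse segments. It uses plain string matching rather than corpus offsets, so it only works when each cue occurs once in the sentence; the names are hypothetical.

```python
SENTENCE = "Here we show that BOB.1/OBF.1 regulates Btk gene expression."

# Discourse segments 286 and 287 (clause level).
SEGMENTS = {
    "RegImplication": "Here we show that",
    "Implication": "BOB.1/OBF.1 regulates Btk gene expression.",
}

# Cues from the event <clue> markup: the two triggers and E17's knowledge-type cue.
CUES = {
    "E16 trigger": "expression",
    "E17 trigger": "regulates",
    "E17 KT cue": "show",
}


def span(text: str, sub: str) -> tuple[int, int]:
    """Character span of the first occurrence of sub in text."""
    start = text.find(sub)
    if start < 0:
        raise ValueError(f"{sub!r} not found")
    return start, start + len(sub)


def locate_cues(sentence, segments, cues):
    """Map each cue to the discourse segment whose span contains it."""
    seg_spans = {name: span(sentence, seg_text) for name, seg_text in segments.items()}
    result = {}
    for cue_name, cue_text in cues.items():
        c_start, c_end = span(sentence, cue_text)
        for seg_name, (s_start, s_end) in seg_spans.items():
            if s_start <= c_start and c_end <= s_end:
                result[cue_name] = seg_name
    return result


print(locate_cues(SENTENCE, SEGMENTS, CUES))
# {'E16 trigger': 'Implication', 'E17 trigger': 'Implication', 'E17 KT cue': 'RegImplication'}
```

Both event triggers fall inside the Implication segment, while the knowledge-type cue "show" of E17 falls inside the RegImplication segment. The whole sentence carries the single CoreSC label Res.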
56. CoreSC vs Event Meta-knowledge
- Meta-knowledge event annotation can help to provide a more fine-grained analysis of
CoreSC Background.
- Certainty Level and Source can help to refine Results and Conclusions
- More straightforward mappings occur between other categories, e.g. most sentences of the
Motivation category contain only events of type Investigation.
- Categories such as Goal and Object are catered for by CoreSCs but not covered by the
meta-knowledge scheme.
- Observation_L3_Current can be refined into CoreSC Observation (Obs), Result (Res), Conclusion (Con) and Hypothesis (Hyp).
57. CoreSC vs Segments
- In most cases there is a natural mapping between the two schemes:
- CoreSC Observation maps to Result; CoreSC Result (Res) maps to Result and Implication.
- CoreSC Conclusion maps to Implication and Hypothesis.
- Implication consists of CoreSC Conclusion and Result.
- Fact is CoreSC Background and Conclusion.
- Hypothesis is CoreSC Hypothesis and Conclusion.
- Problem is CoreSC Motivation.
- Most of CoreSC Background (Bac) maps to Fact and the Other categories, which refine it.
- CoreSC refines the Method and Result segments.
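The mappings listed above can be collected into a small lookup structure. The following Python sketch is illustrative only: it records each (CoreSC, segment type) pair stated on this slide, indexes them in both directions, and lists the labels that map to more than one category in the other scheme. The names `PAIRS`, `build_maps` and `ambiguous` are hypothetical, and the unnamed "Other" categories are deliberately left out.

```python
from collections import defaultdict

# (CoreSC category, discourse segment type) pairs read off the bullets above.
# "Result" in the first position is the CoreSC Res category; the "Other"
# segment categories that refine Background are left out.
PAIRS = [
    ("Observation", "Result"),
    ("Result", "Result"),
    ("Result", "Implication"),
    ("Conclusion", "Implication"),
    ("Conclusion", "Hypothesis"),
    ("Conclusion", "Fact"),
    ("Background", "Fact"),
    ("Hypothesis", "Hypothesis"),
    ("Motivation", "Problem"),
]


def build_maps(pairs):
    """Index pairs both ways: CoreSC -> segment types, segment type -> CoreSC."""
    forward, backward = defaultdict(set), defaultdict(set)
    for coresc, segment in pairs:
        forward[coresc].add(segment)
        backward[segment].add(coresc)
    return dict(forward), dict(backward)


def ambiguous(mapping):
    """Labels that map to more than one category in the other scheme."""
    return sorted(label for label, targets in mapping.items() if len(targets) > 1)


CORESC_TO_SEG, SEG_TO_CORESC = build_maps(PAIRS)

if __name__ == "__main__":
    print("One-to-many CoreSC labels:", ambiguous(CORESC_TO_SEG))
    print("One-to-many segment types:", ambiguous(SEG_TO_CORESC))
```

With these pairs, `ambiguous(CORESC_TO_SEG)` returns `['Conclusion', 'Result']`; a one-to-many label marks where a distinction drawn by one scheme is not made by the other.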
58. Segments vs Event Meta-knowledge
- Schemes can be complementary to each other
- Segment types can refine the interpretation of Analysis events into Hypothesis,
Implication or Result.
- Certainty level can help determine the confidence ascribed to the segments
- Likewise, meta-knowledge can help to distinguish Result segments that
correspond to analyses of results from those that correspond to experimental observations.
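One way to picture this complementarity is a function that combines an event's knowledge type and certainty level with the type of the segment that contains it. This Python sketch is hypothetical: the combined label strings are invented for illustration, and the reading of the certainty levels (L1 lowest, L3 meaning no uncertainty is expressed) is an assumption, not a definition taken from either scheme.

```python
# Assumed reading of the certainty levels (L1 lowest, L3 = no uncertainty
# expressed); treat this table as a placeholder, not the scheme's definition.
CERTAINTY = {"L1": "low", "L2": "high", "L3": "certain"}


def refine(kt, cl, segtype):
    """Combine an event's knowledge type (KT) and certainty level (CL) with
    the type of the discourse segment containing it.

    Returns a hypothetical (refined label, confidence) pair.
    """
    if segtype == "Result" and kt in ("Analysis", "Observation"):
        # Meta-knowledge separates analysis-type from observation-type Results.
        label = f"Result ({kt.lower()})"
    elif kt == "Analysis" and segtype in ("Hypothesis", "Implication"):
        # Segment types refine Analysis events into Hypothesis or Implication.
        label = f"Analysis as {segtype}"
    else:
        label = segtype
    return label, CERTAINTY.get(cl, "unknown")


if __name__ == "__main__":
    # Event E17 (KT="Analysis", CL="L3") falls in segment 287 (Implication).
    print(refine("Analysis", "L3", "Implication"))
```

For event E17 in segment 287 this returns `('Analysis as Implication', 'certain')`.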
59. Conclusions (in detail):
Common categories across the three schemes:
(CoreSC Observation, Observation_L3_Current, Result)
(CoreSC Hypothesis, Analysis_L2_Current, Hypothesis)
(CoreSC Motivation, Investigation_L3_Current, Problem)
Categories that need refining from the three schemes:
CoreSC: Background, Conclusion
Metaknowledge: Gen_Other_L3_Current, Observation_L3_Current
Segments: Method and Result
The three schemes have different strengths and offer
annotation at different levels:
- CoreSC: complements the other two schemes with more fine-grained
Methods, Objectives and Results.
- Metaknowledge: Certainty levels and Source can help to refine the
interpretation of certain CoreSC and segment types.
- Segments: Refinement of Background; signals for modality cues
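The three common categories listed above form a small alignment table. A minimal Python sketch, assuming one row per alignment, can look up a label in one scheme and return its counterparts in the other two. The names `SCHEMES`, `COMMON` and `align` are illustrative only.

```python
# Aligned categories from the "Common categories" list above, one row per
# alignment: (CoreSC, meta-knowledge label, discourse segment type).
SCHEMES = ("CoreSC", "Metaknowledge", "Segments")
COMMON = [
    ("Observation", "Observation_L3_Current", "Result"),
    ("Hypothesis", "Analysis_L2_Current", "Hypothesis"),
    ("Motivation", "Investigation_L3_Current", "Problem"),
]


def align(scheme, label):
    """Return the labels aligned with `label` in the other two schemes,
    or None when the label has no common category."""
    column = SCHEMES.index(scheme)
    for row in COMMON:
        if row[column] == label:
            return {s: l for s, l in zip(SCHEMES, row) if s != scheme}
    return None


if __name__ == "__main__":
    print(align("Segments", "Problem"))
    print(align("CoreSC", "Background"))
```

A `None` result, as for CoreSC Background, flags a label with no common category in this table, which matches its listing among the categories that still need refining.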
60. Conclusions (general)
This is a very small example, but it shows that the differences can be overcome. Each scheme has advantages:
- Clause-level annotation is the most precise for identifying core claims
- Knowledge type and certainty level are important refinements
- CoreSC refines methods and results and shows most promise for
automated recognition
So we need to work together!
- Plan to join forces; work on joint corpus
- Other work to add: KEfED, SWAN, ScholOnto
- Together develop a ‘claim identifier’ (not a fact extractor)
+ standards for modality/evidence scales and types
- Work together towards claim-evidence network
representation! (cf. also Hypotheses, Evidence and Relationships)
61. Models of Scientific Discourse Annotation, Portland, OR, June 25
http://msda2011.wordpress.com/
The goal of the Workshop on “Models of Scientific Discourse Annotation” is to compare and
contrast the motivation behind efforts in the discourse annotation of scientific text, the
techniques and principles applied in the various approaches, and discuss ways in which they can
complement each other and collaborate to form standards for an optimal method of annotating
appropriate levels of discourse, with enhanced accuracy and usefulness.
We wish to compare, contrast and evaluate different scientific discourse annotation schemes
and tools, in order to answer questions such as:
• What motivates a certain level, method, viewpoint for annotating scientific text?
• What is the annotation level for a unit of argumentation: an event, a sentence, a segment?
What are the advantages and disadvantages of each?
• How easily can different schemes be applied to texts? Are they easily trainable?
• Which schemes are the most portable? Can they be applied to both full papers and abstracts?
Can they be applied to texts in different domains?
• How granular should annotation schemes be? What are the advantages/disadvantages of fine-grained
and coarse-grained annotation categories?
• Can different schemes complement each other to provide different levels of information? Can
different schemes be combined to give better results?
• How can we compare annotations? How do we decide which features, approaches and techniques
work best?
• How do we exchange and evaluate each other’s annotations?
• How applicable are these efforts towards improved methods of publishing or summarizing
science?
62. CoreSC References
Liakata, M. and Teufel, S. and Siddharthan, A. and Batchelor. 2010. Corpora for the
conceptualisation and zoning of scientific papers. Proceedings of 7th International Conference
on Language Resources and Evaluation, Malta.
Guo, Y. and Korhonen, A. and Liakata, M. and Silins, I. and Sun, L. and Stenius, U. 2010.
Identifying the Information Structure of Scientific Abstracts: An investigation of Three Different
Schemes. Proceedings of BioNLP 2010, Uppsala, Sweden.
Liakata, M. and Q, C. and Soldatova, L. 2009.
Semantic Annotation of Papers: Interface & Enrichment Tool (SAPIENT).
Proceedings of BioNLP-09, Boulder, Colorado.
Liakata M. and Soldatova L.N. 2008. Guidelines for the annotation of General Scientific
Concepts. Aberystwyth University, JISC Project
Report http://ie-repository.jisc.ac.uk/88/ 2008.
Soldatova L.N. and Liakata M. 2007. An ontology methodology and CISP - the proposed Core
Information about Scientific Papers. JISC Project Report, http://ie-repository.jisc.ac.uk/137/.
63. Meta-Annotation References
Ananiadou, S., Thompson, P. and Nawaz, R. (2010). Improving Search Through
Event-based Biomedical Text Mining. In Proceedings of the First International
Workshop on Automated Motif Discovery in Cultural Heritage and Scientific
Communication Texts (AMICUS 2010).
Nawaz, R., Thompson, P., McNaught, J. and Ananiadou, S. (2010). Meta-
Knowledge Annotation of Bio-Events. In Proceedings of the Seventh International
Conference on Language Resources and Evaluation (LREC 2010), pp. 2498-2505
Nawaz, R., Thompson, P. and Ananiadou, S. (2010). Evaluating a Meta-
Knowledge Annotation Scheme for Bio-Events. In Proceedings of the Workshop
on Negation and Speculation in Natural Language Processing, pp. 69-77
Nawaz, R., Thompson, P. and Ananiadou, S. (2010). Event Interpretation: A Step
towards Event-Centred Text Mining. In Proceedings of the First International
Workshop on Automated Motif Discovery in Cultural Heritage and Scientific
Communication Texts (AMICUS 2010).
64. Discourse Segment References
de Waard, A. (2010d). The Story of Science: A syntagmatic/paradigmatic analysis of scientific text.
Proceedings of the AMICUS Workshop, Vienna, Austria, October 2010.
de Waard, A., and Pander Maat, H. (2010). A Classification of Research Verbs to Facilitate Discourse Segment
Identification in Biological Text, Proceedings of the Interdisciplinary Workshop on Verbs. The Identification
and Representation of Verb Features, Pisa, Italy, November 4-5 2010.
de Waard, A. (2010c). The Future of the Journal? Integrating research data with scientific discourse, Logos
vol. 21, issues 1-2, January 2011.
de Waard, A. (2010b). From Proteins to Fairytales: Directions in Semantic Publishing. IEEE Intelligent Systems
25(2): 83-88 (2010)
de Waard, A. (2010a). Realm Traversal In Biological Discourse: From Model To Experiment and back again,
Workshop on Multidisciplinary Perspectives on Signalling Text Organisation (MAD 2010), March 17-20,
2010, Moissac, France.
de Waard, A. (2009b), Categorizing Epistemic Segment Types in Biology Research Articles. Workshop on
Linguistic and Psycholinguistic Approaches to Text Structuring (LPTS 2009), September 21-23 2009. –
to be published as a chapter in Linguistic and Psycholinguistic Approaches to Text Structuring, Laure
Sarda, Shirley Carter Thomas & Benjamin Fagard (eds), John Benjamins, (planned for 2010).
de Waard, A., Simon Buckingham Shum, Annamaria Carusi, Jack Park, Matthias Samwald and Ágnes
Sándor. (2009). Hypotheses, Evidence and Relationships: The HypER Approach for Representing Scientific
Knowledge Claims, Proceedings of the Workshop on Semantic Web Applications in Scientific Discourse
(SWASD 2009), co-located with the 8th International Semantic Web Conference (ISWC-2009).
de Waard, A., Buitelaar, P., & Eigner, T. (2009). Identifying the Epistemic Value of Discourse Segments in Biology
Texts. In: Proceedings of the Eighth International Conference on Computational Semantics, Tilburg, The
Netherlands, Jan. 7-9 2009.