Comparing three models of scientific discourse annotation
for enhanced knowledge extraction
Anita de Waard, Maria Liakata, Paul Thompson, Raheel Nawaz and Sophia Ananiadou
Accessing the knowledge in papers
-   Papers are ‘Stories that persuade with data’
-   So how is this persuasion done? Three ways of annotating
    key rhetorical moves:
    -   Discourse segment types (de Waard, Elsevier/Utrecht)
    -   Zones of conceptualisation using Core Scientific
        Concepts (Liakata, Aberystwyth/EBI)
    -   Metaknowledge annotation of BioEvents (Thompson,
        Ananiadou et al, NACTeM/Manchester)
-   Comparison of 3 methods on full-text paper
-   What are overlaps/differences? Can we combine?
“Scientific articles are stories...
The Story of Goldilocks and          Story        Grammar     Paper              The AXH Domain of Ataxin-1 Mediates
the Three Bears                                                                  Neurodegeneration through Its Interaction with Gfi-1/
                                                                                 Senseless Proteins
Once upon a time                     Time         Setting     Background         The mechanisms mediating SCA1 pathogenesis are still not fully
                                                                                 understood, but some general principles have emerged.
a little girl named Goldilocks       Characters               Objects of study   the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract
She went for a walk in the           Location                 Experimental       studied and compared in vivo effects and interactions to those of
forest. Pretty soon, she came                                 setup              the human protein
upon a house.
She knocked and, when no one         Goal         Theme       Research           Gain insight into how Atx-1's function contributes to SCA1
answered,                                                     goal               pathogenesis. How these interactions might contribute to the
                                                                                 disease process and how they might cause toxicity in only a
                                                                                 subset of neurons in SCA1 is not fully understood.
she walked right in.                 Attempt                  Hypothesis         Atx-1 may play a role in the regulation of gene expression

At the table in the kitchen, there   Name         Episode 1   Name               dAtx-1 and hAtx-1 Induce Similar Phenotypes When
were three bowls of porridge.                                                    Overexpressed in Flies
Goldilocks was hungry.               Subgoal                  Subgoal            test the function of the AXH domain
She tasted the porridge from         Attempt                  Method             overexpressed dAtx-1 in flies using the GAL4/UAS system
the first bowl.                                                                  (Brand and Perrimon, 1993) and compared its effects to those of
                                                                                 hAtx-1.
This porridge is too hot! she        Outcome                  Results            Although at 2 days after eclosion, overexpression of either Atx-1
exclaimed.                                                                       does not show obvious morphological changes in the
                                                                                 photoreceptor cells
So, she tasted the porridge          Activity                 Data               (data not shown),
from the second bowl.
This porridge is too cold, she       Outcome                  Results            both genotypes show many large holes and loss of cell integrity
said                                                                             at 28 days
So, she tasted the last bowl of      Activity                 Data               (Figures 1B-1D).
porridge.
Ahhh, this porridge is just right,   Outcome                  Results            Overexpression of dAtx-1 using the GMR-GAL4 driver also
she said happily and                                                             induces eye abnormalities. The external structures of the eyes
“...that persuade (reviewers/readers)…”




Aristotle    Quintilian                                                                        Scientific Paper

prooimion    Introduction    The introduction of a speech, where one announces the             Introduction:
             / exordium      subject and purpose of the discourse, and where one usually       positioning
                             employs the persuasive appeal to ethos in order to
                             establish credibility with the audience.

prothesis    Statement of    The speaker here provides a narrative account of what has         Introduction: research
             Facts/narratio  happened and generally explains the nature of the case.           question

             Summary/        The propositio provides a brief summary of what one is about      Summary of contents
             propositio      to speak on, or concisely puts forth the charges or accusation.

pistis       Proof/          The main body of the speech where one offers logical              Results
             confirmatio     arguments as proof. The appeal to logos is emphasized
                             here.

             Refutation/     As the name connotes, this section of a speech was devoted to     Related Work
             refutatio       answering the counterarguments of one's opponent.

epilogos     peroratio       Following the refutatio and concluding the classical oration, the Discussion: summary,
                             peroratio conventionally employed appeals through                 implications.
                             pathos, and often included a summing up.
“... with data.”




Annotate: fine-grained models of argumentation
     Method 1: Discourse Segment Types
 Both seminomas and the EC component of
 nonseminomas share features with ES cells. To
 exclude that the detection of miR-371-3 merely
 reflects its expression pattern in ES cells, we tested
 by RPA miR-302a-d, another ES cells-specific
 miRNA cluster (Suh et al, 2004). In many of the
 miR-371-3 expressing seminomas and
 nonseminomas, miR-302a-d was undetectable
 (Figs S7 and S8), suggesting that miR-371-3
 expression is a selective event during
 tumorigenesis.

 Segment                                                        Type              Realm
 Both seminomas and the EC component of                         Fact              Conceptual knowledge
 nonseminomas share features with ES cells.
 To exclude that                                                Goal
 the detection of miR-371-3 merely reflects its                 Hypothesis
 expression pattern in ES cells,
 we tested by RPA miR-302a-d, another                           Method            Experimental Evidence
 ES cells-specific miRNA cluster (Suh et al, 2004).
 In many of the miR-371-3 expressing seminomas                  Result            Experimental Evidence
 and nonseminomas, miR-302a-d was undetectable
 (Figs S7 and S8),
 suggesting that                                                Reg-Implication
 miR-371-3 expression is a selective event during               Implication
 tumorigenesis.
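
One way to hold a segment-level annotation like this in code is as an ordered list of (text, type) pairs. The sketch below is illustrative only: the `Segment` class and the two helper functions are hypothetical names, not part of any published annotation tool. The segment texts and type labels are taken from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One discourse segment and its type label (hypothetical container)."""
    text: str
    seg_type: str


# The example passage, split and labelled as on the slide.
PASSAGE = [
    Segment("Both seminomas and the EC component of nonseminomas "
            "share features with ES cells.", "Fact"),
    Segment("To exclude that", "Goal"),
    Segment("the detection of miR-371-3 merely reflects its expression "
            "pattern in ES cells,", "Hypothesis"),
    Segment("we tested by RPA miR-302a-d, another ES cells-specific "
            "miRNA cluster (Suh et al, 2004).", "Method"),
    Segment("In many of the miR-371-3 expressing seminomas and nonseminomas, "
            "miR-302a-d was undetectable (Figs S7 and S8),", "Result"),
    Segment("suggesting that", "Reg-Implication"),
    Segment("miR-371-3 expression is a selective event during "
            "tumorigenesis.", "Implication"),
]


def join_segments(segments):
    """Rebuild the running text from the ordered segments."""
    return " ".join(s.text for s in segments)


def segments_of_type(segments, seg_type):
    """Return the text of every segment carrying the given type label."""
    return [s.text for s in segments if s.seg_type == seg_type]
```

Keeping the segments in reading order means the original passage can always be recovered by joining them, while a filter by type pulls out, for example, only the Result clauses.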
Segment types point to realms of discourse:
-   Concepts, models, ‘facts’: Present tense
    -   Fact: (1) Both seminomas and the EC component of nonseminomas share
        features with ES cells.
    -   Problem: (2) b. the detection of miR-371-3 merely reflects its
        expression pattern in ES cells,
    -   Implication: (3) c. miR-371-3 expression is a selective event during
        tumorigenesis.
-   Transitions: present tense
    -   Goal: (2) a. To exclude that
    -   Regulatory-Implication: (3) b. suggesting that
-   Experiment: Past tense
    -   Method: (2) c. we tested by RPA miR-302a-d, another ES cells-specific
        miRNA cluster (Suh et al, 2004).
    -   Result: (3) a. In many of the miR-371-3 expressing seminomas and
        nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
Method 2: Annotate with Core-Scientific Concepts (CoreSC) Annotation Scheme
A three-layer, ontology-motivated annotation scheme for sentence annotation,
which views a paper as the humanly readable representation of a
scientific investigation [Liakata et al 2010], with 45-page guidelines
[Liakata & Soldatova 2008]

1st layer: Core Scientific Concepts (CoreSCs):
Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model,
Observation, Result, Conclusion

2nd layer: Properties of CoreSCs: Novelty (New/Old) and Advantage
(advantage/disadvantage)

3rd layer: Concept Identifiers: linking sentences together which refer to
the same instance of a CoreSC
CoreSC Annotation Scheme (layers 1&2)
Hypothesis                A statement not yet confirmed rather than a fact
Motivation                The reasons behind an investigation
Background                Background knowledge & previous work
Goal                      A target state of the investigation
Object-New                A main product or theme of the investigation
Object-New-Advantage      Advantage of an object
Object-New-Disadvantage   Disadvantage of an object
Method-New                Means by which the goals of the investigation are achieved
Method-New-Advantage      Advantage of a Method
Method-New-Disadvantage   Disadvantage of a Method
Method-Old                A method pertaining to previous work
Method-Old-Disadvantage   Disadvantage of method in previous work
Method-Old-Advantage      Advantage of method in previous work
Experiment                An experimental method
Model                     Statement about theoretical model, method or framework
Observation               Data/phenomena recorded in an investigation
Result                    Factual statements about the results of an investigation
Conclusion                Statements inferred from observations and results
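The labels in the table above combine a layer 1 concept with optional layer 2 properties. A minimal sketch of how such a label could be split into its parts, assuming labels are hyphen-joined exactly as in the table; `CoreSCLabel` and `parse_coresc_label` are hypothetical names for illustration, not part of the CoreSC tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Layer 1 concepts, as listed on the CoreSC slide.
CORE_CONCEPTS = {
    "Hypothesis", "Motivation", "Goal", "Object", "Background", "Method",
    "Experiment", "Model", "Observation", "Result", "Conclusion",
}


@dataclass(frozen=True)
class CoreSCLabel:
    concept: str              # layer 1: the Core Scientific Concept
    novelty: Optional[str]    # layer 2: "New" or "Old", if given
    advantage: Optional[str]  # layer 2: "Advantage" or "Disadvantage", if given


def parse_coresc_label(label: str) -> CoreSCLabel:
    """Split a label such as 'Method-Old-Disadvantage' into its parts."""
    parts = label.split("-")
    concept = parts[0]
    if concept not in CORE_CONCEPTS:
        raise ValueError(f"unknown CoreSC concept: {concept!r}")
    novelty = None
    advantage = None
    for part in parts[1:]:
        if part in ("New", "Old"):
            novelty = part
        elif part in ("Advantage", "Disadvantage"):
            advantage = part
        else:
            raise ValueError(f"unknown CoreSC property: {part!r}")
    return CoreSCLabel(concept, novelty, advantage)
```

For example, `parse_coresc_label("Method-Old-Disadvantage")` yields the concept Method with novelty Old and advantage Disadvantage, while a bare label such as `Goal` carries no layer 2 properties.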
CoreSC Annotation tool:
Method 3: Bio-event Annotation
-   A dynamic biological relationship involving one or more participants

We found that Y activates the expression of X

ID:       E2                     ID:       E1
TRIGGER:  activates              TRIGGER:  expression
TYPE:     POSITIVE_REGULATION    TYPE:     GENE_EXPRESSION
THEME:    E1 : event             THEME:    X : gene
CAUSE:    Y : protein            CAUSE:    none (empty)
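The two events above nest: E2 takes E1 as its theme. A minimal sketch of that structure, assuming a simple in-memory representation; the class names `Entity` and `BioEvent` and the helper `describe` are hypothetical and do not reflect the GENIA file format.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Entity:
    name: str  # surface form, e.g. "X"
    kind: str  # e.g. "gene" or "protein"


@dataclass(frozen=True)
class BioEvent:
    id: str
    trigger: str                       # word that signals the event
    type: str                          # event class, e.g. "GENE_EXPRESSION"
    theme: Union[Entity, "BioEvent"]   # what the event acts on
    cause: Optional[Entity] = None     # what brings the event about, if stated


# "We found that Y activates the expression of X"
e1 = BioEvent("E1", "expression", "GENE_EXPRESSION", theme=Entity("X", "gene"))
e2 = BioEvent("E2", "activates", "POSITIVE_REGULATION", theme=e1,
              cause=Entity("Y", "protein"))


def describe(event: BioEvent) -> str:
    """Render an event as text, recursing into a nested event theme."""
    if isinstance(event.theme, BioEvent):
        theme = describe(event.theme)
    else:
        theme = event.theme.name
    cause = f", cause={event.cause.name}" if event.cause else ""
    return f"{event.type}(theme={theme}{cause})"
```

Calling `describe(e2)` walks from the regulation event into the expression event it regulates, which is the point of letting a theme be either an entity or another event.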
                                                                   	
  
Meta-Knowledge annotation scheme for BioEvents

Bio-Event (centred on an event trigger):
-   Participants: Theme(s), Actor(s)
-   Class / Type (grounded to an event ontology)

Meta-knowledge dimensions:
-   Knowledge Type: Investigation, Observation, Analysis, General
-   Certainty Level: L3, L2, L1
-   Source: Other, Current
-   Manner: High, Low, Neutral
-   Polarity: Negative, Positive

• Currently being applied to the entire GENIA event corpus (1000 MEDLINE
  abstracts)
BioEvent/MetaKnowledge Annotation
S3 = These results suggest that Y has no effect on
     expression of X

Event   Knowledge Type   Certainty Level   Lexical Polarity   Manner    Source
E1      General          L3                Positive           Neutral   Current
E2      Analysis         L2                Negative           Neutral   Current
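The table for S3 can be held as one record per event, which makes it easy to pick out events that are negated or hedged. A minimal sketch; the class and function names are hypothetical, and it assumes that L3 is the highest certainty level, which the slides do not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class KnowledgeType(Enum):
    INVESTIGATION = "Investigation"
    OBSERVATION = "Observation"
    ANALYSIS = "Analysis"
    GENERAL = "General"


class Certainty(Enum):
    L1 = 1
    L2 = 2
    L3 = 3  # assumed here to be the highest certainty level


@dataclass(frozen=True)
class MetaKnowledge:
    event_id: str
    knowledge_type: KnowledgeType
    certainty: Certainty
    polarity: str  # "Positive" or "Negative"
    manner: str    # "High", "Low" or "Neutral"
    source: str    # "Current" or "Other"


# S3 = "These results suggest that Y has no effect on expression of X"
s3 = [
    MetaKnowledge("E1", KnowledgeType.GENERAL, Certainty.L3,
                  "Positive", "Neutral", "Current"),
    MetaKnowledge("E2", KnowledgeType.ANALYSIS, Certainty.L2,
                  "Negative", "Neutral", "Current"),
]


def hedged_or_negated(annotations):
    """Return ids of events that are negated or below the top certainty level."""
    return [a.event_id for a in annotations
            if a.polarity == "Negative" or a.certainty is not Certainty.L3]
```

On the S3 example only E2 is returned: "suggest" lowers its certainty to L2 and "no effect" makes its polarity negative, while the embedded expression event E1 stays unmarked.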
Comparing 3 annotation systems

CoreSC
-   Purpose: Identify main components of scientific investigation for machine learning
-   Granularity: Sentence
-   Manual/Automated: Manual corpus, automated annotation tools

MetaKnowledge/BioEvents
-   Purpose: Enhance information extraction for biomedical texts to enable metadiscourse annotation
-   Granularity: Events (intra-sentential): can be several per sentence, or one in more sentences
-   Manual/Automated: Manual corpus, working on automated

Discourse Segment Types
-   Purpose: Identify mechanisms of conveying (epistemic) knowledge in scientific discourse
-   Granularity: Clause
-   Manual/Automated: Manual
3 Annotation Systems on the same paper:

CoreSC:
<annotationART atype="GSC" type="Res" conceptID="Res24"
novelty="None" advantage="None">
Here we show that BOB.1/OBF.1 regulates Btk gene expression.
</annotationART>

BioEvent/MetaKnowledge:
<sentence id="S6">Here we show that
<term id="T13" sem="Protein_family_or_group">
    <gene-or-gene-product id="G9">BOB.1</gene-or-gene-product>/
    <gene-or-gene-product id="G10">OBF.1</gene-or-gene-product>
</term> regulates
    <term id="T14" sem="Biological_process">
        <term id="T15" sem="DNA_domain_or_region">
            <gene-or-gene-product id="G11">Btk
            </gene-or-gene-product> gene
        </term> expression
    </term>.
</sentence>
<event KT="Gen-Other" CL="L3" Manner="Neutral" Polarity="Positive" Source="Current" id="E16">
    <type class="Gene_expression"/>
    <theme idref="G11"/>
    <clue>Here we show that BOB.1/OBF.1 regulates Btk gene <clueType>expression</clueType>. </clue>
</event>
<event KT="Analysis" CL="L3" Manner="Neutral" Polarity="Positive" Source="Current" id="E17">
    <type class="Regulation"/>
    <theme idref="E16"/>
    <cause idref="T13"/>
    <clue>Here we <clueKT>show</clueKT> that BOB.1/OBF.1 <clueType>regulates</clueType> Btk gene expression. </clue>
</event>

Discourse Segments:
<segment segID ="286" section = "D" segtype = "RegImplication">
Here we show that
</segment>
<segment segID ="287" section = "D" segtype = "Implication">
BOB.1/OBF.1 regulates Btk gene expression.
</segment>
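The CoreSC and Discourse Segment fragments above annotate the same sentence at different granularities, and this can be checked mechanically. A minimal sketch using Python's standard `xml.etree.ElementTree`; the `<segments>` wrapper element and the helper names are assumptions added so the two segments parse as one document, and are not part of either annotation format.

```python
import xml.etree.ElementTree as ET

# CoreSC annotation of the sentence, copied from the slide.
CORESC = """<annotationART atype="GSC" type="Res" conceptID="Res24"
novelty="None" advantage="None">
Here we show that BOB.1/OBF.1 regulates Btk gene expression.
</annotationART>"""

# Discourse segments from the slide, wrapped in a hypothetical root element.
SEGMENTS = """<segments>
<segment segID ="286" section = "D" segtype = "RegImplication">
Here we show that
</segment>
<segment segID ="287" section = "D" segtype = "Implication">
BOB.1/OBF.1 regulates Btk gene expression.
</segment>
</segments>"""


def normalise(text):
    """Collapse runs of whitespace so line breaks do not affect comparison."""
    return " ".join(text.split())


coresc = ET.fromstring(CORESC)
sentence = normalise(coresc.text)

# (segment type, text) pairs in document order.
segments = [(seg.get("segtype"), normalise(seg.text))
            for seg in ET.fromstring(SEGMENTS).iter("segment")]


def segments_cover(sentence, segments):
    """True if the clause-level segments concatenate to the full sentence."""
    return normalise(" ".join(text for _, text in segments)) == sentence
```

Here one sentence-level CoreSC label (`Res`) corresponds to two clause-level segments, a `RegImplication` followed by an `Implication`, and `segments_cover(sentence, segments)` confirms that the two clauses together span the sentence.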
CoreSC vs Event Meta-knowledge




-   Meta-knowledge event annotation can help to provide a more fine-grained analysis of
    CoreSC Background.
-   Certainty Level and Source can help to refine Results and Conclusions
-   More straightforward mappings occur between other categories, e.g. most sentences of the
    Motivation category contain only events of type Investigation.
-   Categories such as Goal and Object are catered for by CoreSCs but not covered by the
    meta-knowledge scheme.
-   Observation_L3_Current can be refined into CoreSC Observation, Result, Conclusion and Hypothesis.
CoreSC vs Segments




-   In most cases there is a natural mapping between the two schemes:
    -   CoreSC Observation maps to Result; CoreSC Result maps to Result and Implication.
    -   CoreSC Conclusion maps to Implication and Hypothesis.
    -   Implication consists of CoreSC Conclusion and Result.
    -   Fact is CoreSC Background and Conclusion.
    -   Hypothesis is CoreSC Hypothesis and Conclusion.
    -   Problem is CoreSC Motivation.
-   Most of CoreSC Background maps to Fact and the Other categories, which refine it.
-   CoreSC refines Method and Result segments.
Segments vs Event Meta-knowledge




- Schemes can be complementary to each other
- Segment types can refine the interpretation of Analysis events into Hypothesis,
    Implication or Result.
-   Certainty level can help determine the confidence ascribed to the segments
-   Likewise, meta-knowledge can help to distinguish Result segments that
    correspond either to analyses of results or experimental observations.
Conclusions (in detail):
Common categories across the three schemes:
(CoreSC Observation, Observation_L3_Current, Result)
(CoreSC Hypothesis, Analysis_L2_Current, Hypothesis)
(CoreSC Motivation, Investigation_L3_Current, Problem)
Categories that need refining from the three schemes:
CoreSC: Background, Conclusion
Metaknowledge: Gen_Other_L3_Current, Observation_L3_Current
Segments: Method and Result
The three schemes have different strengths and offer
annotation at different levels:
- CoreSC: complements the other two schemes with more fine-grained
Methods, Objectives and Results.
- Metaknowledge: Certainty levels and Source can help to refine the
interpretation of certain CoreSC and segment types.
- Segments: Refinement of Background; signals for modality cues
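The common categories listed above amount to a small alignment table between the three schemes. A minimal sketch of how that table could be used to translate a label from one scheme to another; `COMMON_CATEGORIES`, `SCHEMES` and `translate` are hypothetical names, and only the three triples given on the slide are included.

```python
# (CoreSC, meta-knowledge, discourse segment) triples, copied from the slide.
COMMON_CATEGORIES = [
    ("Observation", "Observation_L3_Current", "Result"),
    ("Hypothesis", "Analysis_L2_Current", "Hypothesis"),
    ("Motivation", "Investigation_L3_Current", "Problem"),
]

# Position of each scheme within a triple.
SCHEMES = ("coresc", "metaknowledge", "segment")


def translate(label, source, target):
    """Map a label between schemes; return None if no common category is listed."""
    i = SCHEMES.index(source)
    j = SCHEMES.index(target)
    for triple in COMMON_CATEGORIES:
        if triple[i] == label:
            return triple[j]
    return None
```

Labels outside the common set, such as CoreSC Background, return `None`, which matches the slide's point that those categories need refining before they can be aligned.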
Conclusions (general)
Very small example, shows differences can be overcome. Each has advantages:

  - Clause-level is most precise for identifying core claims
  - Knowledge type/Certainty level are important refinements
  - CoreSC refines methods and results and shows most promise for
        automated recognition
So we need to work together!

  - Plan to join forces; work on joint corpus
  - Other work to add: KEfED, SWAN, ScholOnto
  - Together develop a ‘claim identifier’ (not a fact extractor)
        + standards for modality/evidence scales and types

  - Work together towards claim-evidence network
        representation! (cf also Hypotheses, Evidence and Relationships)
Models of Scientific Discourse Annotation, Portland, OR, June 25
                            http://msda2011.wordpress.com/
The goal of the Workshop on “Models of Scientific Discourse Annotation” is to compare and
contrast the motivation behind efforts in the discourse annotation of scientific text, the
techniques and principles applied in the various approaches, and discuss ways in which they can
complement each other and collaborate to form standards for an optimal method of annotating
appropriate levels of discourse, with enhanced accuracy and usefulness.
We wish to compare, contrast and evaluate different scientific discourse annotation schemes
and tools, in order to answer questions such as:
• What motivates a certain level, method, viewpoint for annotating scientific text?
• What is the annotation level for a unit of argumentation: an event, a sentence, a segment?
What are advantages and disadvantages of all three?
• How easily can different schemes be applied to texts? Are they easily trainable?
• Which schemes are the most portable? Can they be applied to both full papers and abstracts?
Can they be applied to texts in different domains?
• How granular should annotation schemes be? What are the advantages/disadvantages of fine
and coarse grained annotation categories?
• Can different schemes complement each other to provide different levels of information? Can
different schemes be combined to give better results?
• How can we compare annotations, how do we decide which features, approaches, techniques
work best?
• How do we exchange and evaluate each other’s annotations?
• How applicable are these efforts towards improved methods of publishing or summarizing
science?
CoreSC References
Liakata, M. and Teufel, S. and Siddharthan, A. and Batchelor. 2010. Corpora for the
conceptualisation and zoning of scientific papers. Proceedings of 7th International Conference
on Language Resources and Evaluation, Malta.

Guo, Y. and Korhonen, A. and Liakata, M. and Silins, I. and Sun, L. and Stenius, U. 2010.
Identifying the Information Structure of Scientific Abstracts: An investigation of Three Different
Schemes. Proceedings of BioNLP 2010, Uppsala, Sweden.

Liakata, M. and Q, Claire and Soldatova, L.N. 2009.
Semantic Annotation of Papers: Interface & Enrichment Tool (SAPIENT).
Proceedings of BioNLP-09, 2009, Boulder, Colorado.

Liakata M. and Soldatova L.N. 2008. Guidelines for the annotation of General Scientific
Concepts. Aberystwyth University, JISC Project
Report http://ie-repository.jisc.ac.uk/88/ 2008.
Soldatova L.N and Liakata M. 2007. An ontology methodology and CISP - the proposed Core
Information about Scientific Papers. JISC Project Report, http://ie-repository.jisc.ac.uk/137/.
Meta-Annotation References
Ananiadou, S., Thompson, P. and Nawaz, R. (2010). Improving Search Through
Event-based Biomedical Text Mining. In Proceedings of First International
Workshop on Automated Motif Discovery in Cultural Heritage and Scientific
Communication Texts (AMICUS 2010).
Nawaz, R., Thompson, P., McNaught, J. and Ananiadou, S. (2010). Meta-
Knowledge Annotation of Bio-Events. In Proceedings of the Seventh International
Conference on Language Resources and Evaluation (LREC 2010), pp. 2498-2505
Nawaz, R., Thompson, P. and Ananiadou, S. (2010). Evaluating a Meta-
Knowledge Annotation Scheme for Bio-Events. In Proceedings of the Workshop
on Negation and Speculation in Natural Language Processing, pp. 69-77
Nawaz, R., Thompson, P. and Ananiadou, S. (2010). Event Interpretation: A Step
towards Event-Centred Text Mining. In Proceedings of the First International
Workshop on Automated Motif Discovery in Cultural Heritage and Scientific
Communication Texts (AMICUS 2010).
Discourse Segment References
de Waard, A. (2010d). The Story of Science: A syntagmatic/paradigmatic analysis of scientific text.
Proceedings of the AMICUS Workshop, Vienna, Austria, October 2010.
de Waard, A., and Pander Maat, H. (2010). A Classification of Research Verbs to Facilitate Discourse Segment
Identification in Biological Text, Proceedings of the Interdisciplinary Workshop on Verbs. The Identification
and Representation of Verb Features, Pisa, Italy, November 4-5 2010.
de Waard, A. (2010c). The Future of the Journal? Integrating research data with scientific discourse, Logos
vol. 21, issues 1-2, January 2011.
de Waard, A. (2010b). From Proteins to Fairytales: Directions in Semantic Publishing. IEEE Intelligent Systems
25(2): 83-88 (2010)
de Waard, A. (2010a). Realm Traversal In Biological Discourse: From Model To Experiment and back again,
Workshop on Multidisciplinary Perspectives on Signalling Text Organisation (MAD 2010), March 17-20,
2010, Moissac, France.
de Waard, A. (2009b), Categorizing Epistemic Segment Types in Biology Research Articles. Workshop on
Linguistic and Psycholinguistic Approaches to Text Structuring (LPTS 2009), September 21-23 2009. –
to be published as a chapter in Linguistic and Psycholinguistic Approaches to Text Structuring, Laure
Sarda, Shirley Carter Thomas & Benjamin Fagard (eds), John Benjamins, (planned for 2010).
de Waard, A., Simon Buckingham Shum, Annamaria Carusi, Jack Park, Matthias Samwald and Ágnes
Sándor. (2009). Hypotheses, Evidence and Relationships: The HypER Approach for Representing Scientific
Knowledge Claims, Proceedings of the Workshop on Semantic Web Applications in Scientific Discourse
(SWASD 2009), co-located with the 8th International Semantic Web Conference (ISWC-2009).
de Waard, A. Buitelaar, P., & Eigner, T. (2009), Identifying the Epistemic Value of Discourse Segments in Biology
Texts, In: Proceedings of the Eighth International Conference on Computational Semantics, Tilburg, The
Netherlands, Jan.7-9 2009.

Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
 
Why would a publisher care about open data?
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
NFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR Data
 
CNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data Commons
 
Enabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesEnabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring Guidelines
 
Data, Data Everywhere: What's A Publisher to Do?
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?
 
Talk on Research Data Management
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data Management
 
History of the future
History of the futureHistory of the future
History of the future
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
 
Big Data and the Future of Publishing
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of Publishing
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
 
The Economics of Data Sharing
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data Sharing
 
Public Identifiers in Scholarly Publishing
Public Identifiers in Scholarly PublishingPublic Identifiers in Scholarly Publishing
Public Identifiers in Scholarly Publishing
 
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
 
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataElsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
 
Charleston Conference 2016
Charleston Conference 2016Charleston Conference 2016
Charleston Conference 2016
 
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
 
RDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest GroupRDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest Group
 

Annotation systems

  • 1. Comparing three models of scientific discourse annotation for enhanced knowledge extraction Anita de Waard, Maria Liakata, Paul Thompson, Raheel Nawaz and Sophia Ananiadou
  • 6. Accessing the knowledge in papers
    - Papers are ‘Stories that persuade with data’
    - So how is this persuasion done? Three ways of annotating key rhetorical moves:
      - Discourse segment types (de Waard, Elsevier/Utrecht)
      - Zones of conceptualisation using Core Scientific Concepts (Liakata, Aberystwyth/EBI)
      - Metaknowledge annotation of BioEvents (Thompson, Ananiadou et al, NACTeM/Manchester)
    - Comparison of 3 methods on full-text paper
    - What are overlaps/differences? Can we combine?
  • 7. “Scientific articles are stories...
    The Story of Goldilocks and the Three Bears | Story Grammar | Paper | The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
    Once upon a time | Time (Setting) | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
    a little girl named Goldilocks | Characters | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract
    She went for a walk in the forest. Pretty soon, she came upon a house. | Location | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein
    She knocked and, when no one answered, | Goal (Theme) | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
    she walked right in. | Attempt | Hypothesis | Atx-1 may play a role in the regulation of gene expression
    At the table in the kitchen, there were three bowls of porridge. | Name (Episode 1) | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
    Goldilocks was hungry. | Subgoal | Subgoal | test the function of the AXH domain
    She tasted the porridge from the first bowl. | Attempt | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
    This porridge is too hot! she exclaimed. | Outcome | Results | Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
    So, she tasted the porridge from the second bowl. | Activity | Data | (data not shown),
    This porridge is too cold, she said | Outcome | Results | both genotypes show many large holes and loss of cell integrity at 28 days
    So, she tasted the last bowl of porridge. | Activity | Data | (Figures 1B-1D).
    Ahhh, this porridge is just right, she said happily and | Outcome | Results | Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes
  • 10. “...that persuade (reviewers/readers)…”
    Aristotle | Quintilian | Description | Scientific Paper
    prooimion | Introduction/exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning
    prothesis | Statement of Facts/narratio | The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question
     | Summary/propositio | The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents
    pistis | Proof/confirmatio | The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results
     | Refutation/refutatio | As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work
    epilogos | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications.
  • 13. Annotate: fine-grained models of argumentation Method 1: Discourse Segment Types Both seminomas and the EC component of nonseminomas share features with ES cells. To exclude that the detection of miR-371-3 merely reflects its expression pattern in ES cells, we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004). In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8), suggesting that miR-371-3 expression is a selective event during tumorigenesis.
  • 23. Annotate: fine-grained models of argumentation. Method 1: Discourse Segment Types
    - Fact: Both seminomas and the EC component of nonseminomas share features with ES cells.
    - Goal: To exclude that
    - Hypothesis: the detection of miR-371-3 merely reflects its expression pattern in ES cells,
    - Method: we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
    - Result: In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
    - Reg-Implication: suggesting that
    - Implication: miR-371-3 expression is a selective event during tumorigenesis.
    Side labels: Conceptual knowledge (beside Fact); Experimental Evidence (beside Method and Result)
  • 32. Segment types point to realms of discourse:
    Concepts, models, ‘facts’: Present tense
    - Fact: (1) Both seminomas and the EC component of nonseminomas share features with ES cells.
    - Problem: (2) b. the detection of miR-371-3 merely reflects its expression pattern in ES cells,
    - Implication: (3) c. miR-371-3 expression is a selective event during tumorigenesis.
    Transitions: present tense
    - Goal: (2) a. To exclude that
    - Regulatory-Implication: (3) b. suggesting that
    Experiment: Past tense
    - Method: (2) c. we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
    - Result: (3) a. In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
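The grouping on this slide can be read as a lookup from segment type to realm of discourse and its associated tense. The following is a minimal Python sketch of that lookup, assuming the seven labels and three realms exactly as they appear on the slide; the names `Realm`, `SEGMENT_REALM`, `realm_of` and `expected_tense` are hypothetical and not part of any published tool.

```python
from enum import Enum


class Realm(Enum):
    """Realms of discourse as named on the slide."""
    CONCEPTUAL = "Concepts, models, 'facts'"
    TRANSITION = "Transitions"
    EXPERIMENT = "Experiment"


# Tense the slide associates with each realm.
REALM_TENSE = {
    Realm.CONCEPTUAL: "present",
    Realm.TRANSITION: "present",
    Realm.EXPERIMENT: "past",
}

# Segment type -> realm, following the grouping shown on the slide.
SEGMENT_REALM = {
    "Fact": Realm.CONCEPTUAL,
    "Problem": Realm.CONCEPTUAL,
    "Implication": Realm.CONCEPTUAL,
    "Goal": Realm.TRANSITION,
    "Regulatory-Implication": Realm.TRANSITION,
    "Method": Realm.EXPERIMENT,
    "Result": Realm.EXPERIMENT,
}


def realm_of(segment_type: str) -> Realm:
    """Return the realm a segment type points to."""
    return SEGMENT_REALM[segment_type]


def expected_tense(segment_type: str) -> str:
    """Return the tense the slide associates with the segment's realm."""
    return REALM_TENSE[realm_of(segment_type)]


# The example passage in reading order, with the labels from the slide.
passage = [
    ("Fact", "Both seminomas and the EC component of nonseminomas share features with ES cells."),
    ("Goal", "To exclude that"),
    ("Problem", "the detection of miR-371-3 merely reflects its expression pattern in ES cells,"),
    ("Method", "we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004)."),
    ("Result", "In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),"),
    ("Regulatory-Implication", "suggesting that"),
    ("Implication", "miR-371-3 expression is a selective event during tumorigenesis."),
]

# Sequence of realms the passage moves through.
realm_sequence = [realm_of(label).name for label, _ in passage]
```

Walking `passage` in order shows the text moving from conceptual statements into the experiment and back out again, with the transition segments marking the switches.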
  • 34. Method 2: Annotate with Core-Scientific Concepts (CoreSC) Annotation Scheme
    A three-layer, ontology-motivated scheme for sentence annotation, which views a paper as the human-readable representation of a scientific investigation [Liakata et al 2010], with 45-page guidelines [Liakata & Soldatova 2008]
    - 1st layer: Core Scientific Concepts (CoreSCs): Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model, Observation, Result, Conclusion
    - 2nd layer: Properties of CoreSCs: Novelty (New/Old) and Advantage (advantage/disadvantage)
    - 3rd layer: Concept Identifiers: linking together sentences which refer to the same instance of a CoreSC
  • 36. CoreSC Annotation Scheme (layers 1&2)
    - Hypothesis: A statement not yet confirmed rather than a fact
    - Motivation: The reasons behind an investigation
    - Background: Background knowledge & previous work
    - Goal: A target state of the investigation
    - Object-New: A main product or theme of the investigation
    - Object-New-Advantage: Advantage of an object
    - Object-New-Disadvantage: Disadvantage of an object
    - Method-New: Means by which the goals of the investigation are achieved
    - Method-New-Advantage: Advantage of a Method
    - Method-New-Disadvantage: Disadvantage of a Method
    - Method-Old: A method pertaining to previous work
    - Method-Old-Disadvantage: Disadvantage of method in previous work
    - Method-Old-Advantage: Advantage of method in previous work
    - Experiment: An experimental method
    - Model: Statement about theoretical model, method or framework
    - Observation: Data/phenomena recorded in an investigation
    - Result: Factual statements about the results of an investigation
    - Conclusion: Statements inferred from observations and results
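The three CoreSC layers map naturally onto a small record type: a concept (layer 1), optional novelty and advantage properties (layer 2), and an identifier tying sentences to the same concept instance (layer 3). Below is a hedged Python sketch under that reading. The class and field names (`CoreSCAnnotation`, `concept_id`, `label`) are hypothetical, the example sentence is invented for illustration, and the sketch does not enforce which concepts may carry which properties.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CoreSC(Enum):
    """Layer 1: the eleven Core Scientific Concepts listed on the slide."""
    HYPOTHESIS = "Hypothesis"
    MOTIVATION = "Motivation"
    GOAL = "Goal"
    OBJECT = "Object"
    BACKGROUND = "Background"
    METHOD = "Method"
    EXPERIMENT = "Experiment"
    MODEL = "Model"
    OBSERVATION = "Observation"
    RESULT = "Result"
    CONCLUSION = "Conclusion"


class Novelty(Enum):
    """Layer 2 property: New/Old."""
    NEW = "New"
    OLD = "Old"


class Advantage(Enum):
    """Layer 2 property: advantage/disadvantage."""
    ADVANTAGE = "Advantage"
    DISADVANTAGE = "Disadvantage"


@dataclass(frozen=True)
class CoreSCAnnotation:
    sentence: str
    concept: CoreSC                         # layer 1
    novelty: Optional[Novelty] = None       # layer 2
    advantage: Optional[Advantage] = None   # layer 2
    concept_id: Optional[int] = None        # layer 3 (hypothetical integer id)

    def label(self) -> str:
        """Build a flat label such as 'Method-Old-Disadvantage'."""
        parts = [self.concept.value]
        if self.novelty is not None:
            parts.append(self.novelty.value)
        if self.advantage is not None:
            parts.append(self.advantage.value)
        return "-".join(parts)


# Invented sentence, used only to show how the layers combine.
example = CoreSCAnnotation(
    sentence="Earlier assays could not separate the two clusters.",
    concept=CoreSC.METHOD,
    novelty=Novelty.OLD,
    advantage=Advantage.DISADVANTAGE,
    concept_id=1,
)
flat_label = example.label()
```

Two annotations sharing the same `concept` and `concept_id` would then denote sentences that refer to the same instance of a CoreSC.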
  • 43. Method 3: Bio-event Annotation
      - Bio-event: a dynamic biological relationship involving one or more participants
      - Example: We found that Y activates the expression of X
      - Event E1: TRIGGER: expression; TYPE: GENE_EXPRESSION; THEME: X : gene; CAUSE: none (empty)
      - Event E2: TRIGGER: activates; TYPE: POSITIVE_REGULATION; THEME: E1 : event; CAUSE: Y : protein
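Because E2 takes E1 as its theme, bio-events nest. A minimal sketch of that nesting follows, assuming hypothetical class names (`Entity`, `BioEvent`) and a helper `participants` that are not part of any corpus format:

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str  # e.g. "gene" or "protein"


@dataclass(frozen=True)
class BioEvent:
    id: str
    trigger: str
    type: str
    theme: Union[Entity, "BioEvent"]                  # an event may be the theme of another event
    cause: Optional[Union[Entity, "BioEvent"]] = None


def participants(event: BioEvent) -> List[str]:
    """Collect the entity texts reachable from an event, following nested events."""
    found: List[str] = []
    for arg in (event.theme, event.cause):
        if isinstance(arg, BioEvent):
            found.extend(participants(arg))
        elif isinstance(arg, Entity):
            found.append(arg.text)
    return found


# "We found that Y activates the expression of X"
e1 = BioEvent("E1", "expression", "GENE_EXPRESSION", theme=Entity("X", "gene"))
e2 = BioEvent("E2", "activates", "POSITIVE_REGULATION",
              theme=e1, cause=Entity("Y", "protein"))

print(participants(e2))  # ['X', 'Y']
```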
  • 45. Meta-Knowledge annotation scheme for BioEvents
      - Bio-Event (centred on an event trigger)
      - Class / Type (grounded to an event ontology)
      - Participants: Theme(s), Actor(s)
      - Knowledge Type: Investigation, Observation, Analysis, General
      - Certainty Level: L3, L2, L1
      - Polarity: Negative, Positive
      - Manner: High, Low, Neutral
      - Source: Other, Current
      - Currently being applied to the entire GENIA event corpus (1000 MEDLINE abstracts)
  • 47. BioEvent/MetaKnowledge Annotation
      S3 = These results suggest that Y has no effect on expression of X
      Event | Knowledge Type | Certainty Level | Lexical Polarity | Manner  | Source
      E1    | General        | L3              | Positive         | Neutral | Current
      E2    | Analysis       | L2              | Negative         | Neutral | Current
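The meta-knowledge dimensions for sentence S3 can be held in a record that validates its values. This is a sketch under assumptions: the value sets are those read from the scheme slide, L3 is taken to be the highest certainty level, and the names `MetaKnowledge` and `is_hedged_or_negated` are invented for illustration.

```python
from dataclasses import dataclass

# Value sets as read from the scheme slide (an assumption about the full scheme).
ALLOWED = {
    "knowledge_type": {"Investigation", "Observation", "Analysis", "General"},
    "certainty_level": {"L1", "L2", "L3"},
    "polarity": {"Positive", "Negative"},
    "manner": {"High", "Low", "Neutral"},
    "source": {"Current", "Other"},
}


@dataclass(frozen=True)
class MetaKnowledge:
    event_id: str
    knowledge_type: str
    certainty_level: str
    polarity: str
    manner: str
    source: str

    def __post_init__(self):
        # Reject any value that is not in the set for its dimension.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")

    def is_hedged_or_negated(self) -> bool:
        """True when the event is not asserted with full certainty, or is negated.

        Assumes L3 is the highest certainty level.
        """
        return self.certainty_level != "L3" or self.polarity == "Negative"


# S3 = "These results suggest that Y has no effect on expression of X"
s3 = [
    MetaKnowledge("E1", "General", "L3", "Positive", "Neutral", "Current"),
    MetaKnowledge("E2", "Analysis", "L2", "Negative", "Neutral", "Current"),
]

flagged = [m.event_id for m in s3 if m.is_hedged_or_negated()]
print(flagged)  # ['E2']
```

A filter like this is one way a claim identifier could separate the hedged, negated regulation event (E2) from the plain expression event (E1).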
  • 50. Comparing 3 annotation systems
      - CoreSC. Purpose: identify main components of scientific investigation. Granularity: sentence. Manual/Automated: manual corpus, automated annotation tools for machine learning.
      - MetaKnowledge/BioEvents. Purpose: enhance information extraction for biomedical texts to enable metadiscourse. Granularity: events (intra-sentential); there can be several per sentence, or one across several sentences. Manual/Automated: manual corpus, working on automated annotation.
      - Discourse Segment Types. Purpose: identify mechanisms of conveying (epistemic) knowledge in scientific discourse. Granularity: clause. Manual/Automated: manual.
  • 54. 3 Annotation Systems on the same paper:
      CoreSC:
        <annotationART atype="GSC" type="Res" conceptID="Res24" novelty="None" advantage="None">
          Here we show that BOB.1/OBF.1 regulates Btk gene expression.
        </annotationART>
      BioEvent/MetaKnowledge:
        <sentence id="S6">Here we show that
          <term id="T13" sem="Protein_family_or_group">
            <gene-or-gene-product id="G9">BOB.1</gene-or-gene-product>/
            <gene-or-gene-product id="G10">OBF.1</gene-or-gene-product>
          </term> regulates
          <term id="T14" sem="Biological_process">
            <term id="T15" sem="DNA_domain_or_region">
              <gene-or-gene-product id="G11">Btk</gene-or-gene-product> gene
            </term> expression
          </term>.
        </sentence>
        <event KT="Gen-Other" CL="L3" Manner="Neutral" Polarity="Positive" Source="Current" id="E16">
          <type class="Gene_expression"/>
          <theme idref="G11"/>
          <clue>Here we show that BOB.1/OBF.1 regulates Btk gene <clueType>expression</clueType>.</clue>
        </event>
        <event KT="Analysis" CL="L3" Manner="Neutral" Polarity="Positive" Source="Current" id="E17">
          <type class="Regulation"/>
          <theme idref="E16"/>
          <cause idref="T13"/>
          <clue>Here we <clueKT>show</clueKT> that BOB.1/OBF.1 <clueType>regulates</clueType> Btk gene expression.</clue>
        </event>
      Discourse Segments:
        <segment segID="286" section="D" segtype="RegImplication">Here we show that</segment>
        <segment segID="287" section="D" segtype="Implication">BOB.1/OBF.1 regulates Btk gene expression.</segment>
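Since the schemes annotate the same text at different granularities, one way to line them up is to check that the clause-level segments concatenate to the sentence annotated by CoreSC. The sketch below uses Python's standard `xml.etree.ElementTree`. The `<segments>` wrapper element and the `normalise` helper are assumptions added so the fragments parse as standalone documents; the element and attribute names are those shown on the slide.

```python
import xml.etree.ElementTree as ET

CORESC_XML = (
    '<annotationART atype="GSC" type="Res" conceptID="Res24" '
    'novelty="None" advantage="None">'
    "Here we show that BOB.1/OBF.1 regulates Btk gene expression."
    "</annotationART>"
)

# The <segments> root is hypothetical; the slide shows bare <segment> elements.
SEGMENTS_XML = """
<segments>
  <segment segID="286" section="D" segtype="RegImplication">Here we show that</segment>
  <segment segID="287" section="D" segtype="Implication">BOB.1/OBF.1 regulates Btk gene expression.</segment>
</segments>
"""


def normalise(text: str) -> str:
    """Collapse runs of whitespace so layout differences do not affect comparison."""
    return " ".join(text.split())


coresc = ET.fromstring(CORESC_XML)
segments = ET.fromstring(SEGMENTS_XML).findall("segment")

sentence = normalise(coresc.text)
joined = normalise(" ".join(seg.text for seg in segments))

aligned = {
    "sentence": sentence,
    "coresc": coresc.get("type"),
    "segments": [(seg.get("segtype"), normalise(seg.text)) for seg in segments],
}

print(joined == sentence)  # True: the two clauses cover the CoreSC sentence
print(aligned["coresc"], [t for t, _ in aligned["segments"]])
```

Here one CoreSC label (Res) spans two discourse segments, a regulatory clause and the implication it introduces.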
  • 56. CoreSC vs Event Meta-knowledge
      - Meta-knowledge event annotation can help to provide a more fine-grained analysis of CoreSC Background.
      - Certainty Level and Source can help to refine Results and Conclusions.
      - More straightforward mappings occur between other categories, e.g. most sentences of the Motivation category contain only events of type Investigation.
      - Categories such as Goal and Object are catered for by CoreSCs but not covered by the meta-knowledge scheme.
      - Observation_L3_Current can be refined into CoreSC Observation, Result, Conclusion and Hypothesis.
  • 57. CoreSC vs Segments
      - In most cases there is a natural mapping between the two schemes:
          - CoreSC Observation maps to Result; CoreSC Result maps to Result and Implication.
          - CoreSC Conclusion maps to Implication and Hypothesis.
          - Implication consists of CoreSC Conclusion and Result.
          - Fact is CoreSC Background and Conclusion.
          - Hypothesis is CoreSC Hypothesis and Conclusion.
          - Problem is CoreSC Motivation.
      - Most of CoreSC Background maps to Fact and the Other categories, which refine it.
      - CoreSC refines Method and Result segments.
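The mapping listed above can be written down as a lookup table and inverted to see which CoreSC categories correspond to more than one segment type. This is an illustrative encoding only: it covers the segment types the slide names explicitly, omits the Other categories, and the variable names are assumptions.

```python
from collections import defaultdict

# Segment type -> CoreSC categories it draws on, as stated on the slide.
SEGMENT_TO_CORESC = {
    "Result": {"Observation", "Result"},
    "Implication": {"Conclusion", "Result"},
    "Fact": {"Background", "Conclusion"},
    "Hypothesis": {"Hypothesis", "Conclusion"},
    "Problem": {"Motivation"},
}


def invert(mapping):
    """Turn a segment -> CoreSC mapping into a CoreSC -> segment mapping."""
    inverted = defaultdict(set)
    for segment, concepts in mapping.items():
        for concept in concepts:
            inverted[concept].add(segment)
    return dict(inverted)


CORESC_TO_SEGMENT = invert(SEGMENT_TO_CORESC)

# CoreSC categories that split across several segment types.
one_to_many = sorted(c for c, segs in CORESC_TO_SEGMENT.items() if len(segs) > 1)

print(sorted(CORESC_TO_SEGMENT["Conclusion"]))  # ['Fact', 'Hypothesis', 'Implication']
print(one_to_many)                              # ['Conclusion', 'Result']
```

The one-to-many entries are where clause-level segment types add information that the sentence-level CoreSC label alone does not carry.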
  • 58. Segments vs Event Meta-knowledge
      - The schemes can be complementary to each other.
      - Segment types can refine the interpretation of Analysis events into Hypothesis, Implication or Result.
      - Certainty level can help determine the confidence ascribed to the segments.
      - Likewise, meta-knowledge can help to distinguish Result segments that correspond either to analyses of results or to experimental observations.
  • 59. Conclusions (in detail):
      Common categories across the three schemes (CoreSC, Metaknowledge, Segments):
        - (CoreSC Observation, Observation_L3_Current, Result)
        - (CoreSC Hypothesis, Analysis_L2_Current, Hypothesis)
        - (CoreSC Motivation, Investigation_L3_Current, Problem)
      Categories that need refining from the three schemes:
        - CoreSC: Background, Conclusion
        - Metaknowledge: Gen_Other_L3_Current, Observation_L3_Current
        - Segments: Method and Result
      The three schemes have different strengths and offer annotation at different levels:
        - CoreSC: complements the other two schemes with more fine-grained Methods, Objectives and Results.
        - Metaknowledge: Certainty levels and Source can help to refine the interpretation of certain CoreSC and segment types.
        - Segments: refinement of Background; signals for modality cues.
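The three common categories can serve as a small translation table between schemes. The sketch below assumes each triple is ordered (CoreSC, Metaknowledge, Segments) and that a meta-knowledge label is composed as KnowledgeType_CertaintyLevel_Source; the function names `translate` and `parse_meta_label` are hypothetical.

```python
from typing import Optional, Tuple

SCHEMES = ("coresc", "metaknowledge", "segment")

# Common categories, in the assumed order (CoreSC, Metaknowledge, Segments).
COMMON = [
    ("Observation", "Observation_L3_Current", "Result"),
    ("Hypothesis", "Analysis_L2_Current", "Hypothesis"),
    ("Motivation", "Investigation_L3_Current", "Problem"),
]


def translate(label: str, source: str, target: str) -> Optional[str]:
    """Map a label from one scheme to another, or None if it has no common category."""
    i, j = SCHEMES.index(source), SCHEMES.index(target)
    for triple in COMMON:
        if triple[i] == label:
            return triple[j]
    return None


def parse_meta_label(label: str) -> Tuple[str, str, str]:
    """Split e.g. 'Gen_Other_L3_Current' into (knowledge type, certainty, source)."""
    knowledge_type, certainty, source = label.rsplit("_", 2)
    return knowledge_type, certainty, source


print(translate("Problem", "segment", "metaknowledge"))  # Investigation_L3_Current
print(translate("Background", "coresc", "segment"))      # None
print(parse_meta_label("Gen_Other_L3_Current"))          # ('Gen_Other', 'L3', 'Current')
```

Splitting from the right keeps knowledge types that themselves contain an underscore, such as Gen_Other, intact.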
  • 60. Conclusions (general)
      A very small example, but it shows that the differences can be overcome. Each scheme has advantages:
        - Clause level is most precise for identifying core claims
        - Knowledge type/Certainty level are important refinements
        - CoreSC refines methods and results and shows most promise for automated recognition
      So we need to work together!
        - Plan to join forces; work on a joint corpus
        - Other work to add: KEfED, SWAN, ScholOnto
        - Together develop a ‘claim identifier’ (not a fact extractor) + standards for modality/evidence scales and types
        - Work together towards a claim-evidence network representation! (cf also Hypotheses, Evidence and Relationships)
  • 61. Models of Scientific Discourse Annotation, Portland, OR, June 25 http://msda2011.wordpress.com/
      The goal of the Workshop on “Models of Scientific Discourse Annotation” is to compare and contrast the motivation behind efforts in the discourse annotation of scientific text and the techniques and principles applied in the various approaches, and to discuss ways in which they can complement each other and collaborate to form standards for an optimal method of annotating appropriate levels of discourse, with enhanced accuracy and usefulness. We wish to compare, contrast and evaluate different scientific discourse annotation schemes and tools, in order to answer questions such as:
        • What motivates a certain level, method or viewpoint for annotating scientific text?
        • What is the annotation level for a unit of argumentation: an event, a sentence, a segment? What are the advantages and disadvantages of all three?
        • How easily can different schemes be applied to texts? Are they easily trainable?
        • Which schemes are the most portable? Can they be applied to both full papers and abstracts? Can they be applied to texts in different domains?
        • How granular should annotation schemes be? What are the advantages/disadvantages of fine- and coarse-grained annotation categories?
        • Can different schemes complement each other to provide different levels of information? Can different schemes be combined to give better results?
        • How can we compare annotations, and how do we decide which features, approaches and techniques work best?
        • How do we exchange and evaluate each other’s annotations?
        • How applicable are these efforts towards improved methods of publishing or summarizing science?
  • 62. CoreSC References
      - Liakata, M., Teufel, S., Siddharthan, A. and Batchelor, C. 2010. Corpora for the conceptualisation and zoning of scientific papers. Proceedings of the 7th International Conference on Language Resources and Evaluation, Malta.
      - Guo, Y., Korhonen, A., Liakata, M., Silins, I., Sun, L. and Stenius, U. 2010. Identifying the Information Structure of Scientific Abstracts: An Investigation of Three Different Schemes. Proceedings of BioNLP 2010, Uppsala, Sweden.
      - Liakata, M., Q, C. and Soldatova, L.N. 2009. Semantic Annotation of Papers: Interface & Enrichment Tool (SAPIENT). Proceedings of BioNLP-09, Boulder, Colorado.
      - Liakata, M. and Soldatova, L.N. 2008. Guidelines for the annotation of General Scientific Concepts. Aberystwyth University, JISC Project Report, http://ie-repository.jisc.ac.uk/88/.
      - Soldatova, L.N. and Liakata, M. 2007. An ontology methodology and CISP - the proposed Core Information about Scientific Papers. JISC Project Report, http://ie-repository.jisc.ac.uk/137/.
  • 63. Meta-Annotation References
      - Ananiadou, S., Thompson, P. and Nawaz, R. (2010). Improving Search Through Event-based Biomedical Text Mining. In Proceedings of the First International Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts (AMICUS 2010).
      - Nawaz, R., Thompson, P., McNaught, J. and Ananiadou, S. (2010). Meta-Knowledge Annotation of Bio-Events. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), pp. 2498-2505.
      - Nawaz, R., Thompson, P. and Ananiadou, S. (2010). Evaluating a Meta-Knowledge Annotation Scheme for Bio-Events. In Proceedings of the Workshop on Negation and Speculation in Natural Language Processing, pp. 69-77.
      - Nawaz, R., Thompson, P. and Ananiadou, S. (2010). Event Interpretation: A Step towards Event-Centred Text Mining. In Proceedings of the First International Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts (AMICUS 2010).
  • 64. Discourse Segment References
      - de Waard, A. (2010d). The Story of Science: A syntagmatic/paradigmatic analysis of scientific text. Proceedings of the AMICUS Workshop, Vienna, Austria, October 2010.
      - de Waard, A. and Pander Maat, H. (2010). A Classification of Research Verbs to Facilitate Discourse Segment Identification in Biological Text. Proceedings of the Interdisciplinary Workshop on Verbs: The Identification and Representation of Verb Features, Pisa, Italy, November 4-5 2010.
      - de Waard, A. (2010c). The Future of the Journal? Integrating research data with scientific discourse. Logos vol. 21, issues 1-2, January 2011.
      - de Waard, A. (2010b). From Proteins to Fairytales: Directions in Semantic Publishing. IEEE Intelligent Systems 25(2): 83-88 (2010).
      - de Waard, A. (2010a). Realm Traversal In Biological Discourse: From Model To Experiment and back again. Workshop on Multidisciplinary Perspectives on Signalling Text Organisation (MAD 2010), March 17-20, 2010, Moissac, France.
      - de Waard, A. (2009b). Categorizing Epistemic Segment Types in Biology Research Articles. Workshop on Linguistic and Psycholinguistic Approaches to Text Structuring (LPTS 2009), September 21-23 2009. To be published as a chapter in Linguistic and Psycholinguistic Approaches to Text Structuring, Laure Sarda, Shirley Carter Thomas & Benjamin Fagard (eds), John Benjamins (planned for 2010).
      - de Waard, A., Buckingham Shum, S., Carusi, A., Park, J., Samwald, M. and Sándor, Á. (2009). Hypotheses, Evidence and Relationships: The HypER Approach for Representing Scientific Knowledge Claims. Proceedings of the Workshop on Semantic Web Applications in Scientific Discourse (SWASD 2009), co-located with the 8th International Semantic Web Conference (ISWC-2009).
      - de Waard, A., Buitelaar, P. and Eigner, T. (2009). Identifying the Epistemic Value of Discourse Segments in Biology Texts. In: Proceedings of the Eighth International Conference on Computational Semantics, Tilburg, The Netherlands, Jan. 7-9 2009.