1. How to Execute the Research Paper
Anita de Waard
Disruptive Technology Director, Elsevier Labs
http://elsatglabs.com/labs/anita
2. How to execute a research paper
- Why?
- Three use cases for linked, integrated knowledge
- What?
- Three technologies for enabling this linking and execution
- How?
- Three tools for annotation, storage and access
- What next?
- Force11 and ideas about the future
7. Use case #1: Claim-Evidence Network in Medicine
Background: Proper implementation of clinical decision support systems (CDS) can:
- Reduce errors in medical care
- Bring research results faster to the front-line clinician
- Significantly improve patient outcome.
Requirements: To that end, such systems need to:
- Be able to answer complex questions
- Aggregate data from multiple sources, combining complex patient-specific data with information from external sources
- Be semantically aware
- Be continually updated with the latest validated research results.
Components: To develop such semantically aware systems, we need:
- Flexible frameworks supporting the development of such applications
- Seamless integration of relevant content
- Content sources with high quality content
- Tools enabling the extraction and aggregation of such content.
10. Use case #1: Claim-Evidence Network in Medicine
A. Philips’ Electronic Patient Records
B. Elsevier-published Clinical Guideline
C. Elsevier (or other publisher’s) Research Report or Data
Step 1: Patient data + diagnosis link to Guideline recommendation
Step 2: Guideline recommendation links to evidence in report or data
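The two linking steps can be sketched as a small graph that is walked from the patient record to the underlying evidence. This is only an illustration: the node identifiers, the `Node` class and the `trace` helper are hypothetical and are not taken from the Philips or Elsevier systems.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One item in the claim-evidence network (all identifiers are made up)."""
    node_id: str
    kind: str  # "patient_record", "guideline" or "evidence"
    text: str
    links: list = field(default_factory=list)  # ids of the nodes this one links to


def trace(network, start_id):
    """Follow links depth-first from a node and return the ids in visiting order."""
    seen, stack = [], [start_id]
    while stack:
        node_id = stack.pop()
        if node_id in seen:
            continue
        seen.append(node_id)
        # Reverse so that links are visited in the order they were listed.
        stack.extend(reversed(network[node_id].links))
    return seen


network = {
    "A1": Node("A1", "patient_record", "Patient data + diagnosis", ["B1"]),
    "B1": Node("B1", "guideline", "Guideline recommendation", ["C1", "C2"]),
    "C1": Node("C1", "evidence", "Research report"),
    "C2": Node("C2", "evidence", "Research data"),
}

path = trace(network, "A1")
print(path)
```

Step 1 corresponds to the link from A1 to B1, and Step 2 to the links from B1 to C1 and C2.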
14. Use case #2: Updating Drug-Drug Interactions
Background:
- Drug-drug interactions (DDIs) are a significant source of preventable adverse
effects
- Factors contributing to the occurrence of preventable DDIs include:
- a lack of knowledge of the patient’s concurrent medications
- inaccurate or inadequate knowledge of interactions by health care providers
Requirements: We (HCLS SciDiscourse group: Elsevier, DERI, Pittsburgh, EBI) will:
- Manually mark up a diverse collection of content with DDIs
- Develop/train NLP tools to recognize these
- Create a triple store to maintain the relationships between drugs-DDIs-content
Components: To develop this system, we need:
- Scientific discourse ontologies to mark up relevant statements and seed NLP
- Natural language processing to identify relevant DDIs
- Linked Data architecture to enable storage and access to this information
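A minimal sketch of the drugs-DDIs-content relationships as triples, assuming a toy in-memory store rather than a real RDF triple store. The `TripleStore` class, the predicate names and the identifiers (`drug:A`, `ddi:1`, `doc:42`) are placeholders invented for this example.

```python
class TripleStore:
    """Tiny in-memory (subject, predicate, object) store; not a real RDF store."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
# Each interaction involves two drugs and is stated in a piece of content.
store.add("ddi:1", "involves", "drug:A")
store.add("ddi:1", "involves", "drug:B")
store.add("ddi:1", "statedIn", "doc:42")
store.add("ddi:2", "involves", "drug:A")
store.add("ddi:2", "involves", "drug:C")
store.add("ddi:2", "statedIn", "doc:43")

# Which interactions involve drug:A, and where is each one stated?
for ddi, _, _ in store.match(p="involves", o="drug:A"):
    sources = [o for _, _, o in store.match(s=ddi, p="statedIn")]
    print(ddi, sources)
```

Keeping the content source as its own triple means an interaction can be traced back to the statement that supports it.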
18. Use case #2: Updating Drug-Drug Interactions
Step 1: Manually identify DDIs and drug names in a wide collection of content sources
Step 2: Develop a model of Drug-Drug Interaction and define candidates
Step 3: Automate this process and store as Linked Data
Images from: Discovering drug–drug interactions: a text-mining and reasoning approach based on properties of drug metabolism, Luis Tari, Saadat Anwar, Shanshan Liang, James Cai and Chitta Baral, Vol. 26 ECCB 2010, pages i547–i553, doi:10.1093/bioinformatics/btq382
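As a rough illustration of what "define candidates" in Step 2 could mean, the sketch below flags sentences that mention two known drug names together with an interaction cue word. It is a deliberately naive stand-in, not the method of the cited paper or of the HCLS group: the lexicon, the cue words and the sample text are all invented placeholders.

```python
import re
from itertools import combinations

DRUG_LEXICON = {"druga", "drugb", "drugc"}  # placeholder drug names
CUE_WORDS = {"inhibits", "increases", "decreases", "interacts"}  # assumed cues


def ddi_candidates(text):
    """Return (drug1, drug2, sentence) for sentences with two drugs and a cue word."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        drugs = sorted(tokens & DRUG_LEXICON)
        if len(drugs) >= 2 and tokens & CUE_WORDS:
            for first, second in combinations(drugs, 2):
                results.append((first, second, sentence))
    return results


sample = (
    "DrugA inhibits the metabolism of DrugB. "
    "DrugC was well tolerated. "
    "DrugA and DrugC were given together."
)
candidates = ddi_candidates(sample)
print(candidates)
```

Only the first sentence is returned: the second names a single drug and the third contains no cue word. Candidates like these would still need manual review before being stored.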
22. Use Case #3: Review and share code
Background:
- Core of computational papers is the software
- If code is not part of the paper, hard to assess quality
- Code reuse can reduce waste of time and (taxpayer’s) money
Requirements:
- Provide a way to create, share and review code
- Integrate this with the research paper
- Enable integration with publisher’s system
Components:
- Integration between workflow and text authoring
- Code authoring tools and standards that allow reuse
- User environment that allows access to disparate result types
25. Use Case #3: Review and share code
Step 1: Develop Virtual Machine environment for creating code
Step 2: Create authoring/review environment to allow VM evaluation
Step 3: Allow access to integrated environment through SciVerse App store
Pieter Van Gorp, Steffen Mazanek, SHARE: a web portal for creating and sharing executable research papers, Procedia Computer Science 00 (2011) 1–6
28. Technology #1: Discourse Annotation - at text level
Aristotle | Quintilian | Description | Scientific Paper
prooimion | Introduction / exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning
prothesis | Statement of Facts / narratio | The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question
- | Summary / propositio | The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents
pistis | Proof / confirmatio | The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results
- | Refutation / refutatio | As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work
epilogos | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications
30. Technology #1: Discourse Annotation - at paragraph level
Story: The Story of Goldilocks and the Three Bears
Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
Story text | Story grammar | Paper grammar | Paper text
Once upon a time | Setting: Time | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
a little girl named Goldilocks | Setting: Characters | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
She went for a walk in the forest. Pretty soon, she came upon a house. | Setting: Location | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein
She knocked and, when no one answered, | Theme: Goal | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
she walked right in. | Theme: Attempt | Hypothesis | Atx-1 may play a role in the regulation of gene expression
At the table in the kitchen, there were three bowls of porridge. | Episode 1: Name | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
Goldilocks was hungry. | Subgoal | Subgoal | test the function of the AXH domain
She tasted the porridge from the first bowl. | Attempt | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
This porridge is too hot! she exclaimed. | Outcome | Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q].
So, she tasted the porridge from the second bowl. | Activity | Data | Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells (data not shown),
This porridge is too cold, she said | Outcome | Results | both genotypes show many large holes and loss of cell integrity at 28 days
So, she tasted the last bowl of porridge. | Activity | Data | (Figures 1B-1D).
Ahhh, this porridge is just right, she said happily and | Outcome | Results | Overexpression of dAtx-1 using the GMR-GAL4 driver also induces eye abnormalities. The external structures of the eyes that overexpress dAtx-1 show disorganized ommatidia and loss of interommatidial bristles
she ate it all up. | - | Data | (Figure 1F),
31. Technology #1: Discourse Annotation - at clause level
Both seminomas and the EC component of
nonseminomas share features with ES cells. To
exclude that the detection of miR-371-3 merely
reflects its expression pattern in ES cells, we tested
by RPA miR-302a-d, another ES cells-specific
miRNA cluster (Suh et al, 2004). In many of the
miR-371-3 expressing seminomas and
nonseminomas, miR-302a-d was undetectable (Figs
S7 and S8), suggesting that miR-371-3 expression
is a selective event during tumorigenesis.
41. Technology #1: Discourse Annotation - at clause level
Fact (Conceptual knowledge): Both seminomas and the EC component of nonseminomas share features with ES cells.
Goal: To exclude that
Hypothesis: the detection of miR-371-3 merely reflects its expression pattern in ES cells,
Method (Experimental Evidence): we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
Result (Experimental Evidence): In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
Reg-Implication: suggesting that
Implication: miR-371-3 expression is a selective event during tumorigenesis.
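One way to hold a clause-level annotation in code is as an ordered list of labelled segments, so that the original passage and the per-label view can both be recovered. This is a sketch only: the `Segment` class and `by_label` helper are hypothetical names, and the segment boundaries follow the labels shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    label: str  # discourse segment type, as named on the slide
    text: str


SEGMENTS = [
    Segment("Fact", "Both seminomas and the EC component of nonseminomas "
                    "share features with ES cells."),
    Segment("Goal", "To exclude that"),
    Segment("Hypothesis", "the detection of miR-371-3 merely reflects its "
                          "expression pattern in ES cells,"),
    Segment("Method", "we tested by RPA miR-302a-d, another ES cells-specific "
                      "miRNA cluster (Suh et al, 2004)."),
    Segment("Result", "In many of the miR-371-3 expressing seminomas and "
                      "nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),"),
    Segment("Reg-Implication", "suggesting that"),
    Segment("Implication", "miR-371-3 expression is a selective event during "
                           "tumorigenesis."),
]


def by_label(segments, label):
    """Return the text of every segment carrying the given label."""
    return [s.text for s in segments if s.label == label]


# Joining the segments in order gives back the running text of the passage.
passage = " ".join(s.text for s in SEGMENTS)
print(by_label(SEGMENTS, "Implication"))
```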
48. Technology #1: Discourse Annotation - across texts
Voorhoeve et al., Cell, 2006:
Hypothesis: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we...
Implication: Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity,
Raver-Shapira et al., JMolCell 2007:
Cited Implication: ... two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
Yabuta, JBioChem 2007:
Fact: miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)
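The way one claim changes status as it travels across papers can be recorded as a list of labelled statements with citation links. The sketch below is illustrative: the `Statement` class and the two helper functions are hypothetical, while the sources and labels are the ones shown on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    source: str
    year: int
    label: str  # discourse label shown on the slide
    cites: Optional[str] = None  # source being cited, if any


STATEMENTS = [
    Statement("Voorhoeve et al., Cell", 2006, "Hypothesis"),
    Statement("Voorhoeve et al., Cell", 2006, "Implication"),
    Statement("Raver-Shapira et al., JMolCell", 2007, "Cited Implication",
              cites="Voorhoeve et al., Cell"),
    Statement("Yabuta, JBioChem", 2007, "Fact",
              cites="Voorhoeve et al., Cell"),
]


def label_trail(statements):
    """Labels in chronological order (stable for statements from the same year)."""
    return [s.label for s in sorted(statements, key=lambda s: s.year)]


def citing(statements, source):
    """Sources whose statements cite the given source."""
    return [s.source for s in statements if s.cites == source]


print(label_trail(STATEMENTS))
print(citing(STATEMENTS, "Voorhoeve et al., Cell"))
```

The trail makes the drift visible: what the original paper presents as a hypothesis and an implication is restated as a fact by a later citing paper.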
51. Technology #1: Towards automated Discourse Annotation: CoreSC
- Classified with Support Vector Machines (SVM)
- Sequence labelling by Conditional Random Fields (CRF)
- F-score between 18% (motivation) and 76% (experimental methods)
- ‘We plan to use CoreSC annotated papers in biology to guide information
extraction and retrieval, characterise extracted events and relations and
facilitate inference from hypotheses to conclusions in scientific papers.’
Automatic recognition of conceptualisation zones in scientific articles to aid biological information extraction
Maria Liakata, Shyamasree Saha, Simon Dobnik, Colin Batchelor and Dietrich Rebholz-Schuhmann
Bioinformatics 2011 (Accepted)
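For readers unfamiliar with the metric, the per-category F-score quoted above can be computed as sketched below, assuming the usual F1 (harmonic mean of precision and recall). The gold and predicted label sequences are invented toy data, not results from the paper, and the function names are hypothetical.

```python
def f_score(tp, fp, fn, beta=1.0):
    """F-beta from true positives, false positives and false negatives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def per_label_f1(gold, predicted):
    """Compute F1 separately for every label seen in gold or predicted."""
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        scores[label] = f_score(tp, fp, fn)
    return scores


# Toy sentence labels (made up for illustration only).
gold = ["Method", "Method", "Result", "Motivation", "Result"]
pred = ["Method", "Method", "Result", "Result", "Method"]

scores = per_label_f1(gold, pred)
print(scores)
```

Reporting the score per category, as the slide does, shows which zones a classifier handles well and which it rarely recognises.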
54. Technology #2: Linked Data
1. Use URIs to name things
2. Use HTTP URIs so they can be looked up
3. Return useful data when things are looked up
4. Include links to other things in the returned data
“Linked data is just a term for how to publish data on the web
while working with the web. And the web is the best architecture
we know for publishing information in a hugely diverse and
distributed environment, in a gradual and sustainable way.”
Tennison J, 2010. Why Linked Data for data.gov.uk?
http://www.jenitennison.com/blog/node/140
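The four principles can be illustrated without any network access by letting a dictionary stand in for the web. Everything here is an assumption made for the example: the `example.org` URIs, the `FAKE_WEB` contents and the `look_up` and `crawl` helpers. A real deployment would use HTTP requests and an RDF serialisation.

```python
from collections import deque

# Principles 1 and 2: every thing is named by an HTTP URI.
FAKE_WEB = {
    "http://example.org/paper/1": {
        "title": "An example paper",
        # Principle 4: the returned data links to other things by URI.
        "links": {
            "cites": ["http://example.org/paper/2"],
            "hasData": ["http://example.org/dataset/1"],
        },
    },
    "http://example.org/paper/2": {"title": "An earlier paper", "links": {}},
    "http://example.org/dataset/1": {"title": "Underlying data", "links": {}},
}


def look_up(uri):
    """Stand-in for an HTTP GET; principle 3: return useful data about the thing."""
    return FAKE_WEB[uri]


def crawl(start_uri):
    """Follow links breadth-first and return the titles in visiting order."""
    titles, seen, queue = [], {start_uri}, deque([start_uri])
    while queue:
        description = look_up(queue.popleft())
        titles.append(description["title"])
        for targets in description["links"].values():
            for target in targets:
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return titles


titles = crawl("http://example.org/paper/1")
print(titles)
```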
60. Technology # 3: Workflow integration
1. Research: Each item in the system has metadata
metadata (including provenance) and relations to other data items
metadata added to it.
2. Workflow: All data items created in the lab are added
metadata
to a (lab-owned) workflow system.
3. Authoring: A paper is written in an authoring tool which
can pull data with provenance from the workflow tool in the
appropriate representation into the document.
metadata 4. Editing and review: Once the co-authors agree, the
paper is ‘exposed’ to the editors, who in turn expose it to
metadata reviewers. Reports are stored in the authoring/editing
system, the paper gets updated, until it is validated.
5. Publishing and distribution: When a paper is
published, a collection of validated information is
exposed to the world. It remains connected to its related
Rats were subjected to two grueling data item, and its heritage can be traced.
tests
(click on fig 2 to see underlying
data). These results suggest that the
neurological pain pro-
Review
Revise
Edit
A. de Waard, The Future of the Journal?
Integrating research data with scientific discourse
http://precedings.nature.com/documents/4742/version/1
61. Technology # 3: Workflow integration
1. Research: Each item in the system has metadata (including provenance) and
relations to other data items added to it.
2. Workflow: All data items created in the lab are added to a (lab-owned)
workflow system.
3. Authoring: A paper is written in an authoring tool which can pull data with
provenance from the workflow tool in the appropriate representation into the
document.
4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the
editors, who in turn expose it to reviewers. Reports are stored in the
authoring/editing system, and the paper is updated until it is validated.
5. Publishing and distribution: When a paper is published, a collection of
validated information is exposed to the world. It remains connected to its
related data item, and its heritage can be traced.
6. User applications: Distributed applications run on this ‘exposed data’
universe.
[Figure: a sample paper ("Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-") with the labels Review, Revise, Edit and "Some other publisher".]
A. de Waard, The Future of the Journal?
Integrating research data with scientific discourse
http://precedings.nature.com/documents/4742/version/1
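The workflow steps on this slide can be sketched in code. This is a minimal illustration under stated assumptions, not the system the slide describes: the `DataItem` class, the `derived_from` relation name, the `trace_heritage` helper and all item identifiers are hypothetical. It shows only the core idea that each item carries metadata, provenance and relations, so that a published paper's heritage can be traced back to the lab data.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """A data item carrying metadata, provenance and typed relations (hypothetical model)."""
    item_id: str
    metadata: dict = field(default_factory=dict)
    provenance: list = field(default_factory=list)  # ordered history of events
    relations: list = field(default_factory=list)   # (relation, target_id) pairs

    def record(self, event: str) -> None:
        """Append an event to this item's provenance trail."""
        self.provenance.append(event)

    def relate(self, relation: str, target: "DataItem") -> None:
        """Link this item to another item by a named relation."""
        self.relations.append((relation, target.item_id))


def trace_heritage(item, registry):
    """Follow 'derived_from' relations back towards the original data items."""
    lineage = [item.item_id]
    for relation, target_id in item.relations:
        if relation == "derived_from":
            lineage.extend(trace_heritage(registry[target_id], registry))
    return lineage


# Steps 1-2: raw data enters the lab-owned workflow system with metadata.
raw = DataItem("raw-measurements", {"creator": "lab"})
raw.record("added to workflow system")

# Step 3: the authoring tool pulls a figure derived from the raw data.
figure = DataItem("fig-2", {"type": "figure"})
figure.relate("derived_from", raw)
figure.record("pulled into document")

# Steps 4-5: the paper is reviewed, validated and published, still linked to its data.
paper = DataItem("paper", {"status": "draft"})
paper.relate("derived_from", figure)
for stage in ("reviewed", "validated", "published"):
    paper.record(stage)
paper.metadata["status"] = "published"

# Step 6: an application running on the exposed data can trace the paper's heritage.
registry = {item.item_id: item for item in (raw, figure, paper)}
lineage = trace_heritage(paper, registry)
print(lineage)  # ['paper', 'fig-2', 'raw-measurements']
```

Keeping the relations on the items themselves, rather than in the paper text, is what lets a reader "click on fig 2" and reach the underlying data.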
64. Technology # 3: Workflow integration
[Figure, (C) Dave De Roure: a graph of linked research outputs. Nodes: QTL, Workflow 16, Workflow 13, Results, Logs, Metadata, Slides, Paper, Common pathways. Edge labels: produces, Included in, Published in, Feeds into.]
67. Tool # 1: DOMEO annotation tool
http://purl.org/swan/af
e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2224208/?tool=pubmed
- Supports manual annotation, automated annotation, or both
- Now linked to the NCBO text mining tool, expanding to all UIMA
- Standoff annotations are stored in the Annotation Ontology (an RDF format) and can be exported
Paolo Ciccarese, Marco Ocana, Tim Clark,
DOMEO: a web-based tool for semantic annotation of
online documents, Bioontologies, 2011
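To make the idea of a standoff annotation concrete, here is a minimal sketch. It is not DOMEO's data model and does not use the real Annotation Ontology vocabulary: the `StandoffAnnotation` class, the `ex:` predicate names, the example URL and the term identifier are all placeholders chosen for illustration. The point it shows is that the annotation lives apart from the document, refers to a text span by character offsets, and can be exported as subject-predicate-object triples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandoffAnnotation:
    """An annotation stored apart from the text it describes (hypothetical model)."""
    document_url: str  # the annotated document, which is never modified
    start: int         # character offset where the annotated span begins
    end: int           # character offset just past the annotated span
    body: str          # what is asserted about the span, e.g. a term identifier
    creator: str       # "manual" or the name of an automated tool

    def exact(self, text: str) -> str:
        """Return the span of `text` that this annotation points at."""
        return text[self.start:self.end]

    def to_triples(self, ann_id: str):
        """Export as RDF-style triples; the ex: predicates are placeholders."""
        return [
            (ann_id, "ex:annotatesDocument", self.document_url),
            (ann_id, "ex:startOffset", str(self.start)),
            (ann_id, "ex:endOffset", str(self.end)),
            (ann_id, "ex:hasTopic", self.body),
            (ann_id, "ex:createdBy", self.creator),
        ]


text = "Rats were subjected to two grueling tests."
start = text.index("Rats")
ann = StandoffAnnotation(
    document_url="http://example.org/paper",  # placeholder URL
    start=start,
    end=start + len("Rats"),
    body="ex:Rattus",                         # placeholder term identifier
    creator="manual",
)

span = ann.exact(text)                       # the annotated words, recovered from offsets
triples = ann.to_triples("ex:annotation1")   # exportable without touching the text
```

Because the text itself is untouched, manual and automated annotations of the same document can coexist and be exported independently.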
73. Tool # 3: ScienceDirect app store
- Eclipse SDK platform accessing all
ScienceDirect/Scopus content
- Build applications on top of content
- Offer to users in marketplace
77. Force11
http://force11.org
Force11 (Future of Research Communication and e-Scholarship, 2011) is a
community of scholars, librarians, archivists, publishers and research funders
that has arisen organically to help facilitate the change toward improved
knowledge creation and sharing.
Individually and collectively, we aim to bring about a change in scholarly
communication through the effective use of information technologies.
Next step: work on these issues. We need more publishers on board!
81. Some thoughts about the future:
- Let’s think in terms of use cases, not technologies:
  - Identify where knowledge exists, within and outside of the article
  - Identify what the information needs are, and which components need
    to be connected
- Only if our content plays well with others does it get to stay in the game!
- Work with scientists, grant agencies, libraries, software developers big and
  small, and... each other!
- For instance, let’s collectively look at enabling:
  - Standoff annotation formats
  - Research data and workflow standards/integration
  - Claim-evidence networks and discourse annotation:
83.
- Which discourse annotation schemes are most portable? Can they
be applied to both full papers and abstracts? Can they be applied to
texts in different domains and different genres (research papers,
reviews, patents, etc.)?
- How can we compare annotations, and how can we decide which
features, approaches or techniques work best? What are the most
topical use cases? How can we evaluate performance and what are
the most appropriate tasks?
- What corpora are currently available for comparing and contrasting
discourse annotation, and how can we improve and increase these?
- How applicable are these efforts to improving methods of
publishing, detecting and correcting authors' errors at the
discourse level, or summarizing scholarly text? How close are
we to implementing them at a production scale?
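On the question of how to compare annotations: one widely used measure of agreement between two annotators is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The slides do not name a measure, so this is offered only as one possible starting point. The sketch below is a plain implementation of the standard formula; the label names (`claim`, `evidence`, `method`) are hypothetical examples, not a scheme proposed in this talk.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same sequence of text units."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of units given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: computed from each annotator's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Two annotators label the same four sentences (hypothetical labels).
annotator_1 = ["claim", "evidence", "claim", "method"]
annotator_2 = ["claim", "evidence", "method", "method"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.636
```

Here the annotators agree on 3 of 4 sentences (0.75), chance agreement is 5/16, and kappa is therefore 7/11. A score like this only compares label sequences over identical units; schemes that segment text differently would need to be aligned first.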
84. Thank you!
- Tim Clark, Paolo Ciccarese, Harvard, Cambridge, USA
- Eduard Hovy, Gully Burns, Cartic Ramakrishnan, ISI/USC, Los Angeles, USA
- Phil Bourne, Maryann Martone, UCSD, USA
- Sophia Ananiadou, NaCTeM, Manchester, UK
- Dave DeRoure, Oxford eScience Center, UK
- Maria Liakata, EBI, Cambridge, UK
- Paul Groth, Frank van Harmelen, Vrije Universiteit, Amsterdam, Netherlands
- Henk Pander Maat, Ted Sanders, Universiteit Utrecht, Netherlands
- The Force11 members
More information:
- Data2Semantics: http://www.data2semantics.org
- W3C group on Discourse Structure: http://www.w3.org/wiki/HCLSIG/SWANSIOC
- Executable Paper Challenge: http://www.executablepapers.com
- Parsing rhetoric: http://elsatglabs.com/labs/anita/
- Sapienta: http://www.sapientaproject.com/
- SciVerse: http://developer.sciverse.com
- Force11: http://force11.org
- DSSD2012: http://www.nactem.ac.uk/dssd/
Or contact me: Anita de Waard, a.dewaard@elsevier.com