Experiments Link Proteins to Hypotheses in Biology Articles
1. From Proteins to Hypotheses:
Some Experiments in
Semantic Enrichment
Anita de Waard
Principal Researcher, Disruptive Technologies
Elsevier Labs, Amsterdam
Utrecht Institute of Linguistics, University Utrecht
The Netherlands
a.dewaard@elsevier.com
3. Some Experiments in Semantic Enrichment
Entities and Relationships:
FEBS Structured Digital Abstracts
OKKAM Entity Repository
4. Some Experiments in Semantic Enrichment
Entities and Relationships:
FEBS Structured Digital Abstracts
OKKAM Entity Repository
Hypotheses and Evidence:
Discourse Analysis of Biology Articles
Hypotheses, Evidence and Relationships
5. Some Experiments in Semantic Enrichment
Entities and Relationships:
FEBS Structured Digital Abstracts
OKKAM Entity Repository
Hypotheses and Evidence:
Discourse Analysis of Biology Articles
Hypotheses, Evidence and Relationships
Collaborations:
Elsevier Grand Challenge
Future of Research Communications
7. FEBS Letters Structured Digital Abstracts
- With FEBS Letters Editorial Office in Heidelberg; and MINT database
Rome
- SDA [Gerstein et. al]: ‘machine-readable XML summary of pertinent
facts’
- For each ppi paper: proteins, protein-protein interactions, methods:
provided by author in XLS
- Started April 9, 2008: in ScienceDirect links to MINT and Uniprot
- Now 117 articles with SDA: http://www.febsletters.org/content/sda
- Being used as Gold Standard for Biocreative Challenge II.5
http://www.biocreative.org/tasks/biocreative-ii5/biocreative-ii5-evaluation/
8. expression of GSG1 stimulates TPAP targeting to the
ER, suggesting that interactions between the two
proteins lead to the redistribution of TPAP from the
cytosol to the ER.
MINT-6168263:
Gsg1 (uniprotkb:Q8R1W2), TPAP
(uniprotkb:Q9WVP6) and Calmegin
(uniprotkb:P52194) colocalize (MI:0403) by
cosedimentation (MI:0027)
MINT-6168204, MINT-6168178:
Gsg1 (uniprotkb:Q8R1W2) and TPAP
(uniprotkb:Q9WVP6) colocalize (MI:0403) by
fluorescence microscopy (MI:0416)
MINT-6167930:
Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
9. expression of GSG1 stimulates TPAP targeting to the
ER, suggesting that interactions between the two
proteins lead to the redistribution of TPAP from the
cytosol to the ER.
MINT-6168263:
Gsg1 (uniprotkb:Q8R1W2), TPAP
(uniprotkb:Q9WVP6) and Calmegin
(uniprotkb:P52194) colocalize (MI:0403) by
cosedimentation (MI:0027)
MINT-6168204, MINT-6168178:
Gsg1 (uniprotkb:Q8R1W2) and TPAP
(uniprotkb:Q9WVP6) colocalize (MI:0403) by
fluorescence microscopy (MI:0416)
MINT-6167930:
Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
10. expression of GSG1 stimulates TPAP targeting to the
ER, suggesting that interactions between the two
proteins lead to the redistribution of TPAP from the
cytosol to the ER.
MINT-6168263:
Gsg1 (uniprotkb:Q8R1W2), TPAP
(uniprotkb:Q9WVP6) and Calmegin
(uniprotkb:P52194) colocalize (MI:0403) by
cosedimentation (MI:0027)
MINT-6168204, MINT-6168178:
Gsg1 (uniprotkb:Q8R1W2) and TPAP
(uniprotkb:Q9WVP6) colocalize (MI:0403) by
fluorescence microscopy (MI:0416)
MINT-6167930:
Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
11. expression of GSG1 stimulates TPAP targeting to the
ER, suggesting that interactions between the two
proteins lead to the redistribution of TPAP from the
cytosol to the ER.
MINT-6168263:
Gsg1 (uniprotkb:Q8R1W2), TPAP
(uniprotkb:Q9WVP6) and Calmegin
(uniprotkb:P52194) colocalize (MI:0403) by
cosedimentation (MI:0027)
MINT-6168204, MINT-6168178:
Gsg1 (uniprotkb:Q8R1W2) and TPAP
(uniprotkb:Q9WVP6) colocalize (MI:0403) by
fluorescence microscopy (MI:0416)
MINT-6167930:
Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
12. expression of GSG1 stimulates TPAP targeting to the
ER, suggesting that interactions between the two
proteins lead to the redistribution of TPAP from the
cytosol to the ER.
MINT-6168263:
Gsg1 (uniprotkb:Q8R1W2), TPAP
(uniprotkb:Q9WVP6) and Calmegin
(uniprotkb:P52194) colocalize (MI:0403) by
cosedimentation (MI:0027)
MINT-6168204, MINT-6168178:
Gsg1 (uniprotkb:Q8R1W2) and TPAP
(uniprotkb:Q9WVP6) colocalize (MI:0403) by
fluorescence microscopy (MI:0416)
MINT-6167930:
Gsg1 (uniprotkb:Q8R1W2) physically interacts (MI:
13. OKKAM Project
FP7 - EU funded project, 13 partners, 2008 - 2010
- Main goal: create entity identifier repository
- Create entity-centric architecture:
- create new entities + synonyms where needed
- offer interface/link where not needed
- contain resolver, matcher, repository
- E.g. http://sig.ma - semantic mashup of entity properties
15. OKKAM Entity-centric authoring tool
Current:
- Build MS Word plugin to find proteins in FEBS articles
(i.e., semi-automating FEBS SDA!)
- Now: testing web server interface to Whatizit
Next: link to Biocreative metaserver
- Add: links to other entities via OKKAM server
- Prototype now being tested with FEBS Office
Future:
- Create unique Author ID, straddling proprietary (Scopus) and
other (ResearcherID, OpenID) systems
- Allow query and resolution of authors/papers within Word
20. OKKAM plugin vs. UCSD group Plugin
UCSD plugin Okkam plugin
Works! Can be Prototype, starting to test with
downloaded, open source FEBS authors
etc.
Offline: load ontologies Online: call out to web services
locally (soon...)
Identify classes of entities Identify specific entities (e.g., a
(e.g., ‘inhibitor’) specific protein)
Works with other version of Word
Works with Word 2007
(on a PC!)
Stores annotations as XML Store annotations as Word
metadata - maintain comments - can generate report
metadata in Word info pane of entitieslinked to identifiers
Work well together!
21. OKKAM plugin vs. UCSD group Plugin
UCSD plugin Okkam plugin
Works! Can be Prototype, starting to test with
downloaded, open source FEBS authors
etc.
Offline: load ontologies Online: call out to web services
locally (soon...)
Identify classes of entities Identify specific entities (e.g., a
(e.g., ‘inhibitor’) specific protein)
Works with other version of Word
Works with Word 2007
(on a PC!)
Stores annotations as XML Store annotations as Word
metadata - maintain comments - can generate report
metadata in Word info pane of entitieslinked to identifiers
Work well together!
23. Issue with Biocreative challenge:
From one of the documents in the training set:
- In Xenopus oocyte maturation, cytoplasmic polyadenylation mediated by cytoplasmic
polyadenylation element binding protein (CPEB) induces the translation of maternal
mRNA [5].
- In mouse testis, another novel member of the CPEB protein family (CPEB2) and a
homolog of xGLD-2 (mGLD-2) have been identified [7] and [8]
versus:
- TPAP was present in GSG1 immunoprecipitates (Fig. 2B). The in vivo data suggest that
TPAP–GSG1 interactions occur in mammalian cells.
Issue 1: what is experimentally verified content?
Issue 2: what is new?
25. Discourse analysis of biological text
- How can we identify line of argumentation in biology text?
26. Discourse analysis of biological text
- How can we identify line of argumentation in biology text?
27. Discourse analysis of biological text
- How can we identify line of argumentation in biology text?
- Liguistics says: parse into clauses (anything with a verb)
28. Discourse analysis of biological text
- How can we identify line of argumentation in biology text?
- Liguistics says: parse into clauses (anything with a verb)
- Identify Discourse Segment Purpose
29. Discourse analysis of biological text
- How can we identify line of argumentation in biology text?
- Liguistics says: parse into clauses (anything with a verb)
- Identify Discourse Segment Purpose
- Identify verb tense, type, markers for each segment type
30. Discourse analysis of biological text
- How can we identify line of argumentation in biology text?
- Liguistics says: parse into clauses (anything with a verb)
- Identify Discourse Segment Purpose
- Identify verb tense, type, markers for each segment type
31. Discourse analysis of biological text
- How can we identify line of argumentation in biology text?
- Liguistics says: parse into clauses (anything with a verb)
- Identify Discourse Segment Purpose
- Identify verb tense, type, markers for each segment type
32. 3 Realms of Science:
Conceptual
realm
Experimental
realm
Data realm
33. 3 Realms of Science:
(1) Oncogene-induced senescence is (4b) transduction with either
Conceptual characterized by the appearance of miR-Vec-371&2 or miR-Vec-
V12
cells with a flat morphology that 373 prevents RAS -
realm express senescence associated (SA)- induced growth arrest in
-Galactosid a s e . primary human cells.
(2a) Indeed, (4a) Altogether, these data
show that
Experimental
realm
V12
(2b) control RAS -arrested (3b) very few cells showed
cells showed relatively high senescent morphology when
(3a) Consistent
abundance of flat cells transduced with either miR-
with the cell
expressing SA- - Vec-371&2, miR-Vec-373, or
growth assay, kd
Galactosidase control p53 .
(2c) (Figures
2G and 2H).
Data realm
(Figures)
34. 3 Realms of Science:
(1) Oncogene-induced senescence is (4b) transduction with either
Conceptual characterized by the appearance of miR-Vec-371&2 or miR-Vec-
V12
cells with a flat morphology that 373 prevents RAS -
realm express senescence associated (SA)- induced growth arrest in
-Galactosid a s e . primary human cells.
(2a) Indeed, (4a) Altogether, these data
show that
Experimental
realm
V12
(2b) control RAS -arrested (3b) very few cells showed
cells showed relatively high senescent morphology when
(3a) Consistent
abundance of flat cells transduced with either miR-
with the cell
expressing SA- - Vec-371&2, miR-Vec-373, or
growth assay, kd
Galactosidase control p53 .
(2c) (Figures
2G and 2H).
Data realm
(Figures)
37. Hypotheses ‘erode’ into facts:
To investigate the possibility that
miR-372 and miR-373 suppress
the expression of LATS2, we...
KnownFact KnownFact
Concepts Hypothesis
38. Hypotheses ‘erode’ into facts:
To investigate the possibility that
miR-372 and miR-373 suppress
the expression of LATS2, we...
KnownFact KnownFact
Concepts Hypothesis
Goal
Method Result
Data
Experiment 1
39. Hypotheses ‘erode’ into facts:
To investigate the possibility that
miR-372 and miR-373 suppress
the expression of LATS2, we...
Therefore, these results point to
LATS2 as a mediator of the miR-372 and
miR-373 effects on cell proliferation and
tumorigenicity,
KnownFact KnownFact
Concepts Hypothesis Implication
Goal
Method Result
Data
Experiment 1
40. Hypotheses ‘erode’ into facts:
To investigate the possibility that
miR-372 and miR-373 suppress
the expression of LATS2, we...
Therefore, these results point to
LATS2 as a mediator of the miR-372 and
miR-373 effects on cell proliferation and
tumorigenicity,
KnownFact KnownFact
Concepts Hypothesis Implication
Goal
Goal
Method Result Method Result
Data Data
Experiment 1 Experiment 2
41. Hypotheses ‘erode’ into facts:
To investigate the possibility that
miR-372 and miR-373 suppress
the expression of LATS2, we...
Raver-Shapira et.al, JMolCell 2007
Therefore, these results point to two miRNAs, miRNA-372 and-373, function as
LATS2 as a mediator of the miR-372 and potential novel oncogenes in testicular germ cell
miR-373 effects on cell proliferation and tumors by inhibition of LATS2 expression,
tumorigenicity, which suggests that Lats2 is an important
tumor suppressor (Voorhoeve et al., 2006).
KnownFact KnownFact
Concepts Hypothesis Implication Fact
Goal
Goal
Method Result Method Result
Data Data
Experiment 1 Experiment 2
42. Hypotheses ‘erode’ into facts: Yabuta, JBioChem 2007
miR-372 and miR-373 target the
Lats2 tumor suppressor
To investigate the possibility that (Voorhoeve et al., 2006)
miR-372 and miR-373 suppress
the expression of LATS2, we...
Raver-Shapira et.al, JMolCell 2007
Therefore, these results point to two miRNAs, miRNA-372 and-373, function as
LATS2 as a mediator of the miR-372 and potential novel oncogenes in testicular germ cell
miR-373 effects on cell proliferation and tumors by inhibition of LATS2 expression,
tumorigenicity, which suggests that Lats2 is an important
tumor suppressor (Voorhoeve et al., 2006).
KnownFact KnownFact
Concepts Hypothesis Implication Fact
Goal
Goal
Method Result Method Result
Data Data
Experiment 1 Experiment 2
43. Hypotheses ‘erode’ Hypothesis +
into facts: Yabuta, JBioChem 2007
miR-372 and miR-373 target the
Lats2 tumor suppressor
To investigate the possibility that Evidence (Voorhoeve et al., 2006)
miR-372 and miR-373 suppress
the expression of LATS2, we...
Raver-Shapira et.al, JMolCell 2007
Therefore, these results point to two miRNAs, miRNA-372 and-373, function as
LATS2 as a mediator of the miR-372 and potential novel oncogenes in testicular germ cell
miR-373 effects on cell proliferation and tumors by inhibition of LATS2 expression,
tumorigenicity, which suggests that Lats2 is an important
tumor suppressor (Voorhoeve et al., 2006).
KnownFact KnownFact
Concepts Hypothesis Implication Fact
Goal
Goal
Method Result Method Result
Data Data
Experiment 1 Experiment 2
47. HypER Working Group:
- Goal: Align and expand existing efforts on detection and
analysis of Hypotheses, Evidence & Relationships
- Partners:
- Harvard/MGH: SWAN, ARF
- Open University: Cohere
- Oxford University: CiTO, eLearning/Rhetoric
- DERI: SALT, aTags
- University of Trento: LiquidPub
- Xerox Research: XIP hypothesis identifier
- U Tilburg: ML for Science
- Elsevier, UUtrecht: Discourse analysis of biology
48. HypER Working Group:
- Goal: Align and expand existing efforts on detection and
analysis of Hypotheses, Evidence & Relationships
- Partners:
- Harvard/MGH: SWAN, ARF
- Open University: Cohere
- Oxford University: CiTO, eLearning/Rhetoric
- DERI: SALT, aTags
- University of Trento: LiquidPub
- Xerox Research: XIP hypothesis identifier
- U Tilburg: ML for Science
- Elsevier, UUtrecht: Discourse analysis of biology
49. HypER Working Group:
- Goal: Align and expand existing efforts on detection and
analysis of Hypotheses, Evidence & Relationships
- Partners:
- Harvard/MGH: SWAN, ARF
- Open University: Cohere
- Oxford University: CiTO, eLearning/Rhetoric
- DERI: SALT, aTags
- University of Trento: LiquidPub
- Xerox Research: XIP hypothesis identifier
- U Tilburg: ML for Science
- Elsevier, UUtrecht: Discourse analysis of biology
50. HypER Working Group:
- Goal: Align and expand existing efforts on detection and
analysis of Hypotheses, Evidence & Relationships
- Partners:
- Harvard/MGH: SWAN, ARF
- Open University: Cohere
- Oxford University: CiTO, eLearning/Rhetoric
- DERI: SALT, aTags
- University of Trento: LiquidPub
- Xerox Research: XIP hypothesis identifier
- U Tilburg: ML for Science
- Elsevier, UUtrecht: Discourse analysis of biology
51. HypER Working Group:
- Goal: Align and expand existing efforts on detection and
analysis of Hypotheses, Evidence & Relationships
- Partners:
- Harvard/MGH: SWAN, ARF
- Open University: Cohere
- Oxford University: CiTO, eLearning/Rhetoric
- DERI: SALT, aTags
- University of Trento: LiquidPub
- Xerox Research: XIP hypothesis identifier
- U Tilburg: ML for Science
- Elsevier, UUtrecht: Discourse analysis of biology
52. HypER Working Group:
- Goal: Align and expand existing efforts on detection and
analysis of Hypotheses, Evidence & Relationships
- Partners:
- Harvard/MGH: SWAN, ARF
- Open University: Cohere
- Oxford University: CiTO, eLearning/Rhetoric
- DERI: SALT, aTags
- University of Trento: LiquidPub
- Xerox Research: XIP hypothesis identifier
- U Tilburg: ML for Science
- Elsevier, UUtrecht: Discourse analysis of biology
53. HypER Working Group:
- Goal: Align and expand existing efforts on detection and
analysis of Hypotheses, Evidence & Relationships
- Partners:
- Harvard/MGH: SWAN, ARF
- Open University: Cohere
- Oxford University: CiTO, eLearning/Rhetoric
- DERI: SALT, aTags
- University of Trento: LiquidPub
- Xerox Research: XIP hypothesisHypothesis 22: Intramembrenous Aβ dimer may be toxic
identifier
- U Tilburg: ML for Science Derived from: POSTAT_CONTRIBUTION(This essay explo
possibility that a fraction of these Abeta peptides never leav
- Elsevier, UUtrecht: Discourse analysis of biology
54. HypER Activities: http://hyper.wik.is
Current activities:
- Aligning discourse ontologies: joint task with W3C HCLSSig
- Aligning architectures to exchange hypotheses + evidence
- Format for a rhetorical conference paper (SALT + abcde)
- Parser comparison of hypothesis identification tools
- With NIF+SWAN/SCF: Structured Rhetorical Abstracts for
Neuroscience publications
Further interests:
- Better structure of evidence: MyExperiment, KeFeD, ...
- Granularity of annotation/access: entity, hypothesis,
discussion?
- Where/how annotate: http://saakm2009.semanticauthoring.org/
56. Scope: Tools and processes to:
- Improve the process of creating, reviewing and editing
scientific content
- Interpret, visualize or connect science knowledge
- Provide tools/ideas for measuring the impact of these
improvements.
June 2008: 71 Submissions from 15 countries.
August 2008: 10 Semi-finalists teams, access to:
- 500,000 full text articles
- Plus EMTREE, EmBase, Scopus
- Created tool/demo
- Presented to the Judges
- Wrote a paper (accepted for JWeb Semantics)
April 2009: Judges selected 4 Finalist teams.
And the winners are:
57. Scope: Tools and processes to:
- Improve the process of creating, reviewing and editing
scientific content 2 Related Work
- Interpret, visualize or connect science knowledge Using link text for summarisation has been explo
- Provide tools/ideas for measuring the impact of these
previously by Amitay and Paris (2000). They ide
improvements. fied situations when it was possible to generate su
June 2008: 71 Submissions from 15 countries. maries of web-pages by recycling human-autho
descriptions of links from anchor text. In our wo
August 2008: 10 Semi-finalists teams, access to:
we use the anchor text as the reading context to p
- 500,000 full text articles vide an elaborative summary for the linked do
- Plus EMTREE, EmBase, Scopus ment.
Our work is similar in domain to that of the 2
- Created tool/demo
CLEF WiQA shared task.4 However, in contras
- Presented to the Judges our application scenario, the end goal of the sha
- Wrote a paper (accepted for JWeb Semantics) task focuses on suggesting editing updates fo
April 2009: Judges selected 4 Finalist teams. particular document and not on elaborating on
user’s reading context.
And the winners are:
A related task was explored at the Document
58. Scope: Tools and processes to:
- Improve the process of creating, reviewing and editing
scientific content 2 Related Work
- Interpret, visualize or connect science knowledge Using link text for summarisation has been explo
- Provide tools/ideas for measuring the impact of these
previously by Amitay and Paris (2000). They ide
improvements. fied situations when it was possible to generate su
June 2008: 71 Submissions from 15 countries. maries of web-pages by recycling human-autho
descriptions of links from anchor text. In our wo
August 2008: 10 Semi-finalists teams, access to:
we use the anchor text as the reading context to p
- 500,000 full text articles vide an elaborative summary for the linked do
- Plus EMTREE, EmBase, Scopus ment.
Our work is similar in domain to that of the 2
- Created tool/demo
CLEF WiQA shared task.4 However, in contras
- Presented to the Judges our application scenario, the end goal of the sha
- Wrote a paper (accepted for JWeb Semantics) task focuses on suggesting editing updates fo
April 2009: Judges selected 4 Finalist teams. particular document and not on elaborating on
user’s reading context.
And the winners are:
A related task was explored at the Document
59. Scope: Tools and processes to:
- Improve the process of creating, reviewing and editing
scientific content 2 Related Work
- Interpret, visualize or connect science knowledge Using link text for summarisation has been explo
- Provide tools/ideas for measuring the impact of these
previously by Amitay and Paris (2000). They ide
improvements. fied situations when it was possible to generate su
June 2008: 71 Submissions from 15 countries. maries of web-pages by recycling human-autho
descriptions of links from anchor text. In our wo
August 2008: 10 Semi-finalists teams, access to:
we use the anchor text as the reading context to p
- 500,000 full text articles vide an elaborative summary for the linked do
- Plus EMTREE, EmBase, Scopus ment.
Our work is similar in domain to that of the 2
- Created tool/demo
CLEF WiQA shared task.4 However, in contras
- Presented to the Judges our application scenario, the end goal of the sha
- Wrote a paper (accepted for JWeb Semantics) task focuses on suggesting editing updates fo
April 2009: Judges selected 4 Finalist teams. particular document and not on elaborating on
user’s reading context.
And the winners are:
A related task was explored at the Document
60. Scope: Tools and processes to:
- Improve the process of creating, reviewing and editing
scientific content 2 Related Work
- Interpret, visualize or connect science knowledge Using link text for summarisation has been explo
- Provide tools/ideas for measuring the impact of these
previously by Amitay and Paris (2000). They ide
improvements. fied situations when it was possible to generate su
June 2008: 71 Submissions from 15 countries. maries of web-pages by recycling human-autho
descriptions of links from anchor text. In our wo
August 2008: 10 Semi-finalists teams, access to:
we use the anchor text as the reading context to p
- 500,000 full text articles vide an elaborative summary for the linked do
- Plus EMTREE, EmBase, Scopus ment.
Our work is similar in domain to that of the 2
- Created tool/demo
CLEF WiQA shared task.4 However, in contras
- Presented to the Judges our application scenario, the end goal of the sha
- Wrote a paper (accepted for JWeb Semantics) task focuses on suggesting editing updates fo
April 2009: Judges selected 4 Finalist teams. particular document and not on elaborating on
user’s reading context.
And the winners are:
A related task was explored at the Document
61. Harvard, Fall 2010
FoRC: The Future of Research Communication
Improve ways to:
- create, review, edit scientific content
- interpret, visualize, connect scientific knowledge
- measure the impact of these improvements
- present new paradigms in publishing
- serve global academic knowledge communities
- communicate between collaboratories
- share research data
- support complex knowledge ecosystems.
62. Harvard, Fall 2010
FoRC: The Future of Research Communication
Improve ways to:
- create, review, edit scientific content
- interpret, visualize, connect scientific knowledge
- measure the impact of these improvements
- present new paradigms in publishing
- serve global academic knowledge communities
- communicate between collaboratories
- share research data
- support complex knowledge ecosystems.
New ways of communicating science requires new ways of communicating
Conference registrants form a community:
- Website is their meeting place, continues as long as users want to
- The physical conference is synchronous manifestation of the community
- Others can participate through a virtual environment at other time/place
63. Acknowledgments HypER:
- Funding: - Harvard/MGH:
FP7-Cordis Tim Clark
NWO Casimir Project Elizabeth Wu
- UUtrecht: - DERI:
Leen Breure, Siegfried Handschuh
Tudor Groza
Henk Pander Maat
Matthias Samwald
Ted Sanders
Paul Buitelaar
- ISI:
- Xerox Research:
Gully Burns Agnes Sandor
Ed Hovy
- Open University:
- Elsevier: Simon Buckingham Shum
David Marques David Shotton
Darin McBeath Jack Park
Stefano Bocconi Clara Mancini
Adriaan Klinkenberg
Emilie Marcus - Oxford University:
Annamaria Carusi
- Challenge judges and participants: David Shotton
EMBL Team, CMU Team
David Shotton - Tilburg U:
Piroska Lendvai
Alfonso Valencia