What’s wrong with research papers - and (how) can we fix it?
1. What’s wrong with research papers - and (how) can we fix it?
Anita de Waard
Disruptive Technologies Director
Elsevier Labs
a.dewaard@elsevier.com
http://elsatglabs.com/labs/anita
8. To address this problem, we make:
• databases
• text mining tools
• nanopublications
• data publications
• wiki publications
• ontologies; ontology integration tools
• workflow/data integration systems
• executable components
• ....and write emails/grants/papers/blogs about this...
• ... and we end up with:
1) Even more papers!!
2) Even less time to read them!!
13. What problems are we solving?
• We’re mostly improving the format of the research article.
• This talk: aspects of the format that are being improved
(and some examples of work to improve them):
A. Issues with the paper format
B. Issues pertaining to habits of writing
C. Issues inherent to language
D. Issues in trying to create connected content
• Do any of these address the Big Problem?
• What shall we do about it?
22. A1: Issue: paper is two-dimensional
• Some experiments: allow representations of interactive
figures (Wolfram Alpha), Utopia, Chem-3d
• Lack of experimentation with formats: the internet is
multi-dimensional, so why do we still need page limits?
26. A2: Issue: paper is linear
• Read from front to back (research suggests readers skim
quickly to the core parts, but linearity helps them do that)
• References are at the end, so your reading is
not interrupted
• Headers are sequential - and not directly
accessible
30. A2: (Old) Experiment: ABCDE
• LaTeX Stylesheet:
–Annotation
–Background
–Contribution
–Discussion
–Entities (references, projects, terms in ontologies, etc.) in RDF
–Core sentences create structured abstract
• E.g. in proceedings: collect all core Contribution components
• I still have the stylesheets, if anyone’s interested :-)!
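The proceedings idea above (collect all core Contribution components) can be sketched roughly as follows. This is only an illustration: the `Section` type, the sample texts, and `collect_contributions` are hypothetical names, and the original LaTeX stylesheet is not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical in-memory form of an ABCDE-structured paper; the original
# experiment used a LaTeX stylesheet, which is not shown on the slide.
ABCDE_KINDS = ("Annotation", "Background", "Contribution", "Discussion", "Entities")


@dataclass
class Section:
    kind: str           # one of ABCDE_KINDS
    text: str
    core: bool = False  # marks a "core sentence" for the structured abstract


def collect_contributions(papers):
    """Return {paper title: [core Contribution texts]} across a proceedings."""
    collected = {}
    for title, sections in papers.items():
        collected[title] = [
            s.text for s in sections if s.kind == "Contribution" and s.core
        ]
    return collected


# Made-up sample proceedings, purely for demonstration.
proceedings = {
    "Paper 1": [
        Section("Background", "Papers are linear.", core=True),
        Section("Contribution", "We propose a modular format.", core=True),
        Section("Contribution", "Implementation details.", core=False),
    ],
    "Paper 2": [
        Section("Contribution", "We evaluate the format on proceedings.", core=True),
    ],
}

overview = collect_contributions(proceedings)
```

With sections typed this way, a proceedings overview becomes a simple filter rather than a reading task.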
32. A3: Paper is not interactive
• Experiment: Executable papers:
–Run code within a paper
–Experiments: R, SPSS, Vistrails
–Rerender code within a paper, change algorithm/see effect; run different dataset
–How do you archive software? Satyanarayanan at CMU: Olive, ‘Internet ecosystem
of curated virtual machine image collections’
41. B1: Citations create facts:
- Voorhoeve, 2006: “These miRNAs neutralize p53-mediated CDK
inhibition, possibly through direct inhibition of the expression of the
tumor suppressor LATS2.”
- Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and
miR-373 were found to allow proliferation of primary human cells
that express oncogenic RAS and active p53, possibly by inhibiting
the tumor suppressor LATS2 (Voorhoeve et al., 2006).”
- Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372
and -373, function as potential novel oncogenes in testicular germ
cell tumors by inhibition of LATS2 expression, which suggests
that Lats2 is an important tumor suppressor (Voorhoeve et al.,
2006).”
- Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and
miR-373, directly inhibit the expression of Lats2, thereby allowing
tumorigenic growth in the presence of p53 (Voorhoeve et al.,
2006).”
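One way to see the hedge disappearing along this citation chain is to count hedging words in each quoted sentence. The sketch below is a toy check, not a method from the talk: the `HEDGES` word list is an assumption chosen for this example, and a real study would need a proper hedging lexicon.

```python
import re

# Hedge markers to look for; this short list is an assumption for the demo.
HEDGES = ("possibly", "potential", "suggests", "may", "might")

# Sentences quoted on the slide, in order of publication.
citation_chain = [
    ("Voorhoeve, 2006",
     "These miRNAs neutralize p53-mediated CDK inhibition, possibly through "
     "direct inhibition of the expression of the tumor suppressor LATS2."),
    ("Kloosterman and Plasterk, 2006",
     "In a genetic screen, miR-372 and miR-373 were found to allow "
     "proliferation of primary human cells that express oncogenic RAS and "
     "active p53, possibly by inhibiting the tumor suppressor LATS2."),
    ("Yabuta et al., 2007",
     "two miRNAs, miRNA-372 and -373, function as potential novel oncogenes "
     "in testicular germ cell tumors by inhibition of LATS2 expression, "
     "which suggests that Lats2 is an important tumor suppressor."),
    ("Okada et al., 2011",
     "Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the "
     "expression of Lats2, thereby allowing tumorigenic growth in the "
     "presence of p53."),
]


def hedges_in(sentence):
    """Return the hedge markers that occur as whole words in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return [h for h in HEDGES if h in words]


hedge_report = [(source, hedges_in(text)) for source, text in citation_chain]

for source, found in hedge_report:
    print(f"{source}: {found if found else 'no hedge found'}")
```

On these four quotes the original claim and its first citation carry "possibly", the 2007 citation still hedges, and the 2011 citation states the mechanism without any marker from the list.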
45. B1: TAC2012: Add authorʼs text to citation
Voorhoeve, P. M.; le Sage, C et al. (2006). A Genetic Screen Implicates miRNA-372 and
miRNA-373 As Oncogenes in Testicular Germ Cell Tumors, Cell 124 (6) pp.1169 - 1181
Citing goal: “To perform genetic screens for novel functions of miRNAs,”
− in order to identify miRNAs functionally associated with carcinogenesis
− to identify miRNAs that when overexpressed could substitute for p53 loss and allow
continued proliferation in the context of Ras activation
Citing method: “We subsequently created a human miRNA expression library (miR-Lib) by
cloning almost all annotated human miRNAs into our vector (Rfam release 6) (Figure S3).”
− Voorhoeve et al. (116) employed a novel strategy by combining an miRNA vector library
and corresponding bar code array
− using a retroviral expression library of miRNAs,
− Using a novel retroviral miRNA expression library, Agami and co-workers performed a
cell-based screen
Citing result: “we identified miR-372-373, each permitting proliferation and tumorigenesis of
primary human cells that harbor both oncogenic RAS and active wildtype p53.”
− miR-372 and miR-373 were consequently found to permit proliferation and tumorigenesis
of these primary cells carrying both oncogenic RAS and wild-type p53,
− Voorhoeve et al. (2006) identified miR-372 and miR-373
− miR-372 and miR-373 were found to allow proliferation of primary human cells that
express oncogenic RAS and active p53,
48. B2: Issue: entities in papers are not exact
• Midfrontal cortex tissue samples from neurologically unimpaired subjects (n = 9) and
from subjects with AD (n = 11) were obtained from the Rapid Autopsy Program
• Immunoblot analysis and antibodies
• The following antibodies were used for immunoblotting: -actin mAb (1:10,000 dilution,
Sigma-Aldrich); -tubulin mAb (1:10,000, Abcam); T46 mAb (specific to tau 404–441, 1:1000,
Invitrogen); Tau-5 mAb (human tau 218–225, 1:1000, BD Biosciences) (Porzig et al., 2007); AT8
mAb (phospho-tau Ser199, Ser202, and Thr205, 1:500, Innogenetics); PHF-1 mAb (phospho-tau
Ser396 and Ser404, 1:250, gift from P. Davies); 12E8 mAb (phospho-tau Ser262 and Ser356,
1:1000, gift from P. Seubert); NMDA receptors 2A, 2B and 2D goat pAbs (C terminus, 1:1000,
Santa Cruz Biotechnology)…
• 95 antibodies were identified in 8 articles
• 52 did not contain enough information to determine the antibody used
Maryann Martone, Jan 2012:
2012 ACM SIGHIT International Health Informatics Symposium (IHI 2012)
51. B3: Issue: methods are written post-mortem
• Yolanda Gil at ISI modeled Bourne et al. paper in Wings
• Anecdotal evidence: Phil Bourne couldn’t remember most
of this, even after digging through emails!
58. B3: So why not write the data first and
wrap the paper around it??
1. Research: Each item in the system has metadata (including
metadata provenance) and relations to other data items added to it.
2. Workflow: All data items created in the lab are added to a
(lab-owned) workflow system.
3. Authoring: A paper is written in an authoring tool which can pull
data with provenance from the workflow tool in the appropriate
representation into the document.
4. Editing and review: Once the co-authors agree, the paper is
‘exposed’ to the editors, who in turn expose it to reviewers.
Reports are stored in the authoring/editing system, the paper gets
updated, until it is validated.
5. Publishing and distribution: When a paper is published, a
collection of validated information is exposed to the world. It
remains connected to its related data item, and its heritage can
be traced.
6. User applications: distributed applications run on this
‘exposed data’ universe.
[Figure: data items, each labelled ‘metadata’; sample paper text “Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-”; labels Review, Revise, Edit; Some other publisher]
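Steps 1 to 3 above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions: `DataItem`, `WorkflowSystem`, and `render_sentence` are hypothetical names invented for this example, and no real workflow or authoring tool is being modelled.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """A lab data item with metadata and provenance (step 1)."""
    item_id: str
    value: object
    metadata: dict = field(default_factory=dict)
    provenance: list = field(default_factory=list)  # ordered history notes
    related: list = field(default_factory=list)     # ids of related items


class WorkflowSystem:
    """Stand-in for the lab-owned workflow system (step 2)."""

    def __init__(self):
        self._items = {}

    def add(self, item):
        # Record the hand-over so the item's heritage stays traceable.
        item.provenance.append("added to workflow system")
        self._items[item.item_id] = item

    def get(self, item_id):
        return self._items[item_id]


def render_sentence(template, workflow, item_id):
    """Authoring (step 3): pull a value into text, keeping a link to its source."""
    item = workflow.get(item_id)
    return {
        "text": template.format(value=item.value),
        "source": item.item_id,
        "provenance": list(item.provenance),
    }


lab = WorkflowSystem()
lab.add(DataItem("fig2-data", 2, metadata={"unit": "tests"},
                 provenance=["recorded in lab"]))

sentence = render_sentence(
    "Rats were subjected to {value} grueling tests.", lab, "fig2-data"
)
```

The point of the design is that the sentence never holds a copied number on its own: it carries the id and provenance of the data item it was generated from, so a published claim can be traced back to the lab record.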
63. C. Issue: language
C1: Language is coherent
C2: Language is narrative
C3: Language is abstract
67. C1: Language is coherent:
Adding drug-drug interactions to DIKB
• Drug-Interaction Knowledge Base:
Clinically-oriented, evidence-based knowledge base
designed to support adding data to product inserts
• Contains quantitative and qualitative assertions about drug
mechanisms and pharmacokinetic drug-drug interactions
(DDI) for over 60 drugs
• HCLS Sig: Currently working on expanding the DIKB with
more content and making a “mash-up” view of package
inserts adding up-to-date information
View project: http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
SPARQL endpoint: http://dbmi-icode-01.dbmi.pitt.edu:2020/directory/Drugs
75. C1: Coherent language is hard to parse
• Self-reference:
R-CT and its metabolites, studied using the same procedures, had
properties very similar to those of the corresponding S-enantiomers.
• Reference to external data sources:
Average relative in vivo abundances equivalent to the relative activity
factors, were estimated using methods described in detail previously
(Crespi, 1995; Venkatakrishnan et al., 1998 a,c, 1999, 2000, 2001;
von Moltke et al., 1999 a,b; Störmer et al., 2000).
• Ways of describing meant for human eyes
Based on established index reactions, S-CT and S-DCT were negligible
inhibitors (IC50> 100 µM) of CYP1A2, -2C9, -2C19, -2E1, and -3A, and
weakly inhibited CYP2D6 (IC50 = 70–80 µM)
• Many statements wrapped into one:
S-CT was transformed to S-DCT by CYP2C19 (Km = 69 µM), CYP2D6 (Km
= 29 µM), and CYP3A4 (Km = 588 µM).
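To make the "many statements wrapped into one" point concrete, the sketch below unpacks the last example sentence into three separate assertions. The regular expressions are hand-tailored to this one sentence and are an assumption for illustration only; they would not generalize to other papers, which is exactly the parsing difficulty the slide describes.

```python
import re

sentence = ("S-CT was transformed to S-DCT by CYP2C19 (Km = 69 µM), "
            "CYP2D6 (Km = 29 µM), and CYP3A4 (Km = 588 µM).")

# Substrate and product come from the shared head of the sentence.
head = re.match(
    r"(?P<substrate>\S+) was transformed to (?P<product>\S+) by ", sentence
)

# Each enzyme carries its own Km value in parentheses.
enzymes = re.findall(r"(CYP\w+) \(Km = (\d+(?:\.\d+)?) µM\)", sentence)

# One sentence becomes one assertion per enzyme.
assertions = [
    {
        "substrate": head.group("substrate"),
        "product": head.group("product"),
        "enzyme": enzyme,
        "Km_uM": float(km),
    }
    for enzyme, km in enzymes
]

for a in assertions:
    print(a)
```

A knowledge base such as the DIKB needs the three unpacked assertions, whereas the author wrote a single coherent sentence.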
80. C2: Issue: Language is narrative
• ‘The truth can only be told in stories’
• Complex knowledge such as scientific theories,
findings, conclusions have a narrative/rhetorical
structure
• Typical pattern: claim/hypothesis, discussion of
experimental findings, recap of claim, rebuttals,
recap of claim
• Roughly the same claim appears 4 or 5 times in a
paper
83. C3: Issue: Language is abstract
“These results are consistent with those obtained by RPA
and demonstrate that AhR ligands suppress IL-6 mRNA levels
by approximately 40–60%.”
“Data presented in Figure 5A extend previous studies
performed with monocytes by demonstrating that
LPS induces NF-κB-DNA binding in bone marrow stromal cells.”
“An added incentive for these studies was provided by the
observation that the IL-6 gene promoter contains an NF-κB
binding site which plays a major role in regulating LPS-induced
IL-6 transcription [55-57].”
• Purple = deictic/anaphoric markers, pointing to current text
• Blue = metalanguage/epistemic evaluation
• Green = experimental method
• Red = conceptual claim
• Orange = claim referred to in other work
84. C3: Formal Language:
Biological Expression Language
In a screen for miRNAs that cooperate with oncogenes in cellular transformation,
we identified miR-372 and miR-373, each permitting proliferation and tumorigenesis
of primary human cells that harbor both oncogenic RAS and active wild-type p53.
Increased abundance of miR-372 increases cell proliferation
r(MIR:miR-372) -> bp(GO:"Cell Proliferation")
Increased abundance of miR-372 increases tumorigenesis
r(MIR:miR-372) -> bp(GO:Tumorgenesis)
We provide evidence that these miRNAs are potential novel oncogenes
participating in the development of human testicular germ cell tumors by numbing
the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.
Increased abundance of miR-372 decreases activity of TP53
r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
Context: cancer
SET Disease = "Cancer"
Activity of TP53 decreases cell growth
tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
86. C3: Experiment: add epistemic evaluation/
knowledge attribution to BEL
For a Proposition P, an epistemically marked clause E is an
Evaluation of P, E_{V,B,S}(P), with:
- V = Value:
3 = Assumed true, 2 = Probable, 1 = Possible,
0 = Unknown,
(-1 = Possibly untrue, -2 = Probably untrue, -3 = Assumed untrue)
- B = Basis:
Reasoning
Data
- S = Source:
A = speaker is author A, explicit
IA = speaker is author A, implicit
N = other author N, explicit
NN = other author NN, implicit
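The evaluation scheme above maps naturally onto a small data structure. The sketch below is one possible encoding, not the project's implementation: the class names, the `label` format, and the values chosen for the example proposition are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Value(IntEnum):
    """V: how strongly the proposition is held to be true."""
    ASSUMED_TRUE = 3
    PROBABLE = 2
    POSSIBLE = 1
    UNKNOWN = 0
    POSSIBLY_UNTRUE = -1
    PROBABLY_UNTRUE = -2
    ASSUMED_UNTRUE = -3


class Basis(Enum):
    """B: what the evaluation rests on."""
    REASONING = "Reasoning"
    DATA = "Data"


class Source(Enum):
    """S: who is making the evaluation."""
    A = "author, explicit"
    IA = "author, implicit"
    N = "other author, explicit"
    NN = "other author, implicit"


@dataclass(frozen=True)
class Evaluation:
    proposition: str
    value: Value
    basis: Basis
    source: Source

    def label(self):
        # Compact rendering in the spirit of E_{V,B,S}(P).
        return (f"E[{int(self.value)},{self.basis.value},"
                f"{self.source.name}]({self.proposition})")


# Illustrative only: "possibly" in the 2006 sentence is read here as
# Value.POSSIBLE; the basis and source values are assumptions for the demo.
ev = Evaluation(
    proposition="miR-372/373 inhibit LATS2 expression",
    value=Value.POSSIBLE,
    basis=Basis.DATA,
    source=Source.A,
)

print(ev.label())
```

Because `Value` is an ordered integer scale, two evaluations of the same proposition can be compared directly, for example to detect a citation that states a claim more strongly than its source did.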
96. D1: Searching collections of papers
• It is relatively easy to find a paper you are looking for:
Google Scholar, Google,..., Scopus... (in that order?)
• But it is very hard to find if something was done about a
certain topic (e.g. ‘citances’)
• And it’s impossible to know if nothing was done on a
topic
• Why aren’t more people working on this?
• What happened to the semantic desktop??
99. D2: How do we connect papers?
• Papers exist within a con-text: preceding knowledge,
succeeding knowledge, knowledge in your head or on
your computer
• How can we annotate these relations, maintain
connections, explore ones that others have made?
106. D3: Tracing the heritage of a statement
• On paper, you can’t see whether a claim or a
recommendation is valid
• E.g. required to check for clinical
recommendations:
–Is this statistically valid?
–Was it shown for my patient?
–Are there other things I need to know (side effects,
funding, etc.)?
109. D3: Experiment:
Linking Clinical Guidelines to Evidence
A. Philips’ Electronic Patient Records
B. Elsevier-published Clinical Guideline
C. Elsevier (or other publisher’s) Research Report or Data
Step 1: Patient data + diagnosis link to Guideline recommendation
Step 2: Guideline recommendation links to research report/data
110. D3: The reality of linking evidence:
Recommendation in Guideline | Level | Evidence (in the text) | Ref | Recommendation in Reference
5.1. Laboratory tests should include a CBC count with differential leukocyte count and platelet count; | A-III | No evidence in text | No reference | -
5.2. measurement of serum levels of creatinine and blood urea nitrogen; | A-III | CBC counts and determination of the levels of serum creatinine and urea nitrogen are needed to plan supportive care and to monitor for the possible occurrence of drug toxicity. | No reference | -
5.3. and measurement of electrolytes, hepatic transaminase enzymes, and total bilirubin (A-III). | A-III | No evidence in text | No reference | -
Not mentioned: GET ENOUGH BLOOD, IN TWO SEPARATE BOTTLES | - | The total volume of blood cultured is a crucial determinant of detecting a bloodstream infection [47]. (a ‘‘set’’ consists of 1 venipuncture or catheter access draw of 20 mL of blood divided into 1 aerobic and 1 anaerobic blood culture bottle). | [47] | Our data, together with an analysis of previous studies, show that the yield of blood cultures in adults increases approximately 3% per millilitre of blood cultured.
Not mentioned: REPEAT TESTS | - | These tests should be done at least every 3 days during the course of intensive antibiotic therapy. At least weekly monitoring of serum transaminase levels is advisable for | - | -
111. In summary:
Type Problems Experiments Issues
A. Paper format:
A1 Two-dimensional Utopia, Wolfram CDF Standards, tools
A2 Linear ABCDE Adoption?
A3 Not interactive Executable papers Adoption
B. Writing habits
B1 Reference to papers TAC: Citance summaries Need to start at author
B2 Inexact entity references NIF antibodies Need mandate!
B3 Methods post-mortem Data-centric publishing Change research recording!
C. Language:
C1 Coherent DIKB Hard to parse!
C2 Narrative CKUs Fractal nature of paper
C3 Abstract BEL Formalize knowledge level
D. Collections of papers:
D1 Can’t find Scientific search engines? Is anyone working on this?
D2 Can’t compare DOMEO/SWAN Manual, doesn’t scale
D3 Can’t combine Evidence-based guidelines Inconsistencies!
113. Have we solved the Big Problem?
1) Too many papers?
• Do not make publication numbers factor in evaluation
• Do not make conference attendance contingent on publication
• Write fewer papers! Limit yourself to write only what is
significant and profound (and entertaining!)
2) Too little time to read?
• Collectively: change expectation of work in a day
• Make grant process less of a waste of time and talent
• Reduce burden of administration on (senior) scientists: reinstate
departmental administrators!
• Teach administration as a class: Lethbridge journal incubator
• Make time to read some new (or old!) interesting work!
114. So how do we tackle all this?
• DERI-Elsevier collaboration - define research projects?
• Perhaps under aegis of Force11?
• Dagstuhl Workshop in August of 2011: 35 invited
attendees from different parts of science, industry,
funding agencies, data centers
• Goal: map main obstacles preventing new models
of science publishing and develop ways to
overcome them
• Just received funding from Sloan foundation to:
–Start online community
–Hold next workshop
–Collaboratively work on next steps
• Any thoughts?
115. Acknowledgements/collaborations:
1. Executable papers: Juliana Freire, NYU & Matthias Troyer, ETH Zurich
(Vistrails); Micah Altman, Harvard IQSS (R); Gloriana St. Clair &
Mahadev Satyanarayanan, CMU (Olive) (pending IMLS grant)
2. Citance summaries: Lucy Vanderwende, Microsoft Research; Hoa
Trang, NIST; Eduard Hovy, ISI/USC
3. NIF antibodies: Maryann Martone, NIF/UCSD
4. Data-centric publishing: Phil Bourne, UCSD; Yolanda Gil, ISI/USC
(funded in part by Elsevier Labs)
5. DIKB: Rich Boyce, U Pittsburgh; Jodi Schneider, DERI; Maria Liakata,
EBI (looking for funding opportunities!)
6. CKUs: Agnes Sandor, Xerox Research Europe
7. BEL/knowledge attribution: Dexter Pratt, Selventa; Henk Pander Maat,
University Utrecht (funded in part by NWO)
8. DOMEO/SWAN: Paolo Ciccarese & Tim Clark, Harvard/MGH (funded in
part by Elsevier Labs)
9. Evidence-based guidelines: Paul Groth, Rinke Hoekstra, Frank van
Harmelen, VU; Richard Vdovjak, Philips Research (funded by STW)
10. Force11: Phil Bourne, UCSD; Eduard Hovy, ISI/USC; Tim Clark,
Harvard/MGH; Cameron Neylon, PLoS; Ivan Herman, W3C (funded in
part by Sloan Foundation)
116. Anything here we can work on?
Type | Problems | Experiments | Issues
A. Paper format:
A1 | Two-dimensional | Utopia, Wolfram CDF | Standards, tools
A2 | Linear | ABCDE | Adoption?
A3 | Not interactive | Executable papers | Adoption
B. Writing habits:
B1 | Reference to papers | TAC: Citance summaries | Need to start at author
B2 | Inexact entity references | NIF antibodies | Need mandate!
B3 | Methods post-mortem | Data-centric publishing | Change research recording!
C. Language:
C1 | Coherent | DIKB | Hard to parse!
C2 | Narrative | CKUs | Fractal nature of paper
C3 | Abstract | BEL | Formalize knowledge level
D. Collections of papers:
D1 | Can’t find | Scientific search engines? | Is anyone working on this?
D2 | Can’t compare | DOMEO/SWAN | Manual, doesn’t scale
D3 | Can’t combine | Evidence-based guidelines | Inconsistencies!
 | Writing less and reading more | Force11, perhaps? | Social/political/personal!
123. What about writing completely differently?
Internet of things: (Bleecker, [1])
Interact with ‘objects that blog’ or ‘Blogjects’, that:
track where they are and where they’ve been;
have histories of their encounters and experiences
have agency - an assertive voice on the social web [1]
Research Objects: (Bechhofer et al., [2])
Create semantically rich aggregations of resources,
that can possess some scientific intent or support
some research objective
Networked Knowledge: (Neylon, [3])
If we care about taking advantage of the web and
internet for research then we must tackle the building
of scholarly communication networks.
These networks will have two critical characteristics:
scale and a lack of friction. [3]
[1] Bleecker, J. ‘A Manifesto for Networked Objects — Cohabiting with Pigeons, Arphids and Aibos in the Internet of Things’, http://nearfuturelaboratory.com/2006/02/26/a-manifesto-for-networked-objects/
[2] Bechhofer, S., De Roure, D., Gamble, M., Goble, C. and Buchan, I. (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge. In: The Future of the Web for Collaborative Science (FWCS 2010), April 2010, Raleigh, NC, USA. http://precedings.nature.com/documents/4626/version/1
[3] Neylon, C. ‘Network Enabled Research: Maximise scale and connectivity, minimise friction’, http://cameronneylon.net/blog/network-enabled-research/
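As a rough illustration of the Research Objects idea on this slide (an aggregation of resources tied to a research objective), here is a minimal Python sketch. It is not the model from Bechhofer et al.: the class names, the fields (`uri`, `kind`, `role`, `history`) and the example.org URIs are all hypothetical, chosen only to make the idea concrete. The `history` log borrows the Blogject notion of an object that records its own encounters.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Resource:
    """One aggregated item (hypothetical fields, for illustration only)."""
    uri: str   # where the item lives
    kind: str  # e.g. "paper", "dataset", "workflow"
    role: str  # how the item supports the objective


@dataclass
class ResearchObject:
    """An aggregation of resources tied to a stated research objective."""
    objective: str
    resources: List[Resource] = field(default_factory=list)
    history: List[str] = field(default_factory=list)  # Blogject-style event log

    def add(self, resource: Resource) -> None:
        """Aggregate a resource and record the event in the history."""
        self.resources.append(resource)
        self.history.append(f"added {resource.kind}: {resource.uri}")

    def by_kind(self, kind: str) -> List[Resource]:
        """Return all aggregated resources of the given kind."""
        return [r for r in self.resources if r.kind == kind]


# Placeholder objective and URIs; nothing here refers to a real study.
ro = ResearchObject(objective="Placeholder research objective")
ro.add(Resource("http://example.org/data/1", "dataset", "evidence"))
ro.add(Resource("http://example.org/paper/1", "paper", "narrative"))
print(len(ro.by_kind("dataset")), ro.history)
```

Keeping the objective and the event log on the aggregate itself, rather than on each resource, is one possible design choice; a real implementation would more likely use shared vocabularies and persistent identifiers.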
124. Networked science in action:
• Galaxy Zoo: citizen science: classify galaxies in the comfort
of your own home – like Hanny!
• Tim Gowers, Polymath: “This is to normal research as driving is to
pushing a car”
• Mathoverflow: virtual network of mathematicians working
collectively to answer big/small, clear/fuzzy questions
• Jean-Claude Bradley: ‘short-form chemistry’: tweet/blog
about an experiment, Storify into a narrative
• Read Cameron Neylon’s blog on networked science!
125. Anything here we can work on?
Type | Problems | Experiments | Issues
A. Paper format:
A1 | Two-dimensional | Utopia, Wolfram CDF | Standards, tools
A2 | Linear | ABCDE | Adoption?
A3 | Not interactive | Executable papers | Adoption
B. Writing habits:
B1 | Reference to papers | TAC: Citance summaries | Need to start at author
B2 | Inexact entity references | NIF antibodies | Need mandate!
B3 | Methods post-mortem | Data-centric publishing | Change research recording!
C. Language:
C1 | Coherent | DIKB | Hard to parse!
C2 | Narrative | CKUs | Fractal nature of paper
C3 | Abstract | BEL | Formalize knowledge level
D. Collections of papers:
D1 | Can’t find | Scientific search engines? | Is anyone working on this?
D2 | Can’t compare | DOMEO/SWAN | Manual, doesn’t scale
D3 | Can’t combine | Evidence-based guidelines | Inconsistencies!
 | Networked science | Mathoverflow, Bradley | But is it science?
 | Writing less and reading more | Force11, perhaps? | Social/political/personal!