Presented at Beyond the PDF2 in Amsterdam, 2013 (http://www.force11.org/beyondthepdf2). This talk describes preliminary data showing that scientific reproducibility is undermined simply by an inability to identify the material resources used in the research. Final work to be published soon!
On Reproducibility of Science: Half of Antibodies Not Identifiable
1. On the reproducibility of science
Melissa Haendel
Beyond the PDF2
20 March 2013
@ontowonka
haendel@ohsu.edu
2. The science cycle
Slide from Gully Burns
Do we know if the infrastructure is actually broken?
3. The science cycle
Image: http://www.joinchangenation.org/blog/post/roadblocks-on-the-pathway-to-citizenship
This is a broken data story.
4. Reproducibility is dependent, at a minimum, on using the same resources.
But…
"All companies from which materials were obtained should be listed."
- A well-known journal
Journal guidelines for methods are often poor and space is limited
5. Hypothesis: Antibodies in the published literature are not uniquely identifiable
Gather journal articles:
- 28 journals
- 5 domains: Immunology, Cell biology, Neuroscience, Developmental biology, General biology
- 3 impact factors: High, Medium, Low
- 119 papers
- 454 antibodies: 408 commercial antibodies, 46 non-commercial antibodies
Identifying questions:
- Is the antibody identifiable in the vendor site?
- Is the catalog number reported?
- Is the source organism reported?
- Is the antibody target identifiable?
An experiment in reproducibility
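The four identifying questions on this slide can be read as a per-antibody checklist. The following is a minimal sketch of how such a checklist might be tallied, not the analysis used in the study: the `AntibodyReport` schema, the `percent` helper, and the toy records are all hypothetical and were invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class AntibodyReport:
    """One antibody as described in a paper (hypothetical schema)."""
    commercial: bool                 # commercial vs non-commercial antibody
    on_vendor_site: bool             # Is the antibody identifiable in the vendor site?
    catalog_number: Optional[str]    # Is the catalog number reported?
    source_organism: Optional[str]   # Is the source organism reported?
    target_identifiable: bool        # Is the antibody target identifiable?


def percent(reports: Iterable[AntibodyReport],
            question: Callable[[AntibodyReport], bool]) -> float:
    """Percentage of reports for which the given question is answered yes."""
    reports = list(reports)
    if not reports:
        return 0.0
    yes = sum(1 for r in reports if question(r))
    return 100.0 * yes / len(reports)


# Toy records invented for illustration; they are not data from the study.
toy_reports = [
    AntibodyReport(True, True, "CAT-0001", "rabbit", True),
    AntibodyReport(True, False, None, "mouse", True),
    AntibodyReport(False, False, None, None, False),
    AntibodyReport(False, False, None, "goat", True),
]

# Tally one question at a time, optionally restricted to one antibody class.
commercial = [r for r in toy_reports if r.commercial]
catalog_pct = percent(commercial, lambda r: r.catalog_number is not None)
organism_pct = percent(toy_reports, lambda r: r.source_organism is not None)
print(catalog_pct, organism_pct)
```

Keeping each question as a separate predicate avoids committing to any single rule for what counts as "uniquely identifiable", which the slides do not spell out.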
6. Approximately half of antibodies are not uniquely identifiable in 119 publications
[Bar chart: percent identifiable (y-axis, 0% to 60%) for commercial antibodies (n=408) and non-commercial antibodies (n=46)]
The data shows…
7. Unique identification of commercial antibodies varies across discipline and impact factor
[Bar chart: percent identifiable (y-axis, 0% to 100%) by domain (Immunology, Neuroscience, Dev Bio, Cell Bio, General Bio), with High, Medium, and Low impact factor series; sample sizes shown on the chart: n=87, n=95, n=94, n=124, n=56]
In some domains high impact journals have worse reporting, and in others it is the opposite
12. [Bar chart: percent identifiable (y-axis, 0% to 90%) for five categories: Commercial Ab identifiable, Non-commercial Ab identifiable, Catalog number reported, Source organism reported, Target uniquely identifiable]
Of 14 antibodies published in 45 articles, only 38% were identifiable
15. What are we going to do about it?
- Promote better reporting guidelines in journals
- Include reviewing guidelines
- Provide tools to reference research resources with unique and persistent IDs/URIs
- Train librarians and other data stewards to apply data standards