Presented at the February 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
http://www.niso.org/news/events/2015/virtual_conferences/sci_data_management/
Force11: Enabling transparency and efficiency in the research landscape
1. Melissa Haendel, PhD
Oregon Health & Science University
Future of Research Communications and E-Scholarship
Enabling transparency and efficiency in the research landscape
@force11rescomm @ontowonka
3. The Research Life Cycle
TECHNIQUE
COLLABORATION
PUBLICATION
DATASET
GRANT
4. Impetus for change: Is our current method serving science?
47 of 53 major published preclinical cancer studies could not be replicated
“The scientific community assumes that the claims in a preclinical study can be taken at face value - that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
Begley and Ellis, Nature 483, 531 (29 March 2012)
5. Not all content is available for synthesis and discovery
Search PubMed: Spinal Muscular Atrophy
6. The scientific corpus is fragmented
~25 million articles total, each covering a fragment of the biomedical space
Each publisher owns a fragment of a particular field
The current process is inefficient and slow
Example, Spinal Muscular Atrophy, split across publishers: Wiley, Elsevier, Macmillan, Oxford
7. Committee on Academic Promotions
What Counts: Money, Grants, Papers, Teaching, Service
What Does Not: Sharing data, Sharing software, Open access, Collaboration, Patents, Startups
Getting Ahead as a Computational Biologist in Academia, PLOS Comp Biol, doi:10.1371/journal.pcbi.1002001
8. Beyond the PDF
Conference/unconference where all stakeholders come together as equals to discuss issues
– Publishers
– Technologists
– Scholars
– Library scientists
– Humanists
– Policy makers
– Funders
Incubator for change
What would you do to change scholarly communication?
San Diego, Jan 2011; Amsterdam, March 2013; Oxford, 2015
http://www.force11.org/beyondthepdf2
9. FORCE11
Future of Research Communications and E-Scholarship:
A grassroots effort to accelerate the pace and nature of scholarly communications and e-scholarship through technology, education and community
Why 11? We were born in 2011 in Dagstuhl, Germany
Principles laid out in the FORCE11 Manifesto
FORCE11 launched in July 2012
www.force11.org
10. Promote community, cross-fertilization and interoperability
FORCE11 helps facilitate communications across disciplines and communities
Issues are not identical, but we can learn from each other
Community platform
– Meetings
– Discussions
– Tools and resources
– Blogs
– Event calendar
– Community projects
Working groups
– Data Citation
– Resource Identification Initiative
– Attribution
– Data standards/BioSharing
11. Data Citation Working Group
FORCE11 provides a neutral space for bringing groups together
35 individuals representing more than 20 organizations concerned with data citation
Conducted a review of current data citation recommendations from 4 different organizations
Arrived at consensus principles
http://www.force11.org/datacitation
12. Data Citation Principles
Consensus data citation principles ready for comment
Designed to be high level and easy to understand
1. Importance
2. Credit and Attribution
3. Evidence
4. Unique Identifiers
5. Access
6. Persistence
7. Versioning
8. Interoperability and Flexibility
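The principles are deliberately high level. As a rough illustration of how several of them (credit and attribution, unique identifiers, access, versioning) might surface together in one citation string, here is a minimal Python sketch. The `DataCitation` class, its field names, the element order, and the example values (including the `10.0000` DOI prefix) are hypothetical and are not part of the FORCE11 principles.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Hypothetical record of the elements a data citation might carry."""
    creators: List[str]            # credit and attribution
    title: str
    publisher: str                 # where the data can be accessed
    year: int
    identifier: str                # persistent, unique identifier (e.g. a DOI URL)
    version: Optional[str] = None  # exactly which release of the data was used

    def render(self) -> str:
        """Join the elements into one citation string, separated by '. '."""
        parts = [", ".join(self.creators), f"({self.year})", self.title]
        if self.version:
            parts.append(f"Version {self.version}")
        parts.append(self.publisher)
        parts.append(self.identifier)  # end with a resolvable identifier
        return ". ".join(parts)


if __name__ == "__main__":
    citation = DataCitation(
        creators=["Doe J", "Roe R"],          # hypothetical creators
        title="Example mouse RNA-seq dataset",
        publisher="Example Repository",
        year=2014,
        identifier="https://doi.org/10.0000/example",  # hypothetical DOI
        version="2",
    )
    print(citation.render())
    # Doe J, Roe R. (2014). Example mouse RNA-seq dataset. Version 2. Example Repository. https://doi.org/10.0000/example
```

Leaving `version` unset simply omits the version element, which is exactly the ambiguity the Versioning principle is meant to prevent.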
15. Challenge: Working with Web Data
Datasets often have inadequate descriptions, so we don’t know what they are about or how they were constructed
Datasets change over time but often don’t come with versioning information
Datasets may have been constructed using other data, but it’s not clear which version of that data was used or whether it was modified
Data may be available in a variety of formats
There may be multiple copies of data from different providers, but it’s unclear whether they are exact copies or derivatives
The version of the standard or vocabulary used is not indicated
Data registries are not synchronized and can contain conflicting information
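One of these problems, telling whether copies from different providers are exact copies, can at least partly be checked mechanically by comparing cryptographic digests. A minimal Python sketch, assuming the copies have already been downloaded as local files (the file names are hypothetical): matching SHA-256 digests mean the bytes are identical, while differing digests say only that the files differ, not whether one is a derivative, a newer version, or merely a re-encoding of the other.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(a: Path, b: Path) -> bool:
    """True only if the two copies are byte-for-byte identical."""
    return sha256_of(a) == sha256_of(b)


if __name__ == "__main__":
    # Throwaway files standing in for two providers' copies of a dataset.
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp, "provider_a.csv")
        second = Path(tmp, "provider_b.csv")
        first.write_bytes(b"id,value\n1,42\n")
        second.write_bytes(b"id,value\n1,42\n")
        print(same_content(first, second))  # True
```

Publishing the digest alongside each release would also give downstream users a cheap way to record which exact version they used.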
16. W3C HCLS Dataset Description
Develop a guidance note for reusing existing vocabularies to describe datasets with RDF
– Mandatory, recommended, optional descriptors
– Identifiers
– Versioning
– Attribution
– Provenance
– Content summarization
Recommend vocabulary-linked attributes and value sets
Provide a reference editor and validator
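To make the idea of mandatory versus recommended descriptors concrete, the following Python sketch checks a flat dataset description and serializes it as Turtle. The predicates are real terms from Dublin Core Terms (`dct:`), PAV (`pav:`) and PROV (`prov:`), but which ones count as mandatory or recommended here is an assumption for illustration, not the requirement tables of the HCLS note; a production validator would use an RDF library rather than string formatting.

```python
import json
from typing import Dict, List, Tuple

# Illustrative requirement levels only; the HCLS note defines its own tables.
MANDATORY = ["dct:title", "dct:description", "dct:publisher"]
RECOMMENDED = ["dct:creator", "pav:version", "prov:wasDerivedFrom"]

PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "pav": "http://purl.org/pav/",
    "prov": "http://www.w3.org/ns/prov#",
}


def validate(desc: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (missing mandatory, missing recommended) descriptors."""
    missing_mandatory = [p for p in MANDATORY if p not in desc]
    missing_recommended = [p for p in RECOMMENDED if p not in desc]
    return missing_mandatory, missing_recommended


def _term(value: str) -> str:
    """Write http(s) values as IRIs and everything else as string literals."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    # For ordinary text, json.dumps gives a quoted, escaped string Turtle accepts.
    return json.dumps(value, ensure_ascii=False)


def to_turtle(subject: str, desc: Dict[str, str]) -> str:
    """Serialize a description as Turtle, refusing if mandatory fields are missing."""
    missing, _ = validate(desc)
    if missing:
        raise ValueError(f"missing mandatory descriptors: {missing}")
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    body = " ;\n    ".join(f"{pred} {_term(value)}" for pred, value in desc.items())
    lines += ["", f"<{subject}>\n    {body} ."]
    return "\n".join(lines)


if __name__ == "__main__":
    description = {  # hypothetical description of dataset3
        "dct:title": "dataset3",
        "dct:description": "RNA-seq data from mouse1 and mouse2",
        "dct:publisher": "https://example.org/lab",
        "pav:version": "1",
    }
    print(validate(description))  # ([], ['dct:creator', 'prov:wasDerivedFrom'])
    print(to_turtle("https://example.org/dataset3", description))
```

Separating `validate` from serialization mirrors the mandatory/recommended/optional split: missing mandatory fields block output, while missing recommended ones are only reported.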
19. Journal guidelines for methods are often poor, and space is limited
“All companies from which materials were obtained should be listed.” - A well-known journal
Reproducibility depends, at a minimum, on using the same resources. But…
24. Sample citation:
Polyclonal rabbit anti-MAPK3 antibody, Abgent, Cat# AP7251E, RRID:AB_2140114
Publishing Workflow
1. Researcher submits a manuscript for publication
2. Editor or publisher asks for inclusion of RRID
3. Author goes to the Resource Identification Portal to locate RRID
4. RRID is included in the Methods section and as a Keyword
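Because the RRID travels in the Methods text and keywords with a fixed `RRID:` prefix, it can be harvested by simple text mining. A minimal Python sketch follows; the regular expression is an assumption based on the `RRID:AB_2140114` form in the sample citation (registry prefix, underscore, accession) and is not an official RRID grammar, so real-world identifiers may need a broader pattern.

```python
import re
from typing import List

# Assumed shape: "RRID:" + registry prefix + "_" + accession (e.g. RRID:AB_2140114).
# The match must end on a letter or digit, so trailing ")", ",", "." or "-" is dropped.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9_:\-]*[A-Za-z0-9])")


def extract_rrids(text: str) -> List[str]:
    """Return RRIDs found in text, in order of first appearance, without duplicates."""
    return list(dict.fromkeys(m.group(1) for m in RRID_PATTERN.finditer(text)))


if __name__ == "__main__":
    methods = ("Polyclonal rabbit anti-MAPK3 antibody, Abgent, Cat# AP7251E, "
               "RRID:AB_2140114")
    print(extract_rrids(methods))  # ['AB_2140114']
```

`dict.fromkeys` is used for de-duplication because it preserves insertion order, so the result follows the order in which resources are cited.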
25. What is the relationship of a person to a publication?
26. Example Scenario
Melissa creates mouse1
David creates mouse2
Layne performs RNA-seq analysis on mouse1 and mouse2 to generate dataset3, which he subsequently curates and analyzes
Layne writes publication pmid:12345 about the results of his analysis
Layne explicitly credits Melissa as an author, but not David.
27. Credit is connected
=> Credit to Melissa is asserted, but credit to David can be inferred
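The inference above can be sketched as a graph walk: start from the publication, follow "derived from" links back to the data and materials it depends on, and collect whoever created each node. In this minimal Python sketch the dictionaries are a hypothetical encoding of the example scenario, not the PROV or VIVO-ISF data models; anyone reachable in the graph but absent from the asserted author list is reported as inferred credit.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical encoding of the example scenario.
CREATED_BY: Dict[str, Set[str]] = {
    "mouse1": {"Melissa"},
    "mouse2": {"David"},
    "dataset3": {"Layne"},     # generated, curated and analyzed by Layne
    "pmid:12345": {"Layne"},   # written by Layne
}
DERIVED_FROM: Dict[str, List[str]] = {
    "dataset3": ["mouse1", "mouse2"],  # RNA-seq on both mice
    "pmid:12345": ["dataset3"],        # publication about the analysis
}
ASSERTED_AUTHORS: Dict[str, Set[str]] = {
    "pmid:12345": {"Layne", "Melissa"},
}


def upstream_contributors(product: str) -> Set[str]:
    """Creators of the product and of everything it was (transitively) derived from."""
    contributors: Set[str] = set()
    seen = {product}
    queue = deque([product])
    while queue:
        current = queue.popleft()
        contributors |= CREATED_BY.get(current, set())
        for source in DERIVED_FROM.get(current, []):
            if source not in seen:  # visit shared inputs once; also guards cycles
                seen.add(source)
                queue.append(source)
    return contributors


def inferred_only(product: str) -> Set[str]:
    """Contributors reachable in the graph but not credited explicitly."""
    return upstream_contributors(product) - ASSERTED_AUTHORS.get(product, set())


if __name__ == "__main__":
    print(sorted(upstream_contributors("pmid:12345")))  # ['David', 'Layne', 'Melissa']
    print(sorted(inferred_only("pmid:12345")))          # ['David']
```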
28. Attribution Working Group
https://www.force11.org/group/attributionwg
Project CRediT
VIVO-ISF ontology
PROV
the Becker model
Transitive credit
The Scholarly Contributions and Roles ontology
The goal is to catalyze rapid convergence on requirements, approaches, and practical implementation of a system for tracking contributions to any scholarly product.
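Transitive credit, one of the approaches listed, adds weights to this picture: each product splits one unit of credit among its direct contributors and the products it builds on, and a person's share is the product of the weights along each path. A minimal Python sketch reusing the names from the example scenario; the weights are invented for illustration, and the recursion assumes the dependency graph has no cycles.

```python
import math
from typing import Dict

# Hypothetical credit maps; each product's weights must sum to 1.
CREDIT_MAP: Dict[str, Dict[str, float]] = {
    "pmid:12345": {"Layne": 0.6, "dataset3": 0.4},
    "dataset3": {"Layne": 0.5, "mouse1": 0.25, "mouse2": 0.25},
    "mouse1": {"Melissa": 1.0},
    "mouse2": {"David": 1.0},
}


def transitive_credit(product: str, share: float = 1.0) -> Dict[str, float]:
    """Distribute `share` of a product's credit to people, following dependencies."""
    weights = CREDIT_MAP[product]
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError(f"weights for {product} do not sum to 1")
    totals: Dict[str, float] = {}
    for target, weight in weights.items():
        if target in CREDIT_MAP:
            # Another product: pass its portion down to its own credit map.
            contributions = transitive_credit(target, share * weight)
        else:
            # A person: receives the portion directly.
            contributions = {target: share * weight}
        for person, amount in contributions.items():
            totals[person] = totals.get(person, 0.0) + amount
    return totals


if __name__ == "__main__":
    for person, amount in sorted(transitive_credit("pmid:12345").items()):
        print(f"{person}: {amount:.2f}")
    # David: 0.10
    # Layne: 0.80
    # Melissa: 0.10
```

Unlike the yes/no inference from the graph walk, this gives David a nonzero but bounded share, which is the kind of graded attribution the working group is trying to converge on.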
29. The 1K Challenge
What would you do with £1k today to make research communication better, anticipating the increasing scale of people and machines?
30. Starting at Ground Zero
CONSULTATIONS
Researcher + 2-3 from Data Stewardship Team
31. Researchers DO need assistance:
Finding and choosing data standards
File versioning
Applying metadata to facilitate data sharing
“Gummi Bear” themed data management exercise resonated well with students
Lack of awareness of services and expertise offered by the Library
OHSU Library is developing data services for researchers
http://laughingsquid.com/the-anatomy-of-a-gummy-bear-by-jason-freeny/
Conclusions and new directions
DOI:10.6083/M4QC0273
33. FORCE11 Vision
• Modern technologies enable vastly improved knowledge transfer and far wider impact; freed from the restrictions of paper, numerous advantages appear
• We see a future in which scientific information and scholarly communication more generally become part of a global, universal and explicit network of knowledge
• To enable this vision, we need to create and use new forms of scholarly publication that work with reusable scholarly artifacts
• To obtain the benefits that networked knowledge promises, we have to put in place reward systems that encourage scholars and researchers to participate and contribute
• To ensure that this exciting future can develop and be sustained, we have to support the rich, variegated, integrated and disparate knowledge offerings that new technologies enable
What is the 21st century equivalent of the library?
Science used to be pretty linear, and slow.
Clone by phone.
Now science is a web of interconnected resources and activities, only a portion of which is the scientific literature.
Should science be reproducible? Can it be? How would we make it so? How will we evaluate reproducibility? What does the scholarly article need to be, or connect to, to make it a venue for reproducibility?
First 6 results in PubMed for SMA: I can’t access them; they come from 3 different publishers, and only one is freely available.
This WG came out of the first one. Examples here are recommendations having to do with allowing metadata identifier systems.
Paper is in preprint and will be out soon.
NIH funded a BD2K initiative to develop recommendations for a data discovery index.
RRID Working Group: numerous publishers and journals have implemented RRIDs.
We are working on determining how to deal with this in the longer term: is this a new data citation that goes alongside the paper? It needs to be in the keywords so that it is mineable.
Not all contributions to a work end up as authorship.
A graph representing this scenario. Note also that we intentionally attributed Melissa on the publication, but not David. David’s attribution could be inferred from the graph.
There are many contributors to the work presented.
Some of the slides in this deck are directly adapted or borrowed from the above people, thank you very much.
Maryann is currently the president of Force11.
Phil was instrumental in helping start Force11.
Michel is co-leading the HCLS data set description
Nicole did the research resource identification project
Stephanie keeps the Force in Force11