NPG Scientific Data Overview for GBIF - TDWG meeting Oct 2013
1. Honorary Academic Editor
Susanna-Assunta Sansone, PhD (University of Oxford, UK)
Managing Editor
Andrew L Hufton, PhD
Advisory Panel and Editorial Board including senior researchers, funders, librarians and curators
Visit nature.com/scientificdata
Email scientificdata@nature.com
Tweet @ScientificData
2. Now open for submissions!
Launching May 2014
Susanna-Assunta Sansone, Honorary Academic Editor
Andrew L Hufton, Managing Editor
Ruth Wilson, Publisher
Advisory Panel
● Michael Huerta, National Institutes of Health, USA
● Mark Thorley, Natural Environment Research Council, UK
● Patricia Cruse, University of California, USA
● Susan Gregurick, Office of Biological and Environmental Research, Department of Energy, USA
● Ioannis Xenarios, Swiss Institute of Bioinformatics, Switzerland
● Chris Bowler, IBENS, France
● Mark Forster, Syngenta, UK
● Anthony Rowe, Johnson & Johnson, USA
● Stephen Chanock, National Cancer Institute, USA
● Weida Tong, National Center for Toxicological Research, FDA, USA
● Albert J. R. Heck, Utrecht University, The Netherlands
● Johanna McEntyre, EMBL-EBI, European Bioinformatics Institute, UK
● Simon Hodson, CODATA, France
● Joseph R. Ecker, Howard Hughes Medical Institute & Salk Institute, USA
● Stephen Friend, Sage Bionetworks, USA
● Jessica Tenenbaum, Duke Translational Medicine Institute, USA
● Anne-Claude Gavin, EMBL, Germany
● David Carr, Wellcome Trust, UK
● Wolfram Horstmann, University of Oxford, UK
● Piero Carninci, RIKEN Omics Science Center, Japan
● Pascale Gaudet, Swiss Institute of Bioinformatics, Switzerland
● Judith A. Blake, The Jackson Laboratory, USA
● Richard H. Scheuermann, J. Craig Venter Institute, USA
● Caroline Shamu, Harvard Medical School, USA
Supported by
3. Now open for submissions!
Launching May 2014
Introducing a new content type: Data Descriptor
4. Data Descriptor vs. Traditional Article
● The data descriptor is only concerned with the facts behind the
methodology of data generation/collection and processing
● A data descriptor can be:
– submitted prior to journal article
– submitted at the same time as the journal article
– submitted after journal article
Diagram: a journal article (Facts, Analysis, Synthesis, Interpretation, Conclusions, plus a summary of the DD) shown alongside the Data Descriptor, which covers the facts:
● What is the sample?
● What did I do to generate the data?
● How was the data processed?
● Where is the data?
● Who did what when?
5. Prior Publication Policy
“Nature-titled journals will not consider prior Data Descriptor publications to
compromise the novelty of new manuscript submissions as long as those
manuscripts go substantially beyond a descriptive analysis of the data, and
report important new scientific findings appropriate for the journal. This policy
does not necessarily extend to subsequent journal articles whose primary
purpose is to describe a new dataset or resource.”
See the full text in our Editorial Policies online
6. Barriers to data sharing and reuse
● Datasets are not released
● Datasets are not reusable or discoverable
● Lack of credit for sharing data and making it
reusable
8. Data Descriptor has 2 components
Article or narrative component (PDF and HTML)
Supported by
Experimental metadata or structured component (in-house curated, machine-readable formats)
9. Data Descriptor - article
Sections:
• Title
• Abstract
• Background & Summary
• Methods
• Technical Validation
• Data Records
• Usage Notes
• Figures & Tables
• References
In traditional publications this information is not provided in sufficient detail.
However, it is essential for understanding, reusing, and reproducing datasets.
10. Data Descriptor – experimental metadata
Submit ISA-Tab* files directly
OR
Submission tools and simple templates help authors provide the information without special tools
In-house curator standardizes the structured content
*Sansone et al., Nature Genetics, 2012
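To make the idea of a tab-delimited, machine-readable submission more concrete, here is a minimal Python sketch. It is not the journal's submission tooling. It assumes a heavily simplified study table in the spirit of ISA-Tab, and the column names, sample values, and helper functions (`read_study_table`, `missing_values`) are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical, heavily simplified study table in the tab-delimited spirit of
# ISA-Tab. The column names and values are illustrative assumptions only.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[tissue]\tSample Name\n"
    "source1\tHomo sapiens\tliver\tsample1\n"
    "source2\tMus musculus\tliver\tsample2\n"
)


def read_study_table(text):
    """Parse tab-delimited text into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


def missing_values(rows):
    """Return (row_index, column) pairs where a cell is empty."""
    return [
        (i, column)
        for i, row in enumerate(rows)
        for column, value in row.items()
        if not (value or "").strip()
    ]


rows = read_study_table(STUDY_TABLE)
print(rows[0]["Characteristics[organism]"])
print(missing_values(rows))
```

A check like `missing_values` is the kind of gap a curator or a template might flag before the structured content is standardized. The real format and curation workflow are richer than this sketch.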
11. Discover similar datasets
Structured content allows users to link, with one click, to other datasets
studying the same tissue, disease, organism, or using the same experimental
platform
Diagram: Scientific Data Data Descriptors (SciData DD), each with its structured content, linked to one another by same tissue, same organism, and same assay.
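The linking described on this slide can be sketched as an index over structured metadata. The Python below is only an illustration under stated assumptions: the dataset identifiers, field names, and values are invented, and `build_index` and `similar_datasets` are hypothetical helpers, not part of any Scientific Data system.

```python
from collections import defaultdict

# Hypothetical structured metadata for four Data Descriptors. Identifiers,
# field names, and values are invented for illustration.
DATASETS = {
    "DD-1": {"tissue": "liver", "organism": "Homo sapiens", "assay": "RNA-seq"},
    "DD-2": {"tissue": "liver", "organism": "Mus musculus", "assay": "microarray"},
    "DD-3": {"tissue": "brain", "organism": "Homo sapiens", "assay": "RNA-seq"},
    "DD-4": {"tissue": "heart", "organism": "Danio rerio", "assay": "proteomics"},
}


def build_index(datasets):
    """Map each (field, value) pair to the set of dataset ids carrying it."""
    index = defaultdict(set)
    for dataset_id, metadata in datasets.items():
        for field, value in metadata.items():
            index[(field, value)].add(dataset_id)
    return index


def similar_datasets(dataset_id, datasets, index):
    """Return {other_id: sorted list of shared fields} for one dataset."""
    shared = defaultdict(list)
    for field, value in datasets[dataset_id].items():
        for other_id in index[(field, value)]:
            if other_id != dataset_id:
                shared[other_id].append(field)
    return {other_id: sorted(fields) for other_id, fields in shared.items()}


index = build_index(DATASETS)
print(similar_datasets("DD-1", DATASETS, index))
```

Because the metadata values are standardized, an exact match on a (field, value) pair is enough to link datasets. With free-text descriptions the same lookup would need fuzzy matching.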
12. Get Credit for Sharing Your Data
Publications will be listed in the major indexes and will be citable
Open-access
Authors select from three Creative Commons licences for the main Data Descriptor. Each publication is supported by curated CC0 metadata
Focused on Data Reuse
All the information others need to reuse the data; no interpretative analysis or hypothesis testing
Peer-reviewed
Rigorous peer review, managed by our Editorial Board of academic researchers, ensures data quality and standards
Promoting Community Data Repositories
Data stored in community data repositories
13. Complementary to both journal articles and data repositories
Export to various formats (ISA-Tab, RDF, etc.)
14. Scientific Data and GBIF: Roadmap
PHASE 1
● Partnership between GBIF and NPG Scientific Data
● Mapping the DD article and GBIF Metadata Profile
● Enhancement to GBIF IPT to export the DD article
● Call for manuscript submissions
● 1st set of Data Descriptors published
PHASE 2
● Mapping the DD experimental metadata and GBIF Metadata Profile
● Further enhancements to GBIF IPT
Timeline labels: Q4 2013, Q4 2013, Q42 2014, Q43 2014, Q4 2014
Vishwas Chavan
The two components of the Data Descriptor (DD):
• DD article or narrative component
• DD experimental metadata or structured component (ISA-Tab format, progressively others, e.g. RDF)