Workflows for Publishing Data; Scientific Data's experience as an early adopter
1. Workflows for Publishing Data
Varsha Khodiyar, PhD
Data Curation Editor, Scientific Data
Nature Publishing Group
varsha.khodiyar@nature.com
@varsha_khodiyar
@scientificdata
Scientific Data's experience as an early adopter
RDA P7, 1st to 3rd March 2016
2. Mandatory and recommended key components
of data publishing – WG results
Austin et al. in review. Report preprint doi:10.5281/zenodo.34542
Legend: Implemented by Scientific Data; Under wider consideration by Springer Nature
3. Implementation of required elements
• Data PID required to complete
manuscript submission
• Data Citation policy enforced by
editorial process
• Use of structured repositories which
capture subject-specific metadata
• Curation of discovery level metadata
(regardless of repository) by dedicated
Data Curation Editor
• Machine readable metadata aids
discovery
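The first requirement above (a data PID before submission can complete) can be sketched as a simple pre-submission check. This is only an illustration, not Scientific Data's actual submission system: the `datasets` structure, the `pid` and `title` keys, and the function names are hypothetical, and the check assumes the PID is a DOI and tests its syntax only, not whether it resolves. Repository accession numbers, which are also PIDs, would need their own rules.

```python
import re

# Syntactic DOI shape: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes authors commonly paste in front of a DOI.
PREFIXES = ("doi:", "https://doi.org/", "http://doi.org/",
            "https://dx.doi.org/", "http://dx.doi.org/")


def normalise_doi(value):
    """Strip surrounding whitespace and a leading 'doi:' or resolver prefix."""
    value = value.strip()
    for prefix in PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def missing_data_pids(datasets):
    """Return the titles of datasets that lack a well-formed DOI."""
    return [
        d["title"]
        for d in datasets
        if not DOI_PATTERN.match(normalise_doi(d.get("pid") or ""))
    ]


def can_submit(datasets):
    """Submission may complete only if every listed dataset has a PID."""
    return bool(datasets) and not missing_data_pids(datasets)


if __name__ == "__main__":
    # The DOI below is the report preprint cited on slide 2, used here only
    # as a well-formed example value; the dataset titles are placeholders.
    example = [
        {"title": "Dataset A", "pid": "doi:10.5281/zenodo.34542"},
        {"title": "Dataset B", "pid": ""},
    ]
    print(missing_data_pids(example))
    print(can_submit(example))
```

A real workflow would probably also resolve each PID against its registry, which this sketch deliberately leaves out.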
4. Additional elements - Context
• Data Descriptor designed to encourage
full documentation of data generation
• Articles analysing described data are
captured in machine readable metadata
(ISA format)
• Linked as associated publication to Data
Descriptor online
• Analysis articles published in Nature
Publishing Group journals link back to
Data Descriptor
• Software availability statement required
for previously unpublished software and
code
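The linking described on this slide can be pictured as a small machine-readable record that ties a Data Descriptor to its datasets and to the articles that analyse them. The sketch below is a simplified stand-in, not the ISA schema: the class name, field names, and the placeholder DOIs are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DescriptorRecord:
    """Hypothetical, simplified record for one Data Descriptor."""
    descriptor_doi: str
    title: str
    dataset_pids: list = field(default_factory=list)
    associated_publications: list = field(default_factory=list)
    software_statement: str = ""

    def link_publication(self, doi):
        """Record an article that analyses the described data (no duplicates)."""
        if doi not in self.associated_publications:
            self.associated_publications.append(doi)

    def to_json(self):
        """Serialise the record so other tools can read it."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    # All identifiers here are placeholders, not real DOIs.
    record = DescriptorRecord(
        descriptor_doi="10.1234/example-descriptor",
        title="Example Data Descriptor",
        dataset_pids=["10.1234/example-dataset"],
        software_statement="Code available from the authors' repository.",
    )
    record.link_publication("10.1234/example-analysis-article")
    print(record.to_json())
```

Keeping the associated publications in the record itself is one way a journal page and an analysis article could each be generated with a link to the other.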
5. Additional elements - Quality
• Provision of manuscript (and metadata)
templates to help authors provide reuse-level
metadata
• Dependent on repositories for curation by
domain experts
• Editorial Board selected based on expertise in
data generation/reuse in their field
• Ensure that peer reviewers can access data
easily and confidentially
• Encourage peer reviewers to view and comment
on the actual data as part of their assessment
• Editorial office regularly asked for advice on
data deposition and repository selection
6. Additional elements - Visibility / Accessibility
• Data Descriptors aid visibility of data by
considering them as first class publications
• Data Descriptors discoverable via common
publication indices such as PubMed
• Discovery level machine readable metadata
(in ISA format) generated for every Data
Descriptor
• Currently trialling use of metadata for data
discovery (ISAexplorer)
• Open to suggestions for other uses of
Scientific Data’s machine readable metadata
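To illustrate why discovery-level machine-readable metadata helps data discovery, here is a minimal sketch of a field-based search over a handful of records. It does not describe how ISAexplorer works: the `search` function, the field names (`organism`, `technology`, `repository`), and the sample records are all hypothetical.

```python
def search(records, **criteria):
    """Return records whose named fields contain every requested term.

    Matching is case-insensitive; a field may hold one string or a list.
    """
    def matches(record):
        for key, term in criteria.items():
            value = record.get(key, "")
            values = value if isinstance(value, list) else [value]
            if not any(term.lower() in str(v).lower() for v in values):
                return False
        return True

    return [r for r in records if matches(r)]


if __name__ == "__main__":
    # Placeholder records, invented purely for illustration.
    records = [
        {"title": "Descriptor one", "organism": ["Homo sapiens"],
         "technology": "RNA sequencing", "repository": "Repository X"},
        {"title": "Descriptor two", "organism": ["Mus musculus"],
         "technology": "mass spectrometry", "repository": "Repository Y"},
    ]
    for hit in search(records, organism="mus", technology="mass"):
        print(hit["title"])
```

Because the metadata is structured, the same records could feed other uses too, such as grouping Data Descriptors by repository.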
Editor's Notes
The published product in Scientific Data’s case is the data paper, which we call the Data Descriptor.
Peer reviewers are not expected to check every data file or "curate" the data. We feel this task is best performed by expert repositories, backed by our in-house data curation support.
Rejections after review remain rare, but on at least a few occasions peer-reviewers have identified issues within the actual data files that ultimately led to rejection (e.g. evidence of data contamination or other serious quality issues).
We believe that making the data easily available to peer reviewers can actually save them time in these cases, because they do not need to "play detective": experts can often make an assessment more rapidly and more accurately when presented with the real data.
Data Descriptors are discoverable on nature.com, visited by millions, and via common publication indices (PubMed, MEDLINE, Google Scholar -- Scopus and Thomson Reuters to come soon). This also makes them amenable to tracking by traditional metrics, like citation.
Scientific Data also delivers progressively FAIR metadata (Findable, Accessible, Interoperable and Reusable).