This document discusses trends in publishing scientific data, including requirements to deposit data, citing data through identifiers like DOIs, considering data itself as a publication in data journals or databases, and including interactive data within publications. It also outlines new roles for working with scientific data, such as data scientists and curators who extract facts from literature to populate databases and ensure data quality.
Publishing of Scientific Data - Science Foundation Ireland Summit 2010
1. Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
Publishing of Scientific Data
Jodi Schneider
jodi.schneider@deri.org
Twitter @jschneider
SFI Summit
2010-11-16
Athlone, Ireland
2. Data deposit may be required
Community norms
Crystallography, astronomy, genomics, …
Peer-review and publication
Nature: “Supporting data must be made available to editors and
peer-reviewers at the time of submission…”
Funders
NSF proposals must include a 2-page Data Management Plan
3. Data citation
“Cite this paper if you use my dataset”
DOI, handle, Repository ID
Tracking reuse is hard! http://bit.ly/doi-fail
Universal Numerical Fingerprint (UNF)
Changes when the data does
Cryptographic hash of the data content
Micah Altman, Gary King (2007). “A Proposed Standard for the
Scholarly Citation of Quantitative Data”. D-Lib Magazine 13(3/4).
http://www.dlib.org/dlib/march07/altman/03altman.html
UNF:3:DaYlT6QSX9r0D50ye+tXpA==
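The idea behind a content fingerprint can be illustrated with a minimal Python sketch. This is not the real UNF algorithm, whose normalization rules are more detailed; the function name `content_fingerprint`, the `FP:demo:` prefix, the SHA-256 hash, and the rounding precision are all assumptions chosen for illustration. It only shows the two properties named on the slide: the fingerprint is a cryptographic hash of the data content, and it changes when the data does.

```python
import base64
import hashlib


def content_fingerprint(rows, digits=7):
    """Fingerprint tabular data (hypothetical helper, NOT the real UNF algorithm)."""
    canonical_lines = []
    for row in rows:
        cells = []
        for value in row:
            if isinstance(value, float):
                # Write floats in scientific notation at a fixed precision so that
                # tiny storage-format differences do not change the fingerprint.
                cells.append(format(value, f".{digits}e"))
            else:
                cells.append(str(value))
        # Simplification: cells are joined with tabs, so values containing
        # tabs or newlines are not handled safely.
        canonical_lines.append("\t".join(cells))
    canonical = "\n".join(canonical_lines).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    # Truncate to 128 bits and base64-encode to get a short printable string.
    return "FP:demo:" + base64.b64encode(digest[:16]).decode("ascii")


original = [("a", 1.0), ("b", 2.5)]
same_values = [("a", 1.0000000001), ("b", 2.5)]  # differs only below the precision
edited = [("a", 1.0), ("b", 2.6)]                # one value really changed

print(content_fingerprint(original) == content_fingerprint(same_values))  # True
print(content_fingerprint(original) == content_fingerprint(edited))       # False
```

Because the fingerprint depends only on the normalized values, it can be placed in a citation next to a DOI or handle so that a reader can check that the dataset they obtained is the one that was cited.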
4. Data itself as publication?
Data-only journals
Earth System Science Data
Databases as a research product
Ph.D. curators extracting information from papers
Machine recording of experiments
Open Notebook Science
Integration of data into publications
Phil Bourne (2005) Will a Biological Database
Be Different from a Biological Journal? PLoS Comput Biol
1(3): e34. doi:10.1371/journal.pcbi.0010034
5. Interactive Data inside the PDF
Teresa K. Attwood et al. (2009) Calling international rescue: knowledge lost in literature and
data landslide! Biochemical Journal. doi:10.1042/BJ20091474
6. New jobs and roles
Ph.D. scientists: Extract facts, populate databases, …
Computer scientists: Semantic tech, data mining, …
Embedded librarians: Metadata, provenance, …
Data scientists: Data capture, visualization, stats, …
Engineers: Self-documenting apparatus, sensors, …
7. Research Assoc./Sci Data Curator
Develop the biomedical ontology in OWL
Annotate biomed resource metadata w/ the ontology
Help with iterative design of annotation tools
Participate in working groups to define requirements
Determine database content
Implement the data model
Help with data load processes, data reconciliation, quality
assurance, and OWL ontology software integration.
8. Scientific Data Curator
Curate morphological data from the literature
Populate a database
Contribute new terms, definitions, and relationships to
the ontologies where needed
Work with the community to ensure consistency
Review the data submitted by experts
Work closely with software developers to develop the
database, curatorial interface, web interface
Editor's Notes
Nature http://www.nature.com/authors/editorial_policies/availability.html
NSF http://www.nsf.gov/bfa/dias/policy/dmp.jsp
This new NSF policy takes effect 18 January 2011
Earth System Science Data http://www.earth-system-science-data.net/
One reason given for using DOIs for data is to track uptake and reuse. This is challenging with current tools, as Heather Piwowar has pointed out: http://bit.ly/doi-fail (long URL: http://researchremix.wordpress.com/2010/11/09/tracking-dataset-citations-using-common-citation-tracking-tools-doesnt-work/)
DataCite is a collaborative endeavor to explore and improve data citation: http://thedata.org/citation/standard
The Australian National Data Service has a nice page on data citation awareness: http://ands.org.au/guides/data-citation-awareness.html
Supplemental materials
Interactive data
T. K. Attwood, D. B. Kell, P. Mcdermott, J. Marsh, S. R. Pettifer, and D. Thorne. (2009) Calling international rescue: knowledge lost in literature and data landslide! Biochemical Journal. doi:10.1042/BJ20091474.
“colleagues and I published a computational method for distinguishing between two types of acute leukemia, based on large-scale gene expression profiles obtained from DNA microarrays (3). This paper generated hundreds of requests from scientists interested in replicating and extending the results. The method involved a complex pipeline of steps, including (i) preprocessing of the data, to eliminate likely artifacts; (ii) selection of genes to be used in the model; (iii) building the actual model and setting the appropriate parameters for it from the training data; (iv) preprocessing independent test data; and finally (v) applying the model to test its efficacy. The result was robust and replicable, and the original data were available online, but there was no standardized form in which to make available the various software components and the precise details of their use.”
Jill P. Mesirov (2010). Accessible Reproducible Research. Science 327:415. doi:10.1126/science.1179653
The paper describes the underlying philosophy: a Reproducible Research System (RRS) made up of an environment for doing computational work (the Reproducible Research Environment, or RRE) and an authoring environment (the Reproducible Research Publisher, or RRP) that links back to the research system.
Based on eagle-I UPAR23331081710 https://www.eagle-i.org/
From http://jobs.climber.com/jobs/Education-Higher-Education/Portland-OR-USA/Research-Associate-Scientific-Data-Curator/6029259/Careers?source=simplyjobs&bid=6029259&cid=Research-Associate-Scientific-Data-Curator
Advertised 2010-08: http://sourceforge.net/mailarchive/forum.php?thread_name=8E1C6EA1-46C9-4D81-AF18-B50297D50A2C%40ohsu.edu&forum_name=obo-discuss