Managing, Sharing and Curating Your Research Data in a Digital Environment
Sonia Barbosa, Manager of Data Curation, Harvard Dataverse
Philip Durbin, Developer, Harvard Dataverse
11. Bi-directional linking of data and
research articles is taking place!
If I find your data, I can find your article.
If I find your article, I can find your data!
15. 1. Visibility: Studies have shown that open access content attracts more attention than non-open access content.
Increased citation and usage, greater public engagement
2. Make new discoveries: Open access data and papers accelerate the pace of scientific enquiry.
Faster impact, wider collaboration, increased interdisciplinary conversation
3. Comply with funder mandates: Open access is increasingly required by funders around the world.
24. ● establish easier access to research data on the Internet
● increase acceptance of research data as legitimate, citable contributions to the scholarly record
● support data archiving that will permit results to be verified and re-purposed for future study.
25. DataCite
● Open Access standards for Datasets
● International in scope including universities, research institutions, data governance agencies,
government entities, etc…
● "DataCite is a leading global non-profit organisation that provides persistent identifiers (DOIs) for
research data. Our goal is to help the research community locate, identify, and cite research
data with confidence." (datacite.org)
35. The Scientific Community is Establishing Best Practices
for Data Publishing and Replication...
DA-RT (Data Access and Research Transparency) Journal Policies
Goal: To increase transparency in social science
In 2016, the first group of DA-RT Journals began to post new data sharing and transparency policies:
American Journal of Political Science's Guidelines for Preparing Replication Materials
American Political Science Review's DA-RT Guidelines
Conflict Management and Peace Science DA-RT guidelines
The Italian Political Science Review's Replication Policy and Policy for Datasets and Supplemental Files
State Politics and Policy Quarterly's Guidelines for Preparing Replication Policies
41. The aim of Springer Nature data sharing policy...
These new policies and services aim to:
● improve author service and experience by standardising research data policies and
procedures between journals where appropriate
● improve reader service by providing more consistent links between publications and data
● improve editor and peer reviewer service by providing more consistent guidelines and support
for research data policies, and increased visibility of data in the peer-review process
● encourage publication of more open and reproducible research
● increase growth and innovation in research data sharing
● provide a dedicated Research Data Support helpdesk for Springer Nature authors and editors
http://blogs.nature.com/ofschemesandmemes/2016/07/05/promoting-research-data-sharing-at-springer-nature
47. Challenges include but are not limited to...
Meaningful data aggregation and analysis
Privacy and security demands
Missing integration of data sources and instruments
Complicated privacy laws (US and European)
Diverse stakeholders
Source: Sandra Gesing (Center for Research Computing, University of Notre Dame, sandra.gesing@nd.edu), "Science Gateways: Addressing Data Management Challenges," 7th National Data Service Consortium Workshop, Chicago, 13 April 2017.
49. Dataverse is an open source web application to share, preserve, cite, explore, and
analyze research data. It facilitates making data available to others, and allows you to
replicate others' work more easily. Researchers, data authors, publishers, data distributors,
and affiliated institutions all receive academic credit and web visibility.
https://dataverse.org/
Data Management Plan
Checklist for data management plan
Template for data management plans
http://best-practices.dataverse.org/data-management/index.html
55. Dataverse supports:
● Access and Sharing
● File Format Support
● Documentation, Metadata and Bibliographic Information
● Versioning
56. Dataverse facilitates data access by providing:
● descriptive and variable/question-level search;
● topical browsing;
● data extraction;
● re-formatting;
● on-line analysis
Dataverse performs:
● archival format migration;
● metadata extraction;
● validity checks.
The Dataverse application’s “templating” feature can be used to keep information consistent across datasets.
The Dataverse repository automatically generates persistent identifiers and Universal Numerical
Fingerprints (UNF) for datasets; extracts and indexes variable descriptions, missing-value codes and labels;
creates variable-level summary statistics; and facilitates open distribution of metadata in a variety of
standard formats (DataCite, DDI v2.5, Dublin Core, VOResource, and ISA-Tab) and through standard
protocols (OAI-PMH, SWORD).
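As a rough illustration of the standard formats and protocols listed above, the sketch below builds request URLs for a single-dataset metadata export and for an OAI-PMH harvest. It is not taken from the slides. It assumes a Dataverse installation at a placeholder base URL, a native API export path of `/api/datasets/export`, and an OAI-PMH endpoint at `/oai`; the exporter names and the DOI are illustrative and should be checked against the API guide of the installation you use. No request is actually sent.

```python
from typing import Optional
from urllib.parse import urlencode

# Placeholder installation URL (assumption); substitute your own repository.
BASE_URL = "https://demo.dataverse.org"


def export_url(persistent_id: str, exporter: str, base_url: str = BASE_URL) -> str:
    """Build a URL that asks the repository to export one dataset's metadata.

    Assumes the native API path /api/datasets/export. Exporter names such as
    "ddi" or "oai_dc" are assumptions to verify against the API guide.
    """
    query = urlencode({"exporter": exporter, "persistentId": persistent_id})
    return f"{base_url}/api/datasets/export?{query}"


def oai_list_records_url(
    metadata_prefix: str,
    set_spec: Optional[str] = None,
    base_url: str = BASE_URL,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    "verb", "metadataPrefix" and "set" are standard OAI-PMH arguments;
    the /oai endpoint path is an assumption about the installation.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}/oai?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical DOI used only for illustration.
    print(export_url("doi:10.5072/FK2/EXAMPLE", "ddi"))
    print(oai_list_records_url("oai_dc"))
```

Keeping URL construction separate from the HTTP call makes the request easy to inspect before pointing it at a real repository.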
57. Data Sharing Has Many Acceptable Levels
-Different levels of openness in sharing data
-Verification of reproducibility
-Replication data for, Data related to…
-Public version of a dataset vs restricted version
60. What is research data...?
● Observational: data captured in real time that is usually unique and irreplaceable. For example,
remote sensing data, survey data, field recordings, sample data
● Experimental: data captured from lab equipment that is often reproducible, but can be expensive.
For example, gene sequences, chromatograms, magnetic field data
● Models or simulations: data generated from test models where the model and metadata may be
more important than output data from the model. For example, climate models, economic models
● Derived or compiled: resulting from processing or combining ‘raw’ data, often reproducible, but may
be expensive. For example, text and data mining, compiled databases, 3D models
● Reference or canonical: a static or organic conglomeration or collection of datasets, probably
published and curated. For example, gene sequence databanks, collection of letters or archive of
historical images
http://libguides.ucd.ie/data/researchdata
61. The purpose of research data management...
● To ensure research integrity and validation of results.
● To increase research efficiency.
● To facilitate data security and minimise the risk of data loss.
● To ensure wider dissemination and increased impact.
● To enable research continuity through secondary data use.
● To ensure compliance with a funding agency’s requirements.
http://libguides.ucd.ie/data/researchdata
64. IQSS and the Dataverse Project
● Mission: "...enabling bigger, better, faster, and more collaborative social science"
● Integrations powered by APIs
● Current and future efforts
● Community
● Transparency at all project levels
73. Archivematica: Getting Data out of Dataverse
https://www.slideshare.net/datascienceiqss/bell-trimble-dataverse-community-meeting-2015-final-presentation
79. Streaming Data
311Boston API
App for Regular Processing
● Citation
● Versioning
● File Appending
● R Scripts run at some interval defined by researcher
● Authentication to API (if needed)
● Boston makes APIs available for public works data
● So do many others!
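The regular-processing step on this slide can be sketched as a small polling loop. The slide mentions R scripts; the Python below is only a stand-in for the same idea and is not the presenters' implementation. `fetch` and `append` are hypothetical callables supplied by the caller (for example, one that reads a city API and one that appends a file to a dataset), and the record shape with an `"id"` key is an assumption.

```python
import time
from typing import Callable, Iterable, List, Set


def collect_new(records: Iterable[dict], seen_ids: Set, id_key: str = "id") -> List[dict]:
    """Return only records whose id has not been seen before, remembering their ids."""
    fresh = []
    for record in records:
        record_id = record[id_key]
        if record_id not in seen_ids:
            seen_ids.add(record_id)
            fresh.append(record)
    return fresh


def run_periodically(
    fetch: Callable[[], Iterable[dict]],
    append: Callable[[List[dict]], None],
    interval_seconds: float,
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll `fetch` a fixed number of times and pass unseen records to `append`.

    `fetch` and `append` are hypothetical hooks: authentication, the real API
    call, and the upload to a repository would live inside them.
    Returns the total number of records handed to `append`.
    """
    seen_ids: Set = set()
    total = 0
    for cycle in range(cycles):
        fresh = collect_new(fetch(), seen_ids)
        if fresh:
            append(fresh)
            total += len(fresh)
        if cycle < cycles - 1:
            sleep(interval_seconds)  # wait for the researcher-defined interval
    return total
```

Passing the fetch and append steps in as functions keeps the scheduling logic testable without network access, and lets the same loop serve other public data APIs.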
84. Dataverse Community
● 60+ code contributors
● Hundreds of members of the Dataverse Community: developers, researchers, librarians, data scientists
○ Dataverse Google Group
○ Dataverse Community Calls
○ Dataverse Community Meeting
https://groups.google.com/d/forum/dataverse-community
85. Dev Efforts from the Community
https://github.com/IQSS/dataverse/blob/develop/CONTRIBUTING.md
88. References
Teplitzky, S. (2017). Open Data, [Open] Access: Linking Data Sharing and Article Sharing in the Earth Sciences. Journal of Librarianship and Scholarly Communication, 5(General Issue), eP2150. https://doi.org/10.7710/2162-3309.2150
Lee, D. J., & Stvilia, B. (2017). Practices of research data curation in institutional repositories: A qualitative view from repository staff. PLoS ONE, 12(3), e0173987. https://doi.org/10.1371/journal.pone.0173987
Drachen, T. M., et al. (2016). Sharing data increases citations. LIBER Quarterly, 26(2), 67–82. https://doi.org/10.18352/lq.10149
Smith, K. L., & Dickson, K. A. Open Access and the Future of Scholarly Communication: Policy and Infrastructure.