Jonathan David Crabtree - The Dataverse Community: Supporting Open Science and Reproducibility

Open Science is a movement to make scientific research, its data and dissemination accessible to all levels of society. This movement considers aspects such as Open Access, Open Data, Reproducible Research and Open Software.

Each of these aspects has particularities that need to be evaluated and discussed by the scientific community so that guidelines can be established to facilitate the dissemination of scientific information.

The great challenge is to establish effective and efficient practices that allow journals to incorporate these demands into their editorial processes, not only making data, software and methods accessible but also encouraging the community to share them.

Considering these questions, this panel proposes to discuss important aspects of the advancement of research communication. Some of these aspects are included in the SciELO indexing criteria, such as the referencing of research materials in favor of transparency and reproducibility.

FAIR criteria, concepts and implementation; challenges for the publication of data and methods; institutional policies for open data; adoption of the TOP (Transparency and Openness Promotion) guidelines; software repositories; thematic data repositories.

  1. SciELO International Conference 2018. Jonathan Crabtree, Director of Cyberinfrastructure, Odum Institute.
  2. Founded in 1924, the Odum Institute provides core research infrastructure for the social sciences to support the research, teaching, and service mission of UNC. We define social science broadly to include the health sciences, and we serve faculty and students from every corner of UNC’s campus. Home of the Lou Harris Data Center and the UNC Dataverse.
  3. An ongoing 12-year collaboration around repository solutions and tools. Partnering on projects to promote data sharing and publication. Leading efforts to promote Open and Reproducible Science.
  4. An open-source platform to share and archive data. Developed at Harvard’s Institute for Quantitative Social Science since 2006. Gives credit and control to data authors and producers. Builds a community to define standards and best practices and to foster new research in data sharing and research reproducibility. Has brought data publishing into the hands of data authors.
  5. Data citation with global persistent IDs: DOIs generated automatically; attribution to data authors and repository; registration with DataCite. Rich metadata: citation metadata; domain-specific descriptive metadata; variable and file metadata (extracted automatically). Access and usage controls: open data as default, with CC0 waiver; custom terms of use and licenses, when needed; data can be restricted, but citation and metadata are always publicly accessible. APIs and standards: SWORD, OAI-PMH, and a native API to search and get data and metadata; Dublin Core and DDI metadata standards; PROV ontology standard to capture provenance of a dataset (coming soon).
  6. Has grown to 33 installations around the world. Thousands of scientific studies archived, across many disciplines, supporting many metadata standards.
  7. The Global Dataverse Community Consortium (GDCC) is dedicated to providing international organization to existing Dataverse community efforts, and will provide a collaborative venue for institutions to leverage economies of scale in support of Dataverse repositories around the world. http://DataverseCommunity.Global
  8. But are these shared data reusable? For reuse, we need well-documented, well-organized data and code, as well as tools that facilitate replication and reuse.
  9. More than 50% of the top 50 journals in anthropology, economics, psychology, and political science have data policies that either encourage or require authors to share the data associated with the article. (Crosas, Gautier, Karcher, Kirilova, Otalora, Schwartz, 2018. Data Policies of highly-ranked social science journals.)
  10. With funding from the Sloan Foundation, our organizations plan to address data reuse and reproducibility by: (a) improving curation through educational materials, a friendly user interface, and services; (b) integrating replication tools with Dataverse repositories: Encapsulator, to pack your data and code in a self-contained, documented capsule (IQSS Harvard); Code Ocean, to easily run scientific code online (IQSS Harvard); and CoRe2, to connect systems in order to streamline the verification workflow (Odum Institute).
  11. The Confirmable Reproducible Research (CoRe2) Environment: Linking Tools to Promote Computational Reproducibility. Support for this research was provided by the Alfred P. Sloan Foundation (2018-11121). The views expressed here do not necessarily reflect the views of the Foundation.
  13. [Workflow diagram: Manuscript Publication and Data Curation + Verification, showing the Author, Editor, Verifier, and Curator roles and steps numbered 1 to 4.]
  19. Given current constraints and the need for iterative review, data curation and successful verification of a replication package for a single manuscript require six hours of labor on average. Slide labels: Computation, Coordination, Administration.
  21. Promote and support computational reproducibility by integrating and streamlining manuscript publication and data curation + verification workflows.
  22. Facilitate access to and adoption of tools and platforms to support scientific reproducibility; coordinate manuscript submission and data curation + verification workflow processes across key stakeholders; promote the adoption of standards and best practices for data access and transparency as part of normative research practice.
  23. [Diagram: the Author, Editor, Verifier, and Curator roles shown alongside the tools Binder and Encapsulator.]
  24. More information at: http://dataverse.org, http://dataversecommunity.global, http://www.odum.unc.edu
  25. Merce Crosas at IQSS (https://scholar.harvard.edu/mercecrosas/home); Odum Co-PI Thu-Mai Christian and Visual Arts Specialist Kasha Ely.
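
Slide 5 mentions a native API to search and get data and metadata, together with persistent identifiers for datasets. The sketch below is one possible illustration of that idea, not something taken from the talk: it builds a search request URL and pulls dataset identifiers out of a parsed response. It assumes the `/api/search` endpoint with `q`, `type`, and `per_page` parameters and a `global_id` field on dataset items, as described in the public Dataverse API guide. The base URL and the sample response are placeholders chosen for illustration, and no network request is made.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, item_type="dataset", per_page=10):
    """Build a Search API URL for a Dataverse installation.

    Assumption: the installation exposes /api/search with the
    q, type and per_page query parameters.
    """
    params = {"q": query, "type": item_type, "per_page": per_page}
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


def dataset_ids(search_response):
    """Collect persistent identifiers from a parsed search response.

    Assumption: results sit under data -> items and each dataset
    item carries a 'global_id' such as 'doi:...'.
    """
    items = search_response.get("data", {}).get("items", [])
    return [item["global_id"] for item in items if "global_id" in item]


# Hypothetical base URL and hand-written sample response, for illustration only.
url = build_search_url("https://demo.dataverse.org/", "reproducibility")
sample_response = {
    "status": "OK",
    "data": {
        "items": [
            {"name": "Example dataset", "type": "dataset",
             "global_id": "doi:10.0000/EXAMPLE"},
            {"name": "Item without an identifier", "type": "dataverse"},
        ]
    },
}
ids = dataset_ids(sample_response)
print(url)
print(ids)
```

In practice the URL would be fetched with an HTTP client and the JSON body passed to `dataset_ids`; restricted datasets would still appear here, since the slides note that citation and metadata stay publicly accessible.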