Date: Apr 4, 2018
Speakers: Hyoungjoo Park, PhD candidate, School of Information Studies, University of Wisconsin-Milwaukee, and Dietmar Wolfram, PhD
Overview: It is increasingly common for researchers to make their data freely available. This is often a requirement of funding agencies but also consistent with the principles of open science, according to which all research data should be shared and made available for reuse. Once data is reused, the researchers who have provided access to it should be acknowledged for their contributions, much as authors are recognised for their publications through citation. Hyoungjoo Park and Dietmar Wolfram have studied characteristics of data sharing, reuse, and citation and found that current data citation practices do not yet benefit data sharers, with little or no consistency in their format. More formalised citation practices might encourage more authors to make their data available for reuse.
Research Data Sharing and Re-Use: Practical Implications for Data Citation Practice that Benefit Researchers
1. Digital Scholar
Webinar
April 4, 2018
Hosted by the Southern California Clinical and Translational Science Institute (SC CTSI)
University of Southern California (USC) and Children’s Hospital Los Angeles (CHLA)
6. Today’s Learning Objectives
• Describe the characteristics and strengths of digital forms of data sharing, reuse, and citation
• Describe methods to implement data citation practices that benefit your research
• Describe potential weaknesses of digital research data sharing practices
7. Today’s Speakers
Hyoungjoo Park, PhD candidate, School of Information Studies, University of Wisconsin-Milwaukee
AND
Dietmar Wolfram, PhD, Professor, School of Information Studies, University of Wisconsin-Milwaukee
9. Questions: Please use the Q&A Feature
1. Click on the tab here to access Q&A
2. Ask and post your question here
10. Research Data Sharing and Re-Use: Practical Implications for Researchers
Hyoungjoo Park
Dietmar Wolfram
School of Information Studies
University of Wisconsin-Milwaukee
Presentation for Digital Scholar Webinar series on April 4th
11. Introduction: The Open Science Movement
From the FOSTER Project:
https://www.fosteropenscience.eu/content/what-open-science-introduction
12. Introduction: The Open Science Movement
From the FOSTER Project:
https://www.fosteropenscience.eu/content/what-open-science-introduction
Open Data
Datasets become publicly
available to others
Problem: How do you give
credit to data sharers?
13. What Can be Considered Research Data?
“…recorded factual material commonly accepted in the scientific
community as necessary to validate research findings” OMB Circ. A-110
• Datasets: physical world, human subject
• Images
• Samples
• Genetic material
• Software
• Field notes
• … many more
14. Discovery
• Important for depositing and accessing data
• Data repositories
• Institutional: e.g., U. of Michigan’s ICPSR (https://www.icpsr.umich.edu/icpsrweb/)
• Government: NIH
(https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html)
• Registry of Research Data Repositories (www.re3data.org/)
• Data citation sources
• Clarivate Analytics Data Citation Index - indexes > 300 repositories
• DataCite - leading open data initiatives (datacite.org)
15. Open
• Data sharers/providers & users need to understand acceptable data usage
• Federal funding mandates (since 2003) & high-profile journals (since 2011)
• Creative Commons licenses may dictate acceptable usage
• See Open Data Handbook (http://opendatahandbook.org/)
16. Quality
• Caution
• Datasets in repositories may not be refereed
• Data may not be appropriately documented
• Data journals provide some quality control through peer review
• Trust judgment, such as validity of data, is important for data reusers
• See FOSTER website for examples
(https://www.fosteropenscience.eu/foster-taxonomy/open-data-journals)
17. Reuse
• Challenges to date
• Scalability, granularity
• Infrastructure
• Dynamics: frequent updates, evolving data
• Qualitative data sharing/reuse
• Standardization is not a current practice
• Most data repositories only require simple metadata for data description
• Many repositories do not provide DOIs
18. Credit
• The need for data citation
• Data scooping, plagiarism, misuse
• Insufficient credit to data authors
• Assign credit, document evidence, support discovery
• Issues with data citation
• Inconsistent practice
• Invisible citations
• Courtesy authorship
19. Why Data Citation is Important
• Current status
• Indexers (Web of Science, Scopus, Google Scholar) currently lack support for data citation
• New “data and software availability” section in some journals (e.g., F1000)
• Research studies on data citation
• 69% increase in bibliographic citations when description of data is shared (Piwowar et al., 2007)
• Informal data citation is more widely found than formal data citation (Park & Wolfram, 2017)
20. Our Studies on Data Citation
(Park & Wolfram, 2017; Park, You & Wolfram, Accepted)
• We examined sets of full text articles in biomedical disciplines to determine prevalence of formal & informal data citation
• Key findings
• Data citation is most common in biomedical fields
• Informal data citation is far more common than formal citation
• Authors are more likely to informally cite datasets outside of the references
• Data citation indexing services don’t pick these up
• Self-citation is somewhat common
21. Some Examples
• Example of formal citation
• Examples of informal data citation for sharing and reuse
22. Where are Authors Acknowledging Data?
Category        Location in citing articles    Total citations
Data reuse      Main text                       29
                References (formal)             17
                Supplementary information       16
                Acknowledgment                   4
Data sharing    Main text                      173
                References (formal)             71
                Supplementary information       60
                Acknowledgment                  12
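As a small worked example, the share of citations that land in the reference list can be computed from the counts in the table above. This is only a sketch: treating "References" as the formal location is an assumption based on the "Formal" labels on the slide, and the names `counts` and `reference_share` are illustrative.

```python
# Counts copied from the slide's table: where citing articles acknowledge data.
counts = {
    "Data reuse": {
        "Main text": 29,
        "References": 17,
        "Supplementary information": 16,
        "Acknowledgment": 4,
    },
    "Data sharing": {
        "Main text": 173,
        "References": 71,
        "Supplementary information": 60,
        "Acknowledgment": 12,
    },
}


def reference_share(by_location):
    """Return the fraction of citations that appear in the reference list."""
    total = sum(by_location.values())
    return by_location["References"] / total


for category, by_location in counts.items():
    total = sum(by_location.values())
    share = reference_share(by_location)
    print(f"{category}: {by_location['References']} of {total} "
          f"in references ({share:.1%})")
# Data reuse: 17 of 66 in references (25.8%)
# Data sharing: 71 of 316 in references (22.5%)
```

Under that assumption, roughly a quarter of the acknowledgments in either category sit in the reference list, where citation indexes can find them.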
23. Recommendations for Best Practices: General
Need for standardized approaches for citation
• DataCite, W3C PROV, DC, or W3C DCAT
• Metadata: data name, primary author/contributors (name and ORCID), DOI or other unique and persistent identifier, and location where the data has been published/archived
Data citation sources need to be more comprehensive
• Need broader coverage of data repositories
• Granularity of sources
24. Recommendations for Best Practices: Authors
Authors need to be encouraged to share their data
• Rely on repositories that are indexed by citation databases (Web of Science, master data repository list, >300 indexed repositories: http://wokinfo.com/products_tools/multidisciplinary/dci/repositories/search/)
• Use repositories that provide DOIs to promote discovery & credit (e.g., Zenodo)
Authors need to be familiar with data citation practices
• Formally cite the data sources you use, and not just in passing (bibliographic reference, identifier, link)
Journals need to get on board to encourage author data citation
• Journal policies to require formal data citation
25. Elements of Data Citation (ICPSR)
Minimum elements required for dataset identification and retrieval. Fewer or additional elements may be requested by author guidelines or style manuals.
• Author: Name(s) of individuals or entities responsible for the creation of the dataset.
• Date of Publication: Year the dataset was published or disseminated.
• Title: Complete title of the dataset, including the edition or version number.
• Publisher and/or Distributor: Organizational entity that makes the dataset available by archiving, producing, publishing, and/or distributing the dataset.
• Electronic Location or Identifier: Web address or unique, persistent, global identifier used to locate the dataset (such as a DOI). Append the date retrieved if the title and locator are not specific to the exact instance of the data you used.
https://www.icpsr.umich.edu/files/ICPSR/enewsletters/iassist.html
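One way to picture the minimum elements listed above is as a small record that can report which elements are still missing. The sketch below is illustrative only: the class name `DatasetCitation`, its field names, and the `missing_elements` helper are assumptions, not part of ICPSR's guidance. The filled-in example uses the General Social Survey dataset cited elsewhere in these slides.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DatasetCitation:
    """The five minimum elements; the field names are hypothetical."""
    author: Optional[str] = None
    publication_year: Optional[int] = None
    title: Optional[str] = None  # should include edition or version
    publisher_or_distributor: Optional[str] = None
    locator: Optional[str] = None  # web address or identifier such as a DOI

    def missing_elements(self) -> List[str]:
        """List the names of elements that have not been supplied."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in (None, "")]


gss = DatasetCitation(
    author="Smith, T.W., Marsden, P.V., & Hout, M.",
    publication_year=2011,
    title="General social survey, 1972-2010 cumulative file (ICPSR31521-v1)",
    publisher_or_distributor=(
        "Inter-university Consortium for Political and Social Research"),
    locator="doi:10.3886/ICPSR31521.v1",
)
print(gss.missing_elements())  # []

# A made-up, incomplete record, to show what the check reports.
incomplete = DatasetCitation(title="Untitled survey data")
print(incomplete.missing_elements())
# ['author', 'publication_year', 'publisher_or_distributor', 'locator']
```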
26. Style Guidelines
• APA (6th edition)
• Smith, T.W., Marsden, P.V., & Hout, M. (2011). General social survey, 1972-2010 cumulative file (ICPSR31521-v1) [data file and codebook]. Chicago, IL: National Opinion Research Center [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. doi: 10.3886/ICPSR31521.v1
• MLA (7th edition)
• Smith, Tom W., Peter V. Marsden, and Michael Hout. General Social Survey, 1972-2010 Cumulative File. ICPSR31521-v1. Chicago, IL: National Opinion Research Center [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011. Web. 23 Jan 2012. doi:10.3886/ICPSR31521.v1
• Chicago (16th edition) (author-date)
• Smith, Tom W., Peter V. Marsden, and Michael Hout. 2011. General Social Survey, 1972-2010 Cumulative File. ICPSR31521-v1. Chicago, IL: National Opinion Research Center. Distributed by Ann Arbor, MI: Inter-university Consortium for Political and Social Research. doi:10.3886/ICPSR31521.v1
https://www.icpsr.umich.edu/files/ICPSR/enewsletters/iassist.html
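To show how the same elements feed a formatted reference, the sketch below assembles a string that follows the pattern of the APA example above. It is not an authoritative APA implementation: the function names `format_apa_like` and `doi_url` and their parameters are hypothetical, and the only outside fact relied on is that a DOI can be resolved by prefixing it with https://doi.org/.

```python
def format_apa_like(authors, year, title, version, material,
                    producer, distributor, doi):
    """Assemble a dataset reference in the pattern of the APA example."""
    return (
        f"{authors} ({year}). {title} ({version}) [{material}]. "
        f"{producer} [producer]. {distributor} [distributor]. doi: {doi}"
    )


def doi_url(doi):
    """Turn a bare DOI into a link through the doi.org resolver."""
    return "https://doi.org/" + doi


citation = format_apa_like(
    authors="Smith, T.W., Marsden, P.V., & Hout, M.",
    year=2011,
    title="General social survey, 1972-2010 cumulative file",
    version="ICPSR31521-v1",
    material="data file and codebook",
    producer="Chicago, IL: National Opinion Research Center",
    distributor=("Ann Arbor, MI: Inter-university Consortium for "
                 "Political and Social Research"),
    doi="10.3886/ICPSR31521.v1",
)
print(citation)
print(doi_url("10.3886/ICPSR31521.v1"))
# https://doi.org/10.3886/ICPSR31521.v1
```

Keeping the elements as separate arguments means the MLA and Chicago variants could be produced from the same inputs by a different template.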
27. References
• Christenhusz, G. M., Devriendt, K., & Dierickx, K. (2013). To tell or not to tell? A systematic review of ethical reflections on incidental findings arising in genetics contexts. European Journal of Human Genetics, 21, 248-255.
• Park, H., & Wolfram, D. (2017). An examination of research data sharing and re-use: implications for data citation practice. Scientometrics, 111(1), 443-461.
• Park, H., You, S., & Wolfram, D. (Accepted). Informal data citation for data sharing and re-use is more common than formal data citation in biomedical fields. Journal of the Association for Information Science and Technology.
• Piwowar, H. A., Day, R. S., & Fridsma, D. B. (2007). Sharing detailed research data is associated with increased citation rate. PLoS ONE, 2(3), e308.
• Tucker, K., Branson, J., Dilleen, M., Hollis, S., Loughlin, P., Nixon, M. J., & Williams, Z. (2016). Protecting patient privacy when sharing patient-level data from clinical trials. BMC Medical Research Methodology, 16(1), 77.
28. Questions
Program director: Katja Reuter, PhD
Email: katja.reuter@usc.edu
Twitter: @dmsci
Information about the program: http://sc-ctsi.org/digital-scholar/
Next Digital Scholar Webinar
May 2nd, 2018 | 12-1PM PST
Topic: Leveraging Medical Health Record Data for Identifying Research Study Participants: Practical Guidance on Using Clinical Research Informatics Applications in Your Research
Speakers: Juan Espinoza, MD, FAAP, Assistant Professor of Clinical Pediatrics, Keck School of Medicine of USC, Physician and Director of Clinical Research Informatics, Children’s Hospital Los Angeles; and Mark Abajian, Applications Lead, Clinical Research Informatics, SC CTSI
Register at: https://bit.ly/2GvT8sa