Mark A. Parsons and Peter A. Fox
19 December 2014
Why Data Citation Currently Misses the Point
References:
[1] Joint Declaration of Data Citation Principles. https://www.force11.org/datacitation
[2] Chawla D S. 2014. Could digital badges clarify the roles of co-authors? ScienceInsider. http://news.sciencemag.org/scientific-community/2014/11/could-digital-badges-clarify-roles-co-authors. See also http://projectcredit.net.
[3] ESIP Data Stewardship Committee. http://wiki.esipfed.org/index.php/Preservation_and_Stewardship
[4] Donovan C and S Hanney. 2011. The payback framework explained. Research Evaluation 20 (3): 181-183. http://dx.doi.org/10.3152/095820211X13118583635756
What’s the use case?
In recent years, the data management community has begun to codify data
citation practices and associated technologies, especially persistent
identifiers. Nonetheless, digital data sets are rarely cited formally or
specifically. Moreover, citation has done little to open up data in the
unindexed deep Web. There are many reasons for this, but we believe one
problem is that data managers expect too much from classic bibliographic-
style data citation and assume false parallels between data and literature.
The idea of formalizing data citation emerged in the 1990s, primarily as a
mechanism to encourage and reward data sharing by giving credit to data
“authors”. The approach seemed a logical extension of the publishing
incentive, but it has not provided a strong incentive to share or done much
to expose hidden data. We need to reconsider the use case; instead,
workarounds such as “data journals” have emerged to try to make
data publishing more akin to literature publishing. Meanwhile, the
community is recognizing other purposes of citation, notably to help ensure
a scientific result can be verified. After much discussion amongst competing
views, the community converged around a core set of data citation
principles [1]. These principles are an important step forward, but they are
primarily oriented toward formal, scholarly citation. They hint at, but do not
fully consider, the myriad ways data are used.
In this poster, we take a broader view on what we are trying to accomplish
with data citation by exploring several use cases around attribution,
provenance, and impact. We seek to start a conversation on how we can
robustly address the myriad use cases that begin to uncover the deep Web.
We suggest that we need more sophisticated, diverse, and nuanced
approaches to actually address the many use cases of identifying,
tracking, and enhancing data use.
1. Attribution and Credit
Provide fair and recognized attribution for all
personnel involved in creating a data set.
Some concerns:
• Who is the “author” of a data set?
• What is the appropriate credit mechanism for all involved?
• Who gets credit for what?
Some ideas:
Project CRediT has defined a taxonomy of contributor roles for
publications and suggests using digital badges that detail what
each author did for the work and link to their profiles elsewhere on
the Web [2]. Can we do this for data?
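One way to picture contributor roles applied to a data set is the minimal Python sketch below. It uses the 14 role names listed on this poster for Project CRediT; the `Contribution` record, the `credit_by_role` helper, and all names and URLs are hypothetical illustrations, not part of Project CRediT itself.

```python
from dataclasses import dataclass, field

# The 14 contributor roles listed on this poster for Project CRediT.
CREDIT_ROLES = (
    "Conceptualization", "Methodology", "Software", "Validation",
    "Formal analysis", "Investigation", "Resources", "Data curation",
    "Writing – original draft", "Writing – review & editing",
    "Visualization", "Supervision", "Project administration",
    "Funding acquisition",
)


@dataclass
class Contribution:
    """One person's roles on one data set (hypothetical structure)."""
    person: str        # display name
    profile_url: str   # link to the person's profile elsewhere on the Web
    roles: list = field(default_factory=list)

    def __post_init__(self):
        # Reject role names that are not in the taxonomy.
        unknown = [r for r in self.roles if r not in CREDIT_ROLES]
        if unknown:
            raise ValueError(f"not a CRediT role: {unknown}")


def credit_by_role(contributions):
    """Invert the records: role -> sorted list of people holding it."""
    table = {}
    for c in contributions:
        for role in c.roles:
            table.setdefault(role, []).append(c.person)
    return {role: sorted(people) for role, people in table.items()}


# Illustrative data only; names and URLs are invented.
team = [
    Contribution("A. Field", "https://example.org/people/afield",
                 ["Investigation", "Data curation"]),
    Contribution("B. Coder", "https://example.org/people/bcoder",
                 ["Software", "Data curation"]),
]
print(credit_by_role(team))
```

Recording roles per person, rather than a single ordered author list, lets the question "who gets credit for what?" be answered directly from the record.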
Project CRediT (Contributor Roles Taxonomy)
2. Tracking and Provenance
Identify and trace all observations used in
forcing and constraining a model run.
Some concerns:
• How to capture the purpose of the data in the model, e.g. forcing, assimilation, or boundary conditions?
• How to reference the precise version and subset used?
• How far back does one need to go (see figure below)?
• What are references (PIDs) pointing to?
Some ideas:
This is really an issue of provenance, not just reference. Full
reproducibility requires being able to trace data, processes, and
tools. While persistent identifiers are crucial, they are insufficient. A
fuller semantic description of the provenance is required, as well
as a richer description of context. See the provenance work of ESIP [3].
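As a rough illustration of why an identifier alone is not enough, the Python sketch below attaches a purpose, version, and subset to each input of a model run and walks the chain upstream. The `DataInput` record, the `PROVENANCE` table, the `trace` function, and every identifier are invented for this example; it assumes the provenance chain has no cycles and is not a rendering of any ESIP or other provenance standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataInput:
    """A data set as used by one process (hypothetical structure)."""
    pid: str       # persistent identifier of the data set
    version: str   # precise version used
    subset: str    # subset actually read, e.g. region or time range
    purpose: str   # e.g. "forcing", "assimilation", "boundary conditions"


# Each product PID maps to the process that made it and the inputs it used.
# All identifiers below are invented for illustration.
PROVENANCE = {
    "pid:model-run-42": ("climate-model v3.1", [
        DataInput("pid:sst-analysis", "2.0", "1990-2000, global", "forcing"),
        DataInput("pid:sea-ice-extent", "1.1", "1990-2000, Arctic",
                  "boundary conditions"),
    ]),
    "pid:sst-analysis": ("optimal-interpolation v1.4", [
        DataInput("pid:buoy-obs", "7", "1990-2000", "assimilation"),
    ]),
}


def trace(pid, depth=0, max_depth=None):
    """Yield (depth, process, input) for everything upstream of pid."""
    if pid not in PROVENANCE or (max_depth is not None and depth >= max_depth):
        return
    process, inputs = PROVENANCE[pid]
    for item in inputs:
        yield depth, process, item
        # Recurse into the input's own provenance, if it is recorded.
        yield from trace(item.pid, depth + 1, max_depth)


for depth, process, item in trace("pid:model-run-42"):
    print("  " * depth + f"{item.pid} v{item.version} [{item.subset}] "
          f"used for {item.purpose} by {process}")
```

The `max_depth` argument makes the "how far back?" concern an explicit choice for whoever runs the trace, rather than something the citation fixes in advance.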
3. Impact and Return on Investment
Provide a means to track the use, impact, and
value of a data set.
Some concerns:
• Data are used in many contexts that do not result in a formal article, e.g. land use planning, disaster response, agricultural prediction, policy analysis, education, etc.
• How to attribute a particular outcome to a particular person?
• Qualitative impact may be as important as quantitative, but it is hard to measure. Consider the impact of this Apollo 8 image on public consciousness.
Some ideas:
In health and social sciences, researchers have developed
a “Payback Framework” with a logical model of the complete
research process and categories of payback from research [4].
Can we extend this and apply it to data?
If we assign credit badges as suggested in use case 1, can we
aggregate the links to those badges through search engines
rather than relying on constrained citation indices?
“Payback Categories”
1. Knowledge
2. Benefits to future research and research use
3. Benefits from informing policy and product development
4. Environmental and public sector benefits
5. Broader economic benefits
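To make the idea of applying the payback categories to data slightly more concrete, here is a small Python sketch that files recorded uses of a data set under the five categories above. The record layout, the `tally_by_category` helper, and the example uses are assumptions made for illustration; the Payback Framework itself does not prescribe this structure.

```python
# The five payback categories listed on this poster.
PAYBACK_CATEGORIES = (
    "Knowledge",
    "Benefits to future research and research use",
    "Benefits from informing policy and product development",
    "Environmental and public sector benefits",
    "Broader economic benefits",
)


def tally_by_category(uses):
    """Count recorded uses of data sets under each payback category."""
    counts = {category: 0 for category in PAYBACK_CATEGORIES}
    for use in uses:
        category = use["category"]
        if category not in counts:
            raise ValueError(f"unknown payback category: {category!r}")
        counts[category] += 1
    return counts


# Invented examples; the contexts echo those named in the concerns above.
uses = [
    {"data_pid": "pid:land-cover", "context": "land use planning",
     "category": "Benefits from informing policy and product development"},
    {"data_pid": "pid:land-cover", "context": "journal article",
     "category": "Knowledge"},
    {"data_pid": "pid:flood-extent", "context": "disaster response",
     "category": "Environmental and public sector benefits"},
]

for category, count in tally_by_category(uses).items():
    print(f"{count:2d}  {category}")
```

A count like this captures only the quantitative side; qualitative impact would still need a free-text or narrative field alongside each record.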
Rensselaer Polytechnic Institute — rpi.edu
Data citation in theory and practice
The Data Citation Principles cover purpose, function and attributes of
citations. These principles recognize the dual necessity of creating
citation practices that are both human understandable and
machine-actionable.
1. Importance
Data should be considered legitimate, citable
products of research. Data citations should be
accorded the same importance in the scholarly
record as citations of other research objects,
such as publications.
2. Credit and Attribution
Data citations should facilitate giving scholarly
credit and normative and legal attribution to all
contributors to the data, recognizing that a single
style or mechanism of attribution may not be
applicable to all data.
3. Evidence
In scholarly literature, whenever and wherever
a claim relies upon data, the corresponding data
should be cited.
4. Unique Identification
A data citation should include a persistent
method for identification that is machine
actionable, globally unique, and widely used by a
community.
5. Access
Data citations should facilitate access to the data
themselves and to such associated
metadata, documentation, code, and other
materials, as are necessary for both
humans and machines to make informed
use of the referenced data.
6. Persistence
Unique identifiers, and metadata describing the
data, and its disposition, should persist -- even
beyond the lifespan of the data they describe.
7. Specificity and Verifiability
Data citations should facilitate identification of,
access to, and verification of the specific data
that support a claim. Citations or citation
metadata should include information about
provenance and fixity sufficient to facilitate
verifying that the specific timeslice, version and/or
granular portion of data retrieved
subsequently is the same as was originally cited.
8. Interoperability and Flexibility
Data citation methods should be sufficiently
flexible to accommodate the variant practices
among communities, but should not differ so
much that they compromise interoperability of
data citation practices across communities.
Joint Declaration of Data Citation Principles [1]
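The fixity check described under Specificity and Verifiability can be sketched in a few lines of Python. This is a minimal illustration assuming a SHA-256 checksum is stored with the citation; the `DataCitation` fields, the helper names, and the sample data are hypothetical and are not defined by the Joint Declaration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Machine-actionable part of a data citation (hypothetical fields)."""
    pid: str      # persistent, globally unique identifier
    version: str  # version of the data set that was used
    subset: str   # timeslice or granular portion that was used
    sha256: str   # fixity value recorded when the data were cited


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def cite(pid, version, subset, data):
    """Build a citation that records the fixity of the bytes used."""
    return DataCitation(pid, version, subset, fingerprint(data))


def verify(citation, retrieved):
    """True if the retrieved bytes match what was originally cited."""
    return fingerprint(retrieved) == citation.sha256


# Invented sample data standing in for a cited subset.
original = b"station,temp_c\nA,1.5\nB,2.0\n"
citation = cite("pid:example-temps", "1.0", "stations A-B", original)
print(verify(citation, original))                           # True
print(verify(citation, original.replace(b"2.0", b"2.1")))   # False
```

A byte-level checksum is deliberately strict: a harmless change of file format would also fail the check, so a real service might compare a normalized representation of the data instead.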
Figure courtesy Curt Tilmes, NASA
1. Conceptualization
2. Methodology
3. Software
4. Validation
5. Formal analysis
6. Investigation
7. Resources
8. Data curation
9. Writing – original draft
10. Writing – review & editing
11. Visualization
12. Supervision
13. Project administration
14. Funding acquisition
Initial Conclusions
• Much data use and production occur outside of the regular scholarly discourse (i.e. the literature).
• The principles of data citation are strong, and the increasing use of persistent identifiers is a
significant advance, but we must think beyond bibliographic-style citation.
• It is important to have a citation approach that can readily be accepted by scholarly publishers, but we
should not assume that that approach addresses other concerns. We must separate the various
concerns around citation by considering multiple use cases.
• Indeed we must consider use cases in the first place! What problem are we truly trying to solve?
• Other disciplines are taking a more nuanced look at these issues outside the realm of publication.
Geosciences should too.
[Figure: The logical model of the Payback Framework. Stages: Stage 0, topic identification; Stage 1, inputs; Stage 2, research process; Stage 3, primary outputs; Stage 4, secondary outputs (policy, products); Stage 5, adoption; Stage 6, final outcomes. Other labels: stock or reservoir of knowledge; Interface A (project specification); Interface B (dissemination); direct impact from processes and outputs to adoption; the political, professional, and industrial environment and wider society; direct feedback paths.]
Image credit: projectcredit.net (contributor taxonomy): J. Scott, L. Allen, A. Brand et al.; BioMed Central Design (badge designs) / Creative Commons 4.0

Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...Dr Arash Najmaei ( Phd., MBA, BSc)
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 

Último (20)

convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdfEnglish-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
English-8-Q4-W3-Synthesizing-Essential-Information-From-Various-Sources-1.pdf
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 

Why Data Citation Currently Misses the Point

Mark A. Parsons and Peter A. Fox
19 December 2014

References:
[1] Joint Declaration of Data Citation Principles, https://www.force11.org/datacitation
[2] Chawla D S. 2014. Could digital badges clarify the roles of co-authors? ScienceInsider. http://news.sciencemag.org/scientific-community/2014/11/could-digital-badges-clarify-roles-co-authors. See also http://projectcredit.net.
[3] ESIP Data Stewardship Committee, http://wiki.esipfed.org/index.php/Preservation_and_Stewardship
[4] Donovan C and S Hanney. 2011. The payback framework explained. Research Evaluation 20 (3): 181-183. http://dx.doi.org/10.3152/095820211X13118583635756

What’s the use case?

In recent years, the data management community has begun to codify data citation practices and associated technologies, especially persistent identifiers. Nonetheless, digital data sets are rarely cited formally or specifically. Moreover, citation has done little to open up data in the unindexed deep Web. There are many reasons for this, but we believe one problem is that data managers expect too much from classic bibliographic-style data citation and assume false parallels between data and literature.

The idea of formalizing data citation emerged in the 1990s, primarily as a mechanism to encourage and reward data sharing by giving credit to data “authors”. The approach seemed a logical extension of the publishing incentive, but it has not provided a strong incentive to share or done much to expose hidden data. We need to reconsider the use case, but instead work-around efforts such as “data journals” have emerged to try and make data publishing more akin to literature publishing. Meanwhile, the community is recognizing other purposes of citation, notably to help ensure a scientific result can be verified. After much discussion amongst competing views, the community converged around a core set of data citation principles [1].
These principles are an important step forward, but they are primarily oriented toward formal, scholarly citation. They hint at, but do not fully consider, the myriad ways data are used. In this poster, we take a broader view on what we are trying to accomplish with data citation by exploring several use cases around attribution, provenance, and impact. We seek to start a conversation on how we can robustly address the myriad use cases that begin to uncover the deep Web. We suggest that we need more sophisticated, diverse, and nuanced approaches to actually address the many use cases of identifying, tracking, and enhancing data use.

1. Attribution and Credit

Provide fair and recognized attribution for all personnel involved in creating a data set.

Some concerns:
• Who is the “author” of a data set?
• What is the appropriate credit mechanism for all involved?
• Who gets credit for what?

Some ideas: Project CRediT has defined a taxonomy of contributor roles for publications and suggests using digital badges that detail what each author did for the work and link to their profiles elsewhere on the Web [2]. Can we do this for data?

[Figure: Project CRediT (Contributor Roles Taxonomy)]

2. Tracking and Provenance

Identify and trace all observations used in forcing and constraining a model run.

Some concerns:
• How to capture the purpose of the data in the model, e.g. forcing, assimilation, boundary conditions?
• How to reference the precise version and subset used?
• How far back does one need to go (see figure below)?
• What are references (PIDs) pointing to?

Some ideas: This is really an issue of provenance, not just reference. Full reproducibility requires being able to trace data, processes, and tools. While persistent identifiers are crucial, they are insufficient. A fuller semantic description of the provenance is required, as well as a richer description of context. See the provenance work of ESIP [3].

3. Impact and Return on Investment

Provide a means to track the use, impact, and value of a data set.

Some concerns:
• Data are used in many contexts that do not result in a formal article, e.g. land use planning, disaster response, agricultural prediction, policy analysis, education, etc.
• How to attribute a particular outcome to a particular person.
• Qualitative impact may be as important as quantitative, but it is hard to measure. Consider the impact of this Apollo 8 image on public consciousness.

Some ideas: In health and social sciences, researchers have developed a “Payback Framework” with a logical model of the complete research process and categories of payback from research [4]. Can we extend this and apply it to data? If we assign credit badges as suggested in use case 1, can we aggregate the links to those badges through search engines rather than relying on constrained citation indices?

“Payback Categories”
1. Knowledge
2. Benefits to future research and research use
3. Benefits from informing policy and product development
4. Environmental and public sector benefits
5. Broader economic benefits

Rensselaer Polytechnic Institute, rpi.edu

Data citation in theory and practice

The Data Citation Principles cover purpose, function and attributes of citations. These principles recognize the dual necessity of creating citation practices that are both human understandable and machine-actionable.

1. Importance
Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

2. Credit and Attribution
Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

3. Evidence
In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited.

4. Unique Identification
A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

5. Access
Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.

6. Persistence
Unique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe.

7. Specificity and Verifiability
Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited.

8. Interoperability and Flexibility
Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities.

Source: Joint Declaration of Data Citation Principles [1]

[Figure courtesy Curt Tilmes, NASA]

Contributor roles (Project CRediT taxonomy):
1. Conceptualization
2. Methodology
3. Software
4. Validation
5. Formal analysis
6. Investigation
7. Resources
8. Data curation
9. Writing – original draft
10. Writing – review & editing
11. Visualization
12. Supervision
13. Project administration
14. Funding acquisition

Initial Conclusions
• Much data use and production occur outside of the regular scholarly discourse (i.e. the literature).
• The principles of data citation are strong, and the increasing use of persistent identifiers is a significant advance, but we must think beyond bibliographic-style citation.
• It is important to have a citation approach that can readily be accepted by scholarly publishers, but we should not assume that that approach addresses other concerns. We must separate the various concerns around citation by considering multiple use cases.
• Indeed we must consider use cases in the first place! What problem are we truly trying to solve?
• Other disciplines are taking a more nuanced look at these issues outside the realm of publication. Geosciences should too.

[Figure: The logical model of the Payback Framework. Stage 0: ID Topic; Stage 1: Inputs; Stage 2: Research process; Stage 3: Primary outputs; Stage 4: Secondary outputs (policy, products); Stage 5: Adoption; Stage 6: Final outcomes. Other labels: Stock or reservoir of knowledge; Interface A (project specification); Interface B (dissemination); direct impact from processes and outputs to adoption; direct feedback paths; the political, professional, and industrial environment and wider society.]

Image credits: projectcredit.net (contributor taxonomy): J. Scott, L. Allen, A. Brand et al.; BioMed Central Design (badge designs) / Creative Commons 4.0
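As an illustration only, the concerns raised in use cases 1 and 2 and in Principle 7 (who did what, which version and subset was used, for what purpose, and whether the retrieved bytes match what was cited) could be captured in a small machine-actionable record. The sketch below is not part of the poster or of any standard: the class `DataCitation`, the helper `cite`, the field names, and the example identifier are all hypothetical, and a SHA-256 checksum stands in for whatever fixity mechanism a repository actually uses.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Set

# The 14 contributor roles listed on the poster (Project CRediT taxonomy).
CREDIT_ROLES = frozenset({
    "Conceptualization", "Methodology", "Software", "Validation",
    "Formal analysis", "Investigation", "Resources", "Data curation",
    "Writing – original draft", "Writing – review & editing",
    "Visualization", "Supervision", "Project administration",
    "Funding acquisition",
})


@dataclass
class DataCitation:
    """Hypothetical citation record; all field names are assumptions."""
    identifier: str  # persistent identifier of the data set
    version: str     # precise version that was used
    subset: str      # description of the granular portion that was used
    purpose: str     # e.g. "forcing", "assimilation", "boundary conditions"
    sha256: str      # fixity: checksum of the bytes actually used
    contributors: Dict[str, Set[str]] = field(default_factory=dict)

    def credit(self, person: str, role: str) -> None:
        """Record that a person contributed in a given taxonomy role."""
        if role not in CREDIT_ROLES:
            raise ValueError(f"unknown contributor role: {role!r}")
        self.contributors.setdefault(person, set()).add(role)

    def verify(self, data: bytes) -> bool:
        """Return True if retrieved bytes match the checksum cited."""
        return hashlib.sha256(data).hexdigest() == self.sha256


def cite(identifier: str, version: str, subset: str, purpose: str,
         data: bytes) -> DataCitation:
    """Build a citation record, computing the checksum at citation time."""
    digest = hashlib.sha256(data).hexdigest()
    return DataCitation(identifier, version, subset, purpose, digest)


if __name__ == "__main__":
    payload = b"time,temp\n0,271.3\n"  # made-up stand-in for a data subset
    record = cite("doi:10.0000/hypothetical", "v1.2",
                  "first record only", "forcing", payload)
    record.credit("A. Curator", "Data curation")
    print(record.verify(payload))         # same bytes as cited
    print(record.verify(payload + b"x"))  # altered bytes
```

A record like this addresses reference and fixity only. As the poster argues, full provenance also needs a description of the processes and tools that produced and consumed the data, which this sketch does not attempt.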