Varsha Khodiyar, PhD
Data Curation Editor, Scientific Data
Nature Publishing Group
@varsha_khodiyar
@scientificdata
Tweet with #SDJPN16
Gaining credit for sharing research data
Data publishing with Scientific Data
RIKEN Center for Life Science Technologies 4th March 2016
My background
• Joined Scientific Data in October 2014
• Professional data curator since 2003
• PhD in Molecular Biology from the University of Leicester
• Contributed to the Human Genome Project as member of the Human Gene Nomenclature Committee (HGNC)
• Gene Ontology curator for 8 years, at University College London, UK
• 3 years of open data publishing experience
Why share research data?
Generating research data is expensive
Just 18.1% of NIH grant applications were funded in 2014*
• Hours spent writing grants?
• Hours spent reviewing grants?
Resources are finite/expensive
• Modified animals
• Specialized reagents
Time and effort taken in the laboratory to generate
good, valid data
* report.nih.gov/success_rates/Success_ByIC.cfm
Irreproducibility of published science
Figure 1 - Ioannidis JPA. et al. Repeatability of published microarray gene
expression analyses. Nature Genetics 41, 149–55 (2009) doi:10.1038/ng.295
Withholding data impacts on human health
Clinical study reports, detailed data and software code available at Dryad
Digital Repository doi:10.5061/dryad.bv8j6 and www.Study329.org
Sharing data promotes
• Diversity of analyses and opinion
• New research
  • testing of new hypotheses
  • new analysis methods
  • meta-analyses to create new datasets
  • studies on data collection methods
• Education of new researchers
• Increased return on investment in research
Vickers AJ: Whose data set is it anyway? Sharing raw data from randomized trials. Trials 2006, 7:15
Hrynaszkiewicz I, Altman DG: Towards agreement on best practice for publishing raw clinical trial data. Trials 2009, 10:17
Researchers already share data
• Most researchers are sharing data, and using the data of others
• Direct contact between researchers (on request) is a common way of sharing data
• Repositories are the second most common method of sharing
Kratz and Strasser (2015) doi: 10.1371/journal.pone.0117619
Some problems…
• Sharing upon request relies heavily on trust
• Informally stored data associated with published works disappears at a
rate of ~17% per year (Vines et al. 2014; doi: 10.1016/j.cub.2013.11.014)
• Datasets not referenced in a manuscript are essentially invisible (a.k.a. “dark data”)
• If data are available, they are often not interpretable or reusable
because sufficient detail is not included
• Data producers do not get appropriate credit for their work
www.nature.com/scientificdata
Credit – Scholarly credit for publishing data; all publications are indexed
and citeable.
Reuse – Standardized and detailed descriptions enable easier reuse of
published research data.
Quality – Rigorous peer-review on technical quality and reusability.
Editorial Board of experts in their field maintain community standards.
Discovery – Curated, machine-readable metadata for dataset discovery.
Validated links to published data in each article.
Open – Use of CC-BY licence for articles and CC0 for metadata. Promote
use of open licences for published data.
Service – Commitment to excellent service for authors and readers.
What is a Data Descriptor?
Data Descriptors have human and machine readable components
• Human readable representation of study, i.e. article (HTML & PDF)
• Machine readable representation of study, i.e. metadata
Comparison of Data Descriptor to traditional article
Traditional article:
• Synthesis
• Analysis
• Conclusions
Data Descriptor:
• What did I do to generate the data?
• How was the data processed?
• Where is the data?
• Who did what and when?
Data Descriptors contain methods and technical analyses supporting the quality of the measurements. They do not contain tests of new scientific hypotheses.
What types of data can be published?
• Decades-old dataset
• Standalone dataset
• Data that has been used in an analysis article
• Large consortium dataset
• Data from a single experiment
• Data that the researcher finds valuable and that others might find useful too
• Data associated with a high impact analysis article
When can a Data Descriptor be published?
• After data analysis has been published
• Before the analysis has been published
• Publication alongside analysis article
• Authors not intending to analyse data
Data Descriptors can be submitted and published at any point in the research workflow, i.e. whenever it makes most sense for your data
Scientific Data accepts submissions from all quantitative research disciplines
Helping authors find the right place for their data
Scientific Data’s Repository List
Browse our recommended data repositories online.
• We currently list almost 80 repositories, across biological, medical, physical and social sciences
• When required, we provide guidance to authors on the best place to store their data
www.nature.com/sdata/data-policies/repositories
Generation of machine readable metadata
• We want to capture metadata about the dataset being described in each Data Descriptor
• The manuscript captures human readable metadata needed for data reuse
• The curated metadata records capture machine readable metadata needed for machine based data discovery
Metadata at Scientific Data
ISA-Tab format for machine readable metadata
• Study workflow
• Key sample characteristics needed for data discovery
• Relates samples to data files
• Shows location of dataset
• Uses controlled vocabularies and ontologies (where possible)
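As a rough illustration of how tab-delimited metadata can relate samples to data files, the sketch below joins a study table to an assay table on a shared sample name. It is a minimal sketch, not Scientific Data's curation tooling: the two tables are heavily simplified, the column headers follow ISA-Tab naming conventions as an assumption, and every sample value and file name is invented.

```python
import csv
import io

# Hypothetical, heavily simplified study and assay tables. Real ISA-Tab files
# carry many more columns; every value below is invented for illustration.
STUDY_TSV = (
    "Source Name\tCharacteristics[organism]\tCharacteristics[organism part]\tSample Name\n"
    "donor1\tHomo sapiens\tliver\tsample1\n"
    "donor2\tHomo sapiens\tkidney\tsample2\n"
)
ASSAY_TSV = (
    "Sample Name\tRaw Data File\n"
    "sample1\tsample1_reads.fastq\n"
    "sample2\tsample2_reads.fastq\n"
)


def read_table(text):
    """Parse tab-delimited text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def link_samples_to_files(study_rows, assay_rows):
    """Join sample characteristics to data files on the shared Sample Name."""
    files = {row["Sample Name"]: row["Raw Data File"] for row in assay_rows}
    prefix = "Characteristics["
    linked = []
    for row in study_rows:
        # Strip the "Characteristics[...]" wrapper to get the bare attribute name.
        characteristics = {
            key[len(prefix):-1]: value
            for key, value in row.items()
            if key.startswith(prefix)
        }
        linked.append({
            "sample": row["Sample Name"],
            "characteristics": characteristics,
            "data_file": files.get(row["Sample Name"]),
        })
    return linked


records = link_samples_to_files(read_table(STUDY_TSV), read_table(ASSAY_TSV))
for record in records:
    print(record)
```

Keeping the sample characteristics and the file locations in separate tables, linked by a sample name, is what lets one sample point to several data files without repeating its description.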
Use of community endorsed ontologies and controlled vocabularies
Controlled vocabulary = list of standardized phrases of scientific concepts
Ontology = controlled vocabulary with defined relationships between terms
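The difference between the two definitions can be sketched in a few lines. This is an illustrative toy, not a real ontology: the terms and the single "is_a" relationship type are assumptions chosen for the example, and community ontologies define many more terms and relationship types.

```python
# Controlled vocabulary: a flat list of approved phrases (hypothetical terms).
CONTROLLED_VOCABULARY = {"liver", "kidney", "organ"}

# Ontology: the same terms plus a defined relationship between them,
# here a single "is_a" link from each term to its broader parent.
ONTOLOGY_IS_A = {
    "liver": "organ",
    "kidney": "organ",
    "organ": None,
}


def is_valid_term(term):
    """A vocabulary can only say whether a phrase is an approved term."""
    return term in CONTROLLED_VOCABULARY


def ancestors(term):
    """Walk the is_a links upward and return every broader term."""
    found = []
    parent = ONTOLOGY_IS_A.get(term)
    while parent is not None:
        found.append(parent)
        parent = ONTOLOGY_IS_A.get(parent)
    return found


def matches(term, query):
    """An annotation matches a query if it equals it or is narrower than it."""
    return term == query or query in ancestors(term)


print(is_valid_term("liver"))
print(ancestors("liver"))
print(matches("kidney", "organ"))
```

The relationships are what make an ontology useful for discovery: a search for a broad term can also return datasets annotated with any of its narrower terms.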
Structured Summary table from curated metadata
Investigation file
Study file
Sample characteristics reported in Structured Summary table:
Organism
Organism part
Cell line
Geographical location
Environment type
Viewing the metadata
Metadata for data discovery
Search by:
• Data Repositories
• Experiment design
• Measurements made
• Technologies used
• Factor types
• Sample Characteristics
• Organism
• Environment types
• Geographic locations
scientificdata.isa-explorer.org
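Searching by several of these fields at once amounts to filtering metadata records on facet values. The sketch below shows the idea under stated assumptions: the records, field names and the `search` helper are all hypothetical, and it does not describe how scientificdata.isa-explorer.org is implemented.

```python
# Hypothetical curated metadata records; field names and values are invented.
RECORDS = [
    {"title": "Dataset A", "repository": "Repository X",
     "organism": "Homo sapiens", "technology": "MRI"},
    {"title": "Dataset B", "repository": "Repository Y",
     "organism": "Mus musculus", "technology": "DNA sequencing"},
    {"title": "Dataset C", "repository": "Repository X",
     "organism": "Homo sapiens", "technology": "DNA sequencing"},
]


def search(records, **facets):
    """Return the records whose metadata matches every requested facet value."""
    return [
        record for record in records
        if all(record.get(field) == value for field, value in facets.items())
    ]


human_sets = search(RECORDS, organism="Homo sapiens")
human_sequencing = search(RECORDS, organism="Homo sapiens", technology="DNA sequencing")
print([record["title"] for record in human_sets])
print([record["title"] for record in human_sequencing])
```

Exact matching on facet values only works when the values are consistent across records, which is one reason to curate metadata with controlled vocabularies.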
Citing Data
Citing my own data
1. In the article text
2. In the Data Citation section
Citing data I’ve reused
1. In the article text
2. In the References section
Clinical researchers support sharing, but…
Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S, Krumholz HM, Strait KM, Ross JS:
Sharing of clinical trial data among trialists: a cross sectional survey. BMJ 2012;345:e7570
• Sharing de-identified data via repositories should be required (236 respondents, 74%)
• Investigators should share de-identified data on request (229 respondents, 72%)
…clinical data producers have specific concerns
Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S, Krumholz HM, Strait KM, Ross JS: Sharing of
clinical trial data among trialists: a cross sectional survey. BMJ 2012;345:e7570
Example initiatives for sharing clinical data
Yale Open Data Access (YODA) & Clinical Study Data
Request (CSDR) projects:
• Data Use Agreements (DUAs)
• Controlled access environment
• Scientific validity of reanalysis checked
• Independent governance
• Data anonymisation checks
http://yoda.yale.edu/
https://www.clinicalstudydatarequest.com/
Clinical data publication at Scientific Data
• Identify repositories able to archive clinical data
• Work with identified repositories to establish workflows for peer review and publication, whilst maintaining patient privacy
• Facilitate a specialist peer review process for clinical data, for example by ensuring peer reviewers have agreed to the terms of the data use agreement
Hrynaszkiewicz, I., Khodiyar, V., Hufton, A. & Sansone, S. A. Publishing descriptions of non-public clinical datasets: guidance for researchers, repositories, editors and funding organisations. bioRxiv http://dx.doi.org/10.1101/021667 (2015).
A robust data-on-request workflow?
Published Data Descriptor with clinical data
• Data Records section details how to access the data
• Links to restricted access data
• Data Citations link to repository
• Data files requiring permission to access
• Freely accessible data files
Data Reuse stories
Data reuse by (some of) the same researchers
Data reuse by other researchers in the same field
“The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.”
Professor Daniele Marinazzo
According to Google Scholar, cited 43 times! (February 2016)
Data reuse and citation by researchers
www.bbc.co.uk/news/science-environment-33057402
Data reuse by the non-research community
http://www.nytimes.com/interactive/2014/12/30/science/history-of-ebola-in-24-outbreaks.html
Data Descriptors…
• …enable you to gain scholarly credit for your data gathering efforts.
• …are human AND machine readable.
• …can be published with, or independently of, an analysis article.
• …can be published at any point in the research workflow.
• …allow the publication and discovery of clinical data, whilst maintaining your patients’ privacy.
• …result in greater reuse and citation by fellow members of your research community.
• …extend the impact of your research data by enabling access to and reuse by the non-research community.
Get more from your data
• Preserve it
• Encourage reuse
• Get credit for it
Visit nature.com/sdata
Email scientificdata@nature.com
Tweet @ScientificData #SDJPN16

Gaining credit for sharing research data

  • 1. Varsha Khodiyar, PhD Data Curation Editor, Scientific Data Nature Publishing Group @varsha_khodiyar @scientificdata Tweet with #SDJPN16 Gaining credit for sharing research data Data publishing with Scientific Data RIKEN Center for Life Science Technologies 4th March 2016
  • 2. My background • Joined Scientific Data in October 2014 • Professional data curator since 2003 • PhD in Molecular Biology from the University of Leicester • Contributed to the Human Genome Project as member of the Human Gene Nomenclature Committee (HGNC) • Gene Ontology curator for 8 years, at University College London, UK • 3 years of open data publishing experience
  • 4. Generating research data is expensive Just 18.1% of NIH grant applications funded in 2014* • Hours spent writing grants? • Hours spent reviewing grants? Resources are finite/expensive • Modified animals • Specialized reagents Time and effort taken in the laboratory to generate good, valid data * report.nih.gov/success_rates/Success_ByIC.cfm
  • 5. Irreproducibility of published science Figure 1 - Ioannidis JPA. et al. Repeatability of published microarray gene expression analyses. Nature Genetics 41, 149–55 (2009) doi:10.1038/ng.295
  • 6. Withholding data impacts on human health Clinical study reports, detailed data and software code available at Dryad Digital Repository doi:10.5061/dryad.bv8j6 and www.Study329.org
  • 7. Sharing data promotes • Diversity of analyses and opinion • New research • testing of new hypotheses • new analysis methods • meta-analyses to create new datasets • studies on data collection methods • Education of new researchers • Increased return on investment in research Vickers AJ: Whose data set is it anyway? Sharing raw data from randomized trials. Trials 2006, 7:15 Hrynaszkiewicz I, Altman DG: Towards agreement on best practice for publishing raw clinical trial data. Trials 2009, 10:17
  • 8. Researchers already share data • Most researchers are sharing data, and using the data of others • Direct contact between researchers (on request) is a common way of sharing data • Repositories are second most common method of sharing Kratz and Strasser (2015) doi: 10.1371/journal.pone.0117619
  • 9. Some problems… • Sharing upon request relies heavily on trust • Informally stored data associated with published works disappears at a rate of ~17% per year (Vines et al. 2014; doi: 10.1016/j.cub.2013.11.014) • Datasets not referenced in a manuscript are essentially invisible (a.k.a. “dark data”) • If data are available, they are often not interpretable or reusable because sufficient detail is not included • Data producers do not get appropriate credit for their work
  • 11. Credit – Scholarly credit for publishing data; all publications are indexed and citeable. Reuse – Standardized and detailed descriptions enable easier reuse of published research data. Quality – Rigorous peer-review on technical quality and reusability. Editorial Board of experts in their field maintains community standards. Discovery – Curated, machine-readable metadata for dataset discovery. Validated links to published data in each article. Open – Use of CC-BY licence for articles and CC0 for metadata. Promote use of open licences for published data. Service – Commitment to excellent service for authors and readers.
  • 12. What is a Data Descriptor?
  • 13. Data Descriptors have human and machine readable components • Human readable representation of study, i.e. article (HTML & PDF) • Machine readable representation of study, i.e. metadata
  • 14. Comparison of Data Descriptor to traditional article • Traditional article: Synthesis, Analysis, Conclusions • Data Descriptor: What did I do to generate the data? How was the data processed? Where is the data? Who did what and when? • Data Descriptors contain methods and technical analyses supporting the quality of the measurements • Data Descriptors do not contain tests of new scientific hypotheses
  • 15. What types of data can be published? • Decades old dataset • Standalone dataset • Data that has been used in an analysis article • Large consortium dataset • Data from a single experiment • Data associated with a high impact analysis article • Data that the researcher finds valuable and that others might find useful too
  • 16. When can a Data Descriptor be published? • After data analysis has been published • Before the analysis has been published • Publication alongside analysis article • Authors not intending to analyse data Data Descriptors can be submitted and published at any point in the research workflow, i.e. whenever it makes most sense for your data
  • 17. Scientific Data accepts submissions from all quantitative research disciplines
  • 18. Helping authors find the right place for their data
  • 19. Scientific Data’s Repository List Browse our recommended data repositories online. • We currently list almost 80 repositories, across biological, medical, physical and social sciences • When required, we provide guidance to authors on the best place to store their data www.nature.com/sdata/data-policies/repositories
  • 20. Generation of machine readable metadata
  • 21. Metadata at Scientific Data • We want to capture metadata about the dataset being described in each Data Descriptor • The manuscript captures human readable metadata needed for data reuse • The curated metadata records capture machine readable metadata needed for machine based data discovery
  • 22. ISA-Tab format for machine readable metadata • Study workflow • Key sample characteristics needed for data discovery • Relates samples to data files • Shows location of dataset • Uses controlled vocabularies and ontologies (where possible)
  • 23. Use of community endorsed ontologies and controlled vocabularies • Controlled vocabulary = list of standardized phrases of scientific concepts • Ontology = controlled vocabulary with defined relationships between terms
  • 24. Structured Summary table from curated metadata • Investigation file • Study file Sample characteristics reported in Structured Summary table: • Organism • Organism part • Cell line • Geographical location • Environment type
  • 26. Metadata for data discovery Search by: • Data Repositories • Experiment design • Measurements made • Technologies used • Factor types • Sample Characteristics • Organism • Environment types • Geographic locations scientificdata.isa-explorer.org
  • 28. Citing my own data 1. In the article text 2. In the Data Citation section
  • 29. Citing data I’ve reused 1. In the article text 2. In the References section
  • 30. Clinical researchers support sharing, but… Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S, Krumholz HM, Strait KM, Ross JS: Sharing of clinical trial data among trialists: a cross sectional survey. BMJ 2012;345:e7570 • Sharing de-identified data via repositories should be required (236 respondents, 74%) • Investigators should share de-identified data on request (229 respondents, 72%)
  • 31. …clinical data producers have specific concerns Rathi V, Dzara K, Gross CP, Hrynaszkiewicz I, Joffe S, Krumholz HM, Strait KM, Ross JS: Sharing of clinical trial data among trialists: a cross sectional survey. BMJ 2012;345:e7570
  • 32. Example initiatives for sharing clinical data Yale Open Data Access (YODA) & Clinical Study Data Request (CSDR) projects: • Data Use Agreements (DUAs) • Controlled access environment • Scientific validity of reanalysis checked • Independent governance • Data anonymisation checks http://yoda.yale.edu/ https://www.clinicalstudydatarequest.com/
  • 33. Clinical data publication at Scientific Data • Identify repositories able to archive clinical data • Work with identified repositories to establish workflows for peer review and publication, whilst maintaining patient privacy • Facilitate specialist peer review process for clinical data, for example ensure peer reviewers have agreed to terms of data use agreement Hrynaszkiewicz, I., Khodiyar, V., Hufton, A. & Sansone, S. A. Publishing descriptions of non-public clinical datasets: guidance for researchers, repositories, editors and funding organisations. bioRxiv http://dx.doi.org/10.1101/021667 (2015).
  • 35. Published Data Descriptor with clinical data Data Records section details how to access the data
  • 36. Links to restricted access data Data Citations link to repository Data files requiring permission to access Freely accessible data files
  • 38. Data reuse by (some of) the same researchers
  • 39. Data reuse by other researchers in the same field “The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.” Professor Daniele Marinazzo
  • 40. Data reuse and citation by researchers According to Google Scholar, cited 43 times! (February 2016)
  • 42. Data reuse by the non-research community http://www.nytimes.com/interactive/2014/12/30/science/history-of-ebola-in-24-outbreaks.html
  • 43. Data Descriptors… • …enable you to gain scholarly credit for your data gathering efforts. • …are human AND machine readable. • …can be published with, or independently of, an analysis article. • …can be published at any point in the research workflow. • …allow the publication and discovery of clinical data, whilst maintaining your patients’ privacy. • …result in greater reuse and citation by fellow members of your research community. • …extend the impact of your research data by enabling access to and reuse by the non-research community.
  • 44. Get more from your data Preserve it Encourage reuse Get credit for it Visit nature.com/sdata Email scientificdata@nature.com Tweet @ScientificData #SDJPN16
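
Slide 23 distinguishes a controlled vocabulary (a list of standardized phrases) from an ontology (a controlled vocabulary with defined relationships between terms). The following is a minimal sketch of that distinction, not Scientific Data's curation tooling. The terms, the single `is_a` relationship type, and the function names are all hypothetical and chosen only for illustration.

```python
# Hypothetical terms for illustration; not taken from any real ontology.
CONTROLLED_VOCABULARY = {"anatomical entity", "organ", "liver", "brain"}

# An ontology adds defined relationships between the same terms.
# Assumption: one relationship type, "is_a", stored as child -> parent.
IS_A = {
    "organ": "anatomical entity",
    "liver": "organ",
    "brain": "organ",
}


def is_valid_term(term: str) -> bool:
    """A controlled vocabulary can only answer: is this phrase standardized?"""
    return term in CONTROLLED_VOCABULARY


def ancestors(term: str) -> list[str]:
    """Walk the is_a chain upwards, nearest parent first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def matches(annotation: str, query: str) -> bool:
    """An ontology lets a broad query find more specific annotations."""
    return annotation == query or query in ancestors(annotation)


if __name__ == "__main__":
    print(is_valid_term("liver"))         # standardized phrase
    print(is_valid_term("Liver tissue"))  # free text, not in the vocabulary
    print(ancestors("liver"))
    print(matches("liver", "organ"))      # broad query, specific annotation
```

With only the vocabulary, a search for "organ" would miss a sample annotated "liver". With the relationships, the broader query can retrieve it, which is one reason curated metadata favours ontology terms where possible.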