The State of
Open Research Data
Ross Mounce, Ph.D. (@RMounce)
Postdoc, University of Bath
November 15, 2014
These slides are on Slideshare here: bit.ly/stateofdata
All textual content is
Disclaimer
Summarising the state of open data is HARD
I'd love to have better data
& better evidence for this talk.
Disclaimer #2
Whenever I talk about data in this talk, assume I'm
talking about non-sensitive data e.g.
NOT medical data
NOT bio-weapons research data
et cetera...
Outline
● What is open data?
● The evolution of data availability
● Where are we now?
● Some goals & aspirations for the future
What exactly is open data?
From http://opendefinition.org/,
see http://opendefinition.org/od/ for more detail
Open means anyone can
freely access, use, modify,
and share for any purpose
(subject, at most, to requirements
that preserve provenance and
openness)
Centralised Data Centres
The Cambridge Crystallographic Data Centre, est. 1965
It maintains the Cambridge Structural Database **
** Not open data sensu stricto …but I'll leave that to Peter Murray-Rust to explain
Data Sharing (by snail mail)
e.g. “The full profile listings are on floppy disks
which are available upon request”
Fernholz et al (1989) A survey of measurements and measuring
techniques in rapidly distorted compressible turbulent boundary layers.
Bilofsky & Burks (1988)
Nucleic Acids Research v16 n5
“The author will provide the
accession number to the
PROCEEDINGS [PNAS]
office to be included in a
footnote to the published
paper.”
1989
Reproducible research
Jon Claerbout,
Jon Buckheit & David Donoho, 1995
Community agreements to share data
the Bermuda Principles for sharing DNA seq. data
● Automatic release of sequence
assemblies larger than 1 kb
(preferably within 24 hours).
● Immediate publication of finished
annotated sequences.
● Aim to make the entire sequence
freely available in the public domain
Supplementary Data (Online)
Chen et al (1999)
Fluorescence Polarization in
Homogeneous Nucleic Acid
Analysis. Genome Research
“Numerical values for the
data are available as online
supplementary material at
http://www.genome.org.”
“Each custodian of data on plant traits will retain the
right to be informed of any TRY activity that may involve
his/her data, and will have the opportunity to negotiate
whether his/her data can be used, and whether general
guidelines of authorship need to be modified in that
particular case
Custodians retain the rights to withdraw their data
at any time.”
Your data is NOT 'too big' to share
http://gigadb.org/dataset/100124
39 Gigabytes (GB)
of MRI scans
By sharing data we can see further
Data (& code) are the building
blocks of science
Shared, re-used data allow us to
more rigorously test hypotheses;
“to see further”
...and to do it all more quickly and
easily.
Real problems of non-open data:
GBIF & biodiversity data
Desmet, P. (2013) Showing you this map of aggregated bullfrog occurrences
would be illegal http://peterdesmet.com/posts/illegal-bullfrogs.html
Many, many options for opening data
GenBank,
SRA,
1000s more!
http://www.crystallography.net/
...and getting more credit for it with
'Data paper' journals
http://www.mdpi.com/journal/data/about
Intelligent data papers allow databases
to automatically pull in your data
Many publishers (e.g. Pensoft) intelligently
mark up data papers so that the data can
be automatically ingested into appropriate
databases on the day of publication!
Data sharing benefits authors & re-users
Piwowar HA, Vision TJ. (2013)
Data reuse and the open data
citation advantage. PeerJ
1:e175
“...open data citation
benefit for this sample
to be 9%”
relative to papers
providing no public
data, for gene
expression microarray
data
10.7717/peerj.175/fig-2
See also previous work by
Piwowar:
10.1371/journal.pone.0000308
Citation
Advantage
Those who share data, do better science
Wicherts, J. M., Bakker, M. & Molenaar, D. (2011)
Willingness to share research data is related to the
strength of the evidence and the quality of reporting of
statistical results. PLoS ONE 6, e26828.
http://dx.doi.org/10.1371/journal.pone.0026828
The authors examined psychology papers for the quality of statistical
reporting & asked the authors of those papers for the full data underlying
the reported results. Generally, those who shared had more statistically
robust, reproducible results.
“Email the author for data” - doesn't work
Wicherts JM, Borsboom D,
Kats J, Molenaar D (2006)
The poor availability of
psychological research
data for reanalysis.
American Psychologist 61:
726–728
A well-known problem, which
I myself have also faced
many times!!!
Many legacy journals
unfortunately still pretend
that “email the author” is
acceptable.
Best practice open data is time consuming
(but still worth the extra effort!)
Emilio M. Bruna recently provided an estimate of the amount of
time it took him to prepare & upload open data related to a
publication to figshare & Dryad.
http://brunalab.org/blog/2014/09/04/the-opportunity-cost-of-my-openscience-was-35-hours-690/
11 hours & $90 (for Dryad)
Providing open-source code was the most time consuming part (25.5 hours),
and Open Access publication the most expensive ($600).
THIS IS WHERE WE ARE (mostly)
Most research data would get
ZERO (not available online)
Or just ONE star
http://5stardata.info/
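The star ratings above can be sketched in a few lines of Python. This is only an illustration of how the 5-star scheme stacks up (each star requires all the ones before it); the function name and parameter names are hypothetical and are not part of the slides or of 5stardata.info.

```python
def five_star_rating(online_open_licence, structured, open_format,
                     uses_uris, linked_to_other_data):
    """Return a 0-5 star rating; a star only counts if every earlier one is met.

    Hypothetical helper for illustration. The criteria follow the 5-star
    open data scheme:
      1. available online under an open licence
      2. available as structured data
      3. in a non-proprietary open format (e.g. CSV)
      4. uses URIs to identify things
      5. linked to other data
    """
    criteria = [online_open_licence, structured, open_format,
                uses_uris, linked_to_other_data]
    stars = 0
    for met in criteria:
        if not met:
            break  # stars are cumulative, so stop at the first unmet criterion
        stars += 1
    return stars
```

Under these assumptions, data that is not available online gets zero stars however well it is formatted, and a CSV file published under an open licence gets three.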
3-star open research data
is achievable and desirable
This is what research data publication
should be aiming for in the short term.
Publishing .csv / non-proprietary open data is
NOT actually that hard!
http://5stardata.info/
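To give a feel for how little is involved, here is a minimal sketch that writes a small table to a plain .csv file using only the Python standard library. The file name, column names and values are made up for illustration; they do not come from the talk.

```python
import csv
import os
import tempfile


def write_csv(path, header, rows):
    """Write a header line and data rows to a UTF-8 encoded CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_csv(path):
    """Read the CSV file back as a list of rows (every value is a string)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle))


if __name__ == "__main__":
    # Hypothetical measurements, purely for illustration.
    header = ["specimen_id", "length_mm"]
    rows = [["A1", 12.5], ["A2", 13.1]]
    path = os.path.join(tempfile.mkdtemp(), "measurements.csv")
    write_csv(path, header, rows)
    print(read_csv(path))
```

A file like this can be opened by any spreadsheet or statistics package, which is the point of the third star.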
Imagine a world where no-one shared
their data (post-publication)
How would we know what was truth & what was lies / fraud / error?
Imagine the waste of time & resources
if everyone had to re-generate data de novo every time
How would we make progress?
Predictions for the (near) future
● Research funding bodies will tighten up their rules to ensure
immediate post-publication data sharing. No embargoes, no bullshit.
● If no published data comes from your funded research, it will negatively
affect your future chances of funding.
● Research institutions will significantly improve research data
management training for ALL staff & students, old and new alike
● Good journals will strictly enforce mandatory data sharing.
Journals that don't will get a bad reputation for irreproducible research.
● CC0 for data will become the de facto standard. Everyone will realise
that legal protection under copyright is completely the wrong tool for
ensuring the ethical use of data & appropriate authorship assignment.
Thank you!
Happy to answer all questions
ross@righttoresearch.org
@RMounce
www.righttoresearch.org
www.sparc.arl.org

The State of Open Research Data - OpenCon 2014
