Talk given at the Data Visualisation and the Future of Academic Publishing event. https://www.eventbrite.com/e/data-visualisation-and-the-future-of-academic-publishing-tickets-25372801733?password=dataviz
Data Publication:
Discover, Explore, Visualise
Alejandra Gonzalez-Beltran, PhD
Research Lecturer
Oxford e-Research Centre
University of Oxford
Data Visualisation and the Future of Academic Publishing
University of Oxford and Oxford University Press
June 10th 2016
@alegonbel
Philippe Rocca-Serra, PhD
Senior Research Lecturer
Alejandra Gonzalez-Beltran, PhD
Research Lecturer
Milo Thurston, DPhil
Research Software Engineer
Massimiliano Izzo, PhD
Research Software Engineer
Peter McQuilton, PhD
Knowledge Engineer
Our main areas of research and activity:
• Enabling reproducible research through…
• Data collection, curation, representation etc.
• Data publication
• Data provenance
• Development of software, infrastructure
• Open, community ontologies and standards
• Semantic web / linked data
• Training
Communities we work with/for:
Allyson Lister, PhD
Knowledge Engineer
Eamonn Maguire, DPhil
Software Engineer contractor
David Johnson, PhD
Research Software Engineer
Susanna-Assunta Sansone, PhD
Principal Investigator, Associate Director
Outline
• Challenges associated with scholarly data
• Importance of all research outputs / metadata
• Reproducibility crisis
• Experiments description
• Data availability
• Data publication
• Springer Nature Scientific Data
• Discover, Explore, Visualise Scholarly Data
• Scientific Data ISA-explorer
• Outputs are multi-dimensional, diverse, not always well cited / stored
o Software, code, workflows etc.; hard(er) to get hold of
• Data often distributed and fragmented to fit (siloed) databases
o Without enough information for others to understand it
• Uneven level of detail and annotation across different databases
o Specialized, generalist, public and institutional
• Data curation activities are perceived as time-consuming
o Collection and harmonization of detailed methods and experimental steps is done/rushed at publication stage
But… shared data is not always understandable, reusable
Importance of
- avoid selective reporting
- experimental design
- statistical power
- statistical analysis
- code/methods availability
- data availability
• Incentive, credit for sharing
o Big and small data
o Unpublished data
o Long tail of data
o Curated aggregation
• Peer review of data
• Value of data vs. analysis
• Discoverability and reusability
o Complementing community databases
Growing number of data papers and data journals
nature.com/scientificdata
Honorary Academic Editor
Susanna-Assunta Sansone, PhD
Managing Editor
Andrew L Hufton, PhD
Editorial Curator
Varsha Khodiyar
Publisher
Iain Hrynaszkiewicz
A new open-access, online-only publication for descriptions of scientifically valuable datasets
Supported by
Scientific hypotheses:
Synthesis
Analysis
Conclusions
Methods and technical analyses supporting the quality of the measurements:
What did I do to generate the data?
How was the data processed?
Where is the data?
Who did what, and when?
Relation with traditional articles – content
Citation of and links to data files and databases
Credit for data producers
A new article type
A new category of publication that provides detailed descriptors of scientifically valuable datasets
Mandates open data, without unnecessary restrictions, as a condition of submission
Links to data repositories to access the data
Assay details
Summary
• Challenges associated with scholarly data
• Importance of all research outputs / metadata
• Reproducibility crisis
• Experiments description
• Data availability
• Data publication
• Springer Nature Scientific Data
• Discover, Explore, Visualise Scholarly Data
• Scientific Data ISA-explorer