Presentation to Sport Data Valley, given at a TU Delft Library meeting, on the value of Data Stewardship and Curation for those working with data from elite and public sport
May 2016
Research Data Management & Data Stewardship in Sport Science
1. Presentation to Sport Data Valley meeting
May 2016
Alastair Dunning
3TU.Datacentrum hosted at TU Delft Library
@alastairdunning, a.c.dunning@tudelft.nl
Winning the Tour de France, Research Data and Data Stewardship
2. In the 2015 Tour de France, Chris Froome won Stage 10 on Bastille Day, which finished on a hors catégorie climb to 1,610 m, by 59 seconds.
4. Such criticism has been around since Froome shot to fame in 2012 and then won the Tour de France in 2013.
5. In response, Froome’s team, Team Sky, published the ‘power data’ behind his performance.
6. Later in the year, Froome underwent further lab testing, and the lab data was released.
7. The results showed that much of Froome’s improvement was down to weight loss (more than 5 kg). Since then, criticism of Froome has diminished.
8. What happened to Team Sky and Chris Froome is happening across scientific research.
9. How does any scientist look after their data, not just to prove arguments to others but to prove them to themselves at a later time?
10. In a digital age, with data readily available, how does science verify and reproduce the claims it makes?
11. This has led to the fields of research data management and data stewardship.
12. I would urge anybody creating or using data as evidence to start thinking about these issues.
13. Shared values behind Data Stewardship
• The safe storage and protection of intellectual capital developed by scientists
• Best practice in ensuring scientific arguments are replicable in the long term
• Better exposure of the work of scientists and improved citation rates
• Improved practices for meeting the demands of funders, publishers and others with respect to research data
14. Safe storage and protection of intellectual capital
Around 1 in 6 researchers at Erasmus University had no idea whether their data was backed up.
56 professors in the USA agreed to have their data practices analysed: “a majority of them had experienced the loss of at least one work-related digital object that they considered to be important in the course of their professional career.”
15. Safe storage and protection of intellectual capital
Study in Current Biology: The Availability of Research Data Declines Rapidly with Article Age
“We examined the availability of data from 516 studies between 2 and 22 years old”
“The odds of a data set being reported as extant fell by 17% per year”
“Policies mandating data archiving at publication are clearly needed”
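To make the 17% annual fall in the study quoted above more concrete, here is a minimal sketch. It assumes the fall compounds multiplicatively, so the odds are multiplied by 0.83 for every extra year of article age. The function name `relative_odds` is hypothetical and for illustration only; it works with odds relative to a starting point, not with absolute probabilities, since the slide gives no baseline.

```python
def relative_odds(years: float, annual_drop: float = 0.17) -> float:
    """Odds that a data set is still extant after `years`, relative to year zero.

    Assumption: the reported 17% annual fall compounds multiplicatively,
    i.e. the odds are multiplied by (1 - annual_drop) each year.
    """
    if not 0.0 <= annual_drop < 1.0:
        raise ValueError("annual_drop must be in [0, 1)")
    return (1.0 - annual_drop) ** years


if __name__ == "__main__":
    # Article ages spanning the 2 to 22 year range examined in the study.
    for age in (2, 10, 22):
        print(f"{age:>2} years: {relative_odds(age):.3f} of the starting odds")
```

Under this assumption the odds shrink geometrically, which is why older articles so rarely still have their data available.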
17. Best practice in ensuring scientific arguments are replicable in the long term
Disproving Einstein’s Theory of Locality: Professor Ronald Hanson and his team, including featured PhD student Bas Hensen. Published in Nature.
Hanson and Hensen knew they were working on a high-impact paper, so they expected requests for the raw data, allowing others to validate the experiment and check the data for consistency. Scientists had been using this experimental method since the 1960s and the results had always been contested, so there was a tradition of sharing data related to this experiment. Hanson and Hensen therefore knew from the start that they would open up the data.
A couple of months after publication, the data is already attracting interest. In the first six months since its deposit, the first dataset has been viewed 650 times; the second dataset was viewed 56 times in its first three weeks. This matches Hensen’s expectations: he reckons it shows that nearly all of the world’s other research groups working in experimental quantum mechanics have accessed the dataset.
18. Better exposure of the academic work of scientists
“The Citation Advantage presently (at the least since 2009) amounts to papers with links to data receiving on the average 50% more citations per paper per year, than the papers without links to data.” (Astrophysics, 2012)
“Publicly available data was significantly (p = 0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication.” (Cancer microarray trials, 2007)
“Findings suggest that all three data sets are highly cited, with estimated citation counts in most cases higher than 99% of all the journal articles published in Oceanography during the same years” (Oceanography, 2014)
19. Improved practices for meeting the demands of funders, publishers and others with respect to research data
21. Services of 3TU.Datacentrum data repository
http://data.3tu.nl/repository/
• ‘Frozen’ dataset (version) for future use & long-term storage
• ‘Published’ data: visible
• Open (maximum 2-year embargo): shareable
• Persistent digital object identifier (DOI): findable and citable
• Sustainable formats: readable
• Data Seal of Approval: safe and secure
23. Every researcher can upload up to 10 GB of data a year to 3TU.Datacentrum free of charge. Depositing additional data carries a one-off cost of €4.50 per GB.
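The pricing rule above can be sketched as a small calculation. This is an illustrative estimate only, not a 3TU.Datacentrum tool: the constant and function names are hypothetical, the figures are the May 2016 ones from the slide, and it assumes that only the volume above the free 10 GB is charged, pro rata for part-gigabytes (the slide does not say how fractions of a GB are billed).

```python
FREE_ALLOWANCE_GB = 10.0  # free upload allowance per researcher per year (May 2016 figure)
PRICE_PER_GB_EUR = 4.50   # one-off charge per additional GB (May 2016 figure)


def deposit_cost_eur(gb_this_year: float) -> float:
    """Estimated one-off charge for the data one researcher deposits in a year.

    Assumption: only the volume above the free allowance is charged,
    pro rata for fractional gigabytes.
    """
    if gb_this_year < 0:
        raise ValueError("volume cannot be negative")
    billable_gb = max(0.0, gb_this_year - FREE_ALLOWANCE_GB)
    return billable_gb * PRICE_PER_GB_EUR


if __name__ == "__main__":
    for volume in (4, 10, 30):
        print(f"{volume} GB -> EUR {deposit_cost_eur(volume):.2f}")
```

For example, depositing 30 GB in one year would, under these assumptions, mean 20 billable GB at €4.50 each, or €90.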
26. Citations
Slide 2 - https://en.wikipedia.org/wiki/2015_Tour_de_France,_Stage_1_to_Stage_11#Stage_10
Slide 3 - http://www.independent.co.uk/sport/cycling/tour-de-france-2015-doping-claims-dampen-the-mood-as-chris-froome-triumphs-10417336.html
Slide 5 - http://www.teamsky.com/teamsky/home/article/59618#vYKyzhBzAIYy7BKH.97
Slide 6 - http://chrisfroome.esquire.co.uk/
Slide 14 - https://www.fosteropenscience.eu/sites/default/files/pdf/919.pdf (Erasmus); http://www.ijdc.net/index.php/ijdc/article/view/10.2.96 (Intellectual Capital at Risk, US Study); https://www.flickr.com/groups/2121762@N23/
Slide 15 - http://www.cell.com/current-biology/abstract/S0960-9822(13)01400-0; https://www.flickr.com/groups/2121762@N23/
Slide 16 - various. Type ‘Fire Lab University’ into Google!
Slide 17 - http://datacentrum.3tu.nl/en/researchers-about-3tudatacentrum/ (forthcoming); http://www.nature.com/nature/journal/v526/n7575/full/nature15759.html
Slide 18 - Belter CW (2014) Measuring the Value of Research Data: A Citation Analysis of Oceanographic Data Sets. PLoS ONE 9(3): e92590. doi:10.1371/journal.pone.0092590; Piwowar HA, Day RS, Fridsma DB (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308; Dorch B (2012) On the Citation Advantage of linking to data: Astrophysics. <hprints-00714715v2>
Slide 19 - http://ec.europa.eu/newsroom/dae/document.cfm?doc_id=15266 (EU); http://www.nwo.nl/en/policies/open+science/data+management (NWO)
Slide 21 - http://data.3tu.nl/repository/