
Tim Osborn: Research Integrity: Integrity of the published record

Tim Osborn, Reader, University of East Anglia

  1. Climate research data and research integrity. Dr Tim Osborn, Climatic Research Unit, School of Environmental Sciences, University of East Anglia. JISC Research Integrity Conference: the Importance of Good Data Management, 13 September 2011
  2. Integrity of the published research record <ul><li>Why is it important for climate research and why now? </li></ul><ul><ul><ul><li>(Of course it’s always been important and not just for this discipline) </li></ul></ul></ul><ul><li>The global warming issue: </li></ul><ul><ul><li>Scientifically challenging </li></ul></ul><ul><ul><li>Politically, socially and economically contentious </li></ul></ul><ul><ul><li>High stakes (economic and non-economic) </li></ul></ul><ul><ul><li>Under intense scrutiny </li></ul></ul>
  3. Climate change hacked emails controversy <ul><li>The integrity of our research was severely questioned </li></ul><ul><ul><li>What role did research data issues (management, sharing, etc.) play in this? </li></ul></ul><ul><ul><ul><li>Need to distinguish research integrity from perceptions of research integrity </li></ul></ul></ul><ul><ul><li>These issues probably played a rather small role </li></ul></ul><ul><ul><ul><li>Our research data and the research record were preserved </li></ul></ul></ul><ul><ul><ul><li>We “created” very little raw data, and we have an excellent record of preserving our derived data and publishing them for re-use </li></ul></ul></ul><ul><ul><li>Instead, the perception of doubt arose much more from the contents of the hacked emails and how they were interpreted </li></ul></ul>
  4. Climate change hacked emails controversy <ul><li>Improved research data management and sharing would have made little difference to the attacks on our integrity </li></ul><ul><ul><li>No difference to our critics; perhaps a small difference to the cross-over into the mainstream media </li></ul></ul><ul><li>Nevertheless, there are areas where we can improve, and we received some criticism in these areas </li></ul><ul><li>The climate science community as a whole should improve </li></ul><ul><ul><li>Data sharing for openness and for re-use </li></ul></ul><ul><ul><li>Improved data management to preserve workflows and to link articles to analyses and to data (e.g. JISC ACRID) </li></ul></ul>
  5. Managing and sharing research data: why should we improve? <ul><ul><li>Supports reproducibility (necessary) and repeatability (desirable) </li></ul></ul><ul><ul><ul><li>Maintains the (actual and perceived) integrity of research </li></ul></ul></ul><ul><ul><ul><li>Essential because high-stakes decisions must be informed by sound scientific assessment </li></ul></ul></ul><ul><ul><li>Supports further exploration of scientific findings </li></ul></ul><ul><ul><ul><li>Scientific findings that are not clear-cut (e.g. close to the threshold of statistical significance) are more sensitive to variations in data, methodological choices, assumptions, etc. </li></ul></ul></ul><ul><ul><li>Supports data re-use for other studies </li></ul></ul><ul><ul><ul><li>We are data-poor (despite > 10,000 TB of data) relative to the complexity of the climate system </li></ul></ul></ul>
  6. Sharing climate data: some challenges <ul><ul><li>Estimated numbers of climate change articles: </li></ul></ul><ul><ul><li>Total > 100,000 </li></ul></ul><ul><ul><li>In 2009 alone > 13,000, i.e. more than one per hour </li></ul></ul>Grieneisen & Zhang (2011), doi: 10.1038/nclimate1093
  7. Sharing climate data: some challenges <ul><ul><li>Data volume is already large (> 10,000 TB) </li></ul></ul><ul><ul><li>Projected to grow tenfold by the end of this decade </li></ul></ul>Overpeck et al. (2011), doi: 10.1126/science.1197869
  8. Sharing climate data: some limitations <ul><li>Data subject to non-disclosure agreements </li></ul><ul><ul><li>Formal or informal agreements </li></ul></ul><ul><li>Holding back data for future exploitation </li></ul><ul><ul><li>Controlling use, getting recognition </li></ul></ul><ul><li>Time and resources </li></ul><ul><ul><li>Costs may be obvious, while benefits may go unrealised </li></ul></ul><ul><ul><li>Standards, meta-data and software increase the value for re-use, but can increase the time needed </li></ul></ul>
  9. Non-disclosure agreements: real or excuse? <ul><li>Example 1: UK climate data </li></ul><ul><ul><li>“Data sets must not be passed on to third parties under any circumstances... Once the project work using the data has been completed, copies of the datasets held by the end user should be deleted ... The introduction of sanctions against individuals or Departments may be considered if breaches occur.” </li></ul></ul><ul><ul><ul><li>http://badc.nerc.ac.uk/conditions/ukmo_agreement.html </li></ul></ul></ul>
  10. Non-disclosure agreements: real or excuse? <ul><li>Example 2: Global precipitation data </li></ul><ul><ul><li>One of the most widely used analyses of variations in precipitation across the global land surface is “based on the complete GPCC monthly rainfall station data-base (the largest monthly precipitation station database of the world with data from ca. 85,000 different stations)... Corresponding to international agreement, station data provided by Third Parties are protected.” </li></ul></ul><ul><ul><ul><li>http://gpcc.dwd.de </li></ul></ul></ul>
  11. Non-disclosure agreements: real or excuse? <ul><li>Informal agreements exist too </li></ul><ul><ul><li>Especially for newly collected data provided in advance of their formal publication </li></ul></ul><ul><ul><li>These agreements with colleagues, and the consequences of breaching them, are genuine (regardless of what the Information Commissioner’s Office (ICO) might decide if they were tested under FOI/EIR legislation!) </li></ul></ul>
  12. Holding back data for future exploitation <ul><li>Traditionally, climate data themselves are not published </li></ul><ul><li>Instead, a journal article is published reporting findings arising from some analysis of the data </li></ul><ul><ul><li>Provides a citable outcome for which the scientist gains credit </li></ul></ul><ul><li>This can take from many months to a few years </li></ul><ul><ul><li>Because publishable findings may only arise from extensive analysis of the data or from a collection of multiple records </li></ul></ul><ul><ul><li>And the article has to go through the peer-review system </li></ul></ul><ul><li>In the meantime, the data may have been shared and used under non-disclosure restrictions </li></ul>
  13. Ways forward…1 <ul><li>Providing data (and other materials) with a publication so that the published work can be reproduced (or perhaps repeated) </li></ul><ul><ul><ul><li>E.g. supplementary online materials </li></ul></ul></ul><ul><ul><li>Seen as a burden for all 13,000 climate change articles per year </li></ul></ul><ul><ul><ul><li>Co-benefits must be evident to make this worthwhile </li></ul></ul></ul><ul><ul><ul><li>Citation and data re-use </li></ul></ul></ul><ul><ul><li>Potential proliferation of identical (or perhaps not!) copies of datasets </li></ul></ul><ul><ul><ul><li>Better to provide a unique identifier for existing data that have been used, rather than a copy of the data </li></ul></ul></ul>
  14. Ways forward…2 <ul><li>Data publication </li></ul><ul><ul><li>Newly collected (observed, simulated, derived) datasets published in their own right, not as part of a scientific paper </li></ul></ul><ul><ul><li>Meta-data and other accompanying information </li></ul></ul><ul><ul><ul><li>But could shorten the lag from data collection to data publication, with much lighter-touch peer review </li></ul></ul></ul><ul><ul><li>Citable (e.g. DOI), allowing due credit </li></ul></ul><ul><ul><li>Identifiable (long-lasting URI), allowing unique identification </li></ul></ul><ul><ul><ul><li>Should be unique: updates or modifications to the data should have a separate unique identifier (how to link between versions is considered in our JISC ACRID project) </li></ul></ul></ul>
  15. Preferred data archives…1 <ul><li>Storing data with the publisher, linked directly to the article </li></ul><ul><ul><li>Useful (not essential) for a strong link between article and data </li></ul></ul><ul><ul><li>Not ideal for long-term preservation, large datasets, tools for exploring data, database searches, etc. </li></ul></ul><ul><ul><li>Not ideal for re-use </li></ul></ul><ul><li>University archiving is possible, but has similar disadvantages </li></ul><ul><li>Discipline-specific, dedicated data centres are preferable </li></ul><ul><ul><li>E.g. World Data Center system (http://www.icsu-wds.org/) </li></ul></ul><ul><ul><li>WDC-Climate, WDC-Paleoclimate, BADC, BODC, ITRDB, CMIP5 </li></ul></ul>
  16. Preferred data archives…2 <ul><li>Sub-discipline-specific archives are superior to broader archives </li></ul><ul><ul><li>More generalised approaches present a steeper barrier to submission (e.g. describing all environmental datasets via one standard meta-data model: a very large model, with much to learn, etc.) </li></ul></ul><ul><ul><li>Approaches tailored to sub-disciplines avoid irrelevant structures, formats and meta-data </li></ul></ul><ul><ul><li>Sometimes expertise is needed rather than extra meta-data </li></ul></ul>
  17. Summary points <ul><li>Improved data sharing and links to published findings are needed across the climate science community, to increase the pace of knowledge creation and to support the integrity of published work </li></ul><ul><li>New approaches to publishing newly constructed datasets should be encouraged and adopted where possible </li></ul><ul><ul><li>Bringing the benefits of citation, credit and unique identification </li></ul></ul><ul><li>Published articles should identify the data used, preferably via citation/identification of already published data rather than by providing a further copy of the data </li></ul><ul><li>Subject-specific data archives are preferred, offering better support for data re-use </li></ul><ul><li>Other issues (non-disclosure agreements, time and resources) need to be considered; the benefits must be clear if these barriers are to be overcome </li></ul>
  19. Global warming issue: high stakes <ul><li>Easy contexts for decision making: </li></ul><ul><ul><ul><li>Cost of reducing greenhouse gases (GHGs) is low, adverse impact of not doing so is high </li></ul></ul></ul><ul><ul><ul><li>Cost of reducing GHGs is high, adverse impact of not doing so is low </li></ul></ul></ul><ul><li>Decision making in the actual context is much harder: </li></ul><ul><ul><ul><li>Significantly reducing GHGs may prove difficult, with moderate to high costs </li></ul></ul></ul><ul><ul><ul><li>Net effects of not reducing GHGs are very uncertain and could range from fairly moderate to very severe adverse impacts </li></ul></ul></ul>
  25. Time and resources <ul><li>Must not mistake reluctance to commit time and resources for a desire to avoid disclosure </li></ul><ul><li>There is a real cost involved </li></ul><ul><ul><li>Standards, meta-data and software increase the value for re-use, but can increase the time needed </li></ul></ul><ul><li>The answer is not simply to obtain funding </li></ul><ul><ul><li>Even with specific funding, unless the benefits of sharing data and meta-data are clear, there will be pressure to spend the time on things with more obvious benefits </li></ul></ul>
  26. Research Integrity Conference: The importance of good data management. Wellcome Collection Conference Centre, 13 September 2011
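
Slides 6 and 7 quote two rates. The short Python sketch below is a way to check the arithmetic behind them. It assumes a 365-day year for 2009. The slide does not fix the horizon for "the end of this decade", so the sketch brackets it as 9 or 10 years counted from 2011. Both choices are assumptions, and the inputs are the lower bounds quoted on the slides.

```python
HOURS_IN_2009 = 365 * 24  # 2009 was not a leap year: 8,760 hours


def articles_per_hour(articles_per_year: float, hours_per_year: float = HOURS_IN_2009) -> float:
    """Average publication rate implied by an annual article count."""
    return articles_per_year / hours_per_year


def implied_annual_growth(factor: float, years: float) -> float:
    """Compound annual growth rate that multiplies a quantity by `factor` over `years`."""
    return factor ** (1.0 / years) - 1.0


# Slide 6: more than 13,000 climate change articles in 2009 (a lower bound).
print(f"2009: {articles_per_hour(13_000):.2f} articles per hour")  # 2009: 1.48 articles per hour

# Slide 7: > 10,000 TB now, projected to grow tenfold by the end of the decade.
# The horizon is an assumption: 9 or 10 years from 2011.
for years in (9, 10):
    rate = implied_annual_growth(10.0, years)
    print(f"tenfold in {years} years: ~{rate:.0%} per year, > {10 * 10_000:,} TB")
```

At the quoted lower bound, 2009 averages about 1.5 articles per hour. A tenfold increase in data volume needs roughly 26 to 29% compound growth per year, depending on the assumed horizon.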

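Slides 13 and 14 argue for two things. A publication should cite one uniquely identified copy of a dataset. Every update should get its own identifier, with a link back to earlier versions. The following Python sketch illustrates that idea under stated assumptions:

* SHA-256 content fingerprints are used to tell whether two copies are really "identical (or perhaps not!)".
* Identifiers such as `example:precip/v1` are hypothetical.
* `DatasetRegistry` is an illustrative class. It is not the DOI system and not the approach taken in the JISC ACRID project.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from typing import Optional


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest: identical bytes give identical fingerprints."""
    return hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class DatasetVersion:
    identifier: str                 # hypothetical persistent identifier (stand-in for a DOI/URI)
    version: int                    # 1 for a new dataset, +1 for each update
    checksum: str                   # fingerprint of exactly these bytes
    previous: Optional[str] = None  # identifier of the version this one supersedes


class DatasetRegistry:
    """Toy registry: one identifier per distinct version, with links between versions."""

    def __init__(self) -> None:
        self._by_id: dict[str, DatasetVersion] = {}
        self._by_checksum: dict[str, str] = {}

    def publish(self, identifier: str, content: bytes, previous: Optional[str] = None) -> DatasetVersion:
        if identifier in self._by_id:
            raise ValueError(f"{identifier!r} already issued; modified data need a new identifier")
        if previous is not None and previous not in self._by_id:
            raise KeyError(f"unknown previous version {previous!r}")
        checksum = fingerprint(content)
        if checksum in self._by_checksum:
            # Design choice (assumption): cite the existing identifier instead of depositing a copy.
            raise ValueError(f"identical content already published as {self._by_checksum[checksum]!r}")
        version = 1 if previous is None else self._by_id[previous].version + 1
        record = DatasetVersion(identifier, version, checksum, previous)
        self._by_id[identifier] = record
        self._by_checksum[checksum] = identifier
        return record

    def find_existing(self, content: bytes) -> Optional[str]:
        """Identifier to cite if these exact bytes are already published, else None."""
        return self._by_checksum.get(fingerprint(content))

    def lineage(self, identifier: str) -> list[str]:
        """Identifiers from the first version up to `identifier`, following `previous` links."""
        chain: list[str] = []
        current: Optional[str] = identifier
        while current is not None:
            chain.append(current)
            current = self._by_id[current].previous
        return chain[::-1]


# Usage with placeholder bytes standing in for two versions of a dataset.
registry = DatasetRegistry()
registry.publish("example:precip/v1", b"a,b\n1,2\n")
registry.publish("example:precip/v2", b"a,b\n1,3\n", previous="example:precip/v1")
print(registry.find_existing(b"a,b\n1,2\n"))   # example:precip/v1
print(registry.lineage("example:precip/v2"))  # ['example:precip/v1', 'example:precip/v2']
```

The sketch rejects byte-identical content under a new identifier. This is a design choice of the sketch, not something the slides specify: it points a depositor to the identifier they should cite rather than creating another copy. A real archive would also need meta-data and long-term identifier resolution, which the sketch omits.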