Data management: international challenges, national infrastructure, and institutional responses
1. Data Management: International Challenges, National Infrastructure, and Institutional Responses - an Australian Perspective. Dr Andrew Treloar, Director of Technology, Australian National Data Service
8. Summary: Not a first class object; Unmanaged; Disconnected; Unfindable; Unreusable
9. Why re-use data? Efficiency; Validation; Integrity; Value for money; Self-interest
10. Astronomy case study: Hubble Space Telescope (HST), operating since 1990. Observations are proposed and, if accepted, data is collected and made available to the proposers, who then write a research paper. Each year around 1,000 proposals are reviewed and approximately 200 are selected, for a total of 20,000 individual observations. Data is stored at the Space Telescope Science Institute and made available after an embargo period. There are now more research papers written by “second use” of the research data than by the use initially proposed.
12. Cancer micro-array trial case study: Piwowar et al., “Sharing Detailed Research Data Is Associated with Increased Citation Rate” (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0000308). Looked at the citation history of cancer microarray clinical trial publications. Found that publicly available data was associated with a 69% increase in citations, independent of journal impact factor, date of publication, and author country of origin.
13. Alzheimer’s Disease NeuroImaging Initiative: Collaborative effort to find brain biomarkers for Alzheimer’s disease. Key: all brain scans and other data freely available to the scientific community without embargo. Over 3K full downloads and 1M scan downloads by over 400 investigators world-wide. Over 100 publications. Institut Douglas, CC BY-NC-ND. http://www.fnih.org/work/areas/chronic-disease/adni
15. National approaches: Number of different countries: UK, US, DE, NL. Different environments => different ecosystems, and so some local tradeoffs. But some common themes emerging: do the things that only you can do; be the ‘voice for data’; prime the pump.
21. ANDS is enabling the transformation of data that are unmanaged, disconnected, invisible, and single use into collections that are managed, connected, findable, and reusable, so that Australian researchers can easily discover, access and re-use data.
22. Defining characteristics of ANDS: Building national services; Engaging with institutions, not researchers (mostly); Working within funding constraints (use, not amount!); Building the Australian Research Data Commons
24. ANDS Programs: Frameworks and Capability; Seeding the Commons; Data Capture; Metadata Stores; ARDC Core; Public Sector Data; Applications
28. Driven by the Australian Code for Responsible Conduct of Research (equivalent of UKRIO’s Code of Practice for Research: Promoting good practice and preventing misconduct). Takes significant time to get accepted. ANDS providing models of good practice. (Seeding the Commons; U->M; Data management policy and planning)
29. Retrospective data description. Different selection mechanisms. (Seeding the Commons; U->M; Fixing the past)
30. Improving internal CRIS systems: better integration; moving beyond publications; better links to data collection descriptions. (Seeding the Commons, Metadata Stores; D->C)
31. Facilitating easier/better capture of data and metadata from selected ‘instruments’. Making the right thing easier. Improving quality of metadata. (Data Capture; U->M, S->R; Fixing the future)
32. Describing institutions’ research data assets. Series of metadata stores rollouts plus some ancillary activity. (Metadata Stores, Seeding the Commons, Data Capture; D->C, I->F)
So, let’s look at the state of data in scholarly communication. Unfortunately, it’s inconvenient, imprisoned, invisible, inaccessible, and incomprehensible
Need to retype
Near impossible to liberate. Talk about ChemXSeer example and DataThief Java application
Too transformed
Discipline scientist may know how to get these data but I don’t
NOTE: Some of these arguments are at individual, national, global level.
Efficiency for researcher: don’t reinvent the wheel.
Validation: repeatability of research.
Integrity: of the scholarly record.
Value for money for funder: public money funded it, so it should be available to the public (ClimateGate!).
Self-interest: sharing with a future self, greater visibility, more citations.
So, what are some good stories around data sharing?
Number of initiatives around the world working to do a better job on data: NSF DataNet (Sayeed/Bill later in conference), JISC Managing Research Data, NL SURF/DANS
I’m going to take a programmatic view (because that explains how we are funding stuff), while recognising that the issues don’t necessarily fit neatly inside those boundaries
And thank you for the opportunity to speak to you this afternoon.