Curating the Scholarly Record: Data Management and Research Libraries
Introduction to Research Data Management
1. Introduction to Research Data Management
Oxford Brookes University
Faculty of Technology, Design & Environment
Dr Angus Whyte, DCC
27th Sept 2012
This work is licensed under a Creative Commons Attribution 2.5 UK: Scotland License
2. The Digital Curation Centre
• Consortium of 3 units in Universities of Bath
(UKOLN), Edinburgh (DCC Centre) and Glasgow (HATII)
• Launched 1st March 2004
• National centre since 2004 – address challenges in digital
curation that cross institutions or disciplines
• Funded by JISC, plus HEFCE funding from 2011 for:
– support to national cloud services
– targeted institutional development
3. DCC Mission
“Helping to build capacity, capability and skills in data management and curation across the UK’s higher education research community”
DCC Phase 3 Business Plan
4. “What’s it got to do with me?”
Drivers and benefits to HEIs developing infrastructure and services to support research data management.
5. Introduction
• What is research data management?
• Why is it important?
• What risks does it address?
• What benefits does it provide?
• What is good practice?
6. What is Research Data Management?
Caring for, facilitating access to, preserving and adding value to research data throughout its lifecycle.
The organisation, resources and technology required to support and sustain this.
7. What Kinds of Data?
…whatever is produced in research or evidences its outputs
8. RDM… data centred project management
• Planning data management
• Creating data
• Naming and describing
• Storing active data
• Selecting or disposing
• Depositing and sharing
• Protecting sensitive data
• Licensing access
9. An emerging art for institutions
A design space bounded by two principles…
Best way to make your data work for yourself is to make it work for someone else
“Coolest things to do with your data will be thought of by someone else”*
*Jo Walsh & Rufus Pollock, Open Knowledge Foundation
http://www.okfn.org/files/talks/xtech_2007/
11. An emerging art for institutions
A design space bounded by two principles… and constraints
Best way to make your data work for yourself is to make it work for someone else
“Coolest things to do with your data will be thought of by someone else”*
Constraints: REF, £££
*Jo Walsh & Rufus Pollock, Open Knowledge Foundation
http://www.okfn.org/files/talks/xtech_2007/
12. Why is RDM Important?
Convergence in research policy
“Rapid and pervasive technological change has created new ways of acquiring, storing, manipulating and transmitting vast data volumes, as well as stimulating new habits of communication and collaboration amongst scientists. These changes challenge many existing norms of scientific behaviour”
13. Why is RDM Important?
Convergence in research policy
“We have opened up much public data already, but need to go much further in making this data accessible. We believe publicly funded research should be freely available. We have commissioned independent groups of academics and publishers to review the availability of published research, and to develop action plans for making this freely available”
14. Policy moves towards openness
Organisation for Economic Co-operation and Development describes data as a public good that should be made available
Research Councils UK in its code of good research conduct says data should be preserved and accessible for 10+ years
Research funder data policies increasingly demanding of institutional commitment and provisions...
15. RCUK Common Principles on Data Policy
Public good: Publicly funded research data are produced in the public interest and should be made openly available with as few restrictions as possible
Planning for preservation: Institutional and project specific data management
policies and plans needed to ensure valued data remains usable
Discovery: Metadata should be available and discoverable; Published results
should indicate how to access supporting data
Confidentiality: Research organisation policies and practices to ensure
legal, ethical and commercial constraints assessed; research process should not
be damaged by inappropriate release
First use: Provision for a period of exclusive use, to enable research teams to
publish results
Recognition: Data users should acknowledge data sources and terms &
conditions of access
Public funding: Use of public funds for RDM infrastructure is appropriate and
must be efficient and cost-effective.
http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
16. Funder Expectations
EPSRC expects all those institutions it funds
• to develop a roadmap that aligns their
policies and processes with EPSRC’s
expectations by 1st May 2012;
• to be fully compliant with these
expectations by 1st May 2015.
• Compliance will be monitored and non-compliance investigated.
• Failure to share research data could result in the imposition of sanctions.
18. Funder Expectations
Applications submitted on or after 1st November 2012 will need to take account of the
new guidance and application form requirements.
The key changes are that:
All proposals will be required to contain …a new ‘Technical Summary’
Those with digital outputs or digital technologies that are essential to their
planned research outcomes will be expected to submit a technical
attachment.
Current technical appendix section of the Je-S form will be removed.
http://www.ahrc.ac.uk/News-and-Events/News/Pages/Changes-to-all-AHRC-Research-Grant-and-Fellowships-applications.aspx
19. Data Policies by Funder
http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
20. It’s not just top-down!
• Data intensive research
• Demand from public to engage, criticise
• Citizen science – new stakeholders in
research
• Digital engagement and open data in
creative industries and built environment
• Demands more planning and support
24. Public demand for data & engagement
“We have opened up much public data already, but need to go much further in making this data accessible. We believe publicly funded research should be freely available. We have commissioned independent groups of academics and publishers to review the availability of published research, and to develop action plans for making this freely available”
27. Open data in art and design
bus routes data sculpture
Reusing public data to create an object with reuse value?
• “a 3D data sculpture of the Sunday Minneapolis / St. Paul public transit system, where the horizontal axes represent directional movement and the vertical represents time. the piece titled "bus structure 2am-2pm" is constructed of 47 horizontal layers, each forming a map of the bus routes that run during a given interval of time. looking down from the top, one sees the Sunday bus map of the Twin Cities, while looking from the side, the times appears as strata building upwards. within each layer, every transit route that operates at that time is represented by wood balls placed at its scheduled stops, connected by the horizontal copper rods. each route moves through time and space differently, carving out its own trail that may or may not meet conveniently with other routes.
• in total 42 routes, 47 intervals of time & 296 bus stops are depicted by about a half-mile of copper rod & 6,000 wood balls, suspended in the air by hundreds of blue threads”
http://infosthetics.com/archives/2008/05/bus_routes_data_sculpture.html
29. … and scholarly publication
1. Deposit data in repository
2. Submit data paper
• Context
• Method
• Data scope
• Data description
3. Peer reviewed
4. Data & paper DOIs
DOI plus citation = career reward for data management
http://openarchaeologydata.metajnl.com/
31. Common practice in Universities
‘Departments typically don’t have guidelines or norms for personal back-up and researcher procedure, knowledge and diligence varies tremendously. Many have experienced moderate to catastrophic data loss.’
Incremental Project Scoping Study and Implementation Plan
http://www.lib.cam.ac.uk/preservation/incremental/documents/Incremental_Scoping_Report_170910.pdf
‘The current environment is such that responsibility for good data management is devolved to individual researchers and in practice PIs set the 'rules' and establish the cultural practices of the research groups and this means there is good data management practice going on in pockets but no consistency across groups. There is also consequently a high risk of data losses by a number of means’.
MaDAM Project Requirements Analysis
http://www.merc.ac.uk/sites/default/files/MaDAM_Requirements%20_%20gap%20analysis-v1.4-FINAL.pdf
32. Risks if you don’t address…
• Loss of funding
• Legal non-compliance (DPA, FOI, etc.)
• Research integrity, reputation
• Inability to verify, scrutinise
• Loss of data or (re)usability
• Outputs lack visibility
• Diminished public communication
34. Benefits if you do…
• Secure storage for sensitive data
• Improved access for scholarly communication
• Scrutiny and verification of research
• Research integrity, reputation
• Secondary use and data mining
• Opportunities for collaboration
• Increased visibility, citation
• Knowledge transfer, public communication
Benefits from Infrastructure Projects in JISC MRD
http://www.jisc.ac.uk/media/documents/programmes/mrd/RDM_Benefits_FinalReport-Sept.pdf
35. E.g. MaDAM project
Pilot project offering secure storage, description, flexible sharing
• “I can put my hands straight on my data, through one application”
• “I can easily share & find data within my research group”
• “I have support in data management planning”
• “I can publish my data, under my control, with the wider community”
• “I’m not repeating experiments unnecessarily”
• “I’m freed up from some of my data management duties to concentrate on my research”
Researchers spending less time managing data, getting more value for their efforts and freeing more time for research.
Benefits from Infrastructure Projects in JISC MRD
http://www.jisc.ac.uk/media/documents/programmes/mrd/RDM_Benefits_FinalReport-Sept.pdf
36. Collaboration opportunities from data integration
HALOGEN (History, Archaeology, Linguistics, Onomastics, GENetics):
Throwing light on the past through cross-disciplinary databasing
Portable Antiquities Scheme (British Museum)
Place-names (Nottingham)
Surnames
Genetics
IT hosting and GIS
Best practice: #JISCMRD, UKRDS, DCC, RIN, international
http://www.le.ac.uk/halogen
38. Direct benefits from HALOGEN
• New research opportunities
– Cross database work – seed new research samples
• Verification, re-purposing, re-use of data
– Cleaning & enhancing private research datasets for reuse & correlation
– Increased transparency
– Excellent training for best practice in research data management
• Increasing research productivity
– Build cleaning, annotation and enhancement into normal research workflows
– Research datasets may immediately be reusable and interoperable
• Impact & Knowledge Transfer
– Reuse IT infrastructure: EU FP7 Mintweld (industrial engineering) &
BRICCS National Health Service/University Trust data sharing.
• Increasing skills base of researchers/students/staff
39. Data access raises visibility
Data with DOI = citeable research output
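As an illustration of what "citeable" can mean in practice, here is a minimal Python sketch that turns a few descriptive metadata fields into a citation string ending in a resolvable DOI link. The field names, the example record and the DOI are hypothetical placeholders, and the creator / year / title / publisher / identifier ordering is only one commonly used pattern for data citation, not a format prescribed by these slides.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Minimal descriptive metadata for a deposited dataset (hypothetical fields)."""
    creators: list[str]
    year: int
    title: str
    publisher: str  # e.g. the repository holding the data
    doi: str        # bare DOI, without the resolver prefix


def format_citation(record: DatasetRecord) -> str:
    """Build a human-readable citation ending in a resolvable DOI link."""
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.year}). {record.title}. "
        f"{record.publisher}. https://doi.org/{record.doi}"
    )


# Placeholder record for illustration only; not a real dataset or DOI.
example = DatasetRecord(
    creators=["Researcher, A.", "Colleague, B."],
    year=2012,
    title="Example survey dataset",
    publisher="Example University Data Repository",
    doi="10.1234/example-dataset",
)
print(format_citation(example))
```

A string like this can sit in a paper's reference list alongside article citations, which is what lets a dataset be counted as a research output.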
40. Taking it step by step…
• Awareness and training
• ‘Audits’ to assess current assets, practices and requirements, gaps in provision
• Identifying quick wins while developing long-term plan
• Not reinventing: integrating, adapting, augmenting
– e.g. policies, doctoral training, storage
41. Who to involve?
• Researcher(s)
• Research support officers / project staff
• Lab technicians
• Librarians / Data Centre staff
• Faculty ethics committees
• Institutional legal/IP advisors
• FOI officer / DPA officer / records manager
• Computing support
• Institutional compliance officers
• Funders
• Archive / long-term data repository
• Senior management
• Others...
43. DCC support activities
Needs assessment
• CARDIO Tool – collaborative assessment & benchmarking of RDM strengths/weaknesses
• Data Asset Framework – interviews to scope current RDM practice and recommend improvements
• Workflow assessment – methodology for analysing current RDM workflows
Developing strategic institutional RDM framework
• Strategy development – getting key people together to discuss/plan for RDM
• Policy development – scoping, defining, embedding research data policies
• Costing – assist with the development of costing and pricing for RDM services
• Risk management – identify risks in RDM practice and recommend mitigations
• Institutional data catalogues – recommend options for exposing metadata about your research data via CRIS systems, repositories, or a mix of these
Delivering support
• Customised Data Management Plans – templates / guidance to be added to DMP Online
• Training – institutional/disciplinary tailored courses, online resources
• Incremental – repackaging existing support to raise awareness and make guidance more meaningful to researchers
44. Roles & responsibilities
Liz Lyon, “The Informatics Transform: Re-Engineering Libraries for the Data Decade”, International Journal of Digital Curation, Volume 7, Issue 1, 2012