Marieke Guy, Institutional Support Officer, Digital Curation Centre, UKOLN, University of Bath, UK, presents on Supporting Libraries in Leading the Way in Research Data Management at Online Information, London, 20th-21st November 2012
Supporting Libraries in Leading the Way in Research Data Management
1. Supporting Libraries in
Leading the Way in
Research Data Management
Marieke Guy, Institutional Support Officer,
Digital Curation Centre, UKOLN, University of Bath, UK
Email: m.guy@ukoln.ac.uk
Twitter Id: mariekeguy
Web: http://www.dcc.ac.uk
Online Information, 20th-21st November 2012
UKOLN is supported by:
This work is licensed under a Creative Commons Licence
Attribution-ShareAlike 2.0
2. Who Am I?
• Have worked for UKOLN for over 12 years
• Worked on variety of projects:
Subject portals project, IMPACT, Good APIs,
JISC Observatory, cultural heritage work,
digital preservation work, …etc
• Remote worker, into amplified events
• Co-chair of IWMW for a number of years
• Now working for the Digital Curation Centre
• Institutional Support Officer helping HEIs with their RDM
3. Today’s Talk
• What is research data and why is it so important?
• How research data is managed
• What the DCC does
• The role libraries are currently playing
• The role libraries could be playing in the future
4. Research Data
Image credits:
http://www.google.co.uk/imgres?q=illumina+bgi&hl=en&client=firefox-a&hs=Jl2&rls=org.mozilla:en-GB:official&biw=1366&bih
http://www.flickr.com/photos/thinkmulejunk/352387473/
http://www.flickr.com/photos/usfsregion5/4546851916
http://www.flickr.com/photos/wasp_barcode/4793484478/
http://www.flickr.com/photos/charleswelch/3597432481/
5. What is Research Data?
…whatever is produced in research or evidences its outputs
• Facts
• Statistics
• Qualitative
• Quantitative
• Not published
research output
• Discipline specific
“highest priority research data is that which
underpins a research output”
6. A Data Present
“Data underpins our economy and
our society - data about how
much is being spent and where,
data about how schools, hospitals
and police are performing, data
about where things are and data
about the weather.”
Tim Berners-Lee, Director of W3C.
7. Big Data
• Volume
• Velocity
• Variety
“The 1000 Genomes Project generated more DNA
sequence data in its first 6 months than GenBank
had accumulated in its entire 21 year existence”
8. A Data Future
“The ability to take data - to
be able to understand it, to
process it, to extract value
from it, to visualise it, to
communicate it - that’s going to
be a hugely important skill in
the next decades.”
Hal Varian, Google’s chief economist.
9. Big Data…and Small Data
• DIY data
• Consumer data
• Crowd Sourced data
• What about Linked data/
Web of data/Open data?
• Databases
• Learning data
• Administrative data
• Long tail data
JISC MaRDI-Gross project: “volume is the least significant
(issue) in the present context, since
it is ‘only’ a technical problem”
10. Some Data Issues
• Scale and complexity – data
deluge – volume, pace
• Infrastructure and management
– Storage, costs & sustainability
• Quality of data
• Reputation – FOI, DPA, computer
misuse
• Openness agenda
• Preservation
• Working in partnerships
• Funding for researchers
11. Funding…the Biggest Carrot/Stick?
EPSRC expects all those institutions it funds:
• to develop a roadmap that aligns their policies and processes with
EPSRC’s expectations by 1st May 2012;
• to be fully compliant with these expectations by 1st May 2015.
http://www.epsrc.ac.uk/about/standards/researchdata/Pages/expectations.aspx
12. Data Policies of Funders
http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
13. What is Research Data Management?
“the active management and
appraisal of data over the
lifecycle of scholarly and
scientific interest”
Data management is part of
good research practice
14. How is Research Data Managed?
Some areas to think about:
• Storage & cloud
• Data repositories
• Metadata & citation
• File naming
• Appraisal, selection & deletion
• Curation
• Digital Preservation
• Migration
• Sharing/openness
• Security
• Cost
Example: Leicester University data management support for researchers web site
15. RDM Activities
What kind of activities are involved?
– producing and sharing of data with research colleagues
in collaborative environments (internal and external)
– file naming
– applying metadata for context and discovery
– ensuring that sensitive data is not shared or accessible
– cleaning data for longer-term use
– selecting mechanisms for data capture and storage
– selecting and appraising data for short and longer-term
retention
– licensing data for reuse
– developing data management plans
• Data management is about making informed decisions
16. The Digital Curation Centre
• A consortium comprising units from the Universities of Bath
(UKOLN), Edinburgh (DCC Centre) and Glasgow (HATII)
• Launched 1st March 2004 as a national centre for solving
challenges in digital curation that could not be tackled by
any single institution or discipline
• Funded by JISC with additional HEFCE funding from 2011
for the provision of support to national cloud services
• Targeted institutional development
• http://www.dcc.ac.uk/
17. Advocacy and Training
How to…
• Appraise and Select
Research Data
• Cite Datasets and Link to
Publications
• Develop a Data
Management and Sharing
Plan
• License Research Data
• Set up an RDM service –
coming soon!
How to cite data
19. DCC Tools for Engagement
Survey and interview
methodology for
investigating data holdings
and how they are managed
Capability model for establishing
consensus on capabilities and
gaps in current provision, rating
organisation, technology and
resources
Customised institutional
templates for data
management planning
20. Institutional Engagement Work
• Funded by HEFCE through its Universities
Modernisation Fund (UMF)
• Intensive, tailored support to increase research data
management capability
• Originally 18 Higher Education Institutions (HEIs) between
Summer 2011 and Spring 2013
• Can help:
– win the support of senior management
– understand current data practices
– redesign data support services
– develop policy and training
21. What Part are Libraries Playing?
• RDM requires the input of all support services, but libraries
are taking the lead in the UK
• The library is leading on most of the DCC engagements
Other examples include:
– EDINA at University of Edinburgh
– Bodleian Library at the University of Oxford
– Subject librarians at University of Southampton
[Diagram labels: Library, Research Office, IT]
Information management is a key skill in RDM, so
it’s a major role for librarians
22. Why are Libraries Taking the Lead?
Because libraries:
• Often run publication repositories so are the stakeholder
called on when questions are raised about the
management of associated data
• Have directed the open sharing of publications so are
well placed to advise on how best to support data
requirements
• Have good relationships with researchers and good
connections with other service departments
• Have a highly relevant skill set
23. An Exciting Opportunity
“Researchers need help to
manage their data. This is a
really exciting opportunity for
libraries….”
Liz Lyon, VALA 2012
• Leadership
• Providing tools and support
• Advocacy and training
• Developing data informatics capacity & capability
24. Reskilling for Research
But librarians feel they lack appropriate skills…
Skills gap                                            2-5 years   Now
Preserving research outputs                           49%         10%
Data management & curation                            48%         16%
Complying with funder mandates                        40%         16%
Data manipulation tools                               34%         7%
Data mining                                           33%         3%
Metadata                                              29%         10%
Preservation of project records                       24%         3%
Sources of research funding                           21%         8%
Metadata schema, disciplinary standards, practices    16%         2%
From RLUK, Re-skilling for Research, Jan 2012, p43
Other surveys include DataONE, Cologne Uni, DigCurV
25. Specialist Knowledge
“Very few librarians are likely to
have specialist scientific or
medical knowledge - if you train
as a research scientist or a medic,
you probably won’t become a
librarian.”
Mary Auckland: Reskilling for Research 2012,
RLUK.
26. Knowledge Needed…
• Librarians are overtaxed already, lack personal research
experience, have little understanding of complexity and
scale of issue
• Need knowledge and understanding of:
– Researchers’ practice and data holdings
– Research Councils and funding bodies’ requirements
– Disciplinary and/or institutional codes of practice and
policies
– Existing institutional policies and infrastructure
– Reputational risks associated with poor data
management – with respect to researchers’ reputations
as well as that of their institutions
– Data management and sharing benefits
– Research data management tools and technologies
27. And Needed Fast…
Implications of “Big Data”
and data science for
organisations in all sectors
McKinsey Global Institute
predicts a shortage of up to
190,000 data scientists in the US
by 2018
http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation
28. Is Retooling Possible?
“Significant mismatches exist between research
data and library digital warehouses, as well as
the processes and procedures librarians typically
use to fill those warehouses. Repurposing
warehouses and staff for research data is
therefore neither straightforward nor simple; in
some cases, it may even prove impossible.”
Salo, D. (2010) Retooling Libraries for the Data Challenge,
Ariadne, Issue 64.
• Libraries are organised, research data isn’t
• Need technical systems such as sheer curation, better
sharing of data and improved funding models
29. Possible Approaches
• University of Helsinki Library – Knotworking “collaborative
performance between otherwise loosely connected actors
and activity systems”
• University Burnaby, British Columbia - providing research
data services since the 1970s – currently exploring funding
gaps
• Deutsche Nationalbibliothek - DP4lib project (Digital
Preservation for libraries) where the library is acting as a
service-broker for digital data curation
• Research libraries - Opportunities for Data Exchange (ODE)
project as an exemplar project, which shares
emerging best practice
• Data intelligence 4 librarians, Delft University of
Technology
30. Training Librarians: RDMRose
• JISC funded project to produce OER learning materials in
RDM tailored for Information professionals
• Led by Sheffield University iSchool
• Practitioner community based on the White Rose University
Consortium’s libraries at the Universities of Leeds,
Sheffield and York
• Deliverables include curriculum, module within taught
masters course in Sheffield, self study version
• Much of course concentrates on teaching librarians about
research and the research process
• RDMRose working with Stephen Pinfield on a web-based
survey of current library RDM activity
• http://www.sheffield.ac.uk/is/research/projects
32. Partnership Approaches
• Research 360,
University of Bath:
• UKOLN-DCC
• Library
• IT services
• Research
Support Office
• Doctoral
Training
Centres
http://blogs.bath.ac.uk/research360/
33. Embedded Librarians
“Librarians may need to raise their
profile, become ‘researchers’
themselves; getting embedded in the
research community;
gaining credibility; and collaborating
as equals.”
Bent et al, 'Information literacy in a researcher's
learning life' in New Review of Information
Networking, 13 (2), 2007
34. So What Next?
• Address the lack of data informatics skills
• Mainstream data librarians & data scientists
• Embed new skills into LIS & iSchool curriculum
Lyon, ‘The Informatics Transform: re-engineering libraries for
the data decade’ in IJDC, 7(1), 2012
Research data management highlights the
applicability of the librarian’s skillset. By embracing
the need to provide RDM support, librarians will
remain at the heart of institutional agendas.
35. Resources to Look at…
• Riding the Wave report and many others emphasise the
relevance of research data to current academic working
• RLUK/Mary Auckland: Reskilling
for Research
• Sheila Corrall: Libraries,
Librarians and Data
• DigCurV
• Book: Managing Research Data
• HEIs’ research data support
pages
36. Thank You
• Thanks to DCC colleagues for contributing to slide material.
Any questions?
m.guy@ukoln.ac.uk
Editor's Notes
Bath – research led uni – one of top 10 in UK
Will think about what research data is and its importance. Later look at where libraries fit into the picture.
Gene sequencing machines at the Beijing Genomics Institute, one of the largest in the world, crunching out data 24/7. Air quality monitoring on Sierra National Forest. Lovell Radio Telescope, scanning the night sky. Some facts about the Lovell Radio Telescope: mass of telescope 3200 tonnes; mass of bowl 1500 tonnes; diameter of bowl 76.2 metres; maximum height above ground 89 metres. Very impressive. The Lovell Radio Telescope at Jodrell Bank dominates the Cheshire countryside. Wasp WDI4500 2D barcode scanner, launched July 2010, can scan 2D barcodes, 1D (linear) barcodes and postal barcodes. www.waspbarcode.com/scanners/wdi4500_barcode_scanner.asp National Gene Bank in Shenzhen, China. Sensor equipment to monitor air quality in desert. Streaming data to our desktops.
Large Hadron Collider: the world's largest and highest-energy particle accelerator, based at CERN. Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it. Systems are in place, such as Hadoop.
Apache Hadoop Big Data Platform. The MaRDI-Gross project is funded by JISC to support big-science projects in developing suitable Data Management and Preservation (DMP) plans. University of Glasgow & University of Lancaster.
Queen's University in Belfast has been told by the Information Commissioner to hand over 40 years of research data on tree rings, used for climate research. Douglas Keenan, from London, had asked for the information in 2007 under the Freedom of Information Act. Mr Keenan is well-known for his questioning of scientists who propose a human cause for climate change. Queen's University refused his request saying it was too expensive, but it is now considering its position. The university claimed that as the information was unfinished, had intellectual property rights and was commercially confidential information, it did not have to pass it on. Philip Morris tobacco company wanted info on children and smoking. Refused in end – retracted claim.
An interesting trend to emerge is who is addressing RDM within the unis. The library is leading in most cases and is involved regardless of who’s championing the cause. Research offices are often the lead partner – seemingly for strategic reasons of senior buy-in and financial commitment. IT are only leading in 2 out of the 20 cases and are disengaged / absent in a few others.
There are nine areas where over 50% of the respondents with Subject Librarian responsibilities indicated that they have limited or no skills or knowledge, and in all cases these were also deemed to be of increasing importance in the future. These are listed in order of the importance in 2 - 5 years that respondents placed on them.
Knotworking is characterised by collaborative performance between otherwise loosely connected actors and activity systems. The idea is to gather experts for a short period of time to solve a specific problem in the academic library. It is clear that librarians cannot be committed to a single research group because there are not enough librarians about and the work is very demanding. So at Helsinki University they have created a new model of customised and standardised services, which are currently at implementation stage. For example, the librarians have created a quick reference guide for one research group on how to handle research data management. Research data management is something that librarians have only recently become involved with and they still have much to learn. The system of knotworking was bringing research groups back to the library and generating new demanding services.