Presentation I was supposed to give at the "Scotland’s Collections and the Digital Humanities" workshop in Edinburgh on May 2nd 2014. Illness prevented me from attending, but my heroic DCC colleague Jonathan Rans stepped up and delivered the presentation on my behalf.
Research data management: a tale of two paradigms
1. Research Data Management: a tale of two paradigms
‘Scotland's National Collections and the Digital Humanities’ workshop series #2
Tom Phillips, A Humument (1970, 1986, 1998, 2004, 2012…)
Martin Donnelly, Digital Curation Centre, University of Edinburgh
Edinburgh, 2 May 2014
2. Overview
1. Introductions and definitions
The Digital Curation Centre
Research data management
What do we mean by ‘data’, exactly?
2. Data as a hot topic: politics and practical concerns
3. Data in/and the Arts and Humanities
How the Arts and Humanities differ
Strengths and weaknesses
Reflections on opportunities for exploration at national level
4. Resources
4. The Digital Curation Centre
The DCC (est. 2004) is…
A UK centre of expertise in digital preservation.
Emerged from the e-Journal preservation field, now
with a particular focus on research data management
(RDM)
Based across three sites: Universities of Edinburgh,
Glasgow and Bath
Working with a number of UK universities to identify
gaps in RDM provision and raise capabilities across the
sector
Also involved in a variety of national and international
collaborations…
6. What is research data management?
“the active management and appraisal
of data over the lifecycle of scholarly
and scientific interest”
Data management is a part of good
research practice.
- RCUK Policy and Code of Conduct on the
Governance of Good Research Conduct
7. The old way of doing things
1. Researcher collects data (information)
2. Researcher interprets/synthesises data
3. Researcher writes paper based on data
4. Paper is published (and preserved)
5. Data is left to benign neglect, and
eventually ceases to be accessible
8. The new way of doing things
Plan
Collect
Assure
Describe
Preserve
Discover
Integrate
Analyze
SHARE
…and
RE-USE
The DataONE
lifecycle model
9. Other models are available…
Ellyn Montgomery, US Geological Survey
10. Helicopter view: What are the benefits of RDM?
TRANSPARENCY: The data that underpins research
can be made open for anyone to scrutinise, and
attempt to replicate findings.
EFFICIENCY: Data collection can be funded once, and
used many times for a variety of purposes.
RISK MANAGEMENT: A pro-active approach to data
management reduces the risk of inappropriate
disclosure of sensitive data, whether commercial or
personal.
PRESERVATION: Lots of data is unique, and can only
be captured once. If lost, it can’t be replaced.
11. Definitions vary from discipline to discipline, and from funder to funder…
Here’s a science-centric definition:
“The recorded factual material commonly accepted in the scientific community as
necessary to validate research findings.” (US Office of Management and Budget,
Circular A-110)
[Addendum: This policy applies to scientific collections, known in some disciplines
as institutional collections, permanent collections, archival collections, museum
collections, or voucher collections, which are assets with long-term scientific value.
(US Office of Science and Technology Policy, Memorandum, 20 March 2014)]
And another from the visual arts:
“Evidence which is used or created to generate new knowledge and
interpretations. ‘Evidence’ may be intersubjective or subjective; physical or
emotional; persistent or ephemeral; personal or public; explicit or tacit; and is
consciously or unconsciously referenced by the researcher at some point during
the course of their research.”
(Leigh Garrett, KAPTUR project: see http://kaptur.wordpress.com/2013/01/23/what-is-visual-arts-research-data-revisited/)
Okay, but what is ‘data’ exactly?
12. Are the goals – or indeed the concepts – of evidence, facts, validation, replication
still central in disciplines reliant on subjectivity, interpretation, argument and
qualities of expression?
How do we identify, preserve and share ephemera, emotions, the unconscious…?
How do we protect rights around creative data? What are the financial/ownership
issues accompanying creative/Arts research?
Is it clear where creative research begins and ends? How can we differentiate
between funded research and unfunded personal work?
What problems are introduced by practice-driven research?
To what extent is non-digital material a problem? Can we share approaches to this
with other subject areas (e.g. biology, geology)?
What other characteristics do Arts and Humanities data have in common with
those of the Sciences? Which other disciplines share these issues more generally?
A few questions around data in the Arts and
Humanities
14. Nature, 09/08; Economist, 02/10; Popular Science; Science, 02/11; Nature, 09/09; ACM, 12/08; InformationWeek, 08/10; Computerworld
A hot topic: 5 years of front pages…
15. Developments in sensor technology,
networking and digital storage enable
new research and scientific paradigms
As costs also fall, possibilities for data
sharing, citation and re-use become
much more widespread
Journals dedicated solely to publishing
data have even started to appear. That’s
not to say it’s an entirely new thing:
journals have always published data,
just never before at such scale…
Technology
17. Repurposing / VfM (value for money) via data re-use
Ships’ log books build picture of climate
change 14 October 2010
You can now help scientists understand the
climate of the past and unearth new historical
information by revisiting the voyages of First
World War Royal Navy warships.
Visitors to OldWeather.org will be able to
retrace the routes taken by any of 280 Royal
Navy ships. These include historic vessels such
as HMS Caroline, the last survivor of the 1916
Battle of Jutland still afloat. By transcribing
information about the weather and interesting
events from images of each ship's logbook, web
volunteers will help scientists build a more
accurate picture of how our climate has
changed over the last century.
http://www.nationalarchives.gov.uk/news/503.htm
Detail from Royal Navy Recruitment poster, RNVR Signals branch, 1917 (Catalogue reference: ADM 1/8331)
Endeavour, 1768-71 (Captain Cook)
HMS Beagle, 1830-34
HMS Torch, 1918
18. 6.9 The Research Councils expect the researchers they fund
to deposit published articles or conference proceedings in
an open access repository at or around the time of
publication. But this practice is unevenly enforced.
Therefore, as an immediate step, we have asked the
Research Councils to ensure the researchers they fund
fulfil the current requirements. Additionally, the Research
Councils have now agreed to invest £2 million in the
development, by 2013, of a UK ‘Gateway to Research’. In
the first instance this will allow ready access to Research
Council funded research information and related data but
it will be designed so that it can also include research
funded by others in due course. The Research Councils will
work with their partners and users to ensure information is
presented in a readily reusable form, using common
formats and open standards.
Government pressure/support
http://www.bis.gov.uk/assets/biscore/innovation/docs/i/11-1387-innovation-and-research-strategy-for-growth.pdf
19. (Aside: Open Data)
Open Data is a philosophy, underpinned by
pragmatism… transparency + utility.
“Open data is the idea that certain data should be
freely available to everyone to use and republish as
they wish, without restrictions from copyright, patents
or other mechanisms of control.” – Wikipedia
Governments, cities etc. are all getting on board
Open Knowledge Foundation is basically the political /
activist wing: http://okfn.org/
From the government / industry side, we have the
Open Data Institute: http://theodi.org/
20. Controversial FOI requests to…
- University of East Anglia
- Queen's University Belfast
- University of Stirling
Risk management
21. - Reinhart & Rogoff (2010) “Growth in a Time of Debt” - paper not peer-reviewed, data
not initially made available…
- Very influential and repeatedly cited by politicians to lend weight to economic strategy
- Multiple issues (selective exclusions, unconventional weightings, coding error)
identified by a postgrad researcher attempting to replicate the paper’s findings
- Widespread embarrassment, but at least the errors were discovered!
Research quality and integrity
22. Why don’t we live in a data sharing utopia?
Four main reasons…
Lack of understanding of the fundamental issues
Lack of joined-up thinking within institutions,
countries, internationally…
Issues around ownership / privacy
Technical/financial limitations and the need for
appraisal
National bodies may be well-placed to address some
of these
23. What do research funders have to say? (i)
Seven “Common Principles on Data
Policy” – Data as a public good;
Preservation; Discovery; Confidentiality;
Right of first use; Recognition; Public
funding for RDM
Six of the seven RCUK councils require
data management plans, or equivalent, at
the application stage
The seventh (EPSRC) requires nothing
short of an institutional data infrastructure
24. 3. DATA IN THE ARTS AND HUMANITIES
Kailie Parrish, “In My Dreams” http://datavisualization.ch/showcases/in-my-dreams/
25. What do research funders have to say? (ii)
AHRC requires that significant
electronic resources or datasets
are made available in an accessible repository for at least
three years after the end of the grant
AHRC used to run several data services. Most stopped
being funded in 2008, but the Archaeology Data Service
remains at York, and the Visual Arts Data Service at UCA.
ESRC applicants submit a statement on data sharing
in the relevant section of the Je-S form, and provide
a two-page data management and sharing plan
addressing 9 distinct themes
Datasets must be offered to the UK Data
Archive on conclusion of the project
26. Some characteristics of Arts and Humanities data are likely to
require a different kind of handling from that afforded to other
disciplines
Arts ‘data’ is often personal, and creative data in particular may not
be factual in nature. Furthermore, it may be quite valuable or
precious to its creator. What matters most may not be the content
itself, but rather the presentation, the arrangement, the quality of
expression…
This tends to be why Open Access embargoes are often longer in
the Arts and Humanities than other areas
Digital ‘data’ emerging in the Arts is as likely to be an outcome of the
creative research process as an input to a workflow. This is at odds
with the scientific method, and how most RDM resources are
described.
Problems re. data in the Arts and Humanities
27. Scientific and other methods…
The scientific method is a body of
techniques for investigating phenomena,
acquiring new knowledge, or correcting and
integrating previous knowledge.
To be termed scientific, a method of inquiry
must be based on empirical and measurable
evidence subject to specific principles of
reasoning.
The Oxford English Dictionary defines the
scientific method as: “a method or procedure
that has characterized natural science since
the 17th century, consisting in systematic
observation, measurement, and experiment,
and the formulation, testing, and modification
of hypotheses.”
Source:
http://en.wikipedia.org/wiki/Scientific_method
An art methodology differs from a science
methodology, perhaps mainly insofar as the artist is
not always after the same goal as the scientist. In art it
is not necessarily all about establishing the exact truth
so much as making the most effective form (painting,
drawing, poem, novel, performance, sculpture, video,
etc.) through which ideas, feelings, perceptions can
be communicated to a public. With this purpose in
mind, some artists will exhibit preliminary sketches
and notes which were part of the process leading to
the creation of a work. Sometimes, in Conceptual art,
the preliminary process is the only part of the work
which is exhibited, with no visible end result displayed.
In such a case the "journey" is being presented as
more important than the destination.
Source: http://en.wikipedia.org/wiki/Art_methodology
28. There’s nothing new about data re-use in the Arts and Humanities;
it’s an integral part of the culture, and always has been
Think Kristeva’s intertextuality, Barthes’ ‘galaxy of signifiers’,
Shakespeare’s plots, Lanark’s assorted ‘plagiarisms’, Edwin Morgan’s
‘found’ newspaper poems, Marcel Duchamp, variations on a theme,
collage and intermedia art, T.S. Eliot, sampling/hip-hop, etc etc
(http://www.slideshare.net/martindonnelly/data-reuse-in-the-arts)
However, it’s often more fraught than data re-use in other areas
(such as the Sciences)
For starters, people tend not to think of their sources or influences
as ‘data’, and the value and referencing systems are quite different
Furthermore, practice/praxis-based research is pretty much the sole
preserve of the Humanities, and research/production methods are
not always rigorously methodical/linear…
Strengths and weaknesses re. data in the Arts and
Humanities
29. REFUGE: Many universities are developing data repositories for their
funded research data, but a comparatively high proportion of Arts
research does not receive external funding, so there’s less incentive
for the institutions to provide support (no stick, and little demand
from researchers)
APPRAISAL, STEWARDSHIP AND DISCOVERY: Furthermore, it is
(probably/usually) preferable for data to be deposited in discipline-
or domain-specific repositories. There’s a gap in the market, and
national bodies are already experienced in managing large digital
collections.
SUPPORT AND ADVOCACY: Humanities scholars are entirely
comfortable with the use of primary and secondary sources. It just
requires a little translation for the core concepts of RDM to become
meaningful in an Arts and Humanities context. The trust is already
there.
National roles around Arts and Humanities data?
31. i. Arts-centric resources
DCC and University of the Arts London were both involved in the KAPTUR
project: http://kaptur.wordpress.com
DCC subsequently ran an institutional engagement with UAL between
2011 and 2013, which developed…
A data management guidance web area:
http://www.arts.ac.uk/research/research-environment/research-management/data-management/
An institutional policy: http://www.arts.ac.uk/media/research/documents/UAL-Research-Data-Management-Policy.pdf
A UAL data management planning template in http://dmponline.dcc.ac.uk
A UAL data community-of-practice is being launched, with the support of senior
management
Events
RDMF10: “Research data management in the Arts and Humanities”, Oxford,
September 2013
UoE Digital Humanities workshop: “Managing Humanities Research Data”,
Edinburgh, January 2014
32. ii. Other DCC resources
Publications
Briefing Papers and How-To Guides
Training
e.g. DC101 events and Curation Reference
Manual
Advice
e.g. Disciplinary metadata,
www.dcc.ac.uk/resources/metadata-standards
Tools
DMPonline, CARDIO, Data Asset
Framework, DRAMBORA
33. iii. Further resources
JISC Services
RDM resources, www.jisc.ac.uk/guides/research-data-management
EDINA and Mimas (national data centres)
JISCMRD projects (Phase 1 (2009-2011) and Phase 2 (2011-2013))
covered a wide range of topics, including infrastructure, planning,
training, support and guidance, events and tools
Universities
Great RDM materials are available from Edinburgh, Cambridge, Oxford,
Glasgow, Bristol, and many other places
Alliance of Digital Humanities Organizations (ADHO)
http://digitalhumanities.org/
34. “Ten recommendations for libraries to get started with research data
management: Final report of the LIBER working group on E-Science /
Research Data Management” - Christensen-Dalsgaard et al. (LIBER,
2012)
“Curating research data: the potential roles of libraries and information
professionals”, Nielsen & Hjørland (2014) Journal of Documentation,
Vol. 70 Iss: 2, pp.221 - 240
For more on potential future roles for librarians, see slides from Open
Repositories 2013 workshop: http://tinyurl.com/whyte-OR13
Two recent surveys about libraries and data…
USA & Canada – “Academic Libraries and Research Data Services: Current
practices and plans for the future” - Tenopir, Birch & Allard, University of
Tennessee (Association of College & Research Libraries, June 2012)
UK – “Research data management and libraries: Current activities and
future priorities” - Cox & Pinfield, Information School, University of
Sheffield (Journal of Librarianship and Information Science, June 2013)
iv. Further reading
35. Last slide: take-home messages
Research data management (RDM) is…
An integral part of doing quality research in the 21st century
Increasingly expected / required by funders, publishers and
others
An opportunity for new discoveries and different
approaches to research
A safeguard against inappropriate data disclosure
Sometimes complicated in the Arts and Humanities!
And hence… an activity that requires careful planning and
consideration, and – ideally – coordination and support at
many levels
36. Thank you
Questions?
Image credits
Slide 2 (forest) – http://assets.worldwildlife.org/photos/934/images/hero_small/forest-overview-HI_115486.jpg?1345533675
Slide 3 (dictionary) – http://www.flickr.com/photos/dougbelshaw/
Slide 13 (politics) – https://www.flickr.com/photos/junglearctic/
Slide 22 (utopia) – http://www.flickr.com/photos/burningmax/
Slide 30 (Thierry) – https://twitter.com/AFC_Fisher/
Slide 36 (love note) – http://www.edawax.de/wp-content/uploads/2013/01/Metadata_love250.jpg
Thanks to Sarah Callaghan, PREPARDE, for the Rosse example
This work is licensed under the
Creative Commons Attribution
2.5 UK: Scotland License.
For more about DCC services see www.dcc.ac.uk
or follow us on Twitter @digitalcuration and #ukdcc
Martin Donnelly
Digital Curation Centre
University of Edinburgh
martin.donnelly@ed.ac.uk
@mkdDCC