Tales from the Keepers Registry: Dr Who and the Scholarly Record

Association of Subscription Agents & Intermediaries

ASA ANNUAL CONFERENCE 2014
24-25 February 2014

Peter Burnhill
EDINA, University of Edinburgh, UK
http://creativecommons.org/licenses/by/3.0/

Overview
1. “Who does forever?” : A Registry of Keepers
Who is looking after e-journals with archival intent?

2. Dr Who and the Scholarly Record
Time Travel for Scholarly Web

3. Evidence from the Keepers Registry
Statistics on who is looking after what, & what is at risk

Some Consequences of Web
• Essentials of supply chain have changed
  • licensed to access, not sale of content

• Libraries no longer take physical custody of much key content
  • is online remotely, not on-shelf locally

• Role of libraries as trusted keepers of information and culture has been disrupted
  – Need assurance of continuity of access
    • of all content for future generations
    • of the back copies, post-cancellation of the licence

• Does this mean that the Scholarly Record is at risk?

“The Library [Committee], which is made up of librarians and academics, … wants reassurance about long-term preservation before confirming a University policy of going e-only.”
from email sent by a big UK Library

1. “Who does forever?”
Many reports over past 10 years highlighted risks
• ‘digital decay’: format obsolescence & bit rot

and warned against single points of failure:
• natural disasters (earthquake, fire and flood)
• human folly (criminal and political action): hacking

+ risks with commercial events in the publisher/supply chain

Some early archiving initiatives emerged …
• e-Depot at Koninklijke Bibliotheek
  • international significance (Elsevier & Kluwer)

• the LOCKSS project at Stanford University
  • from which came CLOCKSS [as library/publisher ‘dark archive’]

• the electronic-archiving initiative at JSTOR
  • from which came Portico [as service provider]

A ‘global challenge’: trans-national action
[Chart: %age of the 113,000 ISSN issued for e-serials, by issuing country]
• US.LoC 20%
• UK.BL 10%
• Netherlands & Germany: c. 4.5% each
• Brazil 4%
• ‘hidden’ e-journals: low % ISSN

Researchers (and therefore libraries) in any one country
are dependent upon content written and published
in countries other than their own

A Variety of ‘Archiving Organisations’
① web-scale not-for-profit archiving agencies
   e.g. CLOCKSS Archive & Portico
② national libraries (with legal deposit in mind)
   e.g. e-Depot (Netherlands); British Library; DnB etc
③ research libraries: consortia & specialist centres
   e.g. Global LOCKSS Network, HathiTrust,
   Scholars Portal, Archaeology Data Service
Disclaimer:
University of Edinburgh is a CLOCKSS Node & Board Member;
Jisc supports UK LOCKSS Alliance

How can we know who is looking after what & how?
(and uncover what is still at risk)

[Diagram: E-Journal Preservation Registry]
The Keepers Registry, product of Jisc-funded PEPRS Project (EDINA & the ISSN IC)
• SERVICES: E-J Preservation Registry Service, shaped by user requirements
• Data dependency (a): METADATA on extant e-journals, from the ISSN Register
• Data dependency (b): METADATA on preservation action, from Digital Preservation Agencies
  e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.
• ISSN Register at heart of the Data Model; ISSN-L as kernel field

(Taken from Figure 1 in reference paper in Serials, March 2009)

How can we know who is looking after what & how?
(and uncover what is still at risk)

[Diagram repeated from the previous slide, with one added annotation]
• Look forward to ISNI for publisher as kernel field
• ISSN Register at heart of the Data Model; ISSN-L as kernel field

(Taken from Figure 1 in reference paper in Serials, March 2009)

Many archiving organisations is a Good Thing 

“Digital information is best preserved by replicating it at
multiple archives run by autonomous organizations”
B. Cooper and H. Garcia-Molina (2002)

Now have a global Registry of e-journal archiving

… to discover who is looking after what
Enter title or ISSN to search across metadata reported by leading archiving organisations

*news*
Library of Congress has now joined the Keepers Registry
[& have high hopes for some others …]

… and discover details of its ‘archival status’
This e-journal is being archived by 5 archiving agencies …
… but coverage of volumes is partial & patchy

Example search: ‘Origins of Life’

Overview: Time for Part 2

2. Dr Who and the Scholarly Record
(Time Travel for Scholarly Web)
• ‘Reference Rot’: When what was referenced and cited
ceases to say the same thing, or ‘has ceased to be’
http://www.snorgtees.com/this-parrot-has-ceased-to-be

The ‘reference rot’ problem definition
Investigating Reference Rot in Web-Based Scholarly Communication

1. http:// link to a resource no longer works
  • Link rot

2. The citation is inadequate
  • Not robust over time

3. The content referenced at the end of the link
  a) has evolved,
  b) changed dramatically,
  c) disappeared completely.
http://hiberlink.org #hiberlink

Hiberlink Project: Andrew W. Mellon Foundation

Partners
• Los Alamos National Laboratory: Research Library
  • Martin Klein, Robert Sanderson, Herbert Van de Sompel

• University of Edinburgh: EDINA & Language Technology Group
  • Peter Burnhill, Neil Mayo, Muriel Mewissen, Christine Rees, Tim Stickland,
    Richard Wincewicz & Beatrice Alex, Claire Grover, Richard Tobin, Ke ‘Adam’ Zhou

Acknowledgments
• Primary datasets: arXiv, Chesapeake Project, Elsevier, PubMed Central,
PLoS, … Planning on large-scale investigation (looking for more …)
• Secondary datasets: Ex Libris, MS Academic, SerialsSolutions
• Technology support: CrossRef Labs, CrossRef Prospect, Elsevier
• Liaisons: archive.is, CrossRef, Internet Archive, Old Dominion University
Web Science & Digital Library Research Group, perma.cc


Hiberlink Project: Four work packages
1. Problem Quantification. Text mining of vast corpus of
scholarly literature to uncover references to web resources (URIs);
using Memento; determine availability on live web and in archives.
2. Archival Solution Infrastructure. Prototyping proactive,
web-centric mechanisms for archiving cited web resources
at the point of use or publication.
3. Temporal Reference Solutions. Prototyping new methods of
citation to enable creation of precise & actionable time-specific
references.
4. Dissemination and Outreach. Raising awareness of the
challenges at the heart of digital scholarly communication.
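
A minimal sketch of the first step in work package 1, pulling URIs out of article text and setting aside links to other scholarly works. The regular expression, the `SCHOLARLY_HOSTS` list and both function names are illustrative assumptions, not the Hiberlink project's actual pipeline.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pattern: an http(s) URI runs until whitespace, a quote or a closing bracket.
URI_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")

# Assumed hosts that indicate a reference to a scholarly work rather than the wider Web.
SCHOLARLY_HOSTS = {"dx.doi.org", "doi.org"}


def extract_uris(text):
    """Return the URIs found in text, with trailing sentence punctuation stripped."""
    return [match.rstrip(".,;:") for match in URI_PATTERN.findall(text)]


def is_web_at_large(uri):
    """True when the URI does not point at one of the assumed scholarly hosts."""
    return urlsplit(uri).hostname not in SCHOLARLY_HOSTS


sample = "See http://hiberlink.org, and http://dx.doi.org/10.1000/182."
web_at_large = [u for u in extract_uris(sample) if is_web_at_large(u)]
```

Checking each extracted URI against the live web and against web archives would follow as a separate step.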



References in Web-Based Scholarly Communication

References to other online scholarly works
• Link Rot: DOI, HTTP version of DOI
• Content Decay: Fixity of content; Archiving: CLOCKSS, LOCKSS, Portico… (Keepers Registry)
• This is becoming understood, but there are issues; see
  David Rosenthal blog post http://blog.dshr.org/2013/11/patio-perspectives-at-anadp-ii.html

References to online resources on the ‘wider Web’
• This is unexplored, so to be Hiberlink focus

Articles increasingly link to online resources
on the ‘wider Web’

URIs extracted from PubMed papers – links to Web at Large resources

Quantifying the extent of ‘Reference Rot’ – Early Results

Using: PubMed Central Corpus 01/1997 - 12/2012
• Articles processed: 494,785
• Articles that contain links (URIs) to ‘Web at Large’: 176,527
• Number of references to ‘Web at Large’ URIs: 557,432
• Unique referenced Web at Large URIs: 327,782

Percentage Exists & Archived Referenced URIs
• Exists & Archived: 31.2% (31% are available & safe)
• !Exists & Archived: 11.3% (11% can be retrieved)
• Exists & !Archived: 40.7% (41% at risk)
• !Exists & !Archived: 16.8% (17% are lost)
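
The four-way split above can be sketched as a lookup on two yes/no observations per URI: does it still resolve on the live web, and does a web archive hold a copy. The function names and the tiny sample are invented for illustration; how the two observations are obtained is left out.

```python
from collections import Counter

# Labels follow the slide: (exists on live web, archived) -> status.
STATUS = {
    (True, True): "available & safe",
    (False, True): "can be retrieved",
    (True, False): "at risk",
    (False, False): "lost",
}


def classify(exists, archived):
    """Map the two observations for one URI to its status label."""
    return STATUS[(exists, archived)]


def status_shares(observations):
    """Return the percentage of URIs per status, rounded to one decimal place."""
    counts = Counter(classify(exists, archived) for exists, archived in observations)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Invented sample of four URIs, not project data.
sample = [(True, True), (True, False), (True, False), (False, False)]
shares = status_shares(sample)
```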

Thoughts on How to Address Content Decay
Who is not selling defective goods?
• Remedy: Pro-active approach to trigger web
archiving when web-based content is referenced
in scholarly work:
– By authors
• during note taking, authoring, when submitting

– By publication platforms
• During submission, editing, acceptance, issue

+ Tool with Temporal Context for Links
• Memento for Chrome is an application that uses Original URI-R and
dates to access Mementos in various web archives

Memento Time Travel for Chrome

http://bit.ly/memento-for-chrome
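
A minimal sketch of the request a Memento client makes: it asks a TimeGate for the original resource (URI-R) and states the wanted date in an `Accept-Datetime` header. The TimeGate prefix below is an assumption (a public aggregator address); any TimeGate could be substituted, and the function only builds the request rather than sending it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumption: address of a public Memento aggregator TimeGate.
TIMEGATE_PREFIX = "http://timetravel.mementoweb.org/timegate/"


def timegate_request(uri_r, when):
    """Return (url, headers) for datetime negotiation on the original URI.

    `when` should be a timezone-aware datetime; it is converted to GMT
    because Accept-Datetime uses the HTTP date format.
    """
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE_PREFIX + uri_r, {"Accept-Datetime": http_date}


url, headers = timegate_request(
    "http://hiberlink.org/",
    datetime(2014, 2, 24, 12, 0, tzinfo=timezone.utc),
)
```

The TimeGate is expected to answer with a redirect to the archived copy (Memento) closest to the requested date.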

Back to the Overview - Part 3

E-journals should be easy – right?

… but is the e-journals problem being solved?
3. Evidence from the
Keepers Registry
Statistics on who is looking
after what, & what is at risk

3. Evidence from the Keepers Registry
a) 21,557 e-serial titles are reported as being
ingested by the 10 Keepers
– organisations with archival intent
– with many ‘missing volumes and issues’

b) 113,092 ISSN assigned to ‘online serials’ in the
ISSN Register
→ Progress with a key indicator: ratio of a/b = 19%
– was 17% at close of 2011 (16,558 / 97,563)

Progress, but far from ‘job done’
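
The key indicator is simple arithmetic on the two counts quoted on this slide; the function name is an illustrative choice.

```python
def coverage_percent(titles_ingested, issn_assigned):
    """Share of ISSN-registered online serials reported as ingested by a Keeper."""
    return 100 * titles_ingested / issn_assigned


current = round(coverage_percent(21_557, 113_092))   # figures from this slide
end_2011 = round(coverage_percent(16_558, 97_563))   # figures at close of 2011
print(current, end_2011)  # 19 17
```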

Do we need to agree a ‘priority list’ of titles?
1. Should we only be interested in the c.30,000 ‘peer-reviewed’
scholarly journals? [Ulrich’s]

2. Do we look only at what individual libraries list?
– In 2012 we checked ‘archival status’ for 3 large university libraries:
  c.75% ‘at risk’; c.11% held by 3 or more

• Two key indicators: %age (& number) of titles that are ‘at risk of loss’;
%age (& number) of titles that are ‘preserved by 3 or more Keepers’.

3. Should we ask the audience?
• The researchers and students who read online serials

Looking from the user’s point of view …
… with usage logs for the UK OpenURL Router
• 10.4m full text requests in 2012; ISSN-L to de-duplicate ISSN
• 53,311 online titles requested by researchers & students from 108/160+
Analysis using the Keepers Registry:

• Only 15% (7,862) are being kept by 3+ Keepers
• Over two thirds (68%) held by none
→ 36,326 titles ‘at risk’ of loss

So ‘preservation’ (or lack of it) is still a real
and present problem!
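
A minimal sketch of the analysis described above: collapse requested ISSNs to their ISSN-L so each title counts once, then count titles kept by 3+ Keepers and titles held by none. The two lookup tables (`issn_l_of`, `keepers_of`) are hypothetical stand-ins for data that would come from the ISSN Register and the Keepers Registry; the placeholder ISSNs are invented.

```python
def preservation_indicators(requested_issns, issn_l_of, keepers_of):
    """Count distinct requested titles, those kept by 3+ Keepers, and those held by none."""
    # De-duplicate print/online variants by mapping each ISSN to its linking ISSN.
    titles = {issn_l_of.get(issn, issn) for issn in requested_issns}
    kept_by_3_plus = sum(1 for t in titles if len(keepers_of.get(t, ())) >= 3)
    held_by_none = sum(1 for t in titles if not keepers_of.get(t))
    return {"titles": len(titles), "kept_by_3_plus": kept_by_3_plus, "held_by_none": held_by_none}


def percent(part, whole):
    """Whole-number percentage, as quoted on the slide."""
    return round(100 * part / whole)


# Invented placeholder identifiers, not real titles.
result = preservation_indicators(
    ["A-print", "A-online", "B", "C"],
    {"A-online": "A-print"},
    {"A-print": {"CLOCKSS", "Portico", "e-Depot"}, "B": {"Portico"}},
)

print(percent(7_862, 53_311), percent(36_326, 53_311))  # 15 68
```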

Good News & Main Challenge?
Good news?
• Most of the big publishers engage with archiving initiatives
– typically CLOCKSS, e-Depot and Portico.

• Are those titles, volumes & issues actually being archived?

Main challenge?
• Long tail of smaller publishers - regardless of business model.
• Everyone in the audience should check whether they are
participating in at least one preservation approach?

• Role of Agents, who arrange subscriptions with those
small publishers?
– Or only role of national libraries & research library consortia?

Choice of future with 2020 Vision
• Best Case scenario for ASA 2020
– Libraries, Agents & Publishers have acted to reduce that
alarming 80% figure to near to zero
– They have ensured that all the e-journal content used by
their researchers in 2013 has been preserved and can
be successfully used in 2020, and assuredly beyond.

• Worst Case scenario for ASA 2020
– Libraries, Agents & Publishers have failed to act
– Important literature has been lost
– Citizens & scholars complain of neglect!

The Keepers Registry: Actionable Evidence
Sidebar note on monitoring their progress …
1. To assist publishers ‘do the right thing’
– A showcase for the real heroes: the archiving organisations
– Means to check what content is being reported as archived
– Provide libraries, publishers & archiving organisations with lists of
titles that seem to be at risk of loss

2. To keep a close focus on volumes & issues
– Need for Publishers & Libraries to make sure all issued content is
being kept safe

3. To assist collaboration between Keepers: ‘a safe places network’

4. If it is worth preserving, it really should have an identifier

Breaking News: New release (end of Q1 2014) Members Area:
• Upload a list of ISSNs & get back archival status of Titles
• Access to API, to report archival status on 3rd Party websites
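
Since the ISSN is the identifier everything here hangs on, a list of ISSNs is worth checking before it is used for lookups. The sketch below applies the standard ISSN check-digit rule (weights 8 down to 2 on the first seven digits, modulo 11, with X standing for 10); the function names are illustrative and nothing here reflects how the Keepers Registry itself validates input.

```python
def issn_check_digit(first_seven):
    """Compute the ISSN check character from the first seven digits."""
    total = sum(int(digit) * weight for digit, weight in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(issn):
    """True when the string has the form NNNN-NNNC and the check character matches."""
    compact = issn.replace("-", "").upper()
    if len(compact) != 8 or not compact[:7].isdigit():
        return False
    return issn_check_digit(compact[:7]) == compact[7]


# Strings chosen because their check characters work out, not as claims about any title.
checked = {issn: is_valid_issn(issn) for issn in ["0378-5955", "2434-561X", "0378-5954"]}
```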

Gentle Wake-up Call to Ensure Continuity of Access
‘Go Smell The Coffee’

#hiberlink
http://thekeepers.blogs.edina.ac.uk/

http://thekeepers.org
http://hiberlink.org/

Ask a librarian in 2020: 3 possible answers
1. "Yes, we have it (we've checked recently, both in the
catalogue and in actuality), and you can access it now"
2. "No, but we know somebody (we trust) that does,
– so we can point you to (or arrange access to) it now/soon-ish"

3. "Sorry, we don't know …
- perhaps nobody has it
- it may be lost forever, altho' perhaps somebody somewhere ..."

- That was true for the print world
- Unfortunately, unless we do something now, the 3rd answer
could become the common one for a lot of e-journal content

Sidebar note on National Libraries
Should we wait upon Legal Deposit?
– 94% of libraries have some form of legal deposit for print.

• Only 44% of national libraries had legislation in 2011 for e-books or
e-journals; expected to rise to 58% by June 2012.
from presentation, CENL 2011 Survey by Lynne Brindley
to CDNL Annual Meeting Puerto Rico, 15/8/11

• Only 27% [expected to rise to 37% by June 2012] actually ingesting via legal
deposit
→ Total national libraries collecting = those 14 via legal deposit
+ 9 by other means (Netherlands, UK/BL, Switzerland voluntary deposit)
→ Only KB e-Depot, BL, NSLC (+ LoC) in The Keepers Registry
→ Only when the other 19 join will we all know about their activity

→ Key point is not about call for ‘legal deposit’ but that on its
own it is taking too much time
