A summary of research on uniquely-held titles in ARL libraries, prepared for discussion at ALA Chief Collection Development Officers meeting, January 2008
Measuring Uniqueness in System-wide Book Holdings: Implications for Collection Management
1. RLG Programs
Measuring Uniqueness in
System-wide Book Holdings:
Implications for Collection
Management
Constance Malpas
Program Officer
RLG Programs
2. RLG Programs Managing Last Copies
CCDO Meeting, ALA Midwinter – 12 January 2008
This presentation
Summarizes recent data-mining efforts by OCLC
Programs and Research
System-wide sample (Summer 2007 – Spring 2008)
ARL unique print books (Autumn 2007)
Suggests implications for collection managers
Outlines next steps for RLG Programs
An opportunity to discuss what additional
evidence and analysis are needed
3. RLG Programs Managing Last Copies
What we mean by ‘last copy’
Monographic title uniquely-held by a single
WorldCat contributor
Cf. ‘single copy’ repositories, where ‘last copy’ is relative
to local/group holdings
May represent a last manifestation, expression or
work
Bibliographic records describe manifestations, not
copies; unique manifestations are the point of departure
for analysis
Some are intrinsically unique; others are
rendered unique by erosion of system-wide
holdings
Historical data may help document increased copy or
work-level availability, but weren’t included in the
studies presented here
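The working definition above (a title held by a single contributor) can be sketched as a simple filter over holdings data. This is a minimal illustration only: the `(title_id, holder_symbol)` input shape, the function name and the sample records are assumptions, not a WorldCat format or the method used in the studies.

```python
from collections import defaultdict


def uniquely_held(holdings):
    """Map each title held under exactly one symbol to that symbol.

    `holdings` is an iterable of (title_id, holder_symbol) pairs. This input
    shape is an assumption for illustration, not a WorldCat export format.
    """
    holders = defaultdict(set)
    for title_id, symbol in holdings:
        holders[title_id].add(symbol)  # repeated pairs collapse to one holder
    return {t: next(iter(s)) for t, s in holders.items() if len(s) == 1}


# Made-up records: t2 is held by two contributors, so it is not a last copy.
records = [("t1", "AAA"), ("t2", "AAA"), ("t2", "BBB"), ("t3", "CCC"), ("t3", "CCC")]
last_copies = uniquely_held(records)
```

Because records describe manifestations rather than copies, a filter like this only finds unique manifestations; deciding whether each one is also a unique expression or work still needs further matching.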
4. RLG Programs Managing Last Copies
Distribution of uniquely-held print books
in ARL member institutions
[Bar chart: unique titles (y-axis, 0 to 700,000) by ARL member institution (x-axis). Labeled institutions, in axis order: LC, Yale, Alberta, Columbia, U Chicago, UCLA, McGill, Penn, UVa, Hawaii, U Md, San Diego, SUNY Buffalo, Rutgers, Dartmouth, Notre Dame, Oregon, GA Tech, Delaware, Florida State, So Illinois, Alabama, Irvine, GWU, Wayne State, York, Virginia Tech, WA State, Case Western, Manitoba, Howard]
Distribution of wealth: ARL unique books
A classic Pareto distribution
20% of the population holds >75% of unique titles
Median institutional
holdings = 19K titles
institutional excellence?
(or) a “network effect?”
N = 6.95 M titles
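The two summary figures on this slide (share held by the top 20% of institutions, and the median institutional holding) can be computed from a list of per-institution counts. The sketch below uses made-up counts purely for illustration; the function name and the numbers are assumptions and do not reproduce the ARL data.

```python
from statistics import median


def top_share(counts, fraction=0.2):
    """Share of the total held by the top `fraction` of institutions."""
    ranked = sorted(counts, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # number of top institutions
    return sum(ranked[:k]) / sum(ranked)


# Made-up unique-title counts for ten hypothetical institutions.
counts = [600, 150, 40, 30, 25, 20, 19, 15, 10, 5]
share = top_share(counts)      # fraction held by the top two institutions
typical = median(counts)       # median institutional holding
```

In a skewed distribution like this one, the median sits far below the mean, which is why the slide reports a median rather than an average.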
5. RLG Programs Managing Last Copies
Why focus on uniquely-held titles?
“Scarcity is common”
limited redundancy in holdings = limited preservation
guarantee, limited opportunity to create economies of scale by
aggregating supply
Research institutions bear the brunt of responsibility for
long-term preservation and access of unique titles
Academic and independent research libraries hold up to 70%
of aggregate unique print book collection
Continuing costs of managing (storing, providing access to)
print collections are high; use is generally declining
Space pressure on physical plant (on-campus, remote) is high;
understanding distribution and characteristics of unique
holdings can inform decisions about disposition of physical
collection
Increased attention to stewardship of special collections
ARL SCWG, CLIR, LC Task Force on Bibliographic Control –
new attention to what constitutes ‘special’ collections,
appropriate standards of care, modes and metrics of use
6. RLG Programs Managing Last Copies
Challenges
Identification requires group / network view of holdings
WorldCat provides a reasonable proxy for system-wide
collection
Some materials (MSS, theses and dissertations, etc.) are
intrinsically unique; not all can be algorithmically identified
in MARC records
hybrid approach combines computational and manual
analysis of bibliographic data
Sparse bibliographic records impede efficient work/title
matching, may introduce spurious measure of uniqueness
external sources (including Google) sometimes helpful in
filling gaps
Non-English titles (especially transliterated non-roman
scripts) are especially difficult to match
we resisted the temptation to exclude these
7. RLG Programs Managing Last Copies
Study I: System-wide Sampling
250 randomly selected, uniquely-held titles
Limited to printed books (including theses) published
before 2005
English-language cataloging only
Iterative re-sampling required to fill gaps
Independently reviewed by three project staff
Level of uniqueness
Material type
Results periodically collated for group analysis
Compare results of individual analysis for consistency
Seek consensus on difficult cases – relatively few of
these
Re-sample as necessary to fill gaps
White paper anticipated March 2008
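One way to picture the sampling and review steps above is a seeded random draw that skips ineligible titles, followed by a majority vote across three reviewers. This is a sketch under stated assumptions: the eligibility test, the label names and both helper functions are hypothetical, and the actual re-sampling and consensus procedures used in the study may differ.

```python
import random
from collections import Counter


def draw_sample(title_ids, size, is_eligible, seed=0):
    """Draw up to `size` eligible titles in random order.

    Ineligible draws are skipped and replaced by the next random title,
    which stands in for re-sampling to fill gaps. Returns fewer than
    `size` titles if the pool runs out.
    """
    rng = random.Random(seed)
    pool = list(title_ids)
    rng.shuffle(pool)
    chosen = []
    for title_id in pool:
        if is_eligible(title_id):
            chosen.append(title_id)
            if len(chosen) == size:
                break
    return chosen


def consensus(labels):
    """Return the label at least two reviewers agree on, else None."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes >= 2 else None


# Hypothetical pool of 1,000 title ids; every third id is treated as ineligible.
sample = draw_sample(range(1000), 250, lambda t: t % 3 != 0)
agreed = consensus(["work", "work", "manifestation"])
unresolved = consensus(["work", "expression", "manifestation"])
```

Cases where `consensus` returns `None` correspond to the difficult cases that would go to group discussion.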
8. RLG Programs Managing Last Copies
Study II: ARL uniquely-held books
Ad hoc analysis by RLG Programs, prompted by IMLS
Connecting to Collections grant announcement
How might the existing evidence base be used to focus
regional preservation investments?
Based on January 2007 snapshot of WorldCat database:
13M records for titles (6.95M print books) uniquely held by
ARL institutions; 300+ OCLC symbols; 123 institutions
Iterative analysis examined relative impact of
theses/dissertations and recent imprints on system-wide
uniqueness; regional and institutional distribution of holdings
Findings shared with ARL Special Collections Working Group
(October 2007) and selected RLG partner institutions (UC;
CIC; ReCAP; Harvard; ASU; NYU)
Heritage Preservation willing to share Heritage Health survey
data for cross-tabulation on as-needed basis
9. RLG Programs Managing Last Copies
Limitations
Current studies limited to printed books –
excludes serials, special collections; only a partial
measure of uniqueness in system-wide collection
Incomplete representation of world book
collection; for non-English titles especially,
uniqueness of North American holdings is only
relative
Cataloging backlogs of up to 5 years mean that
holdings for recent acquisitions are imperfectly
reflected
Incomplete coverage of rare books and special
collections prior to (ongoing) integration of RLG
Union Catalog
10. RLG Programs Managing Last Copies
Our findings – distribution of unique titles
Research and academic libraries hold >70% of
aggregate unique print book collection
while value and utility of these holdings may be widely
distributed across the library community, holdings are
concentrated at institutions with a research / teaching /
learning mandate
limited data on aggregate use, sources of demand
Institutional distribution of unique holdings is
highly skewed, with a handful of libraries holding
a majority share of collective assets
ARL unique print book holdings range from 400 to 600K
titles per institution; median holdings = 19K titles
generally, institutions with large collections hold more
unique materials – but absolute size of collection is not
an indicator of relative uniqueness
11. RLG Programs Managing Last Copies
Based on a randomly selected sample of 250 uniquely-held print
book titles in WorldCat (Jan. 2007)
Unique titles by library type
[Pie chart. Slice labels as extracted: 50%, 27%, 6%, 6%, 4%, 4%, 2%, 1%. Legend: ARL; Academic (non-ARL); Gov't; State and National; Special; Public; Unknown; Networks]
12. RLG Programs Managing Last Copies
Distribution of Unique Print Books in ARL Member Institutions
[Bar chart: unique print books (y-axis, 0 to 700,000) by ARL member institution (x-axis). Labeled institutions, in axis order: LC, Michigan, NAL, U Wisc, Urbana, U Wash, Emory, Pitt, New Mexico, Oklahoma, Utah, Kent State, Davis, Florida State, Vanderbilt, WUSTL, Colorado, Umass, Texas Tech, McMaster, Queen's]
National libraries and institutions with deep
collections and an aggressive approach to
collecting and cataloging new monographs –
LC, Harvard, Libraries & Archives Canada –
have an exceptional range of unique holdings
Unique Print Books in ARL Institutions
CRL’s focus on theses and dissertations is
evident – most uniqueness is attributable
to these holdings
Institutions with
younger collections,
actively seeking to
increase scope of
coverage - NCSU,
Temple – are building
uniqueness in new
titles
13. RLG Programs Managing Last Copies
Content-type Distributions: CRL and ARL
[Stacked bar chart, 0% to 100%: Center for Research Libraries vs. ARL aggregate collection. Series: unique theses; unique print books pub'd 2000 and after; unique print books pub'd before 2000]
Intrinsically unique
content, “only copies”
May include “first copies”
in cataloging queue;
uniqueness subject to
rapid erosion
14. RLG Programs Managing Last Copies
Our findings – levels of uniqueness
~60% of titles represent unique works
Ex: Report and recommendation … on a proposed loan … equivalent
to US$70 million to the … Islamic Republic of Pakistan for a power
plant efficiency improvement project (1987) – World Bank report held
by George Washington University
~15% of titles represent unique manifestations
Ex. Gallipolis … an account of the French five hundred and of the town
they established … compiled by Workers of the Writers' program of the
Work projects administration (1940) – microform pamphlet held by
Yale University; related manifestations at 40 libraries
~5% of titles represent unique expressions
Ex: E.J. Luck. A pedigree of the families Luck, Lock and Lee (1908) –
book held by Massanutten Regional Library, VA; similar title (Luck,
Lock) by same author, pub’d in 1900, held at LC
~20% of titles not unambiguously unique: duplicate or near-
duplicate records can be found in WorldCat
Ex: K. Kimura. Edo no akebono (1956) – book held by Harvard
Yenching; apparent duplicate (cataloged with original scripts) held by
Waseda, Yale
15. RLG Programs Managing Last Copies
Our findings – content characterization
Material types
~35% are books (>50pp)
most appear to be non-fiction titles, less likely to have
additional manifestations
~20% theses and dissertations
many at Master’s level – unlikely to be held beyond issuing
institution
~15% government documents
mostly federal and state, may be duplicated in depositories
~10% pamphlets
unique content, but rarely useful in isolation
~10% analytics; single articles or issues bound as a
separate volume
non-unique content
<5% early imprints
lost treasures?
Small numbers of by-laws, scripts, legal briefs,
minutes, etc.
16. RLG Programs Managing Last Copies
Implications
Institutions with significant unique holdings may benefit
from ‘splitting the difference’ between unique works and
manifestations
unique manifestations and analytics should be judged with an
eye to provenance history; unless they contribute to local
distinctiveness, immediate action may not be warranted
A preliminary sort by material type may help guide local
decision-making regarding the physical disposition of
unique holdings
pamphlets and technical reports may be candidates for
cataloging enhancement and storage transfer; books may be
short-listed for digitization and/or transfer to special
collections
Institutions with smaller unique print book collections may
benefit from collective action to aggregate supply
(through effective disclosure) and demand (through
special resource-sharing and digitization initiatives) around
specific topical and disciplinary interests
local collections gain in significance when presented in context
with related holdings
17. RLG Programs Managing Last Copies
Recommendations
Adopt a nuanced understanding of ‘relative uniqueness’ when
assessing local holdings
Unique manifestations may not represent unique
intellectual content, but may have other value
As artifacts → special collections
As a networked resource → increased availability
Unique works may gain relevance and value when
presented as part of a larger disciplinary or topical
collection
Theses and dissertations may benefit from special discovery
tools, integration in local scholarly communications initiatives
Pamphlets and technical reports may be virtually aggregated
for specific communities of use
Maximize disclosure of unique holdings to increase their
impact and value
Focus on use and utility of unique holdings to ensure
long-term preservation, enduring value to parent institution
18. RLG Programs Managing Last Copies
What’s Next . . .
Holdings validation study will examine a sample
of scarcely-held (<5 copies) US imprints in
North American research libraries
Compare current WorldCat holdings to historical holdings
– looking for signs of collection erosion; elimination of
local backlogs (diminishing uniqueness)
Compare local holdings to current WorldCat holdings –
location changes/storage transfers, withdrawals
Assess impact of local preservation actions on system-
wide holdings (availability, condition) and potential
value of ‘full disclosure’
Collaborative effort with RLG partner institutions
anticipated Spring/Summer 2008
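The first comparison in the planned validation study (current versus historical holdings for scarcely-held titles) could be sketched as below. The dictionary inputs, the category names and the function itself are illustrative assumptions; the study design had not been published when this talk was given, so this is not a description of its method.

```python
def holdings_change(historical, current, scarce_below=5):
    """Group currently scarce titles by how their holding count has changed.

    `historical` and `current` map title ids to holding counts in two
    snapshots (hypothetical inputs). Titles with no historical count, or
    with `scarce_below` or more current holdings, are left out.
    """
    report = {"eroded": [], "grown": [], "stable": []}
    for title_id, now in current.items():
        if now >= scarce_below:
            continue  # not scarcely held today
        before = historical.get(title_id)
        if before is None:
            continue  # no earlier snapshot to compare against
        if now < before:
            report["eroded"].append(title_id)   # possible collection erosion
        elif now > before:
            report["grown"].append(title_id)    # e.g. backlogs being cleared
        else:
            report["stable"].append(title_id)
    return report


# Made-up snapshots for five hypothetical titles.
earlier = {"a": 6, "b": 1, "c": 3, "d": 9}
latest = {"a": 2, "b": 4, "c": 3, "d": 9, "e": 1}
changes = holdings_change(earlier, latest)
```

A count that has grown since the earlier snapshot points to diminishing uniqueness, while a falling count flags titles that may be drifting toward last-copy status.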
19. RLG Programs Managing Last Copies
Some closing observations
Opportunities
Large research libraries hold a wealth of unique materials –
long tail resources with broad potential audience
Aggregated bibliographic data supports programmatic
analysis and enrichment – work-level clustering,
identification of duplicates
Largest institutions, with enduring commitments to
retention and access, hold majority of potential ‘at risk’
titles
Challenges
Libraries ill-equipped to measure potential demand for
unique holdings
Technical and social infrastructure for aggregating supply is
lacking
University presses are potential distribution partners, but
alliances are weak
20. RLG Programs Managing Last Copies
Questions, Comments?
‘Managing the Collective Collection’ work agenda
Data-mining for management intelligence
Shared print collections
http://www.oclc.org/programs/ourwork/collectivecoll
Midwinter RLG Update Session
1:30-3:30
Marriott 302-304
Contact:
Constance Malpas
Program Officer
malpasc@oclc.org
21. RLG Programs Managing Last Copies
N=5.9M titles
Median institutional holdings =96k unique titles