1) The document discusses the issues caused by duplicate bibliographic records in a consortial catalog, such as increased workload and costs for database maintenance.
2) It provides statistics on duplicate records for several authors before and after consolidation in the PINES catalog.
3) The document also discusses patron feedback expressing confusion over multiple listings for the same title and issues that can arise from inconsistencies in record creation and data quality.
1. A Unit of the University System of Georgia
2. Bibliographic database integrity in a consortial environment
Evergreen International Conference
May 21, 2009
• Elaine Hardy
• PINES Bibliographic Projects and Metadata Manager
6. GPLS Intern’s statistics
Author/Series              Before   After
Alexander McCall Smith        245     172
Grace Livingston Hill        1119     549
Mary Higgins Clark            771     386
Magic School Bus (print)      554     218
Danielle Steel               1235     718
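The scale of the consolidation is easier to see as percentages; a minimal sketch that computes the reduction for each row of the statistics above (the counts are taken directly from the table):

```python
# Duplicate-record counts before and after consolidation,
# from the GPLS intern's statistics above.
counts = {
    "Alexander McCall Smith": (245, 172),
    "Grace Livingston Hill": (1119, 549),
    "Mary Higgins Clark": (771, 386),
    "Magic School Bus (print)": (554, 218),
    "Danielle Steel": (1235, 718),
}

for name, (before, after) in counts.items():
    pct = 100 * (before - after) / before
    print(f"{name}: {before} -> {after} ({pct:.1f}% fewer records)")
```

Every author or series on the list lost at least a quarter of its records to deduplication, and Magic School Bus lost more than half.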
7. Duplicate records cause
– “User information overload”
– “Reduced system efficiency”
– “Low cataloging productivity”
– “Increased cost for database maintenance”
Sitas and Kapidakis, 2008
“There is no question that merging such records is vital to
effective user services in a cooperative environment.”
Tennant, 2002
8. What patrons think ---
• “Wish that you would list the most current book first and have only
one entry for each book instead of showing multiple entries.
Sometimes I have to look through 50 - 100 entries to see 20 books
and the newest book by the author is entry 80. There should be a
way to streamline this procedure.”
• “Consolidate entries for the same title. There are numerous entries
on some titles beyond the breakdown of hard cover, PB, large
print, audio, etc.”
• “Why so many listings for the same books--that's confusing.”
• “When I look up a book, many times I get two pages all of the same
title with the same cover. It confuses me because I see that my
library system doesn't have it, but if I scroll down...Whoops! We do
have it. What is that all about? It sucks.”
• “Create a standard for the way an item's information is entered.
Some books only have half the title entered, and this can create
problems when searching for specific materials.”
14. • Big library does not equal good data
• A large library does not always follow rules and adhere
to standards
• Size can mean they cut corners for “efficiency”
• Local notes don’t belong in subject fields
• Make the time to check your data
• Publishers are not catalogers’ friends
17. Legacy system characteristics
• All were IBM-based systems
• No tags, thus no definition of fields
• All fields fixed length
– allotted so many characters for each field
• No standards
– Not required to enter pagination or publisher
• Extraction of data a problem
– had to count in to find beginning of next field
– In many cases, had to supply a pub date. One library has 1901 as the
pub date on most of its extracted records
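The extraction problem above (no tags, fixed-length fields, counting characters in to find the next field) can be sketched as fixed-width slicing. This is a minimal illustration only: the field names, offsets, and the 1901 fallback layout below are hypothetical, since the real legacy layouts varied by system.

```python
# Hypothetical fixed-width layout for one legacy record line.
# These offsets are illustrative; actual legacy systems differed.
FIELDS = [
    ("title", 0, 40),      # 40 characters allotted to the title
    ("author", 40, 65),    # 25 characters allotted to the author
    ("pub_date", 65, 69),  # 4 characters allotted to the pub date
]

def extract(record: str) -> dict:
    """Slice a fixed-width line into named fields by counting in."""
    out = {}
    for name, start, end in FIELDS:
        out[name] = record[start:end].strip()
    # With no standards, the date field was often blank, so a
    # placeholder date had to be supplied (hence records dated 1901).
    if not out["pub_date"]:
        out["pub_date"] = "1901"
    return out

line = "The Great Gatsby".ljust(40) + "Fitzgerald, F. Scott".ljust(25) + "    "
print(extract(line))
```

The slicing itself is trivial; the pain described on the slide comes from every source system having different widths, no field definitions to consult, and missing data that forced placeholder values into the migrated records.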
30. Lessons learned
• Big library does not equal good data
• Make the time to check your data
• Publishers are not catalogers’ friends
• Be careful about CIPs with no description and records with multiple ISBNs
• Come up with realistic match criteria for when records are the same but the information differs
• One library will not have the same good records across all their collections
– may have good print but bad AV
• Expect LOTS of programming if there are multiple sources of records.
• Nothing -- budget, personnel, time -- is as important as concentrating on
clean-up prior to migration
• Be as specific as possible with vendors; test, and have a penalty clause.
• Have the right people in place from day one
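One common way to approach "realistic match criteria" is a normalized match key that tolerates case, punctuation, and accent differences. The sketch below is a simplified assumption, not the actual PINES/Evergreen merge logic, which weighs many more fields; the example titles are hypothetical variants of a series named earlier in the deck.

```python
import re
import unicodedata

def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def match_key(title: str, author: str, pub_date: str) -> str:
    """A deliberately simple candidate-duplicate key: normalized
    title + author + date. Real merge logic needs more fields
    (format, pagination, publisher) to avoid false matches."""
    return "|".join([normalize(title), normalize(author), pub_date])

# Two records that differ only in case and punctuation get one key:
a = match_key("Magic school bus: inside the earth", "Cole, Joanna", "1987")
b = match_key("Magic School Bus : Inside the Earth.", "COLE, JOANNA", "1987")
print(a == b)
```

A key this loose would still merge distinct editions, which is exactly why the slide warns that matching must also account for records that look the same but carry differing information.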