The State Library of North Carolina is legally mandated to facilitate public access to publications issued by State agencies and to manage the depository system. With the increase in born-digital documents and the demand for electronic access, the State Library needed a way to support the systematic collection, preservation, and access to state information in digital formats. While searching for repository solutions for digital state publications, the library compared leading products and found CONTENTdm to be the best overall fit. Alongside the continuing need to create MARC records for digital documents, CONTENTdm offered the functionality to create compound objects for single documents as well as structured serials, providing one permanent URL either way. Working with born-digital and digitized serials still presents challenges: establishing workflows, providing access, and compensating for the differences between born-digital and digitized formats. This presentation discusses the ups and downs of managing digital serials in CONTENTdm, how we do it, and why we do it, from the perspective of a mid-size state government library.
Francesca Francis
Assistant State Documents Cataloger, State Library of North Carolina
Raleigh, NC
I assist in the cataloging of original publications created by the state agencies of North Carolina, metadata/class schema/authority creation and management, and catalog problem-solving with a small side of reference desk work at the Government & Heritage Library. Prior to my time at the State Library, I worked part-time on a reference desk in the Cumberland County library system. While living in the DC area, I served as the catalog librarian for the U.S. Census Bureau and worked on a shelf list project with the U.S. GPO. I got my start in the library field when I was selected to work as the cataloging assistant at the law library of Catholic University while earning my MLS. As you may be able to guess, I kind of have a thing for cataloging and providing access to information, whether I'm on deck or in the control room...although I kind of have a penchant for playing the "[wo]man behind the curtain."
Eve Grunberg
Documents Cataloger, State Library of North Carolina
I have been working at the State Library of North Carolina as a documents cataloger since 2006. I am responsible for cataloging everything published by state agencies, regardless of format. Working with different publications has given me a great deal of knowledge and experience with MARC cataloging rules and standards, different classification schemas, authority work, Library of Congress and OCLC cataloging tools, metadata standards, and the creation of controlled vocabularies.
Sailing the Digital Serial Seas: Charting a New Course with CONTENTdm
1. Sailing the Digital Serial Seas: Charting a New Course with CONTENTdm
Eve Grünberg
Francesca Francis
State Library of North Carolina
NASIG 2013
2. The Manifest
Mandate of the Library and advent of digital publications
Finding a digital content (or asset) management system
Different types of digital serials and how we work with them
Challenges and expectations
“Sailing” ahead
3. Background
The Library’s mandate: Manage and preserve state publications in all formats for permanent public access, and maintain a permanent depository collection of all printed state documents
North Carolina State Documents Depository System (est. 1987) = Clearinghouse + depository libraries
Identify, collect, process, distribute, provide access
4. Digital pubs, ho!
2003 – 60%+ of state government information born digital
Need a way to manage digital content
Digital Information Management Program (DIMP) study
CONTENTdm
Qualified Dublin Core
Automated metadata crosswalking
Customization
Digital Archive for preservation
5. The (pre-)maiden voyage
Trial period: October 2006-February 2007
We bought and designed the ship
(but are renting the dock…)
6. Readying the ship
Guidelines:
Digital Collection Development
Digitization Priorities
General Metadata
Metadata for Serials
Preservation Metadata
Preservation and File Format
http://digital.ncdcr.gov/cdm/about
7. Handling the cargo
Connexion Digital Import (CDI) = MARC (Connexion) → Qualified Dublin Core (CONTENTdm)
Serials
Multiple digital file upload
Single reference URL
Structure (“parent” & “children”)
8. Cargo type #1: Born digital
How received:
PDF (from agencies or converted)
Routine searching
How inventoried:
Digital database
PinPoint Hash
CINCH does both!
http://cinch.nclive.org/Cinch/CINCHdocumentation.pdf
Final prep:
Renaming (using guidelines; unique, for preservation)
Move to cataloging
9. Transporting from Connexion to CONTENTdm
Start in MARC
CDI feature – attach the digital object
Crosswalk from MARC to Qualified Dublin Core – digital object (w/ metadata) “drops” into CONTENTdm
Serials editing
Edit metadata of parent → approve → index
Edit metadata of children → approve → index
14. Cargo type #2: Digitized serials
Library digitizes many of its own publications (including serials)
Identify publications digitized
In house
Internet Archive
16. Oversized cargo
Large serial files are treated as monographs (serial structure not created)
Digitized files typically larger
Some documents are thousands of pages long
Loading times are too long, can freeze
Solution:
Load as single items (individual compound objects w/ full metadata)
Add extra metadata field (serial title) to collocate
Issue-identifying information (e.g., year) added to title
Special link to search results of all issues in serial title
18. Miscellaneous cargo
Serial structure (parent and children) also used for collection-level records
Collections:
Have collective research value, but may not be worth cataloging alone (i.e., ephemeral)
Share subject and/or agency information, other natural relationships
May or may not have a collective title
20. Anchoring serials during title changes
Digital materials tied to records
Throwing traditional title changes overboard
New records for title changes as usual in MARC
Single-record approach in digital collection
Add all OCLC numbers and all titles (in Other Title field) to all associated records
Same link on all associated MARC records
Exceptions: major changes to serial (e.g., title merged/separated, agency shift, change in content/focus)
21. WHY???
Our experience and feedback have shown us that it is very difficult for patrons to see the relationships and understand the records if we use a multiple-record approach to show serial title changes
When you think about your physical collection, the serials sit together seamlessly on the shelf regardless of title changes; by using a one-record approach, digital serials can “sit together” in the digital collection
23. Riding the waves: Challenges with CONTENTdm
Creating serial records is a multi-step and complex process
The index runs in the background…or does it?
Approving large files (the sea monsters of the collection)
Coordinating workflows with others
Deleting/replacing serial issues, or: whoops, we broke the structure
Turning monographs into serials
24. Sailing ahead: Our wish list
Smoother-running approval and indexing
Ability to handle secure and large files like other files
Better search engine: the white whale
Relevant results
Alphabetical/chronological order
How much cargo can this ship hold? Finding the limits of an “unlimited” collection (and patching the leaks)
26. Contact information
Eve Grünberg, State Documents Cataloger
eve.grunberg@ncdcr.gov
Francesca Francis, Assistant State Documents Cataloger
francesca.francis@ncdcr.gov
State government information is valuable and widely used by the citizens of North Carolina. The State Library of North Carolina is legally mandated by a General Statute to manage and preserve state publications in all formats for permanent public access and to maintain a permanent depository collection of all printed state documents. The State Library fulfills this responsibility through the North Carolina State Documents Depository System, established in 1987 by G.S. 125-11. The Depository System consists of the State Publications Clearinghouse, which is responsible for working with state agencies to identify publications and for collecting, processing, and distributing state agency publications, and the Depository Libraries, which are responsible for providing public access to state agency publications.
At the time, the State Publications Clearinghouse was not structured or staffed to accommodate born-digital information and support the systematic collection, preservation, and access to state information in digital formats. Users wanted electronic access to state agency publications. Depository librarians were unified in their desire to provide electronic access by having the State Library maintain a digital repository and distribute electronic publications by providing MARC catalog records with reference links to these publications. In 2006, the Digital Information Management Program (DIMP) was formed to focus on finding the best digital repository solution for digital state publications. A preference for the Qualified Dublin Core schema, combined with the Library’s lack of cataloging staff, meant the Library needed an automated metadata crosswalking tool to streamline the cataloging of these publications. Based on its research, the DIMP expected CONTENTdm to be a simple, inexpensive service for building digitized collections. Research had indicated that CONTENTdm would have an out-of-the-box public interface that allows for (but does not require) customization, thus minimizing the need for technical support. In addition, the team expected that CONTENTdm would allow for the storage of digital objects in its database without impact to retrieval performance, provide easily customizable metadata schemas, allow for metadata to be entered remotely, handle multi-part objects, allow for full-text search, and allow for the import/export of data. For preservation functionality and Qualified Dublin Core to MARC crosswalks, it seemed reasonable to use Digital Archive, a tool from the same company.
After the trial period from October 2006 to February 2007, which was successful and met our expectations, the State Library subscribed to a hosted level license with CONTENTdm.
We developed our metadata guidelines and workflows, and started adding digital objects to the State Publications Collection.
The OCLC Connexion Digital Import (CDI) feature allows us to start with a MARC bibliographic record in WorldCat and upload a file to our CONTENTdm collection; this creates a link in the WorldCat record to the file and crosswalks the MARC record to Qualified Dublin Core metadata in CONTENTdm. The crosswalk is controlled by OCLC, which means certain MARC fields are crosswalked into pre-designated QDC fields. For serials, CDI allows us to upload multiple digital files at the same time and create a single reference URL. It also creates a structure for the serial title, where all the “children” issues sit under one “parent” title. When you search for a specific serial title and open the record, you see the full metadata for the title, and the accompanying issues, each with its own metadata and full text, are listed in the sidebar. (Talk about the initial appearance of the serial structure.)
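The crosswalk step can be pictured as a simple field mapping. The sketch below is illustrative only: OCLC controls the real CDI crosswalk, and the tag-to-field table here (`CROSSWALK`) is a hypothetical subset chosen for the example, ignoring subfields and indicators. It shows only the idea that each mapped MARC field lands in a pre-designated QDC field while unmapped fields are dropped.

```python
# Hypothetical, illustrative subset of a MARC -> Qualified Dublin Core mapping.
# This is NOT OCLC's CDI crosswalk; subfields and indicators are ignored.
CROSSWALK: dict[str, str] = {
    "110": "Creator",
    "245": "Title",
    "246": "Title-Alternative",
    "260": "Publisher",
    "520": "Description",
    "650": "Subject",
}


def crosswalk(marc_fields: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Copy each mapped (tag, value) pair into its pre-designated QDC field."""
    qdc: dict[str, list[str]] = {}
    for tag, value in marc_fields:
        target = CROSSWALK.get(tag)
        if target is None:
            continue  # unmapped MARC fields are not carried over
        qdc.setdefault(target, []).append(value)
    return qdc
```

Feeding it `[("245", "Annual report"), ("650", "Libraries"), ("500", "Note")]` yields `{"Title": ["Annual report"], "Subject": ["Libraries"]}`; the 500 note is dropped because it has no target in this hypothetical table.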
The Library receives born-digital serials from state agencies via Dropbox and email, in PDF format or in formats that we convert to PDF. We also routinely search agency websites for documents we don’t already have. The Library developed the Capture, Ingest, & Checksum Tool (CINCH), which is “designed to locate targeted files on the internet and download them in a preservation-ready state. This includes maintaining the files’ integrity by virus checking and repeated checksumming, as well as enhancing the files’ context with metadata extraction.” Once the document files are received, they are processed for cataloging: they are checked into our database, and the original file name and checksum (a “thumbprint” for the specific file) are recorded. This metadata is captured using PinPoint Hash software and/or CINCH, depending on how the files were received. The files are then renamed according to file naming guidelines created in house to keep names consistent across all related serial items in archival storage. Once these files are processed, they are moved to folders accessible to cataloging.
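The check-in step can be sketched in a few lines. This is a minimal sketch, not CINCH or PinPoint Hash: it assumes SHA-256 as the checksum algorithm (the tools' actual algorithms are not stated here) and takes the new file name as an argument, since the in-house naming convention is not described. `check_in` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def checksum(path: Path, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest ("thumbprint") of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_in(path: Path, new_name: str) -> dict[str, str]:
    """Record the original name and checksum, then rename the file in place."""
    entry = {
        "original_name": path.name,
        "checksum": checksum(path),
        "new_name": new_name,
    }
    path.rename(path.with_name(new_name))
    return entry


def verify(path: Path, expected: str) -> bool:
    """Re-checksum a stored file and confirm it has not changed."""
    return checksum(path) == expected
```

Because renaming does not change a file's bytes, the recorded checksum still identifies the renamed file, which is what lets it serve as a “thumbprint” throughout the process.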
Since depository libraries have expressed the need for traditional MARC records as well as Dublin Core records, the cataloging process begins with creating a MARC record in OCLC Connexion. Once the MARC record is created, the CDI feature is used to attach the digital objects to the MARC record (a minimum of two files is required to create the serial record structure) and crosswalk the MARC metadata into the Qualified Dublin Core metadata fields in CONTENTdm, simultaneously dropping a record into CONTENTdm and creating a reference URL to the digital object from the MARC record. In the CONTENTdm Administration module, catalogers edit the metadata of the “parent” (main) record and approve it for indexing. After the initial index, the record is pulled into the CONTENTdm Project Client, where the serial structure can be viewed with “children” records (multiple attached items) branching from the parent. Here, the children records can be accessed and individually edited to reflect each issue’s unique metadata; the minimum requirements are set by the Library’s metadata guidelines. The serial is then sent back into the approval queue, approved, and indexed once more, creating the final product.
Visual slide for CDI feature. Talk about what happens when you have to wait for a second issue: if the title is of high importance, create a monographic record and later change it to a serial record; if it is of low importance, drop it into the waiting folder so outreach can try to get more issues.
This visual slide shows a structured serial record in the Project Client. Serial items are ordered with the oldest at the bottom and the newest at the top.
Once the serial structure is created, issues are added as they come to the Library. After the preparation process, Library technicians will add additional issues using the Project Client software by pulling up the parent record. Metadata is added using the guidelines, and the technicians send the issues individually to the approval queue. Once issues are examined by cataloging and approved, an index is run, and the parent record and any new issues are pulled into the Project Client together. Here, the items are attached to the parent record, edited to reflect their relationship to the title as above, and sent to the approval queue and final indexing.
Monographic compound structure (sort of similar to serial structure in display)
DIMP creates a list of titles and OCLC records for Internet Archive that corresponds to the materials we are sending. Internet Archive is able to retrieve additional information from our catalog using the Z39.50 protocol. Once Internet Archive has digitized our publications, DIMP uses an Internet Archive download tool (created by East Carolina University and developed by the Library) to retrieve the objects and metadata file. This and other metadata assessed by DIMP, such as the file paths and preservation metadata for each item, are used to create records for the digitized items. The digital objects are downloaded from Internet Archive to a hard drive and selected for processing in the Project Client. All of the metadata, for the parent plus any available items, is pulled in through the Project Client as a text file. Once the serial structure is created, new digitized items can simply be pulled in through the Project Client individually, as above, with new issues uploaded and attached. Often a long-running serial will have a mixture of digitized and born-digital issues attached to the parent record, in effect documenting the change from analog to digital formats (for example, Symphony stories).
In some cases, digitized (and particularly large born-digital) serial files must be treated structurally as monographs. Some state documents are thousands of pages long. Digitized files tend to be larger than born-digital documents in general, and such large files can cause problems both on the back end and in the public view, where load times become extremely long and/or the viewer freezes. These larger files are brought in as individual compound objects to prevent such problems. In this case, we make sure that some piece of issue-identifying information, such as the year, is visible in the results when a search is conducted. We also generate a serial/series title to tie the issues together under one unified title in a title search. For this purpose, we added an extra metadata field, “Serial Title,” for large-file serials (for example, North Carolina Public Documents, North Carolina Session Laws, etc.).
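The “Serial Title” field works as a grouping key. A minimal sketch, assuming each monographically structured item carries its item title, the extra Serial Title value, and a numeric year (the `Item` class and `collocate` function are hypothetical): grouping on Serial Title and sorting by year gathers the separate compound objects back into one run of issues, much as a search on that field would.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    title: str         # item title, with issue info (e.g., year) added
    serial_title: str  # value of the extra "Serial Title" field
    year: int          # hypothetical numeric issue year used for ordering


def collocate(items: list[Item]) -> dict[str, list[Item]]:
    """Group monographically structured issues by Serial Title, oldest first."""
    groups: defaultdict[str, list[Item]] = defaultdict(list)
    for item in items:
        groups[item.serial_title].append(item)
    for issues in groups.values():
        issues.sort(key=lambda issue: issue.year)
    return dict(groups)
```

Keeping the year in the item title as well matters for the public view: search results list items individually, so the year is what tells one large issue apart from the next.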
Visual slide showing large digital files structured as monographic items in the digital collection. The metadata circled in red helps connect those items. A special URL is created to collocate the items.
As with monographically structured serials, not everything translates easily from the traditional paper library to a digital collection. Another type of “serial” record we create is the collection-level record. We opt for collection-level records because we sometimes receive pamphlet or ephemeral materials from agencies. These can be monographic or serial materials that share subject information to the extent that subject access can be adequately provided with one or more subject headings. They do not merit item-level or minimal-level cataloging, but collectively they have research value. We take advantage of the natural relationships that exist among the items within the collection and capture those relationships in one bibliographic record. In CONTENTdm, collection-level records are treated as serial records with the standard serial structure (parent and children). We use the collective title for the parent, and individual titles are identified by issue. In this way, all items remain grouped together as a collection.
This visual slide shows an example for the previous slide: how a collection-level record is constructed in the digital collection.
Because digital materials are tied to their records, a serial’s (nearly inevitable) title changes present a particular challenge. In traditional cataloging practice, every time a serial changes its title or a different body becomes responsible for its creation, a new record must be created. In the digital world, and through the use of metadata, we’ve tried to be more flexible and considerate of customers’ preferences. When a serial title change occurs and we know it is the same serial published by the same agency, we don’t create a new digital record for it; we simply continue to add issues to the same parent record regardless of the title change. Ideally, the record we add issues to is the one for the first title of the serial; however, we tend to acquire newer issues first, and therefore initially create a record for, and attach all items to, what would be considered a successive record. We don’t completely abandon convention, though: we do create a new MARC record for each title change, following traditional cataloging rules and practices. We then link this new MARC record to the metadata record containing all issues in our digital collection. We also add the new OCLC number to the parent record and use the “Other Title,” “Title Replaced by,” and “Title Replaces” fields to record the title changes in this record.
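The single-record approach can be modeled as one parent that accumulates OCLC numbers and titles while every associated MARC record points to the same link. A minimal sketch, assuming hypothetical names (`ParentRecord`, `add_title_change`, `marc_links`) and treating “Title Replaced by” and “Title Replaces” simply as lists of later and earlier titles relative to the parent's title; it does not model the exceptions (merged or split titles, agency shifts), which still get separate records.

```python
from dataclasses import dataclass, field


@dataclass
class ParentRecord:
    """Hypothetical model of the single digital parent record for a serial."""
    title: str
    oclc_numbers: list[str] = field(default_factory=list)
    other_titles: list[str] = field(default_factory=list)
    title_replaces: list[str] = field(default_factory=list)     # earlier titles
    title_replaced_by: list[str] = field(default_factory=list)  # later titles
    issues: list[str] = field(default_factory=list)


def add_title_change(parent: ParentRecord, title: str, oclc_number: str, later: bool) -> None:
    """Fold a title change (cataloged as a new MARC record) into the one parent."""
    if oclc_number not in parent.oclc_numbers:
        parent.oclc_numbers.append(oclc_number)
    if title != parent.title and title not in parent.other_titles:
        parent.other_titles.append(title)
    related = parent.title_replaced_by if later else parent.title_replaces
    if title not in related:
        related.append(title)


def marc_links(parent: ParentRecord, url: str) -> dict[str, str]:
    """Every associated MARC record (by OCLC number) points at the same parent URL."""
    return {number: url for number in parent.oclc_numbers}
```

New issues keep being appended to `parent.issues` no matter which title they carry; the `later` flag covers the common case described above, where the parent was first created for a successive title and earlier titles arrive afterward.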
How a serial title change is treated – all titles on one record with additional title data
Creating serial records is a complex, multistep process. To complete a record, we have to approve the title, index, edit the parent and children, approve, and index again. This doesn’t sound like a major problem until you consider that indexing, which is supposed to run in the background of our collection and let us continue working unbothered, actually prevents us from approving and making other changes to the collection. This, in turn, keeps us from producing records more quickly and in greater numbers. We also tend to get locked out of approving records in the Administration module when one person is approving a large file or conducting a mass approval, another function that should run behind the scenes.
We are also having difficulty replacing and/or deleting single serial issues from a record. On occasion we have to replace an existing serial item with a new one (because of a broken file, etc.). What should be the easiest way, simply deleting the item from the serial structure, may cause the entire record to come apart. This is because our serial structure (parent and children) is created by attaching two serial issues to the record using CDI and crosswalking it to the digital collection. When we need to delete a child record from the parent record, we need to be sure that the child is not one of the original children that were used to create the initial serial structure. If it is, the serial structure will fall apart, and single issues of the serial will float around in the collection like “lost children.”
Sometimes a publication that appears to be a monographic item becomes a serial instead. This happens a lot with state publications, as many different types of reports are issued; for example, a report that was published once may be published again the next year or two years later.
When a title acquires this sort of frequency, we need to turn the single item into a serial, if only so that our patrons can find the issues easily. We recatalog the monographic record as a serial (or derive a new serial record) in MARC, reacquire the older file from digital storage, and create a serial structure for this title in the digital collection. This is the sort of flexibility that is necessary as the collection grows and changes.
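The deletion rule described above (never remove one of the original children that built the structure) can be expressed as a guard. A minimal sketch with a hypothetical `SerialObject` class: it models only the check, not CONTENTdm's internals, and the two-file minimum mirrors the CDI requirement for creating a serial structure.

```python
from dataclasses import dataclass, field


@dataclass
class SerialObject:
    """Hypothetical model of a compound serial: one parent plus child issues."""
    title: str
    founding_children: tuple[str, ...]  # the issues attached via CDI to build the structure
    children: list[str] = field(init=False)

    def __post_init__(self) -> None:
        if len(self.founding_children) < 2:
            raise ValueError("at least two files are needed to create a serial structure")
        self.children = list(self.founding_children)

    def add_issue(self, issue: str) -> None:
        self.children.append(issue)

    def delete_issue(self, issue: str) -> None:
        # Deleting a founding child is what makes the structure fall apart,
        # so refuse it instead of leaving the other issues as "lost children".
        if issue in self.founding_children:
            raise ValueError(f"{issue!r} helped create the structure; do not delete it")
        self.children.remove(issue)
```

Recording which children founded the structure at creation time is the design point: the check is cheap if that list is kept, and otherwise has to be reconstructed by hand before every deletion.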
Over the course of adopting and adapting to CONTENTdm, the Library has had its share of positive and negative experiences, from which many lessons have been learned. We have also used this opportunity to apply our collective knowledge and experience to the future, especially now that we have a better idea of what we are looking for (and not looking for) in a collection management system.
Aside from smoother-running approval and indexing processes, other items on our wish list include the ability to work better with locked/secure files. Some agencies are not comfortable providing their documents without some security feature on the PDFs. Problems range from the inability to create a thumbnail or extract full text to difficulties creating a compound object. Part of this concern can be addressed by working with these agencies to educate them on how the Library handles their documents, but we are also hoping that digital content management technology will become sophisticated enough to handle secure files.
Perhaps one of our more basic desires is to be able to perform a search that returns a completely alphabetical list of publications. In the early stages of the collection this was possible; however, several upgrades later, the function appears to be somewhat broken, with non-alphabetized results mixed in among seemingly alphabetized ones. If this makes finding documents difficult for us, it is most likely even more frustrating for end users, rendering the function somewhat useless.
The biggest issue for us has been the concept of an “unlimited” collection. Initially we were told that our collections could be unlimited in size, but we have found this to be only partially true. After we developed our state publications collection as a single collection, including some fairly large documents such as Session Laws, Public Documents, and House and Senate Journals, the branches managing this collection started running into issues in various parts of the process. Only recently did we learn that a single collection does indeed have a limit, quantified in number of pages. The solution seems to be to split this particular collection into several smaller collections, leaving room for currently running serials and other elements that might expand parts of the collection. We are also concerned that the collection continue to run seamlessly as a whole, especially on the public side.