Wake Forest University has begun contributing digital collections to the Digital Public Library of America (DPLA) via the North Carolina Digital Heritage Center Service Hub. Each month, the North Carolina Digital Heritage Center aggregates OAI-PMH feeds of digital collections from contributing North Carolina institutions, and the DPLA in turn harvests this aggregation. Wake Forest is using participation in the DPLA as an opportunity to assess and clean up its metadata. Borrowing the principle of iterative and incremental development from the agile software development community, each monthly harvest is treated as a four-week development cycle during which small but meaningful improvements to metadata are identified and implemented (e.g. revising rights statements or populating the dc.date.created field). In contrast to a model that delivers a finished product only at the end of a project timeline, this approach allows the organization to immediately reap the benefits of participation in the DPLA, such as increased referrals to digital materials from the DPLA site and API. A presentation at the Coalition for Networked Information 2014 Spring Membership Meeting.
21. PARTICIPATION IN THE DPLA OFFERS US
Increased traffic
Opportunity to continually evaluate & improve our metadata
Ability to market our digital collections as data
Ability to engage our public in new ways
DPLA API workshop
DPLA Hackfest
23. WORKS MENTIONED
SLIDE 2: Dan Cohen. “The Digital Public Library of America: Coming Together.” http://www.dancohen.org/2012/10/16/the-digital-public-library-of-america-coming-together
SLIDE 4: Ed Summers. “The DPLA as a generative platform.” http://inkdroid.org/journal/2011/05/25/the-dpla-as-a-generative-platform/
As the Digital Initiatives Librarian at Wake Forest University, I oversee the digitization of special collections and liaise with faculty interested in pursuing digital humanities research and pedagogy. To me, these two roles bridge two ends of the scholarship lifecycle. On one end, I provide access to raw materials for humanistic inquiry in the form of digitized special collections. On the other end, I consult with faculty who are incorporating digital humanities components into their teaching and research. With that combination of responsibilities, you can imagine that one reason the DPLA is so attractive is that it helps me to market digitized special collections as potential digital humanities corpora. I have followed the DPLA’s development with interest since I entered library school, and I attended the DPLA Plenary in Washington, DC in October 2011. When I began at Wake, I was thrilled to learn that the North Carolina Digital Heritage Center would become a service hub and that Wake Forest would have an opportunity to contribute collections to the DPLA.
Fairly early in the process of envisioning the DPLA, Dan Cohen used the metaphor of a pond feeding a lake feeding an ocean to characterize the planned technical infrastructure for the DPLA. In a blog post he wrote, “one can think of this initial set of materials […] as content from local ponds—small libraries, archives, museums, and historic sites—sent through streams to lakes—state digital libraries […] —and then through rivers to the ocean—the DPLA” (http://www.dancohen.org/2012/10/16/the-digital-public-library-of-america-coming-together). And not to extend the metaphor too far, but this metaphor has certainly influenced Wake Forest’s understanding of our niche within a larger ecology.
I’m shamelessly using the ponds -> lakes -> oceans metaphor to structure my presentation, and I’ll spend a little bit of time on the ocean and the lake, but for the most part I’ll concentrate on Wake Forest’s perspective as a pond or contributing institution.
Many of you probably recognize this fabulous doodle from the DPLA plenary in October 2011, but the purpose statement of the DPLA that most resonates with me was actually made a little bit earlier in May 2011 by Ed Summers, who attended an expert working group meeting of the DPLA. Ed called attention to the phrase “generative platform for unspecified future uses” in the description of that working group and foregrounded it in his reflection about what the DPLA could be. By working with the grain of the Web and deep linking between content and from content to contributing institutions, the DPLA facilitates deep research and knowledge creation. But despite this serious purpose, one of the primary use qualities of the DPLA, whether browsing dp.la or interacting with one of the apps that has been built on top of its API, is sheer fun. We at Wake Forest want to expose our materials to the world, and we want to do it in a way that aligns with the Web and with this concept of a generative platform that enables people to build things.
Traveling backwards from the ocean to the lake, Nick Graham and Lisa Gregory of the North Carolina Digital Heritage Center shared information about their program with me, and I’ll speak briefly to their role as a DPLA service hub in order to contextualize Wake Forest’s participation as a contributing institution.
The North Carolina Digital Heritage Center is an ongoing LSTA-funded program that provides digitization and digital publishing services to cultural heritage institutions across the state of North Carolina. When an institution partners with the Digital Heritage Center to digitize select materials, those materials are also published on the Digital Heritage Center’s site, digitalnc.org, whose home page is pictured here. Because of its strong existing partnerships with academic libraries, public libraries, archives, museums, and historical societies in North Carolina, the Digital Heritage Center was a natural fit as a service hub for the DPLA. In Fall 2013, the North Carolina Digital Heritage Center became a service hub, a member of the second cohort of service hubs following the DPLA’s launch in April 2013. Last October, the Digital Heritage Center invited digital collection managers to a full-day informational meeting in Greensboro, NC with Emily Gore and Amy Rudersdorf of the DPLA so that we could learn more about the DPLA metadata model and the ingest process and ask questions about the peculiarities of our digital collections. We at Wake Forest were really lucky to be in North Carolina, where the relationship infrastructure is already in place. The technical infrastructure seems like the hard part, but it’s really not; the relationship infrastructure is the hard part, and it’s already well established in North Carolina.
North Carolina cultural heritage institutions contribute to the DPLA via the North Carolina Digital Heritage Center service hub in one of two ways. The first is by having their collections appear in the Digital Heritage Center’s site, digitalnc.org. The Digital Heritage Center contributes the digital collections that appear in digitalnc.org to the DPLA. Here you can see that the Digital Heritage Center’s partners are located from the mountains to the sea, as we like to say in North Carolina, and they run the gamut of institution type, with 68 academic libraries, 52 public libraries or private libraries and archives, and 24 cultural heritage organizations.
The second way that institutions can contribute to the DPLA via the North Carolina Digital Heritage Center service hub is to contribute collections that appear on the institution’s own site, rather than on digitalnc.org. Currently 12 institutions are contributing collections from their own sites via OAI-PMH feeds: the Digital Heritage Center itself, the State Library and State Archives of North Carolina, 8 academic libraries, and 2 public libraries. Of these, most institutions are using CONTENTdm, and Wake Forest is the only institution using DSpace to manage its digital special collections. To a certain degree, we’ve had to blindly feel our way forward, and we’re proud of the steps we’ve made so far.
Each month, the North Carolina Digital Heritage Center provides the DPLA with a single stream of metadata, represented in MODS. This metadata stream is created from multiple feeds exposed by institutions around North Carolina via OAI-PMH.
To incorporate an institution into that single stream, they ask for the institution’s OAI-PMH URL and take a look at its metadata to make sure all of the elements the DPLA requires are included. Right now, they only take Dublin Core metadata via OAI-PMH, although they’re able to expand that. They then create an XSLT style sheet per institution that will transform those Dublin Core elements to the standardized MODS elements that they provide to the DPLA.
Once the style sheet is prepared, they use aggregation software called REPOX to essentially take all of those incoming feeds and provide a single MODS stream. REPOX manages that entire process: with it they are able to set up each data provider, add each of the sets the data provider wishes to include, and assign the style sheet created for that institution. Then they do a test ingest from the data provider and take a look at the MODS output to make sure all of the elements are correctly mapped. If the elements look good, then they’re set until the next monthly harvest.
About a week before the DPLA wants to harvest the single stream, they go through and re-ingest each data provider’s sets to make sure they have the most up-to-date metadata. They also add any new collections, as requested.
Then they simply wait until the DPLA finishes harvesting and things go online, where they spot check the results.
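To make the crosswalk step concrete, here is a minimal Python sketch of checking a Dublin Core record for required elements and mapping it to MODS. It is not the Digital Heritage Center’s XSLT style sheet or its REPOX configuration: it uses Python’s standard library in place of XSLT, and the sample OAI-PMH response, the REQUIRED list, and the particular Dublin Core-to-MODS mapping are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
MODS = "http://www.loc.gov/mods/v3"

# Assumption: an illustrative list, not the DPLA's actual requirements.
REQUIRED = ("title", "rights", "identifier")

# Assumption: a made-up ListRecords response with two records.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Campus photograph</dc:title>
        <dc:date>1923</dc:date>
        <dc:rights>Public domain</dc:rights>
        <dc:identifier>http://repository.example.edu/handle/1/1</dc:identifier>
      </oai_dc:dc>
    </metadata></record>
    <record><metadata>
      <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                 xmlns:dc="http://purl.org/dc/elements/1.1/">
        <dc:title>Untitled letter</dc:title>
        <dc:identifier>http://repository.example.edu/handle/1/2</dc:identifier>
      </oai_dc:dc>
    </metadata></record>
  </ListRecords>
</OAI-PMH>"""


def dc_fields(record):
    """Collect the non-empty Dublin Core values of one OAI-PMH record."""
    fields = {}
    for element in record.findall(".//oai_dc:dc/*", NS):
        name = element.tag.split("}", 1)[1]  # drop the namespace part
        value = (element.text or "").strip()
        if value:
            fields.setdefault(name, []).append(value)
    return fields


def missing_required(fields):
    """Return the required element names that the record lacks."""
    return [name for name in REQUIRED if name not in fields]


def to_mods(fields):
    """Map a few Dublin Core fields to MODS (illustrative mapping only)."""
    mods = ET.Element(f"{{{MODS}}}mods")
    for title in fields.get("title", []):
        title_info = ET.SubElement(mods, f"{{{MODS}}}titleInfo")
        ET.SubElement(title_info, f"{{{MODS}}}title").text = title
    for date in fields.get("date", []):
        origin = ET.SubElement(mods, f"{{{MODS}}}originInfo")
        ET.SubElement(origin, f"{{{MODS}}}dateCreated").text = date
    for rights in fields.get("rights", []):
        ET.SubElement(mods, f"{{{MODS}}}accessCondition").text = rights
    for url in fields.get("identifier", []):
        location = ET.SubElement(mods, f"{{{MODS}}}location")
        ET.SubElement(location, f"{{{MODS}}}url").text = url
    return mods


records = ET.fromstring(SAMPLE_RESPONSE).findall(".//oai:record", NS)
for record in records:
    fields = dc_fields(record)
    gaps = missing_required(fields)
    if gaps:
        print("needs cleanup; missing:", ", ".join(gaps))
    else:
        print(ET.tostring(to_mods(fields), encoding="unicode"))
```

In this sketch the second sample record is flagged because it has no rights statement, which is the kind of gap a contributing institution would fix before the next monthly harvest.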
Digitalnc.org gets between 100 and 200 visits and around 300 pageviews per month from dp.la, and the number has gone up each month they’ve participated. Additionally, the DPLA is constantly looking for ways to increase traffic to partner websites by promoting content through social media. I spoke with Lisa Gregory of the North Carolina Digital Heritage Center over email, and she said, “The bigger benefits we’ve seen so far from participation is the experience we’ve gained in aggregating metadata on such a broad scale and seeing how it performs, as well as being a player in the conversation about data aggregation with others in the DPLA.”
Having shared my understanding of the larger DPLA ecosystem in which Wake Forest is participating, now I’ll focus on our experience so far as a contributing institution.
From the start, our tack has been iterative and incremental by design. This approach fits in well with the DPLA’s design process, which also drew inspiration from agile development methodologies. Most memorable for me was the call for participants in the DPLA beta sprint, which drew submissions that thoughtfully engaged all sorts of design problems, from aggregating metadata to preserving an item’s context in its original collection to interfaces that facilitated serendipitous discovery.
I’ll address a series of problems and discuss how we broke them down into priorities that we addressed in successive development cycles.
For us, agile development is a loose framework for how we approached contributing to the DPLA, not necessarily a series of strict 4-week sprints. What was valuable was defining a single priority to focus on, at the expense of all other possible priorities, during each of our loosely defined development cycles.
DSpace instance
Hierarchical community, collection, item
DigitalForsyth, ad hoc collections
Our digital special collections, IR, and other digital content related to campus events are all managed together in one DSpace bucket
Different collections have different access controls; not all are open to the general public
We couldn’t send one OAI-PMH feed to the Digital Heritage Center; we had to send one OAI-PMH feed per collection
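One way to picture “one OAI-PMH feed per collection” is as a list of ListRecords requests, each restricted to a single set. The Python sketch below builds those request URLs using the standard OAI-PMH arguments (verb, metadataPrefix, set). The base URL and the set identifiers are hypothetical placeholders, not Wake Forest’s actual endpoint or collections.

```python
from urllib.parse import urlencode

# Assumption: placeholder endpoint, not the real repository address.
BASE_URL = "https://repository.example.edu/oai/request"

# Assumption: placeholder set identifiers for the publicly accessible
# collections; restricted collections are simply left off the list.
PUBLIC_SETS = ["col_10000_1", "col_10000_2"]


def list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request limited to one set."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return f"{base_url}?{query}"


# One feed URL per collection, ready to hand to an aggregator.
feeds = {spec: list_records_url(BASE_URL, spec) for spec in PUBLIC_SETS}
for spec, url in feeds.items():
    print(spec, url)
```

Keeping the list of public sets explicit means a collection with restricted access is never exposed by accident; it only enters the aggregation when someone adds its set to the list.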