The Digital Public Library of America: An Overview and Working with the National Collections. Martin R. Kalfatovic. NAGARA/CoSA Joint Conference. Santa Fe, New Mexico. 21 June 2012
1. The Digital Public Library of America
An Overview and Working with the National Collections
Martin R. Kalfatovic | Smithsonian Libraries
NAGARA/CoSA Joint Conference
21 June 2012 | Santa Fe, New Mexico
2. El Dorado County Public Library
South Lake Tahoe, CA
Travelin' Librarian
Flickr: http://www.flickr.com/photos/travelinlibrarian/176011378/
7. Libraries
Archives
Museums
Denver Art Museum &
Denver Public Library
8. The Digital Public Library of America (DPLA) will make the cultural
and scientific heritage of humanity available, free of charge, to all.
The DPLA’s primary focus is on making available materials from
the United States. By adhering to the fundamental principle of free
and universal access to knowledge, it will promote education in the
broadest sense of the term. That is, it will function as an online
library for students of all ages, from grades K-12 to postdoctoral
researchers and anyone seeking self-instruction; it will be a deep
resource for community colleges, vocational schools, colleges,
universities, and adult education programs; it will supplement the
services of public libraries in every corner of the country; and it will
satisfy other needs as well—the need for data related to
employment, for practical information of all kinds, and for
enrichment in the use of leisure.
Concept Note (March 2012)
Denver Art Museum &
Denver Public Library
10. “These libraries have improved the
general conversation of the
Americans, made the common
tradesmen and farmers as intelligent
as most gentlemen from other
countries, and perhaps have
contributed in some degree to the
stand so generally made throughout
the colonies in defense of their
privileges.”
Benjamin Franklin, Autobiography
19. Towards a Digital Public Library of America
The DPLA planning initiative grew out of an October 2010 meeting at the
Radcliffe Institute for Advanced Study, which brought together 40
representatives from foundations, research institutions, cultural
organizations, government, and libraries to discuss best approaches to
building a national digital library.
“...an open, distributed network of comprehensive online resources that
would draw on the nation’s living heritage from libraries, universities,
archives, and museums in order to educate, inform, and empower
everyone in the current and future generations.”
20. Towards a Digital Public Library of America
Code
Metadata
Content
Tools & Services
Community
Governance
> Formally launched in October 2011 at the DPLA Plenary at the National
Archives in Washington;
> $5 million in funding from the Sloan and Arcadia Foundations;
> Ambitious two-year goal with a launch of the DPLA in October 2013.
23. National Archives | Smithsonian Institution | Library of Congress
Modeling a Digital Collaboration for America’s National Collections
24. America’s National Collections
National Archives
The National Archives and Records Administration is the nation’s record
keeper, safeguarding and preserving the records of the United States
Government and ensuring that the American people can discover, use, and
learn from this documentary heritage. In addition to the Archives
facilities in the Washington, DC area, there are 14 regional Archives
facilities and 13 Presidential Libraries around the country. We have over
10 billion records in our holdings. Our holdings occupy over 4 million
cubic feet of space and 100 terabytes of electronic storage.
Smithsonian Institution
The Smithsonian Institution, the world’s largest museum and research
complex, includes 19 museums and galleries and the National Zoological
Park. The total number of artifacts, works of art and specimens in the
Smithsonian’s collections is estimated at 137 million. The bulk of this
material (more than 126 million specimens and artifacts) is part of the
National Museum of Natural History. In addition, the Smithsonian
Libraries maintains 1.9 million library volumes, including rare books;
and 89,000 cubic feet of material are held in archives.
Library of Congress
Today's Library of Congress is an unparalleled world resource. The
collection of more than 144 million items includes more than 33 million
cataloged books and other print materials in 460 languages; more than 63
million manuscripts; the largest rare book collection in North America;
and the world's largest collection of legal materials, films, maps, sheet
music and sound recordings. By providing these materials online, those
who may never come to Washington can gain access to the treasures of the
nation’s library.
25. The importance of participating in the
DPLA, and especially of showing the
ability of three of the nation's public
institutions to collaborate in making
their collections accessible, drove the
project. For the Beta Sprint, we
selected a small set of records that
would show some of the breadth of
our collections.
Abraham Lincoln’s Hat
Smithsonian Institution
26. Each of our individual collections
uniquely contributes to America's
National Collections. In the short
time of the Beta Sprint, it was not
possible to build a fully working
implementation of an interface to
the disparate collections of the
Smithsonian Institution, the
Library of Congress, and the
National Archives.
Proof of Union Service
National Archives
27. For the purpose of the DPLA Beta Sprint, staff from the
Smithsonian Institution, the Library of Congress, and the
National Archives modeled a faceted search aggregator
using the Smithsonian's Enterprise Digital Asset Network
(EDAN) as a starting point.
Yankee volunteers marching into Dixie
Library of Congress
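A faceted search aggregator of the kind described on slide 27 can be pictured with a minimal Python sketch. This is not EDAN's implementation: the record fields ("source", "type") and their values are hypothetical placeholders chosen for illustration, and the helper names are invented here.

```python
from collections import Counter

# Hypothetical aggregated records. Field names and "type" values are
# illustrative assumptions, not the schema used by EDAN or the Beta Sprint.
records = [
    {"title": "Abraham Lincoln's Hat", "source": "SI", "type": "artifact"},
    {"title": "Proof of Union Service", "source": "NARA", "type": "document"},
    {"title": "Yankee volunteers marching into Dixie", "source": "LC", "type": "sheet music"},
]

def facet_counts(items, field):
    """Count how many records carry each value of the given facet field."""
    return Counter(item[field] for item in items if field in item)

def filter_by(items, **facets):
    """Keep only the records whose fields match every requested facet value."""
    return [
        item for item in items
        if all(item.get(name) == value for name, value in facets.items())
    ]

source_facet = facet_counts(records, "source")
si_only = filter_by(records, source="SI")
```

Counting values per field gives the facet sidebar, and filtering by a chosen value narrows the result list; a production aggregator would delegate both steps to its search index.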
28. As part of this proof of concept, a selection of records, with associated digital
assets, was drawn from the collections of the Library of Congress and the
National Archives. Only a small set from the Library of Congress and National
Archives was selected to test data mapping.
The eleven records from NARA represent a
sampling of documents from the Online Public
Access (OPA) system, including 19th-century
photographs, patents, drawings, and
correspondence.
The ten records from the Library of
Congress were drawn from the Performing
Arts Encyclopedia, including music
manuscripts of composer Johannes Brahms
and several pieces of 19th-century sheet
music.
Example searches from LC and NARA websites
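The data mapping tested on slide 28 can be sketched in Python as a per-source field map that renames each institution's fields into one shared record shape. Every field name below is a hypothetical assumption; the slides do not give the actual LC, NARA, or EDAN schemas.

```python
# Hypothetical field maps: common field name -> field name in the source record.
# None of these names come from the real LC or NARA systems.
FIELD_MAPS = {
    "LC": {"title": "title", "creator": "name", "date": "dateIssued"},
    "NARA": {"title": "Title", "creator": "Creator", "date": "ProductionDate"},
}

def to_common(record, source):
    """Rename a source record's fields into the shared schema.

    Fields missing from the source record are set to None so that every
    mapped record has the same keys.
    """
    mapping = FIELD_MAPS[source]
    common = {"source": source}
    for target, original in mapping.items():
        common[target] = record.get(original)
    return common

nara_example = to_common(
    {"Title": "Patent drawing", "ProductionDate": "1865"}, "NARA"
)
```

Keeping the maps as data rather than code means a new contributor can be added by writing one more dictionary entry.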
29. These joined the 7.44+ million records with 570,000+ images, video and sound
files, electronic journals and other resources from the Smithsonian's libraries,
archives & museums in a test site for the Beta Sprint.
LIBRARIES: Smithsonian Libraries collections
Example search retrieving digital books and other library collections from
the Smithsonian’s Collections Search Center.
ARCHIVES: Smithsonian archival collections
Photographs, papers, online finding aids and other archival collections are
available through the Smithsonian’s Collections Search Center.
MUSEUMS: Smithsonian museum collections
Scientific specimens as well as artworks and historical artifacts are
findable through the Smithsonian’s Collections Search Center.
30. DPLA Conceptual High-Level Architecture for a Common Aggregated Search
Index and Service Layer, Including an Image Delivery Service
Diagram layers, as listed on the slide:
The World Wide Web
Web & Mobile Applications / User Interface(s)
Metadata Delivery Service | Tag Service | Image Delivery Service (IDS)
Search Index / Metadata Repository
Data Transformation
Library, Archive and Museum Systems (LAMS) Data Sets: LoC | NARA | SI
31. DPLA Technical High-Level Architecture for a Common Aggregated Search
Index and Service Layer, Including an Image Delivery Service
Diagram components, as listed on the slide:
End Users
Web or Mobile Application / User Front-End (cloud-hosted)
MDS, IDS Handler
Metadata Delivery Service (MDS)
Image Delivery Service (IDS)
Firewall
MDS Back-end + IDS Back-end
Master Index: Request handlers, Response handlers, Update handlers
Master Lucene Index / Metadata Repository
Image Store(s)
Pre-Processing: Raw Index, jetty (admin for Solr), update handlers
Ingest / pre-processing repository
Data Ingests: XML Data Sources (SI, NARA, LoC, ...)
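Since slide 31 places a Lucene index administered through Solr behind the Metadata Delivery Service, a front-end request for faceted results might resemble the Python sketch below. It only builds a query URL using standard Solr parameters (q, rows, wt, facet, facet.field). The base URL, the core name "dpla_test", and the facet field names are assumptions, and nothing here reflects how the Beta Sprint site actually issued queries.

```python
from urllib.parse import urlencode

def build_facet_query(base_url, text, facet_fields, rows=10):
    """Build a Solr select URL that asks for results plus facet counts.

    base_url is a hypothetical Solr endpoint; facet_fields are assumed
    index field names, not ones documented in the slides.
    """
    params = [("q", text), ("rows", rows), ("wt", "json"), ("facet", "true")]
    # Solr accepts the facet.field parameter once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"

url = build_facet_query(
    "http://localhost:8983/solr/dpla_test",  # hypothetical host and core
    "Lincoln",
    ["source", "type"],
)
```

Building the URL separately from sending it keeps the sketch runnable without a live index and makes the facet parameters easy to inspect.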
37. Acknowledgements: Morgan Cundiff (LC) | David Ferriero (NARA) | Nancy Gwinn (SI) | Matthew
Jenkins (SI) | Martin Kalfatovic (SI) | Deanna Marcum (LC) | Ruth Scovill (LC) | Nate Trail (LC) |
Günter Waibel (SI) | Ching-Hsien Wang (SI) | Pamela Wright (NARA)
38. Next steps
Governance: Create a DPLA Board
Content: Convene content providers
DPLA Hack-a-thon (April 2012)
Technical: Tech Dev Team and Beta Sprint, pt. 2
Community: engagement of more audiences
39. Where to learn more about DPLA
http://dp.la
http://dp.la/wiki
http://dp.la/blog