Presentation from Digital Curator Dave Thompson on systems and processes for digitisation at the Wellcome Library for our second Digitisation Open Day.
1. Systems, processes & how we stop the wheels falling off
Digitisation Open Day, September 2013
Dave Thompson
Digital Curator, Wellcome Library
2. Digitisation – process overview
• Plan project
• Catalogue
• Identify material
• Identify resources
• Plan process
• Review as you go
• Digitise/process
• Deliver
• Refine processes
• Document/share (recurs at several stages)
• Funding, staff, equipment, IT, storage, data management planning
• Open source player
3. Meanwhile, at the coal face…
Administrative metadata + Descriptive metadata + Digitised images + Ingestion into repository + Creation of METS = Access
4. Thinking conceptually… OAIS
In OAIS speak, the digitised images together with their metadata form a SIP: an aggregation of the object & its metadata in a form that is acceptable to the repository, e.g. JPEG2000 images and MARC XML.
The Open Archival Information System (OAIS) Reference Model is an ISO standard (ISO 14721) that describes a conceptual model of an archive. It sets out the activities of an archive & the processes involved in submission, storage & access. It was developed by the Consultative Committee for Space Data Systems (CCSDS), whose members include NASA, after space data was ‘lost’ through obsolescence.
5. Thinking conceptually… OAIS
In OAIS speak, the object & its metadata as stored in the repository form an AIP.
OAIS defines three information packages:
1. Submission Information Package (SIP) = what is ingested
2. Archival Information Package (AIP) = what is stored
3. Dissemination Information Package (DIP) = what is made available
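The relationship between the three packages can be sketched as plain data structures. This is a toy model under stated assumptions: ingest adds administrative metadata to a SIP to produce an AIP, and dissemination filters an AIP down to the files we are permitted to show. The names `SIP`, `AIP`, `DIP`, `ingest`, `disseminate` and the `permitted` set are hypothetical illustrations, not part of OAIS or of SDB.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what is ingested."""
    object_id: str
    files: dict[str, bytes]  # filename -> content, e.g. JPEG2000 images
    dmd: str                 # descriptive metadata, e.g. MARC XML


@dataclass
class AIP:
    """Archival Information Package: what is stored."""
    object_id: str
    files: dict[str, bytes]
    dmd: str
    amd: dict[str, dict] = field(default_factory=dict)  # filename -> technical metadata


@dataclass
class DIP:
    """Dissemination Information Package: what is made available."""
    object_id: str
    files: dict[str, bytes]
    dmd: str


def ingest(sip: SIP) -> AIP:
    # In this model the repository, not the submitter, generates AMD on ingest.
    amd = {
        name: {"size": len(data), "sha256": hashlib.sha256(data).hexdigest()}
        for name, data in sip.files.items()
    }
    return AIP(sip.object_id, dict(sip.files), sip.dmd, amd)


def disseminate(aip: AIP, permitted: set[str]) -> DIP:
    # Only the parts we are able to make available go into the DIP.
    files = {name: data for name, data in aip.files.items() if name in permitted}
    return DIP(aip.object_id, files, aip.dmd)
```

The point of the sketch is that the three packages are views of the same object at different stages, not three separate copies of the collection.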
6. Thinking conceptually… OAIS
In OAIS speak, the parts of the object & its metadata that we are able to make available form a DIP.
As defined in the Digital Preservation Coalition (DPC) handbook, access is assumed to mean continued, ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy and functionality deemed to be essential for the purposes the digital material was created and/or acquired for.
7. Let's tackle the basics… processing
Administrative metadata (AMD): a technical description of the files. It is created automatically by Safety Deposit Box (SDB) on ingest into our repository, and is used by the player for display purposes.
Administrative metadata is typically created automatically and may include:
• File size
• Image height × width
• File format
• Checksum
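Automatic AMD creation can be illustrated for a JPEG2000 master. This is a minimal sketch, not how SDB does it: it recognises the JP2 signature box, records size and a checksum (SHA-256 is an assumption; repositories also use MD5 or SHA-1), and reads height and width from the image header (`ihdr`) box. As a simplification it finds `ihdr` by searching for its type code rather than walking the box structure properly.

```python
import hashlib
import struct

# Every JP2 file starts with this 12-byte signature box (ISO/IEC 15444-1).
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"


def extract_amd(data: bytes) -> dict:
    """Derive simple administrative metadata from a file's bytes."""
    is_jp2 = data.startswith(JP2_SIGNATURE)
    amd = {
        "file_size": len(data),
        "file_format": "JPEG2000 (JP2)" if is_jp2 else "unknown",
        # Illustrative choice of algorithm; a repository may use another.
        "checksum_sha256": hashlib.sha256(data).hexdigest(),
    }
    if is_jp2:
        # Simplification: locate the 'ihdr' box by its type code. Its payload
        # begins with HEIGHT then WIDTH, each a 4-byte big-endian integer.
        pos = data.find(b"ihdr")
        if pos != -1 and len(data) >= pos + 12:
            amd["image_height"], amd["image_width"] = struct.unpack(
                ">II", data[pos + 4 : pos + 12]
            )
    return amd
```

Because everything here is derived from the file itself, no human input is needed, which is why this kind of metadata lends itself to automatic creation on ingest.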
8. Let's tackle the basics… processing
Descriptive metadata (DMD): MARC records, converted to MARC XML, which become MODS in the METS file. Material must be catalogued before we can store it & make it available.
Descriptive metadata is typically human generated, AKA cataloguing metadata: ISAD(G) for archival material, MARC for bibliographic material.
MODS = Metadata Object Description Schema.
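The MARC XML to MODS step can be illustrated with a deliberately tiny mapping of two fields: 245 $a (title) and 100 $a (personal name). This is a hand-written sketch using Python's standard `xml.etree.ElementTree`, not the Library of Congress MARC-to-MODS stylesheet a production workflow would more likely use, and the sample record is invented.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def subfield(record: ET.Element, tag: str, code: str):
    """Return the text of the first subfield `code` in MARC datafield `tag`, or None."""
    path = f"{{{MARC_NS}}}datafield[@tag='{tag}']/{{{MARC_NS}}}subfield[@code='{code}']"
    el = record.find(path)
    return el.text if el is not None else None


def marcxml_to_mods(marc_xml: str) -> ET.Element:
    """Map MARC 245 $a and 100 $a into a minimal MODS record."""
    record = ET.fromstring(marc_xml)
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title = subfield(record, "245", "a")
    if title:
        title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
        ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    author = subfield(record, "100", "a")
    if author:
        name = ET.SubElement(mods, f"{{{MODS_NS}}}name", type="personal")
        ET.SubElement(name, f"{{{MODS_NS}}}namePart").text = author
    return mods


# Invented sample record, for illustration only.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Smith, John</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">A treatise on fevers</subfield></datafield>
</record>"""

print(ET.tostring(marcxml_to_mods(SAMPLE), encoding="unicode"))
```

A real mapping covers many more fields and indicators; the sketch only shows that the conversion is a field-by-field transformation of the catalogue record.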
9. Let's tackle the basics… processing
Safety Deposit Box (SDB) is where we store digital content. Ingest is initiated automatically by Goobi. SDB is a database that associates each object with its DMD & AMD, and is the source for dissemination.
Digital repositories offer a convenient infrastructure through which to store, manage, re-use and curate digital materials. They are used by a variety of communities, may carry out many different functions, and can take many forms.
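The idea of a repository that associates each object with its DMD and AMD, and acts as the source for dissemination, can be sketched as an in-memory store. Everything here (`Repository`, `store`, `retrieve`, `FixityError`, SHA-256 as the fixity algorithm) is hypothetical and says nothing about how SDB is implemented; it only pictures the association plus a checksum check before content is handed out.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Record:
    content: bytes
    dmd: str   # descriptive metadata (e.g. MODS)
    amd: dict  # administrative metadata generated on ingest


class FixityError(Exception):
    """Raised when stored content no longer matches its recorded checksum."""


class Repository:
    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def store(self, object_id: str, content: bytes, dmd: str) -> dict:
        # Generate AMD at ingest and keep it alongside the object and its DMD.
        amd = {"size": len(content), "sha256": hashlib.sha256(content).hexdigest()}
        self._records[object_id] = Record(content, dmd, amd)
        return amd

    def retrieve(self, object_id: str) -> Record:
        record = self._records[object_id]
        # Verify fixity before the content is used for dissemination.
        if hashlib.sha256(record.content).hexdigest() != record.amd["sha256"]:
            raise FixityError(f"checksum mismatch for {object_id}")
        return record
```

Keeping the checksum from ingest and re-checking it on access is one simple way a preservation-oriented store can detect silent corruption.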
10. Let's tackle the basics… processing
METS: the structure & pagination metadata is created by humans, while the METS file itself is built automatically.
A Metadata Encoding & Transmission Standard (METS) file aggregates DMD & AMD (a file list with structure) and provides a mechanism for managed access. A METS file allows metadata from different systems to be combined into a portable format.
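How a METS file aggregates DMD, AMD, a file list and a structure can be sketched with the standard library. This is a minimal, hypothetical builder, not the Wellcome or Goobi export: the `tech` element is a placeholder where a real file would carry a technical metadata schema such as MIX, and the IDs, file group name and URLs are assumptions.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(mods: ET.Element, pages: list[tuple[str, str, dict]]) -> ET.Element:
    """Combine DMD, AMD and an ordered page list into one METS document.

    Each page is (file URL, page label as in the original, technical metadata).
    """
    root = ET.Element(q("mets"))

    # dmdSec: descriptive metadata, e.g. MODS derived from the catalogue record.
    dmd_wrap = ET.SubElement(ET.SubElement(root, q("dmdSec"), ID="DMD1"), q("mdWrap"), MDTYPE="MODS")
    ET.SubElement(dmd_wrap, q("xmlData")).append(mods)

    # Create the remaining sections up front so they appear in METS schema order.
    amd_sec = ET.SubElement(root, q("amdSec"))
    file_grp = ET.SubElement(ET.SubElement(root, q("fileSec")), q("fileGrp"), USE="MASTER")
    struct_map = ET.SubElement(root, q("structMap"), TYPE="PHYSICAL")
    sequence = ET.SubElement(struct_map, q("div"), TYPE="physSequence", DMDID="DMD1")

    for i, (url, label, tech) in enumerate(pages, start=1):
        # amdSec/techMD: technical metadata ('tech' is a placeholder, not a real schema).
        tech_wrap = ET.SubElement(ET.SubElement(amd_sec, q("techMD"), ID=f"AMD{i}"),
                                  q("mdWrap"), MDTYPE="OTHER", OTHERMDTYPE="PLACEHOLDER")
        ET.SubElement(ET.SubElement(tech_wrap, q("xmlData")), "tech",
                      {k: str(v) for k, v in tech.items()})

        # fileSec: the file list, each file linked to its AMD via ADMID.
        file_el = ET.SubElement(file_grp, q("file"), ID=f"FILE{i}", MIMETYPE="image/jp2", ADMID=f"AMD{i}")
        ET.SubElement(file_el, q("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": url})

        # structMap: one page div per file, in order, labelled with the original pagination.
        page = ET.SubElement(sequence, q("div"), TYPE="page", ORDER=str(i), ORDERLABEL=label)
        ET.SubElement(page, q("fptr"), FILEID=f"FILE{i}")
    return root
```

The human-supplied parts are the page labels and structure; everything else is assembled from metadata that already exists in other systems, which is why the file itself can be built automatically.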
11. The formats
• JPEG2000 is our master image format.
• We create dissemination images (JPEG) on the fly.
• We also use PDF, MPEG2 & MP3.
12. The systems
• Goobi. Manages & tracks the production of digitised content.
• SDB. Repository that stores digitised content along with its DMD & AMD.
• Player. User interface to view digitised material.
13. How Goobi works – the basics
• Project based.
• Workflow driven.
• Users accept ‘tasks’.
• A user's account determines which projects they belong to & which roles they have.
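A project-based, workflow-driven system in which users accept tasks can be sketched as follows. This is not Goobi's API or data model: the classes, the role names and the rule that steps must be completed strictly in order are assumptions chosen to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    name: str
    role: str                        # role a user needs to accept this task
    assignee: Optional[str] = None
    done: bool = False


@dataclass
class User:
    name: str
    projects: set[str]
    roles: set[str]


@dataclass
class Workflow:
    project: str
    tasks: list[Task] = field(default_factory=list)

    def next_task(self) -> Optional[Task]:
        """The first unfinished task; in this sketch steps run strictly in order."""
        return next((t for t in self.tasks if not t.done), None)

    def accept(self, user: User) -> Task:
        task = self.next_task()
        if task is None:
            raise ValueError("workflow complete")
        # A user may only accept tasks in their own projects that match a role they hold.
        if self.project not in user.projects or task.role not in user.roles:
            raise PermissionError(f"{user.name} cannot accept '{task.name}'")
        task.assignee = user.name
        return task

    def complete(self, user: User) -> None:
        task = self.next_task()
        if task is None or task.assignee != user.name:
            raise PermissionError(f"{user.name} has no open task to complete")
        task.done = True
```

Encoding the order of steps in the workflow, rather than relying on people to remember it, is what makes progress trackable: at any moment `next_task()` says where each item has got to.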
15. How Goobi works – METS editing
Screen annotations: pagination as per original; descriptive metadata; structure.
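Recording "pagination as per original" usually means giving each image a label matching the page number printed on the item, which often differs from its position in the sequence (unnumbered covers, roman-numbered preliminaries). A minimal sketch, assuming a hypothetical run-based description of the original's pagination; this is not how Goobi's METS editor stores it.

```python
def to_roman(n: int) -> str:
    """Lower-case roman numerals, as commonly used for preliminary pages."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
                (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
                (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_labels(runs: list[tuple[str, int, int]]) -> list[str]:
    """Expand runs of (style, first number, page count) into one label per image.

    style is 'roman', 'arabic' or 'unnumbered'.
    """
    labels = []
    for style, start, count in runs:
        for n in range(start, start + count):
            if style == "roman":
                labels.append(to_roman(n))
            elif style == "arabic":
                labels.append(str(n))
            else:
                labels.append("[unnumbered]")
    return labels


# Two unnumbered leaves, three roman-numbered preliminaries, then four numbered pages.
print(page_labels([("unnumbered", 0, 2), ("roman", 1, 3), ("arabic", 1, 4)]))
```

Labels like these are the kind of values METS records in a page `div`'s `ORDERLABEL` attribute, while `ORDER` keeps the plain sequence position.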
16. Lessons from Goobi
• Design your workflows in advance. But be flexible.
• Automate as much as possible: it saves time & is more efficient.
• Document processes & procedures.
• Share what you learn.
17. How SDB works – the basics
• Workflow based; easily ‘talks’ to other systems.
• Content agnostic.
• Creates administrative metadata on ingest.
• Preservation orientated.
19. How SDB works – behind the scenes
• No public access to SDB.
• Little direct staff access to SDB content.
• Ingest is highly automated, driven by Goobi.
• Platform for dissemination mediated by the player.
20. Lessons from SDB
• Plan your systems integration: which system talks to which, and how.
• Plan workflows & processes.
• Have a data management plan: all your eggs are in one basket.
• Plan what you’ll do when it all turns to custard.
22. How the player works
• Makes HTTP requests to SDB for content.
• Draws access conditions from METS file.
• Permitted actions drawn from METS.
• Draws DMD from live catalogue.
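How a player might draw access conditions and permitted actions from a METS file can be sketched with the standard library. Where Wellcome actually records these conditions is not stated here; the sketch assumes they sit in a MODS `accessCondition` element inside the METS `dmdSec`, and the condition values and their mapping to actions are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "mods": "http://www.loc.gov/mods/v3"}

# Hypothetical mapping from access condition to the actions a viewer may offer.
PERMITTED_ACTIONS = {
    "Open": {"view", "download"},
    "Requires registration": {"view"},
    "Closed": set(),
}


def access_condition(mets_xml: str) -> str:
    """Return the first MODS accessCondition found under the METS dmdSec."""
    root = ET.fromstring(mets_xml)
    el = root.find("mets:dmdSec//mods:accessCondition", NS)
    # Treat a missing condition as closed: fail safe rather than expose content.
    return el.text.strip() if el is not None and el.text else "Closed"


def permitted_actions(mets_xml: str) -> set[str]:
    # Unknown condition values also fall back to no permitted actions.
    return PERMITTED_ACTIONS.get(access_condition(mets_xml), set())
```

Defaulting to "Closed" when nothing is found is a design choice: a metadata gap then hides an item instead of accidentally publishing it.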
24. Summary
• Digitisation is an end-to-end process that brings together objects & metadata.
• You have to think about the whole system to deliver results: the process is one of combining metadata from different systems.
• Document plans & document process.
• Be prepared to be flexible & to change as necessary. But try to stick to the plan!
25. Further reading
• Wellcome Library - http://wellcomelibrary.org
• Metadata Encoding & Transmission Standard at the Library of Congress - http://www.loc.gov/standards/mets/
• Reference Model for an Open Archival Information System (OAIS). Magenta Book. Issue 2. June 2012 - http://public.ccsds.org/publications/RefModel.aspx
• Tessella, Safety Deposit Box - http://www.tessella.com/tag/safety-deposit-box/
• Data management planning - http://www.dcc.ac.uk/resources/data-management-plans
• Repository Software Comparison: Building Digital Library Infrastructure at LSE - http://www.ariadne.ac.uk/issue64/fay
26. Thank you
Questions now, questions later…?
Dave Thompson, Digital Curator
Wellcome Library
d.thompson@wellcome.ac.uk - #welldigi
http://wellcomelibrary.org/