Digitization: Image Capture and Workflow - Constance Rinaldo

Digitization: Image Capture and Workflow
Martin Kalfatovic, Keri Thompson & Connie Rinaldo

Collection Management Cycle
Selection → Refinement → Digitization → Curation → Use → Selection

• Workflow has become more complicated
• Difficulty finding books that are easy to scan
• Reviewing titles in copyright takes time
• Fragile books need repair
• The same amount of work, but a different kind

Upload a spreadsheet of titles planned for scanning. Include OCLC number, title, volume number, author, publisher, and date.
The tool tries to find matches in the other spreadsheets submitted.
Lesson: metadata is always worse than you think.
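
The matching step described above can be illustrated with a small sketch. It is not the actual tool: the CSV layout, the `oclc` column name, and the library names are assumptions made for the example, and matching only on a normalized OCLC number is a simplification.

```python
import csv
import io
from collections import defaultdict

def normalize_oclc(raw):
    """Keep only digits and drop leading zeros, so 'ocm00012345' matches '12345'."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits.lstrip("0")

def find_overlaps(sheets):
    """Map each OCLC number to the libraries that plan to scan it.

    `sheets` maps a (hypothetical) library name to CSV text with an 'oclc' column.
    Only numbers listed by more than one library are returned.
    """
    holders = defaultdict(set)
    for library, text in sheets.items():
        for row in csv.DictReader(io.StringIO(text)):
            key = normalize_oclc(row.get("oclc") or "")
            if key:  # skip rows with no usable OCLC number
                holders[key].add(library)
    return {k: sorted(v) for k, v in holders.items() if len(v) > 1}

# Made-up sample data, for illustration only.
sheets = {
    "Library A": "oclc,title,volume\nocm00012345,Example Journal,v.1\n777,Another Title,v.2\n",
    "Library B": "oclc,title,volume\n12345,Example journal,v.1\n,Untitled,v.3\n",
}
overlaps = find_overlaps(sheets)
print(overlaps)
```

The row with an empty OCLC field is skipped silently here; given the lesson above, a real tool would probably need a fallback match on title and volume.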

GEMINI: A Critical Tool
• Title and volumes needed
• Which library has which volumes, plus additional information
• Conversation about which volumes need to be scanned

Capture: Scanning
• Purpose: to provide an accurate digital representation of the original object
• One page per image (except field notebooks: 2 pages per image)
• No image editing
• Reuse existing metadata
  • in the library catalog
  • from other sources (BioStor, etc.)

Capture: Scanning
• Most BHL US/UK libraries use the Internet Archive (IA) for scanning books
• Some shared funds; one contract for all of BHL
• Open access, nonprofit
• Inexpensive services
• Each member library has its own workflow
• Members provide basic metadata from the library catalog
• Alternatively, digitize in-house or hire another vendor
• MACAW

• Scan books cover to cover, one image per page
• The scanned unit, also called a "volume" or "item", is a physical unit, not an intellectual unit; i.e., one book may hold multiple articles or be a single monograph
Diagram labels: "Cover", "Cover", "good stuff"

Partial replication in Alexandria, Egypt
Secondary backup is at the Smithsonian, including TIFFs of volumes scanned in-house (SIL), ~90 TB
Primary file storage and the "staging area" are at the Internet Archive in San Francisco, USA

In-house scanning
Images scanned by the library or another vendor
Metadata collected through Z39.50
Additional item and page metadata entered by library staff using the Macaw software (bibliographic software that mimics IA's)

Smithsonian Libraries:
Uses 2 Phase One setups: a P65 60 MP camera on a copy stand and a BC100 dual-camera 40 MP system
Capture One software
For folios (> 36 cm) and fragile books
Exception: Field Notebooks Project (Smithsonian Archives): 2 pages per image for notebooks; letters on a flatbed scanner

Capture: Harvest
• Automated scheduled tasks
• Books already in the Internet Archive
• Subject terms
• Library "call numbers"
• BioStor/articles

CURATION
Interface for staff to edit records and put serial volumes in order
Curation (adding and editing metadata) includes merging book records and authors, removing volumes that are outside the scope of the collection, and re-scanning books with errors.
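
Putting serial volumes in order is harder than it sounds, because a plain text sort places "v.10" before "v.2". The sketch below shows one common approach, a natural sort key. The label format is an assumption, and this is not taken from the BHL curation interface.

```python
import re

def volume_key(label):
    """Build a sort key in which runs of digits compare as numbers.

    re.split with a capturing group alternates text and digit chunks,
    so chunks at odd positions are always digit runs.
    """
    parts = re.split(r"(\d+)", label.lower())
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

# Hypothetical volume labels, for illustration only.
volumes = ["v.10 (1900)", "v.2 (1892)", "v.1 (1891)"]
ordered = sorted(volumes, key=volume_key)
print(ordered)
```

Labels that mix conventions (for example "Bd. 3" alongside "v.3") would still need human review, which is presumably why the slide describes a staff interface for this step.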

MACAW: MetadatA Collection And Workflow
A Critical Tool
• Allows people to enter page-level metadata such as page number and page type (picture, text, etc.)
• Creates XML files to upload to IA
• Replicates software functionality from the Internet Archive
• Installed on a shared SI server for partners to use

Metadata
• "Title": MARC record from the library catalog
• Transformed into MARCXML and MODS
• "Volume": information from the catalog or entered by humans, stored in XML
• "Segment" (article): information entered by humans or taken from BioStor etc. (after scanning)
• "Page": metadata entered by humans, stored in the XML file that provides structure to the digital object
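
To make the "Page" level concrete, here is a minimal sketch of writing page-level metadata to XML. The element and attribute names (`item`, `page`, `sequence`, `pageType`, `pageNumber`) and the item identifier are hypothetical; they do not reproduce the schema used by Macaw or the Internet Archive.

```python
import xml.etree.ElementTree as ET

def build_page_xml(item_id, pages):
    """Serialize page-level metadata for one scanned item.

    Element names here are illustrative assumptions, not a real schema.
    """
    root = ET.Element("item", {"id": item_id})
    for seq, page in enumerate(pages, start=1):
        el = ET.SubElement(root, "page", {"sequence": str(seq)})
        ET.SubElement(el, "pageType").text = page["type"]
        if page.get("number"):  # covers and plates often have no printed number
            ET.SubElement(el, "pageNumber").text = page["number"]
    return ET.tostring(root, encoding="unicode")

# Made-up pages: image order is the sequence, printed numbers are optional.
pages = [
    {"type": "Cover"},
    {"type": "Text", "number": "1"},
    {"type": "Illustration", "number": "2"},
]
xml_text = build_page_xml("example-item-0001", pages)
print(xml_text)
```

Keeping the image sequence separate from the printed page number reflects the point made earlier in the deck that the scanned item is a physical unit: every image gets a sequence position even when no page number is printed on it.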

Add page-level metadata, such as page numbers or article titles

• Other files derived from Internet Archive processes
– PDF
– DjVu (OCR text: .txt and .xml)
– EPUB/DAISY/Kindle
• Other files created by BHL processes
– Taxonomic names
–OCR text
– BHL METS

Discovering and storing species names associated with pages allows the creation of
"species bibliographies," EOL.org connections, GBIF connections

Gemini
Users can (and do!):
• Report technical problems
• Request new functionality
• Report data errors
• Request scanning of specific titles

Gemini
• Title and volumes needed
• Which library has which volumes, plus additional information
• Requestor
• Assigned to Cornell University
As far as we know, responding to user requests is rare in the digital library world.

Smithsonian Libraries Workflow (diagram)
Components: library catalog, database, Macaw, Internet Archive, BHL
Steps: move & de-duplicate; tracking & shipping; scanning & metadata harvesting; create page metadata; transform & package; create derivatives; quality control (% sample)
Data: MARC → MARCXML; URL to BHL added to the MARC record; species names
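
The "quality control (% sample)" step in the diagram can be read as checking a random percentage of scanned pages. The sketch below shows one way to draw such a sample. The 5% rate, the rounding rule, and the fixed seed are assumptions for illustration; the slides do not state the actual sampling procedure.

```python
import math
import random

def qc_sample(page_ids, percent, seed=None):
    """Pick a random sample of pages for manual quality control.

    Rounds up so that any non-empty item gets at least one page checked.
    """
    if not page_ids:
        return []
    k = max(1, math.ceil(len(page_ids) * percent / 100))
    k = min(k, len(page_ids))
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    return sorted(rng.sample(page_ids, k))

pages = list(range(1, 201))           # a hypothetical 200-page volume
sample = qc_sample(pages, percent=5, seed=42)
print(len(sample))
```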
