A Digitization Primer for Botanical and Horticultural Librarians
1. A Digitization Primer for
Botanical and Horticultural Librarians
• Chris Freeland
– MBG Web and Digitization Project
Coordinator
• Doug Holland
– MBG Administrative Librarian
• Heather Rolen
– NYBG Digitization Specialist
CBHL 2002: A Digitization Primer
2. Why Digitize?
• Makes resources broadly available while
preserving original.
• 24/7 worldwide availability.
• Capitalize on investment in resources and
technology (collections, storage, curation)
• Assimilate disparate resources
• Learn something new (It’s Fun!!)
• Pressure from above (Everyone is doing it!)
3. Survey Summary
13 Humble Responses!
– Little to no experience with projects
– Some with Scanning/Photoshop
• Types of materials
– Slides and glass plates (6)
– Photos (electrophoresis gels?) (7)
– Printed material (loose and bound, including rare books!),
newspaper clippings, maps, architectural drawings,
seed and nursery catalogs (10)
– Herbarium specimens (2)
• In-house image database (Annie Malley)
4. What we will be covering
• Audience and Users
• Goals
• Ownership
• Preservation
• Access
• Metadata
• Scanning
• Sustainability
5. A Framework of Guidance for Building Good Digital Collections
http://www.imls.gov/pubs/forumframework.htm
• Interoperability
• Reusability (Repurposing)
• Persistence
• Verification
• Documentation
• Respecting copyright and intellectual property law
• Think a little bigger and think about the future.
6. Audience and Users
• Who are your users?
– Today
– Future
• Lifelong Learners
• Scholar/researcher
• Students
• Business Community
7. Why is it important to define users?
• Guides the selection process
• Determines complexity and type of
metadata
• Determines image resolution
• Determines website design
(database or exhibit format)
• Determines equipment needs
8. How can you retain users and keep them coming back?
• Keep adding new content
• Create value-added content after the
initial rollout
– Lesson plans, etc.
• Create an e-mail newsletter
9. User Comments
• Your site should include a way to solicit, retain, and
respond to user comments and suggestions.
– Can tell you if you’re reaching your intended
audience
– Can provide you with wonderful comments to include in
grant proposals or to show your administration:
• “Thanks so much for sharing this. This is the internet
at its best.”
• “This is fantastic. I am most enjoying
these rare books, especially the
illustrations. I hope to use this with
teachers in the future.”
10. Planning and Goals
• Have clear project goals and objectives
• Be aware that funding agencies may influence
the scope of your project
• Designate a project manager.
• Identify key departments or staff
• Stay realistic (perhaps conservative) in your
production promises.
• Document all changes to your project as it
evolves.
11. Ownership
• Copyright needs to be considered
• Holding doesn’t mean owning
• Is item in public domain?
http://www.unc.edu/~unclng/public-d.htm
http://cidc.library.cornell.edu/copyright/
• Modify your deed of gift to include digital
distribution
• Controlling intellectual property after digitization
12. Selection
• Audience needs
• Good Collections
• Condition
• One or many collections or mainstreaming
• Item formats and sizes
• Metadata available or collection condition
(activities other than scanning require 75% of
project time)
• Rights
• Sensitive Issues (Skeletons??)
• Who else is doing the same or similar items?
13. Preservation and Digitization
• Digitization is NOT preservation
• Do not discard originals.
• Why not?
– Media longevity
– Software and hardware obsolescence
• Digitization does help preserve the original
by reducing exposure and
handling.
14. Preserving the Original
• Handle items once (scan at high resolution!)
• Consider rehousing either before or
after scanning.
• Appropriate long term storage
• Remember 2/3 of project time has
nothing to do with scanning.
15. Discovery and Access
(or Scanned and Deliver)
• Online Catalog or Database
– Subject Heading or keyword search
• Finding Aids for archival collections
• Exhibit style educational page
• Don’t forget meta tags and visibility to
Web search engines (if that is one of
your goals!)
16. Web Access and Display
• Exhibit Approach
• Database Approach
• Both
17. Exhibit Approach
• Pull together text, images, maps,
documents, etc. to tell a story
• Value-added information enhances the
scanned images
• Appealing to a wide audience
18. Example of Exhibit Approach
• Private Passions, Public Legacy: Paul Mellon's Personal Library at the University of Virginia
19. Database Approach
• Give access to images through a search
mechanism
– Generally have to know something about
the collection to find what you’re looking
for
• Appealing to a more focused audience
– Scholars, professionals
20. Example of Database Approach
• Making of America
• Google Image Search
21. Both Approaches
• Provide value-added information to
reach a wider audience
• Also give full access to the data for
people who know what they want to
view.
22. Example – MBG Rare Book Site
23. Design vs. Development
• Teams usually spend too much time discussing
background colors and layout
– Too subjective
• Should focus on
– Search engine placement
– Successful searches for key phrases
– Usage statistics
24. “If you build it, they may not come”
• Indexing by search engines is not a
given
• Great images + great metadata does
not equal a popular site
• You must consider how search engines
work
27. Indexing tips - <title> tag
• Use descriptive <title> tags:
– <title>MBG Rare Books: Plate 1 - Cinchona officinalis</title>
28. Indexing tips - <body> text
• Use text in your page:
– A Description of the Genus Cinchona by
Lambert, Aylmer Bourke
– Description of Page: Plate 1 -
Cinchona officinalis (Cinchona officinalis
L., quinine)
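A minimal sketch of the two tips above, assuming pages are generated as static HTML: a hypothetical Python helper, `plate_page`, places the same key phrase ("Plate N - Taxon") in the `<title>`, in a `<meta name="description">` tag, and in the visible body text. The function name, its parameters, and the page layout are illustrative only, not MBG's actual code.

```python
from html import escape


def plate_page(site: str, book: str, author: str, plate: int,
               taxon: str, note: str = "") -> str:
    """Return a static HTML page for one plate (hypothetical helper).

    The same key phrase ("Plate N - Taxon") goes into the <title>,
    the <meta> description, and the visible body text.
    """
    phrase = f"Plate {plate} - {taxon}"
    title = f"{site}: {phrase}"
    meta_content = f"{book}, {phrase}"
    body_desc = f"{phrase} ({note})" if note else phrase
    # escape() also escapes quote characters, so values are safe in attributes.
    return "\n".join([
        "<html>",
        "<head>",
        f"<title>{escape(title)}</title>",
        f'<meta name="description" content="{escape(meta_content)}">',
        "</head>",
        "<body>",
        f"<h1>{escape(book)} by {escape(author)}</h1>",
        f"<p>Description of Page: {escape(body_desc)}</p>",
        "</body>",
        "</html>",
    ])


print(plate_page("MBG Rare Books", "A Description of the Genus Cinchona",
                 "Lambert, Aylmer Bourke", 1, "Cinchona officinalis",
                 "Cinchona officinalis L., quinine"))
```

Generating pages from one template keeps the title, description, and body text in step, which is easier than hand-editing each page.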
29. More indexing tips
• Having the key phrase in all three places (<meta>, <title>,
and body text) improves your search engine
ranking
• Indexing robots follow links on pages
– They will follow the hierarchy of your site
• Robots don’t:
– Click on buttons
– Use dropdown menus
– Natively navigate or index Flash/multimedia
content
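A rough self-check in the spirit of the tips above: report whether a key phrase appears in the `<title>`, in `<meta>` description/keywords content, and in body text, and list the plain `<a href>` links a robot could follow. This sketch uses Python's standard `html.parser`; the class and function names are hypothetical, and real search-engine robots are far more complex, so treat the result as an approximation only.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collects the parts of a page a simple indexing robot would read."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta = ""
        self.body = ""
        self.links: list[str] = []
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta" and (a.get("name") or "").lower() in ("description", "keywords"):
            self.meta += " " + (a.get("content") or "")
        elif tag == "a" and a.get("href"):
            # Only plain <a href> links are recorded; buttons, script
            # handlers, and <select> menus are ignored, as a robot would.
            self.links.append(a["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        # Body text is only collected after an explicit <body> tag.
        if self._in_title:
            self.title += data
        elif self._in_body:
            self.body += data


def phrase_placement(html: str, phrase: str) -> dict[str, bool]:
    """Report where a key phrase occurs (case-insensitive)."""
    audit = PageAudit()
    audit.feed(html)
    p = phrase.lower()
    return {
        "title": p in audit.title.lower(),
        "meta": p in audit.meta.lower(),
        "body": p in audit.body.lower(),
    }
```

Running `phrase_placement` over each page before publishing flags pages where the key phrase is missing from one of the three places.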
30. Case Study:
Köhler’s Medizinal Pflanzen
• Published 1883–1914
• Digitized in 1997
• Images were heavily
edited and cropped
• Text was added to
images
31. Case Study:
Köhler’s Medizinal Pflanzen
• Created static HTML pages with links
through site
• Created a list of current botanical
names with links to illustration
• NOT technically sophisticated
• Used an Exhibit Approach
32. Case Study:
Köhler’s Medizinal Pflanzen
• We receive more user feedback and image
requests for this site than for any other
• Reasons:
– Popular content with interesting images
– Has been online for several years
– Simple web display that can be indexed
by all search engines
33. Lessons learned
• DON’T:
– spend too much time bickering over
color schemes, fonts, and layout
– confuse users and indexing robots with
irregular navigation
– ignore importance of search engine
results for your content
34. Lessons learned
• DO:
– spend time creating rich <meta> and
<title> tags and body text
– Learn how search engines index content
– Consider display, but focus on
development
35. Metadata and Electronic Resources
• Vast amount of information, increasing at a
faster rate than is manageable
• Standards developing and evolving, using best
practices
• Web-enabled search engines: many, and
varied in retrieval success
• Everyone’s a publisher, everyone’s a librarian
• HTML meta tags are limited in structure and
content, which inhibits reliable searching
• Lack of subject-rich terms
36. Metadata and Standards
• Metadata definition: data about data; data
that aids in identification, description and
location of networked resources
• Standard Generalized Markup Language
(SGML), 1986
– Structure for producing documents
– Document Type Definition (DTD) created for
each type of material or individual publication
– SGML supports encoding both the text AND a
description of the document in its header
37. Dublin Core Basics
• http://purl.oclc.org/dc/
• How it began
• Why it is important
– Simple to create
– Easy to understand
– International
– Flexible
• Descriptive, Structural and Administrative metadata
• All elements repeatable, all optional
38. Dublin Core Elements
• Title
• Creator
• Publisher
• Contributor
• Description
• Identifier
• Date
• Format
• Subject terms/classification
• Rights management
• Source
• Type
• Language
• Relation
• Coverage
39. How MBG uses DC for a book
• Title: Icones pictae plantarum rariorum descriptionibus et
observationibus illustratae / Auctore J.E. Smith, M.D. Fasc.
1-3.
• Creator: Smith, James Edward
• Subject_LCSH: Botany -- Pictorial works.
• Subject_LCCS: QK98 .S657
• Description: 2 p.l., 18 numb. l. : 18 col. pl. ; 50 cm.
• Publisher: London, 1790-93, Missouri Botanical Garden
• Contributor: Photography and Web design by Debbie Windus.
• Date: 1998-09-01
• Identifier:
http://ridgwaydb.mobot.org/mobot/rarebooks/title.asp?relation=QK98S657
• Relation: QK98S657
• Rights:
http://ridgwaydb.mobot.org/mobot/rarebooks/copyright.asp
40. How MBG uses DC for a page/image
• Title: QK495F270L351797_0060.jpg
• Creator: Lambert, Aylmer Bourke, 1761-1842
• Subject: Cinchona.|Hyaenanche.|Rubiaceae.|Euphorbiaceae.|Graphic media : --Copper engraving -- Uncolored -- 1797 -- England.|
• Description: Plate 9 - Cinchona angustifolia
• Publisher: Missouri Botanical Garden
• Contributor: Missouri Botanical Garden
• Date: 1998-10-01
• Type: Image
• Format: jpeg
• Identifier: 0060
• Source: QK495.F270 L35 1797
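Records like the two above could be serialized as simple Dublin Core XML. A minimal sketch, assuming the standard DC 1.1 namespace `http://purl.org/dc/elements/1.1/` and a hypothetical `<record>` wrapper element: elements are optional and repeatable, so a pipe-delimited subject string becomes several `dc:subject` elements. Local labels such as `Subject_LCSH` are not DC element names, so this sketch rejects them rather than guessing; in simple DC they would map to `dc:subject`.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The fifteen Dublin Core (version 1.1) element names.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def split_subjects(raw: str) -> list[str]:
    """Split a pipe-delimited subject string into separate values."""
    return [s.strip() for s in raw.split("|") if s.strip()]


def dc_record(fields: dict[str, list[str]]) -> ET.Element:
    """Build a <record> of dc:* elements; the <record> wrapper is hypothetical."""
    record = ET.Element("record")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        for value in values:  # every element may repeat; absent ones are omitted
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


page = dc_record({
    "title": ["QK495F270L351797_0060.jpg"],
    "creator": ["Lambert, Aylmer Bourke, 1761-1842"],
    "subject": split_subjects(
        "Cinchona.|Hyaenanche.|Rubiaceae.|Euphorbiaceae.|"
        "Graphic media : --Copper engraving -- Uncolored -- 1797 -- England.|"),
    "description": ["Plate 9 - Cinchona angustifolia"],
    "publisher": ["Missouri Botanical Garden"],
    "date": ["1998-10-01"],
    "type": ["Image"],
    "format": ["jpeg"],
    "source": ["QK495.F270 L35 1797"],
})
print(ET.tostring(page, encoding="unicode"))
```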
41. Subject Access
• Controlled vocabularies
– Vocabularies and thesauri
– Taxonomies
– Access
42. XML
• METADATA
– descriptive
– facilitate discovery
• OAI
• MARC
• EAD
• Dublin Core
– administrative
– identify/manage/preserve digital object(s) over time
• info on where pieces reside
• info on how to view digital object
• info on scanning process
43. XML
• METADATA cont.
– structural
– storage/presentation of digital object(s)
• METS (Metadata Encoding and Transmission Standard)
» http://www.loc.gov/standards/mets
• TEI (Text Encoding Initiative) http://www.tei-c.org
• TEI for Libraries (5 levels of encoding)
• http://www.indiana.edu/~letrs/tei/
• METAe: automatic metadata creation
• http://meta-e.uibk.ac.at
44. XML
• SGML/HTML/XML
– Standard Generalized Markup Language (1986)
– Hypertext Markup Language (1989)
– eXtensible Markup Language (1996)
• XML
– a document markup language for defining
structured information
– a language used by computers to define hidden
information about the structure of a document
45. XML
• XML cont.: best of both worlds
– storage
• can store any kind of structured info/not limited
to Web delivery
– presentation
• flexible development/design
46. XML
• XML is a lot simpler than SGML and is sometimes
described as an 80/20 solution: you get 80% of the
power of SGML for 20% of the effort
• You can use XML without thinking ahead and make up
your elements en route as long as they nest within each
other. This is called writing "well-formed" rather than
"valid" XML. Purists discourage this but people will do it
anyhow.
• XML is specifically designed to work easily with the
Web.
– http://facultyweb.at.nwu.edu/english/mmueller/ariadne/teixintro/index.htm
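The well-formed/valid distinction can be seen with Python's standard `xml.etree.ElementTree`, which checks well-formedness only. A minimal sketch; the element names are made up, which is exactly what well-formed XML permits. Checking validity would additionally need a DTD or schema and a validating parser (lxml, for example, supports DTD validation).

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML.

    This checks well-formedness only (tags nest and close properly);
    ElementTree does not validate against a DTD or schema.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Made-up element names are fine, as long as they nest properly.
print(is_well_formed("<plate><taxon>Cinchona officinalis</taxon></plate>"))  # True
print(is_well_formed("<plate><taxon>Cinchona officinalis</plate></taxon>"))  # False
```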
47. XML
• XML and NYBG digitization project
[Diagram of the NYBG workflow: XML text files, image files, GSDL software suite, public access server, public use]
48. XML
• XML/NYBG project
– lack of adopted standards
– nature of the data
– delivery mechanisms
• Research!
49. XML
• XML sites
– http://www.oasis-open.org/cover/sgml-xml.html
– http://www.w3.org/XML/
– http://www.ucc.ie/xml/#exec
– http://www.xml.com/
• SGML sites
– http://www.oasis-open.org/cover/general.html
– http://www.w3.org/MarkUp/SGML/
• Listservs
– http://sunsite.berkeley.edu/XML4Lib/
– http://www.oasis-open.org/cover/lists.html
50. Scanning
• Principles for scanning
• Access (not preservation)
• Storage
• Outsource options
51. Howard Besser’s Principles
• Scan at the highest resolution appropriate to
the informational content of the originals
• Scan at an appropriate level of quality to avoid
rescanning and re-handling of the originals in
the future--scan once
• Create and store a master image file that can
be used to produce derivative image files and
serve a variety of current and future user
needs
• Use system components that are non-proprietary
52. Besser’s Principles Cont.
• Use image file formats and compression
techniques that conform to industry standards
• Create backup copies of all files on a stable
medium
• Create meaningful metadata for image files or
collections
• Store media in an appropriate environment
• Monitor and recopy data as necessary
• Outline a migration strategy for transferring
data across generations of technology
• Anticipate and plan for future technological
developments
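One common way to "monitor and recopy data as necessary" is a fixity check: record a checksum for every master file, then recompute the checksums on a schedule and after every copy or migration. A minimal Python sketch using SHA-256 from the standard `hashlib` module; the function names and directory layout are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large master images fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (as a relative path) to its SHA-256 digest."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def check_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or whose contents have changed."""
    problems = []
    for rel, digest in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != digest:
            problems.append(rel)
    return problems
```

Store the manifest alongside the backups; a non-empty result from `check_manifest` is the signal to restore that file from another copy.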
53. Scan Basics
• Digital formats: master/archival, access,
thumbnail
• Always keep a facsimile master
• Minimum recommended standards:
NARA/LC/CPD
• Hardware requirements:
– Scanner that exceeds your standards
– Workstation: at least a Pentium III at 650 MHz,
with 20+ GB of storage
– Server for display and archiving
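Access and thumbnail copies are derived from the master rather than rescanned. A minimal sketch of the sizing arithmetic, computing derivative dimensions from a master's pixel dimensions while keeping the aspect ratio; the long-edge targets (1,024 and 150 pixels) are hypothetical, and the actual resizing would be done in your imaging software.

```python
def fit_long_edge(width: int, height: int, long_edge: int) -> tuple[int, int]:
    """Scale (width, height) so the longer side is at most long_edge.

    The aspect ratio is kept, and the image is never enlarged.
    """
    scale = min(1.0, long_edge / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical long-edge targets in pixels; set these to suit your users.
DERIVATIVE_TARGETS = {"access": 1024, "thumbnail": 150}


def derivative_sizes(master_w: int, master_h: int) -> dict[str, tuple[int, int]]:
    """Pixel dimensions for the master and each derivative made from it."""
    sizes = {"master": (master_w, master_h)}
    for name, edge in DERIVATIVE_TARGETS.items():
        sizes[name] = fit_long_edge(master_w, master_h, edge)
    return sizes


print(derivative_sizes(4800, 6000))
```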
54. MBG Imaging Lab Specs
• See handout
55. Scanning
• Your requirements may be different
than the accepted norm
– Maybe 600 dpi is too low for your
project
• Should be aware of generally accepted
guidelines
– Have to know the rules before you break
them
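Knowing the guidelines includes knowing what a resolution choice costs in pixels and storage. A minimal sketch, assuming 24-bit color and uncompressed data (real TIFF files add some header overhead): an 8 × 10 inch photograph at 600 dpi is 4,800 × 6,000 pixels, or 86,400,000 bytes of raw data, and doubling the resolution quadruples the storage.

```python
def cm_to_in(cm: float) -> float:
    """Convert centimeters (as in catalog records) to inches."""
    return cm / 2.54


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixels captured from an original of the given size (inches) at dpi."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int = 24) -> int:
    """Raw pixel data size, ignoring file headers and compression."""
    return width_px * height_px * bits_per_pixel // 8


for dpi in (600, 1200):
    w, h = pixel_dimensions(8, 10, dpi)  # an 8 x 10 inch photograph
    size = uncompressed_bytes(w, h)
    print(f"{dpi} dpi: {w} x {h} px, {size / 2**20:.1f} MiB")
```

The loop reports 82.4 MiB at 600 dpi and 329.6 MiB at 1200 dpi, which is the kind of figure to multiply by the number of images when sizing servers and backup media.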
57. Scanning
• Software: scanners come with some
basic software, such as Adobe Photoshop Lite
• Keep current on software
• Physical facilities for scanning
• When to outsource/special materials
58. Outsourcing
• What?
– Contract work to service providers
– Off-site, on-site, imaging only, image/content
display/management provider, ASP (application
service provider)
• Why?
– Factors to consider
• Project size
• project expectations
• staff size
59. Outsourcing
• Why? Cont.
• staff expertise
• available resources (funding for staff training and
equipment, physical space)
• deadlines
60. Outsourcing
• NYBG/Mellon Digitization Project
– 3 titles from the rare book (RB) collection
– conservation efforts necessary
– 21 month grant, no lab, no allocated space to
build lab, no staff, no expertise, no extra
funding for equipment or staff training, project
expectations (grant stipulates archival quality
imaging, hard deadline)
– image capture outsourced to an East Coast vendor,
quality checks performed in-house
61. Outsourcing
• Weighing the pros and cons
– fragile/rare materials under supervised control
vs. equipment costs and
updates/staff/expertise/time/physical space
• Worth consideration
– “…For digitization projects, institutions and service providers
are working with developing technologies and a new
vocabulary, creating new quality and production benchmarks,
and trying to determine best practices. All the while, digital
technology continues to evolve. Both parties must collaborate
to determine capture requirements, costs, and deliverables;
manage the process; and agree on criteria.” -Meg Bellinger,
President, Preservation Resources, Moving Theory into Practice, 2000.
62. Outsourcing
• Vendors
– Octavo http://www.octavo.com/
– Systems Integration Group
http://www.sigi.com/
– Preservation Resources
http://www.oclc.org/oclc/presres/
– Saztec http://www.saztec.com
– Innodata http://www.innodata.com/
– Northern Micrographics http://www.normicro.com/northern_micrographics.htm
63. Sustainability
• Digitization shouldn’t be a fling (when
others are paying the bills); it is a
marriage and more.
• Time = Money
• Permanence
• Data Migration and Emulation
• Review and schedule upgrades
• Documentation
64. Cost
• Not cheap, but consider the value of objects,
the investment already made in your
collections, and your organizational mission.
• Prices range from $7 - $35 per image
• Most projects are funded on soft money.
Attempt to incorporate scanning into normal
operating budgets.
• Scanning is 1/3 of total cost.
• Largest cost is in research and time invested
in creation of metadata or organization of
collections.
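The figures above lend themselves to a back-of-the-envelope estimate. A minimal sketch, assuming the $7–$35 per-image range and that scanning is about one third of the total cost (so the project total is roughly three times the scanning cost); `cost_estimate` is a hypothetical helper. For comparison, the speaker notes' figure of $3,000 in staff time and media for a 500-page rare book works out to $6 per page, before equipment and indirect costs.

```python
def cost_estimate(images: int,
                  low_per_image: float = 7.0,
                  high_per_image: float = 35.0,
                  total_to_scanning: float = 3.0) -> dict[str, tuple[float, float]]:
    """Low/high cost range for scanning and for the whole project.

    Assumes scanning is about one third of the total cost, so the
    project total is roughly three times the scanning cost.
    """
    scanning = (images * low_per_image, images * high_per_image)
    total = (scanning[0] * total_to_scanning, scanning[1] * total_to_scanning)
    return {"scanning": scanning, "total": total}


print(cost_estimate(500))
```

For 500 images this gives $3,500 to $17,500 for scanning and $10,500 to $52,500 for the whole project; the wide range is itself a reason to stay conservative in production promises.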
65. Staffing
• Staff with tolerance for ambiguity
• Staff with creativity
• Training in metadata, scanning
• Photographic skills (artistic eye),
microcomputer skills, web design skills
• Staff with risk taking attitude
66. Concluding Thoughts
• Create digital products worth preserving
• Collaborate!
• Adhere to standards
• Refresh/migrate your data
• Don’t forget preservation metadata:
digital products are not copies but new
artifacts
Editor's Notes
Introductions. Disclaimers: we are not on the “guru tour” of digitization workshops. We will share what we have learned.
Ask people why they want to digitize. Call on or read those who replied. Tell how Raven was asking: if the art museum’s library was digitized, why isn’t ours? Cataloging vs. digitizing. We are really talking about “reformatting,” as opposed to items born digital.
Share what we have learned. Everything is scalable. You are not LC. We are focusing on projects, but these can be 50 or 5,000 or 50,000 images. Even if you are scanning only for in-house use, not for web delivery, much of this is relevant. As complicated as this seems, it is continually getting more standardized and easier. Many more resources and standards are available, such as A Framework of Guidance for Building Good Digital Collections.
Distinguish between enduring-value and immediate-value scanning. It is important to think about issues bigger than “scanning,” especially when using funding from large agencies. Lots of people around the world are digitizing. Standards and interoperability give more bang for the buck. Reuse and repurposing of digitized items. Think big and think about the future. The Framework includes sources for detailed information on good selection, on creating good digital objects, on creating good metadata, and on running a good project.
See the matrix in Appendix 1. Predicting users is difficult.
Allow for major staffing, hardware, or software delays. Have a project manager who is accountable and empowered. Who do you need to talk to about server space, databases, programming? Documentation for others and your own institution. Reports for granting agencies. Make an estimate, then double the time and triple the expense. Cost for scanning a 500-page rare book: $3,000 in staff time and media, not including equipment or indirect costs.
MrSID controls access to high-resolution images.
Selection can be based on themes, geography, historic period, subject headings, core lists (!), and material types (images). Don’t give exact localities of rare and endangered plants. Don’t scan personal or sensitive information about founders or their descendants.
Will address preservation of digital objects later.
“Guided tour” of images. Browse by subject could be thought of as an exhibit approach.
No search function. Purpose is to tell the story about Paul Mellon, not document his collection in its entirety.
Give the MBG Archives project as an example.
No search feature. No thought given to future books, format, etc.
What’s different about electronic resources
Some history and standards
We want to provide guidance as well as guidelines
Think through the long-term commitment: next month, 5 years, 50 years, 100 years? You cannot put a digital collection on the shelf for 50 years. Storing archival images on CD or servers. Documentation for future archaeologists, or for the project manager who takes over when you move to a new job. Good metadata will contain some documentation.