Pen to Pixel:
Bringing Appropriate Technologies to
    Digital Manuscript Philology
                    Michael B. Toth
                     R. B. Toth Associates
                           rbtoth.com
             http://www.thedigitalwalters.org/
     On behalf of the Walters Art Museum Digitization
                     Team, especially:
                      Lynley Herbert,
     Ariel Tabritha, Diane Bockrath, Kimber Wiegand,
                        Doug Emery

  Supported by the US National Endowment for the Humanities
Walters Art Museum
                                                        W.562, 2b
                                                           Koran
                                          9th century AH / 15th CE


Walters Art Museum, Baltimore, Maryland




Digital Imaging System
St. Catherine’s Monastery, Sinai




Spectral Imaging System
US Library of Congress




Spectral Imaging System
Advanced Digitization
Applied Science & Technology
…to Manuscript Studies
Manuscript Studies
 20th Century and prior
Manuscript Studies
    21st Century
Obscured Information
Illuminated Manuscripts
Digital Manuscript Challenges
“…an ultimate challenge to creators and users of digital
tools wishing to produce useful and reliable digital counter-
parts to these medieval sources of knowledge and
testimonies of intellectual creativity.”

   • Complex, Changing Technical Climate
   • Range of Digital Products & Formats
   • Need for Integrity of Entire Data Set
   • Demand for Continual & Faster Access
   • User Repurposing of Content
   • Restrictions on Access and Use
Simplicity of Data

1. Access to data
 • By People
 • By Machines
2. Licensing
 • Global Storage &
   Access
Walters Online Manuscripts
The Digital Walters
  http://www.thedigitalwalters.org/
Islamic Manuscripts of the Walters
   Art Museum: A Digital Resource
                    (2008 to 2011)
Parchment to Pixel:
Creating a Digital Resource of
        Medieval Manuscripts
                 (2010 to 2012)
The Digital Walters

       Over 10 Terabytes of Data ... and growing!

                           Islamic     Parchment to Pixel   Total
No. of Manuscripts         172         107                  279
No. of TEI Descriptions    170         37                   207
Distinct Images            46,857      34,084               80,941
Image Files                187,266     134,698              321,964
Data Size                  5.99 TB     4.09 TB              10.08 TB
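The totals in the table above can be checked mechanically. The following is a minimal sketch that assumes nothing beyond the figures on the slide; the name `inconsistent_rows` is hypothetical and chosen only for illustration. Decimal arithmetic is used so that the terabyte figures compare exactly.

```python
from decimal import Decimal

# Figures copied from the slide: (Islamic, Parchment to Pixel, Total).
TABLE = {
    "No. of Manuscripts": ("172", "107", "279"),
    "No. of TEI Descriptions": ("170", "37", "207"),
    "Distinct Images": ("46857", "34084", "80941"),
    "Image Files": ("187266", "134698", "321964"),
    "Data Size (TB)": ("5.99", "4.09", "10.08"),
}


def inconsistent_rows(table):
    """Return the labels of rows whose two project columns do not sum to the total."""
    bad = []
    for label, (islamic, parchment, total) in table.items():
        if Decimal(islamic) + Decimal(parchment) != Decimal(total):
            bad.append(label)
    return bad


if __name__ == "__main__":
    print(inconsistent_rows(TABLE))
```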
Data & Metadata

• Long-term data set viability beyond the
  lifetime of current technologies
  – Adherence to existing broadly accepted
    standards
  – Simple, flat metadata records
• Integration of metadata with images,
  supporting data and scholarly products
Cataloging & Metadata
• Metadata Integrated with Digital Object
  – Adherence to broadly accepted standards
  – Simple, flat metadata records
• Persistent Identifiers
• Accepted Standards
  – Standardized Vocabularies
  – Metadata Schema
  – xml to support conversion to other formats
    (e.g. MARC, MODS, EAD)
• Documentation & Preserve Standards
Data Integrity
•   Image
•   XML Metadata
•   TEI Catalog
•   License
Standardize
•   Cataloging
•   Metadata
•   File Format
•   Imaging and Color
•   Resolution or Fidelity
•   Vocabulary and Geographic Names
       • Foreign Language and English
•   Intellectual Property
•   Storage
•   Quality and Quality Control
•   Others
Preservation & Access

Owner of Archimedes Palimpsest:
• Preserve data in “flat files”
   – Do not tailor data for Web interfaces
• Host data on “spinning disks”
   – Did not want digital product to end up on media that
     could become obsolete, with limited access
• Make broadly available on Internet
   – Do not place restrictions on use
Data Layout


[Diagram. Labels: Access, ReadMe, Data, Walters Manuscripts, Technical ReadMe,
 Supplemental, Access, Other Books]
Digital Walters File Structure
Cataloging Information
• Manuscript level: all information that applies to the
  manuscript as a whole, including an abstract and the physical
  dimensions and features of the manuscript, such as size,
  extent, collation, and binding.
• Manuscript item level: all information that applies to
  the intellectual divisions of the book, including the titles
  of works, rubrics, incipits, colophons, and layout information
  about the written surface.
• Manuscript piece level: all information for the items
  imaged (i.e., binding pieces, flyleaves, and folios),
  including item name, folio number, and, for illuminated
  pieces, detailed descriptions of the art work.
Dublin Core Metadata Initiative
         Element Set
Manuscript DCMI Elements

• Identifier: the shelf mark for manuscripts (e.g., W.582), and the image
    serial number for images (e.g., W582_000001)
• Creator: always the Walters Art Museum
• Contributor: one entry for each project participant responsible for the
    creation of the manuscript’s data set
•   Date: the date of web page or image creation
•   Title: the title of the manuscript (e.g., “Walters Ms. W.579, Prayer”)
•   Description: a description of the manuscript or image
•   Source: source of the object used to create the image or image collection
•   Type: Image for individual images; Collection for all images of a manuscript
•   Format: image/tiff for images, text/html for a manuscript web page
•   Subject: keywords describing the manuscript or imaged folio
•   Rights: license and usage terms
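As a rough illustration of the element usage listed above, the sketch below builds Dublin Core style records as plain Python dictionaries. It is not the project's actual tooling: the function names are hypothetical, and the six-digit zero padding of the serial number is inferred only from the single example `W582_000001`.

```python
def image_serial(shelf_mark: str, sequence: int) -> str:
    """Derive an image serial number from a shelf mark, e.g. W.582 -> W582_000001.

    Assumption: the serial is the shelf mark without its dot, an underscore,
    and a six-digit zero-padded counter (inferred from the slide's example).
    """
    return f"{shelf_mark.replace('.', '')}_{sequence:06d}"


def image_record(shelf_mark, sequence, title, subjects, rights):
    """Record for a single image: Type is Image, Format is image/tiff."""
    return {
        "identifier": image_serial(shelf_mark, sequence),
        "creator": "Walters Art Museum",  # always the Walters, per the slide
        "title": title,
        "type": "Image",
        "format": "image/tiff",
        "subject": list(subjects),
        "rights": rights,
    }


def collection_record(shelf_mark, title, subjects, rights):
    """Record for a manuscript web page: Type is Collection, Format is text/html."""
    return {
        "identifier": shelf_mark,  # the shelf mark itself identifies the manuscript
        "creator": "Walters Art Museum",
        "title": title,
        "type": "Collection",
        "format": "text/html",
        "subject": list(subjects),
        "rights": rights,
    }


if __name__ == "__main__":
    print(image_record("W.582", 1, "Walters Ms. W.582", ["folio"], "license terms"))
```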
License and use: UPDATED! 6 February 2013
All Walters manuscript images and descriptions provided here are licensed for use under the
Creative Commons Attribution-Share Alike 3.0 Unported License and the
GNU Free Documentation License.
You are free to download and use the images and descriptions on this website under the licenses
named above. You do not need to apply to the Walters prior to using the images. We ask only that
you cite the source of the images as the Walters Art Museum.
Additionally, we request that a copy of any work created using these materials be sent to the
Curator of Manuscripts and Rare Books at the Walters Art Museum, 600 N. Charles Street,
Baltimore, MD 21201, mss-curator@thewalters.org.
Note these terms mark a change from our previous license, which placed a noncommercial
restriction on the use of these materials. The noncommercial restriction no longer applies, and this
license supersedes the previously advertised license, and replaces that found in many of the
archival TIFF image headers.
This change follows the Walters Art Museum’s licensing policy. More information on the Walters’
intellectual property policy can be found on the Walters website: http://art.thewalters.org/license/.
Metadata xml Information
•   /manuscript: top-level container of metadata for a manuscript’s images
•   /manuscript/image_object: description of the manuscript, primarily Dublin Core
    metadata, with the number of images captured in the imageCount element
•   /manuscript/images: container for the manuscript’s image data
•   /manuscript/images/image: information about a single capture and its
    derivatives, including:
     –   /manuscript/images/image/index: the order of the image in the set, beginning with 0
     –   /manuscript/images/image/image_subject: the folio number or name of the piece imaged
•   /manuscript/images/image/capture: detailed information about the image’s
    capture extracted from the imaging software database
•   /manuscript/images/image/masterDerivation: description of how the archival
    TIFF image was generated from the camera raw file, including cropping and
    color correction information
•   /manuscript/images/image/jhoveData: XML output of the JHOVE utility run on
    the archival TIFF file
•   /manuscript/images/image/derivative: three elements containing cropping and
    scaling information needed to generate the 300 PPI, SAP, and thumbnail files
    from the archival TIFF
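The element paths above can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a small hand-written sample. Only the element names come from the slide; the sample values, the placement of `imageCount` directly under `image_object`, and the function name `list_images` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real Digital Walters files contain far more detail.
SAMPLE = """\
<manuscript>
  <image_object>
    <imageCount>2</imageCount>
  </image_object>
  <images>
    <image>
      <index>0</index>
      <image_subject>Upper board outside</image_subject>
    </image>
    <image>
      <index>1</index>
      <image_subject>fol. 1a</image_subject>
    </image>
  </images>
</manuscript>
"""


def list_images(xml_text):
    """Return (index, image_subject) pairs and check them against imageCount."""
    root = ET.fromstring(xml_text)  # root element is <manuscript>
    declared = int(root.findtext("image_object/imageCount"))
    images = [
        (int(img.findtext("index")), img.findtext("image_subject"))
        for img in root.findall("images/image")
    ]
    if len(images) != declared:
        raise ValueError(f"imageCount says {declared}, found {len(images)} images")
    # The slide states that index begins with 0, so expect 0, 1, 2, ...
    if [index for index, _ in images] != list(range(declared)):
        raise ValueError("image index values are not a 0-based sequence")
    return images


if __name__ == "__main__":
    for index, subject in list_images(SAMPLE):
        print(index, subject)
```

A check of this kind is one way to catch a metadata file whose declared image count has drifted from the images it actually lists.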
xml Model
[Diagram, shown here as a tree]
/manuscript
  /image_object
  /manuscript/images
    /image
      /capture
    /image
      /capture
    /image
      /capture
    /image
      /capture
Preserve Standards
Standard Workflows
          for Data Management
• Transfer & archive digital data for research
  and analysis by the curatorial, scholarly,
  preservation and imaging communities
• Clear access procedures
  − Ensuring data integrity for digital storage
    repositories
  − Preventing introduction of mislabeled and
    incorrect metadata
Quality Control
• Data Quality
  – Automate data handling to avoid error
  – Audit trail for manual data manipulation
• Quality Management
  – Implement processes for quality review
  – Verification and Validation
• Documentation
  – Define metrics &
    quality goals
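One common way to automate part of the verification step is a checksum manifest. The slides do not say which technique the project used, so the sketch below is an assumption: it records a SHA-256 digest for every file under a directory and later reports files that are missing or changed. All function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large TIFFs are never loaded into memory whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest has changed."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

Storing the manifest alongside the data set gives later audits a fixed reference point, which fits the audit-trail goal listed above.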
Data Management System
• Internal Digital Asset Management System
  – Internal Server
     • Image Files
     • Catalog Data
• Access Infrastructure
• Security
• Backup
  – Internet Systems
    Consortium
IDR Access Model

                    Johns Hopkins Metadata Application

[Diagram. Labels: Metadata (METS); Agent; Event; Request Event;
 Preservation Metadata: Implementation Strategies (PREMIS);
 Digital Representation, e.g. TIFF Image;
 Dublin Core Metadata Initiative (DCMI); TEI]
Preservation of the Data
Preservation Heresy:
  The Digital information is closer to the original
  than the Artifact itself

“I don’t use the parchment. The parchment is gone! As far as the
scholars are concerned, there is no parchment. You only work from
digital images on the laptop – that’s the only thing that matters for the
reading.” – Dr. Reviel Netz, 14 Jan WYPR
What Will Happen to the Data?
    “There’s a big technical issue that has me worried.
    The information on the Net is not all simple text. It’s
    structured, whether it’s Microsoft Word documents or
    PDFs. That means the information is only really
    accessible if you understand how to interpret the bits.
    What happens when files are there and we don’t
    know how to interpret them anymore?
    “If you have a CD but the form isn’t known anymore. I
    have 5 1/4-in. diskettes, but nothing to read them.
    Even 3 1/2-in. diskette readers are becoming hard to
    come by. The physical source media change.
    We may lose the ability to read them.”
    Vint Cerf,
    Google Internet Evangelist, recipient of US Presidential Medal of
    Freedom, and co-designer of the basic architecture of the Internet.
    July 30, 2007 (Computerworld)
Digital Preservation

Impermanence of Digitized Data
• Dynamic technology, media and
  formats
     • Rapid obsolescence
     • Regular reformatting required
• Ensure utility of data
• Broad distribution to service providers
• Standardized formats & encoding
License
All artworks in the photographs are in public domain due to age. The photographs of two-
dimensional objects are also in the public domain. Photographs of three-dimensional objects and
all descriptions have been released under the Creative Commons Attribution-Share Alike 3.0
Unported License and the GNU Free Documentation License.
You are free to download and use the images and descriptions on this website under the licenses
named above, but if you desire digital images at a higher resolution, for scholarly or commercial
publication, please contact our photo services department.
Trusted Digital Repository

• Compliance with the Reference Model for an
  Open Archival Information System (OAIS)
• Administrative responsibility
• Organizational viability
• Financial sustainability
• Technological and procedural suitability
• System security
• Procedural accountability
Future Opportunities




      Michael B. Toth
      R. B. Toth Associates
          rbtoth.com

Pen to Pixel: Bringing Appropriate Technologies to Digital Manuscript Philology

  • 1. Pen to Pixel: Bringing Appropriate Technologies to Digital Manuscript Philology Michael B. Toth R. B. Toth Associates rbtoth.com http://www.thedigitalwalters.org/ On behalf of the Walters Art Museum Digitization Team, especially: Lynley Herbert, Ariel Tabritha, Diane Bockrath, Kimber Wiegand, Doug Emery Supported by the US National Endowment for the Humanities
  • 2. Walters Art Museum W.562, 2b Koran 9th century AH / 15th CE Walters Art Museum, Baltimore, Maryland Digital Imaging System
  • 3. St. Catherine’s Monastery, Sinai Spectral Imaging System
  • 4. US Library of Congress Spectral Imaging System
  • 6. Applied Science & Technology
  • 8. Manuscript Studies 20th Century and prior
  • 9. Manuscript Studies 21st Century
  • 12. Digital Manuscript Challenges “…an ultimate challenge to creators and users of digital tools wishing to produce useful and reliable digital counter- parts to these medieval sources of knowledge and testimonies of intellectual creativity.” • Complex, Changing Technical Climate • Range of Digital Products & Formats • Need for Integrity of Entire Data Set • Demand for Continual & Faster Access • User Repurposing of Content • Restrictions on Access and Use
  • 13. Simplicity of Data 1. Access to data • By People • By Machines 2. Licensing • Global Storage & Access
  • 15. The Digital Walters http://www.thedigitalwalters.org/
  • 16. Islamic Manuscripts of the Walters Art Museum: A Digital Resource (2008 to 2011)
  • 17.
  • 18. Parchment to Pixel: Creating a Digital Resource of Medieval Manuscripts (2010 to 2012)
  • 19.
  • 20. The Digital Walters Over 10 Terabytes of Data ng! wiing! d grro w nd g o .. .. .. a n a Islamic Parchment Total to Pixel No. of 172 107 279 Manuscripts No. of TEI 170 37 207 Descriptions Distinct Images 46,857 34,084 80,941 Image Files 187,266 134,698 321,964 Data Size 5.99 TB 4.09 TB 10.08 TB
  • 21. Data & Metadata • Long-term data set viability beyond the lifetime of current technologies – Adherence to existing broadly accepted standards – Simple, flat metadata records • Integration of metadata with images, supporting data and scholarly products
  • 22. Cataloging & Metadata • Metadata Integrated with Digital Object – Adherence to broadly accepted standards – Simple, flat metadata records • Persistent Identifiers • Accepted Standards – Standardized Vocabularies – Metadata Schema – xml to support conversion to other formats (e.g. MARC, MODS, EAD) • Documentation & Preserve Standards
  • 23. Data Integrity • Image • XML Metadata • TEI Catalog • License
  • 24. Standardiz e • Cataloging • Metadata • File Format • Imaging and Color • Resolution or Fidelity • Vocabulary and Geographic Names • Foreign Language and English • Intellectual Property • Storage • Quality and Quality Control • Others
  • 25. Preservation & Access Owner of Archimedes Palimpsest: • Preserve data in “flat files” – Do not tailor data for Web interfaces • Host data on “spinning disks” – Did not want digital product to end up on media that could become obsolete, with limited access • Make broadly available on Internet – Do not place restrictions on use
  • 26. Data Layout Access ReadMe Data Walters Manuscripts Technical ReadMe Supplemental Access Other Books
  • 27. Digital Walters File Structure
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33. Cataloging Information • Manuscript level: all information that applies the manuscript as a whole, including an abstract, physical dimensions and features of the manuscript, like size, extent, collation, and binding. • Manuscript item level: all information that applies to the intellectual divisions of the book, including the titles of works, rubrics, incipits, colophons, layout information about the written surface. • Manuscript piece level: all information for the items imaged (i.e., binding pieces, flyleaves, and folios), including item name, folio number, and, for illuminated pieces, detailed descriptions of the art work.
  • 34. Dublin Core Metadata Initiative Element Set
  • 35. Manuscript DCMI Elements • Identifier: the shelf mark for manuscripts (e.g., W.582), and the image serial number for images (e.g., W582_000001) • Creator: always the Walters Art Museum • Contributor: one entry for each project participant responsible for the creation of the manuscript’s data set • Date: the date of web page or image creation • Title: the title of the manuscript (e.g, “Walters Ms. W.579, Prayer”) • Description: a description of the manuscript or image • Source: source of the object used to create the image or image collection • Type: Image for individual images; Collection for all images of a manuscript • Format: image/tiff for images, text/html for a manuscript web page • Subject: keywords describing the manuscript or imaged folio • Rights: license and usage terms
  • 36. License and use: UPDATED! 6 February 2013 All Walters manuscript images and descriptions provided here are licensed for use under the Creative Commons Attribution-Share Alike 3.0 Unported License and the GNU Free Documentation License. You are free to download and use the images and descriptions on this website under the licenses named above. You do not need to apply to the Walters prior to using the images. We ask only that you cite the source of the images as the Walters Art Museum. Additionally, we request that a copy of any work created using these materials be sent to the Curator of Manuscripts and Rare Books at the Walters Art Museum, 600 N. Charles Street, Baltimore, MD 21201, mss-curator@thewalters.org. Note these terms mark a change from our previous license, which placed a noncommercial restriction on the use of these materials. The noncommercial restriction no longer applies, and this license supersedes the previously advertised license, and replaces that found in many of the archival TIFF image headers. This change follows the Walters Art Museum’s licensing policy. More information on the Walters’ intellectual property policy can be found on the Walters website: http://art.thewalters.org/license/.
  • 37. Metadata xml Information • /manuscript: top-level container of metadata for a manuscript’s images • /manuscript/image_object: description of the manuscript, primarily Dublin Core metadata, with the number of images captured in the imageCount element • /manuscript/images: container for the manuscript’s image data • /manuscript/images/image: information about a single capture and its derivatives, including: – /manuscript/images/image/index: the order of the image in the set, beginning with 0 – /manuscript/images/image/image_subject: the folio number or name of the piece imaged • /manuscript/images/image/capture: detailed information about the image’s capture extracted from the imaging software database • /manuscript/images/image/masterDerivation: description of how the archival TIFF image was generated from the camera raw file, including cropping and color correction information • /manuscript/images/image/jhoveData: XML output of the JHOVE utility run on the archival TIFF file • /manuscript/images/image/derivative: three elements containing cropping and scaling information needed to generate the 300 PPI, SAP, and thumbnail files from the archival TIFF
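One way to use this structure is a consistency check between the declared image count and the image entries. The sketch below assumes that `imageCount` is a direct child of `image_object` and that `index` and `image_subject` hold plain text; the sample document, folio labels, and the `check_manifest` name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the element paths listed on the slide.
SAMPLE = """\
<manuscript>
  <image_object>
    <imageCount>2</imageCount>
  </image_object>
  <images>
    <image>
      <index>0</index>
      <image_subject>fol. 1a</image_subject>
    </image>
    <image>
      <index>1</index>
      <image_subject>fol. 1b</image_subject>
    </image>
  </images>
</manuscript>
"""


def check_manifest(xml_text):
    """Return a list of problems found in a manuscript metadata document."""
    root = ET.fromstring(xml_text)  # root element is <manuscript>
    declared = int(root.findtext("image_object/imageCount"))
    images = root.findall("images/image")
    indexes = [int(image.findtext("index")) for image in images]

    problems = []
    if declared != len(images):
        problems.append(
            f"imageCount is {declared} but {len(images)} image elements found"
        )
    # The slide states that index values begin with 0.
    if indexes != list(range(len(images))):
        problems.append("index values are not consecutive from 0")
    return problems


if __name__ == "__main__":
    print(check_manifest(SAMPLE) or "metadata is consistent")
```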
  • 38. xml Model [Diagram: /manuscript contains /image_object and /manuscript/images; /manuscript/images contains multiple /image elements, each with its own /capture element]
  • 40. Standard Workflows for Data Management • Transfer & archive digital data for research and analysis by the curatorial, scholarly, preservation and imaging communities • Clear access procedures − Ensuring data integrity for digital storage repositories, − Preventing introduction of mislabeled and incorrect metadata
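Data integrity for a storage repository is commonly checked by comparing file digests against a recorded manifest. The sketch below is an assumption about how such a check could look, not the project's documented procedure: the slides do not name a hash algorithm, so SHA-256 is chosen here, and the manifest layout (relative path mapped to hex digest) and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large TIFF files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose digest differs.

    `manifest` maps a relative file path to its expected SHA-256 hex digest
    (a hypothetical layout chosen for this sketch).
    """
    failures = []
    for relative_path, expected in manifest.items():
        target = Path(root) / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return failures
```

Running such a check after every transfer, and logging its result, also gives the automated handling and audit trail that a quality-control process calls for.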
  • 41. Quality Control • Data Quality – Automate data handling to avoid error – Audit trail for manual data manipulation • Quality Management – Implement processes for quality review – Verification and Validation • Documentation – Define metrics & quality goals
  • 42. Data Management System • Internal Digital Asset Management System – Internal Server • Image Files • Catalog Data • Access Infrastructure • Security • Backup – Internet Systems Consortium
  • 43. IDR Access Model [Diagram: Johns Hopkins Metadata Application linking Agent, Request, and Event entities with Metadata (METS); Preservation Metadata: Implementation Strategies (PREMIS); Digital Representation (e.g. TIFF Image); Dublin Core Metadata Initiative (DCMI); TEI]
  • 44. Preservation of the Data Preservation Heresy: The Digital information is closer to the original than the Artifact itself “I don’t use the parchment. The parchment is gone! As far as the scholars are concerned, there is no parchment. You only work from digital images on the laptop – that’s the only thing that matters for the reading.” – Dr. Reviel Netz, 14 Jan WYPR
  • 45. What Will Happen to the Data? “There’s a big technical issue that has me worried. The information on the Net is not all simple text. It’s structured, whether it’s Microsoft Word documents or PDFs. That means the information is only really accessible if you understand how to interpret the bits. What happens when files are there and we don’t know how to interpret them anymore? “If you have a CD but the form isn’t known anymore. I have 5 1/4-in. diskettes, but nothing to read them. Even 3 1/2-in. diskette readers are becoming hard to come by. The physical source media change. We may lose the ability to read them.” Vint Cerf, Google Internet Evangelist, recipient of US Presidential Medal of Freedom, and co-designer of the basic architecture of the Internet. July 30, 2007 (Computerworld)
  • 46. Digital Preservation Impermanence of Digitized Data • Dynamic technology, media and formats • Rapid obsolescence • Regular reformatting required • Ensure utility of data • Broad distribution to service providers • Standardized formats & encoding
  • 47. License All artworks in the photographs are in public domain due to age. The photographs of two-dimensional objects are also in the public domain. Photographs of three-dimensional objects and all descriptions have been released under the Creative Commons Attribution-Share Alike 3.0 Unported License and the GNU Free Documentation License. You are free to download and use the images and descriptions on this website under the licenses named above, but if you desire digital images at a higher resolution, for scholarly or commercial publication, please contact our photo services department.
  • 48. Trusted Digital Repository • Compliance with the Reference Model for an Open Archival Information System (OAIS) • Administrative responsibility • Organizational viability • Financial sustainability • Technological and procedural suitability • System security • Procedural accountability
  • 49. Future Opportunities Michael B. Toth R. B. Toth Associates rbtoth.com

Editor's Notes

  1. 1. Title Slide: Meeting the Challenge: Digitizing Islamic Manuscripts at the Walters
  2. Data: all core data (images, transcriptions, metadata, checksum). Documents: internal and external documentation. ResearchContrib: important data that is not integrated with the core data set, such as conservation information and special or experimental images. Supplemental: source files for other core data files; folio-by-folio transcriptions are derived from work-length transcriptions (Floating Bodies, Method).