ROBOCATALOGING
Accelerated workflows using OCR and automation




Joshua Polansky
University of Washington
College of Built Environments
Visual Resources Collection
Cataloging Case Studies, April 21, 2012
University of Washington College of Built Environments
Visual Resources Collection
         Serves the departments of Architecture, Construction Management,
         Landscape Architecture and Urban Design & Planning


Analog collection:
• 130,000 35mm slides accessioned and cataloged since 1950s
• Typewritten records; no digital database or online component until 2002
Visual Resources Collection
Digital components:

• MS Access database catalog
• MDID2 for faculty / students
The big question:

Automated processes exist for batch
digitizing analog photos.




      Is it possible to batch digitize old cataloging data, too?

                              Good cataloging information here,
                              researched and typed years ago.




                              More good data, including source
                              and a unique accession number.
Paper records to the rescue

• Binders and binders of accession records
• Pristine label photocopies
A closer look at the slide label

• Architect
• Building name
• Location / Year
• View
• Source
• Accession number
• Collection ID that appears on every label in this form
• Photocopied label edge that will interfere with OCR later
The big challenge:

•   Digitize these typewritten pages
•   Sort slide label text into distinct columns in Excel
•   Identify each record with its accession number
•   Do it all with common or affordable tools
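
The sorting step in the list above can be sketched in a few lines of Python. This is only an illustration, not the workflow used in the project, which relied on Acrobat, Excel and Automator. It assumes that each OCR'd label arrives as a block of text lines separated by a blank line, with fields in the order shown on the slide label and the accession number last. The column names, the sample label text and the accession number format are all hypothetical.

```python
import csv
import io
import re

# Hypothetical column names, following the field order on the slide label.
FIELDS = ["architect", "building", "location_year", "view", "source", "accession_number"]


def parse_labels(ocr_text):
    """Split OCR output into one dict per label.

    Assumes labels are separated by blank lines and that each label
    lists its fields one per line in the order given by FIELDS.
    """
    records = []
    for block in re.split(r"\n\s*\n", ocr_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # Pad short labels with empty strings so every record has all columns.
        lines += [""] * (len(FIELDS) - len(lines))
        records.append(dict(zip(FIELDS, lines)))
    return records


def to_csv(records):
    """Serialize the records as CSV text with a FIELDS header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample text standing in for real OCR output.
SAMPLE = """Example Architect
Example House
Seattle, 1950
Exterior from street
Example Source
A-00001

Another Architect
Another Building
Tacoma, 1962
Interior
"""

if __name__ == "__main__":
    print(to_csv(parse_labels(SAMPLE)))
```

Labels with missing lines come out with empty trailing columns, which makes them easy to spot and correct by hand in a spreadsheet.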
Photo: Alvaro Farfán via Flickr. 3392225359
Hardware




           Apple iMac
             •   2010 model
             •   OS 10.6

           Any recent Mac will do (OS 10.4 or higher)




Hardware




           Epson Perfection V500 scanner
            •   With optional Automatic Document
                Feeder for stacks of 30 sheets at a time
            •   Standard transparency unit makes it
                useful for other scanning projects
            •   Retails for less than $300 with ADF




Photo: Zak Moreira via Flickr. 3425393424
Software




Adobe Photoshop CS4
• Resize and realign scanned page into a
  single-column tif with Actions




Adobe Acrobat Pro
• Create a pdf of each tif
• Analyze pdf with optical character recognition
  (OCR) and make pdf text selectable
Microsoft Excel 2008
• Receive text from Acrobat in columns
• After text manipulation and sorting, output
  in a cross-platform format like csv




Apple Automator
Automator Virtual Input
• Execute workflows to control multiple
  applications. Launch, copy, paste,
  manipulate, save, repeat.
• Create Folder Actions for Finder automation
• Virtual Input: Extend the functionality of
  Automator for even more control over
  apps, mouse, keyboard
Automator

•   Comes standard with
    Mac OS X 10.4+
•   Allows scripting and
    workflow creation via
    GUI
•   Can perform operations
    within an application or
    across multiple
    applications
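
For readers without Automator, the idea behind a Folder Action (run a step whenever a new file lands in a watched folder) can be approximated in Python. The sketch below is an analogy under stated assumptions, not a description of how Automator works internally: the function name `process_new_files`, the polling approach and the `.tif` filter are illustrative choices.

```python
import tempfile
from pathlib import Path


def process_new_files(watch_dir, seen, handler, suffix=".tif"):
    """Call handler once for each matching file in watch_dir not seen before.

    Returns the list of newly handled paths; `seen` is updated in place.
    """
    new_paths = []
    for path in sorted(Path(watch_dir).glob("*" + suffix)):
        if path.name not in seen:
            handler(path)
            seen.add(path.name)
            new_paths.append(path)
    return new_paths


if __name__ == "__main__":
    # Demonstration with a temporary folder and a handler that only records names.
    handled = []
    with tempfile.TemporaryDirectory() as folder:
        seen = set()
        (Path(folder) / "page_001.tif").touch()
        process_new_files(folder, seen, lambda p: handled.append(p.name))
        (Path(folder) / "page_002.tif").touch()
        (Path(folder) / "notes.txt").touch()
        process_new_files(folder, seen, lambda p: handled.append(p.name))
    print(handled)
```

In practice the function would be called on a timer, with a handler that passes each new scan to the next stage of the pipeline.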
Document scanning: Automator, Folder Actions, Photoshop
[video here in original presentation]
Text processing: Automator + Automator Virtual Input, Folder Actions, Acrobat, Excel
[video here in original presentation]
Processed output in Excel
Sometimes it looks good...




Sometimes it doesn’t.
Final result after text sorting and cleanup
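
Part of this kind of cleanup can be scripted. The sketch below is a hedged example rather than the method used in the project: it assumes, purely for illustration, that an accession number is a short run of digits, corrects a few letter-for-digit substitutions that OCR commonly makes, and sets aside anything it cannot fix for manual review. The pattern, the substitution table and the function names are hypothetical.

```python
import re

# Letters that OCR commonly confuses with digits in typewritten text.
DIGIT_LOOKALIKES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})

# Hypothetical format: an accession number is 1 to 6 digits.
ACCESSION_PATTERN = re.compile(r"\d{1,6}")


def clean_accession(raw):
    """Return a normalized accession number, or None if it needs manual review."""
    candidate = raw.strip().replace(" ", "").translate(DIGIT_LOOKALIKES)
    return candidate if ACCESSION_PATTERN.fullmatch(candidate) else None


def split_for_review(raw_values):
    """Separate values that clean up automatically from ones a person must check."""
    cleaned, needs_review = [], []
    for raw in raw_values:
        fixed = clean_accession(raw)
        if fixed is None:
            needs_review.append(raw)
        else:
            cleaned.append(fixed)
    return cleaned, needs_review
```

Applying the substitutions only to the accession number column avoids corrupting names and places, where the letters O, l and S are legitimate.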
Goal
• Produce nearly perfect metadata, clean enough to import into existing database
Actual outcome
• Produced pretty good metadata
• Spent lots of time on data cleanup to get there
Goal
• Use tools on hand; any new tools should be cheap or useful for other projects
Actual outcome
• Used standard software, plus one new application ($25)
• iMac is a student workstation
• Epson scanner is in use for print and film scanning plus pdf creation
Goal
• Have 75,000 new records ready to pair with images and publish to MDID
Actual outcome
• Got 75,000 records!
• Created a searchable shelf list and archival finding aid
• With further data cleanup, the original goal of MDID use can be achieved
Photo: JF Sebastian via Flickr. 412874324
• Every Mac comes with Automator
  and it is easy to learn
• You probably have OCR tools on
  your computer right now
• Experimenting can produce great
  results








Photo credits                                     Thank you
• Software icons and screenshots by Adobe, Apple, Rainer Metzger
  Microsoft and Singed Labcoat                     University of Washington
• Kraftwerk images by Flickr users Zak Moreira,
  Alvaro Farfán and JF Sebastian
• Other photo and video by UW CBE VRC


Mais conteúdo relacionado

Semelhante a VRA 2012, Cataloging Case Studies, ROBOCATALOGING

Cool Tools for Technical Writers
Cool Tools for Technical WritersCool Tools for Technical Writers
Cool Tools for Technical WritersJeff Haas
 
Get your Project back in Shape!
Get your Project back in Shape!Get your Project back in Shape!
Get your Project back in Shape!Joachim Tuchel
 
Image Processing and Computer Vision in iPhone and iPad
Image Processing and Computer Vision in iPhone and iPadImage Processing and Computer Vision in iPhone and iPad
Image Processing and Computer Vision in iPhone and iPadOge Marques
 
Image Processing and Computer Vision in iOS
Image Processing and Computer Vision in iOSImage Processing and Computer Vision in iOS
Image Processing and Computer Vision in iOSOge Marques
 
Résumé - Mahlon E. Lo Vuolo
Résumé -  Mahlon E. Lo VuoloRésumé -  Mahlon E. Lo Vuolo
Résumé - Mahlon E. Lo VuoloEdLoVuolo
 
PLAT-20 Building Alfresco Prototypes in a Few Hours
PLAT-20 Building Alfresco Prototypes in a Few HoursPLAT-20 Building Alfresco Prototypes in a Few Hours
PLAT-20 Building Alfresco Prototypes in a Few HoursAlfresco Software
 
D7 10 modules-in-20mins v2 copy
D7 10 modules-in-20mins v2 copyD7 10 modules-in-20mins v2 copy
D7 10 modules-in-20mins v2 copyAcquia
 
Developing Windows Phone Apps with the Nokia Imaging SDK
Developing Windows Phone Apps with the Nokia Imaging SDKDeveloping Windows Phone Apps with the Nokia Imaging SDK
Developing Windows Phone Apps with the Nokia Imaging SDKNick Landry
 
Development Processes and Tooling
Development Processes and ToolingDevelopment Processes and Tooling
Development Processes and ToolingBora Bilgin
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkDatabricks
 
DevOps for AI Apps
DevOps for AI AppsDevOps for AI Apps
DevOps for AI AppsRichin Jain
 
Ephesoft SnapDoc SDK 4.0
Ephesoft SnapDoc SDK 4.0Ephesoft SnapDoc SDK 4.0
Ephesoft SnapDoc SDK 4.0Stephen Boals
 
Distributing Information Online
Distributing Information OnlineDistributing Information Online
Distributing Information OnlineLethbridge College
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with EaseBuild, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with EaseDatabricks
 
SE2016 Java Alex Theedom "Java EE revisits design patterns"
SE2016 Java Alex Theedom "Java EE revisits design patterns"SE2016 Java Alex Theedom "Java EE revisits design patterns"
SE2016 Java Alex Theedom "Java EE revisits design patterns"Inhacking
 
CrunchBuddy: Server-based Video Transcode for AMS with Adobe AIR!
CrunchBuddy: Server-based Video Transcode for AMS with Adobe AIR!CrunchBuddy: Server-based Video Transcode for AMS with Adobe AIR!
CrunchBuddy: Server-based Video Transcode for AMS with Adobe AIR!Joseph Labrecque
 
Chapter 7
Chapter 7 Chapter 7
Chapter 7 carnillr
 
Appcelerator Titanium Intro
Appcelerator Titanium IntroAppcelerator Titanium Intro
Appcelerator Titanium IntroNicholas Jansma
 

Semelhante a VRA 2012, Cataloging Case Studies, ROBOCATALOGING (20)

Cool Tools for Technical Writers
Cool Tools for Technical WritersCool Tools for Technical Writers
Cool Tools for Technical Writers
 
Get your Project back in Shape!
Get your Project back in Shape!Get your Project back in Shape!
Get your Project back in Shape!
 
Image Processing and Computer Vision in iPhone and iPad
Image Processing and Computer Vision in iPhone and iPadImage Processing and Computer Vision in iPhone and iPad
Image Processing and Computer Vision in iPhone and iPad
 
Image Processing and Computer Vision in iOS
Image Processing and Computer Vision in iOSImage Processing and Computer Vision in iOS
Image Processing and Computer Vision in iOS
 
Résumé - Mahlon E. Lo Vuolo
Résumé -  Mahlon E. Lo VuoloRésumé -  Mahlon E. Lo Vuolo
Résumé - Mahlon E. Lo Vuolo
 
PLAT-20 Building Alfresco Prototypes in a Few Hours
PLAT-20 Building Alfresco Prototypes in a Few HoursPLAT-20 Building Alfresco Prototypes in a Few Hours
PLAT-20 Building Alfresco Prototypes in a Few Hours
 
Online File Formats.pptx
Online File Formats.pptxOnline File Formats.pptx
Online File Formats.pptx
 
D7 10 modules-in-20mins v2 copy
D7 10 modules-in-20mins v2 copyD7 10 modules-in-20mins v2 copy
D7 10 modules-in-20mins v2 copy
 
Developing Windows Phone Apps with the Nokia Imaging SDK
Developing Windows Phone Apps with the Nokia Imaging SDKDeveloping Windows Phone Apps with the Nokia Imaging SDK
Developing Windows Phone Apps with the Nokia Imaging SDK
 
Development Processes and Tooling
Development Processes and ToolingDevelopment Processes and Tooling
Development Processes and Tooling
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache SparkBuild, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
Build, Scale, and Deploy Deep Learning Pipelines with Ease Using Apache Spark
 
DevOps for AI Apps
DevOps for AI AppsDevOps for AI Apps
DevOps for AI Apps
 
Ephesoft SnapDoc SDK 4.0
Ephesoft SnapDoc SDK 4.0Ephesoft SnapDoc SDK 4.0
Ephesoft SnapDoc SDK 4.0
 
Distributing Information Online
Distributing Information OnlineDistributing Information Online
Distributing Information Online
 
Build, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with EaseBuild, Scale, and Deploy Deep Learning Pipelines with Ease
Build, Scale, and Deploy Deep Learning Pipelines with Ease
 
SE2016 Java Alex Theedom "Java EE revisits design patterns"
SE2016 Java Alex Theedom "Java EE revisits design patterns"SE2016 Java Alex Theedom "Java EE revisits design patterns"
SE2016 Java Alex Theedom "Java EE revisits design patterns"
 
Alex Theedom Java ee revisits design patterns
Alex Theedom	Java ee revisits design patternsAlex Theedom	Java ee revisits design patterns
Alex Theedom Java ee revisits design patterns
 
CrunchBuddy: Server-based Video Transcode for AMS with Adobe AIR!
CrunchBuddy: Server-based Video Transcode for AMS with Adobe AIR!CrunchBuddy: Server-based Video Transcode for AMS with Adobe AIR!
CrunchBuddy: Server-based Video Transcode for AMS with Adobe AIR!
 
Chapter 7
Chapter 7 Chapter 7
Chapter 7
 
Appcelerator Titanium Intro
Appcelerator Titanium IntroAppcelerator Titanium Intro
Appcelerator Titanium Intro
 

Mais de Visual Resources Association

Comparative Study and Expansion of Metadata Standards for Historic Fashion Co...
Comparative Study and Expansion of Metadata Standards for Historic Fashion Co...Comparative Study and Expansion of Metadata Standards for Historic Fashion Co...
Comparative Study and Expansion of Metadata Standards for Historic Fashion Co...Visual Resources Association
 
The Medieval Kingdom of Sicily Image Database Project: From Concept to Reality
The Medieval Kingdom of Sicily Image Database Project: From Concept to RealityThe Medieval Kingdom of Sicily Image Database Project: From Concept to Reality
The Medieval Kingdom of Sicily Image Database Project: From Concept to RealityVisual Resources Association
 
Interactive Topography with IIIF: Open Access to Photographs from the Ernest ...
Interactive Topography with IIIF: Open Access to Photographs from the Ernest ...Interactive Topography with IIIF: Open Access to Photographs from the Ernest ...
Interactive Topography with IIIF: Open Access to Photographs from the Ernest ...Visual Resources Association
 
Recreating a 19th-Century Spectacle: The 3D Glass Stereo Project
Recreating a 19th-Century Spectacle: The 3D Glass Stereo ProjectRecreating a 19th-Century Spectacle: The 3D Glass Stereo Project
Recreating a 19th-Century Spectacle: The 3D Glass Stereo ProjectVisual Resources Association
 
Cradle of Texas Gay Liberty: An Alternate History of the Alamo City
Cradle of Texas Gay Liberty: An Alternate History of the Alamo CityCradle of Texas Gay Liberty: An Alternate History of the Alamo City
Cradle of Texas Gay Liberty: An Alternate History of the Alamo CityVisual Resources Association
 
Material Order: A Discovery Group, Shared Catalog, and Research Platform for ...
Material Order: A Discovery Group, Shared Catalog, and Research Platform for ...Material Order: A Discovery Group, Shared Catalog, and Research Platform for ...
Material Order: A Discovery Group, Shared Catalog, and Research Platform for ...Visual Resources Association
 
Disinformation and Deepfakes: The Urgent Need for Visual Literacy
Disinformation and Deepfakes: The Urgent Need for Visual LiteracyDisinformation and Deepfakes: The Urgent Need for Visual Literacy
Disinformation and Deepfakes: The Urgent Need for Visual LiteracyVisual Resources Association
 
Pattern and Representation: Critical Cataloging for a New Perspective on Camp...
Pattern and Representation: Critical Cataloging for a New Perspective on Camp...Pattern and Representation: Critical Cataloging for a New Perspective on Camp...
Pattern and Representation: Critical Cataloging for a New Perspective on Camp...Visual Resources Association
 
Stories from the Stop (and Re-Start?): Visual Resources Professionals Face Re...
Stories from the Stop (and Re-Start?): Visual Resources Professionals Face Re...Stories from the Stop (and Re-Start?): Visual Resources Professionals Face Re...
Stories from the Stop (and Re-Start?): Visual Resources Professionals Face Re...Visual Resources Association
 
Supporting Art History Students' Digital Projects at American University
Supporting Art History Students' Digital Projects at American UniversitySupporting Art History Students' Digital Projects at American University
Supporting Art History Students' Digital Projects at American UniversityVisual Resources Association
 
Assessing the use of Qualitative Data Analysis Software (QDAS) by Art Histori...
Assessing the use of Qualitative Data Analysis Software (QDAS) by Art Histori...Assessing the use of Qualitative Data Analysis Software (QDAS) by Art Histori...
Assessing the use of Qualitative Data Analysis Software (QDAS) by Art Histori...Visual Resources Association
 
Describing Art on the Street: The Graffiti Art Community Voice
Describing Art on the Street: The Graffiti Art Community VoiceDescribing Art on the Street: The Graffiti Art Community Voice
Describing Art on the Street: The Graffiti Art Community VoiceVisual Resources Association
 
Photographic Glass Plates and Birthdates: Secrets to Optimizing AI-Generated ...
Photographic Glass Plates and Birthdates: Secrets to Optimizing AI-Generated ...Photographic Glass Plates and Birthdates: Secrets to Optimizing AI-Generated ...
Photographic Glass Plates and Birthdates: Secrets to Optimizing AI-Generated ...Visual Resources Association
 
Accessibility Guidance for Digital Cultural Heritage
Accessibility Guidance for Digital Cultural HeritageAccessibility Guidance for Digital Cultural Heritage
Accessibility Guidance for Digital Cultural HeritageVisual Resources Association
 

Mais de Visual Resources Association (20)

Comparative Study and Expansion of Metadata Standards for Historic Fashion Co...
Comparative Study and Expansion of Metadata Standards for Historic Fashion Co...Comparative Study and Expansion of Metadata Standards for Historic Fashion Co...
Comparative Study and Expansion of Metadata Standards for Historic Fashion Co...
 
Unsettling Collections: Bias in the Visual Canon
Unsettling Collections: Bias in the Visual CanonUnsettling Collections: Bias in the Visual Canon
Unsettling Collections: Bias in the Visual Canon
 
The Medieval Kingdom of Sicily Image Database Project: From Concept to Reality
The Medieval Kingdom of Sicily Image Database Project: From Concept to RealityThe Medieval Kingdom of Sicily Image Database Project: From Concept to Reality
The Medieval Kingdom of Sicily Image Database Project: From Concept to Reality
 
Interactive Topography with IIIF: Open Access to Photographs from the Ernest ...
Interactive Topography with IIIF: Open Access to Photographs from the Ernest ...Interactive Topography with IIIF: Open Access to Photographs from the Ernest ...
Interactive Topography with IIIF: Open Access to Photographs from the Ernest ...
 
Recreating a 19th-Century Spectacle: The 3D Glass Stereo Project
Recreating a 19th-Century Spectacle: The 3D Glass Stereo ProjectRecreating a 19th-Century Spectacle: The 3D Glass Stereo Project
Recreating a 19th-Century Spectacle: The 3D Glass Stereo Project
 
Cradle of Texas Gay Liberty: An Alternate History of the Alamo City
Cradle of Texas Gay Liberty: An Alternate History of the Alamo CityCradle of Texas Gay Liberty: An Alternate History of the Alamo City
Cradle of Texas Gay Liberty: An Alternate History of the Alamo City
 
Material Order: A Discovery Group, Shared Catalog, and Research Platform for ...
Material Order: A Discovery Group, Shared Catalog, and Research Platform for ...Material Order: A Discovery Group, Shared Catalog, and Research Platform for ...
Material Order: A Discovery Group, Shared Catalog, and Research Platform for ...
 
Personal Archiving for Undergraduate Students
Personal Archiving for Undergraduate StudentsPersonal Archiving for Undergraduate Students
Personal Archiving for Undergraduate Students
 
Disinformation and Deepfakes: The Urgent Need for Visual Literacy
Disinformation and Deepfakes: The Urgent Need for Visual LiteracyDisinformation and Deepfakes: The Urgent Need for Visual Literacy
Disinformation and Deepfakes: The Urgent Need for Visual Literacy
 
Jean Charlot: Artist as Archivist
Jean Charlot: Artist as ArchivistJean Charlot: Artist as Archivist
Jean Charlot: Artist as Archivist
 
Pattern and Representation: Critical Cataloging for a New Perspective on Camp...
Pattern and Representation: Critical Cataloging for a New Perspective on Camp...Pattern and Representation: Critical Cataloging for a New Perspective on Camp...
Pattern and Representation: Critical Cataloging for a New Perspective on Camp...
 
Stories from the Stop (and Re-Start?): Visual Resources Professionals Face Re...
Stories from the Stop (and Re-Start?): Visual Resources Professionals Face Re...Stories from the Stop (and Re-Start?): Visual Resources Professionals Face Re...
Stories from the Stop (and Re-Start?): Visual Resources Professionals Face Re...
 
Supporting Art History Students' Digital Projects at American University
Supporting Art History Students' Digital Projects at American UniversitySupporting Art History Students' Digital Projects at American University
Supporting Art History Students' Digital Projects at American University
 
Material Objects and Special Collections
Material Objects and Special CollectionsMaterial Objects and Special Collections
Material Objects and Special Collections
 
Digital Art History
Digital Art HistoryDigital Art History
Digital Art History
 
Assessing the use of Qualitative Data Analysis Software (QDAS) by Art Histori...
Assessing the use of Qualitative Data Analysis Software (QDAS) by Art Histori...Assessing the use of Qualitative Data Analysis Software (QDAS) by Art Histori...
Assessing the use of Qualitative Data Analysis Software (QDAS) by Art Histori...
 
Describing Art on the Street: The Graffiti Art Community Voice
Describing Art on the Street: The Graffiti Art Community VoiceDescribing Art on the Street: The Graffiti Art Community Voice
Describing Art on the Street: The Graffiti Art Community Voice
 
Photographic Glass Plates and Birthdates: Secrets to Optimizing AI-Generated ...
Photographic Glass Plates and Birthdates: Secrets to Optimizing AI-Generated ...Photographic Glass Plates and Birthdates: Secrets to Optimizing AI-Generated ...
Photographic Glass Plates and Birthdates: Secrets to Optimizing AI-Generated ...
 
Crowdsourcing Collection Development
Crowdsourcing Collection DevelopmentCrowdsourcing Collection Development
Crowdsourcing Collection Development
 
Accessibility Guidance for Digital Cultural Heritage
Accessibility Guidance for Digital Cultural HeritageAccessibility Guidance for Digital Cultural Heritage
Accessibility Guidance for Digital Cultural Heritage
 

Último

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Gen AI in Business - Global Trends Report 2024.pdf

VRA 2012, Cataloging Case Studies, ROBOCATALOGING

  • 1. ROBOCATALOGING: Accelerated workflows using OCR and automation. Joshua Polansky, University of Washington, College of Built Environments, Visual Resources Collection. Cataloging Case Studies, April 21, 2012
  • 2. University of Washington College of Built Environments Visual Resources Collection Serves the departments of Architecture, Construction Management, Landscape Architecture and Urban Design & Planning Analog collection: • 130,000 35mm slides accessioned and cataloged since 1950s • Typewritten records; no digital database or online component until 2002
  • 3. Visual Resources Collection Digital components: MS Access database catalog MDID2 for faculty / students
  • 4. The big question: Automated processes exist for batch digitizing analog photos.
  • 5. The big question: Automated processes exist for batch digitizing analog photos. Is it possible to batch digitize old cataloging data, too? Good cataloging information here, researched and typed years ago. More good data, including source and a unique accession number.
  • 6. Paper records to the rescue Binders and binders of accession records Pristine label photocopies
  • 7. A closer look at the slide label. Label fields: Architect; Building name; Location / Year; View; Source. Callouts: Photocopied label edge that will interfere with OCR later; Collection ID that appears on every label in this form; Accession number
  • 8. The big challenge: • Digitize these typewritten pages • Sort slide label text into distinct columns in Excel • Identify each record with its accession number • Do it all with common or affordable tools
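The column-sorting and record-identification steps in the challenge above can be sketched in Python. This is an illustration only, not the Automator and Excel workflow the presentation used. It assumes each label arrives from OCR as one line per field in the order shown on slide 7, with the accession number as the last line and a blank line between labels. The function name `split_labels`, the field names, and the sample text are all hypothetical.

```python
import re

# Field order follows the label layout described on slide 7. Treating the
# accession number as the last line of each label is an assumption.
FIELDS = ["architect", "building", "location_year", "view", "source", "accession"]

# Invented sample text standing in for OCR output; not real catalog data.
SAMPLE = """
Aalto, Alvar
Villa Mairea
Noormarkku, Finland, 1939
Exterior, entrance
Source: Example Book, p. 12
12345

Aalto, Alvar
Villa Mairea
Interior, living room
Source: Example Book, p. 14
12346
"""

def split_labels(ocr_text):
    """Turn OCR text into one dict per label, assuming blank lines separate labels."""
    records = []
    for block in re.split(r"\n\s*\n", ocr_text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # Pad with empty strings so every record has the same columns.
        record = dict(zip(FIELDS, lines + [""] * len(FIELDS)))
        # A label with too few or too many lines cannot be trusted to line up.
        record["needs_review"] = len(lines) != len(FIELDS)
        records.append(record)
    return records

for rec in split_labels(SAMPLE):
    print(rec["accession"], rec["needs_review"])
```

The second sample label is missing a line on purpose, so its fields shift and it is flagged for manual review rather than silently accepted.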
  • 9. Photo: Alvaro Farfán via Flickr. 3392225359
  • 10. Hardware Apple iMac • 2010 model • OS 10.6 Any recent Mac will do (OS 10.4 or higher) Photo: Alvaro Farfán via Flickr. 3392225359
  • 11. Hardware Epson Perfection V500 scanner • With optional Automatic Document Feeder for stacks of 30 sheets at a time • Standard transparency unit makes it useful for other scanning projects • Retails for less than $300 with ADF Photo: Alvaro Farfán via Flickr. 3392225359
  • 12. Photo: Zak Moreira via Flickr. 3425393424
  • 13. Software Photo: Zak Moreira via Flickr. 3425393424
  • 14. Adobe Photoshop CS4 • Resize and realign scanned page into a single-column tif with Actions Adobe Acrobat Pro • Create a pdf of each tif • Analyze pdf with optical character recognition (OCR) and make pdf text selectable
  • 15.
  • 16. Microsoft Excel 2008 • Receive text from Acrobat in columns • After text manipulation and sorting, output in a cross-platform format like csv. Apple Automator • Execute workflows to control multiple applications. Launch, copy, paste, manipulate, save, repeat. • Create Folder Actions for Finder automation. Automator Virtual Input • Extend the functionality of Automator for even more control over apps, mouse, keyboard
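For readers without Excel, the "sort, then output as csv" step can be approximated with Python's standard `csv` module. This is a sketch under assumptions, not the method used in the presentation: the column names and the `records_to_csv` helper are hypothetical, and records are assumed to be dictionaries keyed by those column names.

```python
import csv
import io

# Hypothetical column order; the accession number leads so it can act as the key.
COLUMNS = ["accession", "architect", "building", "location_year", "view", "source"]

def records_to_csv(records):
    """Return the records as CSV text, sorted by accession number (as strings)."""
    buf = io.StringIO()
    # extrasaction="ignore" drops any keys that are not in COLUMNS.
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    for rec in sorted(records, key=lambda r: r["accession"]):
        writer.writerow(rec)
    return buf.getvalue()

# Invented example record; not real catalog data.
example = [{
    "accession": "12345",
    "architect": "Aalto, Alvar",
    "building": "Villa Mairea",
    "location_year": "Noormarkku, Finland, 1939",
    "view": "Exterior, entrance",
    "source": "Example Book, p. 12",
}]
print(records_to_csv(example))
```

The `csv` module quotes any field that contains a comma, which matters here because architect names and locations usually do. When writing to a file instead of a string, open it with `newline=""` as the `csv` documentation recommends.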
  • 17. Automator • Comes standard with Mac OS X 10.4+ • Allows scripting and workflow creation via GUI • Can perform operations within an application or across multiple applications
  • 18. Document scanning: Automator, Folder Actions, Photoshop [video here in original presentation]
  • 19.
  • 20. Text processing: Automator + Automator Virtual Input, Folder Actions, Acrobat, Excel [video here in original presentation]
  • 23. Sometimes it looks good... Sometimes it doesn’t.
  • 24. Final result after text sorting and cleanup
  • 25. Goal • Produce nearly perfect metadata, clean enough to import into existing database
  • 26. Goal: • Produce nearly perfect metadata, clean enough to import into existing database. Actual outcome: • Produced pretty good metadata • Spent lots of time on data cleanup to get there
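Part of that data cleanup can be scripted. The sketch below repairs common OCR letter-for-digit confusions in accession numbers and flags anything that still does not look valid. It assumes accession numbers are purely numeric, which the slides do not state, so the substitution table would need adjusting for any other numbering scheme. The names `CONFUSIONS` and `clean_accession` are hypothetical.

```python
# Letters that OCR commonly substitutes for digits in typewritten text.
# Only safe to apply to a field assumed to contain digits alone.
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "S": "5", "B": "8"})

def clean_accession(raw):
    """Return (cleaned value, True if it is all digits after cleanup)."""
    candidate = raw.strip().translate(CONFUSIONS).replace(" ", "")
    return candidate, candidate.isdigit()

for raw in [" 1O3l5 ", "12-A4"]:
    print(repr(raw), clean_accession(raw))
```

Values that come back with `False` are left for a person to check against the paper record instead of being guessed at.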
  • 27. Goal • Use tools on hand; any new tools should be cheap or useful for other projects
  • 28. Goal: • Use tools on hand; any new tools should be cheap or useful for other projects. Actual outcome: • Used standard software, plus one new application ($25) • iMac is a student workstation • Epson scanner is in use for print and film scanning plus pdf creation
  • 29. Goal • Have 75,000 new records ready to pair with images and publish to MDID
  • 30. Goal: • Have 75,000 new records ready to pair with images and publish to MDID. Actual outcome: • Got 75,000 records! • Created a searchable shelf list and archival finding aid • With further data cleanup, the original goal of MDID use can be achieved
  • 31. Photo: JF Sebastian via Flickr. 412874324
  • 32. • Every Mac comes with Automator and it is easy to learn • You probably have OCR tools on your computer right now • Experimenting can produce great results Photo: JF Sebastian via Flickr. 412874324
  • 33. • Every Mac comes with Automator and it is easy to learn • You probably have OCR tools on your computer right now • Experimenting can produce great results. Photo credits: • Software icons and screenshots by Adobe, Apple, Microsoft and Singed Labcoat • Kraftwerk images by Flickr users Zak Moreira, Alvaro Farfán and JF Sebastian • Other photo and video by UW CBE VRC. Thank you: Rainer Metzger, University of Washington. Photo: JF Sebastian via Flickr. 412874324