Distributed Repositories and Crowd-Sourcing Transcription
Open Repositories 2014, Helsinki, Finland, 11th of June 2014
Distributed Repositories of Medieval Calendars
and Crowd-Sourcing of Transcription
Rob Sanderson
azaroth42@gmail.com
azaroth@stanford.edu
t: @azaroth42
Stanford University
Ben Albritton, Stanford University
Doug Emery, University of Pennsylvania
Will Noel, University of Pennsylvania
Dot Porter, University of Pennsylvania
http://iiif.io/
This research was primarily funded by the Andrew W. Mellon Foundation
Image Repositories
•  Increase in digitization
•  Particularly precious, fragile, beautiful objects
•  Medieval Manuscripts
•  Digitized images online
•  Increasingly Open
•  At high resolution
•  Easy to capture an image
•  Very hard to capture the text
http://gallica.bnf.fr/ark:/12148/btv1b8449691v/
Calendars
•  Ubiquitous in liturgical books
•  e.g. Books of Hours
•  Structured and often tabular: Date, Day, Saint / Event
•  Content varies slightly
•  Variation details give us information about the provenance of the object
•  Much easier to transcribe
•  Good pilot project!
http://www.e-codices.unifr.ch/en/bge/lat0033
Collaborative Crowd Sourcing?
•  Meeting at U. Penn including content providers and scholars
•  Plan:
•  Collect transcriptions together
•  Analyze similarities between manuscripts for patterns of provenance
•  Manuscripts and images distributed: need a community to collect sufficient data
http://brbl-dl.library.yale.edu/vufind/Record/3446275
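The "analyze similarities" step can be pictured with a very small sketch. The talk does not say which measure was used; the Jaccard index below is only one simple possibility, and the calendar entries are hypothetical placeholders rather than real transcriptions.

```python
def jaccard(a, b):
    """Share of entries common to both calendars, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty calendars are treated as identical
    return len(a & b) / len(a | b)

# Hypothetical (date, feast) pairs standing in for transcribed calendar rows.
calendar_a = {("01-01", "feast-1"), ("01-06", "feast-2"), ("01-13", "feast-3")}
calendar_b = {("01-01", "feast-1"), ("01-06", "feast-2"), ("01-14", "feast-4")}

similarity = jaccard(calendar_a, calendar_b)  # 2 shared entries out of 4 distinct ones
```

Calendars that share an unusually high proportion of entries would be candidates for a common origin, which is the kind of provenance pattern the plan describes.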
Micro Repository Rant: TEI
•  Most transcribing done in TEI
•  Terrible for this use case:
•  Single XML file
•  Single author
•  Single location
•  Hard to link to images
•  Tries to describe too much
•  Impossible to use once created
•  Creating TEI is good for:
•  The academic exercise of creating TEI
http://www.thedigitalwalters.org/Data/WaltersManuscripts/html/W41/
Requirements
•  Distributed image content
•  Consistent, rich API
•  Selection of regions
•  Relative to the base image size, not the displayed size
•  Alignment of text with region
•  Distributed creation
•  Distributed curation
•  Multiple texts per region
•  Styling of the text
•  Some semantics
http://oculus-dev.lib.harvard.edu/manifests/view/drs:5981093
1. Images: BNF next to Yale
Open Technology: IIIF Image API
Base URL: {scheme}://{host}{/prefix}/{identifier}
Image Resource:
{base}/{region}/{size}/{rotation}/{quality}.{format}
http://iiif.io/api/image/1.1/
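As a rough illustration of the two URL templates above, the following sketch fills them in with plain string formatting. The host, prefix and identifier are hypothetical, and the default quality value "native" is taken from version 1.1 of the Image API (later versions use a different default name).

```python
from urllib.parse import quote

def iiif_base_url(scheme, host, identifier, prefix=""):
    """Fill the Base URL template {scheme}://{host}{/prefix}/{identifier}."""
    prefix_part = f"/{prefix.strip('/')}" if prefix else ""
    # Identifiers may contain characters such as '/' that must be percent-encoded.
    return f"{scheme}://{host}{prefix_part}/{quote(identifier, safe='')}"

def iiif_image_url(base, region="full", size="full", rotation="0",
                   quality="native", fmt="jpg"):
    """Fill the Image Resource template {base}/{region}/{size}/{rotation}/{quality}.{format}."""
    return f"{base.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"

# Hypothetical server and manuscript page.
base = iiif_base_url("http", "images.example.org", "ms-0001/f12r", prefix="iiif")
full_page = iiif_image_url(base)
# One calendar row: x,y,w,h in pixels of the full image, delivered at half size.
calendar_row = iiif_image_url(base, region="120,340,900,60", size="pct:50")
```

Because every conforming server accepts the same path pattern, the same two functions would work against images held by different institutions, which is what makes "BNF next to Yale" possible.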
(Part of the) IIIF Community
•  ARTstor
•  Bibliothèque Nationale de France
•  Bodleian Libraries, Oxford University
•  British Library
•  C2MRF
•  Cambridge University
•  Cornell University
•  DPLA
•  Europeana
•  e-codices
•  Harvard University
•  Johns Hopkins University
•  National Library of Denmark
•  National Library of Poland
•  National Library of New Zealand
•  National Library of Norway
•  National Library of Wales
•  Princeton University
•  Stanford University
•  Wellcome Trust
•  UK National Archives
•  Yale University
2. Crowdsourced Box Drawing
Open Technologies
•  Mirador
•  IIIF Community developed viewer
•  Stanford, Harvard, Yale, [LANL]
•  Zooming via OpenSeadragon
•  Princeton, and OSD committers
•  Jcrop
•  jQuery plugin for drawing little boxes
•  MongoDB
•  Store information via REST interface
•  W3C Media Fragment image segments
•  Trivially converted to IIIF Image API requests
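The conversion from a Media Fragment segment to an Image API region can be sketched as follows. This is an illustration rather than the project's code: the function name is hypothetical, and it assumes the fragment coordinates are already expressed against the full-size image.

```python
from urllib.parse import urldefrag

def fragment_to_iiif_region(target_uri):
    """Split '...#xywh=x,y,w,h' into (URI without fragment, IIIF region string)."""
    target, fragment = urldefrag(target_uri)
    if not fragment.startswith("xywh="):
        return target, "full"  # no spatial fragment: request the whole image
    value = fragment[len("xywh="):]
    unit = "pixel"  # Media Fragments default unit
    if ":" in value:
        unit, value = value.split(":", 1)
    x, y, w, h = (part.strip() for part in value.split(","))
    if unit == "percent":
        return target, f"pct:{x},{y},{w},{h}"
    return target, f"{x},{y},{w},{h}"

# Hypothetical stored segment for one drawn box.
target, region = fragment_to_iiif_region("http://example.org/canvas/f12r#xywh=120,340,900,60")
```

The returned region string drops straight into the {region} slot of an Image API request, which is why the conversion is described as trivial.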
Open Technology
•  Line/Column inspiration from TPEN (IIIF compliant)
•  Transcription tool developed at St. Louis
•  http://t-pen.org/TPEN/
•  Line detection flaky, no internal columns
Boring (but Open) Metadata
•  Metadata collection to drive the analysis
•  Stored along with the segments
•  Defaults are normally correct
•  Custom extension, not intended for general purpose use
•  Convenient to do inline
•  Other metadata could be added
•  Could be done in a different workflow
Open Technology: IIIF Presentation API
Text/Image Linking is a subset of a larger challenge:
•  Non-Text / Image Linking
•  Dynamic Images
•  No Image to link to
•  Multiple Images
•  Parts of Images
•  Parts of larger texts
•  Distributed images, texts and links
Need an indirection layer:
•  Solution: align text and image with an abstract Canvas
http://iiif.io/api/presentation/1.0/
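A minimal sketch of the Canvas indirection: both the page image and a transcribed line are attached to the same abstract canvas, so neither needs to know about the other. The key names follow my reading of the IIIF Presentation and Open Annotation documents and should be treated as illustrative; the URIs and the transcription text are hypothetical placeholders.

```python
def image_annotation(canvas_uri, image_uri):
    """Paint a whole image onto the canvas."""
    return {
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        "resource": {"@id": image_uri, "@type": "dctypes:Image"},
        "on": canvas_uri,
    }

def transcription_annotation(canvas_uri, xywh, text):
    """Attach transcribed text to a rectangular region of the canvas."""
    x, y, w, h = xywh
    return {
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        "resource": {
            "@type": "cnt:ContentAsText",
            "format": "text/plain",
            "chars": text,
        },
        "on": f"{canvas_uri}#xywh={x},{y},{w},{h}",
    }

canvas = "http://example.org/canvas/f12r"  # hypothetical canvas URI
page = image_annotation(canvas, "http://images.example.org/iiif/ms-0001%2Ff12r/full/full/0/native.jpg")
line = transcription_annotation(canvas, (120, 340, 900, 60), "placeholder transcription")
```

Because the text targets the canvas rather than a particular image file, a second image of the same page, or a page with no image at all, can be handled without touching the transcription.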
Linked Data People...
If you do not want
to know the score,
look away now!
{ "it's" : "just JSON" }
Distributed Repositories and Crowd-Sourcing Transcription
Open Repositories 2014, Helsinki, Finland, 11th of June 2014
44
Web Developers...
If you do not want
to know the score,
look away now!
<_:it's>
<_:all>
<_:Linked_Data>;
Micro Repository Rant 2: RDF Serialization
“RDF/XML was the Semantic Web’s 3 Mile Island incident”
-- Manu Sporny, http://manu.sporny.org/2012/nuclear-rdf/
Or … RDF – Not in my back yard!
•  Serializing a graph is, admittedly, hard
•  RDF/XML is terrible, and too many others
•  Web currently uses JSON as convenient transfer syntax
•  JSON-LD allows transfer of RDF in syntax that does not require full RDF stack, just a JSON implementation
•  … as available in every web browser
•  Rob's Conclusion: Require JSON-LD
•  http://json-ld.org/
JSON-LD Context Magic
{ // Canvas resource
  "@context":"http://iiif.io/api/presentation/2/context.json",

@context provides mapping for JSON keys into RDF.

  "sc":"http://www.shared-canvas.org/ns/",
  "oa":"http://www.w3.org/ns/oa#",
  "service":{
    "@type":"@id",
    "@id":"sioc_svcs:has_service"},
  "height":{
    "@type":"xsd:integer",
    "@id":"exif:height"},
  "sequences":{
    "@type":"@id",
    "@id":"sc:hasSequences",
    "@container":"@list"}
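To make the "context magic" concrete, here is a toy lookup that maps plain JSON keys to the RDF predicates a context assigns them. It is not a JSON-LD processor (a real one handles far more), and it uses only the terms quoted in the context excerpt; the function names are hypothetical.

```python
# The term definitions quoted in the context excerpt.
CONTEXT_EXCERPT = {
    "sc": "http://www.shared-canvas.org/ns/",
    "oa": "http://www.w3.org/ns/oa#",
    "service": {"@type": "@id", "@id": "sioc_svcs:has_service"},
    "height": {"@type": "xsd:integer", "@id": "exif:height"},
    "sequences": {"@type": "@id", "@id": "sc:hasSequences", "@container": "@list"},
}

def term_to_iri(term, context):
    """Return the IRI a JSON key stands for, or None if the context does not define it."""
    definition = context.get(term)
    if definition is None:
        return None
    iri = definition["@id"] if isinstance(definition, dict) else definition
    prefix, sep, local = iri.partition(":")
    # Expand compact IRIs such as 'sc:hasSequences' when the prefix is defined as a string.
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + local
    return iri  # prefix not defined in this excerpt: leave the compact form as-is

def predicates_for(document, context):
    """Map every non-keyword key of a JSON document to its predicate (or None)."""
    return {key: term_to_iri(key, context) for key in document if not key.startswith("@")}

mapping = predicates_for({"@type": "sc:Canvas", "height": 1000, "sequences": []}, CONTEXT_EXCERPT)
```

A web developer reads and writes the short keys as ordinary JSON, while a Linked Data consumer applying the same context sees full RDF predicates.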
Open Technologies: REST
•  Experimental IIIF REST specification
•  http://iiif.io/api/annex/rest/
•  For both Presentation and Image
•  Trivial Python/WSGI handler
•  Processes @context and generates identities
•  Stores in MongoDB (but API is agnostic)
•  Follows IIIF Presentation and Open Annotation
•  http://www.w3.org/community/openannotation/
•  Returns the correct JSON-LD
•  Doesn't fully handle image upload yet
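A handler of this kind might look roughly like the sketch below. It is not the project's code: an in-memory dict stands in for MongoDB, @context processing is omitted, and the minted identities are server-relative paths rather than full URIs.

```python
import json
import uuid

STORE = {}  # in-memory stand-in for the MongoDB collection mentioned above

def app(environ, start_response):
    """Minimal WSGI application: POST creates a resource, GET returns it."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "/")
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        doc = json.loads(environ["wsgi.input"].read(length).decode("utf-8"))
        # Mint an identity for the new resource under the path it was posted to.
        doc["@id"] = f"{path.rstrip('/')}/{uuid.uuid4()}"
        STORE[doc["@id"]] = doc
        body = json.dumps(doc).encode("utf-8")
        start_response("201 Created", [("Content-Type", "application/ld+json"),
                                       ("Location", doc["@id"])])
        return [body]
    if method == "GET" and path in STORE:
        body = json.dumps(STORE[path]).encode("utf-8")
        start_response("200 OK", [("Content-Type", "application/ld+json")])
        return [body]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

For local experiments the application can be served with the standard library, for example wsgiref.simple_server.make_server("", 8000, app).serve_forever(). Swapping the dict for a database only changes the two lines that read and write STORE, which is the sense in which the API is storage-agnostic.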
The Future is Now
•  IIIF Image API 2.0
•  Request for Comment period open!
•  http://iiif.io/api/image/2.0/
•  IIIF Presentation API 2.0
•  Ditto!
•  http://iiif.io/api/presentation/2.0/
Please give us feedback: iiif-discuss@googlegroups.com
•  Ongoing work with U.Penn to make a more robust system
Thank You
Rob Sanderson
azaroth42@gmail.com
azaroth@stanford.edu
t: @azaroth42
Stanford University
http://iiif.io/
iiif-discuss@googlegroups.com

Open Repositories 2014: Crowdsourced Transcription via IIIF

  • 1. Distributed Repositories and Crowd-Sourcing Transcription Open Repositories 2014, Helsinki, Finland, 11th of June 2014 1 Distributed Repositories of Medieval Calendars and Crowd-Sourcing of Transcription Rob Sanderson azaroth42@gmail.com azaroth@stanford.edu t: @azaroth42 Stanford University Ben Albritton, Stanford University Doug Emery, University of Pennsylvania Will Noel, University of Pennsylvania Dot Porter, University of Pennsylvania http://iiif.io/ This research was primarily funded by the Andrew W. Mellon Foundation
  • 2. Distributed Repositories and Crowd-Sourcing Transcription Open Repositories 2014, Helsinki, Finland, 11th of June 2014 2 Image Repositories •  Increase in digitization •  Particularly precious, fragile, beautiful objects •  Medieval Manuscripts •  Digitized images online •  Increasingly Open •  At high resolution •  Easy to capture an image •  Very hard to capture the text http://gallica.bnf.fr/ark:/12148/btv1b8449691v/
  • 3. Distributed Repositories and Crowd-Sourcing Transcription Open Repositories 2014, Helsinki, Finland, 11th of June 2014 3 Calendars •  Ubiquitous in liturgical books •  e.g. Books of Hours •  Structured and often tabular: Date, Day, Saint / Event •  Content varies slightly •  Variation details give us information about the provenance of the object •  Much easier to transcribe •  Good pilot project! http://www.e-codices.unifr.ch/en/bge/lat0033
  • 4. Distributed Repositories and Crowd-Sourcing Transcription Open Repositories 2014, Helsinki, Finland, 11th of June 2014 4 Collaborative Crowd Sourcing? •  Meeting at U. Penn including content providers and scholars •  Plan: •  Collect transcriptions together •  Analyze similarities between manuscripts for patterns of provenance •  Manuscripts and images distributed: need a community to collect sufficient data http://brbl-dl.library.yale.edu/vufind/Record/3446275
  • 5. Distributed Repositories and Crowd-Sourcing Transcription Open Repositories 2014, Helsinki, Finland, 11th of June 2014 5 Micro Repository Rant: TEI •  Most transcribing done in TEI •  Terrible for this use case: •  Single XML file •  Single author •  Single location •  Hard to link to images •  Tries to describe too much •  Impossible to use once created •  Creating TEI is good for: http://www.thedigitalwalters.org/Data/WaltersManuscripts/html/W41/
  • 6. Distributed Repositories and Crowd-Sourcing Transcription Open Repositories 2014, Helsinki, Finland, 11th of June 2014 6 Micro Repository Rant: TEI •  Most transcribing done in TEI •  Terrible for this use case: •  Single XML file •  Single author •  Single location •  Hard to link to images •  Tries to describe too much •  Impossible to use once created •  Creating TEI is good for: •  The academic exercise of creating TEI http://www.thedigitalwalters.org/Data/WaltersManuscripts/html/W41/
  • 7. Requirements
      •  Distributed image content
      •  Consistent, rich API
      •  Selection of regions
      •  Base, not displayed size
      •  Alignment of text with region
      •  Distributed creation
      •  Distributed curation
      •  Multiple texts per region
      •  Styling of the text
      •  Some semantics
      http://oculus-dev.lib.harvard.edu/manifests/view/drs:5981093
  • 8. 1. Images: BNF next to Yale
  • 9. Open Technology: IIIF Image API
      Base URL: {scheme}://{host}{/prefix}/{identifier}
      Image Resource: {base}/{region}/{size}/{rotation}/{quality}.{format}
      http://iiif.io/api/image/1.1/
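The two URL templates on this slide can be filled in mechanically. The following is a minimal sketch, assuming a hypothetical host (`images.example.org`), prefix, and identifier; the function names are illustrative and not part of any IIIF library. The defaults (`full`, `0`, `native`) are the whole-image values as I understand them from Image API 1.1.

```python
from urllib.parse import quote


def iiif_base_url(scheme, host, identifier, prefix=""):
    """Fill in {scheme}://{host}{/prefix}/{identifier}.

    The identifier is percent-encoded so that any "/" inside it is not
    mistaken for a path separator.
    """
    prefix_part = f"/{prefix.strip('/')}" if prefix else ""
    return f"{scheme}://{host}{prefix_part}/{quote(identifier, safe='')}"


def iiif_image_url(base, region="full", size="full", rotation="0",
                   quality="native", fmt="jpg"):
    """Fill in {base}/{region}/{size}/{rotation}/{quality}.{format}."""
    return f"{base.rstrip('/')}/{region}/{size}/{rotation}/{quality}.{fmt}"


# Hypothetical server and identifier, for illustration only.
base = iiif_base_url("http", "images.example.org", "ms-0033/f12", prefix="iiif")
whole_page = iiif_image_url(base)
one_line = iiif_image_url(base, region="100,200,300,40")
```

Because the region is part of the URL, a client can request just the pixels for one calendar row from whichever repository holds the image, without downloading the full page.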
  • 10. (Part of the) IIIF Community
      •  ARTstor
      •  Bibliothèque Nationale de France
      •  Bodleian Libraries, Oxford University
      •  British Library
      •  C2MRF
      •  Cambridge University
      •  Cornell University
      •  DPLA
      •  Europeana
      •  e-codices
      •  Harvard University
      •  Johns Hopkins University
      •  National Library of Denmark
      •  National Library of Poland
      •  National Library of New Zealand
      •  National Library of Norway
      •  National Library of Wales
      •  Princeton University
      •  Stanford University
      •  Wellcome Trust
      •  UK National Archives
      •  Yale University
  • 11-19. 2. Crowdsourced Box Drawing
  • 20. Open Technologies
      •  Mirador
      •  IIIF Community developed viewer
      •  Stanford, Harvard, Yale, [LANL]
      •  Zooming via Open SeaDragon
      •  Princeton, and OSD committers
      •  JCrop
      •  JQuery plugin for drawing little boxes
      •  MongoDB
      •  Store information via REST interface
      •  W3C Media Fragment image segments
      •  Trivially converted to IIIF Image API requests
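The last bullet, that W3C Media Fragment segments convert trivially to IIIF Image API requests, can be sketched as follows. This assumes the box is stored as an `#xywh=` fragment on a target URI and that the image's IIIF base URL is known separately; the example URIs and function names are hypothetical, and the output path uses the whole-size, unrotated, `native` quality form from the Image API template.

```python
import re
from urllib.parse import urldefrag

# Matches xywh=x,y,w,h with an optional "pixel:" or "percent:" unit.
_XYWH = re.compile(r"^xywh=(?:(pixel|percent):)?(\d+),(\d+),(\d+),(\d+)$")


def fragment_to_iiif_region(target):
    """Split a target URI into (uri, IIIF region string)."""
    uri, fragment = urldefrag(target)
    match = _XYWH.match(fragment)
    if match is None:
        raise ValueError(f"no xywh media fragment in {target!r}")
    unit, *box = match.groups()
    region = ",".join(box)
    # IIIF spells percentage regions as "pct:x,y,w,h".
    if unit == "percent":
        region = "pct:" + region
    return uri, region


def segment_image_url(image_base, target):
    """Build an Image API request for the boxed part of the image."""
    _, region = fragment_to_iiif_region(target)
    return f"{image_base.rstrip('/')}/{region}/full/0/native.jpg"


# Hypothetical target and image service, for illustration only.
url = segment_image_url(
    "http://images.example.org/iiif/page12",
    "http://example.org/canvas/page12#xywh=100,200,300,40",
)
```

The conversion is a string rewrite because both syntaxes describe the same rectangle in the same coordinate order.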
  • 21-26. Open Technologies
  • 27. Open Technology
      •  Line/Column inspiration from TPEN (IIIF compliant)
      •  Transcription tool developed at St. Louis
      •  http://t-pen.org/TPEN/
      •  Line detection flakey, no internal columns
  • 28-29. Open Technologies
      •  Inspiration from TPEN (IIIF compliant)
      •  Transcription tool developed at St. Louis
      •  http://t-pen.org/TPEN/
      •  Line detection flakey, no internal columns
  • 30. Boring (but Open) Metadata
      •  Metadata collection to drive the analysis
      •  Stored along with the segments
      •  Defaults are normally correct
      •  Custom extension, not intended for general purpose use
      •  Convenient to do inline
      •  Other metadata could be added
      •  Could be done in a different workflow
  • 31-35. Metadata
  • 36. ...
  • 37. Metadata
  • 38. Open Technology: IIIF Presentation API
      Text/Image Linking is a subset of a larger challenge:
      •  Non-Text / Image Linking
      •  Dynamic Images
      •  No Image to link to
      •  Multiple Images
      •  Parts of Images
      •  Parts of larger texts
      •  Distributed images, texts and links
      Need an indirection layer:
      •  Solution: align text and image with an abstract Canvas
      http://iiif.io/api/presentation/1.0/
  • 39-41. Open Technology: IIIF Presentation API
      http://iiif.io/api/presentation/1.0/
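To make the Canvas indirection concrete: a transcribed line can be expressed as an annotation whose target is a region of the Canvas rather than a region of any particular image. The sketch below builds such an annotation as a Python dictionary. The key names loosely follow the Open Annotation style used by IIIF Presentation API 2.0 as I understand it; the canvas URI, the sample text, and the function name are hypothetical, and the exact shape should be checked against the specification.

```python
import json


def transcription_annotation(canvas_uri, xywh, text, language="la"):
    """Describe `text` as the transcription of a box on a Canvas.

    `xywh` is (x, y, width, height) in Canvas coordinates, so the same
    annotation holds whichever image is later painted onto the Canvas.
    """
    x, y, w, h = xywh
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@type": "oa:Annotation",
        "motivation": "sc:painting",
        "resource": {
            "@type": "cnt:ContentAsText",
            "format": "text/plain",
            "language": language,
            "chars": text,
        },
        # The target is the Canvas plus a media fragment, not an image URL.
        "on": f"{canvas_uri}#xywh={x},{y},{w},{h}",
    }


# Hypothetical canvas and text, for illustration only.
anno = transcription_annotation(
    "http://example.org/iiif/book1/canvas/f12",
    (100, 200, 300, 40),
    "Circumcisio domini",
)
serialized = json.dumps(anno, indent=2)
```

Several such annotations can target the same region, which is one way to meet the "multiple texts per region" requirement from the earlier slide.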
  • 42. Linked Data People... If you do not want to know the score, look away now!
  • 43. Linked Data People... { "it's" : "just JSON" }
  • 44. Web Developers... If you do not want to know the score, look away now!
  • 45. Web Developers... <_:it's> <_:all> <_:Linked_Data>;
  • 46. Micro Repository Rant 2: RDF Serialization
      “RDF/XML was the Semantic Web’s 3 Mile Island incident” -- Manu Sporny, http://manu.sporny.org/2012/nuclear-rdf/
      Or … RDF – Not in my back yard!
      •  Serializing a graph is, admittedly, hard
      •  RDF/XML is terrible, and too many others
      •  Web currently uses JSON as convenient transfer syntax
      •  JSON-LD allows transfer of RDF in syntax that does not require full RDF stack, just a JSON implementation
      •  … as available in every web browser
      •  Rob's Conclusion: Require JSON-LD
      •  http://json-ld.org/
  • 47. JSON-LD Context Magic
      { // Canvas resource
        "@context":"http://iiif.io/api/presentation/2/context.json",
      @context provides mapping for JSON keys into RDF.
        "sc":"http://www.shared-canvas.org/ns/",
        "oa":"http://www.w3.org/ns/oa#",
        "service":{
          "@type":"@id",
          "@id":"sioc_svcs:has_service"},
        "height":{
          "@type":"xsd:integer",
          "@id":"exif:height"},
        "sequences":{
          "@type":"@id",
          "@id":"sc:hasSequences",
          "@container":"@list"}
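The mapping that `@context` provides can be illustrated with a few lines of Python. This is a deliberately tiny sketch of term expansion only, not a JSON-LD processor (a real one also handles `@type` coercion, `@container`, nested contexts, and much more). The context entries are copied from the slide, except the `exif` namespace IRI, which is my assumption because the slide does not show it.

```python
# Context excerpt from the slide; the "exif" prefix IRI is an assumption.
CONTEXT = {
    "sc": "http://www.shared-canvas.org/ns/",
    "oa": "http://www.w3.org/ns/oa#",
    "exif": "http://www.w3.org/2003/12/exif/ns#",
    "height": {"@type": "xsd:integer", "@id": "exif:height"},
    "sequences": {"@type": "@id", "@id": "sc:hasSequences",
                  "@container": "@list"},
}


def expand_term(term, context):
    """Resolve a plain JSON key to the full IRI it stands for."""
    definition = context.get(term, term)
    if isinstance(definition, dict):
        definition = definition["@id"]
    # Expand a compact "prefix:local" form if the prefix is declared.
    prefix, sep, local = definition.partition(":")
    if sep and isinstance(context.get(prefix), str):
        return context[prefix] + local
    return definition


def expand_keys(doc, context):
    """Rewrite the non-keyword keys of a flat JSON object as IRIs."""
    return {expand_term(key, context): value
            for key, value in doc.items() if not key.startswith("@")}


canvas = {"@type": "sc:Canvas", "height": 1000}
expanded = expand_keys(canvas, CONTEXT)
```

A web developer can keep reading `height` as an ordinary JSON key, while a Linked Data consumer sees the same value attached to a full predicate IRI.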
  • 48. Open Technologies: REST
      •  Experimental IIIF REST specification
      •  http://iiif.io/api/annex/rest/
      •  For both Presentation and Image
      •  Trivial Python/WSGI handler
      •  Processes @context and generates identities
      •  Stores in MongoDB (but API is agnostic)
      •  Follows IIIF Presentation and Open Annotation
      •  http://www.w3.org/community/openannotation/
      •  Returns the correct JSON-LD
      •  Doesn't fully handle image upload yet
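As a rough idea of how small such a Python/WSGI handler can be, here is a sketch that accepts a JSON-LD document by POST, assigns it an `@id`, stores it, and returns it on GET. It is not the handler described on the slide: the URL layout, the `BASE` address, and the response details are assumptions, an in-memory dictionary stands in for MongoDB, and no `@context` processing is done.

```python
import json
import uuid
from io import BytesIO
from wsgiref.util import setup_testing_defaults

STORE = {}  # in-memory stand-in for the MongoDB collection
BASE = "http://example.org/iiif"  # hypothetical public base URI


def app(environ, start_response):
    """Create resources on POST, return them on GET, 404 otherwise."""
    method = environ["REQUEST_METHOD"]
    path = environ.get("PATH_INFO", "")
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        doc = json.loads(environ["wsgi.input"].read(length))
        # Generate an identity for the new resource under the posted path.
        doc["@id"] = f"{BASE}{path.rstrip('/')}/{uuid.uuid4()}"
        STORE[doc["@id"]] = doc
        status, body = "201 Created", doc
    elif method == "GET" and BASE + path in STORE:
        status, body = "200 OK", STORE[BASE + path]
    else:
        status, body = "404 Not Found", {"error": "not found"}
    payload = json.dumps(body).encode("utf-8")
    headers = [("Content-Type", "application/ld+json"),
               ("Content-Length", str(len(payload)))]
    if status.startswith("201"):
        headers.append(("Location", body["@id"]))
    start_response(status, headers)
    return [payload]


def call(method, path, doc=None):
    """Invoke the app in-process, without a server, for a quick check."""
    raw = json.dumps(doc).encode("utf-8") if doc is not None else b""
    environ = {"REQUEST_METHOD": method, "PATH_INFO": path,
               "CONTENT_LENGTH": str(len(raw)), "wsgi.input": BytesIO(raw)}
    setup_testing_defaults(environ)  # fills in the remaining WSGI keys
    seen = {}

    def start_response(status, headers):
        seen["status"] = status

    body = b"".join(app(environ, start_response))
    return seen["status"], json.loads(body)


status, created = call("POST", "/annotation", {"@type": "oa:Annotation"})
```

Swapping the dictionary for a database collection would leave the handler's interface unchanged, which is the sense in which the API is storage-agnostic.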
  • 49. The Future is Now
      •  IIIF Image API 2.0
      •  Request for Comment period open!
      •  http://iiif.io/api/image/2.0/
      •  IIIF Presentation API 2.0
      •  Ditto!
      •  http://iiif.io/api/presentation/2.0/
      Please give us feedback: iiif-discuss@googlegroups.com
      •  Ongoing work with U.Penn to make a more robust system
  • 50. Thank You
      Rob Sanderson
      azaroth42@gmail.com
      azaroth@stanford.edu
      t: @azaroth42
      Stanford University
      http://iiif.io/
      iiif-discuss@googlegroups.com