The National Library of Wales manages several crowd-sourced cultural heritage datasets including Cymru-1900, Cynefin, and shipping records. Data is stored in Fedora and indexed in SOLR to provide access. Transcriptions are linked to external data sources and other resources to allow new research connections not previously possible. The library aims to directly connect crowd-sourcing platforms to access solutions using IIIF to facilitate transcription and allow researchers to query linked datasets.
OR2016 - Managing Crowd sourced Cultural Heritage Datasets
1. Managing Crowd sourced Cultural
Heritage Datasets
National Library of Wales
Glen Robson – Head of Systems
twitter: @glenrobson
2. Plan
• Background to the National Library of Wales
• Crowd Sourcing projects
– Cymru – 1900 – Wales
– Cynefin
– Shipping Records
– WW1 Book of Remembrance
• Providing storage and access
15. Data
• Fields:
– Owner
– Tenant
– Use – arable/forest etc.
– Size (acre, rood, perches)
– Tithe Value (pounds, shilling, pence)
– Geo-coordinates
• Storing in Fedora
– ALTO
– Open Annotations
• JSON-LD
• RDF/XML
– Indexing in SOLR
– Website in the summer
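The size and tithe value fields use pre-decimal units, so a researcher would probably normalise them before comparing parcels. A minimal sketch of that step, assuming a hypothetical `Apportionment` record (the class and field names are illustrative, not the Cynefin schema) and the standard conversions of 4 roods to the acre, 40 perches to the rood, 20 shillings to the pound and 12 pence to the shilling:

```python
from dataclasses import dataclass

# Standard imperial and pre-decimal currency conversions.
ROODS_PER_ACRE = 4
PERCHES_PER_ROOD = 40
SHILLINGS_PER_POUND = 20
PENCE_PER_SHILLING = 12


@dataclass
class Apportionment:
    """Hypothetical record shape; not the actual Cynefin data model."""
    owner: str
    tenant: str
    use: str
    acres: int
    roods: int
    perches: int
    pounds: int
    shillings: int
    pence: int

    def area_in_acres(self) -> float:
        """Return the parcel size as decimal acres."""
        total_perches = (
            (self.acres * ROODS_PER_ACRE + self.roods) * PERCHES_PER_ROOD
            + self.perches
        )
        return total_perches / (ROODS_PER_ACRE * PERCHES_PER_ROOD)

    def tithe_in_pence(self) -> int:
        """Return the tithe value as a single number of pence."""
        return (
            (self.pounds * SHILLINGS_PER_POUND + self.shillings)
            * PENCE_PER_SHILLING
            + self.pence
        )


# Illustrative values only, not taken from a real apportionment.
parcel = Apportionment("Example Owner", "Example Tenant", "arable",
                       acres=2, roods=1, perches=20,
                       pounds=1, shillings=5, pence=6)
print(parcel.area_in_acres())   # 2.375
print(parcel.tithe_in_pence())  # 306
```

Keeping the transcribed units as integers and deriving the normalised values on demand means the original transcription is never overwritten.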
16. Shipping Registers
• 544 merchant vessels registered at the port of
Aberystwyth
• 1856-1914
• Crew lists – name, position, birth date, reason
for leaving, location
• Transcribed by volunteers
• https://www.llgc.org.uk/blog/?p=5716
18. Data Preservation
• Where do we store this data?
– Catalogue – MARC
– Fedora 3 Repository
• Excel files / RDF
• Data being enhanced
– Currently:
• Triple store (Sesame) – preservation?
• https://github.com/LlGC-NLW/shippingrecords
– Fedora 4?
19. Enhancements
• Linking out
– Places:- Birth and Ship arrival
• Volunteer using OpenRefine to group places
• Will try and match with GeoNames
– Ships :-
• Added to Wikidata by NLW Wikipedian in Residence:
– https://tools.wmflabs.org/reasonator/?&q=23927955
– https://tools.wmflabs.org/reasonator/?&q=24027483
– Adding images, size, weight, creation, destruction, link to
newspapers
– Dutch Shipping to Newspaper linking:
http://bit.ly/1Talish/
21. Research Potential
• Publishing these datasets as Linked Open Data allows research
that wasn’t possible when these items were physical, or even when
they were standalone digital objects.
• Physical:
– Travel to Aberystwyth - x hours/days
– Transcribe data in the reading room – x months/years
– Process back home
• Standalone Digital Object
– Transcribe data at home – x months/years
– Process at home
• Linked Open Data Annotations
– Process at home – results in minutes
• Have to take transcriptions on trust
38. Can we do this at scale?
• Cynefin Maps: 1838 to 1947
• Newspapers: 1804 to 1919
• Cymru 1914: 1914 to 1918
• General Digitisation
• Shipping Records: 1856 to 1914
• Crime and Punishment Database: 1730 to 1830
• Welsh Bibliography: 0 to 1970
39. Summary
• Different methods of crowd sourcing:
– Excel
– Outsourcing – Cynefin and wales1900
– IIIF – Mirador & Simple Annotation Server
– Wikidata
• Ideally crowd sourcing platform directly connected to access solution
(there will be corrections)
• Transcribing to linked data gives:
– Connection to external data sources (GeoNames, Wikipedia)
– Connection to other resources (newspapers)
– Allows researchers to query the data
• IIIF gives:
– Easy to set up transcription platform
– Work with other people’s content
Editor's Notes
Due to our remote location we’ve focused on digitising as much as possible so people don’t have to come to Aberystwyth. Once digitised, items are made freely available online.
Lots of different content
One of our largest collections is our digitised newspapers, which consist of 15 million articles and 1.1 million digitised images from 1804 to 1919.
Our first crowdsourcing project involved working with Zooniverse to transcribe the place names on the Ordnance Survey’s six-inch to a mile maps c. 1900.
Working with partners to expand to whole of UK
The next project we worked on was a crowd sourcing project on Tithe Maps.
Georeference Tithe maps from the 1800s
Transcribe Apportionments
The platform was developed by Klokan Technologies.
Project coming to an end soon so get busy!
More a volunteering project. We have a volunteer coordinator who organises projects and one of them was on transcribing shipping records.
No digital images
Take transcriptions on trust.
File_221-1_vtls004662587
Three 9-year-olds
Name: Richard E. James
Age: 9
Born: Aberystwyth
1879-07-27
Joined: At sea
Position: Cabin boy
Stayed with them for 1 year and left to Antwerp
Discharged N
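Once crew entries like the one above are structured data, a question such as "who joined aged nine or younger?" becomes a one-line filter. A minimal sketch, assuming a hypothetical `CrewEntry` shape (the field names are illustrative, not the schema of the published dataset). Only the first entry comes from these notes; the second is a made-up placeholder so the filter has something to exclude.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrewEntry:
    """Hypothetical record shape for one transcribed crew-list line."""
    name: str
    age: int
    born: str
    recorded_date: str  # the notes do not say what this date refers to
    joined_at: str
    position: str


def crew_aged_at_most(entries: List[CrewEntry], max_age: int) -> List[CrewEntry]:
    """Return the entries whose recorded age is max_age or younger."""
    return [entry for entry in entries if entry.age <= max_age]


entries = [
    # Values as given in the notes above.
    CrewEntry("Richard E. James", 9, "Aberystwyth", "1879-07-27",
              "At sea", "Cabin boy"),
    # Placeholder entry, invented for illustration only.
    CrewEntry("Example Crew Member", 34, "Example Town", "1879-07-27",
              "Example Port", "Mate"),
]

young = crew_aged_at_most(entries, 9)
print([entry.name for entry in young])  # ['Richard E. James']
```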
One issue is that you have to take the transcriptions on trust. The items haven’t been digitised so you can’t check the quality of the transcription. So going forward we have been doing it slightly differently, using Mirador.
You may have already heard of Mirador; it is a tool developed by Stanford and Harvard Universities that works with IIIF images.
One thing it provides is an annotation tool for transcribing content.
We’ve developed an annotation server that can be plugged into Mirador to store the transcriptions as linked open data in either a Jena or Sesame database.
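To give a feel for what a stored transcription might look like, here is a minimal sketch that builds one annotation as JSON-LD. It loosely follows the Open Annotation shape used with IIIF Presentation 2; the exact structure the NLW annotation server stores is not described in these notes, so treat the property choices as assumptions. The function name, the example.org URIs and the transcribed text are all hypothetical.

```python
import json
from typing import Dict, Tuple


def transcription_annotation(annotation_id: str,
                             canvas_uri: str,
                             xywh: Tuple[int, int, int, int],
                             text: str) -> Dict:
    """Build an Open Annotation style dict targeting a region of a canvas."""
    x, y, w, h = xywh
    return {
        "@context": "http://iiif.io/api/presentation/2/context.json",
        "@id": annotation_id,
        "@type": "oa:Annotation",
        "motivation": ["oa:commenting"],
        # The body of the annotation: the transcribed text itself.
        "resource": [{
            "@type": "dctypes:Text",
            "format": "text/plain",
            "chars": text,
        }],
        # The target: a rectangle on the page image, as a media fragment.
        "on": f"{canvas_uri}#xywh={x},{y},{w},{h}",
    }


annotation = transcription_annotation(
    "https://example.org/annotations/1",   # hypothetical identifier
    "https://example.org/canvas/page-1",   # hypothetical canvas URI
    (10, 20, 300, 40),
    "Example transcribed line",
)
print(json.dumps(annotation, indent=2))
```

Because the body and the target are both plain URIs and literals, the same record can be loaded into a triple store and linked to other datasets.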
The first project to use it transcribed the Welsh WW1 Book of Remembrance, which contains a list of all Welsh soldiers who gave their lives in WW1. This was in collaboration with the Welsh Centre for International Affairs.
It contains information on name, rank, home town and the regiment or ship in which they served.
And because the transcriptions are linked data it is possible to link them to other projects like the NLW Wales at War project. This is another crowdsourcing effort working with school children to help them learn about the impacts of WW1. It asks them to add full biographies of soldiers including schools attended, birth dates etc.
Because Mirador is built to use IIIF images, it is possible to load different types of content; for example, this is a Facebook-like tagging system for an image.
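Tagging a region of an image works because the IIIF Image API lets any client request just that rectangle by URL. A minimal sketch of building such a URL, following the Image API 2.x pattern `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`; the helper name, the example.org server and the identifier are hypothetical.

```python
from typing import Tuple
from urllib.parse import quote


def iiif_region_url(base_uri: str,
                    identifier: str,
                    region: Tuple[int, int, int, int],
                    size: str = "full",
                    rotation: int = 0,
                    quality: str = "default",
                    fmt: str = "jpg") -> str:
    """Return an Image API URL for a pixel region (x, y, w, h) of an image."""
    x, y, w, h = region
    # Identifiers must be percent-encoded, including any slashes.
    encoded_id = quote(identifier, safe="")
    return (f"{base_uri.rstrip('/')}/{encoded_id}/"
            f"{x},{y},{w},{h}/{size}/{rotation}/{quality}.{fmt}")


url = iiif_region_url("https://example.org/iiif/", "page 1",
                      (10, 20, 300, 400))
print(url)
# https://example.org/iiif/page%201/10,20,300,400/full/0/default.jpg
```

The `full` size keyword belongs to version 2 of the Image API; a server that only implements version 3 would expect `max` instead.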
And it’s also possible to import existing OCR and allow users to correct it.
Going forward we are looking at running transcription projects with external partners, including Aberystwyth University on transcribing early student records.
Another project will transcribe Latin manuscripts. One of these manuscripts is held by the British Library but it will be transcribed using Mirador hosted at the NLW.
And finally an archival collection of WW1 tribunal records.