1. MAKING LINKS IN THE BHL
Primary Source Materials as a
Window to a Scientist’s Methods
Constance Rinaldo, Librarian of the Ernst Mayr Library, MCZ, Harvard
TDWG Annual Meeting 2014, Jönköping, Sweden
2. Connecting Content: Field Notes, Specimens &
Published Literature
• Digitize
• Deposit
• Link
• Repurpose
3. Why Field Notes?
• Archival materials fill in the documentation of the full
research cycle & are primary source material
• Field notes provide unpublished observations,
sketches, weather reports and species lists
• Accessibility & adaptation for today’s tools and
researchers
• We chose William Brewster, an ornithologist who
worked during the late 19th and early 20th centuries
• Test case to connect old and current data: Brewster
species lists & current EOL data
• Connect content from multiple sources to advance
scientific and educational pursuits = open science
4.
5.
6.
7.
8. Life Cycle Completed
Image digitized for BHL
Observations in notes,
later digitized for BHL
Original Specimen record
Publication of species description,
digitized for BHL
Full digital specimen record
with links to digitized material
9. Purposeful Gaming
• Digitize horticultural catalogs
• Select tool for transcription of handwritten &
multi-column BHL content
• Transcribe field notes & catalogs (each page
twice)
• Crowdsource transcription
• Compare digital outputs
• Extract problem words for game
• Build BHL technical framework for classifying,
comparing & managing multiple OCR outputs
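The "compare digital outputs" and "extract problem words" steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: it aligns two transcriptions of the same page word by word with Python's standard difflib and reports the places where they disagree. The function names and the second sample string (with its deliberate misreadings) are hypothetical.

```python
import difflib
import re


def tokenize(text):
    """Split a transcription into whitespace-delimited words."""
    return re.findall(r"\S+", text)


def problem_words(transcript_a, transcript_b):
    """Return (words_in_a, words_in_b) pairs where two transcriptions differ."""
    a = tokenize(transcript_a)
    b = tokenize(transcript_b)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    disagreements = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Either side may be empty if one transcriber dropped a word.
            disagreements.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return disagreements


# The first string follows the journal text; the second contains
# invented misreadings to stand in for a second transcriber's output.
first = "a flock of about 15 Pine Linnets feeding among weeds"
second = "a flock of about 15 Pine Linnete feeding among reeds"

for left, right in problem_words(first, second):
    print(f"{left!r} vs {right!r}")
```

Each disagreement pair is a candidate "problem word" that a game could present to players for resolution.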
10.
11.
12. Transcription Tool Criteria
• Open source
• Crowdsourcing capability
• User-friendly
• Allow administrative oversight and editing (i.e., reviewing,
correcting, and validating transcriptions)
• Provide transcription file exports that can be efficiently
formatted for use by the game(s)
• Sustainable (tool selected will hopefully be used
permanently for BHL)
• Code easy to install, manage, and troubleshoot
• Technical support
• Multiple transcriptions of a single page
13. Transcription Tools
• FromThePage & DigiVol
• Selected 2 tools to fulfill the need for 2
transcriptions of each page
• Built-in community of volunteers with DigiVol
15. "4058841","Jessica Mitchell","Joseph deVeer","JournalsWilliam00Brew_0013.jpg","Fully
transcribed by Jessica Mitchell. Exported on 21-Oct-2014 from DigiVol
(http://volunteer.ala.org.au)","05-Jun-2014 02:17:15","11-Jun-2014
23:02:51","0","MCZ","1888\nMarch 20\nRevere Beach, Massachusetts.\n Cloudy with
occasional light showers; warm.\n To revere Beach with Chadbourne by 9 a.m. train.\nLeft
the cars at Point of Pines and first inspected'\nthe pines behind the large hotel in hopes of
finding\nCrossbills there. There were English Sparrows in\nabundance and four Tree
Sparrows (S. monticola) but\nnothing else save a single Robin. In the bushy thickets\naround
the outskirts of the grove Song Sparrows\nswarming as usual at this season and,
despite\nthe gloomy weather, singing freely. We saw none\nelsewhere along the beach
although they used to\nbe numerous during migration time at several\nplaces, especially
Oak Island.\n[margin]S.monticola[/margin]\n Near the extreme end of the Point we came
on\na flock of about 15 Pine Linnets feeding among\nweeds on the side of a dyke
embankment. Firing\ntwo barrels into these killed
eight.\n[margin]Chrysomitris\npinus[/margin]\n Retracing our steps to the station &
crossing the\nrailroad we next tried the marshes. There were no\nsmall birds there but we
saw a flock of about\n30 Crows (evidently migrants), about as many\nGolden-eye Duck
feeding in the river, and numerous\nHerring Gulls.\n The rest of the way to Oak Island we
kept along\nthe beach ridge. Pine Linnets are exceedingly\nnumerous the entire distance, in
flocks of 5 to 15 birds\neach. We shot nine more specimens. I made one\ncapital shot at a
single bird passing very swiftly\nbefore the strong S. E. wind.\n Besides the Linnets we saw a
single Snow Bunting,\n& many English Sparrows, the latter feeding on the\nwet beach in
flocks. Returned to the city at 12 n.","13"
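A record like the DigiVol export above can be read with a standard CSV parser. The sketch below is illustrative only. It uses a shortened copy of the record, assumes that line breaks inside the transcription field are encoded as a literal backslash-n sequence, and treats the [margin]...[/margin] tags as marginal annotations to be pulled out separately. The column positions, the dictionary keys and the function name are hypothetical, not a documented DigiVol schema.

```python
import csv
import io
import re

# Shortened, illustrative record; the real export has more columns.
raw = (
    r'"4058841","Jessica Mitchell","Joseph deVeer",'
    r'"JournalsWilliam00Brew_0013.jpg",'
    r'"1888\nMarch 20\nRevere Beach, Massachusetts.\n'
    r'[margin]S.monticola[/margin]\n Near the extreme end of the Point"'
)

MARGIN = re.compile(r"\[margin\](.*?)\[/margin\]\n?", flags=re.S)


def parse_record(line, text_column=4):
    """Parse one exported row into page text lines and margin notes."""
    row = next(csv.reader(io.StringIO(line)))
    # Assumption: the export stores line breaks as the two characters \ and n.
    text = row[text_column].replace("\\n", "\n")
    # Collapse internal whitespace so multi-line margin notes read as one string.
    margins = [" ".join(m.split()) for m in MARGIN.findall(text)]
    body = MARGIN.sub("", text)
    return {
        "task_id": row[0],          # hypothetical key name
        "image": row[3],            # hypothetical key name
        "margins": margins,
        "lines": body.splitlines(),
    }


record = parse_record(raw)
print(record["image"])
print(record["margins"])
for line in record["lines"]:
    print(line)
```

Keeping the margin notes separate from the body makes it easier to treat the taxon names Brewster wrote in the margins as candidate links to species pages.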
17. Access to Digitized Texts
• Improved OCR from crowdsourcing & gaming
• Technical infrastructure to manage & compare
multiple text sources
18. Next steps
• Social media campaign: transcription
• Release games/more social media
• Operationalize crowdsourcing of OCR
improvements: data mining possibilities
The story begins… Cal Acad, along with MOBOT, HUBot, HUEML, NYBG and AMNH, and in association with the Smithsonian Fieldbook project
Ernst Mayr Library project: William Brewster, ornithologist, journals & diaries: 1865-1919
Open science resources, tools and applications are accelerating the rate at which historical and current biodiversity information can be mobilized, customized and turned into participatory activities. The data can be presented in new formats on the web and on mobile devices and are broadly available.
Here we demonstrate some ways in which historical checklists and current knowledge can be melded using tools that support ecological research, management and educational activities. Brewster’s field notes make it possible to track species changes by comparing his checklists from 1892 to current checklists. By linking these varied data sources and tools, the data life cycle can be completed. William Brewster was an ornithologist who worked during the late 19th and early 20th centuries. This poster shows how his field notes, digitized and deposited in the Biodiversity Heritage Library (BHL), can be linked with current data in the Encyclopedia of Life (EOL). This case example demonstrates how open science projects can be used to connect content from multiple sources to advance scientific and educational pursuits.
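The checklist comparison described above amounts to a set comparison once the names in both lists have been reconciled to the same taxonomy. A minimal sketch, assuming reconciled names: the function name, the result keys and the placeholder species labels are hypothetical and are not Brewster's actual 1892 lists or current EOL data.

```python
def compare_checklists(historical, current):
    """Compare two species checklists given as iterables of names."""
    historical, current = set(historical), set(current)
    return {
        "no_longer_reported": sorted(historical - current),
        "newly_reported": sorted(current - historical),
        "in_both": sorted(historical & current),
    }


# Placeholder labels only; real lists would use reconciled taxon names.
checklist_1892 = ["Species A", "Species B", "Species C"]
checklist_now = ["Species B", "Species C", "Species D"]

for category, names in compare_checklists(checklist_1892, checklist_now).items():
    print(category, names)
```

In practice the harder step is name reconciliation, since historical names in the notes (for example, the Pine Linnet recorded as Chrysomitris pinus) may differ from the names used in current checklists.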
William Brewster, ornithologist, 1851-1919. He was 15 years old when he made these notes, 14 when he began to jot down his observations.
Specimen digitization as part of CC
Hand-transcribed species lists for March and November 1892
Repurposing content and building relevance to now. Build a Brewster field guide for Cambridge in March and November in EOL (Marie/Tracy) so that comparisons can be made to current observations in Cambridge
Mention the iNaturalist tool for current info; still under development
At least for a couple of pages! NEXT step is the Biocaching App (under development): “what’s in my neighborhood” based on Global Biodiversity Information Facility (GBIF) specimen maps with links to field notes. Curated observations can then be shared with GBIF
So what next? Transcriptions! Crowdsourcing! Making it fun.
EOL connections
Purposeful Gaming & BHL: Clearly we need a better way to get transcriptions done and added to the BHL (a prototype is being worked on). MOBOT leads, with HUEML, Cornell and NYBG
The problem: handwriting and multi-column layouts
93,000 pages of seed catalogs to be added by Cornell and NYBG. Also ingesting content from the National Agricultural Library (a recently added affiliate of BHL)
Tools investigated in addition to DigiVol and FromThePage:
Transcribe Bentham (http://www.transcribe-bentham.da.ulcc.ac.uk/td/Transcribe_Bentham) – eliminated because the code is difficult to install and cannot export structured data
T-PEN (http://t-pen.org/TPEN/) – eliminated because it was the least user-friendly of all tools reviewed – steep learning curve; could not do bulk uploads of images
Transcribr (http://www.archives.gov/citizen-archivist/transcribe/) - eliminated because image import/data export functionality was lacking
Smithsonian Transcription Tool (https://transcription.si.edu/) – wanted to use this tool, but code was not available, and it was problematic to have Smithsonian host our content
Scripto (http://scripto.org/) - eliminated for various technical reasons, e.g. relies on zoom.it which is not well supported and didn’t work when tried. Harvard Library eliminated this tool as it wasn’t the most user-friendly
No tools supported multiple transcriptions of a single page, so it was decided to implement the two top tools to provide two transcriptions per page as required for the game. DigiVol came with an existing community of transcribers. Also, looking at two tools gives us the opportunity to evaluate and compare them as we think about selecting a permanent tool for transcription in the BHL portal.
Field notes page
A transcription: (2,000 pages of Brewster field books transcribed twice to provide fodder for the game.)
Tiltfactor was selected to develop games to reconcile differing transcriptions: 2 games, one for gamers and one for non-gamers.