Oboyski ecn2013

Notes from Nature
Citizen Science data transcription

Peter Oboyski, Jun Ying Lim, Joyce Gross,
Chris Snyder*, Arfon Smith*, Joanie Ball,
Kip Will, Rosemary Gillespie
Essig Museum of Entomology
* Zooniverse Citizen Science Alliance

How does it work?
• Introduction to CalBug
• What is Zooniverse?
• What do we provide?
• What happens online?
• What do we get back?
• Technical issues
• Maintaining interest
• How can you get involved?

What is CalBug?
NSF - ADBC grant
Collaboration among the eight major
entomology collections in California
Digitize 1.2 million specimens
Essig Museum of Entomology
California Academy of Sciences
California State Collection of Arthropods
Bohart Museum, UC Davis
Entomology Research Museum, UC Riverside
San Diego Natural History Museum
Santa Barbara Museum of Natural History
LA County Museum

Stephen Dowlan

CalPhotos
MySQL database

Berkeley Mapper
http://calbug.berkeley.edu

Berkeley Natural History
Museums

• In development
– Integrating point data (specimen records) with
Habitat, Range maps, Elevation, Climate, etc.
– Historical recreation of the environment
– Predict potential impacts of environmental change
– Facilitate land use/management decisions

Digitization workflow

Handling & Imaging
• (Optional) Sort by locality, date, sex, etc.
• Remove labels, add unique identifier
• Take digital image, name and save file
• Replace labels, return to collection

Data Capture
• Manually enter data into MySQL database
• Online crowd-sourcing of manual data entry
• Optical Character Recognition (OCR) & automated data parsing
• Aggregate data in online cache

Data Manipulation
• Error checking
• Geographic referencing
• Temporospatial analyses

Why Image Labels?
• Magnify difficult-to-read labels
• Verbatim archive of label data
– Essential for proofing data
– Useful for taxonomists interested in label data

• Data capture can be done remotely

Digital camera tethered to computer
Average 50-55 images per hour,
including imaging, file renaming, and upload

Filename = EMEC218958 Paracotalpa ursina.jpg
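
The filename above appears to combine a catalog number with the taxon name. A minimal Python sketch of how such a name could be split back into its parts follows. The pattern (letters plus digits, one space, the taxon, a .jpg extension) is an assumption inferred from this single example, and `parse_image_name` is a hypothetical helper, not part of the CalBug tooling.

```python
import re
from typing import NamedTuple


class ImageName(NamedTuple):
    catalog_number: str
    taxon: str


# Assumed pattern: collection code (letters) + digits, one space,
# the taxon name, then a .jpg/.jpeg extension.
_PATTERN = re.compile(
    r"^(?P<catalog>[A-Z]+\d+) (?P<taxon>.+)\.jpe?g$", re.IGNORECASE
)


def parse_image_name(filename: str) -> ImageName:
    """Split an image filename into catalog number and taxon name."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    return ImageName(match.group("catalog"), match.group("taxon"))


parsed = parse_image_name("EMEC218958 Paracotalpa ursina.jpg")
print(parsed.catalog_number, "|", parsed.taxon)
```

Rejecting names that do not fit the pattern, rather than guessing, keeps badly named files from entering the pipeline unnoticed.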

Slide Scanning
Average 150 slides per hour,
including scan, file renaming, and upload

400 DPI
Seems to provide high enough resolution for difficult-to-read labels
while keeping file size relatively small

But not high enough resolution for taxonomic work

Using Citizen Scientists
to transcribe label data

http://www.notesfromnature.org/

Launched April 22, 2013

Images in → Transcriptions out
• We supply jpeg images
– 400 DPI (300 DPI is good enough)
– Deposited as zip file
– Stored in Amazon Cloud

• In development
– Automated service to
upload images to Amazon Cloud
– Ability to prioritize
image sets

• Zooniverse provides
– MongoDB data dump
– 1 record = 1 transcription
– 4 transcriptions / image

• In development
– Automated daily dump
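
Because each record in the dump is one transcription and each image receives four, a first processing step is to group records by image. The Python sketch below illustrates that step under stated assumptions: the records are taken to be already exported as dictionaries, and the field names (`image_id`, `county`) and image identifiers are hypothetical, since the actual dump schema is not shown here.

```python
from collections import defaultdict


def group_by_image(records, key="image_id"):
    """Collect transcription records under the image they describe."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record)
    return dict(grouped)


def images_needing_more(grouped, expected=4):
    """List images with fewer than the expected number of transcriptions."""
    return sorted(
        image for image, recs in grouped.items() if len(recs) < expected
    )


# Hypothetical records, one per transcription (field names are illustrative).
records = [
    {"image_id": "image-A", "county": "Alameda"},
    {"image_id": "image-B", "county": "Marin"},
    {"image_id": "image-A", "county": "Alameda"},
    {"image_id": "image-A", "county": ""},
    {"image_id": "image-B", "county": "Marin"},
    {"image_id": "image-A", "county": "Alameda"},
]

grouped = group_by_image(records)
print({image: len(recs) for image, recs in grouped.items()})
print("Still incomplete:", images_needing_more(grouped))
```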

Reconciling transcriptions
• Drop down lists (Country, State, County, Date)
are compared for exact match
– Occasionally missing, sometimes wrong
– Majority rule

• Free-form text fields (Locality, Collectors)
are much more problematic
– Transcribers asked to record label data verbatim
– Punctuation, capitalization, spacing between words
– Misspellings, expanded abbreviations, interpretations
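
The majority rule for drop-down fields can be sketched in a few lines of Python. This is an illustration only: the project's own reconciliation scripts are written in R and are not reproduced here. The function name, the treatment of empty values as missing, and the "support" figure are all assumptions made for the example.

```python
from collections import Counter


def majority_value(values):
    """Return (winner, support) for one drop-down field of one image.

    Empty or None values count as missing. The winner must hold a strict
    majority of the non-missing values; otherwise the winner is None.
    Support is the share of ALL transcriptions that agree with the top value.
    """
    present = [v for v in values if v]  # drop missing entries
    if not present:
        return None, 0.0
    top, count = Counter(present).most_common(1)[0]
    support = count / len(values)
    if count * 2 <= len(present):  # tie or no strict majority
        return None, support
    return top, support


# Four transcriptions of the State field: one missing, one wrong.
print(majority_value(["California", "California", "", "Nevada"]))
# A two-against-two split yields no winner and needs human review.
print(majority_value(["Alameda", "Marin", "Alameda", "Marin"]))
```

Returning no winner on a tie, instead of picking one arbitrarily, leaves ambiguous records visible for manual vetting.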

Reconciling transcriptions
• Developing scripts in R to reconcile free-form text

• Text matching for maximum correspondence among
multiple transcriptions (cf. DNA alignment methods)
• Final result = 1 transcription in our database,
marked as a Citizen Science transcribed record,
with links to the 4 original transcriptions
• Vetting by CalBug personnel still necessary, but we can
prioritize based on record-matching confidence scores
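
As a rough illustration of free-form text reconciliation with a confidence score, the Python sketch below normalizes each transcription, scores every candidate by its mean similarity to the others, and keeps the most central one. This is a stand-in technique using the standard library's `difflib`, not the alignment-style R scripts described above. The normalization rules and the scoring are assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Casefold, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def reconcile(transcriptions):
    """Pick the transcription most similar to all the others.

    Returns (chosen_text, confidence), where confidence is the mean
    similarity (0 to 1) between the chosen text and the remaining ones.
    A lone transcription gets confidence 0.0: nothing corroborates it.
    """
    if not transcriptions:
        raise ValueError("need at least one transcription")
    normalized = [normalize(t) for t in transcriptions]
    best_index, best_score = 0, -1.0
    for i, candidate in enumerate(normalized):
        others = [n for j, n in enumerate(normalized) if j != i]
        if others:
            score = sum(
                SequenceMatcher(None, candidate, other).ratio()
                for other in others
            ) / len(others)
        else:
            score = 0.0
        if score > best_score:  # first candidate wins ties
            best_index, best_score = i, score
    return transcriptions[best_index], best_score


text, confidence = reconcile(
    ["Strawberry Canyon", "Strawberry Canyon", "Strawbery Canyon", "xyz"]
)
print(text, round(confidence, 2))
```

Sorting reconciled records by ascending confidence would then put the least certain ones at the top of the vetting queue.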

Generating & Maintaining Interest
[Chart: Number of Notes from Nature transcriptions for CalBug]

• Popular media, social media, and press releases
– Only so many occasions for a press release

• Campaigns
– Highlight particular taxa, habitats, geographic regions

• Education
– High-quality, high-resolution photo of the species being transcribed
– Create links to other services to learn more about species

• Competitions
– Prizes are worth more than badges
– However, need to watch for bad data entered in pursuit of a prize

How can you get involved?
• Right now you cannot
• iDigBio is interested in getting involved
• iDigBio is hosting a hackathon in December
• Begin building up collections of images

Thank you
And a HUGE thank you to the
CalBug Army
who image our specimens
Chris Amy, Maritess Aristorenas, Jazmin Calderon, Alex Carolina, Sonia Castillo, Matthew Chan, Sabina Cook, Alex Darwish, John Davie, Jesson Go, Nick
Grady-Grote, Ginger Haight, Laura Hayes, Dennis Ho, Aubrey Huey, Leah Humphreys, Veronica Hurd, Hanna Huynh, Eseosa Igbinedion, Ilona Istenes, Emma
Kohlsmith, Asia Kwan, Tiffany Kyo, Jerry Lee, Ken Lee, Christina Lew, Maggie Lewis, Alex Lim, Derick Matano, Christian Munevar, Frank Ngo, Kent Nguyen,
Minh Nguyen, Riley O'Brien, Marielle Pinheiro, Rammonhan Reddy, Jessica Rothery, Stacey Rutherford, Anna Szendrenyi, Anni Sheh, Hannah Shin, Erika So,
Mee Thao, Cindy Truong, Darleen Tu, Skyler Valle, Daug Vaughn, Hayden Wong, Yiu Kei Wong, Keane Yang, Kevin Yao, Frances Zhang
