This is a brief overview of a historical collection of stomach content cards put together by Elizabeth Manning located at the USGS Patuxent Wildlife Research Center. A consortium of volunteers and scientists is developing a plan to get these cards scanned and ultimately databased and made available to researchers and others interested in natural history. The cards record about 250,000 dissections of the stomachs of birds, mammals, reptiles, and amphibians in North America.
2. Goals for the Historical Food Habits project
a) Safeguard data (electronic storage)
• Scan all cards to hard drives (pdf image files)
b) Safeguard cards (physical storage)
• After scanning, move cards to safe long-term storage
c) Transform currently inaccessible data into globally accessible, usable
information (i.e. a searchable online database)
3. Data Fields for each card
Header (general data: easy to read and comprehend):
• Biological data for the collected specimen (genus, species; sex and age in some
cases)
• Card number (accession number)
• Locality (town, state)
• Where killed (habitat)
• Date (month, day, year)
• Hour (time of death)
Food Habits (‘stomach’ contents - specialist data):
• Characterized contents (either percentages or counts of specific plant and animal
material, e.g. Gastropoda, C. florida)
• Collector (name and sometimes a collector number)
• Condition of stomach/gullet (how full)
• Percentage of:
- animal matter
- vegetable matter
- gravel, rubbish, etc.
• Examination date, examiner name
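The field list above could be captured as a record type along the following lines. This is a minimal sketch, not the project's actual schema: the class names `Card` and `FoodItem`, every attribute name, and the choice to make most fields optional are assumptions made for illustration, and the sample values are invented rather than transcribed from a card.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class FoodItem:
    """One line of characterized contents (specialist data)."""
    taxon: str                       # e.g. "Gastropoda" or "C. florida"
    percent: Optional[float] = None  # used when the card gives a percentage
    count: Optional[int] = None      # used when the card gives a count


@dataclass
class Card:
    # Header (general data)
    card_number: int                 # accession number
    genus: str
    species: str
    locality: str                    # town, state as written on the card
    habitat: Optional[str] = None    # "where killed", kept verbatim
    collected: Optional[date] = None
    hour: Optional[str] = None       # time of death as written
    sex: Optional[str] = None        # present only on some cards
    age: Optional[str] = None        # present only on some cards
    # Food habits (specialist data)
    collector: Optional[str] = None
    stomach_condition: Optional[str] = None  # how full
    pct_animal: Optional[float] = None
    pct_vegetable: Optional[float] = None
    pct_gravel_etc: Optional[float] = None
    examined: Optional[date] = None
    examiner: Optional[str] = None
    contents: List[FoodItem] = field(default_factory=list)


# Illustrative values only; not transcribed from a real card.
card = Card(card_number=1, genus="Hylocichla", species="mustelina",
            locality="Example Town, PA")
card.contents.append(FoodItem(taxon="Gastropoda", percent=10.0))
```

Keeping the header fields and the food habits fields in one record but allowing the specialist fields to stay empty reflects the fact that sex, age, and the detailed contents are not present, or not yet transcribed, for every card.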
4. Example cards, with entertaining oddities
- Card number (42444) is high for the very early collection date (1876)
- Is this an error? There are other cards with similar dates/numbers
- 24 years elapsed between sample collection and food contents analysis (1910). That’s
a long time to float in formalin!
- Can YOU read Beal’s handwriting?
6. Feasibility trial to evaluate data entry
• Select high-profile species (Wood Thrush) with representative card
complexity DONE
• Review summary sheets to outline extent of dataset DONE
• Scan cards DONE, pdf format
• Enter header data DONE, via Excel
• Review food habits data entered via Access DONE*
• Share resultant dataset with Consortium colleagues to determine
best:
• online host for ~250,000 pdfs (est. 75 GB) for transcription
• data entry system for generalist header data transcribed from pdfs
• data entry system for specialist food habits data transcribed from pdfs
* Thanks to Haas, Gorman et al.
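As a rough sizing check on the hosting estimate above, the figures of ~250,000 pdfs and 75 GB imply an average of about 300 KB per scanned card. The sketch below assumes decimal units (1 GB = 1,000,000 KB); the function name `estimated_mb` is hypothetical, and the result is only an average, not a measured file size.

```python
TOTAL_CARDS = 250_000   # approximate card count from the slides
TOTAL_GB = 75           # estimated total size from the slides

# Average size per scanned card, in KB (decimal units: 1 GB = 1,000,000 KB).
AVG_KB_PER_CARD = TOTAL_GB * 1_000_000 / TOTAL_CARDS


def estimated_mb(n_cards: int) -> float:
    """Estimated storage in MB for n_cards scans at the average size."""
    return n_cards * AVG_KB_PER_CARD / 1_000


print(AVG_KB_PER_CARD)    # 300.0
print(estimated_mb(180))  # 54.0
```

At that average, a single-species subset of 180 cards would come to roughly 54 MB, small enough to share with colleagues by ordinary means while a host for the full set is being chosen.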
7. Hylocichla mustelina (Wood Thrush)
Submissions by state (bar chart, draft; y-axis 0 to 60)
Total = 180
Gender: Unknown 81, Female 38, Male 61
Categories on x-axis: PA, WI, Canada, CT, FL, IL, KS, MA, VA, ME, SC, NY, AL, GA, MI, NC, NJ, TX, MS, DC, TN
9. Citations collected during HFHabits study demonstrate its
perspective: how do these species affect humans?
Birds of CT – Bishop - 1913
1912
10. Header field issues affecting data entry system design
Header (general data):
• Biological data:
• Species name changes (Turdus mustelina became Hylocichla mustelina)
• Gender noted ~ 50% of the time; age rarely (juvenile/immature or adult)
• Card number (accession).
• Never omitted. Number unrelated to species, date, or location. (Collectors were
likely handed out-of-sequence stacks of cards whenever they ran out.)
• Locality (town, state)
• Rarely omitted, but names are inconsistent.
• Where killed (habitat).
• Terminology is neither specific nor consistent, e.g.: woods, woodland and field, upland
grove, timber, cedar grove, rocky woods, oak grove, thicket, low woods, hemlock
woods, hardwoods, near woods, deep woods, deciduous woods, near woodlands.
• Date (month, day, year)
• Consistently included.
• Hour (time of death)
• Included much more often than gender of sample unfortunately.
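One way a data entry system might cope with the name changes and inconsistent habitat wording listed above is to store the text exactly as written and derive a normalized value alongside it. The sketch below is an illustration under stated assumptions: the synonym table holds only the one name change given in the slides, and the habitat categories ("woodland", "grove", "thicket", "unclassified") and keyword rules are hypothetical, not a vocabulary the project has adopted.

```python
# Old name -> current name. Only the change named in the slides is listed.
SPECIES_SYNONYMS = {"turdus mustelina": "Hylocichla mustelina"}

# Hypothetical keyword rules: first matching keyword decides the category.
HABITAT_KEYWORDS = [
    ("wood", "woodland"),
    ("timber", "woodland"),
    ("grove", "grove"),
    ("thicket", "thicket"),
]


def current_name(name_as_written: str) -> str:
    """Return the current species name, or the cleaned-up input if unknown."""
    cleaned = " ".join(name_as_written.split())  # collapse stray whitespace
    return SPECIES_SYNONYMS.get(cleaned.lower(), cleaned)


def habitat_category(habitat_as_written: str) -> str:
    """Map a verbatim 'where killed' entry to a coarse category."""
    text = habitat_as_written.lower()
    for keyword, category in HABITAT_KEYWORDS:
        if keyword in text:
            return category
    return "unclassified"


for verbatim in ["Rocky woods", "oak grove", "hardwoods", "timber"]:
    print(verbatim, "->", habitat_category(verbatim))
```

The verbatim entry should still be kept in its own field, since a coarse category discards detail (for example "hemlock woods" versus "deciduous woods") that a later researcher may want.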
11. Gordian knot approach for data entry
To transform currently inaccessible data into globally accessible, accurately
transcribed, and usable information:
Separate the entry process into two independent but linked parts:
1. Header - generalist data fields
a) Themselves valuable research data
b) Predominantly clearly written and straightforward, therefore easy to enter (can be
read, entered, and proofread by just about anybody)
c) Once entered online they characterize the entire database, opening it to any investigator
including specialists
d) Focus on these fields first
2. Food habits - specialist data fields
a) Predominantly hard to read, and require specialist (entomologist, botanist, etc.)
expertise to interpret and enter online
b) Principal investigators focused on this area of research are the most likely to commit
resources (grants, grad students, etc.) to wringing out the information; we don’t have the
resources
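The two independent but linked parts described above can be sketched as two tables joined on the card number, so that header rows can be entered first and food habits rows added later by specialists. This is a minimal illustration using Python's built-in sqlite3 module; the table names, column names, and sample rows are all assumptions made for the example, not the project's design or real card data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the link between the tables
conn.executescript("""
CREATE TABLE header (                 -- generalist data, entered first
    card_number INTEGER PRIMARY KEY,  -- accession number, never omitted
    species     TEXT NOT NULL,
    locality    TEXT,
    habitat     TEXT,
    pdf_file    TEXT                  -- scanned image of the card
);
CREATE TABLE food_item (              -- specialist data, entered later
    id          INTEGER PRIMARY KEY,
    card_number INTEGER NOT NULL REFERENCES header(card_number),
    taxon       TEXT NOT NULL,
    percent     REAL,
    entered_by  TEXT
);
""")

# Illustrative rows only; not transcribed from real cards.
conn.executemany(
    "INSERT INTO header VALUES (?, ?, ?, ?, ?)",
    [(1001, "Hylocichla mustelina", "Example Town, PA", "woods", "card_1001.pdf"),
     (1002, "Hylocichla mustelina", "Example Town, NY", "thicket", "card_1002.pdf")],
)
conn.execute(
    "INSERT INTO food_item (card_number, taxon, percent, entered_by) "
    "VALUES (?, ?, ?, ?)",
    (1001, "Gastropoda", 10.0, "specialist_a"),
)


def cards_awaiting_specialist(db: sqlite3.Connection) -> list:
    """Card numbers that have header data but no food habits rows yet."""
    rows = db.execute("""
        SELECT h.card_number
        FROM header h
        LEFT JOIN food_item f ON f.card_number = h.card_number
        WHERE f.id IS NULL
        ORDER BY h.card_number
    """).fetchall()
    return [row[0] for row in rows]


print(cards_awaiting_specialist(conn))  # [1002]
```

With this split, the header table is searchable as soon as it is filled in, and a specialist can query for the cards of interest and work only on those.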
12. Please let us know:
Thoughts at this point on the best …
• Online host for scanned images and linked evolving database?
• American Bird Network? SORA? Biodiversity Universe?
• Data entry system linked with scanned cards for both:
1. Header data (generalist input)
2. Food habits data (specialist input)
• Access?
• Zooniverse and its Scribe system?
• Other?