iEvoBio Keynote: Frontiers of discovery with Encyclopedia of Life -- TRAITBANK

Frontiers of discovery with
Encyclopedia of Life
TraitBank research and other case studies
Cyndy Parr
Smithsonian Institution National Museum of Natural History
parrc@si.edu @cydparr http://www.slideshare.net/csparr

Central challenges
• What are all the organisms on the planet?
• What do we know about them?
• How can we build new knowledge about them?

GenBank
60 million DNA sequence records
900,000 species
4,000 genomes
How are these related to traits?

Phenomes: the next frontier
In Phenoscape
57 publications had 565,158 anatomical trait descriptions for 2,527 kinds of organisms
≈ 224 traits/organism
In ZFIN
38,189 trait descriptions for 4,727 genes for Zebrafish
1.9 million species on the planet
= LOTS OF TRAITS
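The ratios on this slide are plain divisions. A minimal sketch of the arithmetic follows; the projection to all 1.9 million species assumes every species is described as densely as a Phenoscape organism, which is an illustration only and not a claim made in the talk.

```python
# Figures copied from the slide.
phenoscape_descriptions = 565_158
phenoscape_organisms = 2_527
zfin_descriptions = 38_189
zfin_genes = 4_727
species_on_planet = 1_900_000

traits_per_organism = phenoscape_descriptions / phenoscape_organisms
traits_per_gene = zfin_descriptions / zfin_genes

# Assumption (illustration only): the Phenoscape density holds for every species.
projected_descriptions = traits_per_organism * species_on_planet

print(f"{traits_per_organism:.1f} trait descriptions per organism")
print(f"{traits_per_gene:.1f} trait descriptions per gene")
print(f"about {projected_descriptions / 1e6:.0f} million descriptions projected")
```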

Outline
• How is EOL different
• How EOL gets used
• Introducing TraitBank
• Loading up TraitBank
• EOL & TraitBank in research
• Future of TraitBank

Third party applications
How EOL is different

How EOL gets used
http://www.notesfromnature.org/

http://www.onezoom.org/ http://yanwong.me/

PhyloTiler
http://viburnum.peabody.yale.edu/~piel/Tree_4_color/
Links and images…what about research?

Search groups for
“EOL papers”
at Mendeley.com

Anatolia Zooarchaeology Case Study led by
Alexandria Archive Institute
1. 14 different sites
2. 34+ zooarchaeologists
3. Decoding, cleanup, metadata documentation
4. 220,000+ specimens
5. 450 entities linked to 143 EOL taxon concepts
6. Anatomical entities linked to Uberon.org
7. Biometrics linked to measurement ontology
8. Collaborative analysis
http://opencontext.org/
Kansa, E., Kansa, S. W., & Arbuckle, B. (2014). Publishing and Pushing: Mixing Models for Communicating Research Data in Archaeology. International Journal of Digital Curation, 9.

Page, R. D. M. (2013). BioNames: linking taxonomy,
texts, and trees. PeerJ, 1, e190. doi:10.7717/peerj.190
BioNames.org
Rod Page

But can we do more?
Introducing TraitBank

Search & Download
Data Sources
Data Summaries on
EOL Taxon Pages
Which plants grow well in
acidic soil?
What do water bears eat?
What is the biggest
species of whale?
Structured Data
TraitBank
JSON-LD API

• Numeric data
(measurements)
• Categorical data
(controlled vocabulary)
• Species interactions
• Mostly summaries for
populations, species
• Individual specimens
• Higher taxa
http://eol.org/traitbank
released January 2014
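A rough sketch of how the three kinds of data listed above could sit in one record type, and how a question such as "What is the biggest species of whale?" reduces to a filter and a maximum. The `TraitRecord` class, its field names, and the placeholder taxa and values are all hypothetical; they are not TraitBank's actual schema or data.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class TraitRecord:
    """Hypothetical record shape; not the real TraitBank schema."""
    taxon: str
    predicate: str
    value: Union[float, str]
    unit: Optional[str] = None          # set for measurements
    target_taxon: Optional[str] = None  # set for species interactions

    def kind(self) -> str:
        if self.target_taxon is not None:
            return "interaction"
        if isinstance(self.value, (int, float)):
            return "numeric"
        return "categorical"


# Placeholder taxa and made-up values, for illustration only.
records = [
    TraitRecord("Taxon A", "body mass", 1200.0, unit="kg"),
    TraitRecord("Taxon B", "body mass", 85.0, unit="kg"),
    TraitRecord("Taxon A", "habitat", "wetland"),
    TraitRecord("Taxon B", "eats", "Taxon C", target_taxon="Taxon C"),
]


def largest(recs: List[TraitRecord], predicate: str) -> Optional[TraitRecord]:
    """Return the numeric record with the greatest value (assumes shared units)."""
    numeric = [r for r in recs if r.predicate == predicate and r.kind() == "numeric"]
    return max(numeric, key=lambda r: r.value, default=None)
```

Calling `largest(records, "body mass")` returns the record for the placeholder "Taxon A"; a real query would also need to reconcile units before comparing values.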

TraitBank Data
glossary
http://eol.org/data_glossary

Term URIs from existing ontologies
bioportal.bioontologies.org
Subject Area | Ontology | Example terms
Statistics | Semanticscience Integrated Ontology (SIO) | mean, minimal value, standard deviation
Units of measure | Units of Measurement Ontology (UO) | meter, years, degree Celsius
Habitat information | Environments Ontology (EnvO) | wetland, desert, snow field
Attributes of organisms | Phenotype Quality Ontology (PATO) | aerobic, conical, evergreen
Plant attributes | Plant Trait Ontology | flower color, life cycle habit, salt tolerance
Animal attributes | Vertebrate Trait Ontology | body mass, total life span, onset of fertility
Animal natural history | Animal Natural History and Life History Ontology (ETHAN) | nocturnal, oviparous, scavenger

Term URIs from existing
ontologies
•Where necessary: request terms
•Last resort: create provisional terms with
http://eol.org/schema/terms/xxxx
•Still to do
• create “equivalentTo” or “similarTo” relations
• even more fancy inference
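The policy above (reuse an existing ontology term where one exists, fall back to a provisional EOL term as a last resort) can be sketched as a lookup with a fallback. The `KNOWN_TERMS` table uses example.org placeholder URIs rather than real ontology identifiers, and the scheme for naming a provisional term from its label is an assumption made for illustration; the slide only gives the namespace with an `xxxx` placeholder.

```python
import re
from typing import Tuple

# Hypothetical lookup table; real term URIs would come from the ontologies themselves.
KNOWN_TERMS = {
    "wetland": "http://example.org/envo/wetland",
    "body mass": "http://example.org/vt/body-mass",
}

# Namespace for provisional terms, as given on the slide.
PROVISIONAL_BASE = "http://eol.org/schema/terms/"


def provisional_uri(label: str) -> str:
    """Build a provisional URI from a label (naming scheme is an assumption)."""
    words = re.findall(r"[A-Za-z0-9]+", label)
    return PROVISIONAL_BASE + "".join(w.capitalize() for w in words)


def resolve(label: str) -> Tuple[str, str]:
    """Prefer an existing ontology term; otherwise mint a provisional one."""
    key = label.strip().lower()
    if key in KNOWN_TERMS:
        return KNOWN_TERMS[key], "existing"
    return provisional_uri(key), "provisional"
```

Returning the status alongside the URI keeps provisional terms easy to find later, when an ontology has accepted the requested term and the record can be re-pointed.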

JSON-LD
e.g. http://eol.org/api/traits/1045608?cache_ttl=2419200
Google Knowledge Graph
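A minimal sketch of building the request URL shown above and walking a JSON-LD document. The URL pattern and the `cache_ttl` parameter come from the slide; if that value is in seconds (an assumption), 2419200 is 28 days. The `SAMPLE` document, including its `@graph` layout and `ex:` property names, is hypothetical and stands in for a real response, whose shape is not shown in the talk.

```python
import json
from typing import Optional
from urllib.parse import urlencode

API_BASE = "http://eol.org/api/traits"  # pattern taken from the slide


def trait_url(page_id: int, cache_ttl: Optional[int] = None) -> str:
    """Build the traits URL for an EOL page id, optionally with cache_ttl."""
    url = f"{API_BASE}/{page_id}"
    if cache_ttl is not None:
        url += "?" + urlencode({"cache_ttl": cache_ttl})
    return url


# Hypothetical JSON-LD payload; the real response layout may differ.
SAMPLE = """{
  "@context": {"ex": "http://example.org/terms/"},
  "@graph": [
    {"@id": "ex:record1", "ex:predicate": "body mass", "ex:value": 12.5, "ex:unit": "g"},
    {"@id": "ex:record2", "ex:predicate": "habitat", "ex:value": "wetland"}
  ]
}"""


def graph_nodes(doc_text: str) -> list:
    """Return the nodes under @graph, or an empty list if the key is absent."""
    return json.loads(doc_text).get("@graph", [])


for node in graph_nodes(SAMPLE):
    print(node["@id"], node["ex:predicate"], node["ex:value"])
```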

TraitBank data sources
Sources include:
Databases
(OBIS, AnAge, Paleodb, Phenoscape)
Literature
(Dryad, Ecological Archives, Data tables)
Natural History Collections
(Label data)
Legacy/unpublished data
Loading up TraitBank

TraitBank
~7 million records
326 traits
1.2 million taxa
40+ datasets
http://eol.org/collections/97700

Text mining
Environments-EOL
Evangelos Pafilis, Hellenic Centre for Marine Research (HCMR), Institute of
Marine Biology, Biotechnology and Aquaculture (IMBBC), Crete, Greece
491,616 habitat terms for 136,548 taxa
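The slides do not describe how Environments-EOL tags text, so the following is only a generic dictionary-matching sketch of the idea of finding habitat terms in taxon descriptions. The three vocabulary entries are the EnvO example terms quoted earlier in this talk; the function name and the naive plural handling are assumptions.

```python
import re
from collections import Counter

# Tiny illustrative vocabulary (EnvO example terms from the ontology slide).
HABITAT_TERMS = ["wetland", "desert", "snow field"]


def tag_habitats(text: str, terms=HABITAT_TERMS) -> Counter:
    """Count whole-word matches of each term, allowing a trailing plural 's'."""
    counts = Counter()
    lowered = text.lower()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"s?\b"
        hits = re.findall(pattern, lowered)
        if hits:
            counts[term] = len(hits)
    return counts
```

A production tagger would also need synonym handling and disambiguation, which this sketch leaves out.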

Text mining
Automated annotation Manual annotation

Morphological Data from NMNH catalog
Abi Nishimura
Project: Clean up morphological data from the NMNH KE EMu catalog and publish to TraitBank
Goal: Make it easier to access and analyze this valuable morphological data
Sakurai Midori,
http://eol.org/data_objects/26918624
Raw data from Spectral Tarsier Tarsius tarsier
database search

RESULTS
•Primate data published (320 taxa)
•Comprehensive mammals data to be
published soon (4662 taxa)
•Bird catalog currently being mined
Wan Hong, http://eol.org/data_objects/29203274

Mineralization of tissue in
marine organisms
Jen Hammock with Steve Cairns
For modeling impacts of ocean acidification
143,000 records for 119,000 species and subspecies of Micro- and Macroalgae,
Cnidaria, Polychaetes, Bryozoans, Brachiopods, Sponges, Mollusks,
Echinoderms and Arthropods
Mineralized tissue =
●Biogenic silica
●Calcium carbonate
○ Calcite
○ and/or Aragonite

2013-14 EOL Rubenstein Fellows
EOL & TraitBank research
1. EnvO habitat terms (Pafilis et al.)
2. Altitude Specificity of Flower Coloration (Wright & Seltmann)
3. Morphological impacts of extinction risk in fish (Chang)
4. Butterfly-host plant associations (Ferrer-Paris et al.)
5. Taxon Tree Tool (Lin)
6. Global Biotic Interactions (GLoBI, Poelen & Mungall et al)
http://www.globalbioticinteractions.org/
7. Reol: An R interface for EOL (Banbury, O’Meara)
Banbury, B. L., & O’Meara, B. C. (2014). Ecology and Evolution, 4(12).
doi:10.1002/ece3.1109

Chang crowdsourcing
Jonathan Chang, UCLA
http://jonathanchang.org/
Amazon Mechanical Turk

NESCent-EOL-BHL Research Sprint
1. Character displacement across the Tree of Life
2. Illuminating the Dark Parts of the Tree of Life
3. Evolution in the usage of anatomical concepts in the biodiversity literature
4. Planning for global change: using species interactions in conservation
5. No place like home: Defining “habitat” for biodiversity science
6. Assessing risk status of Mexican amphibians
7. Quantifying color from digital imagery: color may determine species’ responses to habitat edges and to climate change
8. More is less - Identifying global trends in species’ niche width
9. Identifying key species traits associated with climate change vulnerability

Quantifying color from digital imagery
1. Automate processing of almost 300k images (of EOL’s 2.4 million)
2. Identify pinned specimen images
3. Process these for color and pattern information
4. Put this info into TraitBank
Elise Larsen, Yan Wong
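Step 3 above (processing specimen images for color information) could start from something as simple as averaging pixel colors and converting the result to hue, saturation and value. This is a generic sketch using only the standard library, not the sprint team's pipeline; it assumes pixels have already been extracted as 8-bit RGB tuples and that the background has been masked out.

```python
import colorsys
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def mean_rgb(pixels: Iterable[RGB]) -> Tuple[float, float, float]:
    """Average the red, green and blue channels over all pixels."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels supplied")
    n = len(pixels)
    return (
        sum(p[0] for p in pixels) / n,
        sum(p[1] for p in pixels) / n,
        sum(p[2] for p in pixels) / n,
    )


def summarize_color(pixels: Iterable[RGB]) -> Tuple[float, float, float]:
    """Return (hue in degrees, saturation 0-1, value 0-1) of the mean color."""
    r, g, b = mean_rgb(pixels)
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360, s, v
```

A mean color hides pattern entirely, so a fuller treatment would keep a hue histogram or per-region summaries instead of a single value.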

Illuminating the Dark Parts of the Tree of Life
Jessica Oswald, Karen Cranston, Gordon Burleigh, Cyndy Parr
1. Query EOL, GBIF, GenBank for number of records
2. Create score for amount of information available
3. Map score to phylogeny
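The three steps above can be sketched as follows. The slides do not give the scoring formula, so summing log-scaled record counts is an assumption chosen so that one very large repository does not swamp the others. The taxa, counts and tree are placeholders, and averaging child scores up the tree is likewise just one possible way to map scores to a phylogeny.

```python
import math

# Step 1 (stand-in): made-up record counts per repository for placeholder taxa.
record_counts = {
    "Taxon A": {"EOL": 120, "GBIF": 4500, "GenBank": 30},
    "Taxon B": {"EOL": 0, "GBIF": 12, "GenBank": 0},
    "Taxon C": {"EOL": 3, "GBIF": 0, "GenBank": 1},
}


def information_score(counts: dict) -> float:
    """Step 2: sum of log10(1 + n) over repositories (formula is an assumption)."""
    return sum(math.log10(1 + n) for n in counts.values())


scores = {taxon: information_score(c) for taxon, c in record_counts.items()}

# Step 3: a toy phylogeny as nested tuples; tips are taxon names.
tree = (("Taxon A", "Taxon B"), "Taxon C")


def clade_score(node) -> float:
    """Score a tip directly; score a clade as the mean of its children."""
    if isinstance(node, str):
        return scores[node]
    children = [clade_score(child) for child in node]
    return sum(children) / len(children)


darkest_tip = min(scores, key=scores.get)
```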

Global Genome Initiative Data Portal
For every family:
•Use TraitBank to assemble counts of records in repositories
•Compute a score (percentile) to assess knowledge available relative
to other families
•Make it easy to browse to find families that require effort
Beta launch end of June
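A minimal sketch of the percentile scoring described above, assuming the score for a family is the share of families with strictly fewer records. The family names, counts and the 25 percent threshold are placeholders; the portal's actual definition is not given in the slides.

```python
from bisect import bisect_left
from typing import Dict, List

# Made-up record counts for placeholder families.
family_counts = {"Family A": 5, "Family B": 250, "Family C": 40, "Family D": 0}


def percentile_scores(counts: Dict[str, int]) -> Dict[str, float]:
    """Percent of families with strictly fewer records than each family."""
    ordered = sorted(counts.values())
    n = len(ordered)
    return {name: 100.0 * bisect_left(ordered, c) / n for name, c in counts.items()}


def needs_effort(counts: Dict[str, int], threshold: float = 25.0) -> List[str]:
    """Families scoring below the threshold, least-known first."""
    scores = percentile_scores(counts)
    low = [name for name, s in scores.items() if s < threshold]
    return sorted(low, key=lambda name: scores[name])
```

Because the score is relative, it flags the least-documented families even when every family has some records.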

TraitBank future plans
• Decorate trees with traits
• NSF Genealogy of Life
• NSF Big Data
• NSF ABI Isotopes and Interactions
• Microsoft/WCMC Global Ecosystem Models

Leveraging social networks
Ahn, J., et al. (2012). Visually Exploring Social Participation in Encyclopedia of Life. In 2012 International Conference on Social Informatics (pp. 149–156). IEEE.
Rotman, D., et al. (2014). Motivations affecting initial and long-term participation in citizen science projects in three countries. In iConference 2014 Proceedings (pp. 110–124).
http://biotracker.umd.edu
• motivation model for citizen scientists
• international attitudes of scientists and
citizens to working together
• factors that increase curation network
activity
• currently working on motivations of EOL
content partners

Annotation of a specimen record
Ovary size and reproductive state
Age markers
Fat status
Body mass and other size
attributes

Annotation of an observation
record

For more information
• See & cite Parr, et al. 2014 Biodiv. Data Journal
• See our TraitBank paper (in review)
http://www.semantic-web-journal.net/content/traitbank-practical
• Open source code https://github.com/EOL/
• APIs at http://eol.org/api
• Become an EOL Curator

Take home messages
• EOL can be useful for research
• TraitBank is already awesome
• Mutualism between collections,
EOL, citizen science
• Let’s collaborate

Atlas of Living Australia • Biodiversity Heritage Library Consortium • Chinese
Academy of Sciences • La Comisión Nacional para el Conocimiento y Uso de la
Biodiversidad (CONABIO) • The Field Museum • Harvard University • El Instituto
Nacional de Biodiversidad (INBio) • Marine Biological Laboratory • Missouri
Botanical Garden • Muséum National d’histoire Naturelle • Naturalis Netherlands
• New Library of Alexandria • Smithsonian Institution • South African National
Biodiversity Institute • All of our content providers and curators
Steve Cairns • John Keltner • Katie Barker • Jonathan Coddington • Sean Brady •
Tom Orrell • Chris Meyers • Yan Wong • Jon Norenburg • Torsten Dikow • Yurong
He • Jenny Preece and others on BioTracker team • Pensoft Publishing • EOL
Science Advisory Board
Katja Schulz, Jen Hammock, Marie Studer, Jeff Holmes, Nathan Wilson, Patrick
Leary, Jeremy Rice, Lisa Walley, Bob Corrigan, Erick Mata, Dmitry Mozzherin, Abi
Nishimura • Sarah Miller • Anthony Goddard, Mark Westneat and former BioSynC
staff
http://eol.org @cydparr parrc@si.edu
Major Funding for TraitBank provided by the Alfred P. Sloan
Foundation. Fellows program supported by Daniel M.
Rubenstein, Research sprint by Richard Lounsbery Foundation.

Challenges
1. Terms are not in any existing ontology
e.g., seawater oxygen saturation, eutrophic pond, north-facing bluff
2. Synonyms are not included
e.g., vernal pond/intermittent pond
3. Standard classifications should be mapped
e.g., NatureServe, NOAA
4. Environment estimates vs. well-documented niche parameters
e.g., text mining results vs. NatureServe habitats, OBIS data vs. niche analyses

14 datasets with 25k taxa, 422k interactions, for 3k locations
alpha version of ingestion, normalization, aggregation
alpha version of web API
alpha version of data exports
GLoBI http://globalbioticinteractions.wordpress.com/
Jorrit Poelen, Chris Mungall, James Simon GoMexSi
