An introduction to EOL (http://www.eol.org) and some of the challenges and possible applications for structured, semantic information about biological organisms. Presented at the kick-off meeting of the NSF-funded Phenotype Ontology Research Coordination Network.
Challenge of Semantics for the Encyclopedia of Life
1. Challenges for semantics in EOL. Phenotype Ontology RCN, NESCent, 25 February 2011. Cynthia Parr, National Museum of Natural History, Smithsonian Institution
13. Statistics: 2.8 million pages, one (or more) per taxon; 2 million data objects; 500 thousand pages with objects; 100+ partner databases; 700 curators, 1000s of contributors, ~46,000 members.
18. We have an infrastructure: aggregation mechanisms; names resolution; curation mechanisms; public and machine interfaces. Version 2 (August): vastly improved support for community interaction. Version 3: (???)
So, the approach of EOL is rather different from that of many other sites. EOL is a giant mashup that creates pages that are then available for curators to assess and rate, or for anybody to comment on or tag.
Objects such as these are essentially chunks of text sorted by topic. Each of these credits its source, and can receive comments or ratings, or be marked trusted or untrusted by curators.
From this page from LepTree: Epipyropidae, planthopper parasites.
Given this scale, I think this was the ONLY way we could start. Imagine how large an ontology we’d need to fully describe organisms ranging from this tiny pelagic diatom, 50 microns long; to whales, in this case a humpback, many orders of magnitude larger, also pelagic, but physiologically and morphologically quite different; to Pixie's Parasol, a saprophytic organism with a complex life cycle (note the collembolans on it). An animal like a humpback is characterized in Animal Diversity Web by an ontology of about 400 concepts, which just scratches the surface. Similarly, we characterized this saturniid moth in the LepTree project with a few hundred more concepts, some of which overlap with the whale’s but most of which don’t. The ontologies discussed here are on the order of 5,000 to 70,000 concepts. Think about what kind of characters you’d need to characterize this halobacterium, an archaean! But a scientist studying food webs might want to know characteristics across a wide swath of life.
This represents about 2,200 projects and 1,000 instances of data flow or hyperlinks between them. There are hundreds of partners, each with their own ontology (in many cases for good reason!), and you can see that the ontology space itself, much less the way you … Most of these are NOT using ontologies.
One of the things that may be valuable about EOL is the ability to assess the amount of information available for a group of taxa. Family Corvidae, shown here with the hooded crow, is where I curate. It has reasonably rich content, with 74% of pages having some text, though only 27% have images. There are also a large number of unreviewed images (from Wikipedia and Flickr) and text objects (mostly from Wikipedia) that I am working through. This could be expanded to highlight gaps in what we know about organisms, for example which areas of biology lack information. It could be used by funding agencies to prioritize grants, or by students deciding what needs to be studied. It might also show how to find content summaries on current pages.
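The coverage figures above are simple proportions over the pages in a taxonomic group. A minimal sketch of how such a summary and a list of gaps might be computed, assuming a hypothetical per-page record (the TaxonPage fields are illustrative, not EOL's actual schema or API):

```python
from dataclasses import dataclass

@dataclass
class TaxonPage:
    """Hypothetical summary of one taxon page; fields are illustrative, not EOL's schema."""
    name: str
    text_objects: int    # trusted text objects on the page
    image_objects: int   # trusted images on the page
    unreviewed: int = 0  # objects (e.g. from Wikipedia or Flickr) awaiting curator review

def coverage(pages):
    """Share of pages with any text, share with any images, and the total review backlog."""
    if not pages:
        return {"with_text": 0.0, "with_images": 0.0, "unreviewed": 0}
    n = len(pages)
    return {
        "with_text": sum(p.text_objects > 0 for p in pages) / n,
        "with_images": sum(p.image_objects > 0 for p in pages) / n,
        "unreviewed": sum(p.unreviewed for p in pages),
    }

def gaps(pages):
    """Names of pages with neither text nor images: candidate knowledge gaps."""
    return [p.name for p in pages if p.text_objects == 0 and p.image_objects == 0]
```

With toy data (four made-up pages, "taxon A" through "taxon D", where two have text and two have images), coverage() reports 0.5 for both shares. The same counting could be broken down by topic (reproduction, habitat, and so on) to show which areas of biology lack information.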
These are not biologically relevant concepts, but it is a start.
This is hand-wavy; we aren’t actually doing this just yet, but we could. Note that by referring to the URIs for the concepts, we can take advantage of the relationship assertions among the terms without having to manage them ourselves. So these might be pointers to the EQ statements described earlier, with enough information here to display to humans, but also enough that scientists and ontologists have the formalisms needed for reasoning.
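One way to picture this: a page stores only the URI and a display label for each concept in an entity-quality (EQ) statement, while the relationships among terms stay in the external ontology. A minimal sketch under that assumption; the classes and all example.org URIs are hypothetical placeholders, not real ontology terms or an EOL data model:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    uri: str    # resolved by an external ontology, which manages relationships among terms
    label: str  # human-readable label shown on the page

@dataclass(frozen=True)
class EQAnnotation:
    """Hypothetical EQ statement attached to a taxon page."""
    taxon: str
    entity: Concept
    quality: Concept
    source: str  # credit the contributing partner, as for other data objects

    def display(self) -> str:
        """Human-facing rendering: labels only."""
        return f"{self.taxon}: {self.entity.label} {self.quality.label} (source: {self.source})"

    def machine_view(self) -> dict:
        """Machine-facing rendering: URIs a reasoner can follow."""
        return {
            "taxon": self.taxon,
            "entity": self.entity.uri,
            "quality": self.quality.uri,
            "source": self.source,
        }
```

The design point is that the same record serves both audiences: display() hides the formalism, while machine_view() exposes the URIs without EOL having to copy the ontology's term hierarchy.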
Let’s say we figure out HOW to do it; should we do it?
Good for the general public, to the extent that the concepts have understandable labels. These are from the Animal Diversity Web; put these in the reproduction part of the page, along with any other reproduction data we get from other sources. Some problems: some of our audiences aren’t interested in the fine detail, but you never know… how do you decide what to hide?
For scientists, let them download or access the data, providing not only the source the information came from but also machine-readable URIs that define the concepts, so that they can integrate and analyze the data. Download data like this, combine it with a phylogeny of rodents, and you might be able to test evolutionary hypotheses. Middleman.
If querying interfaces or APIs are not your thing, we could easily make the whole web page browsable by semantic web browsers. You could do whatever you want with that.
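Semantic web browsers consume RDF. The talk does not say which serialization EOL would use, so as an illustration only, here is a minimal sketch that writes page statements as N-Triples, one simple RDF format, using just the standard library. The rdfs:label and Dublin Core dcterms:source IRIs are real vocabulary terms; the page and partner IRIs are hypothetical placeholders:

```python
from dataclasses import dataclass
from typing import Iterable, Tuple, Union

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DCTERMS_SOURCE = "http://purl.org/dc/terms/source"

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    value: str  # plain string literal (no datatype or language tag in this sketch)

Term = Union[IRI, Literal]

def term(t: Term) -> str:
    """Render one RDF term in N-Triples syntax."""
    if isinstance(t, IRI):
        return f"<{t.value}>"
    # Escape backslashes first, then quotes and line breaks.
    escaped = (t.value.replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n")
               .replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triples: Iterable[Tuple[IRI, IRI, Term]]) -> str:
    """Serialize (subject, predicate, object) triples, one statement per line."""
    return "".join(f"{term(s)} {term(p)} {term(o)} .\n" for s, p, o in triples)
```

Typing terms explicitly as IRI or Literal avoids guessing from string contents, so a text value that merely looks like a URL is still emitted as a literal.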
Most ambitious, pie in the sky
Informatics for evolution, systematics, and biodiversity