This document discusses expressing phylogenetic claims through taxonomic data exchange. It argues that a taxonomy is a collection of claims about biological relationships, not merely data. It introduces terminology for stating those claims: a taxon is a set determined by a membership rule, and a taxonomy is a hierarchy of taxa. It stresses designating taxa unambiguously whenever a claim is made. It then proposes ways to exchange corrections and alignments, and to reason with logical relationships between taxa. Throughout, the focus is on using logic and clear terminology to express phylogenetic science rather than merely representing data.
2. Synergy
CoL, IRMNG, NCBI, GBIF, EOL, Union4, Treebase, OpenTree, ...
Finding inconsistencies = good
but hard
Collecting information is useful
3. 'Data' – BAH!
'data' 'information' 'representation'
'format' 'nomenclature' - how bland.
Distracting.
Claims, not data. Consequential.
4. Terminology
Taxon: a set determined by a membership rule.
['taxon concept']
Character based
Descent based
Conspecificity based
Taxonomy: a collection of taxa that form a hierarchy.
Some taxonomies are phylogenetic (all clades).
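A minimal sketch of "a set determined by a membership rule", assuming toy organism records. The `Taxon` class, the `has_hair` and `ancestors` fields, and the ancestor id `a1` are all hypothetical and exist only for illustration. It shows one character-based rule and one descent-based rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# A toy organism record; the fields used below are hypothetical.
Organism = Dict[str, object]


@dataclass(frozen=True)
class Taxon:
    """A taxon as a set determined by a membership rule (hypothetical class)."""
    label: str
    rule: Callable[[Organism], bool]

    def members(self, universe: List[Organism]) -> FrozenSet[str]:
        # The taxon's extension: every organism in the universe satisfying the rule.
        return frozenset(o["id"] for o in universe if self.rule(o))


# Character-based rule: membership follows from an observed trait.
hairy = Taxon("hairy", lambda o: o["has_hair"])

# Descent-based rule: membership follows from descent from ancestor "a1".
# "ancestors" is a hypothetical precomputed set of ancestor ids.
descendants_of_a1 = Taxon("descendants of a1", lambda o: "a1" in o["ancestors"])

universe: List[Organism] = [
    {"id": "cat", "has_hair": True, "ancestors": {"a0", "a1"}},
    {"id": "platypus", "has_hair": True, "ancestors": {"a0", "a1"}},
    {"id": "lizard", "has_hair": False, "ancestors": {"a0"}},
]

print(sorted(hairy.members(universe)))              # ['cat', 'platypus']
print(sorted(descendants_of_a1.members(universe)))  # ['cat', 'platypus']
```

In this toy universe the two rules pick out the same organisms, yet they remain different rules. Whether two differently defined taxa coincide is itself a claim that can be right or wrong.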
5. Taxonomies are collections of claims
[Figure: a tree with X at the root and A, B, C as its children]
X includes A, B, and C
A, B, C are mutually disjoint
X, A, B, and C are clades - if phylogenetic.
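The three claims above can be read off any hierarchy mechanically. A minimal sketch, assuming the taxonomy is given as a plain parent-to-children mapping (a hypothetical layout) and claims are written as tuples:

```python
from typing import Dict, List, Tuple

Claim = Tuple[str, ...]


def claims_from_hierarchy(children: Dict[str, List[str]],
                          phylogenetic: bool = False) -> List[Claim]:
    """Read off the claims a parent -> children hierarchy makes."""
    claims: List[Claim] = []
    for parent, kids in children.items():
        # "X includes A, B, and C"
        claims.extend(("includes", parent, kid) for kid in kids)
        # "A, B, C are mutually disjoint"
        if len(kids) > 1:
            claims.append(("disjoint", *kids))
    if phylogenetic:
        # "X, A, B, and C are clades" (only if the taxonomy is phylogenetic)
        taxa = set(children)
        for kids in children.values():
            taxa.update(kids)
        claims.extend(("clade", t) for t in sorted(taxa))
    return claims


for claim in claims_from_hierarchy({"X": ["A", "B", "C"]}, phylogenetic=True):
    print(claim)
```

The `clade` claims are emitted only when the caller asserts that the taxonomy is phylogenetic, mirroring the "if phylogenetic" condition on the slide.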
6. The important claims are about biology
X includes Y
X1, X2, X3, … are mutually disjoint
X is a clade
X is a species
7. We have to designate taxa somehow when we express a claim
Many taxon names are polysemous
To be clear, always say 'in the sense of' some static document (article or database snapshot):
X = Mammalia sensu http://dx.doi.org/10.1126/science.1211028
If a name is used in multiple ways within one document, give further qualification
Claims about taxa
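One way to make "always say 'in the sense of'" hard to forget is to make the designator a value that cannot exist without its source. A minimal sketch; the `Sensu` class and its optional `qualifier` field are hypothetical, and the NCBI snapshot id is the one used on the alignment slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sensu:
    """A taxon designator: a name in the sense of a static source (hypothetical)."""
    name: str
    source: str                      # article DOI or database snapshot identifier
    qualifier: Optional[str] = None  # extra qualification if the source uses the name several ways

    def __str__(self) -> str:
        text = f"{self.name} sensu {self.source}"
        return f"{text} ({self.qualifier})" if self.qualifier else text


x = Sensu("Mammalia", "http://dx.doi.org/10.1126/science.1211028")
y = Sensu("Mammalia", "NCBI.20140515")

print(x)       # Mammalia sensu http://dx.doi.org/10.1126/science.1211028
print(x == y)  # False
```

Note that `x != y` only says the two designators differ. Whether they designate the same taxon is a separate claim about the world, not something equality on designators can settle.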
8. Reasoning with claims
X includes Y and Y includes Z
→ X includes Z
X includes Y
→ X and Y are not disjoint
X and Y are clades →
one includes the other, or they are disjoint
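The three rules above can be applied mechanically to a set of claims. A minimal sketch, assuming inclusion and disjointness claims are stored as pairs of designators and that every taxon is non-empty (without that assumption, "X includes Y" would not rule out "X and Y are disjoint"). The function names are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def close_includes(includes: Set[Pair]) -> Set[Pair]:
    """X includes Y and Y includes Z -> X includes Z (transitive closure)."""
    closed = set(includes)
    changed = True
    while changed:
        changed = False
        for x, y in list(closed):
            for y2, z in list(closed):
                if y == y2 and (x, z) not in closed:
                    closed.add((x, z))
                    changed = True
    return closed


def conflicts(includes: Set[Pair], disjoint: Set[Pair]) -> Set[frozenset]:
    """X includes Y -> X and Y are not disjoint; report pairs claimed both ways."""
    closed = close_includes(includes)
    return {frozenset(p) for p in disjoint
            if p in closed or (p[1], p[0]) in closed}


def clade_relation(x: str, y: str, closed: Set[Pair], disjoint: Set[Pair]) -> str:
    """For two clades, one includes the other or they are disjoint."""
    if (x, y) in closed or (y, x) in closed:
        return "nested"
    if (x, y) in disjoint or (y, x) in disjoint:
        return "disjoint"
    return "undetermined"  # one of the two must hold; the claims don't say which


includes = {("X", "Y"), ("Y", "Z")}
closed = close_includes(includes)
print(("X", "Z") in closed)                     # True
print(conflicts(includes, {("X", "Z")}))
print(clade_relation("X", "Z", closed, set()))  # nested
```

A non-empty result from `conflicts` means that the claim set as a whole cannot be true, which is exactly the kind of inconsistency that is useful to find when combining sources.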
9. Two ways to be wrong
Wrong about designation
Wrong about science
10. 'Alignment' = estimating coreference
Alignment claims:
X = Y (X and Y are the same taxon)
Mammalia sensu http://dx.doi.org/10.1126/science.1211028 =
Mammalia sensu NCBI.20140515
Heuristics based on properties and relations (including names...)
Manual 'curation' if necessary
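A minimal sketch of one possible alignment heuristic, assuming each source is reduced to a hypothetical `id -> (name, parent name)` mapping. A name match backed by a matching parent name yields a proposed `same` claim. A name match with disagreeing or ambiguous context goes to manual curation rather than being decided automatically. This is an illustrative heuristic, not a description of any particular tool.

```python
from typing import Dict, List, Tuple

# taxon id -> (name, parent name); a hypothetical, simplified record layout
Source = Dict[str, Tuple[str, str]]


def align(a: Source, b: Source):
    """Propose same(X, Y) claims by name plus parent name; flag the rest."""
    by_name: Dict[str, List[str]] = {}
    for tid, (name, _) in b.items():
        by_name.setdefault(name, []).append(tid)

    proposed: List[Tuple[str, str, str]] = []
    needs_curation: List[Tuple[str, List[str]]] = []
    for tid, (name, parent) in a.items():
        candidates = by_name.get(name, [])
        agreeing = [c for c in candidates if b[c][1] == parent]
        if len(agreeing) == 1:
            proposed.append(("same", tid, agreeing[0]))
        elif candidates:
            # Name matches but context disagrees or is ambiguous: ask a curator.
            needs_curation.append((tid, candidates))
    return proposed, needs_curation


# Toy records; ids and parent names are illustrative, not copied from any database.
source_a = {"a1": ("Mammalia", "Amniota"), "a2": ("Morganella", "Enterobacteriaceae")}
source_b = {"b1": ("Mammalia", "Amniota"), "b2": ("Morganella", "Agaricaceae")}

proposed, needs_curation = align(source_a, source_b)
print(proposed)        # [('same', 'a1', 'b1')]
print(needs_curation)  # [('a2', ['b2'])]
```

The output is an estimate. A proposed `same` claim can still be wrong, which is the "wrong about designation" failure mode.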
11. Incertae sedis
Confusing.
X is incertae sedis in A means
(1) A includes X
(2) it's not known which of A's non-incertae-sedis 'children' X belongs to, if any
(2) is not a claim about biology.
Logical content = (1).
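When a hierarchy is turned into claims, this reading has a concrete consequence. An incertae sedis child contributes its inclusion claim but is left out of the sibling disjointness claim, because it may belong to one of those siblings. A minimal sketch with a hypothetical function name and tuple-encoded claims:

```python
from typing import List, Tuple

Claim = Tuple[str, ...]


def claims_for_node(parent: str, children: List[str],
                    incertae_sedis: List[str]) -> List[Claim]:
    """Claims made by a node in which some children are flagged incertae sedis."""
    # (1) "A includes X" holds for every child, incertae sedis or not.
    claims: List[Claim] = [("includes", parent, c) for c in children + incertae_sedis]
    # Only the ordinary children are claimed mutually disjoint. An incertae
    # sedis child might belong to one of them, so no disjointness is claimed
    # for it. (2) is about our knowledge, not biology, and adds no claim.
    if len(children) > 1:
        claims.append(("disjoint", *children))
    return claims


print(claims_for_node("A", ["B", "C"], ["X"]))
# [('includes', 'A', 'B'), ('includes', 'A', 'C'), ('includes', 'A', 'X'), ('disjoint', 'B', 'C')]
```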
13. Exchanging 'corrections'
'Rozella belongs in Fungi.'
'Rhodophyceae is the same as Rhodophyta.'
'SILVA's Morganella isn't the same as Index Fungorum's Morganella.'
'Anolis isn't a clade unless Norops is merged into it.'
14. Interpreting advice
“Rozella is in Fungi.”
Rozella sensu SILVA115 and Fungi sensu SILVA115 belong to a clade disjoint from the other SILVA115 children of Nucletmycea.
How about we apply the label 'Fungi' to such a clade, and not to Fungi sensu SILVA115?
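The interpretation above turns informal advice into a handful of claims about a new, as yet unnamed clade. A minimal sketch, in which the function name, the new-clade label `N1`, and the placeholder sibling `OtherChild sensu SILVA115` are hypothetical. The actual list of Nucletmycea's other SILVA115 children is not given here.

```python
from typing import List, Tuple

Claim = Tuple[str, ...]


def interpret_is_in(x: str, y: str, other_children: List[str],
                    new_label: str) -> List[Claim]:
    """Read "x is in y" as: x and y share a clade disjoint from the other children."""
    claims: List[Claim] = [
        ("clade", new_label),
        ("includes", new_label, x),
        ("includes", new_label, y),
    ]
    if other_children:
        claims.append(("disjoint", new_label, *other_children))
    return claims


claims = interpret_is_in(
    "Rozella sensu SILVA115",
    "Fungi sensu SILVA115",
    ["OtherChild sensu SILVA115"],  # placeholder for the remaining children
    "N1",                           # hypothetical label for the new clade
)
for claim in claims:
    print(claim)
```

Keeping `N1` as a bare label separates the science (the claims) from the naming question of whether 'Fungi' should be applied to it.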
15. Notation not so important, but for example:
includes(X, Y)
disjoint(A, B, C, …)
clade(X)
node(X, A, B, C, …) - abbreviation
species(X)
same(X, Y) notSame(X, Y)
sensu('Name', source)
+ nomenclatural claims
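Because `same` behaves as an equivalence relation, a batch of `same(X, Y)` and `notSame(X, Y)` claims can be checked for contradictions with a union-find structure. A minimal sketch; the function names and the source ids S1, S2, S3 are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def find(parent: Dict[str, str], x: str) -> str:
    """Representative of x's same(...) class, with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def contradictions(same: Iterable[Pair], not_same: Iterable[Pair]) -> List[Pair]:
    """notSame(X, Y) claims that clash with what the same(...) claims imply."""
    parent: Dict[str, str] = {}
    for x, y in same:
        rx, ry = find(parent, x), find(parent, y)
        if rx != ry:
            parent[rx] = ry  # merge the two classes
    return [(x, y) for x, y in not_same if find(parent, x) == find(parent, y)]


same = [("Rhodophyceae sensu S1", "Rhodophyta sensu S2"),
        ("Rhodophyta sensu S2", "Rhodophyta sensu S3")]
not_same = [("Rhodophyceae sensu S1", "Rhodophyta sensu S3"),
            ("Rhodophyceae sensu S1", "Morganella sensu S1")]
print(contradictions(same, not_same))
# [('Rhodophyceae sensu S1', 'Rhodophyta sensu S3')]
```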
16. On and on
Synthesis
Identifier stability
Alignment details
Compare 'macrotaxonomy' and
'microtaxonomy'
Defense of scruffy
Compare Rod's GitHub proposal
Philosophy of language
17. Separate science from nomenclature.
Use logic to do science.
Always use names with sensu.
Use heuristics to prevent paralysis.
Don't 'represent data' – express claims!
https://github.com/OpenTreeOfLife/reference-taxonomy/wiki/Expressing-phylogenetic-claims
Bottom line
18. Ack
Nico Franz, David Thau, Rod Page
Open Tree: Karen Cranston, Stephen Smith,
Mark Holder, and legions of others
Gerald Jay Sussman
Jonathan A. Rees 2014
Copyright waived CC0 1.0