Abstract: Given a collection of rooted phylogenetic trees with overlapping sets of
leaves, a supertree S is a single tree whose set of leaves is the union of the input sets
of leaves and such that S agrees with each input tree when restricted to the leaves of
the input tree. Typically with trees from real data, no supertree exists, and various
methods may be utilized to reconcile the incompatibilities in the input trees. This talk
focuses on a measure of robustness of a supertree method called its "radius" R. For
example, if R = 1/10, then whenever T is a candidate binary tree and for all rooted
triples ab|c in T we have that {a,b,c} occur together in some input tree and that more
than 90% of the input triples involving {a,b,c} are in fact ab|c (a strong assumption),
then the method outputs T as the supertree; but this might fail if 90% is replaced by
89%. It is shown that the maximal possible radius for a method is R = 1/2. Many
familiar methods, both for supertrees and consensus trees, have R = 0, indicating that
they need not output a tree T that would seem to be the natural correct answer.
Some methods with the maximal possible R = 1/2 are indicated. Extensions may be
presented concerning supertree methods that rely on input distance information as
well as the topology of the input trees.
Robustness of supertree methods for reconciling dense
incompatible data
      by Stephen J. Willson
      Department of Mathematics
      Iowa State University
      Ames, Iowa 50011
      USA

      swillson@iastate.edu

      23 October 2007
The Supertree Problem
Different researchers use different methods and different genes and different
collections of taxa to produce their phylogenetic trees.

But we expect a single "tree of life."

When can we combine the different trees into a single tree containing all the taxa?
A purely mathematical version:

Here are 3 rooted trees.

[Figure: three rooted input trees with overlapping leaf sets drawn from {1, ..., 8}.]

Do they fit together into a single rooted tree (a supertree)?
In this case there are several possible supertrees:

[Figure: possible supertrees on the leaves 1 to 8.]

With biological data it is more likely that no true supertree exists.

[Figure: input trees on leaves 1 to 6 that do not fit together into a single supertree.]

There are various ways to try to find an approximate supertree that fits as well as
possible. Incompatible data need to be reconciled.
Supertree methods have two aspects:

(a) Extrapolation

(b) Reconciliation of incompatible data

Problem. Study quantitatively how well a supertree method reconciles incompatible
data.
Some Supertree Methods

(1) Matrix Representation with Parsimony (MRP). (Baum 1992, Ragan 1992,
    Sanderson et al. 1998)

(2) MinCutSupertree. (Semple and Steel 2000)

(3) PageMinCutSupertree. (Page 2003)

(4) Matrix Representation using Flipping (MRF). (Burleigh et al. 2004)

(5) Most Similar Supertree (MSS). (Creevey et al. 2004)

(6) MaxCutTriplets. (Snir and Rao 2006)

Book: Phylogenetic Supertrees: Combining information to reveal the Tree of Life
(ed. Olaf Bininda-Emonds, 2004)
A rooted phylogenetic tree with leaf set X can be described by its clusters.

The cluster of a vertex consists of all the leaf descendants of that vertex.

[Figure: rooted tree with leaves 1 to 7 and internal vertices 8 to 12.]

X = {1,2,3,4,5,6,7}.

The cluster of 9 is {1,2,3,4}.
The cluster of 10 is {5,6,7}.
The children of {1,2,3,4} are {1,2,3} and {4}.
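
The cluster description above can be sketched in a few lines of Python. This is only an illustration: the nested-tuple encoding and the function name `clusters` are assumptions, and the shape below vertex 10 (a three-way split into 5, 6, 7) is assumed because the figure is not recoverable.

```python
def clusters(tree):
    """Return the set of clusters (frozensets of leaves) of a rooted tree.

    A leaf is any non-tuple value; an internal vertex is a tuple of subtrees.
    """
    found = set()

    def visit(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(visit(child) for child in node))
        else:
            leaves = frozenset([node])
        found.add(leaves)
        return leaves

    visit(tree)
    return found


# Assumed shape: vertex 10 is taken to have the three children 5, 6, 7.
tree = ((((1, 2), 3), 4), (5, 6, 7))
for cluster in sorted(clusters(tree), key=lambda c: (len(c), sorted(c))):
    print(sorted(cluster))
```

With this encoding the tree has twelve clusters, one per vertex, including {1,2,3,4} and {5,6,7}.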
In fact, given such rooted input trees, if there exists a supertree, then one can be
found fast by the procedure BUILD (Aho et al. 1981).

A common situation is as follows: One seeks the children of the cluster U.
1. Form a graph A(U) in which the vertices are the members of U.
2. Find edges in A(U) in some manner.
3. If the graph is disconnected, the children of U are the members of the
components of A(U). Otherwise, do something to disconnect A(U).
A common situation occurs when the edges in A(U) have a numerical weight
obtained in some manner.

U = {a,b,c,d,e}

[Figure: the graph A(U) on vertices a, b, c, d, e with numerical edge weights.]

One method to deal with this is to use a minimum cut:
A cut for a decomposition of U into Y and Z is the sum of the weights of edges
broken by the decomposition.
Cut {a}, {b,c,d,e} is 2.
Cut {a,b}, {c,d,e} is 3.
Cut {a,b,c}, {d,e} is 4.
The unique minimum cut is {a}, {b,c,d,e}, with weight 2.
Make the children of U = {a,b,c,d,e} be {a} and {b,c,d,e}.
Another method is to use a minimum threshold:

[Figure: the same weighted graph A(U) on vertices a, b, c, d, e.]

For a threshold τ, let Aτ(U) include only edges with weight > τ.
If τ = 2, A2({a,b,c,d,e}) is

[Figure: A2(U), in which only the edge {d,e} of weight 3 remains.]

The graph decomposes into {a}, {b}, {c}, {d,e}.
If τ = 1,

[Figure: A1(U), in which only the edges of weight 2 and 3 remain.]

the graph decomposes into {a,b}, {c,d,e}.
If τ = 0,

[Figure: A0(U), which is the whole graph A(U).]

the graph does not decompose.
The method of minimum threshold utilizes the smallest threshold τ such that Aτ(U)
decomposes into at least two components.
In this case, the minimum threshold is τ = 1 and A1(U) is

[Figure: A1(U), with components {a,b} and {c,d,e}.]

The children of {a,b,c,d,e} using minimum threshold are {a,b} and {c,d,e}.
There is always a minimum threshold, possibly 0.
It can be computed rapidly.
The minimum threshold tree is the tree found by always utilizing the minimum
threshold criterion.
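
The minimum threshold rule can be sketched as follows. This is a minimal illustration, not the author's implementation: the function names are hypothetical, weights are assumed nonnegative, and the edge list is one reading of the figure (the exact placement of the weight-1 and weight-2 edges is an assumption, chosen so that the decompositions at thresholds 0, 1 and 2 match the text).

```python
def components(vertices, edges, tau):
    """Connected components of the graph keeping only edges with weight > tau."""
    neighbors = {v: set() for v in vertices}
    for (a, b), weight in edges.items():
        if weight > tau:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, parts = set(), []
    for start in vertices:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(neighbors[v] - part)
        seen |= part
        parts.append(frozenset(part))
    return parts


def minimum_threshold(vertices, edges):
    """Smallest tau among 0 and the edge weights at which the graph splits."""
    for tau in sorted({0, *edges.values()}):
        parts = components(vertices, edges, tau)
        if len(parts) >= 2:
            return tau, parts
    raise ValueError("need at least two vertices")


# One reading of the figure; the exact edge placement is an assumption.
edges = {("a", "b"): 2, ("b", "c"): 1, ("b", "e"): 1,
         ("c", "d"): 1, ("c", "e"): 2, ("d", "e"): 3}
tau, parts = minimum_threshold("abcde", edges)
print(tau, sorted(sorted(p) for p in parts))
```

Only 0 and the edge weights need to be tried, because the graph of edges with weight above the threshold changes only when the threshold passes an edge weight.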

The issue is how to choose the weights.
Let T be a rooted tree. Given distinct leaves a, b, c,
say ab | c in T if T has a cluster that contains a and b but not c; in that case we also
say T contains the rooted triplet ab | c.

[Figure: a rooted tree T with leaves 1 to 7.]

In T: 12|4, 14|5, 57|3, but not 24|1 and not 56|7.
Suppose we are given as data a collection D = {Ti : i ∈ I} of rooted trees.

For each distinct a,b,c define sptD(ab|c)
= the normalized triplet support of ab | c as follows:

    Let den(a,b,c) = # {i ∈ I: {a,b,c} ⊆ L(Ti)}

    num(ab|c)      = # {i ∈ I: ab | c in Ti}.

    sptD(ab|c) = num(ab|c) / den(a,b,c)   if den(a,b,c) > 0
               = 0                        if den(a,b,c) = 0.
[Figure: three input trees; two have leaves 1 to 5 and one has leaves 3, 4, 5.]
num(45|2) = 1
den(4,5,2) = 2
sptD(45|2) = 1/2

num(45|3) = 2                  num(34|5) = 1
den(4,5,3) = 3                 den(3,4,5) = 3
sptD(45|3) = 2/3               sptD(34|5) = 1/3

num(35|4) = 0
den(3,4,5) = 3
sptD(35|4) = 0
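
The definition of sptD can be sketched in Python as below. The nested-tuple encoding, the function names, and the three input trees are assumptions made for illustration; they are not the trees of the figure, although they happen to reproduce sptD(45|3) = 2/3, sptD(34|5) = 1/3 and sptD(35|4) = 0.

```python
from fractions import Fraction


def clusters(tree):
    """All clusters of a nested-tuple tree; the last entry is the full leaf set."""
    if not isinstance(tree, tuple):
        return [frozenset([tree])]
    found, top = [], frozenset()
    for child in tree:
        sub = clusters(child)
        found.extend(sub)
        top |= sub[-1]
    return found + [top]


def spt(trees, a, b, c):
    """Normalized triplet support sptD(ab|c) = num(ab|c) / den(a,b,c)."""
    num = den = 0
    for tree in trees:
        cs = clusters(tree)
        if {a, b, c} <= cs[-1]:          # tree counts toward den(a,b,c)
            den += 1
            # ab|c holds if some cluster contains a and b but not c
            num += any(a in u and b in u and c not in u for u in cs)
    return Fraction(num, den) if den else Fraction(0)


# Hypothetical input trees (not the ones in the figure).
trees = [(1, (2, (3, (4, 5)))), (3, (4, 5)), (1, (2, ((3, 4), 5)))]
print(spt(trees, 4, 5, 3), spt(trees, 3, 4, 5), spt(trees, 3, 5, 4))
```

Exact fractions are used so that comparisons against thresholds such as 1/2 are not affected by floating-point rounding.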
It is immediate that for all distinct a, b, c,
        0 ≤ sptD(ab|c) ≤ 1
       sptD(ab|c) = sptD(ba|c)
       sptD(ab|c) + sptD(ac|b) + sptD(bc|a) ≤ 1.
Given a, b in U, define
    msptU(a,b) = max {sptD(ab|c): a,b,c are distinct elements of U}.

[Figure: the same three input trees as in the previous example.]
num(45|2) = 1
den(4,5,2) = 2
sptD(45|2) = 1/2

num(45|3) = 2                  num(45|1) = 1
den(4,5,3) = 3                 den(4,5,1) = 2
sptD(45|3) = 2/3               sptD(45|1) = 1/2

msptU(4,5) = max{sptD(45|1), sptD(45|2), sptD(45|3)} = 2/3.

msptU(4,5) is the normalized triplet support for edge {4,5}.
The normalized triplet supertree

Construct a tree S as follows:

(1) X is a cluster of S.

Repeat until done:
(2) (i) If U is a cluster of S with two members, U = {a,b}, then the children of U
are {a} and {b}.
     (ii) If U is a cluster of S with more than two members, form A(U) in which the
edge {a,b} has weight msptU(a,b). Find the minimal threshold τ such that Aτ(U) is
disconnected. The components of Aτ(U) are the children of U.

The resulting tree S is the normalized triplet supertree or NTS.
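
The construction above can be sketched end to end as follows. This is a rough illustration under stated assumptions: trees are nested tuples, all function names are hypothetical, the input trees are invented for the example (they are not the trees of the figures), and no attempt is made to reach the running time quoted later in the talk.

```python
from fractions import Fraction
from itertools import combinations


def clusters(tree):
    """All clusters of a nested-tuple tree; the last entry is the full leaf set."""
    if not isinstance(tree, tuple):
        return [frozenset([tree])]
    found, top = [], frozenset()
    for child in tree:
        sub = clusters(child)
        found.extend(sub)
        top |= sub[-1]
    return found + [top]


def spt(trees, a, b, c):
    """Normalized triplet support sptD(ab|c)."""
    num = den = 0
    for tree in trees:
        cs = clusters(tree)
        if {a, b, c} <= cs[-1]:
            den += 1
            num += any(a in u and b in u and c not in u for u in cs)
    return Fraction(num, den) if den else Fraction(0)


def split(vertices, weight):
    """Children of a cluster under the minimum threshold rule."""
    for tau in sorted({Fraction(0), *weight.values()}):
        parts, unseen = [], set(vertices)
        while unseen:
            stack = [unseen.pop()]
            part = set(stack)
            while stack:
                v = stack.pop()
                for w in list(unseen):
                    if weight[frozenset((v, w))] > tau:
                        unseen.remove(w)
                        part.add(w)
                        stack.append(w)
            parts.append(frozenset(part))
        if len(parts) >= 2:
            return parts


def nts(trees, cluster=None):
    """Clusters of the normalized triplet supertree, found top-down."""
    if cluster is None:
        cluster = frozenset().union(*(clusters(t)[-1] for t in trees))
    if len(cluster) == 1:
        return [cluster]
    if len(cluster) == 2:
        return [cluster] + [frozenset([x]) for x in cluster]
    weight = {
        frozenset((a, b)): max(spt(trees, a, b, c) for c in cluster - {a, b})
        for a, b in combinations(sorted(cluster), 2)
    }
    result = [cluster]
    for child in split(cluster, weight):
        result.extend(nts(trees, child))
    return result


# Hypothetical input trees (not the ones in the figures).
trees = [(1, (2, (3, (4, 5)))), (3, (4, 5)), (1, (2, ((3, 4), 5)))]
for c in sorted(nts(trees), key=lambda c: (-len(c), sorted(c))):
    print(sorted(c))
```

On this hypothetical input the only real conflict is inside {3,4,5}, where the threshold 1/3 removes the weaker edge {3,4} and keeps {4,5}.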
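Input

[Figure: the three input trees from the previous example.]

A({1,2,3,4,5})

[Figure: the graph A({1,2,3,4,5}) with its edge weights msptU(a,b).]

The minimum threshold is τ = 1/2.

[Figure: A1/2({1,2,3,4,5}), in which only edges of weight greater than 1/2 remain.]

The children of {1,2,3,4,5} in the minimal threshold tree are {1} and {2,3,4,5}.
A({2,3,4,5})

[Figure: the graph A({2,3,4,5}) with edge weights 1/2, 1/2, 1 and 2/3.]

The minimum threshold is τ = 1/2.

[Figure: A1/2({2,3,4,5}), in which only the edges of weight 1 and 2/3 remain.]

The children of {2,3,4,5} are {2} and {3,4,5}.
A({3,4,5})

[Figure: the graph A({3,4,5}) with edge {3,4} of weight 1/3 and edge {4,5} of weight 2/3.]

The minimum threshold is τ = 1/3.
The children of {3,4,5} are {3} and {4,5}.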
The normalized triplet supertree NTS has clusters
{1,2,3,4,5}, {2,3,4,5}, {1}, {2}, {3,4,5}, {3}, {4,5}, {4}, {5}.

[Figure: the NTS, drawn as the tree (1,(2,(3,(4,5)))).]
Problem. Study how well a supertree method reconciles incompatible data.

Let D = {Ti: i ∈ I} be a collection of input trees, where Ti has leaf set L(Ti). We
seek a supertree for D.

We say that collection D is dense if for every three distinct taxa a,b,c in X, there
exists an input tree Ti such that {a,b,c} ⊆ L(Ti).

A special case of dense data arises when each Ti has leaf set X. In this case, the
problem is the same as finding a consensus tree.

We study how various methods work on dense data.
Suppose T is a rooted tree and a,b,c are leaves.

     sptT(ab|c) = 1 if ab|c in T
                = 0 otherwise

[Figure: the tree T = (1,(2,(3,(4,5)))).]

sptT(34|2) = 1
sptT(12|4) = 0
sptT(24|1) = 1
Main Theorem. Suppose D is dense and T is a binary rooted tree.

Assume that for all distinct x, y, z in X we have
    | sptD(xy|z) - sptT(xy|z) | < 1/2.

Then the normalized triplet supertree S is T.
The condition
      | sptD(xy|z) - sptT(xy|z) | < 1/2
for all x, y, z means

(1) If xy|z in T, then more than half the input {x,y,z} triplets are xy|z.
(2) If xy|z is not in T, then fewer than half the input {x,y,z} triplets are xy|z.

In this situation, the NTS is T.
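
The hypothesis of the Main Theorem can be checked mechanically for a candidate tree. The sketch below computes the largest deviation | sptD(xy|z) - sptT(xy|z) | over all triples; the encoding, the function names and the input trees are assumptions for illustration only.

```python
from fractions import Fraction
from itertools import permutations


def clusters(tree):
    """All clusters of a nested-tuple tree; the last entry is the full leaf set."""
    if not isinstance(tree, tuple):
        return [frozenset([tree])]
    found, top = [], frozenset()
    for child in tree:
        sub = clusters(child)
        found.extend(sub)
        top |= sub[-1]
    return found + [top]


def spt(trees, a, b, c):
    """Normalized triplet support; with a single tree it is 1 or 0, i.e. sptT."""
    num = den = 0
    for tree in trees:
        cs = clusters(tree)
        if {a, b, c} <= cs[-1]:
            den += 1
            num += any(a in u and b in u and c not in u for u in cs)
    return Fraction(num, den) if den else Fraction(0)


def deviation(trees, candidate):
    """Largest |sptD(xy|z) - sptT(xy|z)| over distinct leaves x, y, z of the candidate."""
    taxa = sorted(clusters(candidate)[-1])
    return max(abs(spt(trees, x, y, z) - spt([candidate], x, y, z))
               for x, y, z in permutations(taxa, 3))


# Hypothetical dense input: the first tree contains every taxon.
trees = [(1, (2, (3, (4, 5)))), (3, (4, 5)), (1, (2, ((3, 4), 5)))]
print(deviation(trees, trees[0]), deviation(trees, trees[2]))
```

For this input the first tree deviates by 1/3, which is below 1/2, so the theorem applies to it; the third tree deviates by 2/3, so the theorem says nothing about it.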
If T is not binary, there is a more complicated statement.

Theorem. Suppose D is dense.
Assume that for all distinct x, y, z in X we have
    | sptD(xy|z) - sptT(xy|z) | < 1/2.
Then the normalized triplet supertree S is a refinement of T.
Following Atteson 1999, say that the (dense l∞) radius R of a method of supertree
reconstruction is ε provided that,

(1) Whenever D is a dense collection of input trees and T is a binary tree such that
for all distinct a, b, c
      | sptD(ab|c) - sptT(ab|c) | < ε
then the method constructs T; and

(2) For every δ > ε there exists a binary tree T and a dense collection D of input
trees such that for all distinct a, b, c
      | sptD(ab|c) - sptT(ab|c) | < δ
but the method does not reconstruct T.

Alternatively, call such ε the dense robustness radius R.

Theorem. The normalized triplet supertree method has R ≥ 1/2.
Theorem. R = 1/2 is best possible for any method.

[Figure: two trees on leaves a, b, c: T contains ab|c and W contains bc|a.]

Suppose that we have as input n copies of T and m copies of W.
    sptD(ab|c) = n/(n+m) = mspt(a,b)
    sptD(bc|a) = m/(n+m) = mspt(b,c)
    sptD(ac|b) = 0

    sptT(ab|c) = 1      | sptD(ab|c) - sptT(ab|c) | = m/(n+m)
    sptT(bc|a) = 0      | sptD(bc|a) - sptT(bc|a) | = m/(n+m)
    sptT(ac|b) = 0      | sptD(ac|b) - sptT(ac|b) | = 0

    sptW(ab|c) = 0      | sptD(ab|c) - sptW(ab|c) | = n/(n+m)
    sptW(bc|a) = 1      | sptD(bc|a) - sptW(bc|a) | = n/(n+m)
    sptW(ac|b) = 0      | sptD(ac|b) - sptW(ac|b) | = 0
Suppose a method yields a tree Y when for all distinct x, y, z in X
    | sptD(xy|z) - sptY(xy|z) | < δ with δ > 1/2.

We could choose m > n with m/(n+m) < δ.
Then also n/(n+m) < δ.
So the method would have to output both T and W, which is impossible.
Theorem. The normalized triplet supertree method has dense robustness radius
       R = 1/2.
Complexity of NTS

Theorem. Let X be a finite set, |X| = n, and let D be a collection of m input rooted
trees with leaves from X.

The computation of the normalized triplet supertree S takes time
        O(n^4) + O(n^3 m).
Some Supertree Methods

(1) Matrix Representation with Parsimony (MRP). (Baum 1992, Ragan 1992,
    Sanderson et al. 1998)

(2) MinCutSupertree. (Semple and Steel 2000)

(3) PageMinCutSupertree. (modified by Page 2003)

(4) Matrix Representation using Flipping (MRF). (Burleigh et al. 2004)

(5) Normalized Triplet Supertree (NTS).
Method                   Complexity

NTS                      polynomial
MRP                      NP-hard
MinCutSupertree (orig)   polynomial
MinCutSupertree (Page)   polynomial
triplet MRP              NP-hard
MRF                      NP-hard
Method                   Dense robustness radius

NTS                          1/2
MRP                          0 [R ≤ 0.05]
MinCutSupertree (orig)       0
MinCutSupertree (Page)       0
triplet MRP                  1/2
triplet MRF                  1/2
Interpretation:

There exists a binary tree T and dense input data D such that
(1) whenever xy|z in T, then at least 99% of the input {x,y,z} triplets are xy|z; but
(2) the MinCutSupertree is not T.
A special case is consensus trees, when all input trees have the same leaf set.


[Figure: the tree T and the input trees, all on leaves 1 to 6.]
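Majority consensus tree M

[Figure: the majority consensus tree M on leaves 1 to 6.]

although if you include clusters in order of occurrence until you fail to get a tree, then
we obtain

[Figure: the resulting tree on leaves 1 to 6.]

Strict Consensus

[Figure: the strict consensus tree on leaves 1 to 6.]

Adams Consensus

[Figure: the Adams consensus tree Ad on leaves 1 to 6.]

Normalized Triplet Consensus = MinCutSupertree Consensus

[Figure: the normalized triplet consensus tree on leaves 1 to 6.]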
Consensus tree computation (all input trees have leaf set X)

    Method                       dense robustness radius

    NTS                          1/2
    MRP                          0   [R ≤ 0.05]
    strict consensus             0
    triplet MRP                  1/2
    triplet MRF                  1/2
    majority consensus           0
    Adams consensus              0
    MinCutSupertree (orig)       0
    MinCutSupertree (Page)       0

NTS is much slower than strict consensus and majority consensus which take time
O(n m). And NTS applies only to rooted trees.
There are many variants on the method, also with radius 1/2, which may possibly
extrapolate better than NTS.

Theorem. Suppose a supertree method utilizes minimal threshold trees with the
edge support sptU(a,b) for the edge {a,b} in A(U). Suppose
(1) if there exists c ∈ U with sptD(ab|c) > 1/2, then sptU(a,b) > 1/2; and
(2) if for every c ∈ U we have sptD(ab|c) < 1/2, then sptU(a,b) < 1/2.
Then the dense robustness radius is 1/2.
Lemma. Suppose that D is dense and T is a tree with leaf set X = ∪ L(Ti).
Assume that for all distinct x, y, z in X we have
      | sptD(xy|z) - sptT(xy|z) | < 1/2.
Suppose that the tree T has a cluster U with exactly 2 children A and B. Suppose S
is the NTS and U is a cluster of S. Then A1/2(U) is disconnected and its components are
exactly A and B. The children of U in S are A and B.



[Figure: the vertices a1, a2, a3, a4 of A and b1, b2, b3 of B, with no edges drawn yet.]

Consider a1 and a2 in A. Choose b1 in B.
Then a1 a2 | b1 in T.
Hence sptT(a1 a2 | b1) = 1.
Hence sptD(a1 a2 | b1) > 1/2.
Hence mspt(a1, a2) > 1/2.
Hence there is an edge {a1, a2}.
[Figure: the edge {a1, a2} added.]

Similarly there is an edge {ai, aj} in A1/2(U).

[Figure: all edges {ai, aj} added.]

And there is an edge {bi, bj} in A1/2(U).
Is there an edge {a1, b1}? Consider any c ≠ a1, c ≠ b1.
Say c = a2.
Then a1 c | b1 in T.
sptT(a1 c | b1) = 1.
sptT(a1 b1 | c) = 0.
sptD(a1 b1 | c) < 1/2

Indeed sptD(a1 b1 | c) < 1/2 for all c ≠ a1, c ≠ b1.
Hence mspt(a1, b1) < 1/2.
There is no edge {a1, b1} in A1/2(U).

[Figure: the graph A1/2(U), with components A and B.]

The children of U in the NTS are A and B.
General Properties of NTS
Wilkinson et al. 2004 propose some "desiderata for liberal supertrees."

(1) Uniqueness. The method should give a unique answer.

(2) Plenary. The resulting supertree should contain all the leaves of the input trees.

(3) Order invariance. The method should not be influenced by the order in which
members of D are introduced, or the order of leaves in the adjacency matrix of an
input tree.

(4) Sizeless. The method should not be biased by input trees with large size. Note
that if D = {Ti: i ∈ I}, then the NTS for D is precisely the same as the NTS for the
multiset D' where each Ti is replaced by the set of its rooted triples and star triples.

(5) Shapeless. The method should not be biased by the shape of input trees.
(6) Pareto. Relationships in all input trees should appear in the output.

Theorem. Suppose that D contains at least one input tree containing all the taxa X.
Suppose sptD(ab|c) = 1. Then ab|c in the NTS.

(7) Weightable. The method should allow different trees to be weighted.

(8) Speed. NTS has polynomial time-complexity.

(9) Assessable. The method should allow a measure of the amount of support of the
output supertree. The smallest number ε such that for all x, y, z,
          | sptD(xy|z) - sptT(xy|z) | ≤ ε
gives a measure of the overall support for a tree T. A smaller ε means a better fit.
Real data
Philip, Creevey, and McInerney (Mol. Biol. Evol. 2005) considered 780 single-gene
trees for a set of ten eukaryotes. The numbers of taxa in a given tree ranged widely.
They performed a Most Similar Supertree analysis using the software package
CLANN. They published and analyzed the resulting tree.
I used the 664 trees (out of 780 obtained at the web site
http://bioinf.nuim.ie/supplementary/eukaryotes/ ) that contained humans.

The NTS is topologically the same tree as the one they published.

The NTS T satisfied, for all x, y, z,
    | sptD(xy|z) - sptT(xy|z) | ≤ 0.61111
Geometric interpretation
Consider the collection of rooted X-trees, where |X| = n. Every rooted X-tree T is
uniquely determined by its set of rooted triples ab|c. The total number of possible
rooted triples is
           m = 3 · C(n,3) = n(n-1)(n-2)/2,
where C(n,3) is the binomial coefficient "n choose 3".

Let HX = [0,1]^m
be the m-dimensional hypercube in which each coordinate corresponds to a rooted
triple ab|c.




[Figure: the hypercube HX. There are m dimensions.]
HX is a kind of rooted-tree-landscape space.

To any rooted X-tree T there corresponds
     sptT ∈ HX
given by (sptT)ab|c = sptT(ab|c).




Each rooted X-tree corresponds to a distinct corner of HX. No two distinct trees
correspond to the same corner. Not all corners correspond to trees.
To every collection D of rooted trees with leaf-sets contained in X, there is a point
     sptD ∈ HX
given by (sptD)ab|c = sptD(ab|c).

The l∞ norm on HX is defined by
     ||u-v||∞ = max {| uab|c - vab|c |}

[Figure: the cube HX with a red box at one corner.]

If sptT is the upper front left corner, then the red box is
{u ∈ HX: || u - sptT ||∞ < 1/2}.
Theorem. Suppose T is binary and
    || sptD - sptT ||∞ < 1/2.
Then the NTS is T.




If sptT is the upper front left corner, and sptD is in the red box, then the NTS is T.
The picture is deceptive since m is very large.




If X contains 7 taxa, then m = 105.

The volume of HX is 1.
The volume of the red box is (1/2)^105 ≈ 2.465 × 10^-32.

The total volume of all red boxes for all 10,395 binary rooted trees with 7 taxa is
2.563 × 10^-28.

But this ignores the 35 constraints that for each a, b, c
           sptD(ab|c) + sptD(ac|b) + sptD(bc|a) ≤ 1.
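
The figures quoted above are easy to reproduce. The short check below assumes the standard count (2n-3)!! for rooted binary trees on n labeled leaves, which is not stated in the talk but agrees with its value 10,395 for n = 7; the variable names are illustrative.

```python
from math import comb, prod

n = 7
m = 3 * comb(n, 3)                            # number of rooted triples ab|c
constraints = comb(n, 3)                      # one sum constraint per set {a,b,c}
box = 0.5 ** m                                # volume of one box of side 1/2
binary_trees = prod(range(1, 2 * n - 2, 2))   # (2n-3)!! rooted binary trees
total = binary_trees * box                    # total volume of all such boxes

print(m, constraints, binary_trees)
print(f"{box:.3e} {total:.3e}")
print(3 * comb(10, 3))                        # dimension for 10 taxa
```

The last line gives the dimension 360 that applies to a data set with 10 taxa.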
For the Philip et al. dataset with 10 eukaryotes, if T = NTS then
          || sptD - sptT ||∞ = 0.61111




Here dim(HX) = 360.
Closest Tree Problem.
Given the set of input rooted trees D, find a rooted X-tree T that minimizes
    ||sptD - sptT||∞

The NTS solves this problem if the solution T is binary and satisfies
||sptD - sptT||∞ < 1/2.

Note
||sptD - sptT||∞
= max { | sptD(xy|z) - sptT(xy|z) | }
= max { max { sptD(xy|z) : xy|z not in T},
          max { (1 - sptD(xy|z)) : xy|z in T} }

Let DweakD(T) := max {sptD(xy|z) : xy|z not in T}.
Weak Closest Tree Problem. Given the set of input rooted trees D, find a rooted
X-tree T that minimizes
     DweakD(T) := max {sptD(xy|z) : xy|z not in T}.


Theorem. The Normalized Triplet Supertree S solves the Weak Closest Tree
Problem.

There is typically not a unique solution.
Theorem. Suppose every input tree in D is binary. Suppose D is dense. Suppose
T solves the Closest Tree Problem. Let S be the Normalized Triplet Supertree.
Then
     DweakD(S) ≤ ||sptT - sptD||∞ ≤ 2 DweakD(S).
Comparison with MinCutSupertree
Suppose m is a positive integer. If m = 5, we have the input

[Figure: the trees T and Y on leaves a, b, 1, 2, 3, 4, 5.]

with t copies of T and y copies of Y

[Figure: the trees Z1, Z2, Z3, Z4, Z5 on the same leaves.]

Input also one copy of Z1, ..., Zm.
The data are close to T.

[Figure: the trees T and Y again.]

sptT(ij | a) = 1
sptT(ij | b) = 1
sptT(ab | i) = 1
sptT(jk | i ) = 1 if i<j<k
sptT(uv | w) = 0 otherwise

sptD(ij | a) = (t + y + m - 2) / (t+y+m)
sptD(ij | b) = (t + y + m - 2) / (t+y+m)
sptD(ab | i ) = (t + 1) / (t + y + m)
sptD(jk | i) = (t+y + m - 2) / (t + y + m) if i < j < k
Hence | sptD(uv|w) - sptT(uv|w) | ≤ (y + m - 1) / (t + y + m)

We will choose t = my + m^2 - 2m - 1. For all u, v, w, as y → ∞ the bound on
| sptD(uv|w) - sptT(uv|w) | tends to 1/(m+1).

But a check shows that when U = X, {a,b} is not a minimal cut and {a,b} is not a
child of the root in MinCutSupertree.
Hence for each m we have R ≤ 1/(m+1).
Hence R = 0.
The same example applies to Page MinCutSupertree.
Distance Methods
Suppose we know the distances on the input trees. Then there are many more
constraints.


[Figure 1. A family D of input rooted trees. Branch lengths in red.]

[Figure: the unique supertree with distances from Figure 1.]

[Figure 2. A different family of input trees with the same topology as in Figure 1 but
different branch lengths.]

[Figure: the unique supertree with distances for Figure 2.]

[Figure 3. The supertree constructed by BUILD or NTS from the input trees in
Figures 1 or 2, ignoring distances.]

Moral: The use of distances can give extra resolution.
How do we use distances?

λ(a,b) is the length of the path in the graph of T from {a} to the most recent
common ancestor of a and b.

[Figure: a rooted tree with branch lengths on leaves 1 to 8.]

Example:
λ(1,4) = 3
λ(4,1) = 1
λ(7,4) = 2
λ(4,7) = 3
[Figure: a rooted tree on leaves 1 to 5; d marks the distance from mrca(1,2) to the root.]

d corresponds to the distance from mrca(1,2) to the root.

If λ comes from the tree, then
λ(1,4) - λ(1,2) = d
λ(1,5) - λ(1,2) = d
λ(1,3) - λ(1,2) < d

d = max{λ(1,c) - λ(1,2): c ∈ X, c distinct from 1 and 2}
Given D in which each input tree has distances,
in each input tree Ti = (Vi, Ei, r, Xi) compute λi(a,b) for a and b in Xi.

Let λD(a,b) be the average of the λi(a,b).
λD is a partially defined function on X × X.
We wish to use a minimal threshold procedure to build a supertree from D.
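
The averaging step can be sketched as follows. This is a minimal illustration, assuming each input tree has already been reduced to a dictionary mapping ordered leaf pairs to distances; the function name `average_delta` and the numeric values are hypothetical.

```python
from collections import defaultdict


def average_delta(per_tree):
    """Average per-tree distances into one partially defined function.

    per_tree: list of dicts {(a, b): distance from a up to mrca(a, b)}.
    A pair that occurs in no input tree is simply absent from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for delta_i in per_tree:
        for pair, value in delta_i.items():
            sums[pair] += value
            counts[pair] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}


# Hypothetical input: the second tree does not contain leaf "c".
tree1 = {("a", "b"): 1.0, ("b", "a"): 2.0, ("a", "c"): 3.0, ("c", "a"): 3.0}
tree2 = {("a", "b"): 2.0, ("b", "a"): 2.0}
delta_D = average_delta([tree1, tree2])
print(delta_D[("a", "b")])      # averaged over both trees
print(("b", "c") in delta_D)    # undefined: no tree contains both b and c
```

Storing only the defined pairs makes the partial nature of the averaged function explicit: a lookup must check membership first.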

Suppose U is a cluster. Form A(U).

Define the support for the edge {a,b} to be the estimated distance from mrca(a,b) to
the root for U. The larger this estimate, the more confident we are in the edge
{a,b}.

For the example tree on leaves 1, 2, 3, 4, 5, spt(1,2) = δ(1,4) - δ(1,2).

Two possible formulas for support:

The primary support of (a,b) is
pspt(a,b) = max { δ(a,c) - δ(a,b), δ(b,c) - δ(b,a): c ∈ U}

The confirmed support of (a,b) is
cspt(a,b) = max { min { δ(a,c) - δ(a,b), δ(b,c) - δ(b,a)}: c ∈ U}.

If δ is globally defined and is correct on the tree, these give the same numbers. If δ
contains errors, these can be different numbers.
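
The two formulas can be compared in a small sketch, writing δ for the distance function (path length from a leaf up to an mrca). The assumptions are stated loudly: δ is a dictionary on ordered pairs, both (a,b) and (b,a) are defined, undefined terms are skipped, and c ranges over U without a and b. The helper names and all numbers are hypothetical.

```python
def support_terms(delta, a, b, U):
    """Per c in U (c not a or b): (delta(a,c)-delta(a,b), delta(b,c)-delta(b,a)).

    None marks a term whose distance is undefined. Assumes (a,b) and (b,a) are in delta.
    """
    terms = []
    for c in U:
        if c == a or c == b:
            continue
        x = delta[(a, c)] - delta[(a, b)] if (a, c) in delta else None
        y = delta[(b, c)] - delta[(b, a)] if (b, c) in delta else None
        terms.append((x, y))
    return terms


def pspt(delta, a, b, U):
    """Primary support: the largest single defined difference."""
    values = [v for pair in support_terms(delta, a, b, U) for v in pair if v is not None]
    return max(values) if values else None


def cspt(delta, a, b, U):
    """Confirmed support: for each c take the smaller difference, then maximize over c."""
    values = [min(x, y) for x, y in support_terms(delta, a, b, U)
              if x is not None and y is not None]
    return max(values) if values else None


U = {"a", "b", "c"}
# Exact distances on a hypothetical tree ((a:1, b:1):2, c:3); mrca(a,b) is at depth 2.
exact = {("a", "b"): 1.0, ("b", "a"): 1.0, ("a", "c"): 3.0,
         ("b", "c"): 3.0, ("c", "a"): 3.0, ("c", "b"): 3.0}
print(pspt(exact, "a", "b", U), cspt(exact, "a", "b", U))   # the two agree on exact data

# The same distances with hypothetical errors in two entries.
noisy = dict(exact)
noisy[("a", "c")] = 3.5
noisy[("b", "c")] = 2.5
print(pspt(noisy, "a", "b", U), cspt(noisy, "a", "b", U))   # now they differ
```

On the exact data both functions return the depth of mrca(a,b); with errors, the primary support follows the larger of the two estimates while the confirmed support follows the smaller.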

Algorithm BUILD-WITH-DISTANCES (Willson 2004)

Suppose U is a nonempty set of vertices. Define A(U) as follows:

(i) The vertex set consists of the members of U.

(ii) The weight for edge {a,b} is pspt(a,b), provided it is positive.

Find the minimal τ such that Aτ(U) disconnects, and let the children of U be the
components of Aτ(U).

The result is the minimal threshold supertree Sp with primary support.

If instead the weight for edge {a,b} is cspt(a,b), we obtain the minimal threshold
supertree Sc with confirmed support.
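
A rough sketch of the recursion, not the author's implementation. It assumes that the thresholded graph keeps exactly the edges whose weight is strictly greater than the threshold, that the support is recomputed inside each child cluster, and that undefined terms are skipped. The names `components`, `pspt`, and `build` and the distance values are hypothetical.

```python
from itertools import combinations


def components(vertices, edges):
    """Connected components of an undirected graph, as a list of sets."""
    adjacent = {v: set() for v in vertices}
    for a, b in edges:
        adjacent[a].add(b)
        adjacent[b].add(a)
    seen, parts = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adjacent[v] - part)
        seen |= part
        parts.append(part)
    return parts


def pspt(delta, a, b, U):
    """Primary support of (a,b) within U; None if no term is defined."""
    values = []
    for c in U:
        if c in (a, b):
            continue
        if (a, c) in delta and (a, b) in delta:
            values.append(delta[(a, c)] - delta[(a, b)])
        if (b, c) in delta and (b, a) in delta:
            values.append(delta[(b, c)] - delta[(b, a)])
    return max(values) if values else None


def build(U, support):
    """Minimal threshold tree on U, returned as nested tuples of leaf names."""
    U = set(U)
    if len(U) == 1:
        return next(iter(U))
    weights = {}
    for a, b in combinations(sorted(U), 2):
        w = support(a, b, U)
        if w is not None and w > 0:          # only positive support gives an edge
            weights[(a, b)] = w
    # Try thresholds in increasing order; at the largest weight no edge survives,
    # so the loop always returns when U has at least two members.
    for tau in [0.0] + sorted(set(weights.values())):
        kept = [pair for pair, w in weights.items() if w > tau]
        parts = components(U, kept)
        if len(parts) > 1:
            return tuple(build(part, support) for part in parts)


# Hypothetical noisy distances for the tree ((a,b),c); the entry ("c","b") carries an error.
delta = {("a", "b"): 1.0, ("b", "a"): 1.0, ("a", "c"): 3.0,
         ("c", "a"): 3.0, ("b", "c"): 3.0, ("c", "b"): 3.5}
tree = build({"a", "b", "c"}, lambda a, b, U: pspt(delta, a, b, U))
print(tree)
```

In this example the error gives the pair (a,c) a small positive support of 0.5, so the graph is connected at threshold 0; raising the threshold to 0.5 removes that edge and separates {a,b} from {c}.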

Example. Use the primary support function pspt, with U = {1,2,3,4,5,6}.

δ(2,1) - δ(2,6) = 0        δ(6,1) - δ(6,2) = 0.5
δ(2,3) - δ(2,6) = 2        δ(6,3) - δ(6,2) = undef
δ(2,4) - δ(2,6) = 2        δ(6,4) - δ(6,2) = -1.5
δ(2,5) - δ(2,6) = 0.5      δ(6,5) - δ(6,2) = 0.5

pspt(2,6) = 2, the largest of the defined differences.
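
Example. Use the confirmed support function cspt on the same input trees, with
U = {1,2,3,4,5,6}.

δ(2,1) - δ(2,6) = 0        δ(6,1) - δ(6,2) = 0.5
δ(2,3) - δ(2,6) = 2        δ(6,3) - δ(6,2) = undef
δ(2,4) - δ(2,6) = 2        δ(6,4) - δ(6,2) = -1.5
δ(2,5) - δ(2,6) = 0.5      δ(6,5) - δ(6,2) = 0.5

Taking the smaller difference for each c gives 0 for c = 1, -1.5 for c = 4, and 0.5
for c = 5; c = 3 contributes nothing because one of its differences is undefined.

cspt(2,6) = 0.5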
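
The two example values can be rechecked directly from the eight differences listed on the slide. This is only a sketch: the dictionary layout is a hypothetical encoding, `None` stands for the entry marked "undef", and skipping undefined terms is an assumption that happens to reproduce the slide's results.

```python
# c -> (first-column difference, second-column difference) for the pair (2, 6),
# copied from the slide; None marks the undefined entry.
differences = {
    1: (0.0, 0.5),
    3: (2.0, None),
    4: (2.0, -1.5),
    5: (0.5, 0.5),
}

# Primary support: the largest single defined difference.
pspt_26 = max(v for pair in differences.values() for v in pair if v is not None)

# Confirmed support: per c, the smaller of the two differences (both must be defined),
# then the largest of those.
cspt_26 = max(min(x, y) for x, y in differences.values()
              if x is not None and y is not None)

print(pspt_26, cspt_26)  # 2.0 0.5
```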

Sc: the minimal threshold supertree with confirmed support.

Sp: the minimal threshold supertree with primary support.

NTS = MRP strict consensus of 18 MP trees

Suppose that the input trees D = {Ti: i ∈ I} have distances. Say that D is
distance-dense if for each distinct a and b there exists i ∈ I with {a,b} ⊆ L(Ti).

Theorem. Suppose that the tree T is binary, the shortest branchlength is ε > 0, and
the distance function on T is δT.

Suppose that each tree in D = {Ti: i ∈ I} has distances and D is distance-dense.
For each a and b, let δD(a,b) be the estimated value
     δD(a,b) = average{δi(a,b): a and b are in L(Ti)}.

(1) Assume for all x and y
     | δD(x,y) - δT(x,y) | < (1/2) ε
Then Sc = T.

(2) Assume for all x and y
     | δD(x,y) - δT(x,y) | < (1/4) ε
Then Sp = T.

(3) The numbers (1/2) and (1/4) above are the largest possible for the respective
methods.
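
The hypotheses of the theorem are easy to test numerically when a candidate tree is at hand. This is a minimal sketch, assuming both distance functions are dictionaries on ordered leaf pairs and the averaged data are defined on every pair of the candidate tree; the function names and the numbers are hypothetical.

```python
def max_error(delta_D, delta_T):
    """Largest absolute deviation between averaged and true distances over all pairs of T."""
    return max(abs(delta_D[pair] - delta_T[pair]) for pair in delta_T)


def guaranteed_methods(delta_D, delta_T, shortest_branch):
    """Which sufficient conditions of the theorem hold (True means T is guaranteed).

    False means only that the theorem gives no guarantee, not that the method fails.
    """
    err = max_error(delta_D, delta_T)
    return {"Sc": err < shortest_branch / 2, "Sp": err < shortest_branch / 4}


# Hypothetical tree ((a:1, b:1):2, c:3) with shortest branch length 1.
delta_T = {("a", "b"): 1.0, ("b", "a"): 1.0, ("a", "c"): 3.0,
           ("b", "c"): 3.0, ("c", "a"): 3.0, ("c", "b"): 3.0}
delta_D = dict(delta_T)
delta_D[("a", "c")] = 3.375       # hypothetical averaged estimate with error 0.375

print(max_error(delta_D, delta_T))
print(guaranteed_methods(delta_D, delta_T, shortest_branch=1.0))
```

With an error of 0.375 and shortest branch length 1, the bound for the confirmed supertree holds while the bound for the primary supertree does not.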

Following Atteson 1999, say that the (distance-dense l∞) radius r of a method of
supertree reconstruction from trees with distances is ρ provided that

(1) Suppose D is a distance-dense collection of input trees with distances yielding
the function δD.
Suppose T is a binary tree with function δT and minimal branchlength ε.
Suppose that for all distinct x, y
     | δD(x,y) - δT(x,y) | < ρ ε.
Then the method constructs T.

(2) For every σ > ρ there exists a binary tree T with distances and shortest
branchlength ε, and there exists a distance-dense collection D of input trees such
that for all distinct x, y
      | δD(x,y) - δT(x,y) | < σ ε
but the method does not reconstruct T.

Alternatively, call such ρ the distance-dense robustness radius r.
Theorem.
(1) The primary supertree method has r = 1/4.
(2) The confirmed supertree method has r = 1/2.
(3) r = 1/2 is best possible for any method.
Summary.

The confirmed supertree method is twice as robust as the primary supertree method
on distance-dense data and has optimal distance-dense robustness radius.
Review
(1) Different methods for finding supertrees can differ significantly in how well they
reconcile incompatibilities on dense data.

(2) For topological input, Triplet MRP and NTS can reconcile incompatibilities close
to a tree better than MRP or MinCutSupertree.

(3) For consensus methods, NTS reconciles incompatibilities close to a tree better
than Majority Consensus and Strict Consensus.

(4) NTS solves an optimization problem in tree-space.

(5) NTS is polynomial-time and has other nice properties.

(6) Distance methods can add resolution if the input trees have distances.

(7) For distance-dense data, the confirmed supertree is more robust than the primary
supertree.
Questions
(1) Is there an analogous quantitative measure for how well a supertree method
extrapolates data?

(2) Which topological methods with radius 1/2 extrapolate well?

(3) What is the complexity of the Closest Tree Problem?

(4) What is the dense robustness radius of MRF and other methods?

(5) What happens if we use a different norm on HX, e.g.
     ||u - v||_2 = √[ Σ (u_{ab|c} - v_{ab|c})² ]?
Acknowledgments
     Thanks to the Isaac Newton Institute for Mathematical Sciences for its
hospitality at an excellent facility.

    Thanks to the e-Science Institute for hosting this Phyloinformatics Workshop.

     Thanks to the organizers Vincent Moulton, Mike Steel, and Daniel Huson of
the Phylogenetics Programme.

    Thanks to Iowa State University for giving me a sabbatical this semester.
Thank you for your attention.
References
Aho, A.V., Y. Sagiv, T.G. Szymanski, and J.D. Ullman (1981). Inferring a tree from
lowest common ancestors with an application to the optimization of relational
expressions. SIAM J. Comput. 10(3), 405-421.

Atteson, K. (1999). The performance of neighbor-joining methods of phylogenetic
reconstruction. Algorithmica 25, 251-278.

Baum, B.R. (1992). Combining trees as a way of combining data sets for
phylogenetic inference, and the desirability of combining gene trees. Taxon 41, 3-10.

Burleigh, J.G., O. Eulenstein, D. Fernández-Baca, and M.J. Sanderson (2004). MRF
supertrees, in Bininda-Emonds, O.R.P. (ed.), Phylogenetic Supertrees: Combining
Information to Reveal the Tree of Life, Kluwer Academic Publishers, Dordrecht, the
Netherlands, pp. 65-85.

Creevey, C.J., D.A. Fitzpatrick, G.K. Philip, R.J. Kinsella, M.J. O'Connell, M.M.
Pentony, S.A. Travers, M. Wilkinson, and J.O. McInerney (2004). Does a tree-like
phylogeny only exist at the tips in the prokaryotes? Proc. R. Soc. Lond. B. Biol. Sci.
271, 2551-2558.

Page, R.D.M. (2003). Modified mincut supertrees, in R. Guigó and D. Gusfield
(eds.), Algorithms in Bioinformatics, Proceedings of the second international
workshop, WABI 2002, Rome, Italy, September 17-21, 2002, Lecture Notes in
Computer Science 2452, Springer-Verlag, Berlin, pp. 537-552.

Philip, G.K., C.J. Creevey, and J.O. McInerney (2005). The Opisthokonta and the
Ecdysozoa may not be clades: Stronger support for the grouping of plant and animal
than for animal and fungi and stronger support for the Coelomata than Ecdysozoa.
Mol. Biol. Evol. 22(5), 1175-1184.

Ragan, M.A. (1992). Phylogenetic inference based on matrix representation of
trees. Mol. Phylogenet. Evol. 1, 53-58.

Sanderson, M.J., A. Purvis, and C. Henze (1998). Phylogenetic supertrees:
Assembling the trees of life. Trends Ecol. Evol. 13, 105-109.

Semple, C. and M. Steel (2000). A supertree method for rooted trees. Discrete
Applied Mathematics 105, 147-158.

Steel, M., S. Böcker, and A.W.M. Dress (2000). Simple but fundamental limitations
on supertree and consensus tree methods. Systematic Biology 49(2), 363-368.

Wilkinson, M., J.L. Thorley, D. Pisani, F-J. Lapointe, and J.O. McInerney (2004).
Some desiderata for liberal supertrees, in Bininda-Emonds, O.R.P. (ed.),
Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Kluwer
Academic Publishers, Dordrecht, the Netherlands, pp. 227-246.

Willson, S.J. (2004). Constructing rooted supertrees using distances. Bulletin of
Mathematical Biology 66, 1755-1783.

Biodiversity informatics: digitising the living world
 
GBIF ideas
GBIF ideasGBIF ideas
GBIF ideas
 
Something about links
Something about linksSomething about links
Something about links
 
Why I blog instead of writing papers
Why I blog instead of writing papersWhy I blog instead of writing papers
Why I blog instead of writing papers
 
Social media
Social mediaSocial media
Social media
 

Recently uploaded

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 

Recently uploaded (20)

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 

Robustness of supertree methods for reconciling dense incompatible data. SJ. Willson

  • 1. Abstract: Given a collection of rooted phylogenetic trees with overlapping sets of leaves, a supertree S is a single tree whose set of leaves is the union of the input sets of leaves and such that S agrees with each input tree when restricted to the leaves of the input tree. Typically with trees from real data, no supertree exists, and various methods may be utilized to reconcile the incompatibilities in the input trees. This talk focuses on a measure of robustness of a supertree method called its "radius" R. For example, if R = 1/10, then whenever T is a candidate binary tree and for all rooted triples ab|c in T we have that {a,b,c} occur together in some input tree and that more than 90% of the input triples involving {a,b,c} are in fact ab|c (a strong assumption), then the method outputs T as the supertree; but this might fail if 90% is replaced by 89%. It is shown that the maximal possible radius for a method is R = 1/2. Many familiar methods, both for supertrees and consensus trees, have R = 0, indicating that they need not output a tree T that would seem to be the natural correct answer. Some methods with the maximal possible R = 1/2 are indicated. Extensions may be presented concerning supertree methods that rely on input distance information as well as the topology of the input trees.
  • 2. Robustness of supertree methods for reconciling dense incompatible data, by Stephen J. Willson, Department of Mathematics, Iowa State University, Ames, Iowa 50011, USA. swillson@iastate.edu. 23 October 2007.
  • 3. The Supertree Problem. Different researchers use different methods and different genes and different collections of taxa to produce their phylogenetic trees. But we expect a single "tree of life." When can we combine the different trees into a single tree containing all the taxa?
  • 4. A purely mathematical version: Here are 3 rooted trees. [Figure: three rooted trees with overlapping leaf sets.] Do they fit together into a single rooted tree (a supertree)?
  • 5. In this case there are several possible supertrees: [Figure: several supertrees on the leaves 1 to 8.]
  • 6. With biological data it is more likely that no true supertree exists. [Figure: incompatible input trees.] There are various ways to try to find an approximate supertree that fits as well as possible. Incompatible data need to be reconciled.
  • 7. Supertree methods have two aspects: (a) extrapolation; (b) reconciliation of incompatible data. Problem. Study quantitatively how well a supertree method reconciles incompatible data.
  • 8. Some Supertree Methods: (1) Matrix Representation with Parsimony (MRP). (Baum 1992, Ragan 1992, Sanderson et al. 1998) (2) MinCutSupertree. (Semple and Steel 2000) (3) PageMinCutSupertree. (Page 2003) (4) Matrix Representation using Flipping (MRF). (Burleigh et al. 2004) (5) Most Similar Supertree (MSS). (Creevey et al. 2004) (6) MaxCutTriplets. (Snir and Rao 2006) Book: Phylogenetic Supertrees: Combining information to reveal the Tree of Life (ed. Olaf Bininda-Emonds, 2004)
  • 9. A rooted phylogenetic tree with leaf set X can be described by its clusters. The cluster of a vertex consists of all the leaf descendants of that vertex. [Figure: a rooted tree with leaves 1 to 7 and internal vertices 8 to 12.] X = {1,2,3,4,5,6,7}. The cluster of 9 is {1,2,3,4}. The cluster of 10 is {5,6,7}. The children of {1,2,3,4} are {1,2,3} and {4}.
  • 10. In fact, given such rooted input trees, if there exists a supertree, then one can be found fast by the procedure BUILD (Aho et al. 1981). A common situation is as follows: One seeks the children of the cluster U. 1. Form a graph A(U) in which the vertices are the members of U. 2. Find edges in A(U) in some manner. 3. If the graph is disconnected, the children of U are the members of the components of A(U). Otherwise, do something to disconnect A(U).
  • 11. A common situation occurs when the edges in A(U) have a numerical weight obtained in some manner. U = {a,b,c,d,e}. [Figure: weighted graph A(U) on the vertices a, b, c, d, e, with edge weights 1, 2, and 3.] One method to deal with this is to use a minimum cut: A cut for a decomposition of U into Y and Z is the sum of the weights of edges broken by the decomposition. Cut {a}, {b,c,d,e} is 2. Cut {a,b}, {c,d,e} is 3. Cut {a,b,c}, {d,e} is 4. There is a unique minimum cut {a}, {b,c,d,e} = 2. Make the children of U = {a,b,c,d,e} be {a} and {b,c,d,e}.
  • 12. Another method is to use a minimum threshold: [Figure: the same weighted graph A(U).] For a threshold τ, let A_τ(U) include only edges with weight > τ. If τ = 2, A_2({a,b,c,d,e}) is [Figure: only the edge {d,e} of weight 3 remains.] The graph decomposes into {a}, {b}, {c}, {d,e}.
  • 13. If τ = 1, [Figure: only the edges of weight 2 and 3 remain.] the graph decomposes into {a,b}, {c,d,e}.
  • 14. If τ = 0, [Figure: all edges remain.] the graph does not decompose.
  • 15. The method of minimum threshold utilizes the smallest threshold τ such that A_τ(U) decomposes into at least two components. In this case, the minimum threshold is τ = 1 and A_1(U) is [Figure: only the edges of weight 2 and 3 remain.] The children of {a,b,c,d,e} using minimum threshold are {a,b} and {c,d,e}.
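The minimum threshold rule of slides 12 to 15 can be sketched in a few lines of Python. This is only an illustration, not code from the talk: the function names are made up here, and the edge list is reconstructed from the cut values and threshold outcomes quoted on the slides. In particular, placing the second weight-2 edge on {c,d} rather than {c,e} is an assumption; either choice gives the same components.

```python
def components(vertices, edges):
    """Connected components of an undirected graph; edges are 2-element frozensets."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for edge in edges:
        u, v = tuple(edge)
        parent[find(u)] = find(v)
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=lambda s: sorted(s))


def minimum_threshold(vertices, weights):
    """Return (tau, components) for the smallest tau at which A_tau is disconnected."""
    # A_tau only changes when tau passes an edge weight, so it is enough
    # to try 0 and the distinct weights in increasing order.
    for tau in sorted({0} | set(weights.values())):
        kept = [e for e, w in weights.items() if w > tau]
        parts = components(vertices, kept)
        if len(parts) >= 2:
            return tau, parts
    raise ValueError("a single vertex cannot be split")


# Edge weights inferred from the slides (the {c, d} edge is an assumption).
U = ["a", "b", "c", "d", "e"]
W = {
    frozenset("ab"): 2,
    frozenset("bc"): 1,
    frozenset("bd"): 1,
    frozenset("be"): 1,
    frozenset("cd"): 2,
    frozenset("de"): 3,
}
tau, parts = minimum_threshold(U, W)
print(tau, parts)
```

With these weights the sketch returns τ = 1 and the components {a,b} and {c,d,e}, in agreement with slide 15.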
  • 16. There is always a minimum threshold, possibly 0. It can be computed rapidly. The minimum threshold tree is the tree found by always utilizing the minimum threshold criterion. The issue is how to choose the weights.
  • 17. Let T be a rooted tree. Given distinct leaves a, b, c, say ab|c in T if T has a cluster that contains a and b but not c, and say T contains the rooted triplet ab|c. [Figure: a rooted tree on the leaves 1 to 7.] Examples: 12|4, 14|5, and 57|3 are in T; 24|1 and 56|7 are not.
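The cluster-based definition of ab|c translates directly into code. The sketch below is illustrative only: trees are written as nested tuples, the function names are invented here, and the example tree is an assumption, chosen to be consistent with the clusters quoted on slide 9 and the triplets listed on slide 17 (the transcript does not preserve the figure itself).

```python
def clusters(tree):
    """Return the set of clusters (frozensets of leaves) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
        else:
            leaves = frozenset([node])
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def has_triplet(tree_clusters, a, b, c):
    """True if some cluster contains a and b but not c, i.e. ab|c is in the tree."""
    return any(a in cl and b in cl and c not in cl for cl in tree_clusters)


# Assumed tree: clusters {1,2}, {1,2,3}, {1,2,3,4}, and an unresolved {5,6,7}.
T = ((((1, 2), 3), 4), (5, 6, 7))
cl = clusters(T)
for a, b, c in [(1, 2, 4), (1, 4, 5), (5, 7, 3), (2, 4, 1), (5, 6, 7)]:
    print(f"{a}{b}|{c}:", has_triplet(cl, a, b, c))
```

Under this assumed tree the first three triplets are reported as present and the last two as absent, matching the slide.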
  • 18. Suppose we are given as data a collection D = {T_i : i ∈ I} of rooted trees. For each distinct a, b, c define spt_D(ab|c), the normalized triplet support of ab|c, as follows. Let den(a,b,c) = #{i ∈ I : {a,b,c} ⊆ L(T_i)} and num(ab|c) = #{i ∈ I : ab|c in T_i}. Then spt_D(ab|c) = num(ab|c) / den(a,b,c) if den(a,b,c) > 0, and spt_D(ab|c) = 0 if den(a,b,c) = 0.
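As a rough illustration of this definition, the following Python sketch counts num and den over a list of nested-tuple trees and returns the support as an exact fraction. The three input trees are hypothetical: the figure with the talk's actual input trees is not recoverable from the transcript, so these were chosen only so that the supports of 45|2, 45|3, 34|5, and 35|4 come out as 1/2, 2/3, 1/3, and 0.

```python
from fractions import Fraction


def clusters(tree):
    """Return the set of clusters (frozensets of leaves) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
        else:
            leaves = frozenset([node])
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def spt(trees, a, b, c):
    """Normalized triplet support of ab|c: num(ab|c) / den(a,b,c), or 0 if den = 0."""
    num = den = 0
    for tree in trees:
        cl = clusters(tree)
        leaves = max(cl, key=len)          # the root cluster is the leaf set L(T_i)
        if {a, b, c} <= leaves:
            den += 1
            if any(a in k and b in k and c not in k for k in cl):
                num += 1
    return Fraction(num, den) if den else Fraction(0)


# Hypothetical input trees (not the ones drawn in the talk).
trees = [
    (1, (2, (3, (4, 5)))),
    ((3, 4), 5),
    (3, (1, 2, 4, 5)),
]
for a, b, c in [(4, 5, 2), (4, 5, 3), (3, 4, 5), (3, 5, 4)]:
    print(f"spt({a}{b}|{c}) =", spt(trees, a, b, c))
```

Using `Fraction` keeps the supports exact, which matters later when they are compared against the threshold 1/2.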
  • 19. [Figure: the input trees.] num(45|2) = 1, den(4,5,2) = 2, spt_D(45|2) = 1/2. num(45|3) = 2, den(4,5,3) = 3, spt_D(45|3) = 2/3. num(34|5) = 1, den(3,4,5) = 3, spt_D(34|5) = 1/3. num(35|4) = 0, den(3,4,5) = 3, spt_D(35|4) = 0.
  • 20. It is immediate that for all distinct a, b, c: 0 ≤ spt_D(ab|c) ≤ 1; spt_D(ab|c) = spt_D(ba|c); spt_D(ab|c) + spt_D(ac|b) + spt_D(bc|a) ≤ 1.
  • 21. Given a, b in U, define mspt_U(a,b) = max {spt_D(ab|c) : a, b, c are distinct elements of U}. [Figure: the input trees.] num(45|2) = 1, den(4,5,2) = 2, spt_D(45|2) = 1/2. num(45|3) = 2, den(4,5,3) = 3, spt_D(45|3) = 2/3. num(45|1) = 1, den(4,5,1) = 2, spt_D(45|1) = 1/2. mspt_U(4,5) = max{spt_D(45|1), spt_D(45|2), spt_D(45|3)} = 2/3. mspt_U(4,5) is the normalized triplet support for edge {4,5}.
  • 22. The normalized triplet supertree. Construct a tree S as follows: (1) X is a cluster of S. Repeat until done: (2) (i) If U is a cluster of S with two members, U = {a,b}, then the children of U are {a} and {b}. (ii) If U is a cluster of S with more than two members, form A(U) in which the edge {a,b} has weight mspt_U(a,b). Find the minimal threshold τ such that A_τ(U) is disconnected. The components of A_τ(U) are the children of U. The resulting tree S is the normalized triplet supertree or NTS.
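The construction on slide 22 can be sketched end to end in Python. This is a minimal, unoptimized illustration under stated assumptions, not the author's implementation: trees are nested tuples, all function names are invented here, and the input (two copies of a caterpillar tree plus one conflicting tree) is a hypothetical example rather than the data drawn in the talk.

```python
from fractions import Fraction
from itertools import combinations


def clusters(tree):
    """Return the set of clusters (frozensets of leaves) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
        else:
            leaves = frozenset([node])
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def make_spt(trees):
    """Return a function spt(a, b, c) giving the normalized support of ab|c."""
    cluster_sets = [clusters(t) for t in trees]
    leaf_sets = [max(cs, key=len) for cs in cluster_sets]

    def spt(a, b, c):
        num = den = 0
        for cs, leaves in zip(cluster_sets, leaf_sets):
            if {a, b, c} <= leaves:
                den += 1
                num += any(a in k and b in k and c not in k for k in cs)
        return Fraction(num, den) if den else Fraction(0)

    return spt


def split(U, weight):
    """Children of U: the components of A_tau(U) at the minimum threshold tau."""
    pairs = {frozenset(p): weight(*p) for p in combinations(sorted(U), 2)}
    for tau in sorted({Fraction(0)} | set(pairs.values())):
        comp = {v: {v} for v in U}              # vertex -> its current component
        for pair, w in pairs.items():
            if w > tau:                         # the edge survives in A_tau(U)
                a, b = tuple(pair)
                if comp[a] is not comp[b]:
                    merged = comp[a] | comp[b]
                    for v in merged:
                        comp[v] = merged
        parts = {frozenset(s) for s in comp.values()}
        if len(parts) >= 2:
            return parts


def nts(trees):
    """Return the clusters of the normalized triplet supertree of the input trees."""
    spt = make_spt(trees)
    X = frozenset().union(*(max(clusters(t), key=len) for t in trees))
    result, todo = set(), [X]
    while todo:
        U = todo.pop()
        result.add(U)
        if len(U) == 2:
            todo.extend(frozenset([v]) for v in U)
        elif len(U) > 2:
            def mspt(a, b):
                return max(spt(a, b, c) for c in U if c not in (a, b))
            todo.extend(split(U, mspt))
    return result


# Hypothetical data: two copies of T and one conflicting tree W.
T = (1, (2, (3, (4, 5))))
W = (1, (2, ((3, 4), 5)))
S = nts([T, T, W])
print(sorted(sorted(c) for c in S))
```

On this input the majority tree T is recovered: the resulting clusters are {1,2,3,4,5}, {2,3,4,5}, {3,4,5}, {4,5}, and the singletons. The sketch recomputes supports on demand for clarity and makes no attempt to match the running time stated later in the talk.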
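  • 23. Input: [Figure: the input trees.] A({1,2,3,4,5}): [Figure: weighted graph on the vertices 1 to 5 with edge weights 1, 1/2, and 2/3; the edge {4,5} has weight 2/3.]
  • 24. The minimum threshold is τ = 1/2. [Figure: the graph after removing the edges of weight at most 1/2.] The children of {1,2,3,4,5} in the minimal threshold tree are {1} and {2,3,4,5}.
  • 25. A({2,3,4,5}): [Figure: weighted graph on the vertices 2, 3, 4, 5 with edge weights 1/2, 1, and 2/3.] The minimum threshold is τ = 1/2.
  • 26. A({2,3,4,5}): [Figure: the graph after removing the edges of weight at most 1/2.] The minimum threshold is τ = 1/2. The children of {2,3,4,5} are {2} and {3,4,5}.
  • 27. A({3,4,5}): [Figure: weighted graph on the vertices 3, 4, 5 with edge weights 1/3 and 2/3.] The minimum threshold is τ = 1/3. The children of {3,4,5} are {3} and {4,5}.
  • 28. The normalized triplet supertree NTS has clusters {1,2,3,4,5}, {2,3,4,5}, {1}, {2}, {3,4,5}, {3}, {4,5}, {4}, {5}. [Figure: the NTS on the leaves 1 to 5.]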
  • 29. Problem. Study how well a supertree method reconciles incompatible data. Let D = {T_i : i ∈ I} be a collection of input trees, where T_i has leaf set L(T_i). We seek a supertree for D. We say that the collection D is dense if for every three distinct taxa a, b, c in X, there exists an input tree T_i such that {a,b,c} ⊆ L(T_i). A special case of dense data arises when each T_i has leaf set X. In this case, the problem is the same as finding a consensus tree. We study how various methods work on dense data.
  • 30. Suppose T is a rooted tree and a, b, c are leaves. spt_T(ab|c) = 1 if ab|c in T, and 0 otherwise. [Figure: a tree T on the leaves 1 to 5.] spt_T(34|2) = 1, spt_T(12|4) = 0, spt_T(24|1) = 1.
  • 31. Main Theorem. Suppose D is dense and T is a binary rooted tree. Assume that for all distinct x, y, z in X we have |spt_D(xy|z) - spt_T(xy|z)| < 1/2. Then the normalized triplet supertree S is T.
  • 32. The condition |spt_D(xy|z) - spt_T(xy|z)| < 1/2 for all x, y, z means (1) If xy|z in T, then more than half the input {x,y,z} triplets are xy|z. (2) If xy|z is not in T, then fewer than half the input {x,y,z} triplets are xy|z. In this situation, the NTS is T.
  • 33. If T is not binary, there is a more complicated statement. Theorem. Suppose D is dense. Assume that for all distinct x, y, z in X we have |spt_D(xy|z) - spt_T(xy|z)| < 1/2. Then the normalized triplet supertree S is a refinement of T.
  • 34. Following Atteson 1999, say that the (dense l_∞) radius R of a method of supertree reconstruction is ρ provided that (1) whenever D is a dense collection of input trees and T is a binary tree such that for all distinct a, b, c, |spt_D(ab|c) - spt_T(ab|c)| < ρ, then the method constructs T; and (2) for every ε > ρ there exists a binary tree T and a dense collection D of input trees such that for all distinct a, b, c, |spt_D(ab|c) - spt_T(ab|c)| < ε but the method does not reconstruct T. Alternatively, call such ρ the dense robustness radius R. Theorem. The normalized triplet supertree method has R ≥ 1/2.
  • 35. Theorem. R = 1/2 is best possible for any method. [Figure: tree T, in which ab|c, and tree W, in which bc|a.] Suppose that we have as input n copies of T and m copies of W. spt_D(ab|c) = n/(n+m) = mspt(a,b); spt_D(bc|a) = m/(n+m) = mspt(b,c); spt_D(ac|b) = 0. Relative to T: spt_T(ab|c) = 1, so |spt_D(ab|c) - spt_T(ab|c)| = m/(n+m); spt_T(bc|a) = 0, so |spt_D(bc|a) - spt_T(bc|a)| = m/(n+m); spt_T(ac|b) = 0, so |spt_D(ac|b) - spt_T(ac|b)| = 0. Relative to W: spt_W(ab|c) = 0, so |spt_D(ab|c) - spt_W(ab|c)| = n/(n+m); spt_W(bc|a) = 1, so |spt_D(bc|a) - spt_W(bc|a)| = n/(n+m); spt_W(ac|b) = 0, so |spt_D(ac|b) - spt_W(ac|b)| = 0.
  • 36. Suppose a method yields a tree Y when for all distinct x, y, z in X, |spt_D(xy|z) - spt_Y(xy|z)| < ε with ε > 1/2. We could choose m > n with m/(n+m) < ε. Then also n/(n+m) < ε. So the method would have to output both T and W, which is impossible.
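A concrete instance may make the argument easier to follow. The numbers below (n = 2, m = 3, ε = 0.7) are chosen here purely for illustration and do not appear in the talk.

```latex
\[
n = 2,\quad m = 3,\quad \varepsilon = 0.7:
\qquad
\max_{x,y,z}\bigl|\mathrm{spt}_D(xy|z) - \mathrm{spt}_T(xy|z)\bigr| = \frac{m}{n+m} = 0.6 < \varepsilon,
\qquad
\max_{x,y,z}\bigl|\mathrm{spt}_D(xy|z) - \mathrm{spt}_W(xy|z)\bigr| = \frac{n}{n+m} = 0.4 < \varepsilon .
\]
```

The same input D therefore lies within ε of both T and W, so no method can guarantee reconstruction at any radius larger than 1/2.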
  • 37. Theorem. The normalized triplet supertree method has dense robustness radius R = 1/2.
  • 38. Complexity of NTS. Theorem. Let X be a finite set, |X| = n, and let D be a collection of m input rooted trees with leaves from X. The computation of the normalized triplet supertree S takes time O(n^4) + O(n^3 m).
  • 39. Some Supertree Methods (1) Matrix Representation with Parsimony (MRP). (Baum 1992, Ragan 1992, Sanderson et al 1998) (2) MinCutSupertree. (Semple and Steel 2000) (3) PageMinCutSupertree. (modified by Page 2003) (4) Matrix Representation using Flipping (MRF). (Burleigh et al 2004) (5) Normalized Triplet Supertree (NTS).
  • 40.
      Method                    Complexity
      NTS                       polynomial
      MRP                       NP-hard
      MinCutSupertree (orig)    polynomial
      MinCutSupertree (Page)    polynomial
      triplet MRP               NP-hard
      MRF                       NP-hard
  • 41.
      Method                    Dense robustness radius
      NTS                       1/2
      MRP                       0 [R ≤ 0.05]
      MinCutSupertree (orig)    0
      MinCutSupertree (Page)    0
      triplet MRP               1/2
      triplet MRF               1/2
  • 42. Interpretation: There exists a binary tree T and dense input data D such that (1) whenever xy|z in T, then at least 99% of the input {x,y,z} triplets are xy|z; but (2) the MinCutSupertree is not T.
  • 43. A special case is consensus trees, when all input trees have the same leaf set. [Figure: a tree T and the input trees, all on the leaves 1 to 6.]
  • 44. Majority consensus tree M: [Figure: the tree M on the leaves 1 to 6.] However, if you include clusters in order of occurrence until you fail to get a tree, then we obtain [Figure: a more resolved tree on the leaves 1 to 6.]
  • 45. Strict Consensus: [Figure.] Adams Consensus Ad: [Figure.] Normalized Triplet Consensus = MinCutSupertree Consensus: [Figure.]
  • 46. Consensus tree computation (all input trees have leaf set X)
      Method                    Dense robustness radius
      NTS                       1/2
      MRP                       0 [R ≤ 0.05]
      strict consensus          0
      triplet MRP               1/2
      triplet MRF               1/2
      majority consensus        0
      Adams consensus           0
      MinCutSupertree (orig)    0
      MinCutSupertree (Page)    0
    NTS is much slower than strict consensus and majority consensus, which take time O(n m). And NTS applies only to rooted trees.
  • 47. There are many variants on the method, also with radius 1/2, which may possibly extrapolate better than NTS. Theorem. Suppose a supertree method utilizes minimal threshold trees with the edge support spt_U(a,b) for the edge {a,b} in A(U). Suppose (1) if there exists c ∈ U with spt_D(ab|c) > 1/2, then spt_U(a,b) > 1/2; and (2) if for every c ∈ U we have spt_D(ab|c) < 1/2, then spt_U(a,b) < 1/2. Then the dense robustness radius is 1/2.
  • 48. Lemma. Suppose that D is dense and T is a tree with leaf set X = ∪ L(T_i). Assume that for all distinct x, y, z in X we have |spt_D(xy|z) - spt_T(xy|z)| < 1/2. Suppose that the tree T has a cluster U with exactly 2 children A and B. Suppose S is the NTS and U is in S. Then A_{1/2}(U) is disconnected and its components are exactly A and B. The children of U in S are A and B.
  • 49. Lemma. Suppose that D is dense and T is a tree with leaf set X = ∪ L(T_i). Assume that for all distinct x, y, z in X we have |spt_D(xy|z) - spt_T(xy|z)| < 1/2. Suppose that the tree T has a cluster U with exactly 2 children A and B. Suppose S is the NTS and U is in S. Then A_{1/2}(U) is disconnected and its components are exactly A and B. The children of U in S are A and B. [Figure: the vertices a1, a2, a3, a4 of A and b1, b2, b3 of B.] Consider a1 and a2 in A. Choose b1 in B. Then a1 a2 | b1 in T. Hence spt_T(a1 a2 | b1) = 1. Hence spt_D(a1 a2 | b1) > 1/2. Hence mspt(a1, a2) > 1/2. Hence there is an edge {a1, a2}.
  • 50. [Figure: the vertices a1, a2, a3, a4 and b1, b2, b3, with the edge {a1, a2}.]
  • 51. [Figure: the same vertices, with all edges among the ai.] Similarly there is an edge {ai, aj} in A_{1/2}(U).
  • 52. [Figure: the same vertices, with all edges among the ai and among the bi.] And there is an edge {bi, bj} in A_{1/2}(U).
  • 53. Is there an edge {a1, b1}? Consider any c ≠ a1, c ≠ b1. Say c = a2. Then a1 c | b1 in T. spt_T(a1 c | b1) = 1. spt_T(a1 b1 | c) = 0. spt_D(a1 b1 | c) < 1/2. Indeed spt_D(a1 b1 | c) < 1/2 for all c ≠ a1, c ≠ b1. Hence mspt(a1, b1) < 1/2. There is no edge {a1, b1} in A_{1/2}(U).
  • 54. [Figure: A_{1/2}(U), with components A = {a1, a2, a3, a4} and B = {b1, b2, b3}.] The children of U in the NTS are A and B.
  • 55. General Properties of NTS. Wilkinson et al. 2004 propose some "desiderata for liberal supertrees." (1) Uniqueness. The method should give a unique answer. (2) Plenary. The resulting supertree should contain all the leaves of the input trees. (3) Order invariance. The method should not be influenced by the order in which members of D are introduced, or the order of leaves in the adjacency matrix of an input tree. (4) Sizeless. The method should not be biased by input trees with large size. Note that if D = {T_i : i ∈ I}, then the NTS of D is precisely the same as the NTS of the multiset D' in which each T_i is replaced by the set of its rooted triples and star triples. (5) Shapeless. The method should not be biased by the shape of input trees.
  • 56. (6) Pareto. Relationships in all input trees should appear in the output. Theorem. Suppose that D contains at least one input tree containing all the taxa X. Suppose spt_D(ab|c) = 1. Then ab|c in the NTS. (7) Weightable. The method should allow different trees to be weighted. (8) Speed. NTS has polynomial time-complexity. (9) Assessable. The method should allow a measure of the amount of support of the output supertree. The smallest number ρ such that for all x, y, z, |spt_D(xy|z) - spt_T(xy|z)| ≤ ρ gives a measure of the overall support for a tree T. A smaller ρ means a better fit.
  • 57. Real data Philip, Creevey, and McInerney (Mol. Biol. Evol. 2005) considered 780 single-gene trees for a set of ten eukaryotes. The numbers of taxa in a given tree ranged widely. They performed a Most Similar Supertree analysis using the software package CLANN. They published and analyzed the resulting tree.
  • 58.
  • 59. I used the 664 trees (out of 780 obtained at the web site http://bioinf.nuim.ie/supplementary/eukaryotes/) that contained humans. The NTS is topologically the same tree as the one they published. The NTS satisfied that for all x, y, z, |spt_D(xy|z) - spt_T(xy|z)| ≤ 0.61111.
  • 60. Geometric interpretation. Consider the collection of rooted X-trees, where |X| = n. Every rooted X-tree T is uniquely determined by its set of rooted triples ab|c. The total number of possible rooted triples is m = 3·C(n,3) = n(n-1)(n-2)/2, where C(n,3) is the binomial coefficient. Let H_X = [0,1]^m be the m-dimensional hypercube in which each coordinate corresponds to a rooted triple ab|c. [Figure: the cube H_X.] There are m dimensions.
  • 61. H_X is a kind of rooted-tree-landscape space. To any rooted X-tree T there corresponds spt_T ∈ H_X given by (spt_T)_{ab|c} = spt_T(ab|c). Each rooted X-tree corresponds to a distinct corner of H_X. No two distinct trees correspond to the same corner. Not all corners correspond to trees.
  • 62. To every collection D of rooted trees with leaf sets contained in X, there is a point spt_D ∈ H_X given by (spt_D)_{ab|c} = spt_D(ab|c). The l_∞ norm on H_X is defined by ||u - v||_∞ = max {|u_{ab|c} - v_{ab|c}|}. If spt_T is the upper front left corner, then the red box is {u ∈ H_X : ||u - spt_T||_∞ < 1/2}.
  • 63. Theorem. Suppose T is binary and ||spt_D - spt_T||_∞ < 1/2. Then the NTS is T. If spt_T is the upper front left corner, and spt_D is in the red box, then the NTS is T.
  • 64. The picture is deceptive since m is very large. If X contains 7 taxa, then m = 105. The volume of H_X is 1. The volume of the red box is (1/2)^105 ≈ 2.465 × 10^-32. The total volume of all red boxes for all 10,395 binary rooted trees with 7 taxa is 2.563 × 10^-28. But this ignores the 35 constraints that for each a, b, c, spt_D(ab|c) + spt_D(ac|b) + spt_D(bc|a) ≤ 1.
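The figures quoted on slides 60 and 64 are easy to re-derive. The short Python check below is an illustration added here, with invented function names; it relies on the standard count (2n-3)!! for rooted binary trees on n labelled leaves, which is not stated on the slides.

```python
from math import comb


def num_triples(n):
    """Number of rooted triples ab|c on n taxa: three per 3-element subset."""
    return 3 * comb(n, 3)


def num_binary_rooted_trees(n):
    """Number of rooted binary trees on n labelled leaves, (2n-3)!!."""
    count = 1
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


m = num_triples(7)                       # dimension of H_X for 7 taxa
box = 0.5 ** m                           # volume of one red box
total = num_binary_rooted_trees(7) * box
print(m, num_binary_rooted_trees(7), comb(7, 3))
print(f"{box:.3e} {total:.3e}")
print(num_triples(10))                   # dimension for the 10-taxon dataset
```

For 7 taxa this gives m = 105, 10,395 binary trees, and 35 triples of taxa; for 10 taxa it gives dimension 360.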
  • 65. For the Philip et al. dataset with 10 eukaryotes, if T = NTS then ||spt_D - spt_T||_∞ = 0.61111. Here dim(H_X) = 360.
  • 66. Closest Tree Problem. Given the set of input trees D, find a rooted X-tree T that minimizes ||spt_D - spt_T||_∞. The NTS solves this problem if the solution T is binary and satisfies ||spt_D - spt_T||_∞ < 1/2.
  • 67. Closest Tree Problem. Given the set of input rooted trees D, find a rooted X-tree T that minimizes ||spt_D - spt_T||_∞. Note ||spt_D - spt_T||_∞ = max {|spt_D(xy|z) - spt_T(xy|z)|} = max { max {spt_D(xy|z) : xy|z not in T}, max {1 - spt_D(xy|z) : xy|z in T} }. Let Dweak_D(T) := max {spt_D(xy|z) : xy|z not in T}.
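Both quantities on this slide can be computed by a single pass over all ordered triples. The sketch below is only illustrative: trees are nested tuples, the function names are invented here, and the input (two copies of a tree T and one conflicting tree W) is a hypothetical example, not data from the talk.

```python
from fractions import Fraction
from itertools import permutations


def clusters(tree):
    """Return the set of clusters (frozensets of leaves) of a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, tuple):
            leaves = frozenset().union(*(walk(child) for child in node))
        else:
            leaves = frozenset([node])
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def in_tree(cl, a, b, c):
    """True if ab|c holds in the tree whose clusters are cl."""
    return any(a in k and b in k and c not in k for k in cl)


def spt(trees, a, b, c):
    """Normalized triplet support of ab|c in the collection of trees."""
    num = den = 0
    for tree in trees:
        cl = clusters(tree)
        if {a, b, c} <= max(cl, key=len):
            den += 1
            num += in_tree(cl, a, b, c)
    return Fraction(num, den) if den else Fraction(0)


def sup_and_weak(trees, T):
    """Return (||spt_D - spt_T||_inf, Dweak_D(T)) for a candidate tree T."""
    cl_T = clusters(T)
    X = sorted(max(cl_T, key=len))
    sup = weak = Fraction(0)
    for a, b, c in permutations(X, 3):
        s = spt(trees, a, b, c)
        if in_tree(cl_T, a, b, c):
            sup = max(sup, 1 - s)        # triple in T: deviation is 1 - support
        else:
            sup = max(sup, s)            # triple not in T: deviation is the support
            weak = max(weak, s)
    return sup, weak


# Hypothetical data: two copies of T and one conflicting tree W.
T = (1, (2, (3, (4, 5))))
W = (1, (2, ((3, 4), 5)))
print(sup_and_weak([T, T, W], T))
print(sup_and_weak([T, T, W], W))
```

Here T is at distance 1/3 from the data, while W is at distance 2/3, so T is the closer of the two candidates.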
  • 68. Weak Closest Tree Problem. Given the set of input rooted trees D, find a rooted X-tree T that minimizes Dweak_D(T) := max {spt_D(xy|z) : xy|z not in T}.
  • 69. Weak Closest Tree Problem. Given the set of input trees D, find an X-tree T that minimizes Dweak_D(T) := max {spt_D(xy|z) : xy|z not in T}. Theorem. The Normalized Triplet Supertree S solves the Weak Closest Tree Problem. There is typically not a unique solution.
  • 70. Theorem. Suppose every input tree in D is binary. Suppose D is dense. Suppose T solves the Closest Tree Problem. Let S be the Normalized Triplet Supertree. Then Dweak_D(S) ≤ ||spt_T - spt_D||_∞ ≤ 2 Dweak_D(S).
  • 71. Comparison with MinCutSupertree Suppose m is a positive integer. If m = 5, we have the input a Y a b b 1 1 2 2 T 3 3 4 4 5 5 with t copies of T and y copies of Y
  • 72. Z1 Z3 Z2 3 1 2 a a b a b 1 b 2 2 1 3 4 3 4 5 5 4 5
  • 73. [Figure: the trees $Z_4$ and $Z_5$ on the leaves a, b, 1, 2, 3, 4, 5.] The input also includes one copy of each of $Z_1, \ldots, Z_m$.
  • 74. The data are close to T. [Figure: the trees T and Y on the leaves a, b, 1, 2, 3, 4, 5.] For T: $\mathrm{spt}_T(ij|a) = 1$; $\mathrm{spt}_T(ij|b) = 1$; $\mathrm{spt}_T(ab|i) = 1$; $\mathrm{spt}_T(jk|i) = 1$ if $i < j < k$; $\mathrm{spt}_T(uv|w) = 0$ otherwise. For D: $\mathrm{spt}_D(ij|a) = (t + y + m - 2)/(t + y + m)$; $\mathrm{spt}_D(ij|b) = (t + y + m - 2)/(t + y + m)$; $\mathrm{spt}_D(ab|i) = (t + 1)/(t + y + m)$; $\mathrm{spt}_D(jk|i) = (t + y + m - 2)/(t + y + m)$ if $i < j < k$.
  • 75. Hence $|\mathrm{spt}_D(uv|w) - \mathrm{spt}_T(uv|w)| \le (y + m - 1)/(t + y + m)$. We will choose $t = my + m^2 - 2m - 1$. As $y \to \infty$ this bound tends to $1/(m+1)$, so in the limit $|\mathrm{spt}_D(uv|w) - \mathrm{spt}_T(uv|w)| \le 1/(m+1)$ for all u, v, w. But a check shows that when U = X, {a,b} is not a minimal cut and {a,b} is not a child of the root in MinCutSupertree. Hence for each m we have $R \le 1/(m+1)$. Hence R = 0.
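The limiting behaviour of the bound on slide 75 can be tabulated exactly. This is a small sketch using the slide's choice of t; the function name `deviation_bound` is illustrative only.

```python
from fractions import Fraction

def deviation_bound(m, y):
    """Return (y + m - 1) / (t + y + m) with t = m*y + m^2 - 2m - 1."""
    t = m * y + m**2 - 2 * m - 1
    return Fraction(y + m - 1, t + y + m)

m = 5
limit = Fraction(1, m + 1)
for y in (1, 10, 100, 1000):
    bound = deviation_bound(m, y)
    # The gap to 1/(m+1) is positive and shrinks as y grows.
    print(y, bound, float(bound - limit))
```

For m = 5 the bound equals (y + 4)/(6y + 19), which decreases toward 1/6 from above, so for any radius larger than 1/(m+1) a sufficiently large y gives input data within that radius of T.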
  • 76. The same example applies to the modified MinCutSupertree of Page (2003).
  • 77. Distance Methods. Suppose we know the distances on the input trees. Then there are many more constraints. [Figure 1. A family D of input rooted trees. Branch lengths in red.]
  • 78. [Figure: The unique supertree with distances from Figure 1.]
  • 79. [Figure 2. A different family of input trees with the same topology as in Figure 1 but different branch lengths.]
  • 80. [Figure: The unique supertree with distances for Figure 2.]
  • 81. [Figure 3. The supertree constructed by BUILD or NTS from the input trees in Figures 1 or 2, ignoring distances.] Moral: The use of distances can give extra resolution.
  • 82. How do we use distances? $\delta(a,b)$ is the length of the path in the graph of T from the leaf a to the most recent common ancestor of a and b. [Figure: a rooted tree with branch lengths.] Example: $\delta(1,4) = 3$, $\delta(4,1) = 1$, $\delta(7,4) = 2$, $\delta(4,7) = 3$.
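As a rough illustration of this definition, the sketch below computes the path length from a leaf up to the most recent common ancestor on a small made-up tree (not the tree drawn on the slide). The tree encoding by `parent` and `length` dictionaries and the node names "u" and "r" are assumptions made for the example.

```python
# Hypothetical rooted tree: root "r" has children "u" and leaf "3";
# "u" has the leaves "1" and "2" as children.
parent = {"1": "u", "2": "u", "u": "r", "3": "r"}
length = {"1": 2.0, "2": 1.0, "u": 1.5, "3": 3.0}  # length of the branch above each node

def distances_to_ancestors(leaf):
    """Map every ancestor of `leaf` (and the leaf itself) to its path distance from `leaf`."""
    dist = {leaf: 0.0}
    node, total = leaf, 0.0
    while node in parent:
        total += length[node]
        node = parent[node]
        dist[node] = total
    return dist

def delta(a, b):
    """Path length from leaf a up to mrca(a, b)."""
    up_from_a = distances_to_ancestors(a)
    node = b
    while node not in up_from_a:   # climb from b until we meet an ancestor of a
        node = parent[node]
    return up_from_a[node]

print(delta("1", "2"), delta("2", "1"))  # 2.0 1.0 (not symmetric)
print(delta("1", "3") - delta("1", "2")) # 1.5, the distance from mrca(1,2) to the root
```

The function is not symmetric in its arguments, which matches the slide's example where the two orders of the same pair give different values.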
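  • 83. [Figure: a rooted tree on the leaves 1, 2, 3, 4, 5 with the length d marked.] Here d corresponds to the distance from mrca(1,2) to the root. If $\delta$ comes from the tree, then $\delta(1,4) - \delta(1,2) = d$, $\delta(1,5) - \delta(1,2) = d$, and $\delta(1,3) - \delta(1,2) < d$. Thus $d = \max\{\delta(1,c) - \delta(1,2) : c \in X,\ c \text{ distinct from 1 and 2}\}$.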
  • 84. Given D in which each input tree has distances, in each input tree $T_i = (V_i, E_i, r, X_i)$ compute $\delta_i(a,b)$ for a and b in $X_i$. Let $\delta_D(a,b)$ be the average of the $\delta_i(a,b)$. Then $\delta_D$ is a partially defined function on $X \times X$.
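The averaging step might be sketched as follows, assuming each input tree contributes a dictionary from ordered pairs (a, b) to its value for that pair, with pairs absent when a or b is not a leaf of that tree. The numbers are invented for illustration.

```python
from collections import defaultdict

def average_delta(per_tree_deltas):
    """Average the per-tree values over the trees in which each ordered pair is defined."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for delta_i in per_tree_deltas:
        for pair, value in delta_i.items():
            sums[pair] += value
            counts[pair] += 1
    return {pair: sums[pair] / counts[pair] for pair in sums}

# Two hypothetical input trees with overlapping leaf sets.
tree_1 = {(1, 2): 2.0, (2, 1): 1.0}
tree_2 = {(1, 2): 3.0, (1, 3): 4.0}

delta_d = average_delta([tree_1, tree_2])
print(delta_d[(1, 2)])    # 2.5, averaged over both trees
print((2, 3) in delta_d)  # False: the average is only partially defined
```

Pairs that never occur together in any input tree simply have no entry, which is the sense in which the averaged function is partially defined.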
  • 85. We wish to use a minimal threshold procedure to build a supertree from D. Suppose U is a cluster. Form A(U). Define the support for the edge {a,b} to be the estimated distance from mrca(a,b) to the root for U. The larger this estimate, the more confident we are in the edge {a,b}. [Figure: a rooted tree on the leaves 1, 2, 3, 4, 5 with spt(1,2) marked.] Here $\mathrm{spt}(1,2) = \delta(1,4) - \delta(1,2)$.
  • 86. Two possible formulas for support: The primary support of (a,b) is $\mathrm{pspt}(a,b) = \max\{\delta(a,c) - \delta(a,b),\ \delta(b,c) - \delta(b,a) : c \in U\}$. The confirmed support of (a,b) is $\mathrm{cspt}(a,b) = \max\{\min\{\delta(a,c) - \delta(a,b),\ \delta(b,c) - \delta(b,a)\} : c \in U\}$. If $\delta$ is globally defined and is correct on the tree, these give the same numbers. If $\delta$ contains errors, these can be different numbers.
  • 87. Algorithm BUILD-WITH-DISTANCES (Willson 2004). Suppose U is a nonempty set of vertices. Define A(U) as follows: (i) The vertex set consists of the members of U. (ii) The weight for edge {a,b} is pspt(a,b), provided it is positive. Find the minimal threshold $\tau$ such that $A_\tau(U)$ disconnects, and let the children of U be the components of $A_\tau(U)$. The result is the minimal threshold supertree $S_p$ with primary support. Or if instead the weight for edge {a,b} is cspt(a,b), obtain the minimal threshold supertree $S_c$ with confirmed support.
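One way to read the minimal threshold step is sketched below. It is an interpretation under stated assumptions, not the published algorithm: the thresholded graph is taken to keep only the edges whose weight exceeds the threshold, an undefined support is treated as "no edge", and the function names and the small distance table are hypothetical.

```python
from itertools import combinations

def components(vertices, edges):
    """Connected components of an undirected graph, found by depth-first search."""
    adjacency = {v: set() for v in vertices}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, parts = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        stack, part = [start], set()
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adjacency[v] - part)
        seen |= part
        parts.append(part)
    return parts

def minimal_threshold_tree(cluster, support):
    """Return a leaf label, or a tuple of subtrees for the children of `cluster`."""
    cluster = set(cluster)
    if len(cluster) == 1:
        return next(iter(cluster))
    weights = {}
    for a, b in combinations(sorted(cluster), 2):
        w = support(a, b, cluster)
        if w is not None and w > 0:          # step (ii): positive weights only
            weights[(a, b)] = w
    # Raise the threshold until the graph of edges with weight > tau disconnects.
    for tau in [0] + sorted(set(weights.values())):
        kept = [edge for edge, w in weights.items() if w > tau]
        parts = components(cluster, kept)
        if len(parts) > 1:
            return tuple(minimal_threshold_tree(p, support) for p in parts)

def make_pspt(delta):
    """Primary support from a (possibly partial) table delta[(a, b)]."""
    def pspt(a, b, cluster):
        diffs = [delta[(x, c)] - delta[(x, y)]
                 for c in cluster - {a, b}
                 for x, y in ((a, b), (b, a))
                 if (x, c) in delta and (x, y) in delta]
        return max(diffs) if diffs else None
    return pspt

# Hypothetical distances read from the tree ((1,2),3).
delta = {(1, 2): 2.0, (2, 1): 1.0, (1, 3): 3.5, (2, 3): 2.5, (3, 1): 3.0, (3, 2): 3.0}
print(minimal_threshold_tree({1, 2, 3}, make_pspt(delta)))  # ((1, 2), 3)
```

Passing the support function as a parameter lets the same recursion produce either variant: a primary support function gives a tree in the spirit of $S_p$, and a confirmed support function one in the spirit of $S_c$.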
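  • 88. Example. Use the primary support function pspt. [Figure: the input trees on the leaves 1, ..., 6 with branch lengths.] U = {1,2,3,4,5,6}. $\delta(2,1) - \delta(2,6) = 0$ and $\delta(6,1) - \delta(6,2) = 0.5$; $\delta(2,3) - \delta(2,6) = 2$ and $\delta(6,3) - \delta(6,2)$ is undefined; $\delta(2,4) - \delta(2,6) = 2$ and $\delta(6,4) - \delta(6,2) = -1.5$; $\delta(2,5) - \delta(2,6) = 0.5$ and $\delta(6,5) - \delta(6,2) = 0.5$. Hence pspt(2,6) = 2.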
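  • 89. Example. Use the confirmed support function cspt. [Figure: the input trees on the leaves 1, ..., 6 with branch lengths.] U = {1,2,3,4,5,6}. $\delta(2,1) - \delta(2,6) = 0$ and $\delta(6,1) - \delta(6,2) = 0.5$; $\delta(2,3) - \delta(2,6) = 2$ and $\delta(6,3) - \delta(6,2)$ is undefined; $\delta(2,4) - \delta(2,6) = 2$ and $\delta(6,4) - \delta(6,2) = -1.5$; $\delta(2,5) - \delta(2,6) = 0.5$ and $\delta(6,5) - \delta(6,2) = 0.5$. Hence cspt(2,6) = 0.5.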
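The two support values for the pair (2,6) can be recomputed from the differences listed on slides 88 and 89. The sketch below assumes, consistent with the slide results, that an undefined difference is ignored by the primary support and that a third taxon with an undefined difference cannot confirm anything; the function names are illustrative.

```python
# For each third taxon c: (delta(2,c) - delta(2,6), delta(6,c) - delta(6,2)).
# None marks the difference that the slides list as undefined.
differences = {
    1: (0.0, 0.5),
    3: (2.0, None),
    4: (2.0, -1.5),
    5: (0.5, 0.5),
}

def primary_support(diffs):
    """Maximum over all defined differences, from either side of the pair."""
    return max(v for pair in diffs.values() for v in pair if v is not None)

def confirmed_support(diffs):
    """Maximum over c of the smaller of the two differences, when both are defined."""
    return max(min(pair) for pair in diffs.values() if None not in pair)

print(primary_support(differences))    # 2.0
print(confirmed_support(differences))  # 0.5
```

The primary support is driven by the taxa 3 and 4, where only one side gives a large difference, while the confirmed support requires both sides to agree and therefore falls back to taxon 5.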
  • 90. [Figure: the confirmed supertree $S_c$ on the leaves 1, ..., 6, with branch lengths.]
  • 91. [Figure: the primary supertree $S_p$ on the leaves 1, ..., 6, with branch lengths.] [Figure: NTS = MRP strict consensus of 18 MP trees, on the leaves 1, ..., 6.]
  • 92. Suppose that the input trees $D = \{T_i : i \in I\}$ have distances. Say that D is distance-dense if for each distinct a and b there exists $i \in I$ with $\{a,b\} \subseteq L(T_i)$. Theorem. Suppose that the tree T is binary, the shortest branch length is $\varepsilon > 0$, and the distance function on T is $\delta_T$. Suppose that each tree in $D = \{T_i : i \in I\}$ has distances and D is distance-dense. For each a and b, let $\delta_D(a,b)$ be the estimated value $\delta_D(a,b) = \mathrm{average}\{\delta_i(a,b) : a \text{ and } b \text{ are in } L(T_i)\}$. (1) Assume for all x and y that $|\delta_D(x,y) - \delta_T(x,y)| < (1/2)\,\varepsilon$. Then $S_c = T$. (2) Assume for all x and y that $|\delta_D(x,y) - \delta_T(x,y)| < (1/4)\,\varepsilon$. Then $S_p = T$. (3) The numbers 1/2 and 1/4 above are the largest possible for the respective methods.
  • 93. Following Atteson (1999), say that the (distance-dense $l_\infty$) radius r of a method of supertree reconstruction from trees with distances is $\rho$ provided that: (1) Suppose D is a distance-dense collection of input trees with distances yielding the function $\delta_D$. Suppose T is a binary tree with function $\delta_T$ and minimal branch length $\varepsilon$. Suppose that for all distinct x, y, $|\delta_D(x,y) - \delta_T(x,y)| < \rho\,\varepsilon$. Then the method constructs T. (2) For every $\rho' > \rho$ there exists a binary tree T with distances and shortest branch length $\varepsilon$, and there exists a distance-dense collection D of input trees, such that for all distinct x, y, $|\delta_D(x,y) - \delta_T(x,y)| < \rho'\,\varepsilon$ but the method does not reconstruct T. Alternatively, call such $\rho$ the distance-dense robustness radius r.
  • 94. Theorem. (1) The primary supertree method has r = 1/4. (2) The confirmed supertree method has r = 1/2. (3) r = 1/2 is best possible for any method.
  • 95. Summary. The confirmed supertree method is twice as robust as the primary supertree method on distance-dense data and has optimal distance-dense robustness radius.
  • 96. Review (1) Different methods for finding supertrees can differ significantly on how well they perform on dense data to reconcile incompatibilities. (2) For topological input, Triplet MRP and NTS can reconcile incompatibilities close to a tree better than MRP or MinCutSupertree. (3) For consensus methods, NTS reconciles incompatibilities close to a tree better than Majority Consensus and Strict Consensus. (4) NTS solves an optimization problem in tree-space. (5) NTS is polynomial-time and has other nice properties. (6) Distance methods can add resolution if the input trees have distances. (7) For distance-dense data, the confirmed supertree is more robust than the primary supertree.
  • 97. Questions (1) Is there an analogous quantitative measure for how well a supertree method extrapolates data? (2) Which topological methods with radius 1/2 extrapolate well? (3) What is the complexity of the Closest Tree Problem? (4) What is the dense robustness radius of MRF and other methods? (5) What happens if we use a different norm on $H_X$, e.g. $\|u - v\|_2 = \sqrt{\sum (u_{ab|c} - v_{ab|c})^2}$?
  • 98. Acknowledgments. Thanks to the Isaac Newton Institute for Mathematical Sciences for its hospitality at an excellent facility. Thanks to the e-Science Institute for hosting this Phyloinformatics Workshop. Thanks to the organizers Vincent Moulton, Mike Steel, and Daniel Huson of the Phylogenetics Programme. Thanks to Iowa State University for giving me a sabbatical this semester.
  • 99. Thank you for your attention.
  • 100. References. Aho, A.V., Y. Sagiv, T.G. Szymanski, and J.D. Ullman (1981). Inferring a tree from lowest common ancestors with an application to the optimization of relational expressions. SIAM J. Comput. 10(3), 405-421. Atteson, K. (1999). The performance of neighbor-joining methods of phylogenetic reconstruction. Algorithmica 25, 251-278. Baum, B.R. (1992). Combining trees as a way of combining data sets for phylogenetic inference, and the desirability of combining gene trees. Taxon 41, 3-10. Burleigh, J.G., O. Eulenstein, D. Fernández-Baca, and M.J. Sanderson (2004). MRF supertrees, in Bininda-Emonds, O.R.P. (ed.), Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 65-85. Creevey, C.J., D.A. Fitzpatrick, G.K. Philip, R.J. Kinsella, M.J. O'Connell, M.M. Pentony, S.A. Travers, M. Wilkinson, and J.O. McInerney (2004). Does a tree-like phylogeny only exist at the tips in the prokaryotes? Proc. R. Soc. Lond. B Biol. Sci. 271, 2551-2558.
  • 101. Page, R.D.M. (2003). Modified mincut supertrees, in Guigó, R. and D. Gusfield (eds.), Algorithms in Bioinformatics, Proceedings of the Second International Workshop, WABI 2002, Rome, Italy, September 17-21, 2002, Lecture Notes in Computer Science 2452, Springer-Verlag, Berlin, pp. 537-552. Philip, G.K., C.J. Creevey, and J.O. McInerney (2005). The Opisthokonta and the Ecdysozoa may not be clades: Stronger support for the grouping of plant and animal than for animal and fungi and stronger support for the Coelomata than Ecdysozoa. Mol. Biol. Evol. 22(5), 1175-1184. Ragan, M.A. (1992). Phylogenetic inference based on matrix representation of trees. Mol. Phylogenet. Evol. 1, 53-58. Sanderson, M.J., A. Purvis, and C. Henze (1998). Phylogenetic supertrees: Assembling the trees of life. Trends Ecol. Evol. 13, 105-109. Semple, C. and M. Steel (2000). A supertree method for rooted trees. Discrete Applied Mathematics 105, 147-158. Steel, M., S. Böcker, and A.W.M. Dress (2000). Simple but fundamental limitations on supertree and consensus tree methods. Systematic Biology 49(2), 363-368.
  • 102. Wilkinson, M., J.L. Thorley, D. Pisani, F.-J. Lapointe, and J.O. McInerney (2004). Some desiderata for liberal supertrees, in Bininda-Emonds, O.R.P. (ed.), Phylogenetic Supertrees: Combining Information to Reveal the Tree of Life, Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 227-246. Willson, S.J. (2004). Constructing rooted supertrees using distances. Bulletin of Mathematical Biology 66, 1755-1783.