Computing the all-pairs quartet
distance on a set of evolutionary trees
                                Martin Stissing 1
                              Thomas Mailund 1,2
                    Christian Nørgaard Storm Pedersen 1
                             Gerth Stølting Brodal 1
                                Rolf Fagerberg 3


(1) University of Aarhus, (2) Oxford University, (3) University of Southern Denmark
Quartets and quartet distance
Quartet: Four named species in an unrooted tree

Quartet topology: The topology of the quartet induced by the tree

[Figure: the four possible topologies of a quartet a, b, c, d: three butterfly quartets and one star quartet]

Quartet distance: The number of quartets that differ in
topology between the two trees
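Quartets and quartet distance
[Figure: two unrooted trees on the leaves A, B, C, D, E; the quartet topologies below correspond to the trees ((A,B),C,(D,E)) and ((A,C),B,(D,E))]

Quartet    First tree    Second tree    Shared
ABCD       AB|CD         AC|BD          no
ABCE       AB|CE         AC|BE          no
ABDE       AB|DE         AB|DE          yes
ACDE       AC|DE         AC|DE          yes
BCDE       BC|DE         BC|DE          yes

Quartet distance = binom(5,4) - 3 = 5 - 3 = 2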
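The five-leaf example can be checked with a brute-force sketch that classifies every quartet using the four-point condition on path lengths. This is an O(n^4) illustration only, not one of the algorithms cited in the talk. The tree shapes ((A,B),C,(D,E)) and ((A,C),B,(D,E)) are read off the quartet topologies on the slide, and the internal node names x, y, z are hypothetical.

```python
from collections import deque
from itertools import combinations


def leaf_distances(edges):
    """Return (leaves, dist); dist[s][v] is the number of edges from leaf s to node v."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    leaves = sorted(node for node, nbrs in adj.items() if len(nbrs) == 1)
    dist = {}
    for s in leaves:
        d = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from leaf s
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[s] = d
    return leaves, dist


def topology(dist, a, b, c, d):
    """Four-point condition: a strictly smallest pair sum is a butterfly, a tie is a star."""
    sums = {
        ((a, b), (c, d)): dist[a][b] + dist[c][d],
        ((a, c), (b, d)): dist[a][c] + dist[b][d],
        ((a, d), (b, c)): dist[a][d] + dist[b][c],
    }
    smallest = min(sums.values())
    winners = [split for split, total in sums.items() if total == smallest]
    return winners[0] if len(winners) == 1 else "star"


def quartet_distance(edges1, edges2):
    """Count the quartets whose induced topology differs between the two trees."""
    leaves1, d1 = leaf_distances(edges1)
    leaves2, d2 = leaf_distances(edges2)
    if leaves1 != leaves2:
        raise ValueError("trees must have the same leaf set")
    return sum(topology(d1, *q) != topology(d2, *q)
               for q in combinations(leaves1, 4))


# Assumed tree shapes; x, y, z are made-up names for the internal nodes.
T1 = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("y", "z"), ("D", "z"), ("E", "z")]
T2 = [("A", "x"), ("C", "x"), ("x", "y"), ("B", "y"), ("y", "z"), ("D", "z"), ("E", "z")]
print(quartet_distance(T1, T2))
```

Under these assumptions the two trees disagree on ABCD and ABCE only, which matches the distance of 2 on the slide.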
Previous work
G. Estabrook, F. McMorris, and C. Meacham.
Comparison of undirected phylogenetic trees based on subtrees of
four evolutionary units. Systematic Zoology, 27:27-33, 1978.

W. Day and C. R. Doucette.
Expected behaviour of quartet distances between undirected tree.
Proc. 18th International Numerical Taxonomy Conference, 1984.

C. R. Doucette.
An efficient algorithm to compute quartet dissimilarity measures.   O(n^3)
Bachelor of Science (Honours) Dissertation.
Memorial University of Newfoundland, 1985.

M. Steel and D. Penny.
Distribution of tree comparison metrics—some new results.
Systematic Biology, 42(2):126–141, 1993.

D. Bryant, J. Tsang, P. E. Kearney, and M. Li.                      O(n^2)
Computing the quartet distance between evolutionary trees.
Proc. 11th Symp. on Discrete Algorithms (SODA), 285–286, 2000.

G. S. Brodal, R. Fagerberg, and C. N. S. Pedersen.                  O(n log^2 n)
Computing the quartet distance in time O(n log n).                  O(n log n)
Algorithmica, 38(2): 377-395, 2003.

The algorithms above are for fully resolved trees (binary trees) only.
Quartet distance between binary trees seems easier to compute ...
Previous work
C. Christiansen, T. Mailund, C. N. S. Pedersen, and M. Randers.
Algorithms for computing the quartet distance between trees of arbitrary degree.   O(n^3)
Proc. of Workshop on Algorithms in Bioinformatics (WABI), 77-88, 2005.             O(d^2 n^2)

C. Christiansen, T. Mailund, C. N. S. Pedersen, M. Randers, and M. Stissing.
Fast calculation of the quartet distance between trees of arbitrary degrees.       O(d n^2)
Algorithms for Molecular Biology, 1(16), 2006.

M. Stissing, C. N. S. Pedersen, T. Mailund, G. S. Brodal, and R. Fagerberg.
Computing the quartet distance between evolutionary trees of bounded degree.       O(d^9 n log n)
Proc. of APBC, 2007.
QDist
QDist: Implements the O(n^2) and O(n log^2 n) time algorithms for
computing the quartet distance between binary trees

... we want to compare 10,000 trees of size 200 using QDist, is
that possible? Takes about 2·10,000^2 sec ≈ 6 years ...

Immediate solution: For k trees, perform O(k^2) comparisons between
two trees using time O(n log^2 n) or O(n^2) each ...

Faster solution: Exploit similarities between the input trees
by merging them into a single DAG structure which allows
“common subtrees” to be shared.

All comparisons are performed by comparing the DAG against
the input trees, or against itself.

The speed-up depends on the compactness of the DAG ...

http://www.birc.au.dk/~mailund/qdist.html
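The six-year figure can be reproduced with a few lines of arithmetic. The sketch below assumes, as the slide's estimate suggests, roughly 2 seconds per pairwise comparison and k·k comparisons; the function name is made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def all_pairs_seconds(k, seconds_per_comparison):
    """Estimated time when each of the k*k ordered tree pairs is compared separately."""
    return k * k * seconds_per_comparison


# Assumed inputs: 10,000 trees and about 2 seconds per comparison.
years = all_pairs_seconds(10_000, 2) / SECONDS_PER_YEAR
print(f"{years:.1f} years")
```

Comparing only the k(k-1)/2 unordered pairs would roughly halve the estimate, which still leaves it in the range of years.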
Counting shared (directed) quartets
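Idea: Iterate over all pairs of edges in the two
trees, and count for each pair how many
associated quartets are shared. The problem is
to define “associated” such that each quartet is
counted at most once ...

[Figure: directed quartets. In one tree, edge e1 leads from subtree A1 (containing a, d) to subtrees B1 (containing b) and C1 (containing c); in the other tree, edge e2 does the same with A2, B2 and C2]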
Counting using colouring

Idea: Alternatively, colour leaves
according to nodes in one tree
and count compatible colourings
in the other

- All colourings in one tree can be produced in O(n log n) (smaller-half trick)
- Each colouring can be counted in amortized O(log n) (hierarchical decomposition and polynomial manipulations)

Brodal et al.: Computing the Quartet Distance between Evolutionary Trees
in Time O(n log n), Algorithmica 38(2), 2003.
Comparing sets of trees

Observation: Shared sub-trees imply shared quartets.

Observation: We can share results between trees by merging
them into a DAG.
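One common way to build such a DAG is to give every distinct subtree a single node, keyed by the nodes of its children. The sketch below illustrates that idea under stated assumptions; it is not the construction used by the authors. It assumes rooted trees written as nested tuples of leaf names, and `merge_into_dag` and `visit` are hypothetical names.

```python
def merge_into_dag(trees):
    """Merge rooted trees (nested tuples of leaf names) into one DAG.

    Returns (nodes, roots): nodes maps a canonical subtree key to a node id,
    and roots holds the DAG node id of each input tree.
    """
    nodes = {}

    def visit(tree):
        if isinstance(tree, str):
            key = ("leaf", tree)
        else:
            # Sorting the child ids makes the key independent of child order.
            key = ("node", tuple(sorted(visit(child) for child in tree)))
        if key not in nodes:
            nodes[key] = len(nodes)  # first time this subtree is seen
        return nodes[key]

    roots = [visit(tree) for tree in trees]
    return nodes, roots


# Two example trees that share the leaves and the subtree (D, E).
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
nodes, roots = merge_into_dag([t1, t2])
print(len(nodes), roots)
```

Stored separately, the two example trees have 9 nodes each. The merged DAG stores the five leaves and the (D, E) subtree only once.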
Experiment – DAG size
The size of the DAG depends on the similarity of the input trees

[Figure]

Simulation setup: Simulate one binary tree with n leaves, evolve kn
sequences of length m on the tree, and construct k
neighbor joining trees from the resulting kn sequences
Experiment – Running time
Running time of “DAG versus DAG” algorithm on a standard Linux PC

[Figure]

The straightforward “all-against-all” approach takes 3000 seconds for comparing 100 trees of size 500.

The “DAG-DAG” approach takes between 20 and 500 seconds for the same comparisons, depending on the similarity of the trees.
Summary
Results

Utilizing shared tree structure speeds up
the computation significantly ...

Future work

Extension to non-binary trees and trees
with unequal leaf sets

Applications?
Computing the quartet distance
        between evolutionary trees of
              bounded degree

                                Martin Stissing 1
                    Christian Nørgaard Storm Pedersen 1
                              Thomas Mailund 1,2
                             Gerth Stølting Brodal 1
                                Rolf Fagerberg 3


(1) University of Aarhus, (2) Oxford University, (3) University of Southern Denmark
Comparing partially resolved trees
Algorithms for partially resolved trees
Idea: Iterate over all pairs of edges (or nodes) in
the two trees, and count for each pair how many
associated quartets are shared. The problem is to
define “associated” such that each quartet is counted
at most once ...

Different solutions

Type            Time                            Space
Center based    O(n^3)                          O(n)
Edge based      O(|V||V'|·d·d')                 O(n^2)
Edge based      O(n+|V||V'|·id·id')             O(n+|V||V'|)
Node based      O(n+|V||V'|·min{id,id'})        O(n+|V||V'|)

- n is the number of species/leaves
- |V|, |V'| are the numbers of internal nodes in T and T'
- id, id' are the internal degrees of T and T'
- In a “worst case” tree, |V| = id = O(n)
QuartetDist
QuartetDist: Computes quartet distance (or normalized
similarity, i.e. quartet fit similarity) between general trees

[Screenshot: a simple GUI]

Implementation in Java by Martin Randers and Chris Christiansen

http://www.birc.au.dk/~chrisc/qdist/
Counting using colouring

Idea: Alternatively, colour leaves
according to nodes in one tree
and count compatible colourings
in the other. The count is calculated efficiently
using a hierarchical decomposition tree.

This approach does not explicitly consider all
pairs of nodes/edges, which makes it possible to
achieve a sub-quadratic running time

Ideas can be generalized to trees of maximum
degree d. Use d colours instead of 3, use a more
complex hierarchical decomposition tree ...

Brodal et al.: Computing the Quartet Distance between Evolutionary Trees
in Time O(n log n), Algorithmica 38(2), 2003.
Shared “butterfly” quartets
Shared “star” quartets
Colouring
[Figure sequence illustrating the colouring and the smaller-half trick]

In total: All colourings take time O(n log n) ...
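The O(n log n) total can be illustrated by counting the work when, at each internal node, only the leaves of the smaller child subtree are touched. The sketch below measures that count on two assumed tree shapes. It does not perform the recolouring from Brodal et al., and all function names are hypothetical.

```python
import math


def smaller_half_work(tree):
    """Return (leaf_count, work) for a rooted binary tree given as nested pairs.

    work sums, over all internal nodes, the leaf count of the smaller child,
    i.e. the leaves touched if only the smaller side is recoloured.
    """
    if isinstance(tree, str):
        return 1, 0
    left, right = tree
    n_left, w_left = smaller_half_work(left)
    n_right, w_right = smaller_half_work(right)
    return n_left + n_right, w_left + w_right + min(n_left, n_right)


def balanced(names):
    """Build a balanced binary tree over the given leaf names."""
    if len(names) == 1:
        return names[0]
    mid = len(names) // 2
    return (balanced(names[:mid]), balanced(names[mid:]))


def caterpillar(names):
    """Build a maximally unbalanced tree: one new leaf is attached per level."""
    tree = names[0]
    for name in names[1:]:
        tree = (tree, name)
    return tree


names = [f"s{i}" for i in range(16)]
for shape in (balanced, caterpillar):
    n, work = smaller_half_work(shape(names))
    print(shape.__name__, work, "<=", n * math.log2(n))
```

A leaf is on the smaller side at most log2 n times, since the subtree containing it at least doubles each time. This is why the sum stays below n log2 n for any tree shape.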
Hierarchical decomposition tree
Let T be an unrooted tree of size n with degree at most d. A component C in T is:

- A single node in T, or
- A connected subset of nodes in T with degree at most d.

A hierarchical decomposition tree H(T) of T is a rooted binary tree:

- A leaf is a simple component (a single node in T).
- An internal node is a composite component of its children.

Lemma: A hierarchical decomposition tree H(T) of T of
height O(d log n) can be constructed in time O(dn)
Hierarchical decomposition tree
Annotation of node C in H(T)

- A d-tuple denoting the number of leaves in C coloured in each of the d colours.
- A function F of d × “degree of C” variables, representing counts of each colour in the remaining leaves of T.

The function F yields the number of
quartets associated to nodes in C and
compatible with the colouring ...

The function at the root is a constant, so the
total number of quartets compatible with the
current colouring can be read off in time O(1) ...
Hierarchical decomposition tree

Observation: F is a polynomial of at most
d^2 variables of degree at most 4, i.e. it can
be manipulated in time O(d^8) ...

Lemma: Initial annotation takes time
O(d^9 n). Updating the annotation when
changing a colour takes amortized time
O(d^9 log n) ...
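One way to read the O(d^8) bound is to count the monomials such a polynomial can have. The sketch below assumes F is stored as a dense table of coefficients, which the slides do not state, and `monomial_count` is a hypothetical helper.

```python
from math import comb


def monomial_count(num_vars, max_degree=4):
    """Number of monomials of total degree at most max_degree in num_vars variables."""
    return comb(num_vars + max_degree, max_degree)


# With at most d^2 variables the count grows like d^8 / 24.
for d in (3, 4, 8):
    print(d, monomial_count(d * d), d ** 8)
```

Under that assumption, an operation that touches each coefficient once, such as adding two polynomials, costs O(d^8).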
Hierarchical decomposition tree

Summing it all up

Colouring leaves and updating H(T) takes amortized time
O(log n × d^9 log n) per node v1 in T1. Counting takes time
O(1) per node v1 in T1. In total: O(d^9 n log^2 n).

... with “contractions” in H(T) the total time can be improved
to O(d^9 n log n) ...

Automating Google Workspace (GWS) & more with Apps Script
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Presentation at APBC 2007

  • 1. Computing the all-pairs quartet distance on a set of evolutionary trees Martin Stissing 1 Thomas Mailund 1,2 Christian Nørgaard Storm Pedersen 1 Gerth Stølting Brodal 1 Rolf Fagerberg 3 (1) University of Aarhus, (2) Oxford University, (3) University of Southern Denmark
  • 2. Quartets and quartet distance. Quartet: four named species in an unrooted tree. Quartet topology: the topology of the quartet induced by the tree. (Figure: the possible topologies on leaves a, b, c, d: three butterfly quartets and one star quartet.) Quartet distance: the number of quartets whose topology differs between the two trees.
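  • 9. Quartets and quartet distance. Example with two trees on the leaves A, B, C, D, E (figure: tree 1 on the left, tree 2 on the right). AB|CD denotes the butterfly quartet in which A, B are separated from C, D.
      Quartet   Tree 1   Tree 2   Same topology
      ABCD      AB|CD    AC|BD    no
      ABCE      AB|CE    AC|BE    no
      ABDE      AB|DE    AB|DE    yes
      ACDE      AC|DE    AC|DE    yes
      BCDE      BC|DE    BC|DE    yes
    Quartet distance = binom(5,4) - 3 = 5 - 3 = 2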
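    The example can be cross-checked by brute force over all binom(n,4) quartets. The sketch below is not one of the algorithms from the talk. It assumes the two example trees are ((A,B),C,(D,E)) and ((A,C),B,(D,E)), as read off the figure; the internal node names x, y, z and all function names are made up for illustration. Each quartet is classified with the four-point condition on path lengths (unit edge lengths), which gives O(n^4) time overall.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Turn an undirected edge list into an adjacency map."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_lengths(adj, source):
    """Breadth-first search: number of edges from `source` to every node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def quartet_topology(dist, a, b, c, d):
    """Return the butterfly split of {a, b, c, d}, or "star" if unresolved."""
    pairings = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    sums = [dist[p[0]][p[1]] + dist[q[0]][q[1]] for p, q in pairings]
    smallest = min(sums)
    if sums.count(smallest) > 1:
        return "star"  # all three sums tie in an unresolved quartet
    p, q = pairings[sums.index(smallest)]
    return frozenset([frozenset(p), frozenset(q)])


def quartet_distance(edges1, edges2, leaves):
    """Count the quartets whose topology differs between the two trees."""
    topologies = []
    for edges in (edges1, edges2):
        adj = build_adjacency(edges)
        dist = {leaf: path_lengths(adj, leaf) for leaf in leaves}
        topologies.append({q: quartet_topology(dist, *q)
                           for q in combinations(sorted(leaves), 4)})
    first, second = topologies
    return sum(1 for q in first if first[q] != second[q])


# Assumed encodings of the two example trees (x, y, z are internal nodes).
TREE1 = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"),
         ("y", "z"), ("D", "z"), ("E", "z")]
TREE2 = [("A", "x"), ("C", "x"), ("x", "y"), ("B", "y"),
         ("y", "z"), ("D", "z"), ("E", "z")]

print(quartet_distance(TREE1, TREE2, "ABCDE"))  # prints 2
```

    Under these assumptions the result agrees with the value 2 on the slide. The same function also handles non-binary trees, since unresolved quartets are reported as stars.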
  • 12. Previous work
      - G. Estabrook, F. McMorris, and C. Meacham. Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Systematic Zoology, 27:27-33, 1978.
      - W. Day and C. R. Doucette. Expected behaviour of quartet distances between undirected tree. Proc. 18th International Numerical Taxonomy Conference, 1984.
      - C. R. Doucette. An efficient algorithm to compute quartet dissimilarity measures. Bachelor of Science (Honours) Dissertation, Memorial University of Newfoundland, 1985. Time: O(n^3)
      - M. Steel and D. Penny. Distribution of tree comparison metrics—some new results. Systematic Biology, 42(2):126–141, 1993.
      - D. Bryant, J. Tsang, P. E. Kearney, and M. Li. Computing the quartet distance between evolutionary trees. Proc. 11th Symp. on Discrete Algorithms (SODA), 285–286, 2000. Time: O(n^2)
      - G. S. Brodal, R. Fagerberg, and C. N. S. Pedersen. Computing the quartet distance in time O(n log n). Algorithmica, 38(2):377-395, 2003. Time: O(n log^2 n), O(n log n)
    The algorithms above are for fully resolved trees (binary trees) only. Quartet distance between binary trees seems easier to compute ...
      - C. Christiansen, T. Mailund, C. N. S. Pedersen, and M. Randers. Algorithms for computing the quartet distance between trees of arbitrary degree. Proc. of Workshop on Algorithms in Bioinformatics (WABI), 77-88, 2005. Time: O(n^3), O(d^2 n^2)
      - C. Christiansen, T. Mailund, C. N. S. Pedersen, M. Randers, and M. Stissing. Fast calculation of the quartet distance between trees of arbitrary degrees. Algorithms for Molecular Biology, 1(16), 2006. Time: O(d n^2)
      - M. Stissing, C. N. S. Pedersen, T. Mailund, G. S. Brodal, and R. Fagerberg. Computing the quartet distance between evolutionary trees of bounded degree. Proc. of APBC, 2007. Time: O(d^9 n log n)
  • 16. QDist: Implements the O(n^2) and O(n log^2 n) time algorithms for computing the quartet distance between binary trees. http://www.birc.au.dk/~mailund/qdist.html
    ... we want to compare 10,000 trees of size 200 using QDist, is that possible? Takes about 2·10,000^2 sec ≈ 6 years ...
    Immediate solution: Perform O(k^2) comparisons between two trees, using time O(n log^2 n) or O(n^2) each ...
    Faster solution: Exploit similarities between the input trees by merging them into a single DAG structure which allows “common subtrees” to be shared. All comparisons are performed by comparing the DAG against the input trees, or against itself. The speed-up depends on the compactness of the DAG ...
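    The running-time estimate can be checked with a line of arithmetic. The cost of roughly 2 seconds per pairwise comparison is read off the slide's own expression and is otherwise an assumption:

```latex
2 \cdot 10{,}000^{2}\ \text{s} = 2 \times 10^{8}\ \text{s}
\approx \frac{2 \times 10^{8}\ \text{s}}{3.15 \times 10^{7}\ \text{s/year}}
\approx 6.3\ \text{years}
```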
  • 17. Counting shared (directed) quartets. Idea: Iterate over all pairs of edges in the two trees, and count for each pair how many associated quartets are shared. The problem is to define “associated” such that each quartet is counted at most once ... (Figure: directed quartets at edges e1 and e2, with subtrees A1, B1, C1 and A2, B2, C2 and leaves a, b, c, d.)
  • 18. Counting using colouring. Idea: Alternatively, colour leaves according to nodes in one tree and count compatible colourings in the other. All colourings in one tree can be produced in O(n log n) time (smaller-half trick). Each colouring can be counted in amortized O(log n) time (hierarchical decomposition and polynomial manipulations). Brodal et al.: Computing the Quartet Distance between Evolutionary Trees in Time O(n log n), Algorithmica 38(2), 2003.
  • 19. Comparing sets of trees. Observation: Shared subtrees imply shared quartets.
  • 20. Comparing sets of trees. Observation: We can share results between trees by merging them into a DAG.
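    The sharing idea can be illustrated with hash-consing: identical subtrees get a single node in a table, so a result computed for that node could be reused by every tree that contains it. The sketch below is only an illustration under strong assumptions, not the DAG construction from the talk: trees are assumed to be rooted and given as nested tuples of leaf names, and `intern_subtree`, `merge_into_dag` and `count_nodes` are hypothetical names.

```python
def intern_subtree(tree, table):
    """Return the DAG node id of `tree`, adding new nodes to `table` as needed."""
    if isinstance(tree, str):
        key = ("leaf", tree)
    else:
        # Sort the child ids so that the order of children does not matter.
        child_ids = sorted(intern_subtree(child, table) for child in tree)
        key = ("node", tuple(child_ids))
    if key not in table:
        table[key] = len(table)
    return table[key]


def merge_into_dag(trees):
    """Merge all trees into one table of shared subtrees; return (roots, table)."""
    table = {}
    roots = [intern_subtree(tree, table) for tree in trees]
    return roots, table


def count_nodes(tree):
    """Number of nodes (leaves and internal) in a nested-tuple tree."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(count_nodes(child) for child in tree)


t1 = (("A", "B"), ("C", ("D", "E")))
t2 = (("A", "C"), ("B", ("D", "E")))
roots, table = merge_into_dag([t1, t2])

# 18 tree nodes in total, but only 12 distinct nodes in the merged DAG:
# the five leaves and the subtree (D, E) are shared.
print(count_nodes(t1) + count_nodes(t2), len(table))  # prints 18 12
```

    The more similar the input trees are, the more subtrees coincide and the smaller the table becomes, which matches the remark that the speed-up depends on the compactness of the DAG.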
  • 21. Experiment – DAG size. The size of the DAG depends on the similarity of the input trees. Simulation setup: simulate one binary tree with n leaves, evolve kn sequences of length m on the tree, and construct k neighbor-joining trees from the resulting kn sequences.
  • 22. Experiment – Running time. Running time of the “DAG versus DAG” algorithm on a standard Linux PC. The straightforward “all-against-all” approach takes 3000 seconds for comparing 100 trees of size 500. The “DAG-DAG” approach takes between 20 and 500 seconds for the same comparisons, depending on the similarity of the trees.
  • 23. Summary. Results: Utilizing shared tree structure speeds up the computation significantly ... Future work: Extension to non-binary trees and trees with unequal leaf sets. Applications?
  • 24. Computing the quartet distance between evolutionary trees of bounded degree Martin Stissing 1 Christian Nørgaard Storm Pedersen 1 Thomas Mailund 1,2 Gerth Stølting Brodal 1 Rolf Fagerberg 3 (1) University of Aarhus, (2) Oxford University, (3) University of Southern Denmark
  • 28. Algorithms for partially resolved trees. Idea: Iterate over all pairs of edges (or nodes) in the two trees, and count for each pair how many associated quartets are shared. The problem is to define “associated” such that each quartet is counted at most once ...
    Different solutions:
      Type           Time                             Space
      Center based   O(n^3)                           O(n)
      Edge based     O(|V||V'|·d·d')                  O(n^2)
      Edge based     O(n + |V||V'|·id·id')            O(n + |V||V'|)
      Node based     O(n + |V||V'|·min{id, id'})      O(n + |V||V'|)
    - n is the number of species/leaves
    - |V|, |V'| are the numbers of internal nodes in T and T'
    - id, id' are the internal degrees of T and T'
    - “worst case” tree: |V| = id = O(n)
  • 29. QuartetDist: Computes quartet distance (or normalized similarity, i.e. quartet fit similarity) between general trees. A simple GUI. Implementation in Java by Martin Randers and Chris Christiansen. http://www.birc.au.dk/~chrisc/qdist/
  • 34. Counting using colouring. Idea: Alternatively, colour leaves according to nodes in one tree and count compatible colourings in the other. The count is calculated efficiently using a hierarchical decomposition tree. This approach does not explicitly consider all pairs of nodes/edges, which makes it possible to achieve a sub-quadratic running time. The ideas can be generalized to trees of maximum degree d: use d colours instead of 3, and use a more complex hierarchical decomposition tree ... Brodal et al.: Computing the Quartet Distance between Evolutionary Trees in Time O(n log n), Algorithmica 38(2), 2003.
  • 47. Colouring. Smaller-half trick. In total: all colourings take time O(n log n) ...
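    The O(n log n) total can be made plausible by counting work. If at every internal node only the leaves of the smaller subtree are recoloured, a leaf is recoloured at most log2 n times, because the subtree containing it at least doubles in size each time. The sketch below only tallies that cost on rooted binary trees given as nested pairs; it is not the colouring procedure of Brodal et al., and all names in it are illustrative assumptions.

```python
import math


def smaller_half_cost(tree):
    """Return (leaf_count, cost), where cost sums min(|left|, |right|) over internal nodes."""
    if not isinstance(tree, tuple):
        return 1, 0  # a leaf: one leaf, no recolouring work
    left, right = tree
    n_left, cost_left = smaller_half_cost(left)
    n_right, cost_right = smaller_half_cost(right)
    # Only the leaves of the smaller subtree are touched at this node.
    return n_left + n_right, cost_left + cost_right + min(n_left, n_right)


def balanced(leaves):
    """Build a balanced binary tree over the list of leaves."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return (balanced(leaves[:mid]), balanced(leaves[mid:]))


def caterpillar(leaves):
    """Build a maximally unbalanced binary tree over the list of leaves."""
    tree = leaves[0]
    for leaf in leaves[1:]:
        tree = (tree, leaf)
    return tree


leaves = list(range(16))
print(smaller_half_cost(balanced(leaves)))     # prints (16, 32)
print(smaller_half_cost(caterpillar(leaves)))  # prints (16, 15)
print(16 * math.log2(16))                      # prints 64.0
```

    Both shapes stay below the n log2 n bound; in this tally the balanced tree is the more expensive case, while the caterpillar costs only n - 1.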
  • 49. Hierarchical decomposition tree. Let T be an unrooted tree of size n with degree at most d. A component C in T is either a single node in T, or a connected subset of nodes in T with degree at most d. A hierarchical decomposition tree H(T) of T is a rooted binary tree in which a leaf is a simple component (a single node in T) and an internal node is a composite component of its children. Lemma: A hierarchical decomposition tree H(T) of T of height O(d log n) can be constructed in time O(dn).
  • 51. Hierarchical decomposition tree. Annotation of node C in H(T): (1) a d-tuple denoting the number of leaves in C coloured in each of the d colours; (2) a function F of d × “degree of C” variables, representing counts of each colour in the remaining leaves of T. The function F yields the number of quartets associated to nodes in C and compatible with the colouring ... The function at the root is a constant, which counts the total number of quartets compatible with the current colouring in time O(1) ...
  • 54. Hierarchical decomposition tree. Observation: F is a polynomial of at most d^2 variables of degree at most 4, i.e. it can be manipulated in time O(d^8) ... Lemma: Initial annotation takes time O(d^9 n). Updating the annotation when changing a colour takes amortized time O(d^9 log n) ...
  • 55. Hierarchical decomposition tree. Summing it all up: Colouring leaves and updating H(T) takes amortized time O(log n × d^9 log n) per node v1 in T1. Counting takes time O(1) per node v1 in T1. In total: O(d^9 n log^2 n). ... with “contractions” in H(T) the total time can be improved to O(d^9 n log n) ...
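    One way to see where the exponents come from, offered as a plausibility check rather than the proof from the paper: a polynomial of degree at most 4 in d^2 variables has at most binom(d^2 + 4, 4) monomials, and the per-node costs on the slide multiply out to the stated total.

```latex
\binom{d^{2}+4}{4}
  = \frac{(d^{2}+4)(d^{2}+3)(d^{2}+2)(d^{2}+1)}{24}
  = O(d^{8}),
\qquad
\underbrace{n}_{\text{nodes } v_1 \in T_1}
\cdot \underbrace{O(\log n)}_{\text{amortized colour changes per node}}
\cdot \underbrace{O(d^{9} \log n)}_{\text{update per colour change}}
  = O(d^{9}\, n \log^{2} n)
```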