SlideShare uma empresa Scribd logo
1 de 66
Baixar para ler offline
Fixed-Parameter and Approximation Algorithms for
     the Subtree Distance Between Phylogenies


                       Charles Semple
               Biomathematics Research Centre
           Department of Mathematics and Statistics
                   University of Canterbury
                         New Zealand

Joint work with Magnus Bordewich, Catherine McCartin

e-Science Institute, Edinburgh 2007
Subtree Prune and Regraft (SPR)


Example.        r                      r



                           1 SPR
                                       bc          d
                                   a
           a   bc      d
                                       T1
               S

                                           r
                       1 SPR




                                   a           d   b
                                       c
                                           T2
Applications of SPR


Used
   I.    As a search tool for selecting the best tree in
         reconstruction algorithms.
    II. To quantify the dissimilarity between two phylogenetic
         trees.
    III. To provide a lower bound on the number of reticulation
         events in the case of non-tree-like evolution.

For II and III, one wants the minimum number of SPR operations to
      transform one phylogeny into another.

This number is the SPR distance between two phylogenies S and T.
The Mathematical Problem


MINIMUM SPR
Instance: Two rooted binary phylogenetic trees S and T.
Goal: Find a minimum length sequence of single SPR operations that
   transforms S into T.
Measure: The length of the sequence.

Notation: Use dSPR(S, T) to denote this minimum length.

Theorem (Bordewich, S 2004)
MINIMUM SPR is NP-hard.

Overriding goal is to find (with no restrictions) the exact solution or a
  heuristic solution with a guarantee of closeness.
Algorithms for NP-Hard Problems


Fixed-parameter algorithms are a practical way to find optimal
   solutions if the parameter measuring the hardness of the problem
   is small.

Polynomial-time approximation algorithms can efficiently find feasible
   solutions that are sometimes arbitrarily close to the optimal
   solution.
Agreement Forests


A forest of T is a disjoint       Example.       r
   collection of phylogenetic
   subtrees whose union of leaf
   sets is X U r.



                                        a        cd   e   f
                                             b
                                                 S

                                                 r



                                        a        cd   e   f
                                             b
                                                 F1
Agreement Forests


A forest of T is a disjoint       Example.       r
   collection of phylogenetic
   subtrees whose union of leaf
   sets is X U r.



                                        a        cd   e   f
                                             b
                                                 S

                                                 r



                                        a        cd   e   f
                                             b
                                                 F1
Agreement Forests


An agreement forest for S and T is a forest of both S and T.

                     r                              r
Example.




       a        cd       e   f          d          fa    b     c
            b                                 e
                S                                  T

                 r                             r



       a        cd       e   f           a         cd    e     f
            b                                 b
                F1                                 F2
Agreement Forests


An agreement forest for S and T is a forest of both S and T.

                     r                              r
Example.




       a        cd       e   f          d          fa    b     c
            b                                 e
                S                                  T

                 r                             r



       a        cd       e   f           a         cd    e     f
            b                                 b
                F1                                 F2
Agreement Forests


An agreement forest for S and T is a forest of both S and T.

                     r                              r
Example.




                         e
       a        cd           f          d          fa    b     c
            b                                 e
                S                                  T

                 r                             r



       a        cd       e   f           a         cd    e     f
            b                                 b
                F1                                 F2
Theorem. (Bordewich, S, 2004)
Let S and T be two binary phylogenetic trees. Then
          dSPR(S,T) = size of maximum-agreement forest - 1.

    o It’s fast to construct from a maximum-agreement forest for S
      and T a sequence of SPR operations that transforms S into T.
Reducing the Size of the Instance

Subtree Reduction                   Chain Reduction
Fixed-Parameter Algorithms


The underlying idea is to find an algorithm whose running time
  separates the size of the problem instance from the parameter of
  interest.

One way to obtain such an algorithm is to reduce the size of the
  problem instance, while preserving the optimal value (kernalizing
  the problem).

Are the subtree and chain reductions enough to kernalize the
   problem?
Fixed-Parameter Algorithms


Lemma. If n’ denotes the size of the leaf sets of the fully reduced
    trees using the subtree and chain reductions, then
                           n’ < 28dSPR(S,T).

Corollary. (Bordewich, S 2004) MINIMUM SPR is fixed-parameter
     tractable.

1.   Repeatedly apply the subtree and chain rules.
2.   Exhaustively find a maximum-agreement forest by deleting edges
     in S and comparing with T.

Running time is O((56k)k + p(n)) compared with O((2n)k), where
    k=dSPR(S,T) and p(n) is the polynomial bound for reducing the
    trees using the subtree and chain reductions.
Fixed-Parameter Algorithms


A second way to obtain a fixed-parameter algorithm is using the
       method of bounded search trees.



Preliminaries
o       A rooted triple is a binary tree with three leaves.
                                                              a     bc
                                                                  ab|c

o      A tree S displays ab|c if the last common ancestor of ab is a
       descendant of the last common ancestor of ac.
r

Fixed-Parameter Algorithms

                                            r
Observation. If F is an agreement                    d           fa   b   c
                                                             e
  forest for S and T, then F can                                 T
  be obtained from S by
  deleting |F|-1 edges.

                                    a       cd   e       f
                                        b
Recall. If F is maximum, then
                                            S
          |F|-1=dSPR(S,T).
r

Fixed-Parameter Algorithms

                                            r
Observation. If F is an agreement                    d           fa   b   c
                                                             e
  forest for S and T, then F can                                 T
  be obtained from S by
  deleting |F|-1 edges.

                                    a       cd   e       f
                                        b
Recall. If F is maximum, then
                                            S
          |F|-1=dSPR(S,T).
r

Fixed-Parameter Algorithms

                                                r
1.   Find a minimal triple xy|z in S                     d           fa   b   c
                                                                 e
     not in T.
                                                                     T
2.   Branch into 4 computational
     paths ex, ey, ez, er, and repeat
     for each path until no such
     triple.                            a       cd   e       f
                                            b
3.   For each path, find a pair of              S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.
r

Fixed-Parameter Algorithms

                                                r
1.   Find a minimal triple xy|z in S                     d           fa   b   c
                                                                 e
     not in T.
                                                                     T
2.   Branch into 4 computational
     paths ex, ey, ez, er, and repeat
     for each path until no such
     triple.                            a       cd   e       f
                                            b
3.   For each path, find a pair of              S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.
r

Fixed-Parameter Algorithms

                                                  r
1.   Find a minimal triple xy|z in S                       d           fa   b   c
                                                                   e
     not in T.
                                                                       T
2.   Branch into 4 computational
     paths ex, ey, ez, er, and repeat
                                        ea   eb
     for each path until no such
     triple.                            a         cd   e       f
                                             b
3.   For each path, find a pair of                S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.
r

Fixed-Parameter Algorithms

                                                  r
1.   Find a minimal triple xy|z in S                       d           fa   b   c
                                                                   e
     not in T.
                                                                       T
                                             er
2.   Branch into 4 computational
     paths ex, ey, ez, er, and repeat
                                        ea   eb
     for each path until no such
     triple.                            a         cd   e       f
                                              b
3.   For each path, find a pair of                S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.
r

Fixed-Parameter Algorithms

                                                  r
1.   Find a minimal triple xy|z in S                       d           fa   b   c
                                                                   e
     not in T.
                                                                       T
                                             er
2.   Branch into 4 computational
                                                  ed
     paths ex, ey, ez, er, and repeat
                                        ea   eb
     for each path until no such
     triple.                            a         cd   e       f
                                              b
3.   For each path, find a pair of                S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.
r

Fixed-Parameter Algorithms

                                                   r
1.   Find a minimal triple xy|z in S                         d           fa   b   c
                                                                     e
     not in T.
                                                                         T
                                             er
2.   Branch into 4 computational
                                                   ed
     paths ex, ey, ez, er, and repeat
                                        ea   eb
     for each path until no such
     triple.                            a         cd     e       f
                                              b
3.   For each path, find a pair of                S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                  ed

                                                    ea
                                                   eb

                                                  er
r

Fixed-Parameter Algorithms

                                                   r
1.   Find a minimal triple xy|z in S                         d           fa   b   c
                                                                     e
     not in T.
                                                                         T
                                             er
2.   Branch into 4 computational
                                                   ed
     paths ex, ey, ez, er, and repeat
                                        ea   eb
     for each path until no such
     triple.                            a         cd     e       f
                                              b
3.   For each path, find a pair of                S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                  ed

                                                    ea
                                                   eb

                                                  er
r

Fixed-Parameter Algorithms

                                                 r
1.   Find a minimal triple xy|z in S                       d           fa   b   c
                                                                   e
     not in T.
                                                                       T
2.   Branch into 4 computational
     paths ex, ey, ez, er, and repeat
     for each path until no such
     triple.                            a       cd     e       f
                                            b
3.   For each path, find a pair of              S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                ed

                                                  ea
                                                 eb

                                                er
r

Fixed-Parameter Algorithms

                                                        r
1.   Find a minimal triple xy|z in S                                   d           fa   b   c
                                                                               e
                                                  er
     not in T.
                                                                                   T
2.   Branch into 4 computational                              ee
     paths ex, ey, ez, er, and repeat
                                        ea   eb
     for each path until no such
     triple.                            a          cd              e       f
                                             b
3.   For each path, find a pair of                 S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                       ed

                                                         ea
                                                        eb

                                                       er
r

Fixed-Parameter Algorithms

                                                        r
1.   Find a minimal triple xy|z in S                                   d           fa   b   c
                                                                               e
                                                  er
     not in T.
                                                                                   T
2.   Branch into 4 computational                              ee
     paths ex, ey, ez, er, and repeat
                                        ea   eb
     for each path until no such
     triple.                            a          cd              e       f
                                             b
3.   For each path, find a pair of                 S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                       ed

                                                         ea
                                                        eb

                                                       er
r

Fixed-Parameter Algorithms

                                                   r
1.   Find a minimal triple xy|z in S                         d           fa   b   c
                                                                     e
     not in T.
                                                                         T
                                             er
2.   Branch into 4 computational
                                                   ed
     paths ex, ey, ez, er, and repeat
                                        ea   eb
     for each path until no such
     triple.                            a         cd     e       f
                                              b
3.   For each path, find a pair of                S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                  ed

                                                    ea
                                                   eb

                                                  er
r

Fixed-Parameter Algorithms

                                                   r
1.   Find a minimal triple xy|z in S                         d           fa   b   c
                                                                     e
     not in T.
                                                                         T
                                             er
2.   Branch into 4 computational
                                                   ed
     paths ex, ey, ez, er, and repeat
                                        ea   eb
     for each path until no such
     triple.                            a         cd     e       f
                                              b
3.   For each path, find a pair of                S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                  ed

                                                    ea
                                                   eb

                                                  er
r

Fixed-Parameter Algorithms

                                                 r
1.   Find a minimal triple xy|z in S                       d           fa   b   c
                                                                   e
     not in T.
                                                                       T
2.   Branch into 4 computational
     paths ex, ey, ez, er, and repeat
     for each path until no such
     triple.                            a       cd     e       f
                                            b
3.   For each path, find a pair of              S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                ed

                                                  ea
                                                 eb

                                                er
r

Fixed-Parameter Algorithms

                                                 r
1.   Find a minimal triple xy|z in S                       d           fa   b   c
                                                                   e
     not in T.
                                                                       T
2.   Branch into 4 computational
     paths ex, ey, ez, er, and repeat
     for each path until no such
     triple.                            a       cd     e       f
                                            b
3.   For each path, find a pair of              S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                ed

                                                  ea
                                                 eb

                                                er
r

Fixed-Parameter Algorithms                                         vst

                                                 r
1.   Find a minimal triple xy|z in S                       d             fa   b   c
                                                                   e
     not in T.
                                                                         T
2.   Branch into 4 computational
     paths ex, ey, ez, er, and repeat
     for each path until no such
     triple.                            a       cd     e       f
                                            b
3.   For each path, find a pair of              S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                ed

                                                  ea
                                                 eb

                                                er
r

Fixed-Parameter Algorithms                                          vst

                                                  r
                                                 es
1.   Find a minimal triple xy|z in S                        d             fa   b   c
                                                                    e
     not in T.
                                                                          T
2.   Branch into 4 computational
     paths ex, ey, ez, er, and repeat
                                        et
     for each path until no such
     triple.                            a        cd     e       f
                                             b
3.   For each path, find a pair of               S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                 ed

                                                   ea
                                                  eb

                                                 er
r

Fixed-Parameter Algorithms                                          vst

                                                  r
                                                 es
1.   Find a minimal triple xy|z in S                        d             fa   b   c
                                                                    e
     not in T.
                                                                          T
2.   Branch into 4 computational
     paths ex, ey, ez, er, and repeat
                                        et
     for each path until no such
     triple.                            a        cd     e       f
                                             b
3.   For each path, find a pair of               S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                 ed

                                                   ea
                                                  eb

                                                        es
                                                 er

                                                         et
r

Fixed-Parameter Algorithms

                                                 r
1.   Find a minimal triple xy|z in S                       d           fa   b   c
                                                                   e
     not in T.
                                                                       T
2.   Branch into 4 computational
     paths ex, ey, ez, er, and repeat
     for each path until no such
     triple.                            a       cd     e       f
                                            b
3.   For each path, find a pair of              S
     overlapping components ts and tt
     and branch into 2 paths es, et.
     Repeat until no such
     components.                                ed

                                                  ea
                                                 eb

                                                       es
                                                er

                                                        et
r

Fixed-Parameter Algorithms

                                                 r
Key Lemma. dSPR(S,T) <= k if and only                      d           fa   b   c
                                                                   e
     if one of the computational
                                                                       T
     paths of length at most k
     results in an agreement forest
     for S and T.
                                        a       cd     e       f
                                            b
Theorem. (Bordewich, McCartin, S                S
     2007)
The bounded search tree algorithm
     has running time O(4kn4).
                                                ed
Combining the methods of
    kernalization and bounded                     ea
    search gives an FPT algorithm                eb
    with running time O(4kk4+p(n)).
                                                       es
                                                er

                                                        et
Approximation Algorithms


Polynomial-time approximation algorithms can efficiently find feasible
   solutions that are sometimes arbitrarily close to the optimal
   solution.

An r-approximation algorithm A for MINIMUM SPR means that the
   size of an agreement forest (minus 1) for S and T returned by A is
   at most r times dSPR(S,T).

Example. If r=3, then the size of any agreement forest (minus 1) for S
   and T returned by A is at most 3 times dSPR(S,T).

Based on a pairwise comparison, Bonet, St. John, Mahindru, Amenta
   2006 establish a 5-approximation algorithm (O(n)) for MINIMUM
   SPR.
r

Approximation Algorithms

                                                    r
1.    Find a minimal triple xy|z in S                         d           fa   b   c
                                                                      e
      not in T.
                                                                          T
2.    Delete each of ex, ey, ez, er, and
      repeat until no such triple.
3.    Find a pair of overlapping
      components ts and tt and delete      a       cd     e       f
                                               b
      es, et. Repeat until no such                 S
      components.

Theorem. (Bordewich, McCartin, S
                                                   ed
      2007)
This gives a 4-approximation alg. for
      MINIMUM SPR (O(n5)).                           ea
                                                    eb
Deleting only ex, ez, er gives a 3-
                                                          es
     approximation algorithm.                      er

                                                           et
No r, unless
                                                         P=NP
Exact
             (B, S 07)                  (B, M,S 07)
algorithms
             1 + 1/2112                 3
r=1
                          MINIMUM SPR
                                            5
       Maximum                                           1. Travelling
                                            (B S,M, A 06)salesman
       independent
       set in planar                                     problem
       graphs (PTAS)
                                                        2. Maximum
                                                        clique
Modelling Hybridization Events with SPR Operations


Reticulation processes cause        … molecular phylogeneticists will
     species to be a composite of       have failed to find the `true
     DNA regions derived from           tree’, not because their
     different ancestors.               methods are inadequate or
                                        because they have chosen
                                        the wrong genes, but
Processes include
                                        because the history of life
    o horizontal gene transfer,
                                        cannot be properly
    o hybridization, and
                                        represented as a tree.
    o recombination.
                                                 Ford Doolittle, 1999
                                               (Dalhousie University)
Modelling Hybridization Events with SPR Operations


A single SPR operation models a single hybridization event (Maddison
   1997).
                    r                       r
Example.



                                           bc       d
                                     a
            a       bc      d
                                           T
                    S
                r




        a       bc      d
                H
Modelling Hybridization Events with SPR Operations


A single SPR operation models a single hybridization event (Maddison
   1997).
                    r                       r
Example.



                                           bc       d
                                     a
            a       bc      d
                                           T
                    S
                r




        a       bc      d
                H
Modelling Hybridization Events with SPR Operations


A single SPR operation models a single hybridization event (Maddison
   1997).
                    r                       r
Example.



                                           bc       d
                                     a
            a       bc      d
                                           T
                    S
                r




        a       bc      d
                H
Modelling Hybridization Events with SPR Operations


A single SPR operation models a single hybridization event (Maddison
   1997).
                    r                       r
Example.



                                           bc               d
                                     a
            a       bc      d
                                           T
                    S
                r

                                                    r
                            deleting
                            hybrid edges
                                                a       d       b
        a       bc      d                                           c
                H                                           F
A Fundamental Problem for Biologists


Given an initial set of data that correctly repesents the tree-like
   evolution of different parts of various species genomes,

what is the smallest number of reticulation events required that
  simultaneously explains the evolution of the species?

This smallest number
    o Provides a lower bound on the number of such events.
    o Indicates the extent that hybridization has had on the
       evolutionary history of the species under consideration.

Since 1930’s botantists have asked the question: How significant has
   the effect of hybridization been on the New Zealand flora?
Trees and Hybridization Networks


H explains T if T can be obtained from a rooted subtree of H by
   suppressing degree-2 vertices.

Example.



                                                      b   d
            a     bc          d         a     c
                                                  T
                  S




                          c   d                    c      d
           a     b                      a      b
                     H1                         H2
Trees and Hybridization Networks


H explains T if T can be obtained from a rooted subtree of H by
   suppressing degree-2 vertices.

Example.



                                                      b   d
            a     bc          d         a     c
                                                  T
                  S




                          c   d                    c      d
           a     b                      a      b
                     H1                         H2
Trees and Hybridization Networks


H explains T if T can be obtained from a rooted subtree of H by
   suppressing degree-2 vertices.

Example.



                                                      b   d
            a     bc          d         a     c
                                                  T
                  S




                          c   d                    c      d
           a     b                      a      b
                     H1                         H2
The Mathematical Problem


MINIMUM HYBRIDIZATION
Instance: Two rooted binary phylogenetic trees S and T.
Goal: Find a hybridization network H that explains S and T, and
   minimizes the number of hybridization vertices.
Measure: The number of hybridization vertices in H.



Notation: Use h(S, T) to denote this minimum number.
Example: Arbitrary SPR operations not sufficient.




          r



 a        cd    e   f
      b
          F1
o A sequence of SPR operations that avoids creating directed cycles
  to make a hybridization network that explains S and T.

o If one minimizes the length of an (acyclic) sequence, does the
  resulting network minimize the number of hybridization events to
  explain S and T?

o YES, and such a sequence corresponds to an acyclic-agreement
  forest.
Theorem. (Baroni, Grünewald, Moulton, S, 2005)
Let S and T be two binary phylogenetic trees. Then
        h(S,T) = size of maximum-acyclic agreement forest - 1.

    o It’s fast to construct from a maximum-acyclic agreement for S
      and T a hybridization network that realizes h(S,T).

Theorem. (Bordewich, S, 2007)
MINIMUM HYBRIDIZATION is NP-hard.
Reducing the Size of the Instance

Subtree Reduction                   Chain Reduction
Fixed-Parameter Algorithms


Are the subtree and chain reductions enough to kernalize the
   problem?

Lemma. If n’ denotes the size of the leaf sets of the fully reduced
   trees using the subtree and chain reductions, then
                             n’<14h(S,T).

Corollary. (Bordewich, S 2007) MINIMUM HYBRIDIZATION is
   fixed-parameter tractable.

Running time is O((28k)k + p(n)) compared with O((2n)k), where
   k=h(S,T) and p(n) is the polynomial bound for reducing the trees
   using the subtree and chain reductions.
Reducing the Size of the Instance

Cluster Reduction (Baroni 2004)




                                  +
A Grass (Poaceae) Dataset (Grass Phylogeny Working Group,
Düsseldorf)




o Ellstrand, Whitkus, Rieseburg 1996 (Distribution of spontaneous
  plant hybrids)
o For each sequence, used fastDNAml to reconstruct a phylogenetic
  tree (H. Schmidt).
Chloroplast (phyB) sequences   Nuclear (ITS) sequences
Chloroplast (phyB) sequences   Nuclear (ITS) sequences
Chloroplast (phyB) sequences   Nuclear (ITS) sequences
Chloroplast (phyB) sequences   Nuclear (ITS) sequences
Chloroplast (phyB) sequences   Nuclear (ITS) sequences
Chloroplast (phyB) sequences   Nuclear (ITS) sequences
h=3




                                                         h=1




                                                         h=4




                                                         h=0

Chloroplast (phyB) sequences   Nuclear (ITS) sequences
pairwise combination   # overlapping     h(S,T)      running time
                             taxa                        2000 MHz
                                                       CPU, 2GB RAM
ndhF        phyB              40             14            11h
ndhF        rbcL              36             13           11.8h
ndhF        rpoC2             34             12           26.3h
ndhF        waxy              19              9           320s
ndhF        ITS               46         at least 15
phyB        rbcL              21              4             1s
phyB        rpoC2             21              7           180s
phyB        waxy              14              3             1s
phyB        ITS               30              8            19s
rbcL        rpoC2             26             13           29.5h
rbcL        waxy              12              7           230s
rbcL        ITS               29          at least 9
rpoC2       waxy              10              1             1s
rpoC2       ITS               31         at least 10
waxy        ITS               15              8           620s
                                   Bordewich, Linz, St John, S, 2007
Computing dSPR(S,T) and h(S,T)


dSPR(S,T)                          h(S,T)

1.   FPT using kernalization and   1.   FPT using kernalization only
     bounded search tree                (O((28k)k +p(n))). Unknown if
     methods (O(4kk4+p(n))).            a bounded search tree
                                        method exists.

                                   2.   Cluster-based reduction.
2.   No cluster-based reduction.
                                   3.   Unknown if there even is an
3.   3-approximation algorithm.
                                        approximation algorithm.
Reducing the Size of the Instance

Subtree Reduction                            Chain Reduction




                                                                  x3
               an                                                      x2
                          an
                                                                            x1
      a2                           a2                        x3        T’
 a1                                     a1              x2
                                                   x1
           S                   T
                                                        S’

Mais conteúdo relacionado

Mais de Roderic Page

The Sam Adams talk
The Sam Adams talkThe Sam Adams talk
The Sam Adams talkRoderic Page
 
Unknown knowns, long tails, and long data
Unknown knowns, long tails, and long dataUnknown knowns, long tails, and long data
Unknown knowns, long tails, and long dataRoderic Page
 
In praise of grumpy old men: Open versus closed data and the challenge of cre...
In praise of grumpy old men: Open versus closed data and the challenge of cre...In praise of grumpy old men: Open versus closed data and the challenge of cre...
In praise of grumpy old men: Open versus closed data and the challenge of cre...Roderic Page
 
BHL, BioStor, and beyond
BHL, BioStor, and beyondBHL, BioStor, and beyond
BHL, BioStor, and beyondRoderic Page
 
Cisco Digital Catapult
Cisco Digital CatapultCisco Digital Catapult
Cisco Digital CatapultRoderic Page
 
Built in the 19th century, rebuilt for the 21st
Built in the 19th century, rebuilt for the 21stBuilt in the 19th century, rebuilt for the 21st
Built in the 19th century, rebuilt for the 21stRoderic Page
 
Two graphs, three responses
Two graphs, three responsesTwo graphs, three responses
Two graphs, three responsesRoderic Page
 
GrBio Workshop talk
GrBio Workshop talkGrBio Workshop talk
GrBio Workshop talkRoderic Page
 
Biodiversity Knowledge Graphs
Biodiversity Knowledge GraphsBiodiversity Knowledge Graphs
Biodiversity Knowledge GraphsRoderic Page
 
Visualing phylogenies: a personal view
Visualing phylogenies: a personal viewVisualing phylogenies: a personal view
Visualing phylogenies: a personal viewRoderic Page
 
Biodiversity informatics: digitising the living world
Biodiversity informatics: digitising the living worldBiodiversity informatics: digitising the living world
Biodiversity informatics: digitising the living worldRoderic Page
 
Ebbe Nielsen Challenge GBIF #gb21
Ebbe Nielsen Challenge GBIF #gb21Ebbe Nielsen Challenge GBIF #gb21
Ebbe Nielsen Challenge GBIF #gb21Roderic Page
 
GBIF Science Committee Report GB21, Delhi, India
GBIF Science Committee Report GB21, Delhi, IndiaGBIF Science Committee Report GB21, Delhi, India
GBIF Science Committee Report GB21, Delhi, IndiaRoderic Page
 
Building the Biodiversity Knowledge Graph
Building the Biodiversity Knowledge GraphBuilding the Biodiversity Knowledge Graph
Building the Biodiversity Knowledge GraphRoderic Page
 
Biodiversity informatics: why aren't we there yet?
Biodiversity informatics: why aren't we there yet?Biodiversity informatics: why aren't we there yet?
Biodiversity informatics: why aren't we there yet?Roderic Page
 
Something about links
Something about linksSomething about links
Something about linksRoderic Page
 
Why I blog instead of writing papers
Why I blog instead of writing papersWhy I blog instead of writing papers
Why I blog instead of writing papersRoderic Page
 
Surfacing the deep data of taxonomy
Surfacing the deep data of taxonomySurfacing the deep data of taxonomy
Surfacing the deep data of taxonomyRoderic Page
 

Mais de Roderic Page (20)

The Sam Adams talk
The Sam Adams talkThe Sam Adams talk
The Sam Adams talk
 
Unknown knowns, long tails, and long data
Unknown knowns, long tails, and long dataUnknown knowns, long tails, and long data
Unknown knowns, long tails, and long data
 
In praise of grumpy old men: Open versus closed data and the challenge of cre...
In praise of grumpy old men: Open versus closed data and the challenge of cre...In praise of grumpy old men: Open versus closed data and the challenge of cre...
In praise of grumpy old men: Open versus closed data and the challenge of cre...
 
BHL, BioStor, and beyond
BHL, BioStor, and beyondBHL, BioStor, and beyond
BHL, BioStor, and beyond
 
Cisco Digital Catapult
Cisco Digital CatapultCisco Digital Catapult
Cisco Digital Catapult
 
Built in the 19th century, rebuilt for the 21st
Built in the 19th century, rebuilt for the 21stBuilt in the 19th century, rebuilt for the 21st
Built in the 19th century, rebuilt for the 21st
 
Two graphs, three responses
Two graphs, three responsesTwo graphs, three responses
Two graphs, three responses
 
GrBio Workshop talk
GrBio Workshop talkGrBio Workshop talk
GrBio Workshop talk
 
Biodiversity Knowledge Graphs
Biodiversity Knowledge GraphsBiodiversity Knowledge Graphs
Biodiversity Knowledge Graphs
 
Visualing phylogenies: a personal view
Visualing phylogenies: a personal viewVisualing phylogenies: a personal view
Visualing phylogenies: a personal view
 
Biodiversity informatics: digitising the living world
Biodiversity informatics: digitising the living worldBiodiversity informatics: digitising the living world
Biodiversity informatics: digitising the living world
 
Ebbe Nielsen Challenge GBIF #gb21
Ebbe Nielsen Challenge GBIF #gb21Ebbe Nielsen Challenge GBIF #gb21
Ebbe Nielsen Challenge GBIF #gb21
 
GBIF Science Committee Report GB21, Delhi, India
GBIF Science Committee Report GB21, Delhi, IndiaGBIF Science Committee Report GB21, Delhi, India
GBIF Science Committee Report GB21, Delhi, India
 
Building the Biodiversity Knowledge Graph
Building the Biodiversity Knowledge GraphBuilding the Biodiversity Knowledge Graph
Building the Biodiversity Knowledge Graph
 
GBIF ideas
GBIF ideasGBIF ideas
GBIF ideas
 
Biodiversity informatics: why aren't we there yet?
Biodiversity informatics: why aren't we there yet?Biodiversity informatics: why aren't we there yet?
Biodiversity informatics: why aren't we there yet?
 
Something about links
Something about linksSomething about links
Something about links
 
Why I blog instead of writing papers
Why I blog instead of writing papersWhy I blog instead of writing papers
Why I blog instead of writing papers
 
Social media
Social mediaSocial media
Social media
 
Surfacing the deep data of taxonomy
Surfacing the deep data of taxonomySurfacing the deep data of taxonomy
Surfacing the deep data of taxonomy
 

Último

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 

Último (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 

Approximating the subtree distance between phylogenies. C Semple

  • 1. Fixed-Parameter and Approximation Algorithms for the Subtree Distance Between Phylogenies Charles Semple Biomathematics Research Centre Department of Mathematics and Statistics University of Canterbury New Zealand Joint work with Magnus Bordewich, Catherine McCartin e-Science Institute, Edinburgh 2007
  • 2. Subtree Prune and Regraft (SPR) Example. r r 1 SPR bc d a a bc d T1 S r 1 SPR a d b c T2
  • 3. Applications of SPR Used I. As a search tool for selecting the best tree in reconstruction algorithms. II. To quantify the dissimilarity between two phylogenetic trees. III. To provide a lower bound on the number of reticulation events in the case of non-tree-like evolution. For II and III, one wants the minimum number of SPR operations to transform one phylogeny into another. This number is the SPR distance between two phylogenies S and T.
  • 4. The Mathematical Problem MINIMUM SPR Instance: Two rooted binary phylogenetic trees S and T. Goal: Find a minimum length sequence of single SPR operations that transforms S into T. Measure: The length of the sequence. Notation: Use dSPR(S, T) to denote this minimum length. Theorem (Bordewich, S 2004) MINIMUM SPR is NP-hard. Overriding goal is to find (with no restrictions) the exact solution or a heuristic solution with a guarantee of closeness.
  • 5. Algorithms for NP-Hard Problems Fixed-parameter algorithms are a practical way to find optimal solutions if the parameter measuring the hardness of the problem is small. Polynomial-time approximation algorithms can efficiently find feasible solutions that are sometimes arbitrarily close to the optimal solution.
  • 6. Agreement Forests A forest of T is a disjoint Example. r collection of phylogenetic subtrees whose union of leaf sets is X U r. a cd e f b S r a cd e f b F1
  • 7. Agreement Forests A forest of T is a disjoint Example. r collection of phylogenetic subtrees whose union of leaf sets is X U r. a cd e f b S r a cd e f b F1
  • 8. Agreement Forests An agreement forest for S and T is a forest of both S and T. r r Example. a cd e f d fa b c b e S T r r a cd e f a cd e f b b F1 F2
  • 9. Agreement Forests An agreement forest for S and T is a forest of both S and T. r r Example. a cd e f d fa b c b e S T r r a cd e f a cd e f b b F1 F2
  • 10. Agreement Forests An agreement forest for S and T is a forest of both S and T. r r Example. e a cd f d fa b c b e S T r r a cd e f a cd e f b b F1 F2
  • 11. Theorem. (Bordewich, S, 2004) Let S and T be two binary phylogenetic trees. Then dSPR(S,T) = size of maximum-agreement forest - 1. o It’s fast to construct from a maximum-agreement forest for S and T a sequence of SPR operations that transforms S into T.
  • 12. Reducing the Size of the Instance Subtree Reduction Chain Reduction
  • 13. Fixed-Parameter Algorithms The underlying idea is to find an algorithm whose running time separates the size of the problem instance from the parameter of interest. One way to obtain such an algorithm is to reduce the size of the problem instance, while preserving the optimal value (kernalizing the problem). Are the subtree and chain reductions enough to kernalize the problem?
  • 14. Fixed-Parameter Algorithms Lemma. If n’ denotes the size of the leaf sets of the fully reduced trees using the subtree and chain reductions, then n’ < 28dSPR(S,T). Corollary. (Bordewich, S 2004) MINIMUM SPR is fixed-parameter tractable. 1. Repeatedly apply the subtree and chain rules. 2. Exhaustively find a maximum-agreement forest by deleting edges in S and comparing with T. Running time is O((56k)k + p(n)) compared with O((2n)k), where k=dSPR(S,T) and p(n) is the polynomial bound for reducing the trees using the subtree and chain reductions.
  • 15. Fixed-Parameter Algorithms A second way to obtain a fixed-parameter algorithm is using the method of bounded search trees. Preliminaries o A rooted triple is a binary tree with three leaves. a bc ab|c o A tree S displays ab|c if the last common ancestor of ab is a descendant of the last common ancestor of ac.
  • 16. r Fixed-Parameter Algorithms r Observation. If F is an agreement d fa b c e forest for S and T, then F can T be obtained from S by deleting |F|-1 edges. a cd e f b Recall. If F is maximum, then S |F|-1=dSPR(S,T).
  • 17. r Fixed-Parameter Algorithms r Observation. If F is an agreement d fa b c e forest for S and T, then F can T be obtained from S by deleting |F|-1 edges. a cd e f b Recall. If F is maximum, then S |F|-1=dSPR(S,T).
  • 18. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T 2. Branch into 4 computational paths ex, ey, ez, er, and repeat for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components.
  • 19. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T 2. Branch into 4 computational paths ex, ey, ez, er, and repeat for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components.
  • 20. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T 2. Branch into 4 computational paths ex, ey, ez, er, and repeat ea eb for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components.
  • 21. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T er 2. Branch into 4 computational paths ex, ey, ez, er, and repeat ea eb for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components.
  • 22. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T er 2. Branch into 4 computational ed paths ex, ey, ez, er, and repeat ea eb for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components.
  • 23. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T er 2. Branch into 4 computational ed paths ex, ey, ez, er, and repeat ea eb for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb er
  • 24. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T er 2. Branch into 4 computational ed paths ex, ey, ez, er, and repeat ea eb for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb er
  • 25. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T 2. Branch into 4 computational paths ex, ey, ez, er, and repeat for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb er
  • 26. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e er not in T. T 2. Branch into 4 computational ee paths ex, ey, ez, er, and repeat ea eb for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb er
  • 27. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e er not in T. T 2. Branch into 4 computational ee paths ex, ey, ez, er, and repeat ea eb for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb er
  • 28. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T er 2. Branch into 4 computational ed paths ex, ey, ez, er, and repeat ea eb for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb er
  • 29. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T er 2. Branch into 4 computational ed paths ex, ey, ez, er, and repeat ea eb for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb er
  • 30. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T 2. Branch into 4 computational paths ex, ey, ez, er, and repeat for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb er
  • 31. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T 2. Branch into 4 computational paths ex, ey, ez, er, and repeat for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb er
  • 32. r Fixed-Parameter Algorithms vst r 1. Find a minimal triple xy|z in S d fa b c e not in T. T 2. Branch into 4 computational paths ex, ey, ez, er, and repeat for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb er
  • 33. r Fixed-Parameter Algorithms vst r es 1. Find a minimal triple xy|z in S d fa b c e not in T. T 2. Branch into 4 computational paths ex, ey, ez, er, and repeat et for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb er
  • 34. r Fixed-Parameter Algorithms vst r es 1. Find a minimal triple xy|z in S d fa b c e not in T. T 2. Branch into 4 computational paths ex, ey, ez, er, and repeat et for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb es er et
  • 35. r Fixed-Parameter Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T 2. Branch into 4 computational paths ex, ey, ez, er, and repeat for each path until no such triple. a cd e f b 3. For each path, find a pair of S overlapping components ts and tt and branch into 2 paths es, et. Repeat until no such components. ed ea eb es er et
  • 36. r Fixed-Parameter Algorithms r Key Lemma. dSPR(S,T) <= k if and only d fa b c e if one of the computational T paths of length at most k results in an agreement forest for S and T. a cd e f b Theorem. (Bordewich, McCartin, S S 2007) The bounded search tree algorithm has running time O(4kn4). ed Combining the methods of kernalization and bounded ea search gives an FPT algorithm eb with running time O(4kk4+p(n)). es er et
  • 37. Approximation Algorithms Polynomial-time approximation algorithms can efficiently find feasible solutions that are sometimes arbitrarily close to the optimal solution. An r-approximation algorithm A for MINIMUM SPR means that the size of an agreement forest (minus 1) for S and T returned by A is at most r times dSPR(S,T). Example. If r=3, then the size of any agreement forest (minus 1) for S and T returned by A is at most 3 times dSPR(S,T). Based on a pairwise comparison, Bonet, St. John, Mahindru, Amenta 2006 establish a 5-approximation algorithm (O(n)) for MINIMUM SPR.
  • 38. r Approximation Algorithms r 1. Find a minimal triple xy|z in S d fa b c e not in T. T 2. Delete each of ex, ey, ez, er, and repeat until no such triple. 3. Find a pair of overlapping components ts and tt and delete a cd e f b es, et. Repeat until no such S components. Theorem. (Bordewich, McCartin, S ed 2007) This gives a 4-approximation alg. for MINIMUM SPR (O(n5)). ea eb Deleting only ex, ez, er gives a 3- es approximation algorithm. er et
  • 39. No r, unless P=NP Exact (B, S 07) (B, M,S 07) algorithms 1 + 1/2112 3 r=1 MINIMUM SPR 5 Maximum 1. Travelling (B S,M, A 06)salesman independent set in planar problem graphs (PTAS) 2. Maximum clique
  • 40. Modelling Hybridization Events with SPR Operations Reticulation processes cause … molecular phylogeneticists will species to be a composite of have failed to find the `true DNA regions derived from tree’, not because their different ancestors. methods are inadequate or because they have chosen the wrong genes, but Processes include because the history of life o horizontal gene transfer, cannot be properly o hybridization, and represented as a tree. o recombination. Ford Doolittle, 1999 (Dalhousie University)
  • 41. Modelling Hybridization Events with SPR Operations A single SPR operation models a single hybridization event (Maddison 1997). r r Example. bc d a a bc d T S r a bc d H
  • 42. Modelling Hybridization Events with SPR Operations A single SPR operation models a single hybridization event (Maddison 1997). r r Example. bc d a a bc d T S r a bc d H
  • 43. Modelling Hybridization Events with SPR Operations A single SPR operation models a single hybridization event (Maddison 1997). r r Example. bc d a a bc d T S r a bc d H
  • 44. Modelling Hybridization Events with SPR Operations A single SPR operation models a single hybridization event (Maddison 1997). r r Example. bc d a a bc d T S r r deleting hybrid edges a d b a bc d c H F
  • 45. A Fundamental Problem for Biologists Given an initial set of data that correctly repesents the tree-like evolution of different parts of various species genomes, what is the smallest number of reticulation events required that simultaneously explains the evolution of the species? This smallest number o Provides a lower bound on the number of such events. o Indicates the extent that hybridization has had on the evolutionary history of the species under consideration. Since 1930’s botantists have asked the question: How significant has the effect of hybridization been on the New Zealand flora?
  • 46. Trees and Hybridization Networks H explains T if T can be obtained from a rooted subtree of H by suppressing degree-2 vertices. Example. b d a bc d a c T S c d c d a b a b H1 H2
  • 47. Trees and Hybridization Networks H explains T if T can be obtained from a rooted subtree of H by suppressing degree-2 vertices. Example. b d a bc d a c T S c d c d a b a b H1 H2
  • 48. Trees and Hybridization Networks H explains T if T can be obtained from a rooted subtree of H by suppressing degree-2 vertices. Example. b d a bc d a c T S c d c d a b a b H1 H2
  • 49. The Mathematical Problem MINIMUM HYBRIDIZATION Instance: Two rooted binary phylogenetic trees S and T. Goal: Find a hybridization network H that explains S and T, and minimizes the number of hybridization vertices. Measure: The number of hybridization vertices in H. Notation: Use h(S, T) to denote this minimum number.
  • 50. Example: Arbitrary SPR operations not sufficient. r a cd e f b F1
  • 51. o A sequence of SPR operations that avoids creating directed cycles to make a hybridization network that explains S and T. o If one minimizes the length of an (acyclic) sequence, does the resulting network minimize the number of hybridization events to explain S and T? o YES, and such a sequence corresponds to an acyclic-agreement forest.
  • 52. Theorem. (Baroni, Grünewald, Moulton, S, 2005) Let S and T be two binary phylogenetic trees. Then h(S,T) = size of maximum-acyclic agreement forest - 1. o It’s fast to construct from a maximum-acyclic agreement for S and T a hybridization network that realizes h(S,T). Theorem. (Bordewich, S, 2007) MINIMUM HYBRIDIZATION is NP-hard.
  • 53. Reducing the Size of the Instance Subtree Reduction Chain Reduction
  • 54. Fixed-Parameter Algorithms Are the subtree and chain reductions enough to kernalize the problem? Lemma. If n’ denotes the size of the leaf sets of the fully reduced trees using the subtree and chain reductions, then n’<14h(S,T). Corollary. (Bordewich, S 2007) MINIMUM HYBRIDIZATION is fixed-parameter tractable. Running time is O((28k)k + p(n)) compared with O((2n)k), where k=h(S,T) and p(n) is the polynomial bound for reducing the trees using the subtree and chain reductions.
  • 55. Reducing the Size of the Instance Cluster Reduction (Baroni 2004) +
  • 56. A Grass (Poaceae) Dataset (Grass Phylogeny Working Group, Düsseldorf) o Ellstrand, Whitkus, Rieseburg 1996 (Distribution of spontaneous plant hybrids) o For each sequence, used fastDNAml to reconstruct a phylogenetic tree (H. Schmidt).
  • 57. Chloroplast (phyB) sequences Nuclear (ITS) sequences
  • 58. Chloroplast (phyB) sequences Nuclear (ITS) sequences
  • 59. Chloroplast (phyB) sequences Nuclear (ITS) sequences
  • 60. Chloroplast (phyB) sequences Nuclear (ITS) sequences
  • 61. Chloroplast (phyB) sequences Nuclear (ITS) sequences
  • 62. Chloroplast (phyB) sequences Nuclear (ITS) sequences
  • 63. h=3 h=1 h=4 h=0 Chloroplast (phyB) sequences Nuclear (ITS) sequences
  • 64. pairwise combination # overlapping h(S,T) running time taxa 2000 MHz CPU, 2GB RAM ndhF phyB 40 14 11h ndhF rbcL 36 13 11.8h ndhF rpoC2 34 12 26.3h ndhF waxy 19 9 320s ndhF ITS 46 at least 15 phyB rbcL 21 4 1s phyB rpoC2 21 7 180s phyB waxy 14 3 1s phyB ITS 30 8 19s rbcL rpoC2 26 13 29.5h rbcL waxy 12 7 230s rbcL ITS 29 at least 9 rpoC2 waxy 10 1 1s rpoC2 ITS 31 at least 10 waxy ITS 15 8 620s Bordewich, Linz, St John, S, 2007
  • 65. Computing dSPR(S,T) and h(S,T) dSPR(S,T) h(S,T) 1. FPT using kernalization and 1. FPT using kernalization only bounded search tree (O((28k)k +p(n))). Unknown if methods (O(4kk4+p(n))). a bounded search tree method exists. 2. Cluster-based reduction. 2. No cluster-based reduction. 3. Unknown if there even is an 3. 3-approximation algorithm. approximation algorithm.
  • 66. Reducing the Size of the Instance Subtree Reduction Chain Reduction x3 an x2 an x1 a2 a2 x3 T’ a1 a1 x2 x1 S T S’