SlideShare a Scribd company logo
1 of 117
Genome Rearrangements:
   from Biological Problem to Combinatorial Algorithms
                        (and back)



                   Pavel Pevzner
Department of Computer Science, University of California at San Diego
Genome Rearrangements
                                      Mouse X chromosome


Unknown ancestor
~ 80 M years ago

                                      Human X chromosome

 What is the evolutionary scenario
  for transforming one genome into
  the other?

 What is the organization of the
  ancestral genome?

 Are there any rearrangement
  hotspots in mammalian genomes?
Genome Rearrangements:
               Evolutionary Scenarios

Unknown ancestor
~ 80 M years ago




 What is the evolutionary scenario
  for transforming one genome into
  the other?

 What is the organization of the
  ancestral genome?

 Are there any rearrangement         Reversal flips a segment of
                                           a chromosome
  hotspots in mammalian genomes?
Genome Rearrangements:
                Ancestral Reconstruction




 What is the evolutionary scenario
  for transforming one genome into
  the other?

 What is the organization of the
  ancestral genome?

 Are there any rearrangement
  hotspots in mammalian genomes?
                                      Bourque, Tesler, PP, Genome Res.
Genome Rearrangements:
               Evolutionary “Earthquakes




 What is the evolutionary scenario
  for transforming one genome into
  the other?

 What is the organization of the
  ancestral genome?

 Are there any rearrangement
  hotspots in mammalian genomes?
Rearrangement Hotspots in Tumor Genomes

 Rearrangements may
  disrupt genes and alter
  gene regulation.
 Example: rearrangement in leukemia yields
  “Philadelphia” chromosome:
  Chr 9

           promoter   ABL gene      promoter   BCR gene
  Chr 22

           promoter   BCR gene      promoter c-ab1 oncogene


 Thousands of rearrangements hotspots known for different
  tumors.
Rearrangement Hotspots in Tumor Genomes

MCF7 breast cancer
cell line




 What is the evolutionary scenario
  for transforming one genome into
  the other?

 What is the organization of the
  ancestral genome?

 Where are the rearrangement
  hotspots in mammalian
  genomes?
Three Evolutionary Controversies



  Primate -
   Rodent -
Carnivore Split


Rearrangement
  Hotspots


   Whole
  Genome
 Duplications
Primate–Rodent vs. Primate–Carnivore Split
 Hutley et al., MBE, May 07:
 We have demonstrated with very high     primate-rodent primate-carnivore
 confidence that the rodents diverged         split           split
 before carnivores and primates

 Lunter, PLOS CB, April 07:
 It appears unjustified to continue to
 consider the phylogeny of primates,
 rodents, and canines as contentious




 Arnason et al, PNAS 02
 Canarozzi, PLOS CB 06


 Murphy et al,, Science 01


 Kumar&Hedges, Nature, 98
Can Rearrangement Analysis Resolve the
 Primate–Rodent–Carnivore Controversy?
Susumu Ohno: Two Hypothesis

                 Ohno, 1970, 1973

               Random Breakage
Rearrangement   Hypothesis:
  Hotspots      Genomic architectures are
                shaped by rearrangements
                that occur randomly (no fragile
                regions).

   Whole          Whole Genome Duplication
  Genome           (WGD) Hypothesis: Big
 Duplications      leaps in evolution would have
                   been impossible without
                   whole genome duplications.
Random Breakage Model (RBM)
                 The random breakage hypothesis was embraced
                  by biologists and has become de facto theory of
                  chromosome evolution.
                 Nadeau & Taylor, Proc. Nat'l Acad. Sciences 1984
                  First convincing arguments in favor of the
Rearrangement       Random Breakage Model (RBM)
  Hotspots        RBM implies that there is no rearrangement
                    hotspots
                  RBM was re-iterated in hundreds of papers
Random Breakage Model (RBM)
                 The random breakage hypothesis was embraced
                  by biologists and has become de facto theory of
                  chromosome evolution.
                 Nadeau & Taylor, Proc. Nat'l Acad. Sci. 1984
                  First convincing arguments in favor of the
Rearrangement       Random Breakage Model (RBM)
  Hotspots        RBM implies that there is no rearrangement
                    hotspots
                  RBM was re-iterated in hundreds of papers
                 PP & Tesler, Proc. Nat'l Acad. Sci. 2003
                  Rejected RBM and proposed the Fragile
                    Breakage Model (FBM)
                  Postulated existence of rearrangement hotspots
                    and vast breakpoint re-use
                  FBM implies that the human genome is a mosaic
                    of solid and fragile regions.
Are the Rearrangement Hotspots
               Real?
 The Fragile Breakage Model did not live long: in 2003
  David Sankoff presented convincing arguments against
  FBM:
“… we have shown that breakpoint re-use of the same
  magnitude as found in Pevzner and Tesler, 2003 may
  very well be artifacts in a context where NO re-use
  actually occurred.”
Are the Rearrangement Hotspots Real?
 The Fragile Breakage Model did not live long: in 2003
  David Sankoff presented arguments against FBM
  (Sankoff & Trinh, J. Comp. Biol, 2005)
    “… we have shown that breakpoint re-use of
    the same magnitude as found in Pevzner and
    Tesler, 2003 may very well be artifacts in a
    context where NO re-use actually occurred.”

    Before you criticize people, you should
    walk a mile in their shoes. That way,
    when you criticize them, you are a mile
    away. And you have their shoes.
                             J.K. Lambert
Rebuttal of the Rebuttal of the Rebuttal
   Peng et al., 2006 (PLOS Computational Biology)
    found an error in Sankoff-Trinh rebuttal:
     …If Sankoff & Trinh fixed their ST-Synteny
    algorithm, they would confirm rather than
    reject Pevzner-Tesler’s Fragile Breakage Model

   Sankoff, 2006 (PLOS Computational Biology):
     …Not only did we foist a hastily conceived
    and incorrectly executed simulation on an
    overworked RECOMB conference program
    committee, but worse — nostra maxima culpa
    — we obliged a team of high-powered
    researchers to clean up after us!
Rebuttal of the Rebuttal of the Rebuttal:
  Controversy Resolved... Or Was It?


 Nearly all recent studies support the Fragile
  Breakage Model: van der Wind, 2004,
  Bailey, 2004, Zhao et al., 2004, Murphy et
  al., 2005, Hinsch & Hannenhalli, 2006,
  Ruiz-Herrera et al., 2006, Yue & Haaf,
  2006, Mehan et al., 2007, etc
  Kikuta et al., Genome Res. 2007: “... the
  Nadeau and Taylor hypothesis is not
  possible for the explanation of synteny”
Random vs. Fragile Breakage Debate Continues:
          Complex Rearrangements


 PP & Tesler, PNAS 2003, argued that every evolutionary
  scenario for transforming Mouse into Human genome
  with reversals, translocations, fusions, and fissions
  must result in a large number of breakpoint re-uses, a
  contradiction to the RBM.

 Sankoff, PLoS Comp. Biol. 2006: “We cannot infer
  whether mutually randomized synteny block orderings
  derived from two divergent genomes were created ...
  through processes other than reversals and
  translocations.”
Whole Genome Duplication (WGD)




  Whole
 Genome
Duplications
Genome Rearrangements After WGD
Genome Rearrangements After WGD
Genome Rearrangements After WGD
Whole Genome Duplication Hypothesis Was
 Confirmed After Years of Controversy
                                     “There was a whole-genome
                                      duplication.”
                                     Wolfe, Nature, 1997
                                     “There was no whole-genome
The Whole Genome                     duplication.” Dujon, FEBS,
                                      2000
 Duplication hypothesis first met    “Duplications occurred
                                      independently”      Langkjaer,
 with skepticism but was finally      JMB, 2000
                                     “Continuous duplications”
 confirmed by                         Dujon, Yeast 2003
 Kellis et al., Nature 2004: “Our    “Multiple duplications”
                                      Friedman, Gen. Res, 2003
                                     “Spontaneous duplications”
 analysis resolves the long-          Koszul, EMBO, 2004
 standing controversy on the
 ancestry of the yeast genome.”
Whole Genome Duplication Hypothesis Was Eventually
 Confirmed...But Some Scientists Are Not Convinced

                                           “There was a whole-genome
 Kellis, Birren & Lander, Nature 2004:     duplication.”
                                           Wolfe, Nature, 1997
  “Our analysis resolves the long-         “There was no whole-genome

  standing controversy on the ancestry      duplication.” Dujon, FEBS,
                                            2000
  of the yeast genome.”                    “Duplications occurred
                                            independently”      Langkjaer,
 Martin et al., Biology Direct 2007:       JMB, 2000
  “We believe that the proposal of a       “Continuous duplications”

                                            Dujon, Yeast 2003
  Whole Genome Duplication in the          “Multiple duplications”

  yeast lineage is unwarranted.”            Friedman, Gen. Res, 2003
                                           “Spontaneous duplications”

 To address Martin et al., 2007            Koszul, EMBO, 2004
                                           “Our analysis resolved the
  arguments against WGD, it would           controversy”
  be useful to reconstruct the pre-         Kellis, Nature, 2004
                                           “WGD in the yeast lineage is
  duplicated genomes.                       unwarranted”
                                            Martin, Biology Direct, 2007
                                             ...
Genome Halving Problem




Reconstruction of
the pre-duplicated ancestor:
Genome Halving Problem
From Biological Controversies to
        Algorithmic Problems


  Primate -                 Ancestral
   Rodent -                  Genome
Carnivore Split           Reconstruction


Rearrangement               Breakpoint
  Hotspots                    Re-use
                             Analysis


   Whole                    Genome
  Genome                    Halving
 Duplications               Problem
Algorithmic Background:

Genome Rearrangements
         and
  Breakpoint Graphs
Unichromosomal Genomes:
                  Reversal Distance

• Sorting by reversals: find the shortest series of reversals
  transforming one uni-chromosomal genome into another.
• The number of reversals in such a shortest series is the
  reversal distance between genomes.
• Hannenhalli and PP. (FOCS 1995) gave a polynomial
  algorithm for computing the reversal distance.
  .
 Step    0:       2   -4   -3    5   -8   -7   -6   1
 Step    1:       2    3    4    5   -8   -7   -6   1
 Step    2:      -5   -4   -3   -2   -8   -7   -6   1
 Step    3:      -5   -4   -3   -2   -1    6    7   8
                                                        Sorting by prefix reversals
 Step    4:       1    2    3    4    5    6    7   8   (pancake flipping problem),
                                                        Gates and Papadimitriou,
                                                        Discrete Appl. Math. 1976
Multichromosomal Genomes:
                Genomic Distance
 Genomic Distance between two
  genomes is the minimum number
  of reversals, translocations, fusions,
  and fissions required to transform
  one genome into another.

 Hannenhalli and PP
  (STOC 1995) gave a polynomial
  algorithm for computing the genomic distance.

 These algorithms were followed by many improvements:
  Kaplan et al. 1999, Bader et al. 2001, Tesler 2002, Ozery-Flato
  & Shamir 2003, Tannier & Sagot 2004, Bergeron 2001-07, etc.
Simplifying HP Theory:
    from Linear to Circular Chromosomes

            b
                      A chromosome is represented as a
    a            c    cycle with directed red and
                      undirected black edges:
            d

a       b       c d   red edges encode blocks (genes)

                      black edges connect adjacent
                      blocks
Reversals on Circular Chromosomes

            b                                b
                      reversal
    a            c                   a               c
            d                                d

a       b       c d              a       b       d       c


A reversal replaces two black edges with two other
black edges
Fissions

              b                               b
                        fission
      a            c                  a           c
              d                               d

  a       b       c d             a       b       c d

 A fission splits a single cycle (chromosome) into two.
 A fissions replaces two black edges with two other
  black edges.
Translocations / Fusions

             b                              b
                       fusion
     a            c                 a           c
             d                              d

 a       b       c d            a       b       c d
 A translocation/fusion merges two cycles
  (chromosomes) into a single one.
 They also replace two black edges with two other
  black edges.
2-Breaks

            b                                  b
                         2-break
     a            c                     a            c
             d                                  d

 A 2-Break replaces any 2 black edges with another 2 black
  edges forming matching on the same 4 vertices.
 Reversals, translocations, fusions, and fissions represent all
  possible types of 2-breaks (introduced as DCJ operations by
  Yancopoulos et al., 2005)
2-Break Distance




 The 2-break distance d2(P,Q) is the minimum
  number of 2-breaks required to transform P into Q.

 In contrast to the genomic distance, the 2-break
  distance is easy to compute.
Two Genomes as
Black-Red and Green-Red Cycles

    b

a   P       c   a   b   c d
        d

    c

a   Q       b   a   c   b     d
    d
“Q-style” representation of P

    b

a   P       c
                         c
        d
                     a          b
    c
                         d
a   Q       b

    d
“Q-style” representation of P

    b

a   P       c
                         c
        d
                     a          b
    c
                         d
a   Q       b

    d
Breakpoint Graph:
    Superposition of Genome Graphs


    b                 Breakpoint Graph
                     (Bafna & PP, FOCS 1994)
a   P       c
                                  c
        d                        G(P,Q)
                            a             b
    c
                                 d
a   Q       b

    d
Breakpoint Graph:
GLUING Red Edges with the Same Labels


    b                 Breakpoint Graph
                    (Bafna & PP, FOCS 1994)
a   P       c
                                 c
        d                       G(P,Q)
                           a             b
    c
                                d
a   Q       b

    d
Black-Green Alternating Cycles

 Black and green edges form
  perfect matchings in the
  breakpoint graph. Therefore,       c
  these edges form a collection
  of black-green alternating    a          b
  cycles
                                    d
 Let cycles(P,Q) be the number
  of black-green cycles.        cycles(P,Q)=2
Rearrangements Change Breakpoint
          Graphs and cycle(P,Q)
Transforming genome P into genome Q by 2-breaks
corresponds to transforming the breakpoint graph
G(P,Q) into the identity breakpoint graph G(Q,Q).
       c                c                  c
      G(P,Q)
               b              b        G(Q,Q)      b
a                  a                a
                    G(P',Q)         trivial cycles
      d                d                  d
cycles(P,Q) = 2    cycles(P',Q) = 3 cycles(Q,Q) = 4
                                    =#blocks
Sorting by 2-Breaks

                    2-breaks
      P=P0 → P1 → ... → Pd= Q

     G(P,Q) → G(P1,Q) → ... → G(Q,Q)

cycles(P,Q) cycles → ... → #blocks cycles

      # of black-green cycles increased by
              #blocks - cycles(P,Q)

 How much each 2-break can contribute to the
     increase in the number of cycles?
Each 2-Break Increases
              #Cycles by at Most 1

A 2-break:

 adds 2 new black edges and thus creates at most 2
  new cycles (containing two new black edges)

 removes 2 black edges and thus destroys at least 1
  old cycle (containing two old edges):

  change in the number of cycles: Δcycles ≤ 2-1=1.
2-Break Distance
 Any 2-break increases the number of cycles by at most one
  (Δcycles ≤ 1)

 Any non-trivial cycle can be split into two cycles with a 2-
  break (Δcycles = 1)

 Every sorting by 2-breaks must increase the number of
  cycles by #blocks - cycles(P,Q)

 The 2-break distance between genomes P and Q:

            d2(P,Q) = #blocks - cycles(P,Q)

  (cp. Yancopoulos et al., 2005, Bergeron et al., 2006)
Complex Rearrangements: Transpositions

 Sorting by Transpositions
  Problem: find the shortest
  sequence of transpositions
  transforming one genome into
  another.

 First 1.5-approximation algorithm
  was given by Bafna and P.P.
  SODA 1995. The best known
  result: 1.375-approximation
  algorithm of Elias and Hartman,
  2005.

 The complexity status remains
  unknown.
Transpositions

             b                               b
                       transposition
     a            c                    a             c
             d                               d
 a       b       c d                   c a       b       d

Transpositions cut off a segment of one chromosome
and insert it at some position in the same or another
chromosome
Transpositions Are 3-Breaks


           b                                  b
                         3-break
     a            c                     a           c
             d                                 d


 3-Break replaces any triple of black edges with another triple
  forming matching on the same 6 vertices.

 Transpositions are 3-Breaks.
Sorting by 3-Breaks

                    3-breaks
     P=P0 → P1 → ... → Pd= Q

    G(P,Q) → G(P1,Q) → ... → G(Q,Q)

cycles(P,Q) cycles → ... → #blocks cycles

      # of black-green cycles increased by
              #blocks - cycles(P,Q)

How much each 3-break can contribute to this
               increase?
Each 3-Break Increases
              #Cycles by at Most 2

A 3-Break:

 adds 3 new black edges and thus creates at most 3
  new cycles (containing three new black edges)

 removes 3 black edges and thus destroys at least 1
  old cycle (containing three old edges):

  change in the number of cycles: Δcycle ≤ 3-1=2.
3-Break Distance
 Any 3-break increases the number of cycles by at most TWO
  (Δcycles ≤2)

 Any non-trivial cycle can be split into three cycles with a 3-break
  (Δcycles = 2)

 Every sorting by 3-breaks must increase the number of cycles by
  #blocks - cycles(P,Q)

 The 3-break distance between genomes P and Q:

             d3(P,Q) = (#blocks - cycles(P,Q))/2
3-Break Distance
 Any 3-break increases the number of cycles by at most TWO
  (Δcycles ≤2)

 Any non-trivial cycle can be split into three cycles with a 3-break
  (Δcycles = 2) – WRONG STATEMENT

 Every sorting by 3-breaks must increase the number of cycles by
  #blocks - cycles(P,Q)

 The 3-break distance between genomes P and Q:

            d3(P,Q) = (#blocks - cycles(P,Q))/2
3-Break Distance: Focus on Odd Cycles

 A 3-break can increase the number of odd cycles (i.e.,
  cycles with odd number of black edges) by at most 2
  (Δcyclesodd ≤ 2)

 A non-trivial odd cycle can be split into three odd cycles
  with a 3-break (Δcyclesodd = 2)

 An even cycle can be split into two odd cycles with a 3-
  break (Δcyclesodd=2)

 The 3-Break Distance between genomes P and Q is:

        d3(P,Q) = ( #blocks – cyclesodd(P,Q) ) / 2
Polynomial Algorithm for Multi-Break
                       Rearrangements
 The formulas for k-break distance become complex as k grows
  (without an obvious pattern):




The formula for d20(P,Q) contains over 1,500 terms!

 Alekseyev & PP (SODA 07, Theoretical Computer Science 08) Exact formulas for the
 k-break distance as well as a linear-time algorithm for computing it.
Breakpoint
                               Re-use
                              Analysis




       Algorithmic Problem

   Searching for Rearrangement
Hotspots (Fragile Regions) in Human
             Genome
Random vs. Fragile Breakage Debate: Complex
              Rearrangements


 PP & Tesler, PNAS 2003, argued that every evolutionary
  scenario for transforming Mouse into Human genome
  with reversals and translocations must result in a large
  number of breakpoint re-uses, a contradiction to the
  RBM.

 Sankoff, PLoS CB 2006: “We cannot infer whether
  mutually randomized synteny block orderings derived
  from two divergent genomes were created ... through
  processes other than reversals and translocations.”
  (read: “transpositions” or 3-breaks)
Random Breakage Theory
                   Re-Re-Re-Re-Re-Visited
   Ohno, 1970, Nadeau & Taylor, 1984 introduced RBM

   Pevzner & Tesler, 2003 (PNAS) argued against RBM

   Sankoff & Trinh, 2004 (RECOMB, JCB) argued against
    Pevzner & Tesler, 2003 arguments against RBM

   Peng et al., 2006 (PLOS CB) argued against
    Sankoff & Trinh, 2004 arguments against
    Pevzner & Tesler, 2003 arguments against RBM

   Sankoff, 2006 (PLOS CB) acknowledged an error in Sankoff&Trinh, 2004
    but came up with a new argument against
    Pevzner and Tesler, 2003 arguments against RBM

   Alekseyev and Pevzner, 2007 (PLOS CB) argued against
    Sankoff, 2006 new argument against
    Pevzner & Tesler, 2003 arguments against RBM
Are There Rearrangement Hotspots in
           Human Genome?
Are There Rearrangement Hotspots
          in Human Genome?

“Theorem”. Yes
Are There Rearrangement Hotspots in
             Human Genome?
Theorem. Yes
Proof:
• The formula for the 3-break distance implies that there
  were at least d3(Human,Mouse) = 139 rearrangements
  between human and mouse (including transpositions)
Are There Rearrangement Hotspots in
             Human Genome?
Theorem. Yes
Proof:
• The formula for 3-break distance implies that there
  were at least d3(Human,Mouse) = 139 rearrangements
  between human and mouse (including transpositions)
• Every 3-break creates up to 3 breakpoints
Are There Rearrangement Hotspots in
             Human Genome?
Theorem. Yes
Proof:
• The formula for 3-break distance implies that there
  were at least d3(Human,Mouse) = 139 rearrangements
  between human and mouse (including transpositions)
• Every 3-break creates up to 3 breakpoints
• If there were no breakpoint re-use then after 139
  rearrangements we may see 139*3=417 breakpoints
Are There Rearrangement Hotspots in
             Human Genome?
Theorem. Yes
Proof:
• The formula for 3-break distance implies that there
  were at least d3(Human,Mouse) = 139 rearrangements
  between human and mouse (including transpositions)
• Every rearrangement creates up to 3 breakpoints
• If there were no breakpoint re-use then after 139
  rearrangements we may see 139*3=417 breakpoints
• But there are only 281 breakpoints between Human
  and Mouse!
Are There Rearrangement Hotspots in
             Human Genome?
Theorem. Yes
Proof:
• The formula for 3-break distance implies that there
  were at least d3(Human,Mouse) = 139 rearrangements
  between human and mouse (including transpositions)
• Every rearrangement creates up to 3 breakpoints
• If there were no breakpoint re-use then after 139
  rearrangements we may see 139*3=417 breakpoints
• But there are only 281 breakpoints between Human
  and Mouse
• Is 417 larger than 281?
Are There Rearrangement Hotspots in
             Human Genome?
Theorem. Yes
Proof:
• The formula for 3-break distance implies that there
  were at least d3(Human,Mouse) = 139 rearrangements
  between human and mouse (including transpositions)
• Every rearrangement creates up to 3 breakpoints
• If there were no breakpoint re-use then after 139
  rearrangements we may see 139*3=417 breakpoints
• But there are only 281 breakpoints between Human
  and Mouse
• Is 417 larger than 281?
• Yes, 417 >> 281!
Are There Rearrangement Hotspots in
             Human Genome?
Theorem. Yes
Proof:
• The formula for 3-break distance implies that there
  were at least d3(Human,Mouse) = 139 rearrangements
  between human and mouse (including transpositions)
• Every rearrangement creates up to 3 breakpoints
• If there were no breakpoint re-use then after 139
  rearrangements we may see 139*3=417 breakpoints
• But there are only 281 breakpoints between Human
  and Mouse
• Is 417 larger than 281?
• Yes, 417 >> 281!
Transforming Human Genome into
     Mouse Genome by 3-Breaks
                          3-breaks
    Human = P0 → P1 → ... → Pd = Mouse

              d = d3(Human,Mouse) = 139

Our “proof” assumed that each 3-break makes 3
breakages in a genome, so the total number of breakages
made in this transformation is 3 * 139 = 417.
OOPS! The transformation may include 2-breaks (as a
particular case of 3-breaks). If every 3-break were a 2-break
then the total number of breakages is only 2*139 = 278 <
281, in which case there could be no breakpoint re-uses at
all.
Minimizing the Number of Breakages
Problem. Given genomes P and Q, find a series of k-breaks
transforming P into Q and making the smallest number of
breakages.

Theorem. Any series of k-breaks transforming P into Q
makes at least dk(P,Q)+d2(P,Q) breakages

Theorem. There exists a series of d3(P,Q) 3-breaks
transforming P into Q and making d3(P,Q)+d2(P,Q)
breakages.

d2(Human,Mouse) = 246
d3(Human,Mouse) = 139
minimum number of breakages = 246 + 139 = 385
Are There Rearrangement Hotspots in
             Human Genome?
Theorem. Yes
Proof:
• The formula for 3-break distance implies that there
  were at least d3(Human,Mouse) = 139 rearrangements
  between human and mouse (including transpositions)
• Every rearrangement creates up to 3 breakpoints
• If there were no breakpoint re-use then after 139
  rearrangements we SHOULD SEE AT LEAST 385
  breakpoints (rather than 417 as before)
• But there are only 281 breakpoints between Human
  and Mouse
• Yes, 385 >> 281!
Are there rearrangement hotspots in
             human genome?

 We showed that even if 3-break rearrangements
  were frequent, the argument against the RBM still
  stands (Alekseyev & PP, PLoS Comput. Biol.
  2007)

  We don’t believe that 3-breaks are frequent but
  even if they were, we proved that there exist fragile
  regions in the human genome. It is time to answer
  a more difficult question: “Where are the
  rearrangement hotspots in the human
  genome?” (the detailed analysis appeared in
  Alekseyev and PP, Genome Biol., Dec.1, 2010)
Ancestral
                          Genome
                       Reconstruction




     Algorithmic Problem:

Ancestral Genome Reconstruction
               and
   Multiple Breakpoint Graphs
Ancestral Genomes Reconstruction

 Given a set of genomes, reconstruct genomes
  of their common ancestors.
Algorithms for
   Ancestral Genomes Reconstruction

 GRAPPA: Tang, Moret, Warnow et al . (2001)

 MGR: Bourque and PP (Genome Res. 2002)

 InferCARs: Ma et al. (Genome Res. 2006)

 EMRAE: Zhao and Bourque (Genome Res. 2007)

 MGRA: Alekseyev and PP (Genome. Res.2009)
Ancestral Genomes Reconstruction
     Problem (with a known tree)
 Input: a set of k genomes and a phylogenetic tree T
 Output: genomes at the internal nodes of the tree T
  that minimize the total sum of the 2-break distances
  along the branches of T


 NP-complete in the “simplest” case of k=3.
 What makes it hard?
Ancestral Genomes Reconstruction
     Problem (with a known tree)
 Input: a set of k genomes and a phylogenetic tree T
 Output: genomes at the internal nodes of the tree T
 Objective: minimize the total sum of the 2-break
  distances along the branches of T


 NP-complete in the “simplest” case of k=3.
 What makes it hard? BREAKPOINTS RE-USE
How to Construct the Breakpoint Graph for
           Multiple Genomes?
       b                         b

       P       c                 R    d
   a                         a
           d                      c


       c

   a   Q       b

       d
Constructing Multiple Breakpoint Graph:
     rearranging P in the Q order
      b                          b

      P      c                   R   d
 a                          a
         d                       c


                       c
     c

 a   Q       b    a         b

     d                d
Constructing Multiple Breakpoint Graph:
     rearranging R in the Q order
      b                          b

      P      c                   R   d
 a                          a
         d                       c


                       c
     c

 a   Q       b    a         b

     d                d
Multiple Breakpoint Graph: Still
Gluing Red Edges with the Same Labels
     b                          b

     P       c                  R   d
 a                         a
         d                      c


                      c
     c                         Multiple
                               Breakpoint
                           b   Graph
 a   Q       b   a
                               G(P,Q,R)
     d               d
Multiple Breakpoint Graph of 6 Mammalian Genomes

Multiple
Breakpoint Graph
G(M,R,D,Q,H,C)
of the
Mouse,
Rat,
Dog,
macaQue,
Human, and
Chimpanzee
genomes.
Two Genomes:
Two Ways of Sorting by 2-Breaks
  Transforming P into Q with “black” 2-breaks:

        P = P0 → P1 → ... → Pd-1 → Pd = Q

  G(P,Q) → G(P1,Q) → ... → G(Pd,Q) = G(Q,Q)

  Transforming Q into P with “green” 2-breaks:

           Q = Q0 → Q1 → ... → Qd = P

   G(P,Q) → G(P,Q1) → ... → G(P,Qd) = G(P,P)

 Let's make a black-green chimeric transformation
Transforming G(P,Q) into (an unknown!) G(X,X)
  rather than into (a known) G(Q,Q) as before

 Let X be any genome on a path from P to Q:

  P = P0 → P1 → ... → Pm = X = Qm-d ← ... ← Q1 ← Q0 = Q

 Sorting by 2-breaks is equivalent to finding a shortest
  transformation of G(P,Q) into an identity breakpoint graph
  G(X,X) of a priori unknown genome X:

  G(P0,Q0) → G(P1,Q0) → G(P1,Q1) → G(P1,Q2) → ... → G(X,X)

 The ―black‖ and ―green‖ 2-breaks may arbitrarily alternate.
From 2 Genomes To Multiple Genomes
 We find a transformation of the multiple breakpoint
  graph G(P1,P2,...,Pk) into (a priori unknown!) identity
  multiple breakpoint graph G(X,X,...,X):
  G(P1,P2,...,Pk) → ... → G(X,X,...,X)

 The evolutionary tree T defines
  groups of genomes (to which
  the same 2-breaks may
  be applied simultaneously).


 For example, the groups {M, R} (Mouse and Rat) and
  {Q,H,C} (macaQue, Human, Chimpanzee) correspond
  to branches of T while the groups {M, C} and {R, D} do
  not.
If the Evolutionary Tree Is Unknown



 For the set of 7 mammalian
  genomes (Mouse, Rat, Dog,
  macaQue, Human, Chimpanzee,
  and Opossum), the evolutionary
  tree T was subject of enduring debates

 Depending on the primate – rodent – carnivore split, three
  topologies are possible (only two of them are viable).
Rearrangement Evidence For The Primate-
             Carnivore Split
 Each of these three topologies has an unique
  branch that supports either blue (primate-rodent),
  or green (rodent-carnivore), or red (primate-
  carnivore) hypothesis.

M                 M                 M             O

                  H                     H              H


D            O    D          O      D
 Rearrangement analysis supports the primate – carnivore
  split.
Rearrangement Evidence For The
         Primate-Carnivore Split
 Each of these three topologies has an unique
  branch that supports either blue (primate-rodent),
  or green (rodent-carnivore), or red (primate-
  carnivore) hypothesis.

M                 M                 M               O
                            16                 24
    11            H                     H               H


D            O    D          O      D
 Rearrangement analysis supports the primate – carnivore
  split.
Genome
                    Halving
                    Problem




 Algorithmic Problem:

Genome Halving Problem
WGD and Genome Halving Problem
 Whole Genome Duplication (WGD) of a genome
  R results in a perfect duplicated genome R+R
  where each chromosome is doubled.
 The genome R+R is subjected to rearrangements
  that result in a duplicated genome P.

 Genome Halving Problem:
  Given a duplicated genome
  P, find a perfect duplicated
  genome R+R minimizing
  the rearrangement distance
  between R+R and P.
Genome Halving Problem:
            Previous Results


 Algorithm for reconstructing a pre-duplicated
  genome was proposed in a series of papers by El-
  Mabrouk and Sankoff (culminating in SIAM J
  Comp., 2004).

 The proof of the Genome Halving Theorem is
  rather technical (spans over 30 pages).
Genome Halving Theorem

 Found an error in the original Genome Halving Theorem for
  unichromosomal genomes and proved a theorem that
  adequately deals with all genomes.
  (Alekseyev & PP, SIAM J. Comp. 2007)
 Suggested a new (short) proof of the Genome Halving
  Theorem - our proof is 5 pages long.
  (Alekseyev & PP, IEEE Bioinformatics 2007)
 Proved the Genome Halving Theorem for the harder case of
  transposition-like operations (Alekseyev & PP, SODA 2007)

 Introduced the notion of the contracted breakpoint graph
  that makes difficult problems (like the Genome Halving
  Problem) more transparent.
2-Break Distance Between Duplicated
             Genomes

2-Break Distance between Duplicated Genomes:


             a    a     b b

             a    b a     b
2-Break Distance Between Duplicated
             Genomes

2-Break Distance between Duplicated Genomes:.



               a      a     b b

               a      b a      b
Difficulty: The notion of the breakpoint graph is not
defined for duplicated genomes (the first a in P can
correspond to either the first or the second a in Q).
2-Break Distance Between Duplicated
             Genomes

2-Break Distance between Duplicated Genomes:



             a     a     b b

             a     b a     b
Idea: Establish a correspondence between the
genes in P and Q and “re-label” the
corresponding blocks (e.g., as a1, a2, b1, b2)
2-Break Distance Between Duplicated
             Genomes

2-Break Distance between Duplicated Genomes



            a1    a2   b1 b2

            a1    b1 a2   b2
Treat the labeled genomes as non-duplicated,
construct the breakpoint graph and compute
the 2-break distance between them.
Different Labelings Result in
        Different Breakpoint Graphs
 There are 2 different labelings of the copies of each block.

 One of these labelings (corresponding to a breakpoint
  graph with maximum number of cycles) is an optimal
  labeling.

 Trying all possible labelings takes exponential time: 2#blocks
  invocations of the 2-break distance algorithm.

 Labeling Problem. Construct an optimal labeling that
  results in the maximum number of cycles in the breakpoint
  graph (hence, the smallest 2-break distance).
Constructing Breakpoint Graph of
      Duplicated Genomes

    a

a   P   b   a   a     b b
    b

    b

a   Q   a   a   b a     b

    b
Contracted Genome Graphs:
        GLUING Red Edges

    a
                   b
a   P    b         a
     b

     b
         a     b
a   Q          a

     b
Contracted Breakpoint Graph:
     Gluing Red Edges Yet Again

     a                      Contracted
                            Breakpoint
                    b         Graph
a    P    b         a        G’(P,Q)
      b
                               b
      b                         a
          a     b
a    Q          a

      b
Contracted Breakpoint Graph of
         Duplicated Genomes



 Edges of 3 colors: black, green, and
  red.
                                         b
 Red edges form a matching.              a
 Black edges form black cycles.
 Green edges form green cycles.
What is the Relationship Between the Breakpoint
 Graph and the Contracted Breakpoint Graph?




                 a

            a          b                b
                                         a
                 b
Every Breakpoint Graph Induces a Cycle Decomposition
         of the Contracted Breakpoint Graph




                   a

            a            b

                   b
Why do we care?
Every Breakpoint Graph Induces Cycle Decomposition of
          the Contracted Breakpoint Graph




                   a

             a            b

                   b
Why do we care?
 Because optimal labeling corresponds to a
 maximum cycle decomposition.
Maximum Cycle Decomposition Problem

Open Problem 1: Given a contracted breakpoint
graph, find its maximum black-green cycle
decomposition.



                                ?
Labeling Problem
Open Problem 2: Find a breakpoint graph that
induces a given cycle decomposition of the
contracted breakpoint graph.




      ?
Computing 2-Break Distance Between Duplicated
           Genomes: Two Problems

 breakpoint graph
 inducing max cycle maximum         contracted
 decomposition      cycle           breakpoint
                    decomposition     graph
        a

                                        b
 a           b
                                         a
       b
Computing 2-Break Distance Between Duplicated
        Genomes: Two HARD Problems

  breakpoint graph
  inducing max cycle maximum                               contracted
  decomposition      cycle                                 breakpoint
                     decomposition                           graph
         a

                                                                     b
  a                  b
                                                                      a
            b
These problems are difficult and we do not know how to solve them.
However, we solved them in the case of the Genome Halving Problem.
2-Break Genome Halving Problem


2-Break Genome Halving Problem: Given a
duplicated genome P, find a perfect duplicated
genome R+R minimizing the 2-break distance:

            d2(P,R+R) = #blocks - cycles(P,R+R)

Minimizing d2(P,R+R) is equivalent to finding a
perfect duplicated genome R+R and a labeling of P
and R+R that maximizes cycles(P,R+R).
Contracted Breakpoint Graph
    of a Perfect Duplicated Genome

      a
                     b
a     P    b         a      G'(P,R+R)
       b
                                 b
      b
                                  a
           a     b
a    R+R
                 a
       b
Contracted Breakpoint Graph of a Perfect
 Duplicated Genome has a Special Structure



 Red edges form a matching.
 Black edges form black cycles. G'(P,R+R)
 Green edges form cycles in
  general case but now (for R+R)       b
  these cycles are formed by            a
  parallel (double) green edges
  forming a matching
Finding the “Best” Contracted
Breakpoint Graph for a Given Genome

     a
                 b
 a   P   b       a
     b
“Blue” Problem: Optimal
    Contracted Breakpoint Graph

     a
                 b
a    P   b       a
     b
                              b
                               a
“Magenta” Problem:
    Maximum Cycle Decomposition

     a
                 b
a    P   b       a
     b
                              b
                               a
“Orange” Problem: Labeling

    a
                 b
a   P   b        a
    b
                                 b
                                  a
    b
    a       ?
Solutions to ?, ?, and ? are too complex to be
 presented in this talk (SIAM J. Comp.2007)

       a
                       b
  a   P    b           a
       b
                                        b
                                         a
      b
      a        ?
Acknowledgments
Max Alekseyev
Acknowledgments
Max Alekseyev


David Sankoff
“If you are not criticized, you may not be doing
much.”
                      Donald Rumsfeld
Acknowledgments
Max Alekseyev


David Sankoff
“If you are not criticized, you may not be doing
much.”
                      Donald Rumsfeld

Glenn Tesler,    Math. Dept., UCSD



Qian Peng

More Related Content

What's hot

UC Davis EVE161 Lecture 14 by @phylogenomics
UC Davis EVE161 Lecture 14 by @phylogenomicsUC Davis EVE161 Lecture 14 by @phylogenomics
UC Davis EVE161 Lecture 14 by @phylogenomicsJonathan Eisen
 
Microbial Phylogenomics (EVE161) Class 3: Woese and the Tree of Life
Microbial Phylogenomics (EVE161) Class 3: Woese and the Tree of LifeMicrobial Phylogenomics (EVE161) Class 3: Woese and the Tree of Life
Microbial Phylogenomics (EVE161) Class 3: Woese and the Tree of LifeJonathan Eisen
 
UC Davis EVE161 Lecture 10 by @phylogenomics
UC Davis EVE161 Lecture 10 by @phylogenomicsUC Davis EVE161 Lecture 10 by @phylogenomics
UC Davis EVE161 Lecture 10 by @phylogenomicsJonathan Eisen
 
Protein Evolution: Structure, Function, and Human Health
Protein Evolution: Structure, Function, and Human HealthProtein Evolution: Structure, Function, and Human Health
Protein Evolution: Structure, Function, and Human HealthDan Gaston
 
Budding or Anagenesis? Paraphylyand Ancestor-Descendant Relationships from Ti...
Budding or Anagenesis? Paraphylyand Ancestor-Descendant Relationships from Ti...Budding or Anagenesis? Paraphylyand Ancestor-Descendant Relationships from Ti...
Budding or Anagenesis? Paraphylyand Ancestor-Descendant Relationships from Ti...David Bapst
 
Microbial Phylogenomics (EVE161) Class 15: Shotgun Metagenomics
Microbial Phylogenomics (EVE161) Class 15: Shotgun Metagenomics Microbial Phylogenomics (EVE161) Class 15: Shotgun Metagenomics
Microbial Phylogenomics (EVE161) Class 15: Shotgun Metagenomics Jonathan Eisen
 
Effects of density on spacing patterns and habitat associations of a Neotropi...
Effects of density on spacing patterns and habitat associations of a Neotropi...Effects of density on spacing patterns and habitat associations of a Neotropi...
Effects of density on spacing patterns and habitat associations of a Neotropi...Nicole Angeli
 
Macromolecule evolution
Macromolecule  evolutionMacromolecule  evolution
Macromolecule evolutionPaula Mills
 
Evolution of DNA repair genes, proteins and processes
Evolution of DNA repair genes, proteins and processesEvolution of DNA repair genes, proteins and processes
Evolution of DNA repair genes, proteins and processesJonathan Eisen
 
Microbial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
Microbial Phylogenomics (EVE161) Class 10-11: Genome SequencingMicrobial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
Microbial Phylogenomics (EVE161) Class 10-11: Genome SequencingJonathan Eisen
 
Bioc4700 2014 Guest Lecture
Bioc4700   2014 Guest LectureBioc4700   2014 Guest Lecture
Bioc4700 2014 Guest LectureDan Gaston
 
Gene sharing in microbes: good for the individual, good for the community?
Gene sharing in microbes: good for the individual, good for the community?Gene sharing in microbes: good for the individual, good for the community?
Gene sharing in microbes: good for the individual, good for the community?beiko
 
UC Davis EVE161 Lecture 18 by @phylogenomics
 UC Davis EVE161 Lecture 18 by @phylogenomics UC Davis EVE161 Lecture 18 by @phylogenomics
UC Davis EVE161 Lecture 18 by @phylogenomicsJonathan Eisen
 
Is microbial ecology driven by roaming genes?
Is microbial ecology driven by roaming genes?Is microbial ecology driven by roaming genes?
Is microbial ecology driven by roaming genes?beiko
 
281 lec3 mendel_genetics
281 lec3 mendel_genetics281 lec3 mendel_genetics
281 lec3 mendel_geneticshhalhaddad
 
UC Davis EVE161 Lecture 8 - rRNA ecology - by Jonathan Eisen @phylogenomics
UC Davis EVE161 Lecture 8 - rRNA ecology - by Jonathan Eisen @phylogenomicsUC Davis EVE161 Lecture 8 - rRNA ecology - by Jonathan Eisen @phylogenomics
UC Davis EVE161 Lecture 8 - rRNA ecology - by Jonathan Eisen @phylogenomicsJonathan Eisen
 
"Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan E...
"Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan E..."Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan E...
"Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan E...Jonathan Eisen
 
Beiko cms final
Beiko cms finalBeiko cms final
Beiko cms finalbeiko
 
Microbial Phylogenomics (EVE161) Class 16: Shotgun Metagenomics
Microbial Phylogenomics (EVE161) Class 16: Shotgun MetagenomicsMicrobial Phylogenomics (EVE161) Class 16: Shotgun Metagenomics
Microbial Phylogenomics (EVE161) Class 16: Shotgun MetagenomicsJonathan Eisen
 

What's hot (20)

UC Davis EVE161 Lecture 14 by @phylogenomics
UC Davis EVE161 Lecture 14 by @phylogenomicsUC Davis EVE161 Lecture 14 by @phylogenomics
UC Davis EVE161 Lecture 14 by @phylogenomics
 
Microbial Phylogenomics (EVE161) Class 3: Woese and the Tree of Life
Microbial Phylogenomics (EVE161) Class 3: Woese and the Tree of LifeMicrobial Phylogenomics (EVE161) Class 3: Woese and the Tree of Life
Microbial Phylogenomics (EVE161) Class 3: Woese and the Tree of Life
 
UC Davis EVE161 Lecture 10 by @phylogenomics
UC Davis EVE161 Lecture 10 by @phylogenomicsUC Davis EVE161 Lecture 10 by @phylogenomics
UC Davis EVE161 Lecture 10 by @phylogenomics
 
Protein Evolution: Structure, Function, and Human Health
Protein Evolution: Structure, Function, and Human HealthProtein Evolution: Structure, Function, and Human Health
Protein Evolution: Structure, Function, and Human Health
 
Budding or Anagenesis? Paraphylyand Ancestor-Descendant Relationships from Ti...
Budding or Anagenesis? Paraphylyand Ancestor-Descendant Relationships from Ti...Budding or Anagenesis? Paraphylyand Ancestor-Descendant Relationships from Ti...
Budding or Anagenesis? Paraphylyand Ancestor-Descendant Relationships from Ti...
 
Microbial Phylogenomics (EVE161) Class 15: Shotgun Metagenomics
Microbial Phylogenomics (EVE161) Class 15: Shotgun Metagenomics Microbial Phylogenomics (EVE161) Class 15: Shotgun Metagenomics
Microbial Phylogenomics (EVE161) Class 15: Shotgun Metagenomics
 
Effects of density on spacing patterns and habitat associations of a Neotropi...
Effects of density on spacing patterns and habitat associations of a Neotropi...Effects of density on spacing patterns and habitat associations of a Neotropi...
Effects of density on spacing patterns and habitat associations of a Neotropi...
 
Macromolecule evolution
Macromolecule  evolutionMacromolecule  evolution
Macromolecule evolution
 
Evolution of DNA repair genes, proteins and processes
Evolution of DNA repair genes, proteins and processesEvolution of DNA repair genes, proteins and processes
Evolution of DNA repair genes, proteins and processes
 
Microbial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
Microbial Phylogenomics (EVE161) Class 10-11: Genome SequencingMicrobial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
Microbial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
 
Bioc4700 2014 Guest Lecture
Bioc4700   2014 Guest LectureBioc4700   2014 Guest Lecture
Bioc4700 2014 Guest Lecture
 
Gene sharing in microbes: good for the individual, good for the community?
Gene sharing in microbes: good for the individual, good for the community?Gene sharing in microbes: good for the individual, good for the community?
Gene sharing in microbes: good for the individual, good for the community?
 
UC Davis EVE161 Lecture 18 by @phylogenomics
 UC Davis EVE161 Lecture 18 by @phylogenomics UC Davis EVE161 Lecture 18 by @phylogenomics
UC Davis EVE161 Lecture 18 by @phylogenomics
 
Louisville2
Louisville2Louisville2
Louisville2
 
Is microbial ecology driven by roaming genes?
Is microbial ecology driven by roaming genes?Is microbial ecology driven by roaming genes?
Is microbial ecology driven by roaming genes?
 
281 lec3 mendel_genetics
281 lec3 mendel_genetics281 lec3 mendel_genetics
281 lec3 mendel_genetics
 
UC Davis EVE161 Lecture 8 - rRNA ecology - by Jonathan Eisen @phylogenomics
UC Davis EVE161 Lecture 8 - rRNA ecology - by Jonathan Eisen @phylogenomicsUC Davis EVE161 Lecture 8 - rRNA ecology - by Jonathan Eisen @phylogenomics
UC Davis EVE161 Lecture 8 - rRNA ecology - by Jonathan Eisen @phylogenomics
 
"Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan E...
"Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan E..."Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan E...
"Phylogeny-Driven Approaches to Genomics and Metagenomics" talk by Jonathan E...
 
Beiko cms final
Beiko cms finalBeiko cms final
Beiko cms final
 
Microbial Phylogenomics (EVE161) Class 16: Shotgun Metagenomics
Microbial Phylogenomics (EVE161) Class 16: Shotgun MetagenomicsMicrobial Phylogenomics (EVE161) Class 16: Shotgun Metagenomics
Microbial Phylogenomics (EVE161) Class 16: Shotgun Metagenomics
 

Similar to Genome Rearrangements: from Biological Problem to Combinatorial Algorithms (and back

Introduction of Animal Genetics & History of Genetics
Introduction of Animal Genetics & History of GeneticsIntroduction of Animal Genetics & History of Genetics
Introduction of Animal Genetics & History of GeneticsAashish Patel
 
Evolutionary stuff like macroevolution
Evolutionary stuff like macroevolutionEvolutionary stuff like macroevolution
Evolutionary stuff like macroevolutionTauqeer Ahmad
 
François Massol - présentation MEE2013
François Massol - présentation MEE2013François Massol - présentation MEE2013
François Massol - présentation MEE2013Seminaire MEE
 
Cytotaxonomy(Its History,Background, Chromosome Evolution in Primates.pptx
Cytotaxonomy(Its History,Background, Chromosome Evolution in Primates.pptxCytotaxonomy(Its History,Background, Chromosome Evolution in Primates.pptx
Cytotaxonomy(Its History,Background, Chromosome Evolution in Primates.pptxIshtiyaqMir3
 
Alyssa garcia amber_bennett_raf.ppt
Alyssa garcia amber_bennett_raf.pptAlyssa garcia amber_bennett_raf.ppt
Alyssa garcia amber_bennett_raf.pptalyssuhh1
 
genetics role in orthodontics
genetics role in orthodonticsgenetics role in orthodontics
genetics role in orthodonticsKumar Adarsh
 
Introduction to Genetics
Introduction to GeneticsIntroduction to Genetics
Introduction to GeneticsCEU
 
History of genetics
History of geneticsHistory of genetics
History of geneticsannaaquino21
 
AAPA Poster. Insights from developmental genetics and reproductive isolation ...
AAPA Poster. Insights from developmental genetics and reproductive isolation ...AAPA Poster. Insights from developmental genetics and reproductive isolation ...
AAPA Poster. Insights from developmental genetics and reproductive isolation ...Craig Knox
 
Gogarten issol2014 version4
Gogarten issol2014 version4Gogarten issol2014 version4
Gogarten issol2014 version4J Peter Gogarten
 
Introduction to Genetics
Introduction to GeneticsIntroduction to Genetics
Introduction to GeneticsCEU
 
485 lec2 history and review (i)
485 lec2 history and review (i)485 lec2 history and review (i)
485 lec2 history and review (i)hhalhaddad
 
Heredity and evolution class 10th Questions
Heredity and evolution class 10th QuestionsHeredity and evolution class 10th Questions
Heredity and evolution class 10th Questionssinghaniya12
 
Evolutionary Genetics by: Kim Jim F. Raborar, RN, MAEd(ue)
Evolutionary Genetics by: Kim Jim F. Raborar, RN, MAEd(ue)Evolutionary Genetics by: Kim Jim F. Raborar, RN, MAEd(ue)
Evolutionary Genetics by: Kim Jim F. Raborar, RN, MAEd(ue)Kim Jim Raborar
 
EVE 161 Winter 2018 Class 15
EVE 161 Winter 2018 Class 15EVE 161 Winter 2018 Class 15
EVE 161 Winter 2018 Class 15Jonathan Eisen
 

Similar to Genome Rearrangements: from Biological Problem to Combinatorial Algorithms (and back (20)

Introduction of Animal Genetics & History of Genetics
Introduction of Animal Genetics & History of GeneticsIntroduction of Animal Genetics & History of Genetics
Introduction of Animal Genetics & History of Genetics
 
Evolutionary stuff like macroevolution
Evolutionary stuff like macroevolutionEvolutionary stuff like macroevolution
Evolutionary stuff like macroevolution
 
François Massol - présentation MEE2013
François Massol - présentation MEE2013François Massol - présentation MEE2013
François Massol - présentation MEE2013
 
Cytotaxonomy(Its History,Background, Chromosome Evolution in Primates.pptx
Cytotaxonomy(Its History,Background, Chromosome Evolution in Primates.pptxCytotaxonomy(Its History,Background, Chromosome Evolution in Primates.pptx
Cytotaxonomy(Its History,Background, Chromosome Evolution in Primates.pptx
 
The Future Of Human Evolution Essay
The Future Of Human Evolution EssayThe Future Of Human Evolution Essay
The Future Of Human Evolution Essay
 
Alyssa garcia amber_bennett_raf.ppt
Alyssa garcia amber_bennett_raf.pptAlyssa garcia amber_bennett_raf.ppt
Alyssa garcia amber_bennett_raf.ppt
 
genetics role in orthodontics
genetics role in orthodonticsgenetics role in orthodontics
genetics role in orthodontics
 
Introduction to Genetics
Introduction to GeneticsIntroduction to Genetics
Introduction to Genetics
 
History of genetics
History of geneticsHistory of genetics
History of genetics
 
Anatomical homology
Anatomical homologyAnatomical homology
Anatomical homology
 
AAPA Poster. Insights from developmental genetics and reproductive isolation ...
AAPA Poster. Insights from developmental genetics and reproductive isolation ...AAPA Poster. Insights from developmental genetics and reproductive isolation ...
AAPA Poster. Insights from developmental genetics and reproductive isolation ...
 
Gogarten issol2014 version4
Gogarten issol2014 version4Gogarten issol2014 version4
Gogarten issol2014 version4
 
Introduction to Genetics
Introduction to GeneticsIntroduction to Genetics
Introduction to Genetics
 
485 lec2 history and review (i)
485 lec2 history and review (i)485 lec2 history and review (i)
485 lec2 history and review (i)
 
Genetics
GeneticsGenetics
Genetics
 
Heredity and evolution class 10th Questions
Heredity and evolution class 10th QuestionsHeredity and evolution class 10th Questions
Heredity and evolution class 10th Questions
 
Evolutionary Genetics by: Kim Jim F. Raborar, RN, MAEd(ue)
Evolutionary Genetics by: Kim Jim F. Raborar, RN, MAEd(ue)Evolutionary Genetics by: Kim Jim F. Raborar, RN, MAEd(ue)
Evolutionary Genetics by: Kim Jim F. Raborar, RN, MAEd(ue)
 
Parasites in Your DNA
Parasites in Your DNAParasites in Your DNA
Parasites in Your DNA
 
Introduction to genetics and breeding
Introduction to genetics and breedingIntroduction to genetics and breeding
Introduction to genetics and breeding
 
EVE 161 Winter 2018 Class 15
EVE 161 Winter 2018 Class 15EVE 161 Winter 2018 Class 15
EVE 161 Winter 2018 Class 15
 

More from Nikolay Vyahhi

Assembly and finishing
Assembly and finishingAssembly and finishing
Assembly and finishingNikolay Vyahhi
 
Molbiol 2011-13-organelles
Molbiol 2011-13-organellesMolbiol 2011-13-organelles
Molbiol 2011-13-organellesNikolay Vyahhi
 
Molbiol 2011-12-eukaryotic gene-expression
Molbiol 2011-12-eukaryotic gene-expressionMolbiol 2011-12-eukaryotic gene-expression
Molbiol 2011-12-eukaryotic gene-expressionNikolay Vyahhi
 
Molbiol 2011-10-proteins
Molbiol 2011-10-proteinsMolbiol 2011-10-proteins
Molbiol 2011-10-proteinsNikolay Vyahhi
 
Molbiol 2011-09-reparation-recombination
Molbiol 2011-09-reparation-recombinationMolbiol 2011-09-reparation-recombination
Molbiol 2011-09-reparation-recombinationNikolay Vyahhi
 
Molbiol 2011-08-epigenetics
Molbiol 2011-08-epigeneticsMolbiol 2011-08-epigenetics
Molbiol 2011-08-epigeneticsNikolay Vyahhi
 
Molbiol 2011-07-chromosomes-cell-cycle
Molbiol 2011-07-chromosomes-cell-cycleMolbiol 2011-07-chromosomes-cell-cycle
Molbiol 2011-07-chromosomes-cell-cycleNikolay Vyahhi
 
Molbiol 2011-06-transcription-translation
Molbiol 2011-06-transcription-translationMolbiol 2011-06-transcription-translation
Molbiol 2011-06-transcription-translationNikolay Vyahhi
 
Molbiol 2011-05-dna-rna-protein
Molbiol 2011-05-dna-rna-proteinMolbiol 2011-05-dna-rna-protein
Molbiol 2011-05-dna-rna-proteinNikolay Vyahhi
 
Molbiol 2011-04-metabolism
Molbiol 2011-04-metabolismMolbiol 2011-04-metabolism
Molbiol 2011-04-metabolismNikolay Vyahhi
 
Molbiol 2011-03-biochem
Molbiol 2011-03-biochemMolbiol 2011-03-biochem
Molbiol 2011-03-biochemNikolay Vyahhi
 
Molbiol 2011-02-biology
Molbiol 2011-02-biologyMolbiol 2011-02-biology
Molbiol 2011-02-biologyNikolay Vyahhi
 
Molbiol 2011-01-chemistry
Molbiol 2011-01-chemistryMolbiol 2011-01-chemistry
Molbiol 2011-01-chemistryNikolay Vyahhi
 
Molbiol 2011-11-role of-proteins
Molbiol 2011-11-role of-proteinsMolbiol 2011-11-role of-proteins
Molbiol 2011-11-role of-proteinsNikolay Vyahhi
 
Biotech 2011-08-recombinant-dna
Biotech 2011-08-recombinant-dnaBiotech 2011-08-recombinant-dna
Biotech 2011-08-recombinant-dnaNikolay Vyahhi
 
Biotech 2011-02-genetics
Biotech 2011-02-geneticsBiotech 2011-02-genetics
Biotech 2011-02-geneticsNikolay Vyahhi
 
Biotech 2011-10-methods
Biotech 2011-10-methodsBiotech 2011-10-methods
Biotech 2011-10-methodsNikolay Vyahhi
 
Biotech 2011-09-pcr and-in_situ_methods
Biotech 2011-09-pcr and-in_situ_methodsBiotech 2011-09-pcr and-in_situ_methods
Biotech 2011-09-pcr and-in_situ_methodsNikolay Vyahhi
 
Biotech 2011-07-finding-orf-etc
Biotech 2011-07-finding-orf-etcBiotech 2011-07-finding-orf-etc
Biotech 2011-07-finding-orf-etcNikolay Vyahhi
 

More from Nikolay Vyahhi (20)

Assembly and finishing
Assembly and finishingAssembly and finishing
Assembly and finishing
 
Molbiol 2011-wetlab
Molbiol 2011-wetlabMolbiol 2011-wetlab
Molbiol 2011-wetlab
 
Molbiol 2011-13-organelles
Molbiol 2011-13-organellesMolbiol 2011-13-organelles
Molbiol 2011-13-organelles
 
Molbiol 2011-12-eukaryotic gene-expression
Molbiol 2011-12-eukaryotic gene-expressionMolbiol 2011-12-eukaryotic gene-expression
Molbiol 2011-12-eukaryotic gene-expression
 
Molbiol 2011-10-proteins
Molbiol 2011-10-proteinsMolbiol 2011-10-proteins
Molbiol 2011-10-proteins
 
Molbiol 2011-09-reparation-recombination
Molbiol 2011-09-reparation-recombinationMolbiol 2011-09-reparation-recombination
Molbiol 2011-09-reparation-recombination
 
Molbiol 2011-08-epigenetics
Molbiol 2011-08-epigeneticsMolbiol 2011-08-epigenetics
Molbiol 2011-08-epigenetics
 
Molbiol 2011-07-chromosomes-cell-cycle
Molbiol 2011-07-chromosomes-cell-cycleMolbiol 2011-07-chromosomes-cell-cycle
Molbiol 2011-07-chromosomes-cell-cycle
 
Molbiol 2011-06-transcription-translation
Molbiol 2011-06-transcription-translationMolbiol 2011-06-transcription-translation
Molbiol 2011-06-transcription-translation
 
Molbiol 2011-05-dna-rna-protein
Molbiol 2011-05-dna-rna-proteinMolbiol 2011-05-dna-rna-protein
Molbiol 2011-05-dna-rna-protein
 
Molbiol 2011-04-metabolism
Molbiol 2011-04-metabolismMolbiol 2011-04-metabolism
Molbiol 2011-04-metabolism
 
Molbiol 2011-03-biochem
Molbiol 2011-03-biochemMolbiol 2011-03-biochem
Molbiol 2011-03-biochem
 
Molbiol 2011-02-biology
Molbiol 2011-02-biologyMolbiol 2011-02-biology
Molbiol 2011-02-biology
 
Molbiol 2011-01-chemistry
Molbiol 2011-01-chemistryMolbiol 2011-01-chemistry
Molbiol 2011-01-chemistry
 
Molbiol 2011-11-role of-proteins
Molbiol 2011-11-role of-proteinsMolbiol 2011-11-role of-proteins
Molbiol 2011-11-role of-proteins
 
Biotech 2011-08-recombinant-dna
Biotech 2011-08-recombinant-dnaBiotech 2011-08-recombinant-dna
Biotech 2011-08-recombinant-dna
 
Biotech 2011-02-genetics
Biotech 2011-02-geneticsBiotech 2011-02-genetics
Biotech 2011-02-genetics
 
Biotech 2011-10-methods
Biotech 2011-10-methodsBiotech 2011-10-methods
Biotech 2011-10-methods
 
Biotech 2011-09-pcr and-in_situ_methods
Biotech 2011-09-pcr and-in_situ_methodsBiotech 2011-09-pcr and-in_situ_methods
Biotech 2011-09-pcr and-in_situ_methods
 
Biotech 2011-07-finding-orf-etc
Biotech 2011-07-finding-orf-etcBiotech 2011-07-finding-orf-etc
Biotech 2011-07-finding-orf-etc
 

Recently uploaded

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Recently uploaded (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Genome Rearrangements: from Biological Problem to Combinatorial Algorithms (and back

  • 1. Genome Rearrangements: from Biological Problem to Combinatorial Algorithms (and back) Pavel Pevzner Department of Computer Science, University of California at San Diego
  • 2. Genome Rearrangements Mouse X chromosome Unknown ancestor ~ 80 M years ago Human X chromosome  What is the evolutionary scenario for transforming one genome into the other?  What is the organization of the ancestral genome?  Are there any rearrangement hotspots in mammalian genomes?
  • 3. Genome Rearrangements: Evolutionary Scenarios Unknown ancestor ~ 80 M years ago  What is the evolutionary scenario for transforming one genome into the other?  What is the organization of the ancestral genome?  Are there any rearrangement Reversal flips a segment of a chromosome hotspots in mammalian genomes?
  • 4. Genome Rearrangements: Ancestral Reconstruction  What is the evolutionary scenario for transforming one genome into the other?  What is the organization of the ancestral genome?  Are there any rearrangement hotspots in mammalian genomes? Bourque, Tesler, PP, Genome Res.
  • 5. Genome Rearrangements: Evolutionary “Earthquakes  What is the evolutionary scenario for transforming one genome into the other?  What is the organization of the ancestral genome?  Are there any rearrangement hotspots in mammalian genomes?
  • 6. Rearrangement Hotspots in Tumor Genomes  Rearrangements may disrupt genes and alter gene regulation.  Example: rearrangement in leukemia yields “Philadelphia” chromosome: Chr 9 promoter ABL gene promoter BCR gene Chr 22 promoter BCR gene promoter c-ab1 oncogene  Thousands of rearrangements hotspots known for different tumors.
  • 7. Rearrangement Hotspots in Tumor Genomes MCF7 breast cancer cell line  What is the evolutionary scenario for transforming one genome into the other?  What is the organization of the ancestral genome?  Where are the rearrangement hotspots in mammalian genomes?
  • 8. Three Evolutionary Controversies Primate - Rodent - Carnivore Split Rearrangement Hotspots Whole Genome Duplications
  • 9. Primate–Rodent vs. Primate–Carnivore Split Hutley et al., MBE, May 07: We have demonstrated with very high primate-rodent primate-carnivore confidence that the rodents diverged split split before carnivores and primates Lunter, PLOS CB, April 07: It appears unjustified to continue to consider the phylogeny of primates, rodents, and canines as contentious Arnason et al, PNAS 02 Canarozzi, PLOS CB 06 Murphy et al,, Science 01 Kumar&Hedges, Nature, 98
  • 10. Can Rearrangement Analysis Resolve the Primate–Rodent–Carnivore Controversy?
  • 11. Susumu Ohno: Two Hypothesis Ohno, 1970, 1973  Random Breakage Rearrangement Hypothesis: Hotspots Genomic architectures are shaped by rearrangements that occur randomly (no fragile regions). Whole  Whole Genome Duplication Genome (WGD) Hypothesis: Big Duplications leaps in evolution would have been impossible without whole genome duplications.
  • 12. Random Breakage Model (RBM)  The random breakage hypothesis was embraced by biologists and has become de facto theory of chromosome evolution.  Nadeau & Taylor, Proc. Nat'l Acad. Sciences 1984  First convincing arguments in favor of the Rearrangement Random Breakage Model (RBM) Hotspots  RBM implies that there is no rearrangement hotspots  RBM was re-iterated in hundreds of papers
  • 13. Random Breakage Model (RBM)  The random breakage hypothesis was embraced by biologists and has become de facto theory of chromosome evolution.  Nadeau & Taylor, Proc. Nat'l Acad. Sci. 1984  First convincing arguments in favor of the Rearrangement Random Breakage Model (RBM) Hotspots  RBM implies that there is no rearrangement hotspots  RBM was re-iterated in hundreds of papers  PP & Tesler, Proc. Nat'l Acad. Sci. 2003  Rejected RBM and proposed the Fragile Breakage Model (FBM)  Postulated existence of rearrangement hotspots and vast breakpoint re-use  FBM implies that the human genome is a mosaic of solid and fragile regions.
  • 14. Are the Rearrangement Hotspots Real?  The Fragile Breakage Model did not live long: in 2003 David Sankoff presented convincing arguments against FBM: “… we have shown that breakpoint re-use of the same magnitude as found in Pevzner and Tesler, 2003 may very well be artifacts in a context where NO re-use actually occurred.”
  • 15. Are the Rearrangement Hotspots Real?  The Fragile Breakage Model did not live long: in 2003 David Sankoff presented arguments against FBM (Sankoff & Trinh, J. Comp. Biol, 2005) “… we have shown that breakpoint re-use of the same magnitude as found in Pevzner and Tesler, 2003 may very well be artifacts in a context where NO re-use actually occurred.” Before you criticize people, you should walk a mile in their shoes. That way, when you criticize them, you are a mile away. And you have their shoes. J.K. Lambert
  • 16. Rebuttal of the Rebuttal of the Rebuttal  Peng et al., 2006 (PLOS Computational Biology) found an error in Sankoff-Trinh rebuttal: …If Sankoff & Trinh fixed their ST-Synteny algorithm, they would confirm rather than reject Pevzner-Tesler’s Fragile Breakage Model  Sankoff, 2006 (PLOS Computational Biology): …Not only did we foist a hastily conceived and incorrectly executed simulation on an overworked RECOMB conference program committee, but worse — nostra maxima culpa — we obliged a team of high-powered researchers to clean up after us!
  • 17. Rebuttal of the Rebuttal of the Rebuttal: Controversy Resolved... Or Was It?  Nearly all recent studies support the Fragile Breakage Model: van der Wind, 2004, Bailey, 2004, Zhao et al., 2004, Murphy et al., 2005, Hinsch & Hannenhalli, 2006, Ruiz-Herrera et al., 2006, Yue & Haaf, 2006, Mehan et al., 2007, etc Kikuta et al., Genome Res. 2007: “... the Nadeau and Taylor hypothesis is not possible for the explanation of synteny”
  • 18. Random vs. Fragile Breakage Debate Continues: Complex Rearrangements  PP & Tesler, PNAS 2003, argued that every evolutionary scenario for transforming Mouse into Human genome with reversals, translocations, fusions, and fissions must result in a large number of breakpoint re-uses, a contradiction to the RBM.  Sankoff, PLoS Comp. Biol. 2006: “We cannot infer whether mutually randomized synteny block orderings derived from two divergent genomes were created ... through processes other than reversals and translocations.”
  • 19. Whole Genome Duplication (WGD) Whole Genome Duplications
  • 23. Whole Genome Duplication Hypothesis Was Confirmed After Years of Controversy  “There was a whole-genome duplication.” Wolfe, Nature, 1997  “There was no whole-genome The Whole Genome duplication.” Dujon, FEBS, 2000 Duplication hypothesis first met  “Duplications occurred independently” Langkjaer, with skepticism but was finally JMB, 2000  “Continuous duplications” confirmed by Dujon, Yeast 2003 Kellis et al., Nature 2004: “Our  “Multiple duplications” Friedman, Gen. Res, 2003  “Spontaneous duplications” analysis resolves the long- Koszul, EMBO, 2004 standing controversy on the ancestry of the yeast genome.”
  • 24. Whole Genome Duplication Hypothesis Was Eventually Confirmed...But Some Scientists Are Not Convinced  “There was a whole-genome  Kellis, Birren & Lander, Nature 2004: duplication.” Wolfe, Nature, 1997 “Our analysis resolves the long-  “There was no whole-genome standing controversy on the ancestry duplication.” Dujon, FEBS, 2000 of the yeast genome.”  “Duplications occurred independently” Langkjaer,  Martin et al., Biology Direct 2007: JMB, 2000 “We believe that the proposal of a  “Continuous duplications” Dujon, Yeast 2003 Whole Genome Duplication in the  “Multiple duplications” yeast lineage is unwarranted.” Friedman, Gen. Res, 2003  “Spontaneous duplications”  To address Martin et al., 2007 Koszul, EMBO, 2004  “Our analysis resolved the arguments against WGD, it would controversy” be useful to reconstruct the pre- Kellis, Nature, 2004  “WGD in the yeast lineage is duplicated genomes. unwarranted” Martin, Biology Direct, 2007  ...
  • 25. Genome Halving Problem Reconstruction of the pre-duplicated ancestor: Genome Halving Problem
  • 26. From Biological Controversies to Algorithmic Problems Primate - Ancestral Rodent - Genome Carnivore Split Reconstruction Rearrangement Breakpoint Hotspots Re-use Analysis Whole Genome Genome Halving Duplications Problem
  • 28. Unichromosomal Genomes: Reversal Distance • Sorting by reversals: find the shortest series of reversals transforming one uni-chromosomal genome into another. • The number of reversals in such a shortest series is the reversal distance between genomes. • Hannenhalli and PP. (FOCS 1995) gave a polynomial algorithm for computing the reversal distance. . Step 0: 2 -4 -3 5 -8 -7 -6 1 Step 1: 2 3 4 5 -8 -7 -6 1 Step 2: -5 -4 -3 -2 -8 -7 -6 1 Step 3: -5 -4 -3 -2 -1 6 7 8 Sorting by prefix reversals Step 4: 1 2 3 4 5 6 7 8 (pancake flipping problem), Gates and Papadimitriou, Discrete Appl. Math. 1976
  • 29. Multichromosomal Genomes: Genomic Distance  Genomic Distance between two genomes is the minimum number of reversals, translocations, fusions, and fissions required to transform one genome into another.  Hannenhalli and PP (STOC 1995) gave a polynomial algorithm for computing the genomic distance.  These algorithms were followed by many improvements: Kaplan et al. 1999, Bader et al. 2001, Tesler 2002, Ozery-Flato & Shamir 2003, Tannier & Sagot 2004, Bergeron 2001-07, etc.
  • 30. Simplifying HP Theory: from Linear to Circular Chromosomes b A chromosome is represented as a a c cycle with directed red and undirected black edges: d a b c d red edges encode blocks (genes) black edges connect adjacent blocks
  • 31. Reversals on Circular Chromosomes b b reversal a c a c d d a b c d a b d c A reversal replaces two black edges with two other black edges
  • 32. Fissions b b fission a c a c d d a b c d a b c d  A fission splits a single cycle (chromosome) into two.  A fissions replaces two black edges with two other black edges.
  • 33. Translocations / Fusions b b fusion a c a c d d a b c d a b c d  A translocation/fusion merges two cycles (chromosomes) into a single one.  They also replace two black edges with two other black edges.
  • 34. 2-Breaks b b 2-break a c a c d d  A 2-Break replaces any 2 black edges with another 2 black edges forming matching on the same 4 vertices.  Reversals, translocations, fusions, and fissions represent all possible types of 2-breaks (introduced as DCJ operations by Yancopoulos et al., 2005)
  • 35. 2-Break Distance  The 2-break distance d2(P,Q) is the minimum number of 2-breaks required to transform P into Q.  In contrast to the genomic distance, the 2-break distance is easy to compute.
  • 36. Two Genomes as Black-Red and Green-Red Cycles b a P c a b c d d c a Q b a c b d d
  • 37. “Q-style” representation of P b a P c c d a b c d a Q b d
  • 38. “Q-style” representation of P b a P c c d a b c d a Q b d
  • 39. Breakpoint Graph: Superposition of Genome Graphs b Breakpoint Graph (Bafna & PP, FOCS 1994) a P c c d G(P,Q) a b c d a Q b d
  • 40. Breakpoint Graph: GLUING Red Edges with the Same Labels b Breakpoint Graph (Bafna & PP, FOCS 1994) a P c c d G(P,Q) a b c d a Q b d
  • 41. Black-Green Alternating Cycles  Black and green edges form perfect matchings in the breakpoint graph. Therefore, c these edges form a collection of black-green alternating a b cycles d  Let cycles(P,Q) be the number of black-green cycles. cycles(P,Q)=2
  • 42. Rearrangements Change Breakpoint Graphs and cycle(P,Q) Transforming genome P into genome Q by 2-breaks corresponds to transforming the breakpoint graph G(P,Q) into the identity breakpoint graph G(Q,Q). c c c G(P,Q) b b G(Q,Q) b a a a G(P',Q) trivial cycles d d d cycles(P,Q) = 2 cycles(P',Q) = 3 cycles(Q,Q) = 4 =#blocks
  • 43. Sorting by 2-Breaks 2-breaks P=P0 → P1 → ... → Pd= Q G(P,Q) → G(P1,Q) → ... → G(Q,Q) cycles(P,Q) cycles → ... → #blocks cycles # of black-green cycles increased by #blocks - cycles(P,Q) How much each 2-break can contribute to the increase in the number of cycles?
  • 44. Each 2-Break Increases #Cycles by at Most 1 A 2-break:  adds 2 new black edges and thus creates at most 2 new cycles (containing two new black edges)  removes 2 black edges and thus destroys at least 1 old cycle (containing two old edges): change in the number of cycles: Δcycles ≤ 2-1=1.
  • 45. 2-Break Distance  Any 2-break increases the number of cycles by at most one (Δcycles ≤ 1)  Any non-trivial cycle can be split into two cycles with a 2- break (Δcycles = 1)  Every sorting by 2-breaks must increase the number of cycles by #blocks - cycles(P,Q)  The 2-break distance between genomes P and Q: d2(P,Q) = #blocks - cycles(P,Q) (cp. Yancopoulos et al., 2005, Bergeron et al., 2006)
  • 46. Complex Rearrangements: Transpositions  Sorting by Transpositions Problem: find the shortest sequence of transpositions transforming one genome into another.  First 1.5-approximation algorithm was given by Bafna and P.P. SODA 1995. The best known result: 1.375-approximation algorithm of Elias and Hartman, 2005.  The complexity status remains unknown.
  • 47. Transpositions b b transposition a c a c d d a b c d c a b d Transpositions cut off a segment of one chromosome and insert it at some position in the same or another chromosome
  • 48. Transpositions Are 3-Breaks b b 3-break a c a c d d  3-Break replaces any triple of black edges with another triple forming matching on the same 6 vertices.  Transpositions are 3-Breaks.
  • 49. Sorting by 3-Breaks 3-breaks P=P0 → P1 → ... → Pd= Q G(P,Q) → G(P1,Q) → ... → G(Q,Q) cycles(P,Q) cycles → ... → #blocks cycles # of black-green cycles increased by #blocks - cycles(P,Q) How much each 3-break can contribute to this increase?
  • 50. Each 3-Break Increases #Cycles by at Most 2 A 3-Break:  adds 3 new black edges and thus creates at most 3 new cycles (containing three new black edges)  removes 3 black edges and thus destroys at least 1 old cycle (containing three old edges): change in the number of cycles: Δcycle ≤ 3-1=2.
  • 51. 3-Break Distance  Any 3-break increases the number of cycles by at most TWO (Δcycles ≤2)  Any non-trivial cycle can be split into three cycles with a 3-break (Δcycles = 2)  Every sorting by 3-breaks must increase the number of cycles by #blocks - cycles(P,Q)  The 3-break distance between genomes P and Q: d3(P,Q) = (#blocks - cycles(P,Q))/2
  • 52. 3-Break Distance  Any 3-break increases the number of cycles by at most TWO (Δcycles ≤2)  Any non-trivial cycle can be split into three cycles with a 3-break (Δcycles = 2) – WRONG STATEMENT  Every sorting by 3-breaks must increase the number of cycles by #blocks - cycles(P,Q)  The 3-break distance between genomes P and Q: d3(P,Q) = (#blocks - cycles(P,Q))/2
  • 53. 3-Break Distance: Focus on Odd Cycles  A 3-break can increase the number of odd cycles (i.e., cycles with odd number of black edges) by at most 2 (Δcyclesodd ≤ 2)  A non-trivial odd cycle can be split into three odd cycles with a 3-break (Δcyclesodd = 2)  An even cycle can be split into two odd cycles with a 3- break (Δcyclesodd=2)  The 3-Break Distance between genomes P and Q is: d3(P,Q) = ( #blocks – cyclesodd(P,Q) ) / 2
  • 54. Polynomial Algorithm for Multi-Break Rearrangements  The formulas for k-break distance become complex as k grows (without an obvious pattern): The formula for d20(P,Q) contains over 1,500 terms! Alekseyev & PP (SODA 07, Theoretical Computer Science 08) Exact formulas for the k-break distance as well as a linear-time algorithm for computing it.
  • 55. Breakpoint Re-use Analysis Algorithmic Problem Searching for Rearrangement Hotspots (Fragile Regions) in Human Genome
  • 56. Random vs. Fragile Breakage Debate: Complex Rearrangements  PP & Tesler, PNAS 2003, argued that every evolutionary scenario for transforming Mouse into Human genome with reversals and translocations must result in a large number of breakpoint re-uses, a contradiction to the RBM.  Sankoff, PLoS CB 2006: “We cannot infer whether mutually randomized synteny block orderings derived from two divergent genomes were created ... through processes other than reversals and translocations.” (read: “transpositions” or 3-breaks)
  • 57. Random Breakage Theory Re-Re-Re-Re-Re-Visited  Ohno, 1970, Nadeau & Taylor, 1984 introduced RBM  Pevzner & Tesler, 2003 (PNAS) argued against RBM  Sankoff & Trinh, 2004 (RECOMB, JCB) argued against Pevzner & Tesler, 2003 arguments against RBM  Peng et al., 2006 (PLOS CB) argued against Sankoff & Trinh, 2004 arguments against Pevzner & Tesler, 2003 arguments against RBM  Sankoff, 2006 (PLOS CB) acknowledged an error in Sankoff&Trinh, 2004 but came up with a new argument against Pevzner and Tesler, 2003 arguments against RBM  Alekseyev and Pevzner, 2007 (PLOS CB) argued against Sankoff, 2006 new argument against Pevzner & Tesler, 2003 arguments against RBM
  • 58. Are There Rearrangement Hotspots in Human Genome?
  • 59. Are There Rearrangement Hotspots in Human Genome? “Theorem”. Yes
  • 60. Are There Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: • The formula for the 3-break distance implies that there were at least d3(Human,Mouse) = 139 rearrangements between human and mouse (including transpositions)
  • 61. Are There Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: • The formula for 3-break distance implies that there were at least d3(Human,Mouse) = 139 rearrangements between human and mouse (including transpositions) • Every 3-break creates up to 3 breakpoints
  • 62. Are There Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: • The formula for 3-break distance implies that there were at least d3(Human,Mouse) = 139 rearrangements between human and mouse (including transpositions) • Every 3-break creates up to 3 breakpoints • If there were no breakpoint re-use then after 139 rearrangements we may see 139*3=417 breakpoints
  • 63. Are There Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: • The formula for 3-break distance implies that there were at least d3(Human,Mouse) = 139 rearrangements between human and mouse (including transpositions) • Every rearrangement creates up to 3 breakpoints • If there were no breakpoint re-use then after 139 rearrangements we may see 139*3=417 breakpoints • But there are only 281 breakpoints between Human and Mouse!
  • 64. Are There Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: • The formula for 3-break distance implies that there were at least d3(Human,Mouse) = 139 rearrangements between human and mouse (including transpositions) • Every rearrangement creates up to 3 breakpoints • If there were no breakpoint re-use then after 139 rearrangements we may see 139*3=417 breakpoints • But there are only 281 breakpoints between Human and Mouse • Is 417 larger than 281?
  • 65. Are There Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: • The formula for 3-break distance implies that there were at least d3(Human,Mouse) = 139 rearrangements between human and mouse (including transpositions) • Every rearrangement creates up to 3 breakpoints • If there were no breakpoint re-use then after 139 rearrangements we may see 139*3=417 breakpoints • But there are only 281 breakpoints between Human and Mouse • Is 417 larger than 281? • Yes, 417 >> 281!
  • 66. Are There Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: • The formula for 3-break distance implies that there were at least d3(Human,Mouse) = 139 rearrangements between human and mouse (including transpositions) • Every rearrangement creates up to 3 breakpoints • If there were no breakpoint re-use then after 139 rearrangements we may see 139*3=417 breakpoints • But there are only 281 breakpoints between Human and Mouse • Is 417 larger than 281? • Yes, 417 >> 281!
  • 67. Transforming Human Genome into Mouse Genome by 3-Breaks 3-breaks Human = P0 → P1 → ... → Pd = Mouse d = d3(Human,Mouse) = 139 Our “proof” assumed that each 3-break makes 3 breakages in a genome, so the total number of breakages made in this transformation is 3 * 139 = 417. OOPS! The transformation may include 2-breaks (as a particular case of 3-breaks). If every 3-break were a 2-break then the total number of breakages is only 2*139 = 278 < 281, in which case there could be no breakpoint re-uses at all.
  • 68. Minimizing the Number of Breakages Problem. Given genomes P and Q, find a series of k-breaks transforming P into Q and making the smallest number of breakages. Theorem. Any series of k-breaks transforming P into Q makes at least dk(P,Q)+d2(P,Q) breakages Theorem. There exists a series of d3(P,Q) 3-breaks transforming P into Q and making d3(P,Q)+d2(P,Q) breakages. d2(Human,Mouse) = 246 d3(Human,Mouse) = 139 minimum number of breakages = 246 + 139 = 385
  • 69. Are There Rearrangement Hotspots in Human Genome? Theorem. Yes Proof: • The formula for 3-break distance implies that there were at least d3(Human,Mouse) = 139 rearrangements between human and mouse (including transpositions) • Every rearrangement creates up to 3 breakpoints • If there were no breakpoint re-use then after 139 rearrangements we SHOULD SEE AT LEAST 385 breakpoints (rather than 417 as before) • But there are only 281 breakpoints between Human and Mouse • Yes, 385 >> 281!
  • 70. Are there rearrangement hotspots in human genome?  We showed that even if 3-break rearrangements were frequent, the argument against the RBM still stands (Alekseyev & PP, PLoS Comput. Biol. 2007) We don’t believe that 3-breaks are frequent but even if they were, we proved that there exist fragile regions in the human genome. It is time to answer a more difficult question: “Where are the rearrangement hotspots in the human genome?” (the detailed analysis appeared in Alekseyev and PP, Genome Biol., Dec.1, 2010)
  • 71. Ancestral Genome Reconstruction Algorithmic Problem: Ancestral Genome Reconstruction and Multiple Breakpoint Graphs
  • 72. Ancestral Genomes Reconstruction  Given a set of genomes, reconstruct genomes of their common ancestors.
  • 73. Algorithms for Ancestral Genomes Reconstruction  GRAPPA: Tang, Moret, Warnow et al . (2001)  MGR: Bourque and PP (Genome Res. 2002)  InferCARs: Ma et al. (Genome Res. 2006)  EMRAE: Zhao and Bourque (Genome Res. 2007)  MGRA: Alekseyev and PP (Genome. Res.2009)
  • 74. Ancestral Genomes Reconstruction Problem (with a known tree)  Input: a set of k genomes and a phylogenetic tree T  Output: genomes at the internal nodes of the tree T that minimize the total sum of the 2-break distances along the branches of T  NP-complete in the “simplest” case of k=3.  What makes it hard?
  • 75. Ancestral Genomes Reconstruction Problem (with a known tree)  Input: a set of k genomes and a phylogenetic tree T  Output: genomes at the internal nodes of the tree T  Objective: minimize the total sum of the 2-break distances along the branches of T  NP-complete in the “simplest” case of k=3.  What makes it hard? BREAKPOINTS RE-USE
  • 76. How to Construct the Breakpoint Graph for Multiple Genomes? b b P c R d a a d c c a Q b d
  • 77. Constructing Multiple Breakpoint Graph: rearranging P in the Q order b b P c R d a a d c c c a Q b a b d d
  • 78. Constructing Multiple Breakpoint Graph: rearranging R in the Q order b b P c R d a a d c c c a Q b a b d d
  • 79. Multiple Breakpoint Graph: Still Gluing Red Edges with the Same Labels b b P c R d a a d c c c Multiple Breakpoint b Graph a Q b a G(P,Q,R) d d
  • 80. Multiple Breakpoint Graph of 6 Mammalian Genomes Multiple Breakpoint Graph G(M,R,D,Q,H,C) of the Mouse, Rat, Dog, macaQue, Human, and Chimpanzee genomes.
  • 81. Two Genomes: Two Ways of Sorting by 2-Breaks Transforming P into Q with “black” 2-breaks: P = P0 → P1 → ... → Pd-1 → Pd = Q G(P,Q) → G(P1,Q) → ... → G(Pd,Q) = G(Q,Q) Transforming Q into P with “green” 2-breaks: Q = Q0 → Q1 → ... → Qd = P G(P,Q) → G(P,Q1) → ... → G(P,Qd) = G(P,P) Let's make a black-green chimeric transformation
  • 82. Transforming G(P,Q) into (an unknown!) G(X,X) rather than into (a known) G(Q,Q) as before  Let X be any genome on a path from P to Q: P = P0 → P1 → ... → Pm = X = Qm-d ← ... ← Q1 ← Q0 = Q  Sorting by 2-breaks is equivalent to finding a shortest transformation of G(P,Q) into an identity breakpoint graph G(X,X) of a priori unknown genome X: G(P0,Q0) → G(P1,Q0) → G(P1,Q1) → G(P1,Q2) → ... → G(X,X)  The ―black‖ and ―green‖ 2-breaks may arbitrarily alternate.
  • 83. From 2 Genomes To Multiple Genomes  We find a transformation of the multiple breakpoint graph G(P1,P2,...,Pk) into (a priori unknown!) identity multiple breakpoint graph G(X,X,...,X): G(P1,P2,...,Pk) → ... → G(X,X,...,X)  The evolutionary tree T defines groups of genomes (to which the same 2-breaks may be applied simultaneously).  For example, the groups {M, R} (Mouse and Rat) and {Q,H,C} (macaQue, Human, Chimpanzee) correspond to branches of T while the groups {M, C} and {R, D} do not.
  • 84. If the Evolutionary Tree Is Unknown  For the set of 7 mammalian genomes (Mouse, Rat, Dog, macaQue, Human, Chimpanzee, and Opossum), the evolutionary tree T was subject of enduring debates  Depending on the primate – rodent – carnivore split, three topologies are possible (only two of them are viable).
  • 85. Rearrangement Evidence For The Primate- Carnivore Split  Each of these three topologies has an unique branch that supports either blue (primate-rodent), or green (rodent-carnivore), or red (primate- carnivore) hypothesis. M M M O H H H D O D O D  Rearrangement analysis supports the primate – carnivore split.
  • 86. Rearrangement Evidence For The Primate-Carnivore Split  Each of these three topologies has an unique branch that supports either blue (primate-rodent), or green (rodent-carnivore), or red (primate- carnivore) hypothesis. M M M O 16 24 11 H H H D O D O D  Rearrangement analysis supports the primate – carnivore split.
  • 87. Genome Halving Problem Algorithmic Problem: Genome Halving Problem
  • 88. WGD and Genome Halving Problem  Whole Genome Duplication (WGD) of a genome R results in a perfect duplicated genome R+R where each chromosome is doubled.  The genome R+R is subjected to rearrangements that result in a duplicated genome P.  Genome Halving Problem: Given a duplicated genome P, find a perfect duplicated genome R+R minimizing the rearrangement distance between R+R and P.
  • 89. Genome Halving Problem: Previous Results  Algorithm for reconstructing a pre-duplicated genome was proposed in a series of papers by El- Mabrouk and Sankoff (culminating in SIAM J Comp., 2004).  The proof of the Genome Halving Theorem is rather technical (spans over 30 pages).
  • 90. Genome Halving Theorem  Found an error in the original Genome Halving Theorem for unichromosomal genomes and proved a theorem that adequately deals with all genomes. (Alekseyev & PP, SIAM J. Comp. 2007)  Suggested a new (short) proof of the Genome Halving Theorem - our proof is 5 pages long. (Alekseyev & PP, IEEE Bioinformatics 2007)  Proved the Genome Halving Theorem for the harder case of transposition-like operations (Alekseyev & PP, SODA 2007)  Introduced the notion of the contracted breakpoint graph that makes difficult problems (like the Genome Halving Problem) more transparent.
  • 91. 2-Break Distance Between Duplicated Genomes 2-Break Distance between Duplicated Genomes: a a b b a b a b
  • 92. 2-Break Distance Between Duplicated Genomes 2-Break Distance between Duplicated Genomes:. a a b b a b a b Difficulty: The notion of the breakpoint graph is not defined for duplicated genomes (the first a in P can correspond to either the first or the second a in Q).
  • 93. 2-Break Distance Between Duplicated Genomes 2-Break Distance between Duplicated Genomes: a a b b a b a b Idea: Establish a correspondence between the genes in P and Q and “re-label” the corresponding blocks (e.g., as a1, a2, b1, b2)
  • 94. 2-Break Distance Between Duplicated Genomes 2-Break Distance between Duplicated Genomes a1 a2 b1 b2 a1 b1 a2 b2 Treat the labeled genomes as non-duplicated, construct the breakpoint graph and compute the 2-break distance between them.
  • 95. Different Labelings Result in Different Breakpoint Graphs  There are 2 different labelings of the copies of each block.  One of these labelings (corresponding to a breakpoint graph with maximum number of cycles) is an optimal labeling.  Trying all possible labelings takes exponential time: 2#blocks invocations of the 2-break distance algorithm.  Labeling Problem. Construct an optimal labeling that results in the maximum number of cycles in the breakpoint graph (hence, the smallest 2-break distance).
  • 96. Constructing Breakpoint Graph of Duplicated Genomes a a P b a a b b b b a Q a a b a b b
  • 97. Contracted Genome Graphs: GLUING Red Edges a b a P b a b b a b a Q a b
  • 98. Contracted Breakpoint Graph: Gluing Red Edges Yet Again a Contracted Breakpoint b Graph a P b a G’(P,Q) b b b a a b a Q a b
  • 99. Contracted Breakpoint Graph of Duplicated Genomes  Edges of 3 colors: black, green, and red. b  Red edges form a matching. a  Black edges form black cycles.  Green edges form green cycles.
  • 100. What is the Relationship Between the Breakpoint Graph and the Contracted Breakpoint Graph? a a b b a b
  • 101. Every Breakpoint Graph Induces a Cycle Decomposition of the Contracted Breakpoint Graph a a b b Why do we care?
  • 102. Every Breakpoint Graph Induces Cycle Decomposition of the Contracted Breakpoint Graph a a b b Why do we care? Because optimal labeling corresponds to a maximum cycle decomposition.
  • 103. Maximum Cycle Decomposition Problem Open Problem 1: Given a contracted breakpoint graph, find its maximum black-green cycle decomposition. ?
  • 104. Labeling Problem Open Problem 2: Find a breakpoint graph that induces a given cycle decomposition of the contracted breakpoint graph. ?
  • 105. Computing 2-Break Distance Between Duplicated Genomes: Two Problems breakpoint graph inducing max cycle maximum contracted decomposition cycle breakpoint decomposition graph a b a b a b
  • 106. Computing 2-Break Distance Between Duplicated Genomes: Two HARD Problems breakpoint graph inducing max cycle maximum contracted decomposition cycle breakpoint decomposition graph a b a b a b These problems are difficult and we do not know how to solve them. However, we solved them in the case of the Genome Halving Problem.
  • 107. 2-Break Genome Halving Problem 2-Break Genome Halving Problem: Given a duplicated genome P, find a perfect duplicated genome R+R minimizing the 2-break distance: d2(P,R+R) = #blocks - cycles(P,R+R) Minimizing d2(P,R+R) is equivalent to finding a perfect duplicated genome R+R and a labeling of P and R+R that maximizes cycles(P,R+R).
  • 108. Contracted Breakpoint Graph of a Perfect Duplicated Genome a b a P b a G'(P,R+R) b b b a a b a R+R a b
  • 109. Contracted Breakpoint Graph of a Perfect Duplicated Genome has a Special Structure  Red edges form a matching.  Black edges form black cycles. G'(P,R+R)  Green edges form cycles in general case but now (for R+R) b these cycles are formed by a parallel (double) green edges forming a matching
  • 110. Finding the “Best” Contracted Breakpoint Graph for a Given Genome a b a P b a b
  • 111. “Blue” Problem: Optimal Contracted Breakpoint Graph a b a P b a b b a
  • 112. “Magenta” Problem: Maximum Cycle Decomposition a b a P b a b b a
  • 113. “Orange” Problem: Labeling a b a P b a b b a b a ?
  • 114. Solutions to ?, ?, and ? are too complex to be presented in this talk (SIAM J. Comp.2007) a b a P b a b b a b a ?
  • 116. Acknowledgments Max Alekseyev David Sankoff “If you are not criticized, you may not be doing much.” Donald Rumsfeld
  • 117. Acknowledgments Max Alekseyev David Sankoff “If you are not criticized, you may not be doing much.” Donald Rumsfeld Glenn Tesler, Math. Dept., UCSD Qian Peng