This document discusses generalizing phylogenetics to infer patterns of shared evolutionary events. It begins by outlining how shared ancestry is a fundamental property of life and how phylogenetics is progressing as the statistical foundation of comparative biology. It then discusses how violating the assumption of independent divergences can provide insights into processes like biogeography, gene family evolution, and endosymbiont evolution. The document considers challenges in accounting for shared divergences, like the complexity of gene trees and sampling over all possible models, and proposes two approaches: using existing methods or approximate Bayesian computation with a diffuse Dirichlet process prior.
Generalizing phylogenetics to infer patterns of shared evolutionary events
1. Generalizing
phylogenetics to infer
patterns of shared
evolutionary events
Jamie R. Oaks
Department of Biological Sciences &
Museum of Natural History, Auburn
University
August 23, 2018
c 2007 Boris Kulikov boris-kulikov.blogspot.com
Shared divergences Jamie Oaks – phyletica.org 1/35
3. Shared ancestry is a fundamental
property of life
Shared divergences Jamie Oaks – phyletica.org 2/35
4. Shared ancestry is a fundamental
property of life
Phylogenetics is rapidly
progressing as the statistical
foundation of comparatve biology
Shared divergences Jamie Oaks – phyletica.org 2/35
5. Shared ancestry is a fundamental
property of life
Phylogenetics is rapidly
progressing as the statistical
foundation of comparatve biology
“Big data” present exciting
possibilities and computational
challenges
Shared divergences Jamie Oaks – phyletica.org 2/35
6. Shared ancestry is a fundamental
property of life
Phylogenetics is rapidly
progressing as the statistical
foundation of comparatve biology
“Big data” present exciting
possibilities and computational
challenges
Exciting opportunities to develop
new ways to study biology in the
light of phylogeny
Shared divergences Jamie Oaks – phyletica.org 2/35
8. Assumption: All processes of
diversification affect each lineage
independently and only cause
bifurcating divergences.
Shared divergences Jamie Oaks – phyletica.org 3/35
14. Biogeography
Environmental changes that
affect whole communities of
species
Gene family evolution
Chromosomal duplications
Epidemiology
Disease spread via co-infected
individuals
Transmission at social
gatherings
Shared divergences Jamie Oaks – phyletica.org 5/35
15. Biogeography
Environmental changes that
affect whole communities of
species
Gene family evolution
Chromosomal duplications
Epidemiology
Disease spread via co-infected
individuals
Transmission at social
gatherings
Endosymbiont evolution (e.g.,
parasites, microbiome)
Speciation of the host
Co-colonization of new host
species
Shared divergences Jamie Oaks – phyletica.org 5/35
20. Why account for shared divergences?
1. Improve inference
2. Provide a framework for studying processes of co-diversification
Shared divergences Jamie Oaks – phyletica.org 7/35
21. Biogeography
Environmental changes that
affect whole communities of
species
Gene family evolution
Chromosomal duplications
Epidemiology
Disease spread via co-infected
individuals
Transmission at social
gatherings
Endosymbiont evolution (e.g.,
parasites, microbiome)
Speciation of the host
Co-colonization of new host
species
Shared divergences Jamie Oaks – phyletica.org 8/35
29. m1 m2 m3 m4 m5
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
30. m1 m2 m3 m4 m5
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
We want to infer the model and divergence times given DNA alignments
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
31. p(m1 | X) p(m2 | X) p(m3 | X) p(m4 | X) p(m5 | X)
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
We want to infer the model and divergence times given DNA alignments
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
32. p(m1 | X) p(m2 | X) p(m3 | X) p(m4 | X) p(m5 | X)
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
We want to infer the model and divergence times given DNA alignments
p(mi | X) ∝ p(X | mi )p(mi )
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
33. p(m1 | X) p(m2 | X) p(m3 | X) p(m4 | X) p(m5 | X)
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
We want to infer the model and divergence times given DNA alignments
p(mi | X) ∝ p(X | mi )p(mi )
p(X | mi ) =
θ
p(X | θ, mi )p(θ | mi )dθ
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
34. p(m1 | X) p(m2 | X) p(m3 | X) p(m4 | X) p(m5 | X)
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
We want to infer the model and divergence times given DNA alignments
p(mi | X) ∝ p(X | mi )p(mi )
p(X | mi ) =
θ
p(X | θ, mi )p(θ | mi )dθ
Divergence times
Gene trees
Substitution parameters
Demographic parameters
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
35. p(m1 | X) p(m2 | X) p(m3 | X) p(m4 | X) p(m5 | X)
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
Challenges:
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
36. p(m1 | X) p(m2 | X) p(m3 | X) p(m4 | X) p(m5 | X)
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
Challenges:
1. Likelihood is tractable, but gene trees are a pain
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
37. p(m1 | X) p(m2 | X) p(m3 | X) p(m4 | X) p(m5 | X)
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
Challenges:
1. Likelihood is tractable, but gene trees are a pain
2. Sampling over all possible models
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
38. p(m1 | X) p(m2 | X) p(m3 | X) p(m4 | X) p(m5 | X)
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
Challenges:
1. Likelihood is tractable, but gene trees are a pain
2. Sampling over all possible models
5 taxa = 52 models
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
39. p(m1 | X) p(m2 | X) p(m3 | X) p(m4 | X) p(m5 | X)
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
Challenges:
1. Likelihood is tractable, but gene trees are a pain
2. Sampling over all possible models
5 taxa = 52 models
10 taxa = 115,975 models
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
40. p(m1 | X) p(m2 | X) p(m3 | X) p(m4 | X) p(m5 | X)
τ1 τ2 τ1 τ1τ2 τ1τ2 τ3 τ1τ2
Challenges:
1. Likelihood is tractable, but gene trees are a pain
2. Sampling over all possible models
5 taxa = 52 models
10 taxa = 115,975 models
20 taxa = 51,724,158,235,372 models!!
J. R. Oaks et al. (2013). Evolution 67: 991–1010, J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 10/35
47. Approach #1
Challenges:
1. Likelihood is tractable, but gene trees are difficult
2. Sampling over all possible models
1
M. J. Hickerson et al. (2006). Evolution 60: 2435–2453
2
W. Huang et al. (2011). BMC Bioinformatics 12: 1
Shared divergences Jamie Oaks – phyletica.org 14/35
48. Approach #1
Challenges:
1. Likelihood is tractable, but gene trees are difficult
Use an existing method1,2
2. Sampling over all possible models
Use an existing method1,2
1
M. J. Hickerson et al. (2006). Evolution 60: 2435–2453
2
W. Huang et al. (2011). BMC Bioinformatics 12: 1
Shared divergences Jamie Oaks – phyletica.org 14/35
49. 0.00
0.25
0.50
0.75
1.00
1 2 3 4 5 6 7 8 9
Number of events
Probability
J. R. Oaks et al. (2013). Evolution 67: 991–1010
Shared divergences Jamie Oaks – phyletica.org 15/35
50. Approach #2
Challenges:
1. Likelihood is tractable, but gene trees are difficult
2. Sampling over all possible models
Shared divergences Jamie Oaks – phyletica.org 16/35
51. Approach #2
Challenges:
1. Likelihood is tractable, but gene trees are difficult
Numerical approximation via approximate-likelihood Bayesian
computation (ABC)
2. Sampling over all possible models
Shared divergences Jamie Oaks – phyletica.org 16/35
52.
53. Approach #2
Challenges:
1. Likelihood is tractable, but gene trees are difficult
Numerical approximation via approximate-likelihood Bayesian
computation (ABC)
2. Sampling over all possible models
Shared divergences Jamie Oaks – phyletica.org 18/35
54. Approach #2
Challenges:
1. Likelihood is tractable, but gene trees are difficult
Numerical approximation via approximate-likelihood Bayesian
computation (ABC)
2. Sampling over all possible models
A “diffuse” Dirichlet process prior (DPP)
Shared divergences Jamie Oaks – phyletica.org 18/35
62. New method: dpp-msbayes
Approximate-likelihood Bayesian approach to inferring models of
shared divergences
Flexible Dirichlet-process prior (DPP) over all possible divergence
models
Flexible priors on parameters to avoid strongly weighted posteriors
J. R. Oaks (2014). BMC Evolutionary Biology 14: 150
Shared divergences Jamie Oaks – phyletica.org 20/35
63. 0.00
0.05
0.10
0.15
0.20
1 2 3 4 5 6 7 8 9
Number of events
Probability
J. R. Oaks et al. (2013). Evolution 67: 991–1010
Shared divergences Jamie Oaks – phyletica.org 21/35
65. Approach #3
Challenges:
1. Likelihood is tractable, but gene trees are difficult
2. Sampling over all possible models
A “diffuse” Dirichlet process prior (DPP)
1
D. Bryant et al. (2012). Molecular Biology and Evolution 29: 1917–1932
Shared divergences Jamie Oaks – phyletica.org 22/35
66. Approach #3
Challenges:
1. Likelihood is tractable, but gene trees are difficult
Bryant et al.1
introduced a way to integrate over them analytically
2. Sampling over all possible models
A “diffuse” Dirichlet process prior (DPP)
1
D. Bryant et al. (2012). Molecular Biology and Evolution 29: 1917–1932
Shared divergences Jamie Oaks – phyletica.org 22/35
67. Ecoevolity: Estimating evolutionary coevality1
1
J. R. Oaks (2018). bioRxiv doi:10.1101/324525
2
R. M. Neal (2000). Journal of Computational and Graphical Statistics 9: 249–265
3
D. Bryant et al. (2012). Molecular Biology and Evolution 29: 1917–1932
Shared divergences Jamie Oaks – phyletica.org 23/35
68. Ecoevolity: Estimating evolutionary coevality1
CTMC model of characters evolving along genealogies
Coalescent model of genealogies branching within populations
Dirichlet-process prior across divergence models
Gibbs sampling2 to numerically sample models
Analytically integrate over genealogies3
1
J. R. Oaks (2018). bioRxiv doi:10.1101/324525
2
R. M. Neal (2000). Journal of Computational and Graphical Statistics 9: 249–265
3
D. Bryant et al. (2012). Molecular Biology and Evolution 29: 1917–1932
Shared divergences Jamie Oaks – phyletica.org 23/35
69. Ecoevolity: Estimating evolutionary coevality1
CTMC model of characters evolving along genealogies
Coalescent model of genealogies branching within populations
Dirichlet-process prior across divergence models
Gibbs sampling2 to numerically sample models
Analytically integrate over genealogies3
Goal: Fast, full-likelihood Bayesian method to infer patterns of
co-diversification from genome-scale data
1
J. R. Oaks (2018). bioRxiv doi:10.1101/324525
2
R. M. Neal (2000). Journal of Computational and Graphical Statistics 9: 249–265
3
D. Bryant et al. (2012). Molecular Biology and Evolution 29: 1917–1932
Shared divergences Jamie Oaks – phyletica.org 23/35
81. Ecoevolity: Results
The results are very different
than the ABC methods
But, these are different data
Shared divergences Jamie Oaks – phyletica.org 29/35
82. Ecoevolity: Results
The results are very different
than the ABC methods
But, these are different data
For a direct comparison, we
applied full-likelihood and ABC
methods to random subset of
RADseq data
200 loci from 3 pairs of gecko
populations
Shared divergences Jamie Oaks – phyletica.org 29/35
83. A B
<0.000624
0.00533
553
0.0
0.3
0.6
0.9
1 2 3
Number of events
Probability
Camiguin Norte | Dalupiri
Maestre De Campo | Masbate
Babuyan Claro | Calayan
0.000 0.003 0.006 0.009 0.012
Divergence time
Ecoevolity
C D
18.9
0.151
0.0119
0.00
0.25
0.50
0.75
1 2 3
Number of events
Probability
Camiguin Norte | Dalupiri
Maestre De Campo | Masbate
Babuyan Claro | Calayan
0.000 0.003 0.006 0.009 0.012
Divergence time
dpp-msbayes
J. R. Oaks et al. (2018). bioRxiv doi:10.1101/395434
Shared divergences Jamie Oaks – phyletica.org 30/35
85. Our journey
0.00
0.25
0.50
0.75
1.00
1 2 3 4 5 6 7 8 9
Number of events
Probability
0.00
0.05
0.10
0.15
0.20
1 2 3 4 5 6 7 8 9
Number of events
Probability
Shared divergences Jamie Oaks – phyletica.org 31/35
86. Our journey
0.00
0.25
0.50
0.75
1.00
1 2 3 4 5 6 7 8 9
Number of events
Probability
0.00
0.05
0.10
0.15
0.20
1 2 3 4 5 6 7 8 9
Number of events
Probability
0.0
0.2
0.4
0.6
1 2 3 4 5 6 7 8
Number of events
Probability
Shared divergences Jamie Oaks – phyletica.org 31/35
87. Our journey
0.00
0.25
0.50
0.75
1.00
1 2 3 4 5 6 7 8 9
Number of events
Probability
0.00
0.05
0.10
0.15
0.20
1 2 3 4 5 6 7 8 9
Number of events
Probability
0.0
0.2
0.4
0.6
1 2 3 4 5 6 7 8
Number of events
Probability
Conclusions?
Shared divergences Jamie Oaks – phyletica.org 31/35
88. Next step: A general framework
Develop a framework for
inferring shared divergences
across phylogenies τ1τ2
Shared divergences Jamie Oaks – phyletica.org 32/35
89. Next step: A general framework
Develop a framework for
inferring shared divergences
across phylogenies τ1τ2
Shared divergences Jamie Oaks – phyletica.org 32/35
90. Next step: A general framework
Develop a framework for
inferring shared divergences
across phylogenies
Generalize Bayesian
phylogenetics to incorporate
shared divergences
τ1τ2
Shared divergences Jamie Oaks – phyletica.org 32/35
91. Next step: A general framework
Develop a framework for
inferring shared divergences
across phylogenies
Generalize Bayesian
phylogenetics to incorporate
shared divergences
Sample models numerically via
reversible-jump Markov chain
Monte Carlo
τ1τ2
Shared divergences Jamie Oaks – phyletica.org 32/35
92. Next step: A general framework
Develop a framework for
inferring shared divergences
across phylogenies
Generalize Bayesian
phylogenetics to incorporate
shared divergences
Sample models numerically via
reversible-jump Markov chain
Monte Carlo
Benefits:
Improve phylogenetic inference
Framework for studying
processes of co-diversification
τ1τ2
Shared divergences Jamie Oaks – phyletica.org 32/35