Allele frequencies as Stochastic Processes
Mathematical and Statistical Approaches

Gota Morota

Nov 30, 2010

1 / 32
Outline

Change of Allele Frequencies as Stochastic Processes

Steady State Distributions of Allele Frequencies

Time Series Analysis

2 / 32
Various factors affecting allele frequencies

• Selection, mutation and migration (cross breedings) ⇒
systematic pressures (Wright 1949)
• Random fluctuations
1. Random sampling of gametes (genetic drift)
2. Random fluctuation in systematic pressures

⇓
Allele frequencies are functions of the systematic forces and the
random components

5 / 32
Random walk ⇒ Brownian Motion
[Figures 1 to 4: one simulated random walk plotted against Time at four horizons]
Figure 1: Time = [1:10]
Figure 2: Time = [1:100]
Figure 3: Time = [1:1000]
Figure 4: Time = [1:10000]
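
The passage from a random walk to Brownian motion can be sketched numerically. The following is a minimal illustration, assuming Gaussian steps with mean 0 and variance 1 and a 1/sqrt(n) rescaling; the function name `scaled_random_walk` and the seeds are illustrative choices, not taken from the slides.

```python
import numpy as np

def scaled_random_walk(n_steps, seed=0):
    """Return a random walk of n_steps steps, rescaled so that Var[W(1)] = 1."""
    rng = np.random.default_rng(seed)
    steps = rng.standard_normal(n_steps)              # errors with mean 0, variance 1
    walk = np.concatenate(([0.0], np.cumsum(steps)))  # start the walk at 0
    return walk / np.sqrt(n_steps)                    # scaling that yields Brownian motion as n grows

# Same construction at the four horizons used in Figures 1 to 4
for n in (10, 100, 1000, 10000):
    path = scaled_random_walk(n)
    print(n, path[-1])
```

As the number of steps grows, the rescaled path looks increasingly like a continuous Brownian path on the unit time interval.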
6 / 32
Brownian Motion ⇒ Diffusion Model

[Figure 5: random-walk path, Time = [1:10000]] + conditional on systematic forces

• treat the change of allele frequencies as a stochastic process

⇓

Diffusion Model

7 / 32
Diffusion Model

It frames the infinite number of paths that allele frequencies could take
over time under certain systematic pressures.

[Figure: three panels of allele-frequency paths, Allele Frequency against Time from 0 to 10000]

• pick a single time point t (say 5000 in the panels above)
• try to find the PDF at point t
• this requires solving a partial differential equation (PDE)
• Fokker-Planck Equation!
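
The idea of "many paths, then the PDF at one time point" can be sketched by simulation before turning to the PDE. This is a minimal sketch assuming pure drift modelled as binomial sampling of 2Ne gametes per generation (a Wright-Fisher scheme); the function name, parameter values, and the choice t = 25 are illustrative assumptions, not from the slides.

```python
import numpy as np

def wright_fisher_paths(p0, n_e, n_gen, n_paths, seed=0):
    """Simulate allele-frequency paths under pure drift.

    Each generation draws 2 * n_e gametes binomially from the current frequency.
    Returns an array of shape (n_gen + 1, n_paths).
    """
    rng = np.random.default_rng(seed)
    freqs = np.empty((n_gen + 1, n_paths))
    freqs[0] = p0
    for g in range(n_gen):
        counts = rng.binomial(2 * n_e, freqs[g])   # random sampling of gametes
        freqs[g + 1] = counts / (2 * n_e)
    return freqs

paths = wright_fisher_paths(p0=0.5, n_e=100, n_gen=50, n_paths=5000)

# Empirical PDF of the allele frequency at a single time point t
t = 25
density, edges = np.histogram(paths[t], bins=20, range=(0.0, 1.0), density=True)
print(density)
```

The histogram at generation t is a Monte Carlo stand-in for the density that the Fokker-Planck equation describes analytically.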

8 / 32
Fokker-Planck Equation
• Derived from a continuous time stochastic process (X)
• Partial differential equation

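\[
\frac{\partial \phi(p,x;t)}{\partial t}
  = \frac{1}{2}\frac{\partial^2}{\partial x^2}\left\{V_{\delta x}\,\phi(p,x;t)\right\}
  - \frac{\partial}{\partial x}\left\{M_{\delta x}\,\phi(p,x;t)\right\}
\tag{1}
\]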

where
• p: initial allele frequency (fixed)
• x: allele frequency (random variable)
• t: time (continuous variable)
• φ(p , x ; t ): PDF
• Vδx : variance of δx (amount of change in allele frequency per
unit time)
• Mδx : mean of δx (amount of change in allele frequency per
unit time)
• Vδx and Mδx : both may depend on x and t
9 / 32
Fokker-Planck Equation for Brownian Motion
A standard Brownian motion can be constructed from a random walk
whose errors have mean 0 and variance 1, under the right scaling. At time t
it has the PDF of N(0, t).
• when t = 1.0, N(0, 1)
• when t = 1.5, N(0, 1.5)

Fokker-Planck equation:

\begin{align}
\frac{\partial \phi(p,x;t)}{\partial t}
  &= \frac{1}{2}\frac{\partial^2}{\partial x^2}\phi(p,x;t) \tag{2} \\
  &= \text{Heat equation} \tag{3}
\end{align}

Mδx = 0 and Vδx = 1 in equation (1)
Solution:

\[
\phi(p,x;t) = \frac{1}{\sqrt{2\pi t}}\exp\left(-\frac{x^2}{2t}\right)
\tag{4}
\]
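
That the N(0, t) density solves the heat equation can be checked numerically. The sketch below uses central finite differences; the step size h and the evaluation points are assumptions chosen for illustration.

```python
import numpy as np

def heat_kernel(x, t):
    """Density of N(0, t) evaluated at x (the solution given in equation (4))."""
    return np.exp(-x**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

def heat_equation_residual(x, t, h=1e-3):
    """Central-difference estimate of d(phi)/dt - 0.5 * d2(phi)/dx2."""
    dphi_dt = (heat_kernel(x, t + h) - heat_kernel(x, t - h)) / (2.0 * h)
    d2phi_dx2 = (heat_kernel(x + h, t) - 2.0 * heat_kernel(x, t)
                 + heat_kernel(x - h, t)) / h**2
    return dphi_dt - 0.5 * d2phi_dx2

x = np.linspace(-2.0, 2.0, 9)
residual = heat_equation_residual(x, t=1.0)
print(np.max(np.abs(residual)))   # close to zero, up to discretisation error
```

A residual near zero at every grid point is consistent with Mδx = 0 and Vδx = 1 in equation (1).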
10 / 32
Solution of the Heat Equation (the Heat Kernel)

[Figure: the heat kernel plotted against x from −2 to 2 for t = 0.00001, 0.01, 0.1, 1, and 10]

11 / 32
Under Random Genetic Drift
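\[
M_{\delta x} = 0, \qquad V_{\delta x} = \frac{x(1-x)}{2N_e}
\]

Fokker-Planck equation for random genetic drift:

\[
\frac{\partial \phi(p,x;t)}{\partial t}
  = \frac{1}{4N_e}\frac{\partial^2}{\partial x^2}\left\{x(1-x)\,\phi(p,x;t)\right\}
\tag{5}
\]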

Solutions are obtained as infinite series by
• Kimura (1955): hypergeometric function
• Korn and Korn (1968): Gegenbauer polynomial

\[
\phi = 6p(1-p)\exp\left(-\frac{1}{2N_e}t\right)
  + 30p(1-p)(1-2p)(1-2x)\exp\left(-\frac{3}{2N_e}t\right) + \cdots
\]
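
The drift moments Mδx = 0 and Vδx = x(1 − x)/(2Ne) can be checked against one generation of simulated gamete sampling. This is a sketch assuming binomial sampling of 2Ne gametes; the values x = 0.3 and Ne = 50 and the function name are illustrative assumptions.

```python
import numpy as np

def one_generation_change(x, n_e, n_rep=200_000, seed=0):
    """Sample the change in allele frequency after one generation of drift."""
    rng = np.random.default_rng(seed)
    next_freq = rng.binomial(2 * n_e, x, size=n_rep) / (2 * n_e)
    return next_freq - x

x, n_e = 0.3, 50
delta = one_generation_change(x, n_e)
expected_variance = x * (1 - x) / (2 * n_e)
print(delta.mean(), delta.var(), expected_variance)
```

The sample mean of δx should be close to 0 and the sample variance close to x(1 − x)/(2Ne).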

12 / 32
Solution of FPE (Kimura 1955)
[Figure reproduced from Kimura (1955), vol. 41, p. 149]

FIGS. 1-2.-The processes of the change in the probability distribution of heterallelic classes,
due to random sampling of gametes in reproduction. It is assumed that the population starts
from the gene frequency 0.5 in Fig. 1 (left) and 0.1 in Fig. 2 (right). t = time in generation;
N = effective size of the population; abscissa is gene frequency; ordinate is probability
density.
13 / 32
Under Selection and Random Genetic Drift
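\[
M_{\delta x} = s\,x(1-x), \qquad V_{\delta x} = \frac{x(1-x)}{2N_e}
\]

\[
\frac{\partial \phi(p,x;t)}{\partial t}
  = \frac{1}{4N_e}\frac{\partial^2}{\partial x^2}\left\{x(1-x)\,\phi(p,x;t)\right\}
  - s\frac{\partial}{\partial x}\left\{x(1-x)\,\phi(p,x;t)\right\}
\tag{6}
\]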
Solutions are obtained as infinite series via the oblate spheroidal
equation, after transforming allele frequencies (z = 1 − 2x)
• Kimura (1955)
• Kimura and Crow (1956)
\[
\phi(p,x;t) = \sum_{k=0}^{\infty} C_k \exp\left(-\lambda_k t + 2cx\right) V_{1k}^{(1)}(z)
\tag{7}
\]

where

\[
V_{1k}^{(1)}(z) = \sum_{n=0,1} f_n^{k}\, T_n^{1}(z)
\]
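
Because the series solution is hard to evaluate by hand, the same process can be explored by simulating the diffusion directly. The sketch below uses an Euler-Maruyama discretisation with drift s x(1 − x) and variance x(1 − x)/(2Ne) per unit time. This is a numerical approximation swapped in for illustration, not the Kimura series; the function name and parameter values are assumptions.

```python
import numpy as np

def simulate_selection_drift(p0, s, n_e, n_gen, n_paths, dt=1.0, seed=0):
    """Euler-Maruyama paths for selection plus drift; returns final frequencies."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, p0, dtype=float)
    n_steps = int(round(n_gen / dt))
    for _ in range(n_steps):
        mean = s * x * (1.0 - x)               # M_dx: systematic pressure
        var = x * (1.0 - x) / (2.0 * n_e)      # V_dx: random sampling of gametes
        x = x + mean * dt + np.sqrt(var * dt) * rng.standard_normal(n_paths)
        x = np.clip(x, 0.0, 1.0)               # frequencies stay in [0, 1]; 0 and 1 absorb
    return x

final_neutral = simulate_selection_drift(0.5, s=0.0, n_e=500, n_gen=100, n_paths=2000)
final_selected = simulate_selection_drift(0.5, s=0.05, n_e=500, n_gen=100, n_paths=2000)
print(final_neutral.mean(), final_selected.mean())
```

With s > 0 the distribution of final frequencies shifts toward 1, whereas with s = 0 it stays centred on the starting frequency.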
14 / 32
Kolmogorov Backward Equation
• Derived from a continuous time stochastic process (P)
• Partial differential equation

\[
\frac{\partial \phi(p,x;t)}{\partial t}
  = \frac{1}{2}V_{\delta p}\frac{\partial^2}{\partial p^2}\phi(p,x;t)
  + M_{\delta p}\frac{\partial}{\partial p}\phi(p,x;t)
\tag{8}
\]

where
• p: initial allele frequency (random variable)
• x: allele frequency (a random variable in general; here x at time t is
held fixed)
• t: time (continuous variable)
• φ(p , x ; t ): PDF
• Vδp : variance of δp (amount of change in allele frequency)
• Mδp : mean of δp (amount of change in allele frequency)
• Vδp and Mδp : both may depend on p but not on t (time
homogeneous)
15 / 32
Steady State Distribution of Allele Frequencies
Equilibrium
• single point (balance between various forces that keep allele
frequencies near equilibrium)
• PDF

⇓
PDF of stable equilibrium instead of single point
Steady state allele frequency distribution
• Fisher (1922), (1930)
• Wright (1931), (1937), (1938)

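\begin{align}
\phi(p,x;t) &= \text{solution of a Fokker-Planck equation} \tag{9} \\
\lim_{t\to\infty}\phi(p,x;t) &= \phi(x) \tag{10} \\
\phi(x) &= \frac{C}{V_{\delta x}}\exp\left(2\int\frac{M_{\delta x}}{V_{\delta x}}\,dx\right) \tag{11}
\end{align}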
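
The steady-state formula (11) can be evaluated numerically for any choice of Mδx and Vδx. The following is a sketch using the trapezoid rule on a grid. The worked case is an assumed mutation-only example (Mδx = −ux + v(1 − x), Vδx = x(1 − x)/(2Ne), with 4Ne·u = 4Ne·v = 2), for which the result should be proportional to x(1 − x). Function and variable names are illustrative.

```python
import numpy as np

def wright_steady_state(mean_fn, var_fn, grid):
    """Evaluate phi(x) = C / V * exp(2 * integral(M / V dx)) on a grid.

    The constant C is fixed so that phi integrates to 1 over the grid.
    """
    m, v = mean_fn(grid), var_fn(grid)
    ratio = m / v
    dx = np.diff(grid)
    # cumulative trapezoid rule for the indefinite integral of M / V
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (ratio[1:] + ratio[:-1]) * dx)))
    unnormalised = np.exp(2.0 * integral) / v
    area = np.sum(0.5 * (unnormalised[1:] + unnormalised[:-1]) * dx)
    return unnormalised / area

# Assumed example values: mutation only, with 4 * Ne * u = 4 * Ne * v = 2
n_e, u, v = 1000, 5e-4, 5e-4
grid = np.linspace(0.01, 0.99, 2001)   # stay away from the boundaries 0 and 1
phi = wright_steady_state(
    mean_fn=lambda x: -u * x + v * (1.0 - x),
    var_fn=lambda x: x * (1.0 - x) / (2.0 * n_e),
    grid=grid,
)
print(phi[len(grid) // 2])
```

The grid avoids 0 and 1 because Mδx/Vδx is singular there; a finer grid near the boundaries would be needed for small mutation rates.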
16 / 32
Steady State Distribution – Random Genetic Drift

For a large value of t, only the first few terms have impact on
determining the actual form of the PDF.

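\[
\phi = 6p(1-p)\exp\left(-\frac{t}{2N_e}\right)
  + 30p(1-p)(1-2p)(1-2x)\exp\left(-\frac{3t}{2N_e}\right) + \cdots
\]

Asymptotic formula:

\[
\phi \approx C\cdot\exp\left(-\frac{1}{2N_e}t\right) \quad \text{as } t\to\infty
\]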
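
How quickly the first term comes to dominate can be seen by evaluating the two leading terms of the series. This is a small sketch; the values p = 0.1, x = 0.2, and Ne = 50 are assumptions for illustration.

```python
import numpy as np

def drift_series_terms(p, x, t, n_e):
    """First two terms of the series solution under random genetic drift."""
    first = 6.0 * p * (1.0 - p) * np.exp(-t / (2.0 * n_e))
    second = (30.0 * p * (1.0 - p) * (1.0 - 2.0 * p) * (1.0 - 2.0 * x)
              * np.exp(-3.0 * t / (2.0 * n_e)))
    return first, second

p, x, n_e = 0.1, 0.2, 50
for t in (0, 50, 100, 200, 400):
    first, second = drift_series_terms(p, x, t, n_e)
    print(t, first, second, second / first)
```

The ratio of the second term to the first equals 5(1 − 2p)(1 − 2x)exp(−t/Ne), so it shrinks by a factor of e every Ne generations.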

17 / 32
Graphical Representation (Wright 1931)

[Excerpt from Wright (1931)] "... is large can be found directly from the Poisson series according to which
the chance of drawing 0 where m is the mean number in a sample is e^{-m}.
The contribution to the 0 class will thus be (e^{-1} + e^{-2} + e^{-3} ...) f = \frac{e^{-1}}{1 - e^{-1}} f = 0.582 f."

[Figure: horizontal axis Factor Frequency, with ticks at 25%, 50%, and 75%]

FIGURE 3.-Distribution of gene frequencies in an isolated population in which fixation and
loss of genes each is proceeding at the rate 1/4N in the absence of appreciable selection or muta-

18 / 32
Steady State Distribution – Selection and Mutation

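\[
M_{\delta x} = -ux + v(1-x) + \frac{x(1-x)}{2}\frac{d\bar{a}}{dx},
\qquad
V_{\delta x} = \frac{x(1-x)}{2N_e}
\]

\[
\phi(x) = C\cdot\exp\left(2N_e\bar{a}\right)x^{4N_e v-1}(1-x)^{4N_e u-1}
\tag{12}
\]

When A has selective advantage s over a:

\begin{align*}
\bar{a} &= 2s\,x^2 + s\cdot 2x(1-x) + 0\cdot(1-x)^2 \\
        &= 2sx
\end{align*}

\[
\phi(x) = C\cdot\exp\left(4N_e s x\right)x^{4N_e v-1}(1-x)^{4N_e u-1}
\tag{13}
\]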
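
The step from the general steady-state formula (11) to equation (12) is not shown on the slide. A sketch of the intermediate algebra, using only the Mδx and Vδx given here, is:

```latex
\begin{align*}
2\int \frac{M_{\delta x}}{V_{\delta x}}\,dx
  &= 4N_e \int \left[\frac{v}{x} - \frac{u}{1-x}
     + \frac{1}{2}\frac{d\bar{a}}{dx}\right] dx \\
  &= 4N_e v \ln x + 4N_e u \ln(1-x) + 2N_e\bar{a} + \text{const}, \\[4pt]
\phi(x) = \frac{C}{V_{\delta x}}
     \exp\left(2\int \frac{M_{\delta x}}{V_{\delta x}}\,dx\right)
  &= C' \cdot \frac{1}{x(1-x)}\; x^{4N_e v}(1-x)^{4N_e u}\exp\left(2N_e\bar{a}\right) \\
  &= C' \cdot \exp\left(2N_e\bar{a}\right) x^{4N_e v-1}(1-x)^{4N_e u-1}.
\end{align*}
```

Here C' absorbs the factor 2Ne and the integration constant. Substituting the mean fitness 2sx gives the exponent 4Ne·s·x of equation (13).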

19 / 32
Graphical Representation (Wright 1937)
[Figs. 1, 2, 4, 5, and 6 reproduced from Wright (1937), Proc. N. A. S., p. 308]
20 / 32
Time Series Analysis

When a variable is measured sequentially in time, the resulting data form
a time series.
• Diffusion Model – Continuous time stochastic process
• Time Series – Discrete time stochastic process

21 / 32
Basic Models
Observations close together in time tend to be correlated
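• Autoregressive Model: AR(p)

\[
X_t = c + \sum_{i=1}^{p}\psi_i X_{t-i} + \varepsilon_t
\tag{14}
\]

• Moving Average Model: MA(q)

\[
X_t = c + \sum_{i=1}^{q}\theta_i\varepsilon_{t-i} + \varepsilon_t
\tag{15}
\]

• Autoregressive Moving Average Model: ARMA(p, q)

\[
X_t = \mathrm{AR}(p) + \mathrm{MA}(q)
\tag{16}
\]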
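
The statement that nearby observations are correlated can be illustrated with a simulated AR(1) series. This is a minimal sketch assuming Gaussian noise with unit variance; the coefficient 0.6, the series length, and the function names are illustrative assumptions.

```python
import numpy as np

def simulate_ar1(psi, n, c=0.0, sigma=1.0, seed=0):
    """Simulate X_t = c + psi * X_{t-1} + eps_t with Gaussian eps_t."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = c + psi * x[t - 1] + eps[t]
    return x

def lag1_autocorrelation(x):
    """Sample correlation between X_t and X_{t-1}."""
    centred = x - x.mean()
    return np.dot(centred[1:], centred[:-1]) / np.dot(centred, centred)

series = simulate_ar1(psi=0.6, n=20_000)
print(lag1_autocorrelation(series))   # should be near psi for a long series
```

For a stationary AR(1) process the lag-1 autocorrelation equals ψ, which the long simulated series approximates.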

22 / 32
Time Series as a Polynomial Equation
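\[
B^k X_t = X_{t-k} \quad \text{(back shift operator)}
\]

• AR(p)

\begin{align*}
X_t &= \psi_1 X_{t-1} + \cdots + \psi_p X_{t-p} \\
X_t &= (\psi_1 B + \cdots + \psi_p B^p)\,X_t \\
(1 - \psi_1 B - \cdots - \psi_p B^p)\,X_t &= 0
\end{align*}

• ARMA(p,q)

\begin{align*}
X_t - \psi_1 X_{t-1} - \cdots - \psi_p X_{t-p}
  &= \varepsilon_t + \theta_1\varepsilon_{t-1} + \cdots + \theta_q\varepsilon_{t-q} \\
(1 - \psi_1 B - \cdots - \psi_p B^p)\,X_t
  &= (1 + \theta_1 B + \cdots + \theta_q B^q)\,\varepsilon_t
\end{align*}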
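
One standard use of the polynomial form, not stated on the slide, is checking stationarity: an AR(p) process is stationary when every root of 1 − ψ1 z − ··· − ψp z^p lies outside the unit circle. The following is a sketch of that check; the function name `ar_is_stationary` and the example coefficients are illustrative.

```python
import numpy as np

def ar_is_stationary(psi):
    """Return True if all roots of 1 - psi_1 z - ... - psi_p z^p lie outside the unit circle.

    psi is the sequence [psi_1, ..., psi_p].
    """
    psi = np.asarray(psi, dtype=float)
    # np.roots expects coefficients ordered from the highest power down to the constant
    coeffs = np.concatenate((-psi[::-1], [1.0]))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_is_stationary([0.635]))      # AR(1) with |psi| < 1
print(ar_is_stationary([1.0]))        # random walk: root on the unit circle
print(ar_is_stationary([0.5, 0.3]))   # an AR(2) example
```

The random walk fails the check because its single root sits exactly on the unit circle.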

23 / 32
Stationary Process
The mean and variance do not change over time. No trend.
[Two panels plotted against Time up to 10000]
Figure 6: Random Walk (not stationary)
Figure 7: Detrended (looks stationary)

Detrending:
• linear regression
• take a difference
• Autoregressive Integrated Moving Average: ARIMA(p,d,q)
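
The first two detrending options can be sketched on a simulated random walk. This is a minimal illustration assuming Gaussian steps; the series length and seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
steps = rng.standard_normal(10_000)
random_walk = np.cumsum(steps)              # not stationary

# Option 1: take a difference, X_t - X_{t-1}
differenced = np.diff(random_walk)

# Option 2: linear regression on time, then keep the residuals
time = np.arange(random_walk.size)
slope, intercept = np.polyfit(time, random_walk, deg=1)
residual = random_walk - (slope * time + intercept)

print(differenced.var(), residual.var())
```

Differencing a random walk recovers its independent steps exactly, which is why the "d" in ARIMA(p,d,q) counts how many times the series is differenced.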
24 / 32
Application on Allele Frequencies
• Influential SNPs – indicative of deterministic trends
• Uninfluential SNPs – random fluctuation?
• Diffusion Model – assumed Markovian process
• Time Series – which model describes the process of change
of allele frequencies

Application
• Objective: model the process of change of allele frequencies
• Data: SNP genotypes of 4,798 Holstein bulls with 38,416
markers and milk yield
• Genotype imputation: FastPhase 1.4
• Estimation of marker effects: BayesCπ

25 / 32
BayesCπ

[Figure 8: GAW17. Title page of "Analysis of human mini-exome sequencing data using a Bayesian
hierarchical mixture model: Genetic Analysis Workshop 17" by Bueno Filho JS, Morota G, Tran QT,
Maenner MJ, Vera-Cala LM, Engelman CD, and Meyers KJ]

26 / 32
Allele Frequency of the Top Marker

[Two panels of Allele Frequency against Time: Original (top) and Detrended (bottom)]

Figure 9: Time plots of allele frequencies. Top: Original series. Bottom:
Detrended by taking the first order difference.
27 / 32
Autocorrelation and Partial Autocorrelation
ARIMA(1,1,1)?
[Four panels against Lag: ACF and Partial ACF of the original series (top) and of the
first order difference series (bottom)]

Figure 10: ACF and PACF
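
The ACF panels plot the sample autocorrelation at each lag. A minimal sketch of that calculation follows; the function name `sample_acf` and the toy series are illustrative, and the partial ACF is not computed here.

```python
import numpy as np

def sample_acf(x, n_lags):
    """Sample autocorrelation of x at lags 0..n_lags."""
    x = np.asarray(x, dtype=float)
    centred = x - x.mean()
    denom = np.dot(centred, centred)
    return np.array([
        np.dot(centred[k:], centred[:x.size - k]) / denom
        for k in range(n_lags + 1)
    ])

# Toy series standing in for the allele-frequency data, which are not reproduced here
series = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(sample_acf(series, 2))
print(sample_acf(np.diff(series), 1))   # ACF of the first order difference
```

Comparing the ACF of a series with that of its first difference is how the slide motivates a differenced model such as ARIMA(1,1,1).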
28 / 32
Model Selection

Table 1: Comparison of several competitive models

Model            AIC       Model            AIC
ARIMA (1,0,0)    -51.56    ARIMA (1,1,0)    -52.47
ARIMA (0,1,0)    -49.38    ARIMA (1,0,1)    -51.13
ARIMA (0,0,1)    -46.41    ARIMA (1,1,1)    -51.02

ARIMA(1,1,0)

\[
X_t = 0.635\,X_{t-1} + \varepsilon_t
\]
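
Model selection by AIC picks the candidate with the smallest value. The sketch below applies that rule to the AIC values reported in Table 1; the dictionary layout and variable names are illustrative.

```python
# AIC values copied from Table 1
aic_by_model = {
    "ARIMA(1,0,0)": -51.56,
    "ARIMA(0,1,0)": -49.38,
    "ARIMA(0,0,1)": -46.41,
    "ARIMA(1,1,0)": -52.47,
    "ARIMA(1,0,1)": -51.13,
    "ARIMA(1,1,1)": -51.02,
}

# Lower AIC is better
best_model = min(aic_by_model, key=aic_by_model.get)

# Difference from the best model, sorted from best to worst
ranking = sorted(
    (round(aic - aic_by_model[best_model], 2), model)
    for model, aic in aic_by_model.items()
)
print(best_model)
print(ranking)
```

The differences show that several candidates are close to the best model, so the choice among them is not sharp.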

29 / 32
Advanced Models

Time dependent variance
• ARCH (Autoregressive Conditional Heteroskedasticity)
• GARCH (Generalized Autoregressive Conditional
Heteroskedasticity)

Multivariate
• VARMA (Vector Autoregression Moving Average)
• BVARMA (Bayesian Vector Autoregression Moving Average)

30 / 32
Intersection of Mathematics and Statistics

Under certain conditions
GARCH(1,1) ≈ Diffusion Model!

31 / 32
Thank you!

32 / 32

How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Allele Frequencies as Stochastic Processes: Mathematical & Statistical Approaches.

  • 1. Allele Frequencies as Stochastic Processes: Mathematical and Statistical Approaches. Gota Morota, Nov 30, 2010.
  • 2. Outline
      • Change of Allele Frequencies as Stochastic Processes
      • Steady State Distributions of Allele Frequencies
      • Time Series Analysis
  • 5. Various factors affecting allele frequencies
      • Selection, mutation and migration (crossbreeding) ⇒ systematic pressures (Wright 1949)
      • Random fluctuations
          1. Random sampling of gametes (genetic drift)
          2. Random fluctuation in systematic pressures
      ⇓
      Allele frequencies are functions of the systematic forces and the random components.
  • 6. Random walk ⇒ Brownian Motion
      [Four time plots of a random walk over increasing horizons. Figure 1: Time = [1:10]. Figure 2: Time = [1:100]. Figure 3: Time = [1:1000]. Figure 4: Time = [1:10000].]
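The passage from random walk to Brownian motion can be sketched numerically. The snippet below is a minimal illustration, not the code behind the slide's figures: it assumes ±1 steps (mean 0, variance 1) scaled by the square root of the time increment, and the names `scaled_random_walk` and `endpoint_variance` are hypothetical. Under that scaling, the endpoint at time t should have variance close to t.

```python
import math
import random


def scaled_random_walk(n_steps, t_end=1.0, seed=None):
    """Return a path on [0, t_end] built from n_steps +/-1 steps scaled by sqrt(dt)."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    path = [0.0]
    for _ in range(n_steps):
        step = rng.choice((-1.0, 1.0))  # unit step: mean 0, variance 1
        path.append(path[-1] + step * math.sqrt(dt))
    return path


def endpoint_variance(n_paths, n_steps, t_end):
    """Sample variance of the walk's endpoint across independent paths."""
    ends = [scaled_random_walk(n_steps, t_end, seed=i)[-1] for i in range(n_paths)]
    mean = sum(ends) / len(ends)
    return sum((e - mean) ** 2 for e in ends) / (len(ends) - 1)


if __name__ == "__main__":
    # The endpoint variance should be near t_end (here 1.5).
    print(endpoint_variance(n_paths=2000, n_steps=200, t_end=1.5))
```

Refining the grid (larger `n_steps`) leaves the endpoint variance at roughly `t_end`, which is the sense in which the rescaled walk approaches a process with N(0, t) marginals.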
  • 7. Brownian Motion ⇒ Diffusion Model
      [Figure 5: Time = [1:10000], a Brownian motion path] + conditional on systematic forces
      • Treat the change of allele frequencies as a stochastic process
      ⇓
      Diffusion Model
  • 8. Diffusion Model
      It frames the infinite number of paths that allele frequencies could take over time under certain systematic pressures.
      [Three panels of sample paths: Allele Frequency against Time, 0 to 10000.]
      • Pick a single time point t (say 5000 in the panels)
      • Try to find the PDF at point t
      • Need to solve a partial differential equation (PDE)
      • Fokker-Planck equation!
  • 9. Fokker-Planck Equation
      • Derived from a continuous-time stochastic process (X)
      • Partial differential equation:
        $$\frac{\partial \phi(p, x; t)}{\partial t} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\left\{V_{\delta x}\,\phi(p, x; t)\right\} - \frac{\partial}{\partial x}\left\{M_{\delta x}\,\phi(p, x; t)\right\} \tag{1}$$
      where
      • p: initial allele frequency (fixed)
      • x: allele frequency (random variable)
      • t: time (continuous variable)
      • $\phi(p, x; t)$: PDF
      • $V_{\delta x}$: variance of $\delta x$ (amount of change in allele frequency per unit time)
      • $M_{\delta x}$: mean of $\delta x$ (amount of change in allele frequency per unit time)
      • $V_{\delta x}$ and $M_{\delta x}$: both may depend on x and t
  • 10. Fokker-Planck Equation for Brownian Motion
      A standard Brownian motion can be constructed from a random walk whose errors have mean 0 and variance 1, under the right scaling. Its PDF at time t is N(0, t).
      • when t = 1.0, N(0, 1)
      • when t = 1.5, N(0, 1.5)
      Fokker-Planck equation, with $M_{\delta x} = 0$ and $V_{\delta x} = 1$ in equation (1):
        $$\frac{\partial \phi(p, x; t)}{\partial t} = \frac{1}{2}\frac{\partial^2}{\partial x^2}\phi(p, x; t) \tag{2}$$
        = Heat equation (3)
      Solution:
        $$\phi(p, x; t) = \frac{1}{\sqrt{2\pi t}}\exp\left(-\frac{x^2}{2t}\right) \tag{4}$$
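As a quick sanity check, one can confirm numerically that the N(0, t) density in equation (4) satisfies the heat equation (2). The sketch below uses central finite differences; the function names and the step size `h` are illustrative assumptions, not part of the original slides.

```python
import math


def heat_kernel(x, t):
    """Density of N(0, t) at x, i.e. the solution in equation (4)."""
    return math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)


def heat_residual(x, t, h=1e-3):
    """Finite-difference estimate of d(phi)/dt - 0.5 * d^2(phi)/dx^2 at (x, t)."""
    dphi_dt = (heat_kernel(x, t + h) - heat_kernel(x, t - h)) / (2.0 * h)
    d2phi_dx2 = (
        heat_kernel(x + h, t) - 2.0 * heat_kernel(x, t) + heat_kernel(x - h, t)
    ) / (h * h)
    return dphi_dt - 0.5 * d2phi_dx2


if __name__ == "__main__":
    # Residuals should be close to zero, up to discretization error.
    for x, t in [(0.3, 1.0), (-1.2, 1.5), (0.0, 10.0)]:
        print(x, t, heat_residual(x, t))
```

The residual is only approximately zero because the derivatives are approximated; shrinking `h` moderately reduces it until floating-point round-off takes over.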
  • 11. Solution of the Heat Equation (the Heat Kernel)
      [Figure: the heat kernel plotted against x from −2 to 2 for t = 0.00001, 0.01, 0.1, 1 and 10.]
  • 12. Under Random Genetic Drift
      $$M_{\delta x} = 0, \qquad V_{\delta x} = \frac{x(1 - x)}{2N_e}$$
      Fokker-Planck equation for random genetic drift:
        $$\frac{\partial \phi(p, x; t)}{\partial t} = \frac{1}{4N_e}\frac{\partial^2}{\partial x^2}\left\{x(1 - x)\,\phi(p, x; t)\right\} \tag{5}$$
      Solutions are obtained as infinite series by:
      • Kimura (1955): hypergeometric function
      • Korn and Korn (1968): Gegenbauer polynomial
        $$\phi = 6p(1 - p)\exp\left(-\frac{1}{2N_e}t\right) + 30p(1 - p)(1 - 2p)(1 - 2x)\exp\left(-\frac{3}{2N_e}t\right) + \cdots$$
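The decay rate in the leading term of the series can be illustrated with a small Wright-Fisher simulation. This is a sketch under stated assumptions: an ideal population in which the effective size equals the census size, binomial sampling of 2N gametes each generation, and hypothetical function names. In that model the expected heterozygosity 2x(1 − x) shrinks by a factor (1 − 1/(2N)) per generation, which is approximately exp(−t/(2N)).

```python
import random


def wright_fisher_path(p0, n_e, generations, rng):
    """Allele-frequency path under pure drift: resample 2*n_e gametes each generation."""
    two_n = 2 * n_e
    x = p0
    path = [x]
    for _ in range(generations):
        # Binomial(2N, x) draw implemented as a sum of Bernoulli trials.
        count = sum(1 for _ in range(two_n) if rng.random() < x)
        x = count / two_n
        path.append(x)
    return path


def mean_heterozygosity(p0, n_e, generations, n_reps, seed=0):
    """Average of 2x(1-x) at the final generation over independent replicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        x = wright_fisher_path(p0, n_e, generations, rng)[-1]
        total += 2.0 * x * (1.0 - x)
    return total / n_reps


if __name__ == "__main__":
    p0, n_e, t = 0.5, 20, 20
    simulated = mean_heterozygosity(p0, n_e, t, n_reps=3000)
    expected = 2.0 * p0 * (1.0 - p0) * (1.0 - 1.0 / (2 * n_e)) ** t
    print(simulated, expected)
```

The simulated and expected values should agree up to Monte Carlo noise; the simulation treats time in discrete generations, whereas the diffusion solution treats it as continuous.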
  • 13. Solution of FPE (Kimura 1955)
      [Figure reproduced from Kimura (1955), Vol. 41, p. 149.]
      Figs. 1-2: The processes of the change in the probability distribution of heterallelic classes, due to random sampling of gametes in reproduction. It is assumed that the population starts from the gene frequency 0.5 in Fig. 1 (left) and 0.1 in Fig. 2 (right). t = time in generations; N = effective size of the population; abscissa is gene frequency; ordinate is probability density.
  • 14. Under Selection and Random Genetic Drift
      $$M_{\delta x} = s\,x(1 - x), \qquad V_{\delta x} = \frac{x(1 - x)}{2N_e}$$
        $$\frac{\partial \phi(p, x; t)}{\partial t} = \frac{1}{4N_e}\frac{\partial^2}{\partial x^2}\left\{x(1 - x)\,\phi(p, x; t)\right\} - s\frac{\partial}{\partial x}\left\{x(1 - x)\,\phi(p, x; t)\right\} \tag{6}$$
      Solutions are obtained as infinite series using the oblate spheroidal equation, after a transformation of allele frequencies (z = 1 − 2x):
      • Kimura (1955)
      • Kimura and Crow (1956)
        $$\phi(p, x, t) = \sum_{k=0}^{\infty} C_k \exp(-\lambda_k t + 2cx)\,V_{1k}^{(1)}(z) \tag{7}$$
      where
        $$V_{1k}^{(1)}(z) = \sum_{n=0,1} f_n^{k}\,T_n^{1}(z)$$
  • 15. Kolmogorov Backward Equation
      • Derived from a continuous-time stochastic process (P)
      • Partial differential equation:
        $$\frac{\partial \phi(p, x; t)}{\partial t} = \frac{1}{2}V_{\delta p}\frac{\partial^2}{\partial p^2}\phi(p, x; t) + M_{\delta p}\frac{\partial}{\partial p}\phi(p, x; t) \tag{8}$$
      where
      • p: initial allele frequency (random variable)
      • x: allele frequency (a random variable in general, but its value at time t is held fixed)
      • t: time (continuous variable)
      • $\phi(p, x; t)$: PDF
      • $V_{\delta p}$: variance of $\delta p$ (amount of change in allele frequency)
      • $M_{\delta p}$: mean of $\delta p$ (amount of change in allele frequency)
      • $V_{\delta p}$ and $M_{\delta p}$: both may depend on p but not on t (time homogeneous)
  • 16. Steady State Distribution of Allele Frequencies
      Equilibrium
      • Single point (a balance between various forces that keeps allele frequencies near equilibrium)
      • PDF
      ⇓
      PDF of a stable equilibrium instead of a single point
      Steady state allele frequency distribution
      • Fisher (1922), (1930)
      • Wright (1931), (1937), (1938)
        $$\phi(p, x; t) = \text{solution of a Fokker-Planck equation} \tag{9}$$
        $$\lim_{t \to \infty} \phi(p, x; t) = \phi(x) \tag{10}$$
        $$\phi(x) = \frac{C}{V_{\delta x}}\exp\left(2\int \frac{M_{\delta x}}{V_{\delta x}}\,dx\right) \tag{11}$$
  • 17. Steady State Distribution: Random Genetic Drift
      For a large value of t, only the first few terms affect the actual form of the PDF.
        $$\phi = 6p(1 - p)\exp\left(-\frac{t}{2N_e}\right) + 30p(1 - p)(1 - 2p)(1 - 2x)\exp\left(-\frac{3t}{2N_e}\right) + \cdots$$
      Asymptotic formula:
        $$\lim_{t \to \infty} \phi = C \cdot \exp\left(-\frac{1}{2N_e}t\right)$$
  • 18. Graphical Representation (Wright 1931)
      [Page excerpt reproduced from Wright (1931):] "... is large can be found directly from the Poisson series, according to which the chance of drawing 0, where m is the mean number in a sample, is $e^{-m}$. The contribution to the 0 class will thus be $(e^{-1} + e^{-2} + e^{-3} + \cdots)f = \frac{e^{-1}}{1 - e^{-1}}f = 0.582f$."
      [Figure: frequency against factor frequency (25%, 50%, 75%).] FIGURE 3: Distribution of gene frequencies in an isolated population in which fixation and loss of genes each is proceeding at the rate 1/4N in the absence of appreciable selection or muta...
  • 19. Steady State Distribution: Selection and Mutation
      $$M_{\delta x} = -ux + v(1 - x) + \frac{x(1 - x)}{2}\frac{d\bar{a}}{dx}, \qquad V_{\delta x} = \frac{x(1 - x)}{2N_e}$$
        $$\phi(x) = C \cdot \exp(2N_e\bar{a})\,x^{4N_e v - 1}(1 - x)^{4N_e u - 1} \tag{12}$$
      When A has selective advantage s over a:
        $$\bar{a} = 2s\,x^2 + s \cdot 2x(1 - x) + 0 \cdot (1 - x)^2 = 2sx$$
        $$\phi(x) = C \cdot \exp(4N_e s x)\,x^{4N_e v - 1}(1 - x)^{4N_e u - 1} \tag{13}$$
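As a worked check, equation (12) can be recovered by substituting this slide's $M_{\delta x}$ and $V_{\delta x}$ into the general steady-state formula (11). The derivation below is a sketch that assumes $\bar{a}$ depends on x only and absorbs every constant factor into C.

```latex
\begin{align*}
\frac{2M_{\delta x}}{V_{\delta x}}
  &= \frac{4N_e}{x(1-x)}\left[-ux + v(1-x) + \frac{x(1-x)}{2}\frac{d\bar{a}}{dx}\right]
   = -\frac{4N_e u}{1-x} + \frac{4N_e v}{x} + 2N_e\frac{d\bar{a}}{dx}, \\
2\int \frac{M_{\delta x}}{V_{\delta x}}\,dx
  &= 4N_e u\ln(1-x) + 4N_e v\ln x + 2N_e\bar{a} + \text{const}, \\
\phi(x)
  &= \frac{C'}{V_{\delta x}}\exp\left(2\int \frac{M_{\delta x}}{V_{\delta x}}\,dx\right)
   = \frac{2N_e C'}{x(1-x)}\,e^{2N_e\bar{a}}\,x^{4N_e v}(1-x)^{4N_e u}
   = C\,e^{2N_e\bar{a}}\,x^{4N_e v-1}(1-x)^{4N_e u-1}.
\end{align*}
```

Setting $\bar{a} = 2sx$ in the last line gives the factor $\exp(4N_e s x)$ of equation (13).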
  • 20. Graphical Representation (Wright 1937)
      [Figure reproduced from Wright (1937), Proc. N. A. S., p. 308: Figs. 1, 2, 4, 5 and 6.]
  • 21. Time Series Analysis
      When a variable is measured sequentially in time, the resulting data form a time series.
      • Diffusion Model: continuous-time stochastic process
      • Time Series: discrete-time stochastic process
  • 22. Basic Models
      Observations close together in time tend to be correlated.
      • Autoregressive Model: AR(p)
        $$X_t = c + \sum_{i=1}^{p} \psi_i X_{t-i} + \epsilon_t \tag{14}$$
      • Moving Average Model: MA(q)
        $$X_t = c + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} + \epsilon_t \tag{15}$$
      • Autoregressive Moving Average Model: ARMA(p, q)
        $$X_t = \mathrm{AR}(p) + \mathrm{MA}(q) \tag{16}$$
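The AR model in equation (14) can be made concrete with its simplest case, AR(1). The sketch below simulates a series and recovers the coefficient by ordinary least squares of X_t on X_{t−1}. The function names, the Gaussian errors and the parameter values are illustrative assumptions; least squares is used here for brevity and is not necessarily the estimator behind the slides' results.

```python
import random


def simulate_ar1(psi, c, n, sigma=1.0, seed=0):
    """Simulate X_t = c + psi * X_{t-1} + eps_t with Gaussian eps_t (requires |psi| < 1)."""
    rng = random.Random(seed)
    x = [c / (1.0 - psi)]  # start at the stationary mean
    for _ in range(n - 1):
        x.append(c + psi * x[-1] + rng.gauss(0.0, sigma))
    return x


def fit_ar1(x):
    """Least-squares estimates (c_hat, psi_hat) from regressing X_t on X_{t-1}."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev = sum(prev) / n
    mean_curr = sum(curr) / n
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    sxx = sum((a - mean_prev) ** 2 for a in prev)
    psi_hat = sxy / sxx
    c_hat = mean_curr - psi_hat * mean_prev
    return c_hat, psi_hat


if __name__ == "__main__":
    series = simulate_ar1(psi=0.6, c=0.0, n=5000)
    print(fit_ar1(series))  # psi_hat should be near 0.6
```

With only a few dozen observations, as in a short allele-frequency series, the estimate would be far noisier than in this long simulated run.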
  • 23. Time Series as a Polynomial Equation
      $B^k X_t = X_{t-k}$ (backshift operator)
      • AR(p)
        $$X_t = \psi_1 X_{t-1} + \cdots + \psi_p X_{t-p}$$
        $$X_t = (\psi_1 B + \cdots + \psi_p B^p)X_t$$
        $$(1 - \psi_1 B - \cdots - \psi_p B^p)X_t = 0$$
      • ARMA(p, q)
        $$X_t - \psi_1 X_{t-1} - \cdots - \psi_p X_{t-p} = \epsilon_t + \theta_1 \epsilon_{t-1} + \cdots + \theta_q \epsilon_{t-q}$$
        $$(1 - \psi_1 B - \cdots - \psi_p B^p)X_t = (1 + \theta_1 B + \cdots + \theta_q B^q)\epsilon_t$$
  • 24. Stationary Process
      The mean and variance do not change over time. No trend.
      [Figure 6: Random Walk (not stationary). Figure 7: Detrended (looks stationary). Both plotted against Time, 0 to 10000.]
      Detrending:
      • linear regression
      • take a difference
      • Autoregressive Integrated Moving Average: ARIMA(p, d, q)
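The "take a difference" step can be sketched as follows. The example builds a Gaussian random walk, differences it once, and compares the lag-1 sample autocorrelation before and after. All names here are hypothetical, and the Gaussian step distribution is an assumption made for illustration.

```python
import random


def random_walk(n, seed=0):
    """Cumulative sum of standard normal steps: a non-stationary series."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(x[-1] + rng.gauss(0.0, 1.0))
    return x


def difference(x, d=1):
    """Apply the first-difference operator (1 - B) d times."""
    for _ in range(d):
        x = [b - a for a, b in zip(x[:-1], x[1:])]
    return x


def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


if __name__ == "__main__":
    walk = random_walk(5000)
    diff = difference(walk)
    # The walk is strongly autocorrelated; its first difference is roughly white noise.
    print(lag1_autocorr(walk), lag1_autocorr(diff))
```

The parameter `d` plays the role of the middle index in ARIMA(p, d, q): the number of differences applied before an ARMA model is fitted.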
  • 25. Application on Allele Frequencies
      • Influential SNPs: indicative of deterministic trends
      • Uninfluential SNPs: random fluctuation?
      • Diffusion Model: assumes a Markovian process
      • Time Series: which model describes the process of change of allele frequencies?
      Application
      • Objective: model the process of change of allele frequencies
      • Data: SNP genotypes of 4,798 Holstein bulls with 38,416 markers, and milk yield
      • Genotype imputation: fastPHASE 1.4
      • Estimation of marker effects: BayesCπ
  • 26. BayesCπ
      [Figure 8: GAW17. Title page of the paper:]
      Analysis of human mini-exome sequencing data using a Bayesian hierarchical mixture model: Genetic Analysis Workshop 17
      Bueno Filho JS (1,2,*), Morota G (1,*), Tran QT (3), Maenner MJ (4), Vera-Cala LM (4,5), Engelman CD (4,§), and Meyers KJ (4,§)
      1 Department of Dairy Science, University of Wisconsin-Madison, USA
      2 Departamento de Ciências Exatas, Universidade Federal de Lavras, Brasil
      3 Department of Statistics, University of Wisconsin-Madison, USA
      4 Department of Population Health Sciences, University of Wisconsin-Madison, USA
      5 Departamento de Salud Publica, Universidad Industrial de Santander, Colombia
      * Contributed equally to this work. § Corresponding author.
      Email addresses: JSB: jssbueno@dex.ufla.br; GM: morota@wisc.edu; QTT: tran@stat.wisc.edu; MJM: maenner@waisman.wisc.edu; LMV: veracala@wisc.edu; CDE: cengelman@wisc.edu; KJM: kjmeyers2@wisc.edu
  • 27. Allele Frequency of the Top Marker
      [Figure 9: Time plots of allele frequencies (Allele Frequency against Time, 0 to 30). Top: original series. Bottom: series detrended by taking the first-order difference.]
  • 28. Autocorrelation and Partial Autocorrelation
      ARIMA(1,1,1)?
      [Figure 10: ACF and PACF against lag (0 to 14). Top row: original series. Bottom row: first-order difference series.]
  • 29. Model Selection
      Table 1: Comparison of several competing models
        Model           AIC
        ARIMA(1,0,0)    -51.56
        ARIMA(0,1,0)    -49.38
        ARIMA(0,0,1)    -46.41
        ARIMA(1,1,0)    -52.47
        ARIMA(1,0,1)    -51.13
        ARIMA(1,1,1)    -51.02
      Selected model, ARIMA(1,1,0), which has the lowest AIC:
        $$X_t = 0.635\,X_{t-1} + \epsilon_t$$
  • 30. Advanced Models
      Time-dependent variance
      • ARCH (Autoregressive Conditional Heteroskedasticity)
      • GARCH (Generalized Autoregressive Conditional Heteroskedasticity)
      Multivariate
      • VARMA (Vector Autoregressive Moving Average)
      • BVARMA (Bayesian Vector Autoregressive Moving Average)
  • 31. Intersection of Mathematics and Statistics
      Under certain conditions, GARCH(1,1) ≈ Diffusion Model!