This document discusses Bayesian variable selection methods for regression models. It begins by reviewing traditional ANOVA tables and their limitations for modern applications with many variables, such as genome-wide association studies (GWAS). It then introduces Bayesian approaches that build variable selection directly into the regression model through the prior. Several selection methods based on different prior distributions are described: spike-and-slab priors, stochastic search variable selection (SSVS), and the normal-exponential-gamma (NEG) distribution. The document discusses how these methods can be implemented with MCMC sampling, compares their performance, and covers extensions such as random effects and polynomial terms.
Bayesian Variable Selection Using Priors
1. Bayesian Variable Selection and the (Ab)use of Priors
Bob O'Hara
BiK-F
Frankfurt am Main
Germany
blogs.nature.com/boboh/2012/07/16/abusing_a_prior
(this is mainly a review of work by other people)
3. Not Useful for Modern Applications
GWAS: 10⁵ variables
Ikram MK et al. (2010) Four Novel Loci (19q13, 6q24, 12q24, and 5q14) Influence the Microcirculation In Vivo. PLoS Genet 6(10): e1001184.
4. Anyway, we want to be Bayesian
Could use DIC, but same problems
So, let's build variable selection into the model
22. Indicators
Iₖ – an indicator that variable k is in the model
P(Iₖ = 1) = p
θ ~ N(0, σβ²)
P(β) = (1 − I)·0 + I·θ
Then integrate over the Iₖ by MCMC
Gibbs sampling should work nicely
23. A Problem with Gibbs Sampling
P(β) = (1 − I)·0 + I·θ
When I = 0, θ drops out of the likelihood, so it depends only on its prior.
The MCMC then draws wide values of θ, and only rarely draws “sensible” values, so an excluded variable hardly ever re-enters the model.
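The indicator model on these two slides can be sketched in a few lines of NumPy. Everything below (the data, the prior settings, K = 6 variables) is a toy example of my own, not code from the talk; it implements the plain Gibbs scheme in which θₖ is drawn from its wide prior whenever Iₖ = 0, which is exactly why excluded variables only rarely re-enter:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 2 real effects among 6 candidate variables (illustrative
# values, not from the talk).
n, K = 100, 6
X = rng.normal(size=(n, K))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
sigma = 1.0
y = X @ beta_true + rng.normal(scale=sigma, size=n)

p_incl, slab_var = 0.5, 100.0  # P(I_k = 1) and a deliberately wide prior variance
I = np.ones(K, dtype=int)
theta = np.zeros(K)
n_iter = 2000
incl = np.zeros(K)

for _ in range(n_iter):
    for k in range(K):
        # Partial residual: y with every variable except k subtracted off
        r = y - X @ (I * theta) + X[:, k] * (I[k] * theta[k])
        if I[k] == 1:
            # Full conditional for theta_k (conjugate normal update)
            prec = X[:, k] @ X[:, k] / sigma**2 + 1.0 / slab_var
            mu = (X[:, k] @ r / sigma**2) / prec
            theta[k] = rng.normal(mu, 1.0 / np.sqrt(prec))
        else:
            # theta_k drops out of the likelihood, so Gibbs draws it from
            # the wide prior -- this is the source of the mixing problem
            theta[k] = rng.normal(0.0, np.sqrt(slab_var))
        # Update the indicator given the current theta_k
        ll1 = -0.5 * np.sum((r - X[:, k] * theta[k]) ** 2) / sigma**2
        ll0 = -0.5 * np.sum(r**2) / sigma**2
        logit = np.clip(np.log(p_incl / (1 - p_incl)) + ll1 - ll0, -30, 30)
        I[k] = int(rng.random() < 1.0 / (1.0 + np.exp(-logit)))
    incl += I

print(np.round(incl / n_iter, 2))  # posterior inclusion probabilities
```

With the wide slab, the two real effects stay in the model, but the sampler's sticky behaviour on the null variables is by design: once out, a variable needs a lucky prior draw of θₖ to get back in.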
24. A Better Version: GVS
θ ~ N(0, σβ²(I)) – a pseudo-prior
P(β) = I·θ
Now if I = 0, generate θ from the pseudo-prior, tuned to propose sensible values:
select σβ²(0) to cover the likely values of the posterior
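A minimal GVS sketch under the same toy assumptions (data and tuning choices are mine, not from the talk): the pseudo-prior for each θₖ is centred on a pilot least-squares estimate, and its density now enters the indicator update because P(θₖ | Iₖ) differs between the two states:

```python
import numpy as np

rng = np.random.default_rng(1)

# Same kind of toy data as before (illustrative values, not from the talk)
n, K = 100, 6
X = rng.normal(size=(n, K))
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0, 0.0])
sigma = 1.0
y = X @ beta_true + rng.normal(scale=sigma, size=n)

def norm_logpdf(x, m, v):
    return -0.5 * ((x - m) ** 2 / v + np.log(2 * np.pi * v))

# Pseudo-prior tuned from a pilot least-squares fit: roughly where the
# posterior for each theta_k would sit
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
se2 = sigma**2 * np.diag(np.linalg.inv(X.T @ X))

p_incl, slab_var = 0.5, 100.0
I = np.ones(K, dtype=int)
theta = beta_hat.copy()
n_iter, incl = 2000, np.zeros(K)

for _ in range(n_iter):
    for k in range(K):
        r = y - X @ (I * theta) + X[:, k] * (I[k] * theta[k])
        if I[k] == 1:
            prec = X[:, k] @ X[:, k] / sigma**2 + 1.0 / slab_var
            mu = (X[:, k] @ r / sigma**2) / prec
            theta[k] = rng.normal(mu, 1.0 / np.sqrt(prec))
        else:
            # Draw from the tuned pseudo-prior, not the wide slab
            theta[k] = rng.normal(beta_hat[k], np.sqrt(se2[k]))
        # Indicator update: the pseudo-prior density enters the odds,
        # because P(theta_k | I_k) differs between I_k = 0 and I_k = 1
        ll1 = -0.5 * np.sum((r - X[:, k] * theta[k]) ** 2) / sigma**2
        ll0 = -0.5 * np.sum(r**2) / sigma**2
        lp1 = norm_logpdf(theta[k], 0.0, slab_var)
        lp0 = norm_logpdf(theta[k], beta_hat[k], se2[k])
        logit = np.clip(np.log(p_incl / (1 - p_incl)) + ll1 + lp1 - ll0 - lp0, -30, 30)
        I[k] = int(rng.random() < 1.0 / (1.0 + np.exp(-logit)))
    incl += I

print(np.round(incl / n_iter, 2))
```

Because excluded variables now get "sensible" proposals, they can re-enter the model easily, and the inclusion probabilities separate the real effects from the noise much more cleanly.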
31. Normal Exponential Gamma
Integrate the Lasso's rate parameter µ over a Gamma distribution:
β ~ N(0, σβ²)
σβ² ~ Exp(µ)
µ ~ Γ(λ, γ²)
(plot: the NEG density against the exponential (Lasso) prior)
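The hierarchy can be sampled by composition. In this sketch I assume the parameterisation Γ(shape λ, scale γ²) and an exponential with rate µ (NumPy's exponential takes the scale, 1/µ); the parameter values are my own toy choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw from the NEG prior by composing the hierarchy on the slide
lam, gamma2 = 3.0, 1.0
n = 200_000
mu = rng.gamma(lam, gamma2, size=n)           # mu ~ Gamma(lambda, gamma^2)
sigma_beta2 = rng.exponential(1.0 / mu)       # sigma_beta^2 | mu ~ Exp(mu)
beta = rng.normal(0.0, np.sqrt(sigma_beta2))  # beta | sigma_beta^2 ~ N(0, sigma_beta^2)

# Tail mass well beyond what a normal of the same variance would allow
print(np.mean(np.abs(beta) > 3))
```

The marginal NEG density has a sharp peak at zero and much heavier tails than a normal of the same variance, which is what lets it shrink noise hard while leaving large effects alone.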
32. NEG & Lasso
GWAS are too big for MCMC
Use quicker algorithms and only estimate posterior modes
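As an illustration of mode-only fitting, here is coordinate-wise soft-thresholding for the simplest case, the Lasso posterior mode (double-exponential prior); the NEG mode can be found in the same coordinate-wise way, just with a different thresholding rule. The data and penalty below are made-up toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 200, 10
X = rng.normal(size=(n, K))
beta_true = np.zeros(K)
beta_true[:2] = [3.0, -2.0]
y = X @ beta_true + rng.normal(size=n)

lam = 60.0          # penalty strength -- a hand-picked toy value
beta = np.zeros(K)
for _ in range(100):                          # coordinate-descent sweeps
    for k in range(K):
        r = y - X @ beta + X[:, k] * beta[k]  # partial residual
        rho = X[:, k] @ r
        z = X[:, k] @ X[:, k]
        # Soft-thresholding: the 1-d posterior mode under a
        # double-exponential prior; small rho is set exactly to zero
        beta[k] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
print(np.round(beta, 2))
```

Null coefficients land exactly at zero (the selection), and the survivors are shrunk slightly towards zero; no MCMC is needed, which is what makes this feasible at GWAS scale.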
33. How do they compare?
We want good separation between the variables that belong in the model and those that don't
(plot: examples of good and bad separation)
34. Comparison
Laplace – awful: shrinks everything
GVS – works well (when tuned), but slower
SSVS – works well
Jeffreys – works very well
35. Fixed and Random Effects
Rather than fixing the prior parameters, we can treat them as random effects and let the data tune them,
e.g. SSVS:
β ~ N(0, σ_I²)
σ₁² ~ Γ(·), σ₀² = c·σ₁² (c ≪ 1)
Useful with many variables: we can learn about the scale of the response
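Under this two-component SSVS prior, the conditional inclusion probability given β is just a ratio of the two normal densities. A small sketch with made-up numbers (σ₁² held fixed here for illustration, rather than given the Γ(·) prior above):

```python
import numpy as np

def p_include(beta, p=0.5, sigma1_2=1.0, c=0.01):
    """P(I = 1 | beta) under the two-component SSVS normal prior."""
    sigma0_2 = c * sigma1_2            # spike: much narrower than the slab
    d1 = np.exp(-0.5 * beta**2 / sigma1_2) / np.sqrt(sigma1_2)
    d0 = np.exp(-0.5 * beta**2 / sigma0_2) / np.sqrt(sigma0_2)
    return p * d1 / (p * d1 + (1 - p) * d0)

for b in [0.0, 0.1, 0.5, 2.0]:
    print(b, round(float(p_include(b)), 3))
```

Near-zero coefficients are absorbed by the spike (P(I = 1 | β = 0) ≈ 0.09 with these settings), while anything outside its width is claimed by the slab.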
36. Random Effects
Variables not in the model get P(I = 1 | data) = P(I = 1)
59. Response: respiration
Predictors:
Climate (PCA)
Slope, latitude, longitude etc.
Number of shrub species
Most probably have some effect (or are correlated with something that does)
61. If we want to do variable selection...
We should first think about our priors.
If our subjective prior doesn't shrink properly, either don't select variables, or admit to yourself that you're abusing your priors.
62. Thank you for not abusing me
blogs.nature.com/boboh/2012/07/16/abusing_a_prior