Inferring networks from multiple samples with consensus LASSO
1. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Joint network inference with the consensual
LASSO
Nathalie Villa-Vialaneix
Joint work with Matthieu Vignes, Nathalie Viguerie
and Magali San Cristobal
Séminaire de Statistique Parisien
Paris, 15 septembre 2014
http://www.nathalievilla.org
nathalie.villa@toulouse.inra.fr
Nathalie Villa-Vialaneix | Consensus Lasso 1/36
2. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Outline
1 Short overview on biological background
2 Network inference and GGM
3 Inference with multiple samples
4 Simulations
Nathalie Villa-Vialaneix | Consensus Lasso 2/36
3. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
DNA
DNA (DeoxyriboNucleic Acid (DNA)
molecule that encodes the genetic instructions used in the development
and functioning of all known living organisms and many viruses
double helix made with only four nucleotides (Adenine, Cytosine,
Thymine, Guanine): A binds with T and C with G.
Nathalie Villa-Vialaneix | Consensus Lasso 3/36
4. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Transcription
Transcription of DNA
transcription is the process by which a particular segment of DNA (called
gene) is copied into a single-strand RNA (which is called message RNA)
A is transcripted into U, T into A, C
into G and G into C
Nathalie Villa-Vialaneix | Consensus Lasso 4/36
5. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
What is mRNA used for?
mRNA then moves out of the cell nucleus and is translated into proteins
which are made of 20 different amino-acids (using an alphabet: 3 letters
of mRNA ! 1 amino-acid)
Proteins are used by the cell to function.
Nathalie Villa-Vialaneix | Consensus Lasso 5/36
6. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Gene expression
In a given cell, at a given time, all the genes are not translated or not
translated at the same level.
Gene expression
is a process by which a given gene is transcripted or/and traducted
It depends on the type of cell, the environment, ...
Nathalie Villa-Vialaneix | Consensus Lasso 6/36
7. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Gene expression
In a given cell, at a given time, all the genes are not translated or not
translated at the same level.
Gene expression
is a process by which a given gene is transcripted or/and traducted
It depends on the type of cell, the environment, ...
Gene expression can be measured either by:
“counting” the number of copies (mRNA) of a given gene in the cell:
transcriptomic data;
“counting” the quantity of a given protein in the cell: proteomic data.
Even though a given mRNA is translated into a unique protein, there is
no simple relationship between transcriptomic and proteomic data.
Nathalie Villa-Vialaneix | Consensus Lasso 6/36
8. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Transcriptomic data
How are transcriptomic data obtained?
DNA spots associated to target
genes (probes) are attached to a
solid surface (array)
Nathalie Villa-Vialaneix | Consensus Lasso 7/36
9. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Transcriptomic data
How are transcriptomic data obtained?
RNA material extracting for cells of
interest (blood, lipid tissue, muscle,
urine...) are labeled fluorescently
and applied on the array
When the binding between mRNA
and the probes is good, the spot
becomes fluorescent.
Expression of a given gene is quantified by the intensity of the
fluorescent signal (read by a scanner).
Nathalie Villa-Vialaneix | Consensus Lasso 7/36
10. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Typical microarray data
Data: large scale gene expression data
individuals
n ' 30=50
8>>><>>>:
X =
0BBBBBBBB@
: : : : : :
: : Xj
i : : :
: : : : : :
1CCCCCCCCA
| {z }
variables (genes expression); p'103=4
Nathalie Villa-Vialaneix | Consensus Lasso 8/36
11. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Typical microarray data
Data: large scale gene expression data
individuals
n ' 30=50
8>>><>>>:
X =
0BBBBBBBB@
: : : : : :
: : Xj
i : : :
: : : : : :
1CCCCCCCCA
| {z }
variables (genes expression); p'103=4
Typical design: two (or more) conditions (treated/control for instance)
More and more complicated designs: crossed conditions (treated/control
and different breeds), longitudinal data...
Nathalie Villa-Vialaneix | Consensus Lasso 8/36
12. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Typical microarray data
Data: large scale gene expression data
individuals
n ' 30=50
8>>><>>>:
X =
0BBBBBBBB@
: : : : : :
: : Xj
i : : :
: : : : : :
1CCCCCCCCA
| {z }
variables (genes expression); p'103=4
Typical design: two (or more) conditions (treated/control for instance)
More and more complicated designs: crossed conditions (treated/control
and different breeds), longitudinal data...
Typical issues: find differentially expressed genes (i.e., genes whose
expression is significantly different between the conditions)
Nathalie Villa-Vialaneix | Consensus Lasso 8/36
13. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Outline
1 Short overview on biological background
2 Network inference and GGM
3 Inference with multiple samples
4 Simulations
Nathalie Villa-Vialaneix | Consensus Lasso 9/36
14. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Systems biology
Instead of being used to produce proteins, some genes’ expressions
activate or repress other genes’ expressions ) understanding the whole
cascade helps to comprehend the global functioning of living organisms1
1Picture taken from: Abdollahi A et al., PNAS 2007, 104:12890-12895. c
2007 by
National Academy of Sciences
Nathalie Villa-Vialaneix | Consensus Lasso 10/36
15. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Model framework
Data: large scale gene expression data
individuals
n ' 30=50
8>>><>>>:X =
0BBBBBBBB@
: : : : : :
: : Xj
i : : :
: : : : : :
1CCCCCCCCA
| {z }
variables (genes expression); p'103=4
What we want to obtain: a graph/network with
nodes: (selected) genes;
edges: strong links between gene expressions.
Nathalie Villa-Vialaneix | Consensus Lasso 11/36
16. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Advantages of network inference
1 over raw data: focuses on the strongest direct relationships:
irrelevant or indirect relations are removed (more robust) and the
data are easier to visualize and understand (track transcription
relations).
Nathalie Villa-Vialaneix | Consensus Lasso 12/36
17. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Advantages of network inference
1 over raw data: focuses on the strongest direct relationships:
irrelevant or indirect relations are removed (more robust) and the
data are easier to visualize and understand (track transcription
relations).
Expression data are analyzed all together and not by pairs (systems
model).
Nathalie Villa-Vialaneix | Consensus Lasso 12/36
18. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Advantages of network inference
1 over raw data: focuses on the strongest direct relationships:
irrelevant or indirect relations are removed (more robust) and the
data are easier to visualize and understand (track transcription
relations).
Expression data are analyzed all together and not by pairs (systems
model).
2 over bibliographic network: can handle interactions with yet
unknown (not annotated) genes and deal with data collected in a
particular condition.
Nathalie Villa-Vialaneix | Consensus Lasso 12/36
19. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Using correlations
Relevance network [Butte and Kohane, 1999]
First (naive) approach: calculate correlations between expressions for all
pairs of genes, threshold the smallest ones and build the network.
Correlations Thresholding Graph
Nathalie Villa-Vialaneix | Consensus Lasso 13/36
20. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Using partial correlations
x
y z
strong indirect correlation
Nathalie Villa-Vialaneix | Consensus Lasso 14/36
21. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Using partial correlations
x
y z
strong indirect correlation
set.seed(2807); x <- rnorm(100)
y <- 2*x+1+rnorm(100,0,0.1); cor(x,y) [1] 0.998826
z <- 2*x+1+rnorm(100,0,0.1); cor(x,z) [1] 0.998751
cor(y,z) [1] 0.9971105
Nathalie Villa-Vialaneix | Consensus Lasso 14/36
22. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Using partial correlations
x
y z
strong indirect correlation
set.seed(2807); x <- rnorm(100)
y <- 2*x+1+rnorm(100,0,0.1); cor(x,y) [1] 0.998826
z <- 2*x+1+rnorm(100,0,0.1); cor(x,z) [1] 0.998751
cor(y,z) [1] 0.9971105
] Partial correlation
cor(lm(xz)$residuals,lm(yz)$residuals) [1] 0.7801174
cor(lm(xy)$residuals,lm(zy)$residuals) [1] 0.7639094
cor(lm(yx)$residuals,lm(zx)$residuals) [1] -0.1933699
Nathalie Villa-Vialaneix | Consensus Lasso 14/36
23. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Partial correlation and GGM
(Xi)i=1;:::;n are i.i.d. Gaussian random variables N(0; ) (gene
expression); then
j ! j0(genes j and j0 are linked) , Cor
Xj ; Xj0
j(Xk )k,j;j0
0
Nathalie Villa-Vialaneix | Consensus Lasso 15/36
24. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Partial correlation and GGM
(Xi)i=1;:::;n are i.i.d. Gaussian random variables N(0; ) (gene
expression); then
j ! j0(genes j and j0 are linked) , Cor
Xj ; Xj0
j(Xk )k,j;j0
0
If (concentration matrix) S = 1,
Cor
Xj ; Xj0
j(Xk )k,j;j0
=
Sjj0 p
SjjSj0 j0
) Estimate 1 to unravel the graph structure
Nathalie Villa-Vialaneix | Consensus Lasso 15/36
25. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Partial correlation and GGM
(Xi)i=1;:::;n are i.i.d. Gaussian random variables N(0; ) (gene
expression); then
j ! j0(genes j and j0 are linked) , Cor
Xj ; Xj0
j(Xk )k,j;j0
0
If (concentration matrix) S = 1,
Cor
Xj ; Xj0
j(Xk )k,j;j0
=
Sjj0 p
SjjSj0 j0
) Estimate 1 to unravel the graph structure
Problem: : p-dimensional matrix and n p ) (b
n)1 is a poor
estimate of S)!
Nathalie Villa-Vialaneix | Consensus Lasso 15/36
26. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Various approaches for inferring
networks with GGM
Graphical Gaussian Model
seminal work:
[Schäfer and Strimmer, 2005a, Schäfer and Strimmer, 2005b]
(with shrinkage and a proposal for a Bayesian test of significance)
estimate 1 by (b
n + I)1
use a Bayesian test to test which coefficients are significantly non zero.
Nathalie Villa-Vialaneix | Consensus Lasso 16/36
27. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Various approaches for inferring
networks with GGM
Graphical Gaussian Model
seminal work:
[Schäfer and Strimmer, 2005a, Schäfer and Strimmer, 2005b]
(with shrinkage and a proposal for a Bayesian test of significance)
estimate 1 by (b
n + I)1
use a Bayesian test to test which coefficients are significantly non zero.
relation with LM
8 j, estimate the linear model:
Xj =
31. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Various approaches for inferring
networks with GGM
Graphical Gaussian Model
seminal work:
[Schäfer and Strimmer, 2005a, Schäfer and Strimmer, 2005b]
(with shrinkage and a proposal for a Bayesian test of significance)
estimate 1 by (b
n + I)1
use a Bayesian test to test which coefficients are significantly non zero.
relation with LM
8 j, estimate the linear model:
Xj =
36. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Various approaches for inferring
networks with GGM
Graphical Gaussian Model
seminal work:
[Schäfer and Strimmer, 2005a, Schäfer and Strimmer, 2005b]
(with shrinkage and a proposal for a Bayesian test of significance)
estimate 1 by (b
n + I)1
use a Bayesian test to test which coefficients are significantly non zero.
sparse approaches:
[Meinshausen and Bühlmann, 2006, Friedman et al., 2008]:
8 j, estimate the linear model:
Xj =
43. jj0 = 0 for most j0 (variable selection)
Nathalie Villa-Vialaneix | Consensus Lasso 16/36
44. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Outline
1 Short overview on biological background
2 Network inference and GGM
3 Inference with multiple samples
4 Simulations
Nathalie Villa-Vialaneix | Consensus Lasso 17/36
45. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Motivation for multiple networks
inference
Pan-European project Diogenes2 (with Nathalie Viguerie, INSERM):
gene expressions (lipid tissues) from 204 obese women before and after
a low-calorie diet (LCD).
Assumption: A common
functioning exists
regardless the
condition;
Which genes are linked
independently
from/depending on the
condition?
2http://www.diogenes-eu.org/; see also [Viguerie et al., 2012]
Nathalie Villa-Vialaneix | Consensus Lasso 18/36
46. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Naive approach: independent
estimations
Notations: p genes measured in k samples, each corresponding to a
specific condition: (Xc
j )j=1;:::;p N(0;c), for c = 1; : : : ; k.
For c = 1; : : : ; k, nc independent observations (Xc
P ij )i=1;:::;nc and
c nc = n.
Independent inference
Estimation 8 c = 1; : : : ; k and 8 j = 1; : : : ; p,
Xc
j = Xc
nj
47. cj
+ c
j
are estimated (independently) by maximizing pseudo-likelihood:
L(SjX) =
Xk
c=1
Xp
j=1
Xnc
i=1
log P
Xc
ij jXci
;nj ; Sc
j
Nathalie Villa-Vialaneix | Consensus Lasso 19/36
48. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Related papers
Problem: previous estimation does not use the fact that the different
networks should be somehow alike!
Previous proposals
[Chiquet et al., 2011] replace c bye
c = 12
c + 12
and add a
sparse penalty;
Nathalie Villa-Vialaneix | Consensus Lasso 20/36
49. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Related papers
Problem: previous estimation does not use the fact that the different
networks should be somehow alike!
Previous proposals
[Chiquet et al., 2011] replace c bye
c = 12
c + 12
and add a
sparse penalty;
[Chiquet et al., 2011] LASSO and Group-LASSO type penalties to
force identical or sign-coherent edges between conditions:
X
jj0
sX
c
(Sc
jj0 )2 or
X
jj0
266666664
sX
c
(Sc
jj0 )2
+ +
sX
c
(Sc
jj0 )2
377777775
) Sc
jj0 = 0 8 c for most entries OR Sc
jj0 can only be of a given sign
(positive or negative) whatever c
Nathalie Villa-Vialaneix | Consensus Lasso 20/36
50. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Related papers
Problem: previous estimation does not use the fact that the different
networks should be somehow alike!
Previous proposals
[Chiquet et al., 2011] replace c bye
c = 12
c + 12
and add a
sparse penalty;
[Chiquet et al., 2011] LASSO and Group-LASSO type penalties to
force identical or sign-coherent edges between conditions
[Danaher et al., 2013] add the penalty
P
c,c0 kSc Sc0
kL1 ) (sparse
penalty over the concentration matrix entries forces to strong
consistency between conditions);
Nathalie Villa-Vialaneix | Consensus Lasso 20/36
51. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Related papers
Problem: previous estimation does not use the fact that the different
networks should be somehow alike!
Previous proposals
[Chiquet et al., 2011] replace c bye
c = 12
c + 12
and add a
sparse penalty;
[Chiquet et al., 2011] LASSO and Group-LASSO type penalties to
force identical or sign-coherent edges between conditions
[Danaher et al., 2013] add the penalty
P
c,c0 kSc Sc0
kL1 ) (sparse
penalty over the concentration matrix entries forces to strong
consistency between conditions);
[PMohan P
et al., 2012] add a group-LASSO like penalty
c,c0
j kSc
j Sc0
j kL2 that focuses on differences due to a few
number of nodes only.
Nathalie Villa-Vialaneix | Consensus Lasso 20/36
52. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Consensus LASSO
Proposal
Infer multiple networks by forcing them toward a consensual network: i.e.,
explicitly keeping the differences between conditions under control but
with a L2 penalty (allow for more differences than Group-LASSO type
penalties).
Original optimization:
max
(
54. c
jk j
:
Nathalie Villa-Vialaneix | Consensus Lasso 21/36
55. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Consensus LASSO
Proposal
Infer multiple networks by forcing them toward a consensual network: i.e.,
explicitly keeping the differences between conditions under control but
with a L2 penalty (allow for more differences than Group-LASSO type
penalties).
Original optimization:
max
(
65. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Consensus LASSO
Proposal
Infer multiple networks by forcing them toward a consensual network: i.e.,
explicitly keeping the differences between conditions under control but
with a L2 penalty (allow for more differences than Group-LASSO type
penalties).
Add a constraint to force inference toward a “consensus”
74. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Choice of a consensus: set one...
Typical case:
a prior network is known (e.g., from bibliography);
with no prior information, use a fixed prior corresponding to (e.g.)
global inference
) given (and fixed)
76. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Choice of a consensus: set one...
Typical case:
a prior network is known (e.g., from bibliography);
with no prior information, use a fixed prior corresponding to (e.g.)
global inference
) given (and fixed)
87. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Choice of a consensus: adapt one
during training...
Derive the consensus from the condition-specific estimates:
90. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Choice of a consensus: adapt one
during training...
Derive the consensus from the condition-specific estimates:
98. jkL1
where Sj() =b
njnj + 2AT ()A() where A() is a
[k(p 1) k(p 1)]-matrix that does not depend on j.
Nathalie Villa-Vialaneix | Consensus Lasso 23/36
99. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Computational aspects: optimization
2j
Tj
1j
Tj
12
Common framework
Objective function can be decomposed into:
convex part C(
105. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Computational aspects: optimization
2j
Tj
1j
Tj
12
Common framework
Objective function can be decomposed into:
convex part C(
116. j + h
4: until
Nathalie Villa-Vialaneix | Consensus Lasso 24/36
117. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Computational aspects: optimization
2j
Tj
1j
Tj
12
Common framework
Objective function can be decomposed into:
convex part C(
137. j )]j0 2 [1; 1] if j0 A
4: until
Nathalie Villa-Vialaneix | Consensus Lasso 24/36
138. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Computational aspects: optimization
2j
Tj
1j
Tj
12
Common framework
Objective function can be decomposed into:
convex part C(
167. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Computational aspects: optimization
2j
Tj
1j
Tj
12
Common framework
Objective function can be decomposed into:
convex part C(
197. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Bootstrap estimation ' BOLASSO
subsample n observations with replacement
x x
x
x
x
x
x
x
Nathalie Villa-Vialaneix | Consensus Lasso 25/36
198. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Bootstrap estimation ' BOLASSO
subsample n observations with replacement
x x
x
x
x
x
x
x
cLasso estimation
! for varying , (
200. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Bootstrap estimation ' BOLASSO
subsample n observations with replacement
x x
x
x
x
x
x
x
cLasso estimation
! for varying , (
201. ;b
jj0 )jj0
# threshold
keep the first T1 largest coefficients
Nathalie Villa-Vialaneix | Consensus Lasso 25/36
202. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Bootstrap estimation ' BOLASSO
subsample n observations with replacement
x x
x
x
x
x
x
x
cLasso estimation
! for varying , (
203. ;b
jj0 )jj0
# threshold
keep the first T1 largest coefficients
B
Frequency table
(1; 2) (1; 3) ... (j; j0) ...
130 25 ... 120 ...
Nathalie Villa-Vialaneix | Consensus Lasso 25/36
204. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Bootstrap estimation ' BOLASSO
subsample n observations with replacement
x x
x
x
x
x
x
x
cLasso estimation
! for varying , (
205. ;b
jj0 )jj0
# threshold
keep the first T1 largest coefficients
B
Frequency table
(1; 2) (1; 3) ... (j; j0) ...
130 25 ... 120 ...
! Keep the T2 most frequent pairs
Nathalie Villa-Vialaneix | Consensus Lasso 25/36
206. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Outline
1 Short overview on biological background
2 Network inference and GGM
3 Inference with multiple samples
4 Simulations
Nathalie Villa-Vialaneix | Consensus Lasso 26/36
207. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Simulated data
Expression data with known co-expression network
original network (scale free) taken from
http://www.comp-sys-bio.org/AGN/data.html (100 nodes,
200 edges, loops removed);
rewire a ratio r of the edges to generate k “children” networks
(sharing approximately 100(1 2r)% of their edges);
Nathalie Villa-Vialaneix | Consensus Lasso 27/36
208. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Simulated data
Expression data with known co-expression network
original network (scale free) taken from
http://www.comp-sys-bio.org/AGN/data.html (100 nodes,
200 edges, loops removed);
rewire a ratio r of the edges to generate k “children” networks
(sharing approximately 100(1 2r)% of their edges);
generate “expression data” with a random Gaussian process from
each chid:
use the Laplacian of the graph to generate a putative concentration
matrix;
use edge colors in the original network to set the edge sign;
correct the obtained matrix to make it positive;
invert to obtain a covariance matrix...;
... which is used in a random Gaussian process to generate
expression data (with noise).
Nathalie Villa-Vialaneix | Consensus Lasso 27/36
209. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
An example with k = 2, r = 5%
mother network3 first child second child
3actually the parent network. My co-author wisely noted that the mistake was
unforgivable for a feminist...
Nathalie Villa-Vialaneix | Consensus Lasso 28/36
210. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Choice for T2
Data: r = 0:05, k = 2 and n1 = n2 = 20
100 bootstrap samples, = 1, T1 = 250 or 500
l l 30 selections at least
30 selections at least
1.00
0.75
0.50
0.25
0.00
0.00 0.25 0.50 0.75
precision
recall
43 selections at least
l
l
40 selections at least
1.00
0.75
0.50
0.25
0.00
0.00 0.25 0.50 0.75 1.00
precision
recall
Dots correspond to best F = 2
precisionrecall
precision+recall
) Best F corresponds to selecting a number of edges approximately
equal to the number of edges in the original network.
Nathalie Villa-Vialaneix | Consensus Lasso 29/36
212. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Comparison with other approaches
Method compared (direct and bootstrap approaches)
independant Graphical LASSO estimation gLasso
methods implementated in the R package simone and described in
[Chiquet et al., 2011]: intertwinned LASSO iLasso, cooperative
LASSO coopLasso and group LASSO groupLasso
fused graphical LASSO as described in [Danaher et al., 2013] as
implemented in the R package fgLasso
Nathalie Villa-Vialaneix | Consensus Lasso 31/36
213. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Comparison with other approaches
Method compared (direct and bootstrap approaches)
independant Graphical LASSO estimation gLasso
methods implementated in the R package simone and described in
[Chiquet et al., 2011]: intertwinned LASSO iLasso, cooperative
LASSO coopLasso and group LASSO groupLasso
fused graphical LASSO as described in [Danaher et al., 2013] as
implemented in the R package fgLasso
consensus Lasso with
fixed prior (the mother network) cLasso(p)
fixed prior (average over the conditions of independant estimations)
cLasso(2)
adaptative estimation of the prior cLasso(m)
Nathalie Villa-Vialaneix | Consensus Lasso 31/36
214. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Comparison with other approaches
Method compared (direct and bootstrap approaches)
independant Graphical LASSO estimation gLasso
methods implementated in the R package simone and described in
[Chiquet et al., 2011]: intertwinned LASSO iLasso, cooperative
LASSO coopLasso and group LASSO groupLasso
fused graphical LASSO as described in [Danaher et al., 2013] as
implemented in the R package fgLasso
consensus Lasso with
fixed prior (the mother network) cLasso(p)
fixed prior (average over the conditions of independant estimations)
cLasso(2)
adaptative estimation of the prior cLasso(m)
Parameters set to: T1 = 500, B = 100, = 1
Nathalie Villa-Vialaneix | Consensus Lasso 31/36
215. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Selected results (best F)
rewired edges: 5% - conditions: 2 - sample size: 2 30
direct version
Method gLasso iLasso groupLasso coopLasso
0.28 0.35 0.32 0.35
Method fgLasso cLasso(m) cLasso(p) cLasso(2)
0.32 0.31 0.86 0.30
bootstrap version
Method gLasso iLasso groupLasso coopLasso
0.31 0.34 0.36 0.34
Method fgLasso cLasso(m) cLasso(p) cLasso(2)
0.36 0.37 0.86 0.35
Conclusions
bootstraping improves performance (except for iLasso and for large
r)
joint inference improves performance
using a good prior is (as expected) very efficient
adaptive approach for cLasso is better than naive approach
Nathalie Villa-Vialaneix | Consensus Lasso 32/36
216. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Real data
204 obese women ; expression of 221 genes before and after a LCD
= 1 ; T1 = 1000 (target density: 4%)
Distribution of the number of times an edge is selected over 100
bootstrap samples
1500
1000
500
0
0 25 50 75 100
Counts
Frequency
(70% of the pairs of nodes are never selected) ) T2 = 80
Nathalie Villa-Vialaneix | Consensus Lasso 33/36
217. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Networks
Before diet
l
l
l
l l
l
l
FADS2
l
l
l
l
l
l
l l
l
l l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
AZGP1
CIDEA
FADS1
GPD1L
PCK2
After diet
l
l
l
l l
l
l
FADS2
l
l
l
l
l
l
l l
l
l l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
AZGP1
CIDEA
FADS1
GPD1L
PCK2
densities about 1:3% - some interactions (both shared and specific)
make sense to the biologist
Nathalie Villa-Vialaneix | Consensus Lasso 34/36
218. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Thank you for your attention...
Programs available in the R package therese (on R-Forge)4. Joint work
with
Magali SanCristobal Matthieu Vignes
(GenPhySe, INRA Toulouse) (Massey University, NZ)
Nathalie Viguerie
(I2MC, INSERM Toulouse)
4https://r-forge.r-project.org/projects/therese-pkg
Nathalie Villa-Vialaneix | Consensus Lasso 35/36
219. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
Questions?
Nathalie Villa-Vialaneix | Consensus Lasso 36/36
220. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
References
Ambroise, C., Chiquet, J., and Matias, C. (2009).
Inferring sparse Gaussian graphical models with latent structure.
Electronic Journal of Statistics, 3:205–238.
Butte, A. and Kohane, I. (1999).
Unsupervised knowledge discovery in medical databases using relevance networks.
In Proceedings of the AMIA Symposium, pages 711–715.
Chiquet, J., Grandvalet, Y., and Ambroise, C. (2011).
Inferring multiple graphical structures.
Statistics and Computing, 21(4):537–553.
Danaher, P., Wang, P., and Witten, D. (2013).
The joint graphical lasso for inverse covariance estimation accross multiple classes.
Journal of the Royal Statistical Society Series B.
Forthcoming.
Friedman, J., Hastie, T., and Tibshirani, R. (2008).
Sparse inverse covariance estimation with the graphical lasso.
Biostatistics, 9(3):432–441.
Meinshausen, N. and Bühlmann, P. (2006).
High dimensional graphs and variable selection with the lasso.
Annals of Statistic, 34(3):1436–1462.
Mohan, K., Chung, J., Han, S., Witten, D., Lee, S., and Fazel, M. (2012).
Structured learning of Gaussian graphical models.
In Proceedings of NIPS (Neural Information Processing Systems) 2012, Lake Tahoe, Nevada, USA.
Osborne, M., Presnell, B., and Turlach, B. (2000).
Nathalie Villa-Vialaneix | Consensus Lasso 36/36
221. Short overview on biological background Network inference and GGM Inference with multiple samples Simulations
On the LASSO and its dual.
Journal of Computational and Graphical Statistics, 9(2):319–337.
Schäfer, J. and Strimmer, K. (2005a).
An empirical bayes approach to inferring large-scale gene association networks.
Bioinformatics, 21(6):754–764.
Schäfer, J. and Strimmer, K. (2005b).
A shrinkage approach to large-scale covariance matrix estimation and implication for functional genomics.
Statistical Applications in Genetics and Molecular Biology, 4:1–32.
Viguerie, N., Montastier, E., Maoret, J., Roussel, B., Combes, M., Valle, C., Villa-Vialaneix, N., Iacovoni, J., Martinez, J., Holst, C., Astrup,
A., Vidal, H., Clément, K., Hager, J., Saris, W., and Langin, D. (2012).
Determinants of human adipose tissue gene expression: impact of diet, sex, metabolic status and cis genetic regulation.
PLoS Genetics, 8(9):e1002959.
Nathalie Villa-Vialaneix | Consensus Lasso 36/36