This document is a presentation on regression analysis using the LASSO (Least Absolute Shrinkage and Selection Operator). It opens with an introduction, defines key terms such as the ordinary least squares (OLS) estimates, and reviews standard improvement techniques such as subset selection and ridge regression. The bulk of the presentation covers the LASSO itself: its definition, motivation, behavior in the orthonormal design case, a worked example, and algorithms for finding LASSO solutions. It concludes with simulation results. The presenter's goal is to explain the LASSO method for regression shrinkage and variable selection.
Reading the Lasso 1996 paper by Robert Tibshirani
1. READING SEMINAR ON CLASSICS
Regression Shrinkage and Selection via the LASSO
By Robert Tibshirani
Presented by Ulcinaite Agne
November 4, 2012
8. Table of Contents
1 Introduction
2 OLS estimates
   OLS critics
   Standard improving techniques
3 LASSO
   Definition
   Motivation for LASSO
   Orthonormal design case
   Function forms
   Example of prostate cancer
   Prediction error and estimation of t
4 Algorithm for finding LASSO solutions
5 Simulation
6 Conclusions
9. Introduction
The Article
Regression Shrinkage and Selection via the LASSO by Robert Tibshirani.
Published in 1996 in the Journal of the Royal Statistical Society, Series B (Methodological), Vol. 58, No. 1.
11. OLS estimates
We consider the usual regression situation. The data are $(x_i, y_i)$, $i = 1, \dots, N$, where $x_i = (x_{i1}, \dots, x_{ip})^T$ and $y_i$ are the regressors and the response for the $i$th observation.
The ordinary least squares (OLS) estimates minimize the residual sum of squares (RSS):
$$\mathrm{RSS} = \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2$$
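As a quick illustration (not part of the original slides), the OLS estimates can be computed with a standard least-squares solver. The data below are synthetic and all names are illustrative.

```python
import numpy as np

# Synthetic data: N observations, p regressors (illustrative only).
rng = np.random.default_rng(0)
N, p = 100, 8
X = rng.normal(size=(N, p))
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=N)

# OLS: minimize the RSS, with a column of ones for the intercept beta_0.
X1 = np.column_stack([np.ones(N), X])
beta_ols, *_ = np.linalg.lstsq(X1, y, rcond=None)
print("OLS estimates (intercept first):", np.round(beta_ols, 2))
```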
14. OLS critics
Two reasons why data analysts are often not satisfied with OLS estimates:
Prediction accuracy: OLS estimates have low bias but large variance.
Interpretation: with a large number of predictors, we would often prefer a smaller subset that exhibits the strongest effects.
17. Standard improving techniques
Subset selection: small changes in the data can result in very different models.
Ridge regression:
$$\hat{\beta}^{\mathrm{ridge}} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_j \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_j \beta_j^2 \le t$$
Ridge regression does not set any of the coefficients to 0, and hence does not give an easily interpretable model (a sketch follows below).
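As a hedged aside (not from the slides): scikit-learn's Ridge solves the penalized form, minimizing $\|y - X\beta\|^2 + \alpha \|\beta\|^2$, which is equivalent to the bound-$t$ form above for a suitable correspondence between $\alpha$ and $t$. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0]) + rng.normal(size=100)

# Larger alpha corresponds to a smaller bound t in the constrained form.
ridge = Ridge(alpha=1.0).fit(X, y)
print(np.round(ridge.coef_, 2))  # shrunk, but none exactly 0
```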
22. Definition
We consider the same data as in the OLS estimation case: $(x_i, y_i)$, $i = 1, \dots, N$, where $x_i = (x_{i1}, \dots, x_{ip})^T$.
The LASSO (Least Absolute Shrinkage and Selection Operator) estimate $(\hat{\alpha}, \hat{\beta})$ is defined by
$$(\hat{\alpha}, \hat{\beta}) = \operatorname*{argmin}_{\alpha,\beta} \sum_{i=1}^{N} \Big( y_i - \alpha - \sum_j \beta_j x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_j |\beta_j| \le t$$
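In practice the LASSO is usually fitted in its penalized (Lagrangian) form; scikit-learn's Lasso minimizes $\frac{1}{2N}\|y - \alpha - X\beta\|^2 + \lambda \sum_j |\beta_j|$, and each penalty value corresponds to some bound $t$ in the constrained form above. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0]) + rng.normal(size=100)

# Penalized form: each penalty value corresponds to some bound t.
lasso = Lasso(alpha=0.1).fit(X, y)
print(np.round(lasso.coef_, 2))  # some coefficients are exactly 0
```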
25. Definition
The amount of shrinkage applied to the estimates is controlled by the parameter $t \ge 0$.
Let $\hat{\beta}_j^o$ be the full least squares estimates and let $t_0 = \sum_j |\hat{\beta}_j^o|$.
Values $t < t_0$ will shrink the solutions towards 0, making some coefficients exactly equal to 0.
For example, taking $t = t_0/2$ has an effect roughly similar to finding the best subset of size $p/2$.
27. Motivation for LASSO
The LASSO came from the proposal of Breiman (1993). Breiman's non-negative garotte minimizes
$$\sum_{i=1}^{N} \Big( y_i - \alpha - \sum_j c_j \hat{\beta}_j^o x_{ij} \Big)^2 \quad \text{subject to} \quad c_j \ge 0, \;\; \sum_j c_j \le t$$
32. Orthonormal design case
Let X be the $n \times p$ design matrix with $ij$th entry $x_{ij}$, and suppose that $X^T X = I$.
The solution of the previous minimization problem is the soft-thresholded OLS estimate
$$\hat{\beta}_j = \mathrm{sign}(\hat{\beta}_j^o)\,(|\hat{\beta}_j^o| - \gamma)^+$$
Compare with the other methods (a numerical sketch follows below):
Best subset selection (of size k): keeps the k largest coefficients $|\hat{\beta}_j^o|$ and sets the rest to 0.
Ridge regression solutions: $\hat{\beta}_j^o / (1 + \gamma)$
Garotte estimates: $(1 - \gamma/\hat{\beta}_j^{o\,2})^+ \, \hat{\beta}_j^o$
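All four rules act coordinate-wise on the OLS estimates in the orthonormal case, so they are easy to compare numerically; a minimal sketch (the coefficient vector is illustrative):

```python
import numpy as np

def lasso_soft(b, g):    # sign(b) * (|b| - g)_+
    return np.sign(b) * np.maximum(np.abs(b) - g, 0.0)

def ridge_shrink(b, g):  # b / (1 + g)
    return b / (1.0 + g)

def garotte(b, g):       # (1 - g / b^2)_+ * b
    return np.maximum(1.0 - g / b**2, 0.0) * b

def best_subset(b, k):   # keep the k largest |b_j|, zero the rest
    out = np.zeros_like(b)
    keep = np.argsort(np.abs(b))[-k:]
    out[keep] = b[keep]
    return out

b = np.array([2.5, -1.2, 0.4, -0.1, 3.0])  # pretend OLS estimates
for f in (lasso_soft, ridge_shrink, garotte):
    print(f.__name__, np.round(f(b, 0.5), 2))
print("best_subset", best_subset(b, 2))
```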
34. Function forms
[Figure: form of the coefficient functions for (a) subset regression, (b) ridge regression, (c) the LASSO, (d) the garotte]
35. Estimation picture for (a) the LASSO and (b) ridge regression
[Figure not reproduced]
38. Example of prostate cancer
Data examined: from a study by Stamey (1989). A linear model is fitted to log(prostate specific antigen), lpsa.
The factors:
log(cancer volume), lcavol
log(prostate weight), lweight
age
log(benign prostatic hyperplasia amount), lbph
seminal vesicle invasion, svi
log(capsular penetration), lcp
Gleason score, gleason
percentage of Gleason scores 4 or 5, pgg45
39. Statistics of the example
Estimated coefficients and test error results for the different subset and shrinkage methods applied to the prostate data. Blank entries correspond to omitted variables. [Table not reproduced]
45. Prediction error and estimation of t
Methods for the estimation of the LASSO parameter t:
Cross-validation
Generalized cross-validation
An analytical unbiased estimate of risk
Strictly speaking, the first two methods apply to the 'X-random' case, and the third applies to the X-fixed case.
48. Prediction error and estimation of t
Suppose that
$$Y = \eta(X) + \varepsilon$$
where $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon) = \sigma^2$. The mean-squared error (ME) and prediction error (PE) are
$$\mathrm{ME} = E\{\hat{\eta}(X) - \eta(X)\}^2$$
$$\mathrm{PE} = E\{Y - \hat{\eta}(X)\}^2 = \mathrm{ME} + \sigma^2$$
53. Cross-validation
The prediction error (PE) is estimated by fivefold cross-validation. The LASSO is indexed in terms of the normalized parameter $s = t / \sum_j |\hat{\beta}_j^o|$, and PE is estimated over a grid of values of s from 0 to 1 inclusive (a sketch of the procedure follows this list):
Create a 5-fold partition of the dataset.
For each fold, all but one of the chunks are used for training and the remaining chunk for testing.
Repeat 5 times so that each chunk is used once for testing.
The value $\hat{s}$ yielding the lowest estimated PE is selected.
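A minimal sketch of this selection loop, with the caveat that scikit-learn indexes the LASSO by the penalty value rather than the normalized parameter s, so the grid below is over penalties (the selection logic is the same):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0]) + rng.normal(size=100)

alphas = np.logspace(-3, 0, 20)            # grid standing in for s
kf = KFold(n_splits=5, shuffle=True, random_state=0)
pe = []
for a in alphas:
    errs = []
    for tr, te in kf.split(X):             # 4 chunks train, 1 chunk tests
        fit = Lasso(alpha=a).fit(X[tr], y[tr])
        errs.append(np.mean((y[te] - fit.predict(X[te])) ** 2))
    pe.append(np.mean(errs))               # estimated PE at this grid point
print("selected penalty:", alphas[int(np.argmin(pe))])
```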
54. Generalized cross-validation
The constraint $\sum_j |\beta_j| \le t$ is rewritten as $\sum_j \beta_j^2 / |\beta_j| \le t$. The constrained solution $\tilde{\beta}$ can then be expressed as a ridge regression estimator
$$\tilde{\beta} = (X^T X + \lambda W^-)^{-1} X^T y$$
where $W = \mathrm{diag}(|\tilde{\beta}_j|)$ and $W^-$ denotes a generalized inverse. The number of effective parameters in the constrained fit $\tilde{\beta}$ may be approximated by
$$p(t) = \mathrm{tr}\{X (X^T X + \lambda W^-)^{-1} X^T\}$$
The generalized cross-validation style statistic is
$$\mathrm{GCV}(t) = \frac{1}{N} \frac{\mathrm{RSS}(t)}{\{1 - p(t)/N\}^2}$$
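A minimal numpy sketch of the statistic, assuming the LASSO fit $\tilde{\beta}$ and its matching multiplier $\lambda$ for the bound t are already available (both are assumptions of this sketch, not supplied by the slide):

```python
import numpy as np

def gcv(X, y, beta_tilde, lam):
    """GCV(t) for a LASSO fit beta_tilde with Lagrange multiplier lam
    (lam assumed known and paired with the bound t)."""
    N = len(y)
    # W^- : generalized inverse of diag(|beta_j|); zero entries stay zero.
    w_inv = np.array([1.0 / abs(b) if abs(b) > 1e-10 else 0.0
                      for b in beta_tilde])
    M = np.linalg.inv(X.T @ X + lam * np.diag(w_inv))
    p_t = np.trace(X @ M @ X.T)            # effective number of parameters
    rss = np.sum((y - X @ beta_tilde) ** 2)
    return rss / (N * (1.0 - p_t / N) ** 2)
```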
55. Unbiased estimate of risk
This method is based on Stein's (1981) unbiased estimate of risk.
Denote the estimated standard error of $\hat{\beta}_j^o$ by $\hat{\tau} = \hat{\sigma}/\sqrt{N}$, where $\hat{\sigma}^2 = \sum_i (y_i - \hat{y}_i)^2 / (N - p)$. Then the formula
$$R\{\hat{\beta}(\gamma)\} \approx \hat{\tau}^2 \Big\{ p - 2\,\#(j;\, |\hat{\beta}_j^o/\hat{\tau}| < \gamma) + \sum_{j=1}^{p} \max(|\hat{\beta}_j^o/\hat{\tau}|, \gamma)^2 \Big\}$$
is derived as an approximately unbiased estimate of the risk. Hence an estimate of $\gamma$ can be obtained as the minimizer of $R\{\hat{\beta}(\gamma)\}$:
$$\hat{\gamma} = \operatorname*{argmin}_{\gamma \ge 0} \big[ R\{\hat{\beta}(\gamma)\} \big]$$
From this we obtain an estimate of the LASSO parameter t:
$$\hat{t} = \sum_j (|\hat{\beta}_j^o| - \hat{\gamma})^+$$
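A minimal sketch of this recipe on synthetic data, following the slide's formulas literally; note the assumption flagged in the comments about which scale $\hat{\gamma}$ lives on when forming $\hat{t}$:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 100, 8
X = rng.normal(size=(N, p))
y = X @ np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0]) + rng.normal(size=N)

b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = np.sum((y - X @ b_ols) ** 2) / (N - p)
tau = np.sqrt(sigma2 / N)                  # estimated s.e. of beta_j^o
z = np.abs(b_ols) / tau                    # standardized coefficients

def risk(gamma):                           # the slide's approximate risk
    return tau**2 * (p - 2 * np.sum(z < gamma)
                     + np.sum(np.maximum(z, gamma) ** 2))

gammas = np.linspace(0.0, z.max(), 500)
g_hat = gammas[np.argmin([risk(g) for g in gammas])]
# Assumption: gamma is compared with |beta/tau| above, so we convert it
# back to the coefficient scale (g_hat * tau) before forming t_hat.
t_hat = np.sum(np.maximum(np.abs(b_ols) - g_hat * tau, 0.0))
print("gamma_hat:", round(g_hat, 3), " t_hat:", round(t_hat, 3))
```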
59. Algorithm for finding LASSO solutions
Fix $t \ge 0$. The problem of minimizing
$$g(\beta) = \sum_{i=1}^{N} \Big( y_i - \sum_j \beta_j x_{ij} \Big)^2$$
subject to $\sum_j |\beta_j| \le t$ can be seen as a least squares problem with $2^p$ inequality constraints, one for each possible sign pattern of the $\beta_j$.
Denote by G the $m \times p$ matrix corresponding to the m linear inequality constraints on the p-vector $\beta$; for our problem, $m = 2^p$.
The equality set E corresponds to those constraints which are exactly met.
65. Algorithm for finding LASSO solutions
Outline of the algorithm (a generic-solver sketch follows below):
1. Start with $E = \{i_0\}$, where $\delta_{i_0} = \mathrm{sign}(\hat{\beta}^o)$.
2. Find $\hat{\beta}$ to minimize $g(\beta)$ subject to $G_E \beta \le t\mathbf{1}$.
3. While $\sum_j |\hat{\beta}_j| > t$:
4. add $i$ to the set E, where $\delta_i = \mathrm{sign}(\hat{\beta})$, and find $\hat{\beta}$ to minimize $g(\beta) = \sum_{i=1}^{N} \big( y_i - \sum_j \beta_j x_{ij} \big)^2$ subject to $G_E \beta \le t\mathbf{1}$.
This procedure must converge in a finite number of steps, since one element is added to the set E at each step and there is a total of $2^p$ elements.
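For intuition, the same constrained least-squares problem can be handed to a generic solver; this is a hedged stand-in for, not an implementation of, the active-set procedure above (SLSQP copes with the nonsmooth constraint only approximately):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0]) + rng.normal(size=100)

def lasso_bound(X, y, t):
    """Minimize ||y - X b||^2 subject to sum_j |b_j| <= t."""
    cons = {"type": "ineq", "fun": lambda b: t - np.sum(np.abs(b))}
    res = minimize(lambda b: np.sum((y - X @ b) ** 2),
                   x0=np.zeros(X.shape[1]), method="SLSQP",
                   constraints=[cons])
    return res.x

print(np.round(lasso_bound(X, y, t=3.0), 2))
```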
72. Least angle regression algorithm (Efron 2004)
Least Angle Regression Algorithm (a path-tracing sketch follows this list):
1. Standardize the predictors to have mean zero and unit norm. Start with the residual $r = y - \bar{y}$ and $\beta_1 = \dots = \beta_p = 0$.
2. Find the predictor $x_j$ most correlated with r.
3. Move $\beta_j$ from 0 towards its least-squares coefficient $\langle x_j, r \rangle$, until some other competitor $x_k$ has as much correlation with the current residual as does $x_j$.
4. Move $\beta_j$ and $\beta_k$ in the direction defined by their joint least squares coefficient of the current residual on $(x_j, x_k)$, until some other competitor $x_l$ has as much correlation with the current residual.
5. If a non-zero coefficient hits zero, drop its variable from the active set of variables and recompute the current joint least squares direction.
6. Continue in this way until all p predictors have been entered. After min(N − 1, p) steps, we arrive at the full least-squares solution.
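scikit-learn exposes this path algorithm directly; with method='lasso' the drop step (5) is included, so the returned path is the LASSO path:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = X @ np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0]) + rng.normal(size=100)

# coefs[:, k] holds the coefficients at the kth step of the path.
alphas, active, coefs = lars_path(X, y, method="lasso")
print("order in which variables entered:", active)
print("final (least-squares) coefficients:", np.round(coefs[:, -1], 2))
```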
75. Simulation
In the example, 50 data sets consisting of 20 observations from the model
$$y = \beta^T x + \sigma\varepsilon$$
were simulated, where $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0)^T$ and $\varepsilon$ is standard normal.
[Table: mean-squared errors over 200 simulations from the model]
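A hedged re-creation of this setup; the slide does not specify the predictor distribution, the noise scale $\sigma$, or the penalty used, so all three are assumed below:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
sigma = 3.0                              # assumed noise scale
mses = []
for _ in range(50):                      # 50 data sets of 20 observations
    Xs = rng.normal(size=(20, 8))        # assumed i.i.d. N(0,1) predictors
    ys = Xs @ beta + sigma * rng.normal(size=20)
    fit = Lasso(alpha=0.5).fit(Xs, ys)   # penalty chosen arbitrarily
    mses.append(np.mean((fit.coef_ - beta) ** 2))
print("mean coefficient MSE over 50 data sets:", round(float(np.mean(mses)), 3))
```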
76. Simulation
[Tables: most frequent models selected by the LASSO and most frequent models selected by subset regression]
78. Conclusions
The LASSO is a worthy competitor to subset selection and ridge regression.
Performance in different scenarios:
Small number of large effects: subset selection does best, the LASSO not quite as well, and ridge regression quite poorly.
Small to moderate number of moderate-size effects: the LASSO does best, followed by ridge regression and then subset selection.
Large number of small effects: ridge regression does best, followed by the LASSO and then subset selection.
79. References
Robert Tibshirani (1996). Regression Shrinkage and Selection via the LASSO. Journal of the Royal Statistical Society, Series B 58(1), 267–288.
Trevor Hastie, Robert Tibshirani, Jerome Friedman (2008). The Elements of Statistical Learning. Springer-Verlag, 57–73.
Abhimanyu Das, David Kempe. Algorithms for Subset Selection in Linear Regression.
Yizao Wang (2007). A Note on the LASSO in Model Selection.