READING SEMINAR ON CLASSICS




   Regression Shrinkage and Selection via the LASSO
                                 By Robert Tibshirani


                              Presented by Ulcinaite Agne


                                  November 4, 2012




Outline
1   Introduction
2   OLS estimates
     OLS critics
     Standard improving techniques
3   LASSO
      Definition
      Motivation for LASSO
      Orthonormal design case
      Function forms
      Example of prostate cancer
      Prediction error and estimation of t
4   Algorithm for finding LASSO solutions
5   Simulation
6   Conclusions
Introduction




The Article
     Regression Shrinkage and Selection via the LASSO by Robert
     Tibshirani
     Published in 1996 in the Journal of the Royal Statistical Society, Series B
     (Methodological), Vol. 58, No. 1, pp. 267–288




OLS estimates



We consider the usual regression situation.
The data: $(x_i, y_i)$, $i = 1, \dots, N$, where $x_i = (x_{i1}, \dots, x_{ip})^T$ and $y_i$
are the regressors and the response for the $i$th observation.
The ordinary least squares (OLS) estimates minimize the residual sum of
squares (RSS):

$$\mathrm{RSS} = \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2$$




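An illustrative sketch (not from the original slides) of computing these OLS estimates numerically in Python; the data-generating choices and variable names below are assumptions made only for the example.

import numpy as np

# Illustrative data: N observations, p regressors, an intercept, Gaussian noise.
rng = np.random.default_rng(0)
N, p = 100, 5
X = rng.normal(size=(N, p))                      # regressors x_i
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0])
y = 1.0 + X @ beta_true + rng.normal(size=N)     # responses y_i

# OLS: minimize the RSS above. lstsq solves the least-squares problem directly.
X1 = np.column_stack([np.ones(N), X])            # prepend a column of ones for beta_0
beta_ols = np.linalg.lstsq(X1, y, rcond=None)[0]
print(beta_ols)                                  # first entry estimates the intercept beta_0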
OLS critics




The two reasons why data analysts are often not satisfied with OLS
estimates:

     Prediction accuracy: OLS estimates often have low bias but large variance
     Interpretation: with a large number of predictors, it is preferable to
     have a smaller subset that exhibits the strongest effects




Standard improving techniques


     Subset selection: small changes in data can result in very different
     models
     Ridge regression:

     $$\hat\beta^{\text{ridge}} = \arg\min_{\beta} \Big\{ \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j} \beta_j x_{ij} \Big)^2 \Big\}$$

     subject to
     $$\sum_{j} \beta_j^2 \le t$$

     Does not set any of the coefficients to 0 and hence does not give an
     easily interpretable model


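An illustrative sketch of ridge regression in its equivalent penalized (Lagrangian) form, where lambda plays the role of the multiplier associated with the bound t; it assumes centred inputs so that the intercept is not penalised.

import numpy as np

def ridge_fit(X, y, lam):
    # Ridge coefficients from the normal equations: (X^T X + lam * I)^{-1} X^T y.
    # Assumes the columns of X and y have been centred, so no intercept is penalised.
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

Larger values of lam shrink all coefficients towards 0 but, as noted above, never set any of them exactly to 0.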
Definition


We are considering the same data as in the OLS estimation case:
$(x_i, y_i)$, $i = 1, \dots, N$, where $x_i = (x_{i1}, \dots, x_{ip})^T$.

The LASSO (Least Absolute Shrinkage and Selection Operator) estimate
$(\hat\alpha, \hat\beta)$ is defined by

$$(\hat\alpha, \hat\beta) = \arg\min \Big\{ \sum_{i=1}^{N} \Big( y_i - \alpha - \sum_{j} \beta_j x_{ij} \Big)^2 \Big\}$$

subject to
$$\sum_{j} |\beta_j| \le t$$




Definition



The amount of shrinkage is controlled by the parameter $t \ge 0$, which is applied
to the estimates.
Let $\hat\beta_j^{\,o}$ be the full least squares estimates and let $t_0 = \sum_j |\hat\beta_j^{\,o}|$.
Values $t < t_0$ will shrink the solutions towards 0, setting some coefficients
exactly equal to 0.
For example, taking $t = t_0/2$ has an effect roughly similar to
finding the best subset of size $p/2$.




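As an illustration (not part of the slides), the LASSO can be fitted with scikit-learn, which solves the equivalent penalized form min (1/2N) RSS + alpha * sum_j |beta_j| rather than the bound-constrained form above; each bound t corresponds to some penalty alpha. The data below are simulated purely for the example.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 1.0 + X @ np.array([3.0, 1.5, 0.0, 0.0, 2.0]) + rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)    # alpha controls the amount of shrinkage
print(lasso.intercept_, lasso.coef_)  # some coefficients are shrunk exactly to 0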
Motivation for LASSO



The LASSO came from the proposal of Breiman (1993).
Breiman’s non-negative garotte minimizes

$$\sum_{i=1}^{N} \Big( y_i - \alpha - \sum_{j} c_j \hat\beta_j^{\,o} x_{ij} \Big)^2$$

subject to
$$c_j \ge 0, \qquad \sum_{j} c_j \le t.$$




Orthonormal design case



Let X be the $n \times p$ design matrix with $ij$th entry $x_{ij}$, and suppose $X^T X = I$.
The solution of the previous minimization problem is

$$\hat\beta_j = \mathrm{sign}(\hat\beta_j^{\,o})\,\big(|\hat\beta_j^{\,o}| - \gamma\big)^{+}$$

For comparison, in the orthonormal case:
     Best subset selection (of size $k$): keep the $k$ largest coefficients
     $\hat\beta_j^{\,o}$ in absolute value and set the rest to 0
     Ridge regression solutions: $\frac{1}{1+\gamma}\hat\beta_j^{\,o}$
     Garotte estimates: $\big(1 - \gamma/\hat\beta_j^{\,o\,2}\big)^{+}\hat\beta_j^{\,o}$




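The orthonormal-case LASSO solution above is exactly soft thresholding of the least-squares estimates; a minimal sketch with illustrative values:

import numpy as np

def soft_threshold(beta_ols, gamma):
    # sign(b) * (|b| - gamma)_+ : the LASSO estimate when X^T X = I.
    return np.sign(beta_ols) * np.maximum(np.abs(beta_ols) - gamma, 0.0)

print(soft_threshold(np.array([2.5, -0.3, 1.0]), 0.5))   # -> [ 2.  -0.   0.5]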
Function forms




(a) Subset regression, (b) ridge regression, (c) the LASSO, (d) the garrotte
Estimation picture for (a) the LASSO and (b) ridge regression


Example of prostate cancer
Data examined: from a study by Stamey (1989).
A linear model is fitted to log(prostate specific antigen) lpsa.
The factors:
     log(cancer volume) lcavol
     log(prostate weight) lweight
     age
     log(benign prostatic hyperplasia amount) lbph
     seminal vesicle invasion svi
     log(capsular penetration) lcp
     Gleason score gleason
     percentage of Gleason scores 4 or 5 pgg45


Statistics of the example




Estimated coefficients and test error results, for different subset and
shrinkage methods applied to the prostate data. The blank entries
correspond to variables omitted.

Prediction error and estimation of t




Methods for the estimation of the LASSO parameter t:
     Cross-validation
     Generalized cross-validation
     Analytical unbiased estimate of risk

Strictly speaking, the first two methods are applicable in the 'X-random'
case, and the third method applies to the 'X-fixed' case.




Prediction error and estimation of t



Suppose that
$$Y = \eta(X) + \varepsilon$$
where $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon) = \sigma^2$.

The mean-squared error of an estimate $\hat\eta(X)$ is
$$\mathrm{ME} = E\{\hat\eta(X) - \eta(X)\}^2$$

and the prediction error is
$$\mathrm{PE} = E\{Y - \hat\eta(X)\}^2 = \mathrm{ME} + \sigma^2$$




Cross-validation



The prediction error (PE) is estimated by fivefold cross-validation. The
LASSO is indexed in terms of the normalised parameter $s = t / \sum_j |\hat\beta_j^{\,o}|$, and PE
is estimated over a grid of values of $s$ from 0 to 1 inclusive.
     Create a 5-fold partition of the dataset.
     For each fold, all but one of the chunks are used for training and the
     remaining chunk for testing.
     Repeat 5 times, so that each chunk is used once for testing.
     The value $\hat{s}$ yielding the lowest estimated PE is selected.




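A hedged sketch of this selection step with scikit-learn: LassoCV indexes the path by the penalty alpha of the Lagrangian form rather than by the normalised bound s used above, but the idea of keeping the value with the lowest cross-validated prediction error is the same. The data are simulated only for the example.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 1.5, 0.0, 0.0, 2.0]) + rng.normal(size=100)

cv_fit = LassoCV(cv=5).fit(X, y)   # fivefold cross-validation over a grid of penalties
print(cv_fit.alpha_)               # penalty value with the lowest estimated prediction error
print(cv_fit.coef_)                # corresponding coefficients, some exactly 0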
Generalized Cross-validation

The constraint $\sum_j |\beta_j| \le t$ is re-written as $\sum_j \beta_j^2/|\beta_j| \le t$. So the constrained
solution $\tilde\beta$ can be expressed as the ridge regression estimator

$$\tilde\beta = (X^T X + \lambda W^{-})^{-1} X^T y$$

where $W = \mathrm{diag}(|\tilde\beta_j|)$ and $W^{-}$ denotes a generalized inverse. The number
of effective parameters in the constrained fit $\tilde\beta$ may be approximated by

$$p(t) = \mathrm{tr}\{X (X^T X + \lambda W^{-})^{-1} X^T\}$$

The generalised cross-validation style statistic is

$$\mathrm{GCV}(t) = \frac{1}{N}\,\frac{\mathrm{RSS}(t)}{\{1 - p(t)/N\}^2}$$



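A hedged sketch of how the GCV statistic above could be evaluated for a given constrained fit; it follows the formulas on this slide, with illustrative variable names, and is not an implementation taken from the paper.

import numpy as np

def gcv_statistic(X, y, beta_tilde, lam, eps=1e-12):
    # RSS(t) for the constrained fit beta_tilde.
    resid = y - X @ beta_tilde
    rss = resid @ resid
    # W^- : generalized inverse of W = diag(|beta_j|); zero coefficients contribute nothing.
    abs_b = np.abs(beta_tilde)
    w_minus = np.zeros_like(abs_b)
    nz = abs_b > eps
    w_minus[nz] = 1.0 / abs_b[nz]
    # p(t) = tr{ X (X^T X + lam W^-)^{-1} X^T }, the effective number of parameters.
    inner = np.linalg.pinv(X.T @ X + lam * np.diag(w_minus))
    p_t = np.trace(X @ inner @ X.T)
    N = X.shape[0]
    return rss / (N * (1.0 - p_t / N) ** 2)   # (1/N) * RSS(t) / {1 - p(t)/N}^2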
Unbiased estimate of risk
This method is based on Stein’s (1981) unbiased estimate of risk.
Denote the estimated standard error of $\hat\beta_j^{\,o}$ by $\hat\tau = \hat\sigma/\sqrt{N}$, where
$\hat\sigma^2 = \sum_i (y_i - \hat y_i)^2/(N - p)$. Then the formula

$$R\{\hat\beta(\gamma)\} \approx \hat\tau^2\Big\{p - 2\,\#\big(j : |\hat\beta_j^{\,o}/\hat\tau| < \gamma\big) + \sum_{j=1}^{p} \max\big(|\hat\beta_j^{\,o}/\hat\tau|, \gamma\big)^2\Big\}$$

is derived as an approximately unbiased estimate of the risk. Hence an estimate of
$\gamma$ can be obtained as the minimizer of $R\{\hat\beta(\gamma)\}$:

$$\hat\gamma = \arg\min_{\gamma \ge 0}\, R\{\hat\beta(\gamma)\}.$$

From this we obtain an estimate of the LASSO parameter $t$:

$$\hat t = \sum_j \big(|\hat\beta_j^{\,o}| - \hat\gamma\big)^{+}.$$

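An illustrative sketch (following the slide's formulas literally, with an assumed search grid and illustrative names) of minimizing the risk estimate over gamma and mapping the minimizer back to an estimate of t:

import numpy as np

def estimate_t_from_risk(beta_ols, tau, gammas=np.linspace(0.0, 10.0, 1001)):
    # R{beta(gamma)} ~ tau^2 * { p - 2*#(j: |beta_j/tau| < gamma) + sum_j max(|beta_j/tau|, gamma)^2 }
    z = np.abs(beta_ols) / tau
    p = len(beta_ols)
    risk = np.array([tau ** 2 * (p - 2 * np.sum(z < g) + np.sum(np.maximum(z, g) ** 2))
                     for g in gammas])
    gamma_hat = gammas[np.argmin(risk)]
    # t_hat = sum_j (|beta_j| - gamma_hat)_+ , as on the slide.
    return np.sum(np.maximum(np.abs(beta_ols) - gamma_hat, 0.0))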
Algorithm for finding LASSO solutions


We fix $t \ge 0$. The minimization problem of

$$\sum_{i=1}^{N} \Big( y_i - \sum_{j} \beta_j x_{ij} \Big)^2$$

subject to $\sum_j |\beta_j| \le t$ can be seen as a least squares problem with $2^p$
inequality constraints.
Denote by G the $m \times p$ matrix corresponding to the $m$ linear inequality
constraints on the $p$-vector $\beta$. For our problem, $m = 2^p$.
Denote $g(\beta) = \sum_{i=1}^{N} (y_i - \sum_j \beta_j x_{ij})^2$.
The set E is the equality set, corresponding to those constraints which are
exactly met.



Algorithm for finding LASSO solutions

Outline of the algorithm
  1   Start with $E = \{i_0\}$, where $\delta_{i_0} = \mathrm{sign}(\hat\beta^{\,o})$
  2   Find $\hat\beta$ to minimize $g(\beta)$ subject to $G_E\beta \le t\mathbf{1}$
  3   While $\sum_j |\hat\beta_j| > t$,
  4   add $i$ to the set E, where $\delta_i = \mathrm{sign}(\hat\beta)$. Find $\hat\beta$ to minimize
      $$g(\beta) = \sum_{i=1}^{N} \Big( y_i - \sum_{j} \beta_j x_{ij} \Big)^2$$
      subject to $G_E\beta \le t\mathbf{1}$.

This procedure must always converge in a finite number of steps, since
one element is added to the set E at each step and there are at most $2^p$
elements in total.
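Not the active-set scheme above, but a quick way to cross-check its solutions: for a fixed bound t, hand the problem minimize g(beta) subject to sum_j |beta_j| <= t to a generic constrained optimiser. This is only an illustrative sketch; the solver choice and names are assumptions, and the non-smooth constraint means a general-purpose solver may only reach an approximate solution.

import numpy as np
from scipy.optimize import minimize

def lasso_by_generic_solver(X, y, t):
    # Minimize g(beta) = sum_i (y_i - sum_j beta_j x_ij)^2  subject to  sum_j |beta_j| <= t.
    p = X.shape[1]
    g = lambda b: np.sum((y - X @ b) ** 2)
    constraint = {'type': 'ineq', 'fun': lambda b: t - np.sum(np.abs(b))}
    result = minimize(g, np.zeros(p), method='SLSQP', constraints=[constraint])
    return result.x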
Least angle regression algorithm (Efron 2004)

Least Angle Regression Algorithm
  1   Standardize the predictors to have mean zero and unit norm. Start
      with the residual $r = y - \bar{y}$, $\beta_1, \dots, \beta_p = 0$.
  2   Find the predictor $x_j$ most correlated with $r$.
  3   Move $\beta_j$ from 0 towards its least-squares coefficient $\langle x_j, r\rangle$, until some
      other competitor $x_k$ has as much correlation with the current residual
      as does $x_j$.
  4   Move $\beta_j$ and $\beta_k$ in the direction defined by their joint least squares
      coefficient of the current residual on $(x_j, x_k)$, until some other
      competitor $x_l$ has as much correlation with the current residual.
  5   If a non-zero coefficient hits zero, drop its variable from the active set
      of variables and recompute the current joint least squares direction.
  6   Continue in this way until all $p$ predictors have been entered. After
      $\min(N-1, p)$ steps, we arrive at the full least-squares solution.

Step 5 is the lasso modification of least angle regression: with it, the
algorithm traces out the entire set of LASSO solutions as the bound $t$ varies.

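This procedure (with the drop step 5, i.e. the lasso modification) is implemented in scikit-learn; a hedged sketch using lars_path, with data simulated only for the example:

import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 1.5, 0.0, 0.0, 2.0]) + rng.normal(size=100)

alphas, active, coefs = lars_path(X, y, method='lasso')
print(coefs.shape)   # (p, number of steps): coefficient values along the LASSO path
print(active)        # order in which predictors entered the active set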
Simulation

In the example, 50 data sets consisting of 20 observations were simulated from the model

$$y = \beta^T x + \sigma\varepsilon,$$

where $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0)^T$ and $\varepsilon$ is standard
normal.




           Mean-squared errors over 200 simulations from the model


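A hedged sketch with the same flavour as this simulation: small data sets drawn from y = beta^T x + sigma * epsilon with the coefficient vector above. The noise level sigma and the independent standard-normal design are illustrative assumptions, not taken from the slide.

import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
sigma = 3.0                                   # assumed noise level (illustrative)

def simulate_once(n=20):
    X = rng.normal(size=(n, len(beta)))       # assumed independent standard-normal design
    y = X @ beta + sigma * rng.normal(size=n)
    return X, y

X, y = simulate_once()
fit = LassoCV(cv=5).fit(X, y)
print(np.round(fit.coef_, 2))                 # several of the true zeros are estimated as exactly 0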
Simulation



Most frequent models selected by LASSO (left) and by subset regression (right)




Conclusions



The LASSO is a worthy competitor to subset selection and ridge regression.
Performance in different scenarios:
     Small number of large effects - Subset selection does best, LASSO
     - not quite as well, ridge regression - quite poorly.
     Small to moderate number of moderate-size effects - LASSO
     does best, followed by ridge regression and then subset selection.
     Large number of small effects - Ridge regression does best,
     followed by LASSO and then subset selection.




References


    Robert Tibshirani (1996)
    Regression Shrinkage and Selection via the LASSO
    Journal of the Royal Statistical Society, Series B, 58(1), 267–288.

    Trevor Hastie, Robert Tibshirani, Jerome Friedman (2008)
    The Elements of Statistical Learning
    Springer-Verlag, 57–73.

    Abhimanyu Das, David Kempe
    Algorithms for Subset Selection in Linear Regression

    Yizao Wang (2007)
    A Note on the LASSO in Model Selection




The End





More Related Content

What's hot

What's hot (20)

Regression
RegressionRegression
Regression
 
Roc auc curve
Roc auc curveRoc auc curve
Roc auc curve
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 
Reporting point biserial correlation in apa
Reporting point biserial correlation in apaReporting point biserial correlation in apa
Reporting point biserial correlation in apa
 
Matrix decomposition and_applications_to_nlp
Matrix decomposition and_applications_to_nlpMatrix decomposition and_applications_to_nlp
Matrix decomposition and_applications_to_nlp
 
Linear models for data science
Linear models for data scienceLinear models for data science
Linear models for data science
 
Introduction to correlation and regression analysis
Introduction to correlation and regression analysisIntroduction to correlation and regression analysis
Introduction to correlation and regression analysis
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Principal Component Analysis and Clustering
Principal Component Analysis and ClusteringPrincipal Component Analysis and Clustering
Principal Component Analysis and Clustering
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNE
 
Logistic regression: topological and geometric considerations
Logistic regression: topological and geometric considerationsLogistic regression: topological and geometric considerations
Logistic regression: topological and geometric considerations
 
10 Years of Multi-Label Learning
10 Years of Multi-Label Learning10 Years of Multi-Label Learning
10 Years of Multi-Label Learning
 
Chap03 numerical descriptive measures
Chap03 numerical descriptive measuresChap03 numerical descriptive measures
Chap03 numerical descriptive measures
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluation
 
4.5. logistic regression
4.5. logistic regression4.5. logistic regression
4.5. logistic regression
 
Hessian Matrices in Statistics
Hessian Matrices in StatisticsHessian Matrices in Statistics
Hessian Matrices in Statistics
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
Chap15 analysis of variance
Chap15 analysis of varianceChap15 analysis of variance
Chap15 analysis of variance
 
Introduction to Data Analysis With R and R Studio
Introduction to Data Analysis With R and R StudioIntroduction to Data Analysis With R and R Studio
Introduction to Data Analysis With R and R Studio
 
Introduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood EstimatorIntroduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood Estimator
 

Viewers also liked

Seminar on Robust Regression Methods
Seminar on Robust Regression MethodsSeminar on Robust Regression Methods
Seminar on Robust Regression Methods
Sumon Sdb
 
A_Study_on_the_Medieval_Kerala_School_of_Mathematics
A_Study_on_the_Medieval_Kerala_School_of_MathematicsA_Study_on_the_Medieval_Kerala_School_of_Mathematics
A_Study_on_the_Medieval_Kerala_School_of_Mathematics
Sumon Sdb
 
Gelfand and Smith (1990), read by
Gelfand and Smith (1990), read byGelfand and Smith (1990), read by
Gelfand and Smith (1990), read by
Christian Robert
 

Viewers also liked (20)

ISBA 2016: Foundations
ISBA 2016: FoundationsISBA 2016: Foundations
ISBA 2016: Foundations
 
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic NetsData Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
Data Science - Part XII - Ridge Regression, LASSO, and Elastic Nets
 
Lasso regression
Lasso regressionLasso regression
Lasso regression
 
Reading the Lindley-Smith 1973 paper on linear Bayes estimators
Reading the Lindley-Smith 1973 paper on linear Bayes estimatorsReading the Lindley-Smith 1973 paper on linear Bayes estimators
Reading the Lindley-Smith 1973 paper on linear Bayes estimators
 
Ridge regression, lasso and elastic net
Ridge regression, lasso and elastic netRidge regression, lasso and elastic net
Ridge regression, lasso and elastic net
 
Apprentissage automatique, Régression Ridge et LASSO
Apprentissage automatique, Régression Ridge et LASSOApprentissage automatique, Régression Ridge et LASSO
Apprentissage automatique, Régression Ridge et LASSO
 
Asymptotics for discrete random measures
Asymptotics for discrete random measuresAsymptotics for discrete random measures
Asymptotics for discrete random measures
 
Species sampling models in Bayesian Nonparametrics
Species sampling models in Bayesian NonparametricsSpecies sampling models in Bayesian Nonparametrics
Species sampling models in Bayesian Nonparametrics
 
Presentation of Bassoum Abou on Stein's 1981 AoS paper
Presentation of Bassoum Abou on Stein's 1981 AoS paperPresentation of Bassoum Abou on Stein's 1981 AoS paper
Presentation of Bassoum Abou on Stein's 1981 AoS paper
 
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketingBayesian Nonparametrics, Applications to biology, ecology, and marketing
Bayesian Nonparametrics, Applications to biology, ecology, and marketing
 
Dependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian NonparametricsDependent processes in Bayesian Nonparametrics
Dependent processes in Bayesian Nonparametrics
 
Nested sampling
Nested samplingNested sampling
Nested sampling
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian Nonparametrics
 
Lasso
LassoLasso
Lasso
 
Seminar on Robust Regression Methods
Seminar on Robust Regression MethodsSeminar on Robust Regression Methods
Seminar on Robust Regression Methods
 
4thchannel conference poster_freedom_gumedze
4thchannel conference poster_freedom_gumedze4thchannel conference poster_freedom_gumedze
4thchannel conference poster_freedom_gumedze
 
Bayesian Classics
Bayesian ClassicsBayesian Classics
Bayesian Classics
 
A Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian NonparametricsA Gentle Introduction to Bayesian Nonparametrics
A Gentle Introduction to Bayesian Nonparametrics
 
A_Study_on_the_Medieval_Kerala_School_of_Mathematics
A_Study_on_the_Medieval_Kerala_School_of_MathematicsA_Study_on_the_Medieval_Kerala_School_of_Mathematics
A_Study_on_the_Medieval_Kerala_School_of_Mathematics
 
Gelfand and Smith (1990), read by
Gelfand and Smith (1990), read byGelfand and Smith (1990), read by
Gelfand and Smith (1990), read by
 

More from Christian Robert

More from Christian Robert (20)

Adaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte CarloAdaptive Restore algorithm & importance Monte Carlo
Adaptive Restore algorithm & importance Monte Carlo
 
Asymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de FranceAsymptotics of ABC, lecture, Collège de France
Asymptotics of ABC, lecture, Collège de France
 
Workshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael MartinWorkshop in honour of Don Poskitt and Gael Martin
Workshop in honour of Don Poskitt and Gael Martin
 
discussion of ICML23.pdf
discussion of ICML23.pdfdiscussion of ICML23.pdf
discussion of ICML23.pdf
 
How many components in a mixture?
How many components in a mixture?How many components in a mixture?
How many components in a mixture?
 
restore.pdf
restore.pdfrestore.pdf
restore.pdf
 
Testing for mixtures at BNP 13
Testing for mixtures at BNP 13Testing for mixtures at BNP 13
Testing for mixtures at BNP 13
 
Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?Inferring the number of components: dream or reality?
Inferring the number of components: dream or reality?
 
CDT 22 slides.pdf
CDT 22 slides.pdfCDT 22 slides.pdf
CDT 22 slides.pdf
 
Testing for mixtures by seeking components
Testing for mixtures by seeking componentsTesting for mixtures by seeking components
Testing for mixtures by seeking components
 
discussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihooddiscussion on Bayesian restricted likelihood
discussion on Bayesian restricted likelihood
 
NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)NCE, GANs & VAEs (and maybe BAC)
NCE, GANs & VAEs (and maybe BAC)
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Coordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like samplerCoordinate sampler : A non-reversible Gibbs-like sampler
Coordinate sampler : A non-reversible Gibbs-like sampler
 
eugenics and statistics
eugenics and statisticseugenics and statistics
eugenics and statistics
 
Laplace's Demon: seminar #1
Laplace's Demon: seminar #1Laplace's Demon: seminar #1
Laplace's Demon: seminar #1
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
asymptotics of ABC
asymptotics of ABCasymptotics of ABC
asymptotics of ABC
 
ABC-Gibbs
ABC-GibbsABC-Gibbs
ABC-Gibbs
 
Likelihood-free Design: a discussion
Likelihood-free Design: a discussionLikelihood-free Design: a discussion
Likelihood-free Design: a discussion
 

Recently uploaded

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Recently uploaded (20)

Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptxCOMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
COMMUNICATING NEGATIVE NEWS - APPROACHES .pptx
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
Sensory_Experience_and_Emotional_Resonance_in_Gabriel_Okaras_The_Piano_and_Th...
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptxOn_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
On_Translating_a_Tamil_Poem_by_A_K_Ramanujan.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 
Tatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf artsTatlong Kwento ni Lola basyang-1.pdf arts
Tatlong Kwento ni Lola basyang-1.pdf arts
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 

Reading the Lasso 1996 paper by Robert Tibshirani

  • 1. READING SEMINAR ON CLASSICS Regression Shrinkage and Selection via the LASSO By Robert Tibshirani Presented by Ulcinaite Agne November 4, 2012 Presented by Ulcinaite Agne LASSO November 4, 2012 1 / 41
  • 2. Outline 1 Introduction Presented by Ulcinaite Agne LASSO November 4, 2012 2 / 41
  • 3. Outline 1 Introduction 2 OLS estimates OLS critics Standard improving techniques Presented by Ulcinaite Agne LASSO November 4, 2012 2 / 41
  • 4. Outline 1 Introduction 2 OLS estimates OLS critics Standard improving techniques 3 LASSO Definition Motivation for LASSO Orthonormal design case Function forms Example of prostate cancer Prediction error and estimation of t Presented by Ulcinaite Agne LASSO November 4, 2012 2 / 41
  • 5. Outline 1 Introduction 2 OLS estimates OLS critics Standard improving techniques 3 LASSO Definition Motivation for LASSO Orthonormal design case Function forms Example of prostate cancer Prediction error and estimation of t 4 Algorithm for finding LASSO solutions Presented by Ulcinaite Agne LASSO November 4, 2012 2 / 41
  • 6. Outline 1 Introduction 2 OLS estimates OLS critics Standard improving techniques 3 LASSO Definition Motivation for LASSO Orthonormal design case Function forms Example of prostate cancer Prediction error and estimation of t 4 Algorithm for finding LASSO solutions 5 Simulation Presented by Ulcinaite Agne LASSO November 4, 2012 2 / 41
  • 7. Outline 1 Introduction 2 OLS estimates OLS critics Standard improving techniques 3 LASSO Definition Motivation for LASSO Orthonormal design case Function forms Example of prostate cancer Prediction error and estimation of t 4 Algorithm for finding LASSO solutions 5 Simulation 6 Conclusions Presented by Ulcinaite Agne LASSO November 4, 2012 2 / 41
  • 8. Table of Contents 1 Introduction 2 OLS estimates OLS critics Standard improving techniques 3 LASSO Definition Motivation for LASSO Orthonormal design case Function forms Example of prostate cancer Prediction error and estimation of t 4 Algorithm for finding LASSO solutions 5 Simulation 6 Conclusions Presented by Ulcinaite Agne LASSO November 4, 2012 3 / 41
  • 9. Introduction The Article Regression Shrinkage and Selection via the LASSO by Robert Tibshirani Published in 1996 for the Royal Statistical Society. Series B (Methodological), vol. 58, No.1 Presented by Ulcinaite Agne LASSO November 4, 2012 4 / 41
  • 10. Table of Contents 1 Introduction 2 OLS estimates OLS critics Standard improving techniques 3 LASSO Definition Motivation for LASSO Orthonormal design case Function forms Example of prostate cancer Prediction error and estimation of t 4 Algorithm for finding LASSO solutions 5 Simulation 6 Conclusions Presented by Ulcinaite Agne LASSO November 4, 2012 5 / 41
  • 11. OLS estimates We consider the usual regression situation. The data: ( xi , y i ), i = 1, . . . , N, where xi = (xi1 , . . . , xip )T and yi are the regressors and the response for the ith observation. The ordinary least square (OLS) estimates minimize the residual sum of squares (RSS): N p RSS = (yi − βo − xij βj )2 i=1 j=1 Presented by Ulcinaite Agne LASSO November 4, 2012 6 / 41
  • 12. Table of Contents 1 Introduction 2 OLS estimates OLS critics Standard improving techniques 3 LASSO Definition Motivation for LASSO Orthonormal design case Function forms Example of prostate cancer Prediction error and estimation of t 4 Algorithm for finding LASSO solutions 5 Simulation 6 Conclusions Presented by Ulcinaite Agne LASSO November 4, 2012 7 / 41
  • 13. OLS critics The two reasons why data analysts are often not satisfied with OLS estimates: Prediction accuracy: OLS estimates having low bias but large variance Presented by Ulcinaite Agne LASSO November 4, 2012 8 / 41
  • 14. OLS critics The two reasons why data analysts are often not satisfied with OLS estimates: Prediction accuracy: OLS estimates having low bias but large variance Iterpretation: when having too much predictors, it would be better to have smaller subset exhibiting stronger effects Presented by Ulcinaite Agne LASSO November 4, 2012 8 / 41
  • 15. Table of Contents 1 Introduction 2 OLS estimates OLS critics Standard improving techniques 3 LASSO Definition Motivation for LASSO Orthonormal design case Function forms Example of prostate cancer Prediction error and estimation of t 4 Algorithm for finding LASSO solutions 5 Simulation 6 Conclusions Presented by Ulcinaite Agne LASSO November 4, 2012 9 / 41
  • 16. Standard improving techniques Subset selection: small changes in data can result in very different models Presented by Ulcinaite Agne LASSO November 4, 2012 10 / 41
  • 17. Standard improving techniques Subset selection: small changes in data can result in very different models Ridge regression:    N  ˆ β ridge = argmin (yi − β0 − βj xij )2   i=1 j subject to βj2 ≤ t j Does not set any of the coefficients to 0 and hence does not give an easily interpretable model Presented by Ulcinaite Agne LASSO November 4, 2012 10 / 41
  • 18. Table of Contents 1 Introduction 2 OLS estimates OLS critics Standard improving techniques 3 LASSO Definition Motivation for LASSO Orthonormal design case Function forms Example of prostate cancer Prediction error and estimation of t 4 Algorithm for finding LASSO solutions 5 Simulation 6 Conclusions Presented by Ulcinaite Agne LASSO November 4, 2012 11 / 41
  • 19. Table of Contents 1 Introduction 2 OLS estimates OLS critics Standard improving techniques 3 LASSO Definition Motivation for LASSO Orthonormal design case Function forms Example of prostate cancer Prediction error and estimation of t 4 Algorithm for finding LASSO solutions 5 Simulation 6 Conclusions Presented by Ulcinaite Agne LASSO November 4, 2012 12 / 41
  • 20. Definition We are considering the same data as in OLS estimation case: ( xi , y i ), i = 1, . . . , N, where xi = (xi1 , . . . , xip )T Presented by Ulcinaite Agne LASSO November 4, 2012 13 / 41
  • 21. Definition We are considering the same data as in OLS estimation case: ( xi , y i ), i = 1, . . . , N, where xi = (xi1 , . . . , xip )T The LASSO (Least Absolute Shrinkage and Selection Operator) estimate α ˆ (ˆ , β) is defined by    N  α ˆ (ˆ , β) = argmin (yi − α − βj xij )2   i=1 j Presented by Ulcinaite Agne LASSO November 4, 2012 13 / 41
  • 22. Definition We are considering the same data as in OLS estimation case: ( xi , y i ), i = 1, . . . , N, where xi = (xi1 , . . . , xip )T The LASSO (Least Absolute Shrinkage and Selection Operator) estimate α ˆ (ˆ , β) is defined by    N  α ˆ (ˆ , β) = argmin (yi − α − βj xij )2   i=1 j subject to |βj | ≤ t j Presented by Ulcinaite Agne LASSO November 4, 2012 13 / 41
• 25. Definition The amount of shrinkage is controlled by the parameter $t \ge 0$. Let $\hat\beta_j^o$ be the full least squares estimates and let $t_0 = \sum_j |\hat\beta_j^o|$. Values $t < t_0$ shrink the solutions towards 0, setting some coefficients exactly equal to 0. For example, taking $t = t_0/2$ has an effect roughly similar to finding the best subset of size $p/2$. Presented by Ulcinaite Agne LASSO November 4, 2012 14 / 41
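To make the constrained definition concrete, the problem can be handed to a generic solver by splitting each coefficient into positive and negative parts; this is a sketch only (the helper name lasso_constrained and the use of scipy's SLSQP are illustrative choices, not the paper's algorithm).

```python
import numpy as np
from scipy.optimize import minimize

def lasso_constrained(X, y, t):
    """Minimize sum_i (y_i - alpha - sum_j beta_j x_ij)^2 s.t. sum_j |beta_j| <= t.
    Centring X and y removes the intercept alpha; beta = beta_plus - beta_minus."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    p = X.shape[1]

    def rss(z):                         # z = [beta_plus, beta_minus], both >= 0
        r = yc - Xc @ (z[:p] - z[p:])
        return r @ r

    cons = {"type": "ineq", "fun": lambda z: t - z.sum()}   # sum_j |beta_j| <= t
    res = minimize(rss, np.zeros(2 * p), method="SLSQP",
                   bounds=[(0, None)] * (2 * p), constraints=[cons])
    return res.x[:p] - res.x[p:]

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
y = X @ np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0]) + rng.normal(size=40)
beta_ols = np.linalg.lstsq(X - X.mean(0), y - y.mean(), rcond=None)[0]
t0 = np.abs(beta_ols).sum()
print(lasso_constrained(X, y, t=t0 / 2))   # several coefficients driven to (near) 0
```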
• 27. Motivation for LASSO The LASSO came from the proposal of Breiman (1993). Breiman's non-negative garotte minimizes $\sum_{i=1}^N \bigl(y_i - \alpha - \sum_j c_j \hat\beta_j^o x_{ij}\bigr)^2$ subject to $c_j \ge 0$, $\sum_j c_j \le t$. Presented by Ulcinaite Agne LASSO November 4, 2012 16 / 41
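A rough sketch of the garotte under the same conventions (centred data, a generic solver standing in for a quadratic programming routine); the function name and solver choice are mine, not the paper's.

```python
import numpy as np
from scipy.optimize import minimize

def garotte(X, y, t):
    """Breiman's non-negative garotte: scale OLS coefficients by c_j >= 0
    with sum_j c_j <= t, then return the shrunken coefficients c_j * beta_j^o."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    beta_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    Z = Xc * beta_ols                      # column j of Z is beta_j^o * x_j
    p = X.shape[1]

    def rss(c):
        r = yc - Z @ c
        return r @ r

    cons = {"type": "ineq", "fun": lambda c: t - c.sum()}
    res = minimize(rss, np.zeros(p), method="SLSQP",
                   bounds=[(0, None)] * p, constraints=[cons])
    return res.x * beta_ols
```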
• 32. Orthonormal design case Let X be the $n \times p$ design matrix with $ij$th entry $x_{ij}$, and suppose $X^T X = I$. The solution of the previous minimization problem is the soft-thresholded estimate $\hat\beta_j = \mathrm{sign}(\hat\beta_j^o)(|\hat\beta_j^o| - \gamma)^+$. Best subset selection (of size k) keeps the k largest coefficients in absolute value and sets the rest to 0. Ridge regression solutions: $\frac{1}{1+\gamma}\hat\beta_j^o$. Garotte estimates: $(1 - \gamma/\hat\beta_j^{o\,2})^+ \hat\beta_j^o$. Presented by Ulcinaite Agne LASSO November 4, 2012 18 / 41
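In the orthonormal case all four estimators act coordinate-wise on the least squares coefficients, so they can be written as one-line functions. A small comparison sketch (gamma is the threshold or penalty level; the toy vector is illustrative):

```python
import numpy as np

def lasso_orth(b, gamma):     # soft thresholding
    return np.sign(b) * np.maximum(np.abs(b) - gamma, 0.0)

def subset_orth(b, k):        # best subset of size k: keep the k largest |b_j|
    keep = np.argsort(np.abs(b))[-k:]
    out = np.zeros_like(b)
    out[keep] = b[keep]
    return out

def ridge_orth(b, gamma):     # proportional shrinkage
    return b / (1.0 + gamma)

def garotte_orth(b, gamma):   # non-negative garotte
    return np.maximum(1.0 - gamma / b**2, 0.0) * b

b = np.array([-3.0, -0.5, 0.2, 1.0, 2.5])   # least squares coefficients beta^o
print(lasso_orth(b, 1.0))
print(subset_orth(b, 2))
print(ridge_orth(b, 1.0))
print(garotte_orth(b, 1.0))
```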
• 34. Function forms Form of the coefficient shrinkage functions for (a) subset regression, (b) ridge regression, (c) the LASSO and (d) the garotte. Presented by Ulcinaite Agne LASSO November 4, 2012 20 / 41
  • 35. Estimation picture for (a) the LASSO and (b) ridge regression Presented by Ulcinaite Agne LASSO November 4, 2012 21 / 41
• 38. Example of prostate cancer Data examined: from a study by Stamey (1989). A linear model is fitted to log(prostate specific antigen) lpsa. The factors: log(cancer volume) lcavol, log(prostate weight) lweight, age, log(benign prostatic hyperplasia amount) lbph, seminal vesicle invasion svi, log(capsular penetration) lcp, Gleason score gleason, percentage of Gleason scores 4 or 5 pgg45. Presented by Ulcinaite Agne LASSO November 4, 2012 23 / 41
  • 39. Statistics of the example Estimated coefficients and test error results, for different subset and shrinkage methods applied to the prostate data. The blank entries correspond to variables omitted. Presented by Ulcinaite Agne LASSO November 4, 2012 24 / 41
• 45. Prediction error and estimation of t Methods for the estimation of the LASSO parameter t: cross-validation, generalized cross-validation, and an analytical unbiased estimate of risk. Strictly speaking, the first two methods apply to the 'X-random' case and the third to the X-fixed case. Presented by Ulcinaite Agne LASSO November 4, 2012 26 / 41
• 48. Prediction error and estimation of t Suppose that $Y = \eta(X) + \varepsilon$, where $E(\varepsilon) = 0$ and $\mathrm{var}(\varepsilon) = \sigma^2$. The mean squared error is $ME = E\{\hat\eta(X) - \eta(X)\}^2$ and the prediction error is $PE = E\{Y - \hat\eta(X)\}^2 = ME + \sigma^2$. Presented by Ulcinaite Agne LASSO November 4, 2012 27 / 41
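A quick Monte Carlo illustration (not from the slides) of the decomposition PE = ME + sigma^2; the true function eta and the fixed estimate eta_hat below are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma = 1.0
eta = lambda x: 2.0 * x                # true regression function (illustrative)
eta_hat = lambda x: 1.8 * x + 0.1      # some fixed fitted approximation (illustrative)

x = rng.uniform(-1.0, 1.0, size=200_000)
y = eta(x) + sigma * rng.normal(size=x.size)

me = np.mean((eta_hat(x) - eta(x)) ** 2)   # mean squared error ME
pe = np.mean((y - eta_hat(x)) ** 2)        # prediction error PE
print(me, pe, me + sigma ** 2)             # pe is close to me + sigma^2
```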
• 53. Cross-validation The prediction error (PE) is estimated by fivefold cross-validation. The LASSO is indexed in terms of the normalised parameter $s = t / \sum_j |\hat\beta_j^o|$, and PE is estimated over a grid of values of s from 0 to 1 inclusive. Create a 5-fold partition of the data set. For each fold, all but one of the chunks are used for training and the remaining chunk for testing. Repeat 5 times so that each chunk is used once for testing. The value $\hat{s}$ yielding the lowest estimated PE is selected. Presented by Ulcinaite Agne LASSO November 4, 2012 28 / 41
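A sketch of the selection step. The slide indexes the LASSO by the normalised bound s; for brevity this sketch tunes the equivalent penalized form (scikit-learn's Lasso with penalty alpha) on a grid, keeping the value with the smallest estimated prediction error. The data and grid are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 8))
y = X @ np.array([3, 1.5, 0, 0, 2, 0, 0, 0]) + rng.normal(size=100)

alphas = np.logspace(-3, 1, 30)                 # grid of shrinkage levels
kf = KFold(n_splits=5, shuffle=True, random_state=0)

cv_pe = []
for a in alphas:
    fold_err = []
    for train, test in kf.split(X):
        fit = Lasso(alpha=a).fit(X[train], y[train])
        fold_err.append(np.mean((y[test] - fit.predict(X[test])) ** 2))
    cv_pe.append(np.mean(fold_err))             # estimated PE for this grid point

best_alpha = alphas[int(np.argmin(cv_pe))]
print("selected shrinkage level:", best_alpha)
```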
• 54. Generalized cross-validation The constraint is re-written as $\sum_j \beta_j^2 / |\beta_j| \le t$, so the constrained solution $\tilde\beta$ can be expressed as a ridge regression estimator $\tilde\beta = (X^T X + \lambda W^-)^{-1} X^T y$, where $W = \mathrm{diag}(|\tilde\beta_j|)$ and $W^-$ denotes a generalized inverse. The number of effective parameters in the constrained fit $\tilde\beta$ may be approximated by $p(t) = \mathrm{tr}\{X (X^T X + \lambda W^-)^{-1} X^T\}$. The generalized cross-validation style statistic is $GCV(t) = \frac{1}{N}\,\frac{RSS(t)}{\{1 - p(t)/N\}^2}$. Presented by Ulcinaite Agne LASSO November 4, 2012 29 / 41
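A sketch of the GCV statistic as defined on this slide; beta_tilde is a constrained LASSO solution and lam the corresponding Lagrange multiplier, both assumed to be available from the fitting step.

```python
import numpy as np

def gcv(X, y, beta_tilde, lam):
    """GCV(t) = (1/N) RSS(t) / {1 - p(t)/N}^2 with
    p(t) = tr{X (X^T X + lam W^-)^{-1} X^T}."""
    W_minus = np.linalg.pinv(np.diag(np.abs(beta_tilde)))    # generalized inverse of W
    H = X @ np.linalg.pinv(X.T @ X + lam * W_minus) @ X.T
    p_t = np.trace(H)                                        # effective number of parameters
    rss = np.sum((y - X @ beta_tilde) ** 2)
    N = len(y)
    return (rss / N) / (1.0 - p_t / N) ** 2
```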
• 55. Unbiased estimate of risk This method is based on Stein's (1981) unbiased estimate of risk. Denote the estimated standard error of $\hat\beta_j^o$ by $\hat\tau = \hat\sigma/\sqrt{N}$, where $\hat\sigma^2 = \sum_i (y_i - \hat{y}_i)^2/(N - p)$. The formula $R\{\hat\beta(\gamma)\} \approx \hat\tau^2 \bigl\{ p - 2\,\#(j:\ |\hat\beta_j^o/\hat\tau| < \gamma) + \sum_{j=1}^p \max(|\hat\beta_j^o/\hat\tau|, \gamma)^2 \bigr\}$ is derived as an approximately unbiased estimate of the risk. Hence an estimate of $\gamma$ can be obtained as the minimizer of $R\{\hat\beta(\gamma)\}$: $\hat\gamma = \arg\min_{\gamma \ge 0} R\{\hat\beta(\gamma)\}$. From this we obtain an estimate of the LASSO parameter t: $\hat{t} = \sum_j (|\hat\beta_j^o| - \hat\gamma)^+$. Presented by Ulcinaite Agne LASSO November 4, 2012 30 / 41
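A sketch of choosing the threshold by minimizing the risk estimate over a grid, following the formulas on this slide literally; beta_ols and the fitted values y_hat are assumed to come from the full least squares fit, and the grid is an implementation choice.

```python
import numpy as np

def sure_t(beta_ols, y, y_hat, p):
    """Pick gamma by minimizing the approximate risk R{beta(gamma)},
    then map it to the LASSO parameter t as on the slide."""
    N = len(y)
    sigma2 = np.sum((y - y_hat) ** 2) / (N - p)
    tau = np.sqrt(sigma2 / N)                   # estimated s.e. of beta_j^o
    z = np.abs(beta_ols) / tau

    gammas = np.linspace(0.0, z.max(), 500)
    risk = [tau**2 * (p - 2 * np.sum(z < g) + np.sum(np.maximum(z, g) ** 2))
            for g in gammas]
    gamma_hat = gammas[int(np.argmin(risk))]
    t_hat = np.sum(np.maximum(np.abs(beta_ols) - gamma_hat, 0.0))
    return gamma_hat, t_hat
```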
• 59. Algorithm for finding LASSO solutions We fix $t \ge 0$. The minimization of $\sum_{i=1}^N (y_i - \sum_j \beta_j x_{ij})^2$ subject to $\sum_j |\beta_j| \le t$ can be seen as a least squares problem with $2^p$ inequality constraints. Denote by G an $m \times p$ matrix corresponding to the m linear inequality constraints on the p-vector $\beta$; for our problem, $m = 2^p$. Denote $g(\beta) = \sum_{i=1}^N (y_i - \sum_j \beta_j x_{ij})^2$. The set E is the equality set, corresponding to those constraints which are exactly met. Presented by Ulcinaite Agne LASSO November 4, 2012 32 / 41
• 65. Algorithm for finding LASSO solutions Outline of the algorithm: 1. Start with $E = \{i_0\}$, where $\delta_{i_0} = \mathrm{sign}(\hat\beta^o)$. 2. Find $\hat\beta$ to minimize $g(\beta)$ subject to $G_E \beta \le t\mathbf{1}$. 3. While $\sum_j |\hat\beta_j| > t$: 4. add i to the set E, where $\delta_i = \mathrm{sign}(\hat\beta)$, and find $\hat\beta$ to minimize $g(\beta) = \sum_{i=1}^N (y_i - \sum_j \beta_j x_{ij})^2$ subject to $G_E \beta \le t\mathbf{1}$. This procedure must converge in a finite number of steps, since one element is added to the set E at each step and there is a total of $2^p$ elements. Presented by Ulcinaite Agne LASSO November 4, 2012 33 / 41
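The outline above can be followed almost literally. In this sketch, scipy's SLSQP stands in for the quadratic programming step that solves each inequality-constrained least squares problem; the function name and tolerance are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def lasso_active_constraints(X, y, t, tol=1e-8):
    """Repeatedly solve least squares subject to the sign constraints collected
    in E, adding the current sign pattern, until sum_j |beta_j| <= t."""
    p = X.shape[1]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # full least squares beta^o
    E = [np.sign(beta)]                                # delta_{i0} = sign(beta^o)

    def g(b):                                          # residual sum of squares
        r = y - X @ b
        return r @ r

    while np.abs(beta).sum() > t + tol:
        cons = [{"type": "ineq", "fun": (lambda b, d=d: t - d @ b)} for d in E]
        beta = minimize(g, np.zeros(p), method="SLSQP", constraints=cons).x
        E.append(np.sign(beta))                        # add the violated sign pattern
        if len(E) > 2 ** p:                            # cannot exceed 2^p constraints
            break
    return beta
```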
• 72. Least angle regression algorithm (Efron 2004) Least Angle Regression Algorithm: 1. Standardize the predictors to have mean zero and unit norm. Start with the residual $r = y - \bar{y}$ and $\beta_1 = \ldots = \beta_p = 0$. 2. Find the predictor $x_j$ most correlated with r. 3. Move $\beta_j$ from 0 towards its least squares coefficient $\langle x_j, r\rangle$, until some other competitor $x_k$ has as much correlation with the current residual as does $x_j$. 4. Move $\beta_j$ and $\beta_k$ in the direction defined by their joint least squares coefficient of the current residual on $(x_j, x_k)$, until some other competitor $x_l$ has as much correlation with the current residual. 5. If a non-zero coefficient hits zero, drop its variable from the active set of variables and recompute the current joint least squares direction. 6. Continue in this way until all p predictors have been entered. After min(N − 1, p) steps, we arrive at the full least squares solution. Presented by Ulcinaite Agne LASSO November 4, 2012 34 / 41
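The LAR/LASSO path is available off the shelf; this sketch uses scikit-learn's lars_path, where method="lasso" applies the modification in step 5 (dropping variables whose coefficients hit zero). The toy data are illustrative.

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 8))
y = X @ np.array([3, 1.5, 0, 0, 2, 0, 0, 0]) + rng.normal(size=100)

alphas, active, coefs = lars_path(X, y, method="lasso")
print(active)         # indices in the active set at the end of the path
print(coefs[:, -1])   # coefficients at the end of the path (the full least squares fit)
```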
• 75. Simulation In the example, 50 data sets consisting of 20 observations from the model $y = \beta^T x + \sigma\varepsilon$ were simulated, where $\beta = (3, 1.5, 0, 0, 2, 0, 0, 0)^T$ and $\varepsilon$ is standard normal. Mean squared errors over 200 simulations from the model. Presented by Ulcinaite Agne LASSO November 4, 2012 36 / 41
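A sketch of the simulation set-up: repeated small data sets from the sparse linear model above, comparing a cross-validated LASSO with OLS on coefficient mean squared error. The noise level sigma = 3, the 0.5^|i-j| predictor correlation, and the use of coefficient MSE as the comparison metric are assumptions made for illustration; the slide only states the coefficient vector and the sample sizes.

```python
import numpy as np
from sklearn.linear_model import LassoCV

beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
p = len(beta)
sigma = 3.0                                                            # assumed noise level
corr = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))    # assumed design correlation
rng = np.random.default_rng(5)

mse_lasso, mse_ols = [], []
for _ in range(50):                                    # 50 simulated data sets of 20 observations
    X = rng.multivariate_normal(np.zeros(p), corr, size=20)
    y = X @ beta + sigma * rng.normal(size=20)
    b_lasso = LassoCV(cv=5).fit(X, y).coef_            # shrinkage level chosen by cross-validation
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    mse_lasso.append(np.mean((b_lasso - beta) ** 2))
    mse_ols.append(np.mean((b_ols - beta) ** 2))

print("LASSO coefficient MSE:", np.mean(mse_lasso))
print("OLS   coefficient MSE:", np.mean(mse_ols))
```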
• 76. Simulation Most frequent models selected by the LASSO; most frequent models selected by subset regression. Presented by Ulcinaite Agne LASSO November 4, 2012 37 / 41
• 78. Conclusions The LASSO is a worthy competitor to subset selection and ridge regression. Performance in different scenarios: Small number of large effects - subset selection does best, the LASSO not quite as well, ridge regression quite poorly. Small to moderate number of moderate-size effects - the LASSO does best, followed by ridge regression and then subset selection. Large number of small effects - ridge regression does best, followed by the LASSO and then subset selection. Presented by Ulcinaite Agne LASSO November 4, 2012 39 / 41
• 79. References Robert Tibshirani (1996) Regression Shrinkage and Selection via the LASSO, Journal of the Royal Statistical Society, Series B, 58(1), 267-288. Trevor Hastie, Robert Tibshirani, Jerome Friedman (2008) The Elements of Statistical Learning, Springer-Verlag, 57-73. Abhimanyu Das, David Kempe, Algorithms for Subset Selection in Linear Regression. Yizao Wang (2007) A Note on the LASSO in Model Selection. Presented by Ulcinaite Agne LASSO November 4, 2012 40 / 41
  • 80. The End Presented by Ulcinaite Agne LASSO November 4, 2012 41 / 41