9 Variable Selection
Variable selection is fundamental in statistical modeling. Many methods have been developed to select the variables that significantly explain the response variable. Classical approaches include backward elimination, forward selection, their combination known as stepwise selection, and best subset selection. More recently, new approaches have been introduced, such as ridge regression, the Least Absolute Shrinkage and Selection Operator (LASSO), and their combination, known as the Elastic Net. Another recent variable selection method is the Smoothly Clipped Absolute Deviation (SCAD) penalty. This chapter describes the application of LASSO, Elastic Net, and SCAD to the linear regression model and the logit model described in Chapter 8.
9.1 LASSO
9.1.1 LASSO in linear regression model
We have already introduced the linear regression model, for example in Chapters 3 and 7 and in (3.50), as
y = X β + ε,
where y (n × 1) is the vector of observations of the response variable, X (n × p) is the data matrix of the p explanatory variables, and ε (n × 1) is the vector of errors.
Suppose E(y|X) = Xβ with β = (β_1, …, β_p)^⊤. The LASSO estimate \hat{\beta} is defined by

\hat{\beta} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2 \quad \text{s.t.} \quad \sum_{j=1}^{p} |\beta_j| \le t.   (9.1)
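The constrained problem (9.1) is usually solved in its equivalent Lagrangian (penalized) form, minimizing (1/(2n))‖y − Xβ‖² + λ Σ_j |β_j|, where each penalty level λ corresponds to some bound t in (9.1). As a minimal sketch, the following pure-NumPy coordinate descent illustrates this; the data, the value of λ, and the function names are illustrative assumptions, not from the chapter:

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for the Lagrangian LASSO:
    min_beta (1/(2n)) ||y - X beta||^2 + lam * sum_j |beta_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()       # residual y - X beta (beta starts at 0)
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]   # partial residual, x_j removed from fit
            z = X[:, j] @ resid / n
            beta[j] = soft_threshold(z, lam) / (X[:, j] @ X[:, j] / n)
            resid -= X[:, j] * beta[j]
    return beta

# Illustrative data: only the first two of five predictors matter.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + 0.5 * rng.standard_normal(200)
beta_hat = lasso_cd(X, y, lam=0.3)
```

The soft-thresholding step is what makes LASSO a selection method: coefficients whose partial correlation with the residual falls below λ are set exactly to zero, not merely shrunk toward it.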
9.1.2 LASSO in logit model
As described in Chapter 8 and (8.19), the logit model (with intercept) for a binary response is defined as

\log \frac{p(x_i)}{1 - p(x_i)} = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij},
and its log likelihood function is
\log L(\beta_0, \beta) = \sum_{i=1}^{n} \left[ y_i \log p(x_i) + (1 - y_i) \log\{1 - p(x_i)\} \right].   (9.2)
The penalized log likelihood for the logit model using LASSO is as follows:

\max_{\beta_0, \beta} \left\{ \frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p(x_i) + (1 - y_i) \log\{1 - p(x_i)\} \right] - \lambda \sum_{j=1}^{p} |\beta_j| \right\},   (9.3)
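Problem (9.3) can be solved by proximal gradient ascent: take a gradient step on the smooth part (1/n) log L(β_0, β), then apply the soft-thresholding operator to the penalized coefficients. The following NumPy sketch illustrates this under illustrative assumptions (the step size, iteration count, data, and function names are not from the chapter):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logistic_lasso(X, y, lam, step=0.1, n_iter=3000):
    """Proximal gradient ascent on (9.3):
    max_{b0, beta} (1/n) log L(b0, beta) - lam * sum_j |beta_j|.
    The intercept b0 is left unpenalized, as in (9.3)."""
    n, p = X.shape
    b0, beta = 0.0, np.zeros(p)
    for _ in range(n_iter):
        p_hat = sigmoid(b0 + X @ beta)
        b0 += step * np.mean(y - p_hat)   # gradient step for the intercept
        g = X.T @ (y - p_hat) / n         # gradient of (1/n) log L w.r.t. beta
        u = beta + step * g
        beta = np.sign(u) * np.maximum(np.abs(u) - step * lam, 0.0)  # l1 prox
    return b0, beta

# Illustrative data: only the first two of four predictors affect the response.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4))
y = rng.binomial(1, sigmoid(X @ np.array([2.0, -2.0, 0.0, 0.0])))
b0_hat, beta_hat = logistic_lasso(X, y, lam=0.1)
```

As in the linear case, the proximal (soft-thresholding) step sets coefficients of weakly informative predictors exactly to zero, so the penalized fit performs estimation and variable selection simultaneously.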